Multi-Model Semantic Interaction for Scalable Text Analytics
Committee: Chris North (chair), Naren Ramakrishnan, Doug Bowman, Leanna House, William Pike
Learning from text data often involves a loop of tasks that iterate between foraging for information and synthesizing it into incremental hypotheses. Past research has shown the advantages of using spatial workspaces to synthesize information by externalizing hypotheses and creating spatial schemas. However, spatializing an entire dataset becomes prohibitive as the number of documents available to analysts grows, particularly when only a small subset is relevant to the task at hand. To address this issue, we developed the multi-model semantic interaction (MSI) technique, which leverages user interactions to aid the display layout (as in previous semantic interaction work), forage for new, relevant documents implied by those interactions, and place them in the context of the user’s existing spatial layout. This allows the user to conduct both implicit queries and traditional explicit searches. A comparative user study of StarSPIRE found that while adding implicit querying did not affect the quality of the foraging, it enabled users to 1) synthesize more information than users with only explicit querying, 2) externalize more hypotheses, and 3) complete more synthesis-related semantic interactions. In addition, when implicit querying was available, 18% of relevant documents were found through implicitly generated queries. StarSPIRE has also been integrated with web-based search engines, allowing users to work across vastly different levels of data scale to complete exploratory data analysis tasks (e.g., literature review, investigative journalism).
The core contribution of this work is multi-model semantic interaction (MSI) for usable big data analytics. This work expands the understanding of how user interactions can be interpreted and mapped to underlying models to steer multiple algorithms simultaneously and at varying levels of data scale. This understanding is represented in an extensible multi-model semantic interaction pipeline. The lessons learned from this dissertation work can be applied to other visual analytics systems to promote direct manipulation of the data in the context of the visualization, rather than the tweaking of algorithmic parameters, and to create usable and intuitive interfaces for big data analytics.
This research was funded in part by the National Science Foundation (IIS-1218346, IIS-144746, and CCF-0937133), the Department of Homeland Security Visual Analytics for Command, Control, and Interoperability Environments (VACCINE), and the Ted and Karyn Hume Center for National Security and Technology.