Usable Big Data Analytics through Multi-Model Interactive Visualization
Gaining big insight from big data requires big analytics, which poses big usability problems. Analyses of big data often rely on several computational and statistical models that operate on multiple levels of data scale to discover and characterize latent data structure. The models work jointly or in sequence to filter, group, summarize, and visualize big data so that analysts may assess the data. As a simple example in big text analytics, massive text is first sampled for relevant or representative words, then further reduced by topic modeling, then visualized by applying a dimension reduction algorithm. As the size of data increases, so does the number of models and, likewise, the need for human interaction in the analytical process. By interacting, humans inject expert judgment into the analytical process, and efficiently explore and make sense of big data from varying perspectives. However, complex low-level parameters and enforced premature formality already make interacting with any individual model difficult, and analysts now need to interact with a growing number of models. In this proposal, current human-computer-interaction research is merged with complex statistical methods and fast computation to make big data analytics usable and accessible to professional and student users.
The proposed solution is to scale up Visual to Parametric Interaction (V2PI) to a new framework called Multi-scale V2PI (MV2PI). V2PI currently supports usable small-data analytics, and enables users to adjust model parameters by interacting directly with data in a visualization. That is, V2PI interprets visual interactions quantitatively to update parameters and produce new visualizations. MV2PI is a new interactive framework that links together multiple models operating at multiple levels of data scale in a unified interactive space. Model results are combined into a common visual representation. Direct manipulation of the small-scale visual representation propagates to the larger-scale models: each model is inverted to update its parameters, ultimately producing a new output result. In the text analytics example, if the user drags several data points together to hypothesize a cluster, the inverted dimensionality reduction model computes updated dimension weights, queries relevant new hits at the large scale, identifies changed topics, and updates the layout to show big-data support for the new cluster. This approach enables users to interactively explore large-scale data and complex inter-relationships between models in real time, and in a usable fashion that directly supports their natural cognitive sensemaking process.
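The inversion step in the text analytics example can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes the forward model is a weighted Euclidean distance over features, and it recovers non-negative dimension weights by projected gradient descent on a least-squares fit to the squared distances the user leaves in the layout. The function name `learn_dimension_weights`, the toy points, and the target distances are all hypothetical.

```python
def learn_dimension_weights(features, target_sq_dist, steps=500):
    """Estimate non-negative dimension weights from a user-edited layout.

    features: dict mapping a point id to its list of feature values.
    target_sq_dist: dict mapping (id_a, id_b) to the squared distance the
        user left between the two points in the visualization.
    Assumption: the forward model is a weighted squared Euclidean distance,
    sum_k w_k * (x_ak - x_bk)^2, which is linear in the weights w.
    """
    pairs = list(target_sq_dist)
    dims = len(next(iter(features.values())))
    uniform = [1.0 / dims] * dims

    # One row per constrained pair: per-dimension squared differences.
    rows = [
        [(a - b) ** 2 for a, b in zip(features[i], features[j])]
        for i, j in pairs
    ]
    targets = [target_sq_dist[p] for p in pairs]

    # Safe step size: the gradient's Lipschitz constant is at most
    # 2 * (sum of squared entries of the row matrix).
    sumsq = sum(v * v for row in rows for v in row)
    if sumsq == 0.0:
        return uniform
    lr = 1.0 / (2.0 * sumsq)

    w = list(uniform)
    for _ in range(steps):
        grad = [0.0] * dims
        for row, t in zip(rows, targets):
            resid = sum(wk * v for wk, v in zip(w, row)) - t
            for k, v in enumerate(row):
                grad[k] += 2.0 * resid * v
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]

    # Normalize so the weights read as relative importance.
    total = sum(w)
    return [wk / total for wk in w] if total > 0.0 else uniform


# Toy data: "a" and "b" agree on dimension 0 but differ on dimension 1.
features = {"a": [0.0, 0.0], "b": [0.1, 1.0], "c": [1.0, 0.1]}

# The user drags "a" and "b" together and leaves "c" far from both.
layout_sq_dist = {("a", "b"): 0.0, ("a", "c"): 1.0, ("b", "c"): 1.0}

weights = learn_dimension_weights(features, layout_sq_dist)
```

In this toy case the learned weights favor dimension 0, the dimension on which the dragged-together points already agree. In a multi-scale setting, weights of this kind would be what the larger-scale models consume, for example to query new hits, though how that hand-off works is left open here.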
Intellectual merits are in the fundamentally novel approach to interactively combining multiple statistical data models across levels of data scale to enable usable big-data analytics. This research will (1) create the conceptual MV2PI pipeline, and identify alternatives for communication flow between models, visualization, and interaction, including possible shared parameters; (2) establish several new useful models, covering different levels of scale, that support the V2PI model inversion approach to machine learning and can operate within the new pipeline; (3) develop new computational methods for high-performance updates to inverted models in support of real-time interaction with MV2PI; and (4) evaluate the usability of MV2PI and measure its impact on human sensemaking in big data analytics.
Broader impacts stem from bringing attention to the critical role of usability in big data analytics. The outcomes of this research include (1) making big-data analytics accessible to end users who are experts in various data domains, but not in advanced statistical data models and algorithms; (2) developing educational programs in support of pedagogy for exploratory analytical thinking in the context of big data; (3) establishing a workshop focused on usability in big-data analytics to increase awareness and promote collaboration between computational and usability researchers; (4) reaching out to government agencies with needs in big text analytics, through our involvement in DHS VACCINE and the national laboratories; and (5) involving diverse student populations in the research project, building on our strong track record in diversity and undergraduate research.
This project is supported by NSF BIGDATA grant #1447416, entitled "Usable Big Data Analytics through Multiple-Scale Interactive Visualization", with PIs Chris North, Leanna House, Scotland Leman, and Wu Feng.
- The BaVA research group
- Semantic Interaction Pipeline: software architecture for multi-scale big data analytics
- BigSpire: Semantic Interaction with Big Text Data on Bing & IEEE Xplore
- Andromeda: Semantic Interaction for Quantitative Data
- GPU Based Methods for Interactive Visualization of Big Data
- Visual Analytics with Biclusters
- Be the Data: Embodied Visual Analytics for STEM education outreach
- New big data visualization modules in classes: CMDA 3654 Intro to Data Analytics and Visualization, CS 5764 Information Visualization
- Publications open access