TY  - CONF
T1  - Interactive Visualization for Data Science Scripts
T2  - 2022 IEEE Visualization in Data Science (VDS)
Y1  - 2022
A1  - Faust, Rebecca
A1  - Scheidegger, C.
A1  - Isaacs, K.
A1  - Bernstein, W. Z.
A1  - Sharp, M.
A1  - North, Chris
KW  - behavioral sciences
KW  - codes
KW  - data science
KW  - Data visualization
KW  - debugging
KW  - prototypes
KW  - visualization
AB  - As the field of data science continues to grow, so does the need for adequate tools to understand and debug data science scripts. Current debugging practices fall short when applied to a data science setting, due to the exploratory and iterative nature of analysis scripts. Additionally, computational notebooks, the preferred scripting environment of many data scientists, present additional challenges to understanding and debugging workflows, including the non-linear execution of code snippets. This paper presents Anteater, a trace-based visual debugging method for data science scripts. Anteater automatically traces and visualizes execution data with minimal analyst input. The visualizations illustrate execution and value behaviors that aid in understanding the results of analysis scripts. To maximize the number of workflows supported, we present prototype implementations in both Python and Jupyter. Last, to demonstrate Anteater’s support for analysis understanding tasks, we provide two usage scenarios on real world analysis scripts.
JF  - 2022 IEEE Visualization in Data Science (VDS)
PB  - IEEE Computer Society
CY  - Los Alamitos, CA, USA
UR  - https://doi.ieeecomputersociety.org/10.1109/VDS57266.2022.00009
ER  - 

TY  - JOUR
T1  - Interactive Visual Analytics for Sensemaking with Big Text
JF  - Big Data Research
Y1  - 2019
A1  - Dowling, Michelle
A1  - Wycoff, Nathan
A1  - Mayer, Brian
A1  - Wenskovitch, John
A1  - Leman, Scotland
A1  - House, Leanna
A1  - Polys, Nicholas
A1  - North, Chris
A1  - Hauck, Peter
KW  - Big data
KW  - interactive visual analytics
KW  - Semantic interaction
KW  - text analytics
KW  - Topic modeling
KW  - visualization
AB  - Analysts face many steep challenges when performing sensemaking tasks on collections of textual information larger than can be reasonably analyzed without computational assistance. To scale up such sensemaking tasks, new methods are needed to interactively integrate human cognitive sensemaking activity with machine learning. Towards that goal, we offer a human-in-the-loop computational model that mirrors the human sensemaking process, and consists of foraging and synthesis sub-processes. We model the synthesis loop as an interactive spatial projection and the foraging loop as an interactive relevance ranking combined with topic modeling. We combine these two components of the sensemaking process using semantic interaction such that the human's spatial synthesis actions are transformed into automated foraging and synthesis of new relevant information. Ultimately, the model's ability to forage as a result of the analyst's synthesis activities makes interacting with big text data easier and more efficient, thereby facilitating analysts' sensemaking ability. We discuss the interaction design and theory behind our interactive sensemaking model.
The model is embodied in a novel visual analytics prototype called Cosmos in which analysts synthesize structure within the larger corpus by directly interacting with a reduced-dimensionality space to express relationships on a subset of data. We then demonstrate how Cosmos supports sensemaking tasks with a realistic scenario that investigates the effect of natural disasters in Adelaide, Australia in September 2016 using a database of over 30,000 news articles.
VL  - 16
UR  - http://www.sciencedirect.com/science/article/pii/S2214579618302995
ER  - 

TY  - CONF
T1  - The Cognitive and Computational Benefits and Limitations of Clustering for Sensemaking
T2  - CHI '18 Workshop on Sensemaking in a Senseless World
Y1  - 2018
A1  - Wenskovitch, John
A1  - Dowling, Michelle
A1  - North, Chris
KW  - clustering
KW  - exploratory data analysis
KW  - interaction
KW  - sensemaking
KW  - tasks
KW  - visualization
AB  - The cognitive process of sensemaking refers to acquiring, representing, and organizing information in order to understand that information. The organization component naturally supports the introduction of clusters, an important enabler for grouping objects such that similar objects are placed in the same cluster. This paper explores the benefits and limitations of introducing clusters into systems for exploratory data analysis. We consider these issues for tasks that the system may support, methods for visualizing and interacting with data in the system, and algorithms that are encoded into the system. We discuss the use of clusters in these systems with respect to cognition and computation, and we call out future areas of research in this area.
JF  - CHI '18 Workshop on Sensemaking in a Senseless World
CY  - Montreal, QC, Canada
ER  - 

TY  - JOUR
T1  - Towards a Systematic Combination of Dimension Reduction and Clustering in Visual Analytics
Y1  - 2018
A1  - Wenskovitch, John
A1  - Crandell, Ian
A1  - Ramakrishnan, Naren
A1  - House, Leanna
A1  - Leman, Scotland
A1  - North, Chris
KW  - Algorithm design and analysis
KW  - clustering
KW  - Clustering algorithms
KW  - Data visualization
KW  - Dimension reduction
KW  - algorithms
KW  - Manifolds
KW  - Partitioning algorithms
KW  - Visual Analytics
KW  - visualization
JF  - IEEE Transactions on Visualization and Computer Graphics
VL  - 24
ER  - 

TY  - CONF
T1  - Semantic interaction for visual text analytics
T2  - Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems
Y1  - 2012
A1  - Endert, Alex
A1  - Fiaux, Patrick
A1  - North, Chris
KW  - interaction
KW  - Visual Analytics
KW  - visualization
AB  - Visual analytics emphasizes sensemaking of large, complex datasets through interactively exploring visualizations generated by statistical models. For example, dimensionality reduction methods use various similarity metrics to visualize textual document collections in a spatial metaphor, where similarities between documents are approximately represented through their relative spatial distances to each other in a 2D layout. This metaphor is designed to mimic analysts' mental models of the document collection and support their analytic processes, such as clustering similar documents together. However, in current methods, users must interact with such visualizations using controls external to the visual metaphor, such as sliders, menus, or text fields, to directly control underlying model parameters that they do not understand and that do not relate to their analytic process occurring within the visual metaphor.
In this paper, we present the opportunity for a new design space for visual analytic interaction, called semantic interaction, which seeks to enable analysts to spatially interact with such models directly within the visual metaphor using interactions that derive from their analytic process, such as searching, highlighting, annotating, and repositioning documents. Further, we demonstrate how semantic interactions can be implemented using machine learning techniques in a visual analytic tool, called ForceSPIRE, for interactive analysis of textual data within a spatial visualization. Analysts can express their expert domain knowledge about the documents by simply moving them, which guides the underlying model to improve the overall layout, taking the user's feedback into account.
JF  - Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems
T3  - CHI '12
PB  - ACM
CY  - New York, NY, USA
SN  - 978-1-4503-1015-4
UR  - http://doi.acm.org/10.1145/2207676.2207741
ER  - 

TY  - CONF
T1  - The semantics of clustering: analysis of user-generated spatializations of text documents
T2  - Proceedings of the International Working Conference on Advanced Visual Interfaces
Y1  - 2012
A1  - Endert, Alex
A1  - Fox, Seth
A1  - Maiti, Dipayan
A1  - Leman, Scotland
A1  - North, Chris
KW  - clustering
KW  - text analytics
KW  - Visual Analytics
KW  - visualization
AB  - Analyzing complex textual datasets consists of identifying connections and relationships within the data based on users' intuition and domain expertise. In a spatial workspace, users can do so implicitly by spatially arranging documents into clusters to convey similarity or relationships. Algorithms exist that spatialize and cluster such information mathematically based on similarity metrics. However, analysts often find inconsistencies in these generated clusters based on their expertise. Therefore, to support sensemaking, layouts must be co-created by the user and the model.
In this paper, we present the results of a study observing individual users performing a sensemaking task in a spatial workspace. We examine the users' interactions during their analytic process, and also the clusters the users manually created. We found that specific interactions can act as valuable indicators of important structure within a dataset. Further, we analyze and characterize the structure of the user-generated clusters to identify useful metrics to guide future algorithms. Through a deeper understanding of how users spatially cluster information, we can inform the design of interactive algorithms to generate more meaningful spatializations for text analysis tasks, to better respond to user interactions during the analytics process, and ultimately to allow analysts to more rapidly gain insight.
JF  - Proceedings of the International Working Conference on Advanced Visual Interfaces
T3  - AVI '12
PB  - ACM
CY  - New York, NY, USA
SN  - 978-1-4503-1287-5
UR  - http://doi.acm.org/10.1145/2254556.2254660
ER  - 

TY  - CONF
T1  - Analytic provenance: process+interaction+insight
T2  - Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems
Y1  - 2011
A1  - North, Chris
A1  - Chang, Remco
A1  - Endert, Alex
A1  - Dou, Wenwen
A1  - May, Richard
A1  - Pike, Bill
A1  - Fink, G.
KW  - analytic provenance
KW  - user interaction
KW  - Visual Analytics
KW  - visualization
JF  - Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems
T3  - CHI EA '11
PB  - ACM
CY  - New York, NY, USA
SN  - 978-1-4503-0268-5
UR  - http://doi.acm.org/10.1145/1979742.1979570
ER  - 