Key Insights from a Feature Discovery User Study | Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics (2024)

research-article

Open Access

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

June 2024

Pages 1–5

https://doi.org/10.1145/3665939.3665961

Published: 18 June 2024

ABSTRACT

Multiple works in data management research focus on automating the processes of data augmentation and feature discovery to save users from having to perform these tasks manually. Yet, this automation often leads to a disconnect with the users, as it fails to consider the specific needs and preferences of the actual end-users of data management systems for machine learning. To explore this issue further, we conducted 19 semi-structured, think-aloud use-case studies based on a scenario in which data specialists were tasked with augmenting a base table with additional features to train a machine learning model. In this paper, we share key insights into the practices of feature discovery on tabular data performed by real-world data specialists derived from our user study. Our research uncovered differences between the user assumptions reported in the literature and the actual practices, as well as some areas where literature and real-world practices align.

Published in

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

June 2024

91 pages

ISBN: 9798400706936

DOI: 10.1145/3665939

Program Chairs:

  • Jean-Daniel Fekete, Inria & Université Paris-Saclay
  • Behrooz Omidvar-Tehrani, AWS AI Labs
  • Kexin Rong, Georgia Institute of Technology
  • Roee Shraga, Worcester Polytechnic Institute

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 28 of 56 submissions, 50%
