IBM Data Science Experience

As a design research intern at IBM, I collaborated with user experience designers and product managers to form a deeper understanding of a data scientist's workflow. As the sole researcher in the SF design studio, I led a complete user research study and ideated key features for a product known as Data Science Experience.

Product Overview: Data Science Experience is a cloud-based social workspace that enables data scientists to learn, collaborate, create, and share across multiple data science tools. The product was designed because demand for data scientists far exceeds the supply of experienced practitioners in the field. Many less experienced data scientists, including recent data science graduates and professionals transitioning from other industries, are hired into data science roles and struggle with the steep learning curve of the data science process. This is where we discovered an opportunity: bring the data science workflow into one workspace to make data science easier for less experienced data scientists.


User Research Project: Data Science Workflow

The Problem: Current data science tools address only single facets of the data science process. They are designed to serve a linear process, but the data scientist's process is not linear; it's cyclical, which means data scientists must toggle back and forth between research and development, conducting research during each step of the process. Data scientists spend roughly 80% of their time cleaning data, which leaves little time for research. However, finding relevant online resources and understanding undocumented code is frustrating and time-consuming. This is especially difficult for inexperienced data scientists, who need more time during each step of the process because they are unaware of the tricks and methods involved in data science research.

The Opportunity: How might we help inexperienced data scientists quickly find relevant, credible resources so they can save time and avoid duplicate work? We saw an opportunity to bring the data science research workflow into one workspace and make research faster and easier for less experienced data scientists.

Research Process 

Step 1: Preparation

I collaborated with interaction designers, visual designers, and offering managers at IBM Design to understand their research needs and questions. We led a workshop to establish desired research outcomes. As a team, we created an empathy map to document assumptions about our user and to form an initial basis for research objectives. After identifying our knowledge gaps, I prioritized our research needs to define my research objectives. I created a research plan and shared it with the design team for feedback:


  1. What is the data scientist's research process? What pain points do less experienced data scientists encounter while conducting research? 
  2. How do data scientists leverage online resources during each part of their process? What online resources do they use, and what value do they find in those artifacts?

After completing my research plan, I began recruiting users via UserTesting, emailing data scientists within the company, intercept-messaging, and contacting connections at San Francisco's data science institute, Galvanize. Recruiting users was the biggest challenge of my research project at IBM. Desk research was also a huge part of this phase: I looked through more than 30 archived interviews for data relevant to my research objectives. Going through existing primary research helped me understand complex data science terminology and brainstorm interview questions.

Step 2: Data Collection & Documentation

After doing some initial subject research and recruiting users, I began collecting data. To understand the data science research process, I conducted 10 one-on-one, in-depth interviews with data scientists, attended 2 data science meet-ups, and shadowed data scientists in both classroom and office settings.

I used human-centered design methodologies and the IBM design studio's research best practices to uncover the user's pain points and needs. I also tried to get non-researchers involved in this phase by inviting designers on my team to scribe or observe interviews. After each round of interviews, I added to a persona empathy map I had posted near my desk.

Step 3: Data Synthesis

I synthesized my research and divided the data science research process into five phases: discovery, understanding, evaluation, building, and sharing. I grouped similar pain points and trends in the data, and began forming my hypothesis and further defining the problem.



High level research insights:

Data scientists conduct research in order to find a starting point, get inspired, surpass a roadblock, and avoid duplicate work. Most commonly, data scientists rely on peer-supported resources that have been critiqued, discussed, and shared in online or offline data science communities. I tried to understand what value data scientists find in the communities they are part of, and which specific community resources they prefer to use.

We discovered that the community is the strongest tool a data scientist can access. 

We came to this conclusion after conducting interviews and watching a data scientist browse for resources. Whether he was scrolling through lists in a community database or scanning forums for code, he had criteria for assessing the value of these artifacts. I watched him pull code from several different projects and seek advice on API implementation from a forum. It became obvious that a data science project can't just stand on its own; it needs support and validation from the community. An artifact, whether a code snippet, API, or academic paper, is only as strong as the people who use it. The more an artifact is employed, the more people there are to discuss it. The public use of an artifact sharpens its quality. The value of an asset is determined by the discussion around it: its documentation, its versioning, and its critics. The evolution of data science is fueled by the collaborative process of building off of each other's work. This understanding led us to an important design principle: community first.


Step 4: Construction

After synthesizing my insights, I explored ways to make my research digestible. I designed artifacts that provide a high-level, big-picture analysis of my research, and pulled out the insights that portrayed the "why" behind each pain point.

I used these artifacts and insights in order to design a research presentation for the design team.

Influencing Strategy:

I shared my research insights and design recommendations during 3 different research presentations. Afterwards, I led brainstorming sessions with the design team, where we further defined an as-is journey map of the data science process and identified pain points that could be opportunities for design. At IBM, pain points are translated into "Hills," which empower design and non-design teams to ideate around and communicate user and product needs.


After reflecting on the as-is data science workflow, we brainstormed blue-sky ideas to tackle user pain points. We then designed a "to-be" storyboard that incorporated the ideated features, and gathered user feedback on it.


Design Recommendations: Community Page

For confidentiality reasons, I can only summarize the design recommendations I provided for the design team:

The Data Science Experience Community is a library within Data Science Experience that gathers valuable artifacts such as data sets, academic papers, notebooks, and other research artifacts into a single workspace. Data scientists can discuss, bookmark, upvote, tag, and filter these artifacts to quickly find credible resources. Essentially, we help data scientists quickly harness the power of their community.


Artifact Classification and Search Filters: I suggested search filters that match the data scientist's mental model as they search for data. During a contextual inquiry, I observed a data scientist as he worked on his data science capstone project. At any given point, he needed a tutorial, an academic paper, or a data set to move to the next step of his process, and each of these assets had to be saved and interacted with in a different environment. For example, data scientists first look for specific data types, then languages they know, then a specific machine learning algorithm. The process he used to manage and sift through his resources helped us establish a tentative system for artifact classification within the product.
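To make the idea of classification-driven filtering concrete, here is a minimal sketch in Python. All field names and values are hypothetical illustrations, not the product's actual schema; it simply shows artifacts tagged along the dimensions we observed (type of asset, language, data type, algorithm) and filtered in one pass:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """A community resource: data set, notebook, paper, or tutorial.
    Field names are illustrative, not the product's real schema."""
    title: str
    kind: str       # e.g. "dataset", "notebook", "paper", "tutorial"
    language: str   # e.g. "Python", "R"
    data_type: str  # e.g. "tabular", "text", "image"
    algorithm: str  # e.g. "random forest", "k-means"

def filter_artifacts(artifacts, **criteria):
    """Return artifacts matching every given attribute, mirroring the
    search order we observed: data type, then language, then algorithm."""
    return [a for a in artifacts
            if all(getattr(a, k) == v for k, v in criteria.items())]

library = [
    Artifact("Churn notebook", "notebook", "Python", "tabular", "random forest"),
    Artifact("Tweet corpus", "dataset", "Python", "text", "naive bayes"),
]
matches = filter_artifacts(library, language="Python", kind="notebook")
# matches contains only the "Churn notebook" artifact
```

Because every artifact type shares one classification scheme, a tutorial, paper, and data set can live in the same library instead of separate environments.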

Resource Metadata: I suggested specific metadata that should be available on each resource "card". This information helps data scientists quickly assess the credibility of a resource. For example, if data scientists are working with a specific data set, they should be able to view how many people "liked" or downloaded the data, and what open-source algorithms/models people have built using it.
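A resource card like this can be sketched as a small data model. The fields below (likes, downloads, derived models) come straight from the example above; the class and method names are hypothetical, not the product's API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResourceCard:
    """Credibility metadata shown on a resource 'card' (illustrative fields)."""
    title: str
    likes: int = 0
    downloads: int = 0
    derived_models: List[str] = field(default_factory=list)  # models built on this data set

    def credibility_summary(self) -> str:
        """One-line signal a data scientist can scan at a glance."""
        return (f"{self.title}: {self.likes} likes, {self.downloads} downloads, "
                f"{len(self.derived_models)} derived models")

card = ResourceCard("NYC Taxi Trips", likes=128, downloads=942,
                    derived_models=["fare-regression", "demand-forecast"])
print(card.credibility_summary())
```

The summary line surfaces community usage, which, per the "community first" principle, is what actually tells a data scientist whether a resource is trustworthy.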

Resource Recommendations: This feature helps data scientists quickly find similar projects so they can surpass a roadblock and avoid duplicating code. Data scientists should be pointed to notebooks and papers that contain similar domain-specific data points; these help them learn about the domain and understand how other people are dealing with problems within the industry. They should also be pointed to notebooks that use the same machine learning algorithm they intend to use, which would help them learn how to practically apply the algorithm and avoid duplicate work.
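One simple way to realize "similar projects" is to rank candidates by overlap between their tags (domain, data type, algorithm) and the current project's tags. This Jaccard-similarity sketch is my own illustration of the idea, not the recommendation logic the team actually built:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(current_tags, candidates, top_n=3):
    """Rank candidate notebooks/papers by shared domain and algorithm tags."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: jaccard(current_tags, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Hypothetical community artifacts and their tags
candidates = {
    "fraud-notebook": {"finance", "random forest", "tabular"},
    "image-tutorial": {"vision", "cnn"},
    "credit-paper":   {"finance", "logistic regression"},
}
recs = recommend({"finance", "random forest"}, candidates, top_n=2)
# recs ranks the fraud notebook first (shares domain and algorithm),
# then the credit paper (shares domain only)
```

Ranking by shared algorithm and domain tags directly serves the two use cases above: learning the domain from similar projects and seeing a practical application of the intended algorithm.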

Valuable Resources: Data scientists have different research processes and resource preferences, often depending on how much experience they have. After understanding these preferences, I recommended specific resources the DSX community page should gather and make available for its users. For example, recently graduated data scientists refer to academic papers and conference presentations more than experienced data scientists do, because they are still learning the basics of algorithms; they would like to see a collection of academic papers and conference presentations in one place. As a team, we also ideated ways we could use the Watson Machine Learning platform to provide more relevant and personalized resources for data scientists.

Step 5: User Testing, Prototype Iteration, and a Roadblock 

Unfortunately, at this point the project was handed off to another design team due to a global re-organization of the IBM analytics department. The next step would have been to get user feedback on an initial prototype incorporating the ideated features. Though handing off this project was difficult for me and my team, the process taught me a lot about effective communication across design, engineering, and product management teams. Check out the organizational design project I worked on next to increase collaboration across product teams at IBM!

I continued my work on Data Science Experience six months later, during my summer design internship with the same team. Our team won a Red Dot Award for Communication Design in the Interface & User Experience Design category, and received an Honorable Mention in the General Excellence category of the Fast Company 2017 Innovation By Design Awards.