Need help with a data analysis project?
One of the most valuable experiences our MS Data Analytics students have in the program is collaborating with organizations on team-based data analytics projects. These hands-on projects are a win-win: well-trained students examine your data and answer your strategic questions, and they gain real-world experience in solving real problems through data analysis.
If your business, organization or nonprofit has a data project that could use the help of our data analytics students, let us know. We are looking for projects that involve reasonably large amounts of data. They will be completed during the spring or summer (June) semester. You will need to interact with the students three to four times during the project duration. A final presentation will be made to you about the analysis and findings.
Interested? Please read the project guidelines below, and then send an email to Manoj Agarwal, program director, at firstname.lastname@example.org with the following information:
- Organizational setting of the project
- Goals of the project
- Actions that the project will inform
- Internal and external data available for the project
- Potential research questions to be answered
- What broad analysis would you like the teams to do?
- Any ethical, privacy, security issues with the data?
- What deliverables do you expect?
- Any deadlines or timeline constraints?
- Contact person at your institution: name, email and phone
Please remember that the projects will be done in small teams of students who are not expert data scientists, so please scope the project accordingly. Please reach out to email@example.com with any questions.
Please consider the following guidelines as you think about your project request and scope your data science project. There are many approaches to scoping a problem. The scoping process is iterative, and the scope may be refined both during scoping and during the project itself.
Step 1: Goals
Define the goal(s) of the project
This is the most critical step in the scoping process. Most projects start with a very vague and abstract goal (for example, improving education or healthcare), get a little more concrete (for example, increasing the percentage of students who will graduate on time or decreasing the number of children who get lead poisoning), and keep getting refined until the goal is both concrete and achieves the aims of the organization. This step is difficult because most organizations have not explicitly defined analytical goals for many of the problems they’re tackling. Sometimes, these goals exist but are locked implicitly in the minds of people within the organization. Other times, there are several goals that different parts of the organization are trying to optimize. The objective here is to take the outcome you are trying to achieve and turn it into a goal that is measurable. This will also help the student team focus on their main task.
Step 2: Actions
What actions/interventions do you have that this project will inform?
What actions can the organization take to achieve these goals? These actions often need to be fairly concrete. A well-scoped project ideally has a set of actions that the organization is already taking and that can now be better informed using data science. Generally, it’s a good strategy to first focus on informing existing actions instead of starting with completely new actions. Enumerating the set of actions keeps the project actionable.
Step 3: Data
What data do you have access to internally? What data do you need? What can you augment from external and/or public sources?
Once you have determined the goals and actions, the next step is to find out what data sources exist inside (and outside) the organization that are relevant to this problem, and what data sources are needed to solve it effectively. For each data source, it’s good practice to find out how it’s stored, how often it’s collected, its level of granularity, how far back it goes, whether there is a collection bias, and how often new data comes in.
You first want to make a list of data sources that are available inside the organization. This is an iterative process as well, since most organizations don’t necessarily have a comprehensive list of the data sources they have.
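As an illustration only, the properties listed above could be recorded in a small inventory like the sketch below. The `DataSource` fields, the helper function, and every entry are hypothetical placeholders, not a required format for your project request.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One row of a data-source inventory (all field names are illustrative)."""
    name: str
    storage: str               # how the data is stored
    collection_frequency: str  # how often it is collected
    granularity: str           # what one record describes
    history_start_year: int    # how far back the data goes
    known_bias: str = "none identified"
    internal: bool = True      # False for external or public sources


# Hypothetical entries; replace with your organization's actual sources.
inventory = [
    DataSource("enrollment_records", "SQL database", "each semester", "student", 2012),
    DataSource("attendance_logs", "CSV exports", "daily", "student", 2018,
               known_bias="missing for part-time students"),
    DataSource("census_income", "public download", "yearly", "zip code", 2010,
               internal=False),
]


def sources_at_granularity(sources, granularity):
    """Return the names of the sources recorded at the given granularity."""
    return [s.name for s in sources if s.granularity == granularity]


student_level = sources_at_granularity(inventory, "student")
print(student_level)
```

Even a simple list like this makes it easier to see which sources can support a given action and where the gaps are.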
- Matching the Data to the Actions: This step also helps you figure out if your data matches the actions you need to inform. If the actions are individual level, then you most likely need data at an individual level. It’s important to match the granularity, frequency, and time horizon of the actions to the granularity, frequency, and time horizon of the data you have.
- External and/or Public Data: Once you’ve determined what data you need and what data exists inside the organization, you then want to figure out what external and/or public data you can get that fills the gaps. Each domain often has commonly used data sources that you want to know about.
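To make the granularity point concrete, here is a minimal sketch with made-up records. Individual-level data can be rolled up to inform a group-level action, but group-level averages cannot be split back out to identify individuals. The record layout and the `average_by_group` helper are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical individual-level records: (student_id, school, absent_days)
records = [
    ("s1", "North", 3),
    ("s2", "North", 7),
    ("s3", "South", 1),
    ("s4", "South", 2),
    ("s5", "South", 9),
]


def average_by_group(rows):
    """Roll individual-level values up to one average per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum, count]
    for _id, group, value in rows:
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


school_level = average_by_group(records)
print(school_level)
```

If the action is taken per school, the rolled-up averages are enough. If the action targets individual students, the project needs the individual-level records themselves.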
Step 4: Analysis
What analysis needs to be done? Does it involve description, detection, or prediction? How will the analysis be validated?
The next step in the scoping process is to determine the analysis needed to inform the actions, using the data we have, to achieve our goals.
The analysis can use methods and tools from different areas: computer science, machine learning, data science, statistics, and social sciences. One way to think about the analysis that can be done is to break it down into four types:
- Description: Primarily focused on understanding events and behaviors that have happened in the past.
- Detection: Less focused on the past and more focused on ongoing events. Detection tasks often involve detecting events and anomalies that are currently happening.
- Prediction: Focused on the future and predicting future behaviors and events.
- Optimization: Focused on optimizing resource allocations to maximize the outcome of interest.
The questions to answer in this step are:
- What analysis needs to be done? Is this a descriptive analysis, a detection task, a predictive model, or an optimization task? Often, the analysis involves several of the types described above.
- How will the analysis be validated? What validation can be done using existing, historical data? What field trial needs to be designed to validate this in the field before it can be deployed?
- How will the analysis inform the actions?
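One common way to validate an analysis on existing, historical data is to hold out the most recent period and check how the analysis would have performed on it. The sketch below assumes a hypothetical rule that flags cases, plus made-up records and a made-up cutoff date. It is one possible approach, not the method the student teams will necessarily use.

```python
from datetime import date

# Hypothetical historical records: (event_date, flagged_by_rule, outcome_occurred)
history = [
    (date(2021, 3, 1), True, True),
    (date(2021, 9, 1), False, False),
    (date(2022, 2, 1), True, False),
    (date(2022, 8, 1), True, True),
    (date(2023, 1, 1), False, True),
    (date(2023, 6, 1), True, True),
]


def temporal_split(rows, cutoff):
    """Rows before the cutoff are used to develop the analysis; the rest are held out."""
    develop = [r for r in rows if r[0] < cutoff]
    holdout = [r for r in rows if r[0] >= cutoff]
    return develop, holdout


def precision(rows):
    """Share of flagged cases in which the outcome actually occurred."""
    flagged = [r for r in rows if r[1]]
    if not flagged:
        return None  # nothing was flagged, so precision is undefined
    return sum(1 for r in flagged if r[2]) / len(flagged)


develop, holdout = temporal_split(history, date(2022, 7, 1))
holdout_precision = precision(holdout)
print(len(develop), len(holdout), holdout_precision)
```

Splitting by time rather than at random mirrors how the analysis would be used in practice: built on past data and applied to cases that arrive later. A field trial would still be needed before deployment.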
Step 5: Ethical Considerations
Are there any privacy, transparency, discrimination/equity, and accountability issues around this project?
Some of the issues that might need to be discussed include:
- Do the people who “own” the data know you’re using it?
- What actions are you taking on individuals based on this data?
- Do the people you’re “targeting” know why and if they’re being “targeted”? What recourse do they have?
- Which stakeholders should know about which parts of the project?
- Who are the people responsible and accountable for all the things above?
Privacy/Confidentiality and Security
- What are the privacy considerations (legal as well as ethical)?
- How is the privacy of the individuals in the data being protected?
- What about confidentiality?
- What are the security considerations and protections? Who has access to which parts of the data? For what purposes? What is the security audit process?
Step 6: Working with the Student Teams
We do not want to impose a large time commitment on you. But for the project to be successful, we would like to address the following:
- How can we help the student teams understand the business and the data science problem you want to work on?
- Are there any data security/privacy/access issues that are important? Who is allowed to have access to the data?
- Are there any non-disclosure agreements (NDAs) that have to be signed? At the completion of the projects, what information can the students put on their resumes and discuss with recruiters, while protecting your privacy?
- Which personnel can they contact as and when they have questions? How often is this access available?
- What are the deliverables you expect from the student teams (Presentations, reports, video reports, etc.)?
- What is the timeline for this project? Are there any deadlines that you are facing that need to be met?
The project guidelines are adapted from this resource.