Welcome to DATA 110!
Originally developed with Dr. Rick Marks, then modified with Dr. Can Chen and Dr. Chudi Zhong.
The Introduction to Data Science course is a broad, high-level survey of the major aspects of data science, including ethics, best practices in communication (e.g., data visualization), mathematical/statistical concepts, and computational thinking. Students will gain a grounding in the fundamentals of data science that prepares them for the more in-depth, advanced coursework required for the BA and BS in Data Science. The curriculum and format are designed specifically for students who are considering a major in data science and may not have taken statistics or computer science courses. The course is itself a requirement of the BA and BS in Data Science.
Schedule (Spring ’25)
Introduction to Data Science
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| 1 | Jan 8 | 1 | Syllabus, What is data science? | | |
| | Jan 10 | | Lab 1 | | |
| 2 | Jan 13 | 2 | Data modality, Tables | HW1 | |
Python Programming
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| | Jan 15 | 3 | What is programming? Variables, data types | | |
| | Jan 17 | | Lab 2 | | |
| 3 | Jan 20 | | No class | | |
| | Jan 22 | 4 | Python control flow (if, for) | | HW1 |
| | Jan 24 | | Lab 3 | | |
| 4 | Jan 27 | 5 | Table column operations | HW2 | |
| | Jan 29 | 6 | Table row operations | | |
| | Jan 31 | | Lab 4 | | |
Data Communication
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| 5 | Feb 3 | 7 | Guest lecture: Odum Institute & UNC libraries; Data lifecycle and ethics (if time) | | |
| | Feb 5 | 8 | Line plots, Scatter plots | | HW2 |
| | Feb 7 | | Lab 5 | | |
| 6 | Feb 10 | | No class | HW3 | |
| | Feb 12 | 9 | Bar charts, Histograms | | |
Statistics
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| | Feb 14 | | Lab 6 | | |
| 7 | Feb 17 | 10 | Summary Statistics, Boxplot | | |
| | Feb 19 | 11 | Probability, Uniform distribution | | HW3 |
| | Feb 21 | | Exam 1 review | | |
| 8 | Feb 24 | | Exam 1 | HW4 | |
| | Feb 26 | 12 | Sampling | | |
| | Feb 28 | | Lab 7 | | |
| 9 | Mar 3 | 13 | Association, Correlation, Causality | | |
| | Mar 5 | 14 | Gaussian distribution, Inference | | HW4 |
| | Mar 7 | | TBD | | |
| 10 | Mar 10 | | No class | | |
| | Mar 12 | | No class | | |
| | Mar 14 | | No class | | |
| 11 | Mar 17 | 15 | Hypothesis testing | HW5 | |
| | Mar 19 | 16 | Hypothesis testing | | |
| | Mar 21 | | Lab 8 | | |
Prediction
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| 12 | Mar 24 | 17 | Data Science life cycle, Modeling; Group project details | Project proposal | |
| | Mar 26 | 18 | Intro to machine learning | | HW5 |
| | Mar 28 | | Lab 9 | | |
| 13 | Mar 31 | 19 | Linear regression | HW6 | Project proposal |
| | Apr 2 | 20 | Decision trees | | |
| | Apr 4 | | Lab 10 | | |
| 14 | Apr 7 | 21 | Underfitting and overfitting | | |
| | Apr 9 | 22 | KNN | | HW6 |
| | Apr 11 | | Lab 11 | | |
| 15 | Apr 14 | | Exam 2 review | | |
| | Apr 16 | | Exam 2 | | |
More on Data Science
| Week | Date | Lec | Topic(s) | Release | Due |
|---|---|---|---|---|---|
| | Apr 18 | | No class | | |
| 16 | Apr 21 | 23 | K-means Clustering | | |
| | Apr 23 | 24 | Big data, Next steps for data scientists | | |
| | Apr 25 | | Office hours for final group project | | |
| 17 | Apr 28 | | Time for final group project | | |
| | May 5 | | Final “exam” (poster presentation) | | |
Assignments
Weekly Lab Exercises [10%]
Lab assignments are small-group exercises intended to be completed during the Friday recitations led by the TAs. Submit individually at the end of the 50-minute session; submissions are accepted until 11:59pm the same day at the latest. Labs are graded for participation and reasonable progress, and attendance is required. The lowest of your 11 lab scores is dropped automatically. Excused absences policy and request form: link. Do not request extensions or excused absences via email.
Homework [30%]
You may discuss problems with other students and course staff, but you must complete and submit your work independently. Due dates for every homework assignment are listed on the course syllabus and course schedule. Unless otherwise stated, assignments are due at 11:59pm on those days. Submit them to Gradescope, which can be accessed via Canvas. You have 3 late days to use throughout the semester; each must be requested at least 24 hours in advance using the form: link. Do not request extensions via email.
- HW1 (5%): Introduction to Data Science
- HW2 (5%): Introduction to Programming and Pandas
- HW3 (5%): Data Visualization and Communication
- HW4 (5%): Probability, Statistics, and Sampling
- HW5 (5%): Inference & Hypothesis Testing
- HW6 (5%): Machine Learning
Exams [35%]
- Exam 1 (15%) covers week 1 through week 7, ending with summary statistics and boxplots.
- Exam 2 (20%) covers week 7, starting with probability and the uniform distribution, through week 14.
Both are pen-and-paper exams held during the usual lecture time in the usual lecture room. Question types include multiple choice, true/false, matching, fill-in-the-blank, and short answer (no more than a paragraph).
Group Project [25%]
The project culminates in a poster session presentation and peer review during the final exam block. The group project will be assessed based on:
- Group project proposal
- Group poster presentation
- Group write-up and code
- Individual reflection and teamwork assessment
- Individual feedback to other teams
Extra Credit [up to 1%]
Based on in-class and online participation
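To make the grading arithmetic concrete, the sketch below combines the category weights listed above (labs 10%, homework 30% as six 5% assignments, Exam 1 15%, Exam 2 20%, group project 25%) and drops the lowest of the 11 lab scores. It is a hypothetical illustration, not the official gradebook: it assumes every score is on a 0 to 100 scale, that the group project is reported as a single 0 to 100 score, and that extra credit is added as up to 1 percentage point on top of the weighted total. The function names `lab_average` and `final_grade` are made up for this example.

```python
# Hypothetical sketch of the DATA 110 grade weighting; NOT the official gradebook.
# Assumptions: all scores are on a 0-100 scale, the group project is one score,
# and extra credit adds at most 1 percentage point to the weighted total.


def lab_average(lab_scores: list[float]) -> float:
    """Average the lab scores after dropping the single lowest one."""
    if len(lab_scores) < 2:
        raise ValueError("need at least two lab scores to drop the lowest")
    kept = sorted(lab_scores)[1:]  # drop exactly one lowest score
    return sum(kept) / len(kept)


def final_grade(lab_scores, hw_scores, exam1, exam2, project, extra_credit=0.0):
    """Weighted total (in percent) using the category weights from the syllabus."""
    if len(hw_scores) != 6:
        raise ValueError("expected six homework scores (HW1-HW6)")
    labs = 0.10 * lab_average(lab_scores)          # labs: 10%
    homework = sum(0.05 * s for s in hw_scores)    # each HW: 5%, 30% total
    exams = 0.15 * exam1 + 0.20 * exam2            # Exam 1: 15%, Exam 2: 20%
    proj = 0.25 * project                          # group project: 25%
    bonus = min(max(extra_credit, 0.0), 1.0)       # assumed cap of 1 point
    return labs + homework + exams + proj + bonus


# Example: one missed lab (score 0) is dropped, so the lab average is 90.
labs = [90] * 10 + [0]
print(final_grade(labs, [80] * 6, exam1=70, exam2=85, project=95, extra_credit=0.5))
# 84.75
```

Dropping the lowest lab before averaging means a single missed recitation does not affect the lab component at all, which is why the example above still yields a lab average of 90.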
Course Goals & Student Learning Outcomes (SLOs)
The goal of this course is to lay the foundation for subsequent courses required for the BS and BA in Data Science, as well as to introduce core concepts and ideas in the field of data science to any student, regardless of major. The course provides a high-level survey of current and emerging concepts in key data science domains, including computational thinking, mathematical/statistical skills, data management, communication best practices, and ethics. The course offers hands-on analysis of real-world datasets, exposing students to the kinds of insights and problem-solving that the field of data science can deliver.
Accessibility and Equity
Students from all backgrounds should be able to take Intro to Data Science. As such, no prerequisites in statistics or programming are required for the course; only basic high-school algebra is necessary.
Diversity
Intro to Data Science can be taken by students from any major across campus and should be acceptable as a potential prerequisite for many statistics, math, or computing majors.
Pedagogical Clarity
Intro to Data Science is designed to first teach introductory programming, then statistics through a computational lens, and finally basic methods of inference.
Core Concepts and Learning Outcomes
Data Management: Describe differences in types of data and the ways in which individuals and organizations store, manage, and interact with data. Identify and appropriately acknowledge sources of data. Apply basic data cleaning techniques to prepare data for analysis.
Mathematical and Statistical Foundations: Select and use appropriate data analytics and statistical techniques to discover new relationships, deliver insights into research problems or organizational processes, and support decision-making. Draw accurate and useful conclusions from data analysis.
Computational Thinking: Build and understand algorithms for analyzing large data sets and for accurate numerical modeling of problems.
Communication: Convey data analyses through written and oral communication, and select appropriate tools to display data visually.
Responsible Data Science: Identify security, privacy protection, governance, and ethical considerations in data management. Differentiate between ethical and unethical uses of data science.
This course is designed to meet the general education requirement of Quantitative Reasoning. Below are the corresponding learning outcomes and student questions from UNC Chapel Hill’s IDEAs in Action General Education Curriculum.
IDEAs in Action General Education Curriculum
FC-QUANT
Student Learning Outcomes:
- Summarize, interpret, and present quantitative data in mathematical forms, such as graphs, diagrams, tables, or mathematical text.
- Develop or compute representations of data using mathematical forms or equations as models and use statistical methods to assess their validity.
- Make and evaluate important assumptions in the estimation, modeling, and analysis of data, and recognize the limitations of the results.
- Apply mathematical concepts, data, procedures, and solutions to make judgments and draw conclusions.
- Synthesize and present quantitative data to others to explain findings or to provide quantitative evidence in support of a position.
Questions for Students:
- What is the role of mathematics in organizing and interpreting measurements of the world?
- How can mathematical models and quantitative analysis be used to summarize or synthesize data into knowledge and predictions?
- What methodology can we apply to validate or reject mathematical models or to express our degree of confidence in them?