Welcome to DATA 110!

Originally developed with Dr. Rick Marks, then modified with Dr. Can Chen and Dr. Chudi Zhong.

The Introduction to Data Science course is a broad, high-level survey of the major aspects of data science, including ethics, best practices in communication (e.g., data visualization), mathematical/statistical concepts, and computational thinking. Students will gain an understanding of the fundamentals of data science to support the more in-depth, advanced coursework required for the BA and BS in Data Science. The curriculum and format are designed specifically for students who are considering a major in data science and may not have taken statistics or computer science courses. The course is a requirement of the BA and BS in Data Science.


Schedule (Spring ’25)

Introduction to Data Science

Week | Date   | Lec | Topic(s)                        | Release | Due
1    | Jan 8  | 1   | Syllabus, What is data science? |         |
     | Jan 10 |     | Lab 1                           |         |
2    | Jan 13 | 2   | Data modality, Tables           | HW1     |

Python Programming

Week | Date   | Lec | Topic(s)                                   | Release | Due
     | Jan 15 | 3   | What is programming? Variables, data types |         |
     | Jan 17 |     | Lab 2                                      |         |
3    | Jan 20 |     | No class                                   |         |
     | Jan 22 | 4   | Python control flow (if, for)              |         | HW1
     | Jan 24 |     | Lab 3                                      |         |
4    | Jan 27 | 5   | Table column operations                    | HW2     |
     | Jan 29 | 6   | Table row operations                       |         |
     | Jan 31 |     | Lab 4                                      |         |

Data Communication

Week | Date   | Lec | Topic(s)                                      | Release | Due
5    | Feb 3  | 7   | Guest lecture: Odum Institute & UNC libraries |         |
     |        |     | Data lifecycle and ethics (if time)           |         |
     | Feb 5  | 8   | Line plots, Scatter plots                     |         | HW2
     | Feb 7  |     | Lab 5                                         |         |
6    | Feb 10 |     | No class                                      | HW3     |
     | Feb 12 | 9   | Bar charts, Histograms                        |         |

Statistics

Week | Date   | Lec | Topic(s)                            | Release | Due
     | Feb 14 |     | Lab 6                               |         |
7    | Feb 17 | 10  | Summary Statistics, Boxplot         |         |
     | Feb 19 | 11  | Probability, Uniform distribution   |         | HW3
     | Feb 21 |     | Exam 1 review                       |         |
8    | Feb 24 |     | Exam 1                              | HW4     |
     | Feb 26 | 12  | Sampling                            |         |
     | Feb 28 |     | Lab 7                               |         |
9    | Mar 3  | 13  | Association, Correlation, Causality |         |
     | Mar 5  | 14  | Gaussian distribution, Inference    |         | HW4
     | Mar 7  |     | TBD                                 |         |
10   | Mar 10 |     | No class                            |         |
     | Mar 12 |     | No class                            |         |
     | Mar 14 |     | No class                            |         |
11   | Mar 17 | 15  | Hypothesis testing                  | HW5     |
     | Mar 19 | 16  | Hypothesis testing                  |         |
     | Mar 21 |     | Lab 8                               |         |

Prediction

Week | Date   | Lec | Topic(s)                          | Release          | Due
12   | Mar 24 | 17  | Data Science life cycle, Modeling | Project proposal |
     |        |     | Group project details             |                  |
     | Mar 26 | 18  | Intro to machine learning         |                  | HW5
     | Mar 28 |     | Lab 9                             |                  |
13   | Mar 31 | 19  | Linear regression                 | HW6              | Project proposal
     | Apr 2  | 20  | Decision trees                    |                  |
     | Apr 4  |     | Lab 10                            |                  |
14   | Apr 7  | 21  | Underfitting and overfitting      |                  |
     | Apr 9  | 22  | KNN                               |                  | HW6
     | Apr 11 |     | Lab 11                            |                  |
15   | Apr 14 |     | Exam 2 review                     |                  |
     | Apr 16 |     | Exam 2                            |                  |

More on Data Science

Week | Date   | Lec | Topic(s)                                 | Release | Due
     | Apr 18 |     | No class                                 |         |
16   | Apr 21 | 23  | K-means Clustering                       |         |
     | Apr 23 | 24  | Big data, Next steps for data scientists |         |
     | Apr 25 |     | Office hours for final group project     |         |
17   | Apr 28 |     | Time for final group project             |         |
     | May 5  |     | Final “exam” (poster presentation)       |         |

Assignments

Weekly Lab Exercises [10%]

Lab assignments are small-group exercises intended to be completed during Friday recitations, which are led by the TAs. Submit your own work individually at the end of the 50 minutes; at the latest, it must be submitted by 11:59pm the same day. Labs are graded for participation and reasonable progress, and attendance is required. The lowest of the 11 lab scores will be dropped automatically. Excused absences policy and request form: link. Do not request extensions or excused absences via email.

Homework [30%]

You may discuss problems with other students and course staff, but you must complete and submit your work independently. Due dates for every homework assignment are provided on the course syllabus and course schedule. Unless otherwise stated, assignments are due on those days at 11:59pm. Submit them to Gradescope, which can be accessed via Canvas. You have 3 late days that you can use throughout the semester. They must be requested at least 24 hours in advance by filling out the form: link. Do not request extensions via email.

  • HW1 (5%): Introduction to Data Science
  • HW2 (5%): Introduction to Programming and Pandas
  • HW3 (5%): Data Visualization and Communication
  • HW4 (5%): Probability, Statistics, and Sampling
  • HW5 (5%): Inference & Hypothesis Testing
  • HW6 (5%): Machine Learning

Exams [35%]

  • Exam 1 (15%) will cover week 1 through week 7 (summary statistics and boxplot).
  • Exam 2 (20%) will cover week 7 (probability and uniform distribution) through week 14.

Both are pen-and-paper exams held during the usual lecture time in the usual lecture room. Question types include multiple choice, true/false, matching, fill-in-the-blank, and short answer (no more than a paragraph).

Group Project [25%]

Poster session presentation + peer review during the final exam block. The group project will be assessed based on:

  1. Group project proposal
  2. Group poster presentation
  3. Group write-up and code
  4. Individual reflection and teamwork assessment
  5. Individual feedback to other teams

Extra Credit [up to 1%]

Based on in-class and online participation
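
As a rough illustration of how the category weights above add up, here is a minimal Python sketch. It is not the official grading procedure. The function names are hypothetical, and it assumes that every score is on a 0-100 scale, that items within a category count equally, and that extra credit is added as percentage points capped at 1; none of these details is stated in this syllabus.

```python
# Category weights as listed in this syllabus (percent of the final grade).
WEIGHTS = {"labs": 10, "homework": 30, "exam1": 15, "exam2": 20, "project": 25}
MAX_EXTRA_CREDIT = 1  # "up to 1%"


def lab_average(lab_scores):
    """Average lab score after dropping the single lowest score."""
    if len(lab_scores) < 2:
        raise ValueError("need at least two lab scores to drop one")
    kept = sorted(lab_scores)[1:]  # sorted ascending, so index 0 is the lowest
    return sum(kept) / len(kept)


def course_grade(lab_scores, hw_scores, exam1, exam2, project, extra_credit=0.0):
    """Weighted course percentage.

    ASSUMPTIONS (not from the syllabus): scores are on a 0-100 scale,
    items within a category are weighted equally, and extra credit is
    added as percentage points, capped at MAX_EXTRA_CREDIT.
    """
    category_scores = {
        "labs": lab_average(lab_scores),
        "homework": sum(hw_scores) / len(hw_scores),
        "exam1": exam1,
        "exam2": exam2,
        "project": project,
    }
    weighted = sum(WEIGHTS[name] * score
                   for name, score in category_scores.items()) / 100
    return weighted + min(extra_credit, MAX_EXTRA_CREDIT)


# Made-up scores: one missed lab (dropped), 90 on each of the six homeworks.
example = course_grade(
    lab_scores=[100] * 10 + [0],
    hw_scores=[90] * 6,
    exam1=80,
    exam2=85,
    project=95,
    extra_credit=0.5,
)
print(example)  # 90.25
```

In this made-up example the missed lab does not lower the grade, because it is the one score that gets dropped.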


Course Goals & Student Learning Outcomes (SLOs)

The goal of this course is to lay the foundation for subsequent courses required for the BS and BA in Data Science, as well as to introduce core concepts and ideas in the field of data science to any student, regardless of major. The course provides a high-level survey of current and emerging concepts in key data science domains, including computational thinking, mathematics/statistical skills, data management, communication best practices, and ethics. The course will offer hands-on analysis of real-world datasets, exposing students to the type of insights and problem-solving that the field of data science can deliver.

Accessibility and Equity

Students from all backgrounds should be able to take Intro to Data Science. As such, no prerequisites in statistics or programming are required for the course; only basic high-school algebra is necessary.

Diversity

Intro to Data Science can be taken by students from any major across campus and should be acceptable as a potential prerequisite for statistics, math, or computing in many majors.

Pedagogical Clarity

Intro to Data Science first teaches introductory programming, then statistics through a computational lens, and ultimately concludes with basic methods in inference.

Core Concepts and Learning Outcomes

  • Data Management: Describe differences in types of data and the ways in which individuals and organizations store, manage, and interact with data. Identify and appropriately acknowledge sources of data. Apply basic data cleaning techniques to prepare data for analysis.

  • Mathematical and Statistical Foundations: Select and use appropriate data analytics and statistical techniques to discover new relationships, deliver insights into research problems or organizational processes, and support decision-making. Draw accurate and useful conclusions from data analysis.

  • Computational Thinking: Build and understand algorithms for analyzing large data sets and for accurate numerical modeling of problems.

  • Communication: Convey data analyses through written and oral communication, and select appropriate tools to display data visually.

  • Responsible Data Science: Identify security, privacy protection, governance, and ethical considerations in data management. Differentiate between ethical and unethical uses of data science.


This course is designed to meet the general education requirement of Quantitative Reasoning. Below are the corresponding learning outcomes and student questions from UNC Chapel Hill’s IDEAs in Action General Education Curriculum.

IDEAs in Action General Education Curriculum

FC-QUANT

Student Learning Outcomes:

  1. Summarize, interpret, and present quantitative data in mathematical forms, such as graphs, diagrams, tables, or mathematical text.
  2. Develop or compute representations of data using mathematical forms or equations as models and use statistical methods to assess their validity.
  3. Make and evaluate important assumptions in the estimation, modeling, and analysis of data, and recognize the limitations of the results.
  4. Apply mathematical concepts, data, procedures, and solutions to make judgments and draw conclusions.
  5. Synthesize and present quantitative data to others to explain findings or to provide quantitative evidence in support of a position.

Questions for Students:

  1. What is the role of mathematics in organizing and interpreting measurements of the world?
  2. How can mathematical models and quantitative analysis be used to summarize or synthesize data into knowledge and predictions?
  3. What methodology can we apply to validate or reject mathematical models or to express our degree of confidence in them?