Syllabus

Downloadable Version

We provide an annotated downloadable pdf version of the syllabus. It is also linked from other sections of the site.

The core content of the downloadable syllabus follows below for easier on-line browsing. Note that many of the listed dates and times still correspond to the Fall 2020 instance of the predecessor course STAT 430 “DSPM”.

Overview

Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.

Source: Envisioning the Data Science Discipline: The Undergraduate Perspective. National Academies, 2018.

This course provides the principal programming foundations for working with data at scale.

Data analysts are in demand, and particularly those who can walk the walk and not only talk the talk. This course aims for a “hands-on, roll-up-your-sleeves” learning-by-doing approach which can be highly rewarding to those willing to put in the required effort.

Learning Objectives

After this course, students should be able to …

  • analyze code in multiple data science programming languages;
  • write short programs in several relevant languages;
  • manipulate files and data using the command-line;
  • utilize git for version control, collaboration and publishing;
  • use R as a language and environment for programming with data;
  • solve new data science problems using these tools;
  • know a variety of tools based on first-hand experience;
  • show their skills via a group project on a topic of their choice.

Credits

The course counts for three credits for undergraduate students, and for four credits for graduate students. Graduate students are required to submit more extensive homework assignments.

Key Facts

Core Content

  • shell for managing files, commands, information flow, …
  • git for modern version control supporting social computing
  • sql as a base layer for data management and control
  • markdown for programmatic control of html, pdf, … communication
  • R for programming with data, and our core building block
  • plus some extras such as Docker and more

Instructional Staff

Assuming all team members return, the setup may be as follows:

Title       Name               Office Hours
Instructor  Dirk Eddelbuettel  Online/Zoom  Mon 7pm - 8pm
TA          Alton Barbehenn    Online/Zoom  Tue 2pm - 3pm
                               Online/Zoom  Wed 2pm - 3pm
TA          TBD                TBD          TBD

Lecture Location

What When
Location online
Times no fixed times, aiming for weekly availability
Hours Office hours as scheduled, see below

Homework Schedule

Homework assignments (generally) cover the preceding four lectures, and are not cumulative. They prepare for the quiz (see next section) covering the same period, and permit students to do rigorous exercises which are graded electronically using PrairieLearn and CBTF.

Week                   Given        Due
Homework 1 – Week 2    Thu, Sep 3   Thu, Sep 10
Homework 2 – Week 4    Thu, Sep 17  Thu, Sep 24
Homework 3 – Week 6    Thu, Oct 1   Thu, Oct 8
Homework 4 – Week 8    Thu, Oct 15  Thu, Oct 22
Homework 5 – Week 10   Thu, Oct 29  Thu, Nov 5
Homework 6 – Week 12   Thu, Nov 12  Thu, Nov 19

Homeworks are generally released at 10:00am and due the following week at 10:00am. Graduate students receive additional required questions (generally two). These questions are typically more substantial and require more effort than the regular questions answered by both undergraduate and graduate students. Undergraduates may opt to answer one or both of these questions for additional points or as a challenge. Scoring is, however, capped at 100%.

Computer-Based Testing Quiz Schedule

Quizzes follow the bi-weekly schedule of the homework, and cover the same (typically two week) set of lectures, and are also not cumulative.

Quiz    Date         Weeks Covered
Quiz 1  Thu, Sep 10  Weeks 1 and 2
Quiz 2  Thu, Sep 24  Weeks 3 and 4
Quiz 3  Thu, Oct 8   Weeks 5 and 6
Quiz 4  Thu, Oct 22  Weeks 7 and 8
Quiz 5  Thu, Nov 5   Weeks 9 and 10
Quiz 6  Thu, Nov 19  Weeks 11 and 12

Each quiz will be a CBTF Online session of 50 minutes. Quiz times are generally at 11:00am, with alternate times at 8pm the same day and 8am the following day.

Prerequisites

Prior to taking this course, students should have:

  • Taken a rigorous Statistics course such as STAT 410.
  • Motivation for participation in an online class: readings, exercises, …
  • Basic computer skills

Online Access and Identification

The course is delivered primarily online and tested online. Students use Single-Sign-On with the University of Illinois ‘netid’ to access

  • all lectures and videos stored on uofi.app.box.com (https://uofi.app.box.com)
  • RStudio Cloud (https://rstudio.cloud) for computing resources
  • GitHub (https://github.com) via a U of I-administered instance also behind SSO
  • CBTF and PrairieLearn to access homeworks and quizzes
  • compass2g (or its replacement) for grade and other course information

In addition, CBTF Online quizzes use CBTF proctors for student identity verification. The group project requires (recorded) group presentations also identifying each student.

Office Hours

This course offers office hours with different members of the course staff, held throughout the week at pre-scheduled times.

GitHub Forum

For class discussion, we will use a GitHub repository and its issue system. This forum will be private and restricted to those in the course.

It is very important that each student

  • registers a GitHub account (unless they already have one); since the Fall 2020 term we have been using a University of Illinois Single-Sign-On administered GitHub instance.
  • lets the instructors know their GitHub id so that we can invite the student to the (private, controlled via Single-Sign-On with NetId) course discussion project

Email

Before you start writing an e-mail to a member of the course staff please make sure your question is not:

  • Already answered in this syllabus or course FAQ: the syllabus serves as the guiding document for the course.
  • About exercises or homework: Questions should be asked via GitHub issues so that all students have access to the answer.
  • A technical issue or code error
    • Try to google the error verbatim (e.g. copy and paste into Google).

But please ensure your e-mails meet the following criteria:

  • The e-mail must be sent from an @illinois.edu account.
  • The start of the subject line should contain the tag: [STAT 447]
  • It should be followed by a space and a brief description.
  • Good: ‘[STAT 447] Cannot load data file: error …’
  • Bad: ‘[STAT 447] Need help’ or ‘[STAT 447] Code not working…’.
  • Use the course Staff E-Mail address. TODO: Setup for 447

We try our best to respond within 24 hours.

Do not post homework code. The campus rules on academic integrity apply to all communication, including email.

External Tutors

Please see the FAQ item on for-hire tutors.

Assessments

Attendance

As an on-line course there is no attendance count. You are strongly encouraged to follow all the lecture slides and videos, study the readings and possibly some or most of the extra readings. Most importantly, you need to try the examples and code we show, and experiment with them. As a proxy for class participation, we consider participation in the GitHub issue topic discussion which, for an online class, is the closest we have to class discussions.

Homework

Homework assignments serve as a way to interact with the material outside of the classroom. Homework will be due at 10:00 AM on the assigned due date, which should generally be a Thursday. We score the mean of the top five homeworks, i.e. the lowest homework score is dropped. As this gives one automatic “out”, late homework will generally not be accepted.

In general, there will be no exceptions to this policy. Please start early, make sure your environment is working correctly, and that you are able to produce a working document. We have two course assistants with on-campus office hours, but in order to ask meaningful questions you need to try answering the material first.

Collaboration Policy

While working on homework, students are encouraged to study in groups. But students should strive to independently supply answers to the homework problems. As we use an automated platform, submissions can be compared easily. Academic integrity standards apply.

Distribution Policy

Each homework will be distributed via PrairieLearn; submissions will be stored as a combination of your NetId and the question.

Assignment Submission

Here are a few do and don’t tips for the PrairieLearn web submission. Consider the following stanza from an actual homework:

# Enter your code below: Do not alter the function signature:
# ensure it remains named 'iris_summary' and takes one argument.
# Ensure you return a data.frame as indicated in the question.

iris_summary <- function(irisdata) {

  # Enter code here

}

Consider the following recommendations carefully:

  • Do follow the structure of the provided function.
  • Do enter code where it says # Enter code here.
  • Do not write code before the opening brace.
  • Do not write code after the closing brace.
  • Do use the supplied irisdata object. The function signature clearly states that that is the (only) input you need and are given.
  • Do not load other data. You do not need data(something). You do not need to load anything (unless specifically asked when a question is about data loading or saving).
  • Do use the stated variable names: when the interface (or our instructions) say irisdata, do not deviate to iris or iris_data or any other form. Do write code to match the name exactly.
  • Do not load other packages unless asked to do so. We generally expect you to use an explicitly named package, or just the functions already in R, i.e. what is called ‘base R’.
  • Do follow the instructions. When it asks to return a data.frame do not return a matrix or data.table. Return a data.frame.
  • Do use the GitHub issue ticket linked to each question.
  • Do not post code or (partial or complete) answers on GitHub.
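
As an illustration of these recommendations, here is a minimal sketch of what a conforming submission might look like. The task is hypothetical: computing the mean sepal length per species is an assumption made for this example and not an actual homework question. Only the function name and its single argument are taken from the stanza above.

```r
# Hypothetical task (an assumption for illustration only): return a
# data.frame with the mean Sepal.Length for each Species.
iris_summary <- function(irisdata) {

  # Use only the supplied argument and base R functions.
  # aggregate() with a formula returns a data.frame, as requested.
  res <- aggregate(Sepal.Length ~ Species, data = irisdata, FUN = mean)
  res

}
```

All code sits between the braces, the only input is irisdata, no data or packages are loaded, and the return value is a data.frame. For a quick local check at your own console you could pass the built-in data set as the argument, e.g. iris_summary(iris).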

Grading

Each homework assignment will be worth a variable number of points; however, each homework assignment will have equal weight towards your final grade.

As stated above, we count best five out of six.
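
The 'best five out of six' rule can be sketched in a few lines of R. This is only an illustration: the function name best_five_mean and the example scores are made up, and it assumes each homework has already been converted to a percentage (so that all assignments carry equal weight) and that a missed homework counts as zero.

```r
# Mean of the best five out of six scores, i.e. drop the single lowest score.
# 'scores' is a hypothetical numeric vector of percentages, one per homework.
best_five_mean <- function(scores) {
  stopifnot(is.numeric(scores), length(scores) == 6)
  # sort from highest to lowest and keep the first five
  mean(sort(scores, decreasing = TRUE)[1:5])
}

# Made-up example: one missed homework (scored as 0) is the one dropped.
scores <- c(92, 85, 0, 100, 78, 88)
best_five_mean(scores)
```

With these example values the missed homework is dropped and the result is the mean of the remaining five scores.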

Quizzes

Instead of examinations, there will be up to six bi-weekly quizzes (see the section Computer-Based Testing Quiz Schedule). The quizzes, just like the homework, will (generally) focus on the preceding (two weeks of) lectures and are (generally) not cumulative over the full course content.

And just like with the homework, you can drop one quiz grade over the course of the semester. We aim for six quizzes in total, and with the lowest quiz score being dropped the score will be the mean of the top five quiz scores.

Because of Covid-19, the Fall 2020 instance of the course now uses CBTF Online (instead of in-person testing at the CBTF Facility). CBTF Online is proctored by CBTF staff over Zoom, and depends on the availability of the proctors. We will have one default time for each quiz along with two alternate times each. Use the CBTF ‘Conflict Request’ form to request an alternate time.

The policies of the CBTF are the policies of this course, and academic integrity infractions related to the CBTF are infractions in this course.

If you have accommodations identified by the Disability Resources and Educational Services (DRES) for exams, please take your Letter of Accommodation (LOA) to the CBTF proctors in person before you make your first quiz reservation. The proctors will advise you as to whether the CBTF provides your accommodations or whether you will need to make other arrangements with your instructor.

Any problem with testing in the CBTF must be reported to CBTF staff at the time the problem occurs. If you do not inform a proctor of a problem during the test then you forfeit all rights to redress.

Group Project

There are several components associated with the group final project:

  • Project Proposal: The repository should contain an outline of what is planned, the sources of the data, possible transformation and possible modeling strategies and/or possible data visualizations. This can be provided via the README.md file of the repository.

  • Project Report: The project report can be thought of as an (informal) paper. Guided by the format of an academic paper, it describes the project in a succinct yet complete fashion along with references. Markdown should be used to write it; the result can be either in html or pdf format.

  • Project Presentation and Slides: At the end of term, a short recorded group video presentation, akin to a lightning talk, should introduce, present and summarize the work of the project in a form that is suitable for a general audience. A length of five minutes is a goal. The presentation should be supported by five to six slides, also produced in Markdown.

  • Evaluation of Peers, and Evaluations from Peers: We require a short informal statement from each team member briefly stating who within the team did (roughly) what percentage of the work.

The Group Project provides an excellent opportunity to “shine” and to demonstrate your passion, skill, and capabilities for data science programming work. It provides a great chance to make a mark and to create something special and distinguished.

Exams

There are no midterm or final examinations in this course. Instead, we have homework, quizzes, and a group project.

Late or Missing Work

Late work will not be accepted for either homework or the group project. Watch the deadlines, and plan accordingly.

As the date and time of an exam is chosen by a student over an examination window, there will be no make-up exams administered once the window closes.

Course Grades

Type Weight
Homework One Third
Quizzes One Third
Group Project One Third

Grading is discretionary, and performed by the instructor and the course assistant(s). There are no retakes; we mark ‘best five out of six’ for homework and quizzes so everybody gets to drop one each.

Grading Scale

Minimum Grade Points
A- to A+ 90 to 100
B- to B+ 80 to 89.99
C- to C+ 70 to 79.99
D- to D+ 60 to 69.99
F below 60

Each ten-point range is split equally into three parts (i.e. from minus to plus). Grades may be curved at the end of term before being finalized.
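
To make the arithmetic concrete, the following R sketch combines the three equally weighted components and maps the result to a letter grade. It is an illustration under stated assumptions rather than the official procedure: the function names course_score and letter_grade are hypothetical, lower bounds are treated as inclusive, each ten-point range is cut into exact thirds, and no curve is applied. Actual grading remains discretionary.

```r
# Combine the three components, each a percentage with a weight of one third.
course_score <- function(homework, quizzes, project) {
  mean(c(homework, quizzes, project))
}

# Map a score (0 to 100) to a letter grade.
# Assumptions: inclusive lower bounds, exact thirds, no curve.
letter_grade <- function(score) {
  stopifnot(is.numeric(score), length(score) == 1, score >= 0, score <= 100)
  if (score < 60) return("F")
  # decade 6, 7, 8, 9 corresponds to D, C, B, A (a score of 100 stays in the A range)
  decade <- min(floor(score / 10), 9)
  base <- c("D", "C", "B", "A")[decade - 5]
  # position within the ten-point range, then which third it falls into
  offset <- score - 10 * decade
  third <- min(floor(offset / (10 / 3)), 2) + 1
  paste0(base, c("-", "", "+")[third])
}

# Made-up example: 90% homework, 80% quizzes, 85% group project.
letter_grade(course_score(90, 80, 85))
```

Under these assumptions a course score of 85 sits in the middle third of the 80 to 89.99 range and maps to a plain B.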

University Policies

Academic Integrity

The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in the STAT 447 classroom. Any violations will be dealt with in a swift, fair and strict manner.

You may discuss methods for completing assignments with other students, but the execution of these methods and the preparation of the document must be done independently. Furthermore, there can be no discussion with other students or collaboration of any kind on exams. Sufficient evidence of sharing results, collaborating on written assignments, or simply relying on internet resources will generally result in:

  • First offense: receiving an undroppable zero on the assignment and being written up for an academic integrity violation.
  • Second offense: receiving an F in the course, an academic integrity violation, and recommendation for expulsion from the University.

If the evidence is indicative of a larger pattern, then the harshest penalty will be pursued.

Note that cheating includes both obtaining others’ work, as well as distributing your own work.

  • You may discuss the assignment with your classmates, but your final answers must be your own. Your final document should be created independently.
  • To avoid any issues, do not copy and paste code. (With an exception for code provided for the course.)
  • Do not share RMarkdown or other submission files.

If we detect academic integrity violations, we will contact you through the FAIR system.

In short, please do not cheat.

Support resources and supporting fellow students in distress

As members of the Illinois community, we each have a responsibility to express care and concern for one another. If you come across a classmate whose behavior concerns you, whether in regard to their well-being or yours, we encourage you to refer this behavior to the Student Assistance Center (333-0050) or online. Based upon your report, staff in the Student Assistance Center reach out to students to make sure they have the support they need to be healthy and safe.

Further, we understand the impact that struggles with mental health can have on your experience at Illinois; significant stress, strained relationships, anxiety, excessive worry, alcohol/drug problems, a loss of motivation, or problems with eating and/or sleeping can all interfere with optimal academic performance. We encourage all students to reach out to talk with someone, and want to make sure you are aware that you can access mental health support at the Counseling Center or McKinley Health Center. For mental health emergencies, you can call 911 or walk in to the Counseling Center, no appointment needed.

Accessibility

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to the DRES website.

Disclaimer

The instructor reserves the right to make changes that are academically advisable. Such changes, if any, will be announced in class. Please note that it is your responsibility to attend the class and keep track of the proceedings.