We provide an annotated downloadable (draft) pdf version of the syllabus. It is also linked to from other sections of the site. If there is a disagreement between the website and the pdf version, the website version may be newer. Let us know if you see differences. (See below for email etiquette.)
The core content of the downloadable syllabus follows below for easier on-line browsing. Note that some of the dates and times may correspond to earlier instances of STAT 447.
Data science is emerging as a field that is revolutionizing science and industries alike. Work across nearly all domains is becoming more data driven, affecting both the jobs that are available and the skills that are required. As more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.
Source: Envisioning the Data Science Discipline: The Undergraduate Perspective. National Academies, 2018.
This course provides the principal programming foundations for working with data at scale.
Data analysts are in demand, and particularly those who can walk the walk and not only talk the talk. This course aims for a “hands-on, roll-up-your-sleeves” learning-by-doing approach which can be highly rewarding to those willing to put in the required effort.
After this course, students should be able to …
The course counts for three credits for undergraduate students, and for four credits for graduate students. Graduate students are required to submit more extensive homework assignments.
- shell: for managing files, commands, information flow, …
- git: for modern version control supporting social computing
- sql: as a base layer for data management and control
- markdown: for programmatic control of html, pdf, … communication
- R: for programming with data, and our core building block
The setup and timetable for office hours are as follows:
Title | Name | Location | Hours | Type / Booking |
---|---|---|---|---|
Instructor | Dirk Eddelbuettel | Zoom | Mon 11am - noon | Open |
Instructor | Dirk Eddelbuettel | Zoom | Mon 4pm - 5pm | Open |
Instructor | Dirk Eddelbuettel | Zoom via calendly appointment | Thu 4pm - 5:30pm | 15m, one-on-one |
TA | Linjun Huang | Zoom | Wed 11am - noon | Open |
TA | Linjun Huang | Zoom | Fri 11am - noon | Open |
We offer two types of office hours. The first type is open, with an open door where you can walk in and out, attend every week, or never, as you see fit. The second type consists of individual one-on-one office hours of fifteen minutes each, which you book via the calendly link above. We ask that you limit your use of these to two or three per term to allow everybody a turn. Under genuinely exceptional circumstances, additional visits can be scheduled on demand. (Note that the Zoom links above differ per time slot. Make sure you pick the correct one.)
(See also schedule which has the same information.)
What | When |
---|---|
Location | online |
Times | no fixed times, aiming for weekly availability |
Hours | Office hours as scheduled, see below |
For each week, deliverables consist of
A (tentative, subject to change) list of lectures, generally two per week over a full (Spring) term, follows:
Week | Starting | Topics |
---|---|---|
1 | Jan 20 | Course Overview, RStudio, GitHub, General Setup |
2 | Jan 27 | Shell Lectures I and II |
3 | Feb 3 | Lecture on sed and awk; Markdown |
4 | Feb 10 | Git Lectures I and II |
5 | Feb 17 | SQL Lectures I and II |
6 | Feb 24 | R Foundations; R Data Input/Output |
7 | Mar 3 | R Data Wrangling; R Scripting |
8 | Mar 10 | data.table; dplyr |
– | Mar 17 | Spring Break: No Classes |
9 | Mar 24 | Parallel R; Efficient R |
10 | Mar 31 | Visualization I and II |
11 | Apr 7 | Shiny; Guest Lecture (TBD) |
12 | Apr 14 | R Packages Lectures I and II |
13 | Apr 21 | GitHub Actions; Docker |
14 | Apr 28 | No lectures – time for project |
15 | May 5 | No lectures – time for project |
Homework assignments (generally) cover the preceding four lectures, and are (generally) not cumulative (though the last one may review earlier lectures). They prepare for the quiz (see next section) covering the same period, and permit students to do rigorous exercises which are graded electronically using PrairieLearn and PrairieTest.
Homework | Given | Due |
---|---|---|
Homework 1 (Shell, Markdown) – Week 3 | Feb 6 @ noon | Feb 12 @ noon |
Homework 2 (Git, SQL) – Week 5 | Feb 20 @ noon | Feb 26 @ noon |
Homework 3 (R Part I) – Week 7 | Mar 6 @ noon | Mar 11 @ noon |
Homework 4 (R Part II) – Week 9 | Mar 27 @ noon | Apr 2 @ noon |
Homework 5 (Visualization, Shiny) – Week 11 | Apr 10 @ noon | Apr 16 @ noon |
Homework 6 (Automation) – Week 13 | Apr 24 @ noon | Apr 30 @ noon |
These are indicative dates which may be adjusted as needed.
Homeworks are generally released at noon, and due a week later at noon. Note that because spring break, as well as ‘busier’ times at the CBTF site, has to be accommodated, not all homeworks follow the Thursday to Thursday schedule. Graduate students receive (generally two) additional required questions. These questions are typically more substantial in nature and require more effort than the regular questions answered by both undergraduate and graduate students. Undergraduates may opt to answer one or both of these questions for additional points, or as a challenge. Scoring is however capped at 100%.
The following dates have been (tentatively) reserved (but are as always subject to change):
Quiz | First Date | Last Date | Self-Reserve | Weeks Covered |
---|---|---|---|---|
Quiz 1 (Shell, Markdown) | Feb 13 | Feb 16 | Jan 30 | Weeks 2 and 3 |
Quiz 2 (Git, SQL) | Feb 27 | Mar 2 | Feb 13 | Weeks 4 and 5 |
Quiz 3 (R Part I) | Mar 12 | Mar 14 | Feb 27 | Weeks 6 and 7 |
Quiz 4 (R Part II) | Apr 3 | Apr 6 | Mar 13 | Weeks 8 and 9 |
Quiz 5 (Visualization, Shiny) | Apr 17 | Apr 23 | Apr 3 | Weeks 10 and 11 |
Quiz 6 (Automation) | May 1 | May 4 | Apr 17 | Weeks 12 and 13 |
Quizzes follow the bi-weekly schedule of the homework, cover the same (typically two week) set of lectures, and are also not cumulative. You can schedule your exam time starting on the Self-Reserve date (at 01:00h Central time per CBTF standards) via the PrairieTest site. Each exam will be a session of 50 minutes. These are in-person exams.
Under exceptional circumstances, accommodations may be made by course staff upon written request (also see email etiquette), with proof of exceptional circumstances, to allow for online exams for fully-remote students not residing in Urbana-Champaign for the full length of term. Again, proof of such circumstances will be required, as this must be need-based and is not an elective choice for Urbana-Champaign based students, who are expected to test at the CBTF facility in person. Requesting online testing when you were able to attend the CBTF in person may be treated as an academic integrity violation with its full consequences.
Please consult the PrairieTest and CBTF sites for full details.
The course has no formal prerequisite.
Prior to taking this course, students should have:
The course is delivered primarily online and tested online. Students use Single-Sign-On with the University of Illinois ‘netid’ to access
In the past, CBTF Online quizzes used CBTF proctors for student identity verification. In the Fall 2022 term, we switched to PrairieTest for (on-line) exams/quizzes. If a project is elected, it requires a (recorded) presentation.
This course offers office hours from different members of the course staff, held throughout the week at the pre-scheduled times shown above.
For class discussion, we will use a GitHub repository and its issue system. This forum will be private and restricted to those in the course.
It is very important that each student
Before you start writing an e-mail to a member of the course staff please make sure your question is not:
But please ensure your e-mails meet the following criteria:
@illinois.edu account.
[STAT 447]
instructors@stat447.com (or if you prefer help@stat447.com).
We try our best to respond within 24 hours. Homework questions sent the same day homework is due will likely not receive a response before the homework is due. Plan accordingly.
Make sure the email does not contain homework code. The campus rules on academic integrity apply to all communication, including email.
Lastly, professional tone and written style matter, in email as in other written communication. Using proper titles when addressing recipients is common style and recommended.
Please see the FAQ item on for-hire tutors.
As an on-line course there is no attendance count. You are strongly encouraged to follow all the lecture slides and videos, study the readings and possibly some or most of the extra readings. Most importantly, you need to try the examples and code we show, and experiment with them. As a proxy for class participation, we consider participation in the GitHub issue topic discussion which, for an online class, is the closest we have to class discussions.
Homework assignments serve as a way to interact with the material outside of the classroom. Homework will be due at 10:00 AM on the assigned due date, which should generally be Thursday. We score the mean of the top five homeworks, i.e. with the lowest homework score being dropped. As this gives one automatic “out”, late homework will generally not be accepted.
In general, there will be no exceptions to this policy. Please start early, make sure your environment is working correctly, and that you are able to produce a working document. We have a teaching assistant with on-campus office hours, but in order to ask meaningful questions you need to try answering the material first.
While working on homework, students are encouraged to study in groups. But students should strive to independently supply answers to the homework problems. As we use an automated platform, submissions can be compared easily. Academic integrity standards apply.
Each homework will be distributed via PrairieLearn, where you are identified via your NetId, so your submissions will be stored as a combination of your NetId and the question.
Here are a few do and don’t tips for the PrairieLearn web submission. Consider the following stanza from an actual homework:
# Enter your code below: Do not alter the function signature:
# ensure it remains named 'iris_summary' and takes one argument.
# Ensure you return a data.frame as indicated in the question.
iris_summary <- function(irisdata) {
    # Enter code here
}
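For illustration only, a completed version of such a stanza might look like the sketch below. The task itself is an assumption (the actual homework question is not reproduced here): we pretend the question asks for the mean sepal length per species. The sketch keeps the function name and single argument unchanged, works only with the `irisdata` argument it is given, and returns a `data.frame`.

```r
# Hypothetical task (an assumption, not the actual homework question):
# return the mean sepal length per species as a data.frame.
iris_summary <- function(irisdata) {
    # Work only with the argument we are given: no data() call and
    # no reference to a global 'iris' object.
    res <- aggregate(Sepal.Length ~ Species, data = irisdata, FUN = mean)
    # The formula interface of aggregate() already returns a data.frame,
    # so no conversion is needed.
    res
}

# Local sanity check outside PrairieLearn, using R's built-in iris data
out <- iris_summary(datasets::iris)
stopifnot(is.data.frame(out), nrow(out) == 3)
```

The local check at the end is only for your own testing. What gets submitted is the function body inside the provided signature.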
Consider the following recommendations carefully:
- # Enter code here.
- irisdata object. The function signature clearly states that that is the (only) input you need and are given.
- data(something). You do not need to load anything (unless specifically asked when a question is about data loading or saving).
- irisdata, do not deviate to iris or iris_data or any other form. Do write code to match the name exactly.
- data.frame do not return a matrix or data.table. Return a data.frame.
Each homework assignment will be worth a variable number of points; however, each homework assignment will have equal weight towards your final grade.
We count the best five out of six. In other words, the grade of your worst homework does not count.
Instead of examinations, there will be six bi-weekly quizzes (see the section Schedule). The quizzes, just like the homework, will (generally) focus on the preceding (two weeks of) lectures and are (generally) not cumulative over the full course content.
And just like with the homework, you can drop one quiz grade over the course of the semester. We aim for six quizzes in total, and with the lowest quiz score being dropped the score will be the mean of the top five quiz scores.
The policies of the CBTF are the policies of this course, and academic integrity infractions related to the CBTF are infractions in this course.
If you have accommodations identified by the Division of Rehabilitation-Education Services (DRES) for exams, please take your Letter of Accommodation (LOA) to the CBTF proctors in person before you make your first quiz reservation. The proctors will advise you as to whether the CBTF provides your accommodations or whether you will need to make other arrangements with your instructor.
Any problem with testing in the CBTF must be reported to CBTF staff at the time the problem occurs. If you do not inform a proctor of a problem during the test then you forfeit all rights to redress.
There are several components associated with the final project:
Project Proposal: The repository should contain an outline of what is planned, the sources of the data, possible transformation and possible modeling strategies and/or possible data visualizations. This can be provided via the README.md file of the repository.
Project Report: The project report can be thought of as an (informal) paper. Guided by the format of an academic paper, it describes the project in a succinct yet complete fashion along with references. Markdown should be used to write it; the result can be either in html or pdf format.
Project Presentation and Slides: At the end of term, a short recorded video presentation, akin to a lightning talk, should introduce, present and summarize the work of the project in a form that is suitable for a general audience. A length of five minutes is a goal. The presentation should be supported by five to six slides, also produced in Markdown.
Evaluation of Peers, and Evaluations from Peers (if done as a Group Project): We require a short informal statement from each team member briefly stating who within the team did (roughly) what percentage of the work.
The Project provides an excellent opportunity to “shine” and to demonstrate your passion, skill, and capabilities for data science programming work. It provides a great chance to make a mark and create something special and distinguished.
The group projects have to be registered by Spring Break. This avoids a last minute rush deciding to do a project in weeks, say, twelve to fourteen which is likely to be underwhelming. A project requires commitment and recurrent work throughout the term. The sooner it is started, the better are the chances of it being amazing.
The group projects have to be finalized by noon (12:00h Central time) on the due date, which is Reading Day, May 8.
There are no midterm or final examinations in this course. Instead, we have homework, quizzes, and an optional (but recommended) project.
Late work will not be accepted for either homework or the group project. Watch the deadlines, and plan accordingly.
As the date and time of an exam are chosen by the student within an examination window, there will be no make-up exams administered once the window closes.
Type | Weight |
---|---|
Homework | One Half |
Quizzes | One Half |
Type | Weight |
---|---|
Homework | 40% |
Quizzes | 40% |
Project | 25% |
With a project, we therefore score out of 105%, and the project permits gaining some extra credit.
Grading is discretionary, and performed by the instructor and the teaching / course assistant(s). There are no retakes; we mark ‘best five out of six’ for homework and quizzes so everybody gets to drop one each.
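The arithmetic behind these weights can be sketched in a few lines of R. This is an informal illustration, not the official grading code: the function names and the example scores are made up, and it assumes the reading that homework and quizzes count one half each without a project, and 40% / 40% / 25% with one.

```r
# Mean of the best five scores out of six (the lowest score is dropped)
best_five_mean <- function(scores) {
    stopifnot(length(scores) == 6)
    mean(sort(scores, decreasing = TRUE)[1:5])
}

# Weighted course total in percent; 'project' is optional (assumed reading
# of the two weight tables: 50/50 without, 40/40/25 with a project)
course_score <- function(homework, quizzes, project = NULL) {
    hw <- best_five_mean(homework)
    qz <- best_five_mean(quizzes)
    if (is.null(project)) {
        0.5 * hw + 0.5 * qz
    } else {
        0.40 * hw + 0.40 * qz + 0.25 * project
    }
}

# Made-up example scores, in percent
hw   <- c(90, 80, 100, 70, 95, 85)   # best five average 90
quiz <- c(88, 92, 75, 60, 81, 79)    # best five average 83

no_project   <- course_score(hw, quiz)                 # 86.5
with_project <- course_score(hw, quiz, project = 90)   # 91.7
```

In this made-up example the project lifts the total from 86.5 to 91.7, which shows how the extra 5% of weight can act as extra credit.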
Minimum Grade | Points |
---|---|
A- to A+ | 90 to 100 |
B- to B+ | 80 to 89.99 |
C- to C+ | 70 to 79.99 |
D- to D+ | 60 to 69.99 |
F | below 60 |
Each ten point range is equally split over the three components (i.e. from minus to plus). Grades may be curved at the end of term before being finalized.
The official University of Illinois policy related to academic integrity can be found in Article 1, Part 4 of the Student Code. Section 1-402 in particular outlines behavior which is considered an infraction of academic integrity. These sections of the Student Code will be upheld in the STAT 447 classroom. Any violations will be dealt with in a swift, fair and strict manner.
You may discuss methods for completing assignments with other students, but the execution of these methods and the preparation of the document must be done independently. Furthermore, there can be no discussion with other students or collaboration of any kind on exams. Sufficient evidence of sharing results, collaborating on written assignments, or simply relying on internet resources will generally result in:
If the evidence is indicative of a larger pattern, then the harshest penalty will be pursued.
Note that cheating includes both obtaining others’ work, as well as distributing your own work.
If we detect academic integrity violations, we will contact you through the FAIR system.
In short, please do not cheat.
As members of the Illinois community, we each have a responsibility to express care and concern for one another. If you come across a classmate whose behavior concerns you, whether with regard to their well-being or yours, we encourage you to refer this behavior to the Student Assistance Center (333-0050) or online. Based upon your report, staff in the Student Assistance Center reach out to students to make sure they have the support they need to be healthy and safe.
Further, we understand the impact that struggles with mental health can have on your experience at Illinois; significant stress, strained relationships, anxiety, excessive worry, alcohol/drug problems, a loss of motivation, or problems with eating and/or sleeping can all interfere with optimal academic performance. We encourage all students to reach out to talk with someone, and want to make sure you are aware that you can access mental health support at the Counseling Center or McKinley Health Center. For mental health emergencies, you can call 911 or walk in to the Counseling Center, no appointment needed.
To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, you may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu or go to the DRES website.
The instructor reserves the right to make changes that are academically advisable. Such changes, if any, will be announced in class. Please note that it is your responsibility to attend the class and keep track of the proceedings.