Lecture 2
Cornell University
INFO 2951 - Spring 2025
January 23, 2025
Q - What data science background does this course assume?
A - None! Sort of…
Q - Is this an intro stats course?
A - No. We presume you have already met the prereq and taken one of AEM 2100, BTRY 3010, CEE 3040, ECON 3110, ECON 3130, ENGRD 2700, ILRST 2100, MATH 1710, PAM 2100, PSYCH 2500, SOC 3010, STSCI 2100, STSCI 2150, STSCI 2200 🙄
While statistics \(\ne\) data science, they are very closely related and have a tremendous amount of overlap.
Q - Will we be doing computing?
A - Yes! Lots of it.
Q - Is this an intro CS course?
A - No – you’ve already taken CS 1110 or 1112
Q - What computing language will we learn?
A - R.
Q: How is this course different from INFO 2950?
A: R rather than Python.
Q: I don’t want to learn R! When can I take INFO 2950?
A: Probably next fall.
Course operation
Doing data science
By the end of the semester, you will…
What does it mean for a data analysis to be “reproducible”?
Near-term goals:
Long-term goals:
Fully reproducible documents – each time you render, the analysis is run from the beginning
YAML header to define document settings
Code goes in chunks
Narrative goes outside of chunks
A visual editor for a familiar, Google Docs-like editing experience
Plain-text file format for easy editing and version control
More robust and flexible compared to Jupyter Notebooks
But you can still use the Jupyter engine to run Python natively
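The document structure described above (YAML header, code chunks, narrative outside chunks) might look roughly like this in a .qmd file. This is a minimal sketch, not a course template: the title, the heading, and the chunk label are made up for illustration, and `cars` is just a dataset that ships with R.

````markdown
---
title: "A first Quarto document"
format: html
---

## Introduction

Narrative text goes here, outside of any chunk.

```{r}
#| label: summarize-cars
# Code goes inside a chunk and is re-run from scratch on every render.
summary(cars)
```
````

Everything between the two `---` lines is the YAML header; everything inside the chunk is R code; the rest is narrative.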
Important
The environment of your Quarto document is separate from the Console!
Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!
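One way to picture the separation is the sketch below. It is only an analogy: a real render starts a fresh R session, while this example imitates that with two R environments. The names `console_env`, `render_env`, and `movie_budgets` are hypothetical.

```r
# Stand-in for the Console: objects you created interactively live here.
console_env <- new.env()
assign("movie_budgets", c(1, 3, 5), envir = console_env)

# Stand-in for a render: a fresh environment that knows nothing about
# console_env (a real render starts a whole new R session).
render_env <- new.env(parent = baseenv())

# Code from a hypothetical chunk that never creates movie_budgets itself.
chunk <- quote(mean(movie_budgets))

# Works interactively, because the object exists in the Console stand-in.
console_result <- eval(chunk, envir = console_env)

# Fails when "rendered", because the object was never created there.
render_result <- tryCatch(
  eval(chunk, envir = render_env),
  error = function(e) e  # capture the error instead of stopping
)

inherits(render_result, "error")
```

The practical takeaway: anything a chunk needs (packages, data, objects) should be loaded or created inside the document itself, not only in the Console.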
Project-based workflows benefit from reproducible environments
More information: Introduction to renv
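A typical renv workflow might look like the following sketch. It assumes the renv package is already installed and that the commands are run in the R console from the project's root directory; which of these steps you need depends on whether the project already ships with a lockfile.

```r
# Run these one at a time in the R console, from the project root.

renv::init()      # set up a project-specific library and an renv.lock lockfile
renv::snapshot()  # record the package versions currently in use in renv.lock
renv::status()    # report whether the library and the lockfile are in sync
renv::restore()   # on another machine: reinstall the versions listed in renv.lock
```

For a project that already includes a lockfile, `renv::restore()` is usually the only step a collaborator needs.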
In order to pass the test, a movie must have
Inspiration: FiveThirtyEight
Instructions
ae-00-bechdel
Run renv::restore() to install the required packages
Warning
ae-00 is hosted on GitHub.com because we have not configured your authentication method for Cornell’s GitHub. We will do this tomorrow in lab.
GitHub is the home for your Git-based projects on the internet – like Dropbox but much, much better
We will use GitHub (Enterprise) as a platform for web hosting and collaboration