Report

Project
Modified: January 22, 2025

Your written report must be completed in the report.qmd file and must be reproducible. All team members should contribute to the GitHub repository, with regular meaningful commits.

Before you finalize your write-up, make sure code chunks are not printed by setting the option echo: false in the YAML header.

The mandatory components of the report are listed below. You are free to add additional sections as necessary. The report, including visualizations, should be no more than 10 pages long. There is no minimum page requirement; however, your report should still address all of your analysis comprehensively.

Be selective in what you include in your final write-up. The goal is to write a cohesive narrative that demonstrates a thorough and comprehensive analysis, rather than one that explains every step of the analysis.

You are welcome to include an appendix with additional work at the end of the written report document; however, grading will largely be based on the content in the main body of the report. You should assume the reader will not see the material in the appendix unless prompted to view it in the main body of the report. The appendix should be neatly formatted and easy for the reader to navigate. It is not included in the 10-page limit.

Report components

Introduction

This section includes an introduction to the project motivation, data, and research question. What is the context of the work? What research question are you trying to answer? What are your main findings? Include a brief summary of your results.

Data description

This should be inspired by the format presented in Gebru et al. (2018), “Datasheets for Datasets.” Answer any relevant questions from sections 3.1-3.5 of the Gebru et al. article, especially the following questions:

  • What are the observations (rows) and the attributes (columns)?
  • Why was this dataset created?
  • Who funded the creation of the dataset?
  • What processes might have influenced what data was observed and recorded and what was not?
  • What preprocessing was done, and how did the data come to be in the form that you are using?
  • If people are involved, were they aware of the data collection and if so, what purpose did they expect the data to be used for?

Data analysis

Use summary functions like mean and standard deviation, along with visual displays like scatterplots and histograms, to describe the data.
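As a minimal sketch, assuming a Python code chunk in report.qmd and a numeric column already loaded as a list, the standard-library statistics module covers the mean and standard deviation. The function name summarize and the example values are hypothetical, made up for illustration rather than taken from any project data. Rounding inside the helper also keeps printed results to a reasonable number of digits.

```python
import statistics


def summarize(values, digits=2):
    """Return the count, mean, and sample standard deviation, rounded for display."""
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), digits),
        # statistics.stdev is the sample (n - 1) standard deviation
        "sd": round(statistics.stdev(values), digits),
    }


# Hypothetical example values, not project data.
print(summarize([2.0, 4.0, 4.0, 5.0]))
```

Visual displays such as histograms and scatterplots would need a plotting library, which is outside this sketch.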

Provide at least one model showing patterns or relationships between variables that addresses your research question. This could be regression or clustering, or something else that measures some property of the dataset.

Evaluation of significance

Use hypothesis tests, simulation, randomization, or any other techniques we have learned to compare the patterns you observe in the dataset to simple randomness.
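A permutation (randomization) test is one way to compare an observed pattern to simple randomness. The sketch below assumes the pattern of interest is a difference in means between two groups. The function name, the fixed seed, and the example groups are illustrative choices, not course requirements. Shuffling the pooled values breaks any real link between group and outcome, so the shuffled differences show what chance alone tends to produce.

```python
import random


def permutation_test_diff_means(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = random.Random(seed)  # fixed seed so the rendered report is reproducible
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to the two groups, simulating
        # the null hypothesis that group membership does not matter.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # The +1 terms count the observed arrangement as one of the possibilities.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements for two groups, not project data.
obs, p = permutation_test_diff_means([5.1, 6.3, 5.8, 7.0], [4.2, 4.9, 5.0, 4.4])
print(f"observed difference = {obs:.2f}, permutation p-value = {p:.3f}")
```

A small p-value means differences as large as the observed one rarely arise from shuffling alone; it does not measure how large or important the effect is.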

Interpretation and conclusions

What did you find over the course of your data analysis, and how confident are you in these conclusions? Describe your results in more detail than in the introduction, now that the reader is familiar with your methods and analysis. Interpret these results in the wider context of the real-life application your data come from.

Limitations

What are the limitations of your study? What are the biases in your data or assumptions of your analyses that specifically affect the conclusions you’re able to draw?

Acknowledgments

Recognize any people or online resources that you found helpful. These can be tutorials, software packages, Stack Overflow questions, peers, and data sources. Showing gratitude is a great way to feel happier! But it also has the nice side-effect of reassuring us that you’re not passing off someone else’s work as your own. Crossover with other courses is permitted and encouraged, but it must be clearly stated, and it must be obvious what parts were and were not done for 2951. Copying without attribution robs you of the chance to learn, and wastes our time investigating.

Appendices

You should submit your appendix or appendices in the appendices.qmd file in your project repo.

  • At minimum, you should have an appendix for your data cleaning. Submit an updated version of your data cleaning description from phase II that describes all data cleaning steps performed on your raw data to turn it into the analysis-ready dataset submitted with your final project. When rendered, it should output the dataset you submit as part of your project (e.g., written as a .csv file).
  • (Optional) Other appendices. You will almost certainly feel that you have done a lot of work that didn’t end up in the final report. We want you to edit and focus, but we also want to make sure that there’s a place for work that didn’t work out or that didn’t fit in the final presentation. You may include any analyses that you tried but that were tangential to the final direction of your main report. Graders may briefly look at these appendices, but they also may not. You want to make your final report interesting enough that the graders don’t feel the need to look at other things you tried. “Interesting” doesn’t necessarily mean that the results in your final report were all statistically significant; it could be that your results were not significant but you were able to interpret them in an interesting and informed way.
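For the data-cleaning appendix, the final chunk might write the cleaned rows out as a .csv file. Below is a minimal Python sketch using the standard-library csv module. The cleaning rule (trim whitespace, drop rows with any blank field), the helper names, and the column names are hypothetical placeholders for your actual steps.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace in every field and drop rows that have any blank field."""
    cleaned = []
    for row in rows:
        trimmed = {key: value.strip() for key, value in row.items()}
        if all(trimmed.values()):  # an empty string counts as a blank field
            cleaned.append(trimmed)
    return cleaned


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical raw rows; in practice these might come from csv.DictReader.
raw = [
    {"id": "1", "score": " 3.5 "},
    {"id": "2", "score": ""},
]
csv_text = rows_to_csv(clean_rows(raw), fieldnames=["id", "score"])
print(csv_text)
```

To save the result, write csv_text to a file opened with open(path, "w", newline=""), where path is wherever your repository keeps the submitted dataset.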

Organization + formatting

While not a separate written section, you will be assessed on the overall presentation and formatting of the written report. A non-exhaustive list of criteria includes:

  • The report is neatly written and organized with clear section headers and appropriately sized figures with informative labels.
  • Numerical results are displayed with a reasonable number of digits, and all visualizations are neatly formatted.
  • All citations and links are properly formatted.
  • If there is an appendix, it is reasonably organized and easy for the reader to find relevant information.
  • All code, warnings, and messages are suppressed.
  • The main body of the written report (not including the appendix) is no longer than 10 pages.

Evaluation criteria

Each report component is assessed at three levels: less developed projects, typical projects, and more developed projects.

Introduction

  • Less developed projects: Less focused and organized. They may jump to technical details without explaining why the results are important. Research questions are not clearly stated and/or results are not clearly summarized at the end of the introduction.
  • Typical projects: Provides background information and context. Introduces key terms and data sources. Outlines research question(s). Ends with a brief summary of findings.
  • More developed projects: All expectations of typical projects + clearly describes why the setting is important and what is at stake in the results of the analysis. Even if the reader doesn’t know much about the subject, they know why they care about the results of your analysis.

Data description

  • Less developed projects: Simple description of some aspects of the dataset, with little consideration for sources. The description is missing answers to applicable questions detailed in the “Datasheets for Datasets” paper.
  • Typical projects: Answers all relevant questions in the “Datasheets for Datasets” paper.
  • More developed projects: All expectations of typical projects + credits and values data sources.

Data analysis

  • Less developed projects: Code closely matches examples from class and does not go much further. Analyses selected are not clearly purposeful. Preregistered analyses are not presented.
  • Typical projects: Code goes further than the examples presented in class. Analyses selected are purposeful and further the data narrative, but questions raised are not adequately addressed. Preregistered analyses are presented.
  • More developed projects: All expectations of typical projects + analyses are carefully selected to answer all reasonable questions. Questions raised by one analysis are addressed in subsequent analyses.

Evaluation of significance

  • Less developed projects: Metrics of statistical significance are present, but they are not interpreted for the reader and/or not relevant to the analysis performed.
  • Typical projects: Metrics of statistical significance appropriate to the analysis performed are presented and are interpreted to some degree for the reader.
  • More developed projects: Metrics of statistical significance appropriate to the analysis performed are presented and clearly interpreted for the reader. Limitations of significance metrics are acknowledged.

Interpretation and conclusions

  • Less developed projects: Results are presented as numeric values and plots, with little to no written discussion. Values are printed out of context, with no or few labels.
  • Typical projects: Values are interpreted in a way that is clear, addresses what the values mean, and explains to some extent why they are important. Values are printed with clear labels.
  • More developed projects: Interprets numeric values in a way that supports a clear story and conclusion, and creatively ties the analysis together through a well-written discussion of the results. Values are presented in context and with clear labels.

Limitations

  • Less developed projects: The limitations are not explained in depth. There is no mention of how these limitations may affect the meaning of results.
  • Typical projects: Identifies potential harms and data gaps, and describes how these could affect the meaning of results.
  • More developed projects: Creatively identifies potential harms and data gaps, and describes how these could affect the meaning of results, as well as the impact of results on people.

Writing

  • Less developed projects: May have spelling and grammatical errors, or awkward or incomplete sentences, indicating that the report was written in haste without editing.
  • Typical projects: Language is polished and free from errors.
  • More developed projects: Writing is clear, and complicated ideas are presented so that they are immediately understandable.

Organization and focus

  • Less developed projects: Work appears to have been done independently by team members and then merged at the last moment. Analyses may be exhaustive but carry little meaning or interpretation. There is not a very clear story throughout the entire report.
  • Typical projects: Most elements of the project are clear and provide a connected conclusion. Some parts could have been removed to make the report more focused. There is a clear story that flows throughout most of the report.
  • More developed projects: All elements of the project support a clear and connected conclusion. Every part is essential and cohesive. There is a clear story to the entire report that flows throughout.