Week 3

Zoom Link: https://ualberta-ca.zoom.us/j/97755262261

This week we’ll be discussing data cleaning! As biologists, we spend an inordinate amount of our time preparing data for analysis. This process is usually known as ‘data cleaning’, and while it involves a fair bit of trial and error, there is a method to the madness.

Conceptual Overview of Data Cleaning

Data cleaning can mean many things, but one indisputable fact is that the better the data were initially collected, the less need there is to clean them. Often we as biologists collect and enter our own data, but even if we don’t, it’s important to know how streamlining the collection process can have payoffs down the road when it’s time to clean and analyze those data. In this video Emma will introduce the conceptual overview of data cleaning in R, then discuss some ways it can be avoided.

Data Cleaning as a Process

Data cleaning is hardly ever a single event where you sit down at your computer and, by the time you get up to get coffee, the data are cleaned. It’s an iterative process that will likely take a few passes to get right. Different kinds of data also call for slightly different cleaning methods. As a whole, data cleaning is truly a skill that you as a biologist would be well-advised to spend time developing. Until we have computers that can clean all our data for us, it’s a big part of our jobs. In this video Cole will define in more detail what ‘clean’ data look like, and expand on the process of cleaning data itself.


Content Summary

Data to Follow Along

week_three_data.csv