Data relating to MOOC activity has been trickling out for quite some time. For instance, the University of Edinburgh released a 42-page report detailing their analysis of statistics related to six courses they released via Coursera in 2013. And data related to edX’s popular Circuits and Electronics course has been making the rounds as well.
But if you’re looking for an easy-to-understand, straight-to-the-point lesson on how to interpret data generated from a MOOC class, who better to provide it than someone who has been teaching a MOOC on the very subject of metadata?
Jeffrey Pomerantz from the University of North Carolina at Chapel Hill offered a Coursera MOOC entitled Metadata: Organizing and Discovering Information in 2013. And part of doing business with Coursera involved receiving weekly downloads of statistics related to activity in his class, data the professor shared with students (also on a weekly basis) so that the class could use their own behavior as a subject of study.
Pomerantz recently summarized some of his findings in a series of four blog entries which start here. And while even the author admits that his work so far represents a starting point rather than an end point for discovery, there are some important lessons to be learned, even through what should be considered an early entry into a “Big Data” story sure to grow longer and more interesting.
To begin with, his stats help settle (or at least inform) one of the big issues hanging over the heads of MOOC enthusiasts: high drop-out rates (sometimes as high as 95%). I’ve commented frequently on why using the total number of sign-ups as the denominator when calculating MOOC attrition rates might be an error, given that signing up simply represents a willingness to register for something of value offered for free.
Pomerantz puts it a bit differently, claiming that since there is no penalty for signing up for a MOOC and never attending, we should not be so hung up on what Coursera refers to as the number of “Total Registered Students.” But he shares my attitude that a student who supplies their name and e-mail address to Coursera (or some other MOOC web site) should not be considered the equivalent of a student who enrolls in a class at a brick-and-mortar university.
So if we’re not going to divide course completion numbers by total registrations to calculate a MOOC attrition rate, what statistic would make a better denominator? Pomerantz walks through a few candidates including Total Active Students (students who have logged onto the site at least once after registering, which increases the pass rate from 5% to 10%), unique students who have watched at least one video (which increases pass numbers to 15%), and students who have completed at least one assignment (which jacks up the pass rate to a far-more-respectable 48%).
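The arithmetic behind those shifting pass rates is simple enough to sketch. In the snippet below, the raw counts are hypothetical, chosen only so that the resulting percentages roughly match the ones Pomerantz reports; only the percentages come from his analysis.

```python
# Hypothetical raw counts, picked so the computed rates roughly match
# the percentages reported in Pomerantz's blog series (5% / 10% / 15% / 48%).
total_registered = 20000   # everyone who signed up ("Total Registered Students")
active = 10000             # logged in at least once after registering
watched_video = 6600       # watched at least one video
did_assignment = 2100      # completed at least one assignment
completed = 1000           # finished the course

def completion_rate(completers, denominator):
    """Course completion rate as a percentage of the chosen denominator."""
    return 100.0 * completers / denominator

for label, denom in [
    ("Total Registered Students", total_registered),
    ("Total Active Students", active),
    ("Watched at least one video", watched_video),
    ("Completed at least one assignment", did_assignment),
]:
    print(f"{label}: {completion_rate(completed, denom):.0f}%")
```

The point the exercise makes plain: the numerator never changes, so the “attrition problem” is entirely a question of which denominator you believe best represents a serious student.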
This last figure jibes with one of Anant Agarwal’s key talking points when confronted with questions regarding MOOC attrition rates in which he claims pass rates climb towards 40% when you assume that only students who actually complete an assignment (even a light quiz given on Week 1) should be considered serious enrollees. And if you take into account that many of the students who might enroll in a course but not do any assignments might be auditing the class, then you begin to get a better picture of what people are actually doing when they take part in a MOOC (other than dropping out).
Now this more informed picture is a two-edged sword for MOOC boosters. On the one hand, it gives them strong evidence to counter critics who like to hammer on drop-out rates to “prove” MOOCs are educationally worthless. At the same time, it’s hard to continue using huge front-end MOOC enrollments to impress the media, public and investors and then turn around and minimize the significance of that number when it comes time to calculate “true” completion statistics.
I’ll be returning to the subject of statistics again, but I highly recommend you read through all four parts of Pomerantz’s analysis, which looks at video-viewing and discussion-forum statistics as well as information related to overall rates of participation.
And as you do so, keep in mind the distinction the professor makes between descriptive statistics (i.e., data deriving from observing a phenomenon) vs. more powerful predictive results you would get from a controlled experiment. At this stage, descriptive data is probably all we have and, as Pomerantz’s series highlights, it can be powerfully informative. But I look forward to the time when MOOC classes are designed with the type of controls and variables that would help better demonstrate what they actually accomplish.