So the good news is that massive courses have the technology and the audience needed to generate massive (or “Big”) data, enough data to give course developers the statistics they need to refine and revise testing so that it more capably screens those who know from those who don’t. And, if combined with some of the test-planning and item-writing principles outlined earlier this week, each generation of MOOC-delivered assessments can be better than the last.
But just as quality can be assigned to test questions, it can also be assigned to data. And we need to temper our enthusiasm for this use of “Big Data” with an understanding of some of the quality issues related to data derived from thousands of people taking tests in hundreds of differing environments.
When professional test developers gather similar data to validate tests like certification exams or the SAT, one of the key principles of data collection is that everyone taking the test under validation must be working in a consistent environment.
For example, when we did beta tests for certifications I once created covering Digital Literacy skills, students taking those beta exams were all working under controlled conditions (i.e., taking the test in standardized testing centers that used the same computerized delivery system, with each student having to finish the exam within the same allotted time).
Most students don’t realize it, but when they take the SAT at least one section of the exam does not count for their score. Rather, it consists of beta test questions that are being calibrated for use in subsequent versions of the exam. And what better way to ensure that these questions are being taken in the same environment as the actual SAT than to make them part of an actual exam experience already standardized across all student populations?
But when I’ve taken exams delivered via any of the free online courses I’ve enrolled in so far, no similar controls are in place. There are no time limits for the exam (meaning one person can answer all the questions in ten minutes, while someone else might take ten days). And because there are no security measures in place, there is no way to separate those who answered the questions using their own internalized knowledge from those who looked up every answer on the Internet. And in situations where students are given the chance to try the same exam more than once, it’s pretty much child’s play to refine your answers with each test iteration until you get 100%.
So even if we tighten the bolts on some of the assessment development processes used to create quizzes and exams for widely delivered classes and use data being collected to see how well test items are performing, we need to keep in mind that tests delivered in multiple environments will never have the same level of rigor you would find in an exam built from start to finish using professional test-development principles.
But we also need to keep in mind that this level of rigor is not necessary for assessments to play an increasingly important role inside MOOCs, especially since this role for assessment can legitimately vary from course to course.
For instance, several of the courses I’ve been taking are clearly created by people passionate about a particular subject who want to minimize barriers that might limit the number of students exposing themselves to the material being taught. And one of the ways of doing this is to include quizzes that anyone who has watched a lecture can answer easily (i.e., quizzes that just include simple comprehension questions that may not follow the rules of professional item writing). In those situations, really hard tests designed to screen out those who have not met a sufficient level of mastery may turn off students who could still get some value from watching and reading all of the material associated with a course.
Since the cost of creating a certification-grade exam is probably comparable to the cost of creating a MOOC course (and given that the ability to deliver MOOCs in widely varied environments is a virtue of this teaching method), there is not likely to be an incentive to put all of the effort needed to build a fully validated exam into most MOOCs (unless, perhaps, some pathway is found to allow students to take a rigorous final exam for actual credit).
But this doesn’t mean that we can’t start applying some of the principles I’ve been talking about all week (test planning based on input from subject-matter experts, question development based on sound item-writing principles, use of data to separate performing from non-performing items), even if we are “just” using these techniques to significantly improve MOOC-delivered assessments, rather than perfect them.
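For readers curious what “using data to separate performing from non-performing items” looks like in practice, two standard statistics do most of the work: an item’s difficulty (the proportion of test-takers who answer it correctly) and its discrimination (how well getting the item right correlates with doing well on the rest of the test). Here is a minimal sketch in Python; the response data and the function names are my own invented illustration, not anything from a real exam:

```python
# Illustrative item analysis on hypothetical quiz data.
# Rows = students, columns = items; 1 = correct, 0 = incorrect.

def point_biserial(item, rest):
    """Correlation between a 0/1 item score and the rest-of-test score."""
    n = len(item)
    mean_i = sum(item) / n
    mean_r = sum(rest) / n
    var_i = mean_i * (1 - mean_i)                      # variance of a 0/1 variable
    var_r = sum((x - mean_r) ** 2 for x in rest) / n
    if var_i == 0 or var_r == 0:
        return 0.0                                     # item everyone got right/wrong
    cov = sum((item[k] - mean_i) * (rest[k] - mean_r) for k in range(n)) / n
    return cov / (var_i ** 0.5 * var_r ** 0.5)

def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    Difficulty is the proportion correct; discrimination is the
    point-biserial correlation with the total score, excluding the
    item itself so it doesn't correlate with its own contribution.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [totals[i] - item[i] for i in range(n_students)]
        difficulty = sum(item) / n_students
        stats.append((difficulty, point_biserial(item, rest)))
    return stats

# Five students, three items: items 0 and 1 track overall performance,
# while item 2 (answered correctly by everyone) discriminates not at all.
responses = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
]
for j, (p, disc) in enumerate(item_statistics(responses)):
    print(f"item {j}: difficulty={p:.2f} discrimination={disc:.2f}")
```

An item that nearly everyone gets right (like item 2 above) or that shows low or negative discrimination is a candidate for revision or removal in the next version of the test; exactly where to draw those cutoffs is a judgment call that professional programs make with far larger and cleaner samples than a MOOC quiz provides.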