Getting back to the Obviousity scores we looked at a couple of days ago, the lessons to be drawn even from my simple experiment go beyond just reinforcing the need to follow professional item-writing principles like those I recommended a few months back.
Yes, MOOC developers should avoid true/false questions and do a better job of not telegraphing the right answers to multiple-choice questions (ideally as a first step towards paying more attention to assessment quality and polish overall).
But I think these relatively simple Obviousity calculations can also answer other questions about MOOCs. For example, why are MOOC classes that focus on science and technology subjects still generally considered to be better (or at least more challenging) than massive courses in the social sciences or the humanities?
It may be that MOOCs on topics like computer science have just been around longer, which means they’ve benefited from more time in the field, where student feedback and teacher experience contributed to their ongoing improvement. Or perhaps the teachers and students drawn to a technology-based educational platform are simply more inclined to put energy into a class covering a scientific or technical subject.
But I would make the case that assessments for courses whose content intersects with mathematics lend themselves to open-ended test items requiring calculation, items that are intrinsically more challenging than formats such as four-response multiple-choice, where the baseline score for random guessing is 25% (vs. roughly 0% for open-ended items, which reward almost nothing for guesswork). The low Obviousity score for my statistics class, for example (vs. the higher scores for the two humanities classes I analyzed), likely derives from the nature of the material as much as from decisions by course developers about how to assess learning of that material.
The degree to which including True/False items in a test drives up Obviousity scores helps explain why professional test developers eschew this item type. And given how much mixing in different item styles (not just multiple-choice but also multiple-response and matching) lowers the odds of getting a question right by throwing darts at the screen, why not use as many item variants as possible to make assessments more interesting (and, again, more challenging)?
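To put some numbers behind that argument, here is a quick back-of-envelope simulation (just a sketch in Python, not the actual Obviousity calculation from the earlier post) of what blind guessing earns on different item types. The option counts, the two-answer multiple-response item, and the four-pair matching item are assumptions I made up for illustration:

```python
import random

# Back-of-envelope simulation (not the Obviousity formula itself):
# what does a student who guesses blindly score on each item type?

def guess_true_false():
    # Coin flip between True and False: right about half the time.
    return random.random() < 0.5

def guess_multiple_choice(options=4):
    # Pick one of N options at random: right 1/N of the time (25% for N=4).
    return random.randrange(options) == 0

def guess_multiple_response(options=4, correct=2):
    # Check each box at random; the chance of selecting exactly the right
    # subset is (1/2)^N, about 6% with four options.
    guess = {o for o in range(options) if random.random() < 0.5}
    return guess == set(range(correct))

def guess_matching(pairs=4):
    # Match every item to a partner at random; the chance of getting the
    # whole set right is 1 / pairs! (about 4% for four pairs).
    shuffled = list(range(pairs))
    random.shuffle(shuffled)
    return shuffled == sorted(shuffled)

def simulate(item, trials=100_000):
    return sum(item() for _ in range(trials)) / trials

print(f"True/False:         ~{simulate(guess_true_false):.0%}")
print(f"4-option MC:        ~{simulate(guess_multiple_choice):.0%}")
print(f"Multiple response:  ~{simulate(guess_multiple_response):.1%}")
print(f"Matching (4 pairs): ~{simulate(guess_matching):.1%}")
# Open-ended calculation items aren't simulated: a blind guess scores ~0%.
```

The more of those low-percentage formats a test mixes in, the less a lucky guesser can expect to walk away with.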
While peer-reviewed writing assignments present their own problems, a course that bases final grades on a mix of automated quizzes and peer-scored essays also stands a better chance of giving students the means to put their learning to work than do less balanced classes that base grades entirely on test scores. And some variants of rubric-based self-scoring (especially in my Science and Cooking class, where you evaluate your own short write-ups of kitchen experiments) demonstrate interesting options for using open-ended assessment with non-mathematical material.
I understand that many MOOC professors already concerned about high drop-out rates might not care if it’s too easy to pass their courses (with harder testing being perceived as yet another barrier that might cause more students to quit a class). But for those of us who finish our classes, an 82% final grade in a course that really challenged me was far more satisfying than the 97% I got in a course that didn’t. So who’s to say whether more difficult assessment might increase vs. decrease engagement levels?
At the very least, the time it would take course developers to put their own tests through the type of Obviousity Index calculation I performed is probably 5-10 minutes per test, tops (it actually took me far less than that, even counting the time needed to retrieve the quarter I was flipping, which kept rolling under my desk). And while such an effort is no substitute for a more rigorous item analysis that would flag obvious problems based on things other than coin flips and text length, any procedure that can stop crappy questions from getting into field use is a step in the right direction.
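For anyone who would rather skip the coin entirely, here is a hypothetical sketch of what automating that kind of quick screen might look like. The item format, the threshold, and the two flags (two-option items, correct answers that dwarf their distractors) are my own assumptions for the sake of illustration, not the actual Obviousity Index calculation:

```python
# Hypothetical screen for guessable items. Assumes a toy representation of
# quiz items: a list of answer options plus the index of the correct one.

def flag_item(item):
    """Return the reasons an item looks easy to guess."""
    flags = []
    options = item["options"]
    answer = item["answer"]  # index of the correct option

    # True/false (or any two-option) items can be answered with a coin flip.
    if len(options) == 2:
        flags.append("only two options (coin-flippable)")

    # A correct answer much longer than every distractor telegraphs itself.
    longest_distractor = max(
        len(text) for i, text in enumerate(options) if i != answer
    )
    if len(options[answer]) > 1.5 * longest_distractor:
        flags.append("correct answer is by far the longest option")

    return flags

quiz = [
    {"options": ["True", "False"], "answer": 0},
    {"options": ["Paris",
                 "The capital and most populous city of France, on the Seine",
                 "Lyon",
                 "Nice"],
     "answer": 1},
]

for number, item in enumerate(quiz, start=1):
    for reason in flag_item(item):
        print(f"Question {number}: {reason}")
```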
Elizabeth says
I just got my certificate from the Animal Behavior class on Coursera, and I think they did a pretty good job in developing the test questions. Although you did get three tries, each try would vary the questions a bit, and would vary the answers a bit for the questions that did remain the same.
They had one peer review assignment – we had to take a scientific paper and write a popular science press version of it (modeled on The Conversation website).
I thought it was a good mix of testing, and especially now that you’ve posted these articles, I think they followed a lot of these guidelines. I honestly don’t remember if there were any true/false, and you definitely couldn’t choose the longest answer – they were all long!
This certificate (and my cert from the Holocaust class, which was solely based on three papers) are the ones I’m most proud of.