by John Rizzo, August 2008.
History and Context
Ultimate Moderation Process for JavaBlackBelt (UMP).
Over the years, we had to invent an efficient way to let thousands of people contribute to thousands of questions.
The purpose of UMP is to automate question moderation even further.
The first version of JavaBlackBelt introduced the notion of a moderator, who handled every question improvement manually.
The second version of JavaBlackBelt required question authors to react to users' comments before a moderator came in. It also introduced question versioning.
The next moderation process (with v2), named YAMP, replaced the moderator's intervention with the crowd's opinion. We took the vote results into account to automate the quality review, and we asked users for more specific opinions during tests.
This fourth moderation process (with v3.3), named UMP, is the one used now on the platform. It introduces the notions of problem and move proposal (see below). A major consequence of UMP, compared with its predecessor (YAMP), is that many more questions move to the freezer instead of staying forever in the beta or repair zone. Another benefit is that contributors learn better what makes a question good. It should also help exams contain far fewer tricky questions (questions on details or attention that are usually too hard for the exam level).
The moderator intervenes to build exam objectives, and for unusual interventions in the question lifecycle.
Differences with the previous moderation system.
For those used to YAMP, the main differences are:
- one zone fewer: the repair zone disappears, and any question in the beta, released or frozen state can have open "manual works" (edit proposal, move proposal or problem).
- zones are now called "states". The incubator is now "beta".
- replacement of the star vote system with a binary +/- vote system.
- edit proposals can be approved/rejected by anyone (not only by the question author), with a +/- vote system.
- introduction of move proposals (to move questions to other exams).
- introduction of problems.
- comments have no influence on the moderation process anymore ("nice question" comments are welcome now ;-)
- English checks, objective checks and the light check process have been removed.
States
A question is in one of the 3 existing states:
- Beta: the quality is being checked. New questions start in this state. For beta exams, the system selects questions in the beta state.
- Released: the quality has been checked and is good. Good questions end in this state. For real exams and sample exams, the system selects released questions. When a beta question has 10 more positive votes than negative votes, it is released.
- Frozen: the quality has been checked and is bad, or some pending manual work had to be done on a beta question and nobody did it. Questions in this state are not shown to anybody during an exam. Their only chance to go back to the beta state is for a contributor to improve them. When a question does not get 10 more positive votes than negative votes "fast" enough, it is frozen. When a question has been viewed more than 15 times since a manual work was opened on it (and nobody fixed it), it is also frozen.
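The view-based freeze rule can be sketched as follows. This is a minimal Python illustration, not the platform's actual code: the `State` enum, the `Question` fields and the function name are hypothetical, and only the limit of 15 views comes from the text.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    BETA = "beta"
    RELEASED = "released"
    FROZEN = "frozen"


# From the text: more than 15 views with an unfixed open manual work freezes the question.
MAX_VIEWS_WITH_OPEN_WORK = 15


@dataclass
class Question:
    state: State = State.BETA          # new questions start in the beta state
    has_open_manual_work: bool = False
    views_since_work_opened: int = 0   # hypothetical counter name


def freeze_if_neglected(q: Question) -> Question:
    """Freeze a beta question whose open manual work has been ignored for too long."""
    if (q.state is State.BETA
            and q.has_open_manual_work
            and q.views_since_work_opened > MAX_VIEWS_WITH_OPEN_WORK):
        q.state = State.FROZEN
    return q
```

The "not fast enough" rule is left out on purpose, because the text does not give its exact threshold at this point.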
Vote on quality for beta questions
Every registered user is welcome to vote (once) on the quality of any question. You typically vote just after having taken a beta exam, from the result page where the questions are listed. It's quick because you already took the time to read the question in order to answer it. You can also vote from the question detail page, which you can reach from any question list.
You can vote to release the question (positive vote) or to freeze the question (negative vote).
When the number of positive votes minus the number of negative votes reaches 10, the question is released.
On the contrary, when this difference reaches -10, the question is frozen.
Note that the votes of some power users count for more than one.
Simple example:
- a question is created 2008-9-01
- 2 weeks later, 15 people voted to release the question, and 5 people voted to freeze the question. 15-5 = +10.
- the question is released and will remain in the released state (it cannot go back to beta).
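The arithmetic of this example can be written down as a short Python sketch. The function names and the `(is_positive, weight)` representation of a vote are assumptions made for illustration; the thresholds of +10 and -10 come from the text, while the actual weight of a power user's vote is not specified.

```python
RELEASE_THRESHOLD = 10   # net score at which a beta question is released
FREEZE_THRESHOLD = -10   # net score at which a beta question is frozen


def net_score(votes):
    """Sum signed votes; each vote is an (is_positive, weight) pair.

    weight is 1 for regular users and may be larger for power users
    (the real weights are not given in the text).
    """
    return sum(weight if is_positive else -weight for is_positive, weight in votes)


def state_after_votes(votes):
    """Return the state of a beta question given all votes cast so far."""
    score = net_score(votes)
    if score >= RELEASE_THRESHOLD:
        return "released"
    if score <= FREEZE_THRESHOLD:
        return "frozen"
    return "beta"


# The simple example: 15 release votes and 5 freeze votes, all of weight 1.
votes = [(True, 1)] * 15 + [(False, 1)] * 5
print(net_score(votes), state_after_votes(votes))  # prints: 10 released
```

A real implementation would presumably check the thresholds each time a vote arrives, so that the state changes as soon as the difference is reached; the sketch only evaluates the final tally.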
Improvements: Manual Works
The vote process described above works together with an improvement process. The moderation process is not only about filtering. Every exam taker has the possibility to improve existing questions.
There are 3 kinds of manual work:
- problem
- edit proposal
- move proposal
Problem
To help exam consumers (and future question authors) see what differentiates a bad question from a good one, we have created a list of typical problems, including the following:
- code too easy to copy/paste/execute
- missing explanation
- tricky question
A reported problem stays open until one of the following happens:
- another contributor "cancels" it (because the problem is not justified).
- a contributor fixes the question by editing or moving it (the problem is "solved").
Edit proposals
A versioning system on JavaBlackBelt enables anybody to edit a question. The new version of the question is subject to approval: a vote system enables contributors to approve or reject it. Experience shows that edits do not always improve questions. For example, a user adding 3 choices to a question can make it too difficult for the exam level. Another typical example is when the original author describes the problem in an abstract way, and somebody puts the described code into the question (which makes it too easy and no longer interesting).
Only one edit proposal can be active at a time. It has to be accepted or rejected before somebody can edit the question again.
Move proposals
It's now very easy to move a question to another category of its current exam, or to another exam.
This would typically happen when a question that is too difficult needs to be moved to a more difficult exam.
Questions are not moved immediately. As with edit proposals, a vote system enables other users to accept or reject the move.
If you don't know where to move a question (because there is no existing appropriate exam, for example), report a problem of type "Wrong category or exam".
Complex example
- a question is created 2008-9-01, and it's poor.
- the author immediately notices a typo and edits his question. A new version of the question is created (v2) and it is accepted immediately and automatically (bypassing the vote system) because nobody else has edited the question in the meantime.
- 2 days later, it has 1 positive vote and 5 negative (freeze) votes (1-5 = -4).
- At that point, somebody reports a problem: the question is too easy to copy/paste in the IDE and can be answered without understanding the question.
- Because the question has an open manual work (the problem), it's less likely to be selected for beta exams.
- 2008-9-10, a contributor scans the "Manual work questions" of the concerned exam and sees the question with an open problem.
- He edits the question to make it less easy to copy/paste and he checks the box "reset votes counters".
- In the next couple of days, 2 other contributors vote on the edit proposal, which is approved. The problem is closed, the question has no votes and it is picked by the selection algorithm for people taking the beta exam.
- 2008-9-30, the question has 8 positive votes and 7 negative votes. 8-7 = 1.
- 25 people have "viewed" it since the last vote reset on 2008-09-10, and there seems to be no enthusiasm to release it (nor another attempt to improve it). That's too much: it goes to the freezer.
- 2009-01-15, the exam leader of the concerned exam wants more questions from the concerned category. He looks at the frozen questions and thinks that our frozen question is not so bad. He has an idea to improve it. He makes an edit proposal. The question is now in the beta state again, with reset vote counters.
- 2009-01-20, 2 contributors have voted on the proposal, which is accepted.
- 2009-02-15, the question has 13 positive votes, and 3 negative votes. 13-3 = +10. The question is released.
Transition to the new moderation process.
The YAMP to UMP transition happens as follows:
- all non-frozen questions start in the beta state as soon as v3.3 is deployed, even questions that were in the repair and exam zones.
- questions that were in the repair zone because an edit proposal was pending, now have an associated manual work (in the beta state).
- questions that were in the repair zone because of an open comment, have no special manual work associated because of the comment (they are just in the beta state).
- questions that were in the freezer zone now have the frozen state.
To summarize: all questions get a clean fresh (re)start in the beta state. But we remember which zone each question was in, to help the question selection algorithm as follows. For released exams, the algorithm first tries to get:
- released questions (there will be none just after the deployment), then, if not enough questions are found,
- beta questions that were in the exam zone in YAMP (there are plenty), then, if not enough questions are found,
- beta questions, brand new or already beta (in the incubator) before the deployment. We might use this case for new non-released exams.
So in practice, right after the v3.3 deployment, the question selection for real exams is nearly the same as before the deployment.
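The three-step fallback for released exams can be illustrated with a small Python sketch. The dictionary keys `state` and `yamp_zone` and the function name are hypothetical; the sketch also ignores the per-category quotas and the random ordering used for real exams, and only shows the priority between the three groups.

```python
def select_for_released_exam(questions, needed):
    """Pick up to `needed` questions, exhausting each priority tier in turn.

    Each question is a dict with a 'state' key and a 'yamp_zone' key
    (the zone remembered from YAMP, or None for questions created afterwards).
    Frozen questions match no tier and are never selected.
    """
    tiers = [
        lambda q: q["state"] == "released",
        lambda q: q["state"] == "beta" and q["yamp_zone"] == "exam",
        lambda q: q["state"] == "beta" and q["yamp_zone"] != "exam",
    ]
    selected = []
    for in_tier in tiers:
        for q in questions:
            if len(selected) == needed:
                return selected
            if in_tier(q):
                selected.append(q)
    return selected
```

Right after the deployment the first tier is empty, so the selection falls through to the questions that were in the YAMP exam zone, which is why real exams look nearly unchanged.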
But these questions (that were in the exam zone in YAMP) are also taken for beta exams because they have the beta state. The impact is that they might go to the freezer after a few weeks. They might also move to another exam (move proposal) or be improved (edit proposal). The general consequence (that we hope will happen) is a reselection of questions that will kill (or improve) the too hard, tricky questions that were in the exam zone of too many exams, and freeze new tricky beta questions much more quickly.
That's the main reason leading us to develop UMP, in fact.
Questionnaire questions selection.
JavaBlackBelt supports 3 types of exam:
- Beta
- Released
- Sample
Categories quota
For all exam types, we try to balance the number of questions selected across the exam sub-categories, according to the per-category number specified (visible on each exam page).
Example: an exam has 3 categories (Collections, IO, Threads). The exam defines the number of questions asked for each category:
- Collections: 2 questions
- IO: 3 questions
- Threads: 1 question.
Open Manual Work
In all exam types, we try to avoid selecting questions that have a pending edit/move proposal or an open problem. If we showed such questions (with open manual work) to other users during an exam (beta or released), exam takers might end up making the same remark as the open manual work. It would be a waste of community energy (which we prefer to spend on other questions until the manual work is closed).
If there are not enough questions without open manual work, the selection algorithm tries to get some with open manual work anyway.
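As a rough illustration of this preference, the following Python sketch takes questions without open manual work first and only then fills up with the others. The function name and the `(question_id, has_open_manual_work)` pairs are assumptions made for the example.

```python
def prefer_without_manual_work(candidates, needed):
    """Take questions with no open manual work first, then fill up with the rest.

    `candidates` is a list of (question_id, has_open_manual_work) pairs.
    """
    clean = [qid for qid, has_work in candidates if not has_work]
    pending = [qid for qid, has_work in candidates if has_work]
    return (clean + pending)[:needed]
```

For instance, with candidates `[("q1", True), ("q2", False), ("q3", True), ("q4", False)]` and 3 questions needed, both clean questions are taken and only one question with open manual work is added.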
Too few questions in a category.
For a real exam, if there is a lack of released questions in a category, we try to get some of the best (by votes) beta questions.
For a beta exam, if there is a lack of beta questions in a category, we try to get some released questions.
If, after having selected questions in the other state, we still can't find enough questions, then the exam will contain fewer questions.
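A minimal Python sketch of this fallback, for a single category, could look like the following. The dictionary keys, the `for_real_exam` flag and the use of the net vote score to rank "the best" beta questions are assumptions; the text only says that the best beta questions (by votes) are preferred.

```python
def select_for_category(questions, needed, for_real_exam):
    """Select up to `needed` question ids for one category.

    `questions` is a list of dicts with 'id', 'state' ('beta', 'released'
    or 'frozen') and 'net_votes' (positive minus negative votes).
    """
    primary, fallback = ("released", "beta") if for_real_exam else ("beta", "released")
    first = [q for q in questions if q["state"] == primary]
    second = [q for q in questions if q["state"] == fallback]
    if for_real_exam:
        # Real exams borrow the best-voted beta questions first.
        second.sort(key=lambda q: q["net_votes"], reverse=True)
    chosen = (first + second)[:needed]
    # May be shorter than `needed`: the exam then simply contains fewer questions.
    return [q["id"] for q in chosen]
```

Frozen questions never match either state, so they are never returned.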
Real exam.
Within the priorities described above, we try to give a random selection of questions to the taker of a real exam.
We don't try to avoid previously seen questions.
Example: the DB contains...
- 5 released questions for Collections category (from which 2 questions will be selected),
- 2 released questions for the IO category (from which 3 questions will be selected),
In the Collections category, the only question with no open manual work is selected first. We need one more. 2 questions have an open edit proposal, and the 2 other questions have an open problem. We randomly select one of the 2 questions having no edit proposal (but an open problem). We have enough questions.
In the IO category, one of the questions has no open manual work, and is selected first. The other question is then selected, but we are still missing a question. We take a question from the beta state. If there were no question in the beta state, we would have selected only 2 questions (instead of the 3 needed) for that category.
Beta exam.
For beta exams, we try to avoid selecting questions that have been seen by the user during a previous attempt. If we can't find enough questions that have not been viewed yet, we take some from the already viewed questions. So you should get the same question twice only if, in that category, you have already seen all the other beta questions.
This should help users who prepare for an exam by taking beta exams to learn from as many different questions as possible, and to contribute to as many questions as possible.
This feature is especially important for programming task exams (because answering a question takes quite long, and if you have already had a question, it's painful to see it again).
All other criteria being equal, we prefer older questions, because they are more likely to be close to the released or frozen state, and it is more efficient to focus the community energy on a few questions (that will quickly move forward) than to spread it across too many questions (that will move slowly and heavily together). That's for beta exams. In released exams it's different, because questions are not supposed to need much improvement/filtering and because we prefer a random selection to reduce cheating (between two coworkers, for example).
Example: the DB contains...
- 4 beta questions for Collections category (from which 2 questions will be selected),
- 1 beta question for the IO category (from which 3 questions will be selected),
We still need 2 questions. 1 question has an open edit proposal, and the 2 other questions have an open problem. They have all already been seen by the user in a previous beta exam. But we don't have any other beta question to select, so we take the oldest of the 2 questions having no edit proposal (but an open problem). We have enough questions.
In the IO category, we select the only beta question (whether or not the user has already seen it). Then we take one of the 2 questions from the released state (the one having no open manual work).
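The ordering used for beta exams can be approximated with a single sort key, as in the Python sketch below. The field names are hypothetical, and the relative priority between "not yet seen" and "no open manual work" is an assumption, since the text only states that age is the last tie-breaker.

```python
def order_for_beta_exam(candidates, seen_ids):
    """Sort beta candidates: unseen first, then no open manual work, then oldest.

    `candidates` is a list of dicts with 'id', 'created' (ISO date string,
    which sorts chronologically) and 'has_open_manual_work'.
    `seen_ids` is the set of ids the user saw in previous beta exams.
    """
    return sorted(
        candidates,
        key=lambda q: (q["id"] in seen_ids, q["has_open_manual_work"], q["created"]),
    )
```

Taking the first entries of the sorted list for each category then gives unseen, clean, old questions whenever they exist, and falls back to already seen ones otherwise.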
Sample exam.
Users have the possibility to take a sample exam with a maximum of 5 released questions. As with beta exams, the result does not count for belts.
The purpose of sample exams is to give a sample of the released questions, so that the exam taker has a better idea of what to expect.
For sample exams, we select at most one question from each of 5 categories. We (nearly) always return the same questions to all sample exam takers (for the concerned technology), to prevent people from using sample exams to "browse" the released question set.
Basically, for each category we select a question having no open manual work (if any), in a static order.
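The deterministic nature of the sample exam selection can be sketched in Python as follows. Sorting categories by name and questions by id is only an assumed stand-in for the "static order" mentioned above, and the data layout and function name are hypothetical.

```python
MAX_SAMPLE_QUESTIONS = 5


def select_sample_exam(released_by_category):
    """Pick at most one released question per category, for at most 5 categories.

    `released_by_category` maps a category name to a list of
    (question_id, has_open_manual_work) pairs.
    """
    selection = []
    for category in sorted(released_by_category):
        if len(selection) == MAX_SAMPLE_QUESTIONS:
            break
        questions = sorted(released_by_category[category])
        if not questions:
            continue
        clean = [qid for qid, has_work in questions if not has_work]
        # Prefer a question without open manual work, else take the first one.
        selection.append(clean[0] if clean else questions[0][0])
    return selection
```

Because nothing in the function is random, two users asking for the same sample exam get the same questions as long as the question set does not change.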
View Counter
For each question, we maintain a counter of how often it has been displayed in a test.
We increment the counter at most once per test taken (if we redisplay the question during a test, we don't count it twice).
We only increment the counter when the question is shown to the user. Being selected for a questionnaire is not enough. If a test is aborted by a user, the question might not be shown. If the user has a score lower than 40%, then we don't count views for that test.
Views by anonymous users count too.
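These counting rules fit in a few lines of Python. The function name and parameters are hypothetical; the sketch simply returns the questions whose view counter would be incremented by one for a given test.

```python
def views_to_count(shown_question_ids, score_ratio, min_score=0.40):
    """Return the set of question ids whose view counter gets +1 for one test.

    `shown_question_ids` lists every display of a question during the test
    (a question displayed twice appears twice); questions that were selected
    but never shown are simply absent. `score_ratio` is the score in [0, 1].
    """
    if score_ratio < min_score:
        return set()                 # tests scoring below 40% do not count
    return set(shown_question_ids)   # at most one view per question per test
```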
Abandoned Questions
In the previous moderation process, we had the notion of "abandoned questions".
Typically, an abandoned question has an edit proposal pending (waiting for approval), but the question author does not respond (to accept or reject it).
With UMP, the new moderation process, the author does not "own" the question anymore, and we don't want to require work from him long after he created the question. With the new edit proposals, anyone can vote to accept or reject them.
Result/question
Data Collection
For each question, the system remembers how many times it has been correctly answered, badly answered, not answered.
These counters are only persisted in the question objects at the end of the test (result computation) if:
- 80% of the questions in this questionnaire have been answered
- the user has at least 40% of good answers.
A question success rate (QSR) is the number of correct results / (bad + unanswered + correct).
The average question success rate of an exam (AvQSR) is the average success rate of all the released questions for that exam.
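The two persistence conditions and the two rates can be expressed as a small Python sketch. The function names are hypothetical, and two points are assumptions rather than facts from the text: the 80% condition is read as "at least 80%", and the 40% of good answers is measured against all questions of the questionnaire.

```python
def should_persist(answered, total, correct):
    """Decide whether the counters of a finished test are saved.

    Integer arithmetic avoids rounding surprises on the 80% and 40% limits.
    """
    return answered * 100 >= 80 * total and correct * 100 >= 40 * total


def qsr(correct, bad, unanswered):
    """Question success rate: correct / (bad + unanswered + correct)."""
    attempts = correct + bad + unanswered
    return correct / attempts if attempts else None


def av_qsr(released_question_qsrs):
    """Average QSR over the released questions of an exam (skipping those with no data)."""
    rates = [r for r in released_question_qsrs if r is not None]
    return sum(rates) / len(rates) if rates else None
```

For example, a question answered correctly 7 times, badly 2 times and left unanswered once has a QSR of 7 / 10 = 70%.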
Detection of inadequate questions
These results make it possible to detect questions that are too hard or too easy for the exam.
If an exam AvQSR is 70%, then a question having a QSR of:
- 95% is too easy,
- 10% is too hard.
A question's QSR should be considered relative to its exam's AvQSR, because some exams must be easy and others very hard. We compute the distance between the question QSR and its exam AvQSR.
We cannot say that the 10% most extreme questions with respect to the AvQSR should be excluded from the exam, because there are always 10% extremes and this would empty the exam.
Instead, we tend to exclude (manually) questions with a QSR outside the range [AvQSR - 20%, AvQSR + 20%].
In a later version, the system may auto-create a problem when the QSR is out of range.
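The range check is currently applied by hand, but it is simple enough to sketch in Python. The function name and the returned labels are hypothetical; the tolerance of 20 percentage points comes from the text.

```python
TOLERANCE = 0.20  # 20 percentage points around the exam average


def classify(question_qsr, exam_av_qsr, tolerance=TOLERANCE):
    """Compare a question's QSR with its exam's AvQSR (both as ratios in [0, 1])."""
    distance = question_qsr - exam_av_qsr
    if distance > tolerance:
        return "too easy"
    if distance < -tolerance:
        return "too hard"
    return "ok"
```

With an AvQSR of 70%, `classify(0.95, 0.70)` gives "too easy" and `classify(0.10, 0.70)` gives "too hard", which matches the example above.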

