Sunday, June 3, 2012

Qualitative & Quantitative Test Item Analysis
Megan Smith
EDU 645 Learning & Assessment for the 21st Century
Professor Griggs
May 2012
The psychology test consists of five questions total, with no defined time limits or explanations alongside the test questions. Each question has its own unique qualities, but each also contains its own set of problems. Along with calculating the mean, or average, for each question and for the exam as a whole, I will present the outcomes of the exam on a visible scale. In addition, I will apply item theories and conduct both a quantitative and a qualitative item analysis to identify positive changes that could improve the overall function and effectiveness of the test questions and components.
First we will calculate the means for each question and for the test as a whole. The mean of the test scores is the arithmetic average, calculated by taking the sum of all the scores and dividing it by the total number of scores. For this test, there were five questions worth two points each, taken by 10 students, so each question was worth a total of 20 points across all students. The math is fairly simple: 100 points possible overall, spread across five questions, makes 20 points possible for each question. So the mean score for the whole test is the sum of all the points earned divided by 100. In the textbook, the author states, "The mean has several characteristics that make it the measure of central tendency most frequently used. One of the characteristics is stability. Since each score in the distribution enters into the computation of the mean, it is more stable over time than other measures of central tendency, which consider only one or two scores" (p. 282). So the mean is one of the clearer and simpler calculations, and for this and other reasons it has remained a steadfast part of score calculation. I will go through each question from the test, find its mean score, and analyze the various elements of the question; after this is completed, I will show the mean score values for both the test as a whole and the individual questions in table form towards the end of the essay.
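The mean calculation just described can be sketched in a few lines of Python (the function name here is my own, for illustration):

```python
# Mean score for a question: points earned divided by points possible.
# Each question is worth 2 points for each of 10 students = 20 points.
def question_mean(points_earned, points_possible=20):
    """Return the mean score as a proportion of the possible points."""
    return points_earned / points_possible

# Question 1 of this test: 18 of 20 points earned.
print(question_mean(18))  # 0.9
```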
For question one, 1 out of 10 students answered incorrectly, for a total of 18 points out of 20 earned. For the mean score we take the sum of all the points earned and divide it by the total number of points possible, so M (the mean score) = 18/20, which equals 0.90. For this question the word choice was poor: "Who came up with psychosexual stages?" The term "came up with" is vague in itself; other word choices would be more suitable. The multiple-choice options are also silly and shallow; Lady Gaga, as a choice, is really just a humorous distraction. There is also an asterisk next to Sigmund Freud as a choice, which is itself very much a distractor. In the textbook, it states, "Quantitative item analysis is the introspection on the quality or utility of items of testing or academia. It does so by identifying distractors or response options that are not doing what they are supposed to be doing. The quantitative item analysis procedures that we will describe are most appropriate for items on a norm-referenced test" (p. 228). The choices here are shallow and out of the ordinary, and they do not offer challenging or plausible alternatives. The second and third choices are also last names only, which does not make sense for multiple choice; the answers should be clear and consistent, not confusing and superficial.
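The quantitative item analysis the textbook describes begins with a simple tally of how many students chose each option; a distractor nobody selects is "not doing what it is supposed to be doing." A minimal sketch (the response data here is hypothetical, matching only the 9-correct/1-incorrect totals above):

```python
from collections import Counter

def option_counts(responses):
    """Tally how many students chose each answer option."""
    return Counter(responses)

# Hypothetical responses to question 1 (9 correct, 1 incorrect):
responses = ["Sigmund Freud"] * 9 + ["Lady Gaga"]
print(option_counts(responses))
# Options that appear zero times (the remaining distractors)
# contribute nothing to the item and should be revised or removed.
```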
For question 2, there were 14 points out of 20 earned, so 7 out of 10 students answered correctly. The mean score here would be M = 14/20 = 0.70. Question 2 uses the same answer choices that appeared in question one; repeating these answers is unnecessary, and it is also confusing and distracting on a multiple-choice exam. Again, the word choice is weak: "Which psychologist tested classical conditioning with dogs?" The word "tested" takes away from the concreteness of the possible answers; many people could have tested classical conditioning, and does testing it mean they created it? Small fundamental changes like this can make the questions more accessible and better for testing adequacy in the long term. The answer choices also switch word order, shifting first and last names, and this only takes more validity away from the exam, since these are additional distractors, repetitious choices, and random word flips. Perhaps this exam was not proofread; if it had been, answers would not have been repeated across questions. When the choices are silly like these, a student who does not know which answer to pick may be thrown into even more confusion and distraction, taking away from the focus needed to complete the test. Proofreading is important in many aspects of test construction.
Always proofread, and ideally have a third party proofread and edit as well.
As it says in the textbook, "A correction-for-guessing formula can be used to penalize students for randomly answering test questions" (p. 226). This can make or break the utility of the test, and we need to consider it when taking on that type of task.
Avoiding the mundane problems of the testing process can save time and help the overall function of the test improve with each edit.
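The essay cites the correction-for-guessing idea without the formula itself; a commonly used version is score = R - W/(n - 1), where R is the number of right answers, W the number wrong, and n the number of options per item. A sketch under that assumption:

```python
def corrected_score(num_right, num_wrong, num_options):
    """Correction for guessing: right answers minus a penalty of
    wrong answers divided by (options - 1)."""
    return num_right - num_wrong / (num_options - 1)

# On four-option items, 3 right and 3 wrong nets 3 - 3/3 = 2 points.
print(corrected_score(3, 3, 4))  # 2.0
```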
 
For question 3, only one person earned the question's points, so 2 out of 20 points were received; therefore, the mean score is M = 2/20 = 0.10. Here the test takes a different turn, from multiple choice to short essay: "Explain the difference between operant and classical conditioning and give an example of each." Not only does the test change format without warning, but it repeats the same issues seen in the first two questions. This is an unpredictable measure, and in addition, the question does not possess breadth or depth; it simply asks for explanations and examples. This relates to the content validity of a test, its predictability, and the consistency of its requests. As the textbook states, "Essentially, when we talk about qualitative item analysis, we are talking about matching items and objectives and editing poorly written items. It is appropriate to edit or rewrite items and assess their content validity after a test, as well as before it" (p. 234). The text goes on to say, "Check test directions. Check your directions for each item format to be sure they are clear. Directions should specify: 1. The number of items to which they apply. 2. How to record answers. 3. The basis on which to select answers. 4. Criteria for scoring" (p. 224). Therefore, changing the test format midway, for no reason and with no instructions or specifications, violates these guidelines. In addition to repeating issues from questions one and two, the question is poorly stated: with so much material available to be tested on, the test limits its topics and then changes formats. Especially considering that only 1 out of 10 people answered correctly, there is a lot to be analyzed for this question. If an essay question is the format the teacher wants, it needs to be more specific, such as stating the length or details required. Or, at the least, there should be instructions at the beginning of the test that explain the test format and why it is presented in this manner and order.
For the fourth question, all students answered correctly, so there was a 10/10, which means 20 points out of 20, making the mean score for this question M = 1. This question manages to change the test format again, into a complete-the-sentence multiple choice, where the chosen option completes the sentence. This type of item, finishing the sentence, can be very effective, but changing format for the third time in the middle of the test seems out of the ordinary and does not flow well. In addition, one of the choices is "all of the above," which has not appeared in this test before and takes away from its fluency and consistency. One of the answer choices has an asterisk and an ellipsis before the statement, which seems unnecessary and distracting. The only other two choices are punishment and positive reinforcement. These responses are vague, lower the overall quality of the choices, and limit students' ability to critically consider tough but plausible answers. "The act of removing stimulus to gain certain behavior" could describe punishment, negative reinforcement, or positive reinforcement, depending on the situation. In the textbook, the authors state, "A correction-for-guessing formula can be used to penalize students for randomly answering test questions" (p. 226). The vague word play in these sentence completions does not contribute to the strength or validity of the test, making it distracting and unnecessary to employ such choices out of all the other word choices available for a finish-the-sentence item. For students to take the exam seriously, they must be presented with serious, legitimate choices for sentence completions; when test answers are treated with slack, it will only influence students in the same manner. Perhaps the questions were typed incorrectly and not edited; for mistakes like these, the textbook recommends pre-testing measures to assure they do not happen.
In the text, it states, "Miskeying. When an item is miskeyed, most students who did well on the test will likely select an option that is a distractor, rather than the option that is keyed" (p. 232). Just as a student will slack off when given confusing answers to choose from, they can just as easily stop taking the test seriously and instead answer sarcastically, without caring whether they are right or wrong. Adjusting these easy fixes ahead of time will keep students taking the teacher seriously and studying for the test objectively.
For the fifth question, only one student out of 10 answered correctly, so the calculation for the mean score would be 1/10, or 2/20, making M = 0.10. This time the question format is fill in the blank, which means this test has utilized four test methods altogether for only five questions. The question stems from the start of a sentence, "Psychology is ____," and gives only one option to choose from, *the study of the psyche, for that vague and open-ended prompt. This could easily have been a test construction error that was not noticed in time; however, it quickly takes away from the challenge and the legitimacy of the test as a whole. Also, why do they insist on putting asterisks in the answer choices? The author states, "Just as you can expect to make scoring errors, you can expect to make errors in test construction. No test you construct will be perfect; it will include inappropriate, invalid or otherwise deficient items. In the remainder of this chapter we will introduce you to a technique called item analysis. Item analysis can be used to identify items that are deficient in some way, thus paving the way to improve or eliminate them, with the result being a better overall test" (p. 227). A mistake like this one, with only one choice available, was most likely a typing error, and it does not make much sense placed where it is, as the last test question. In addition, expecting an end-of-sentence filler for the statement "Psychology is ____" is too broad; even with only one choice, it seems a tough question to fill in so briefly. This question could work as multiple choice, but it should be more specific about what it is asking, for clarity's sake and to make the test more legitimate and adequate.
As the textbook helped to explain, item analysis becomes a construct by which teachers can test their own tests. In addition to anonymous grading measures, which free us from preconceptions and unfairness, correcting our own tests can be revolutionary for educators, who come to realize that student outcomes rely on the manner in which teachers test students and on the measurements those tests produce. What would make an answer key effective and logical for structuring test effectiveness? "Appropriate changes and upgrades and applicable objective structures, such things as listing test questions out of order, but wastes of time and uncertainty are identified and avoided quickly" (p. 231). This can only be determined through the trial and error of tests, through practicing tests, and through constantly upgrading and revamping the measures within the test itself. Teachers must constantly edit their lesson plans, their techniques, and their testing methods in order to achieve improvements and stronger statistics, developing their answers through a medium that is legitimate, purposeful, and functional for growth, and keeping it consistent and useful.
For the entire test, of the 50 answers given (five questions by 10 students), 28 were correct, and since each was worth two points, the mean score calculation is 56/100. So the mean score for the entire test is 0.56 overall. Even calculated as 28/50, M comes out the same: M = 0.56. This statistic can serve as a baseline as the test measurements and methods are improved and the test structure becomes more efficient. Qualitative and quantitative item analysis are essential for improving and innovating testing procedures, and educators must be critical and detailed in matters of improvement and the functionality of the test-changing process.
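Pulling the per-question results together (points earned per question, out of 20 each: 18, 14, 2, 20, 2), the whole computation fits in a few lines:

```python
# Points earned on each question, out of 20 possible per question.
points = {1: 18, 2: 14, 3: 2, 4: 20, 5: 2}

question_means = {q: earned / 20 for q, earned in points.items()}
overall_mean = sum(points.values()) / (20 * len(points))

print(question_means)  # {1: 0.9, 2: 0.7, 3: 0.1, 4: 1.0, 5: 0.1}
print(overall_mean)    # 0.56
```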
 
 
Overall mean score M = 0.56 for the entire test.
Question    Points Earned    Mean Score (M)
1           18/20            0.90
2           14/20            0.70
3           2/20             0.10
4           20/20            1.00
5           2/20             0.10
 
 
 
 
 
References:
Kubiszyn, T., & Borich, G. (2010). Educational testing & measurement: Classroom application and practice (9th ed.). Hoboken, NJ: John Wiley & Sons.
