Key Measurement Concepts
This unit covers the critical measurement concepts of reliability and validity. These concepts appear frequently on licensure exams and are key to understanding and evaluating assessment procedures. Specific statistical procedures can be used to determine the reliability, validity, sensitivity, specificity, central tendency, and variance of a given assessment tool or test, and thus its utility for a particular population or individual.
Repetition and practice in applying reliability and validity concepts are recommended for mastery. The study activities provide opportunities to engage in both strategies. Take advantage of optional resources as needed to reinforce and deepen your understanding. This unit may require more time for reading and review to master the content.
To successfully complete this learning unit, you will be expected to:
1. Assess your knowledge of the concepts of reliability and validity.
2. Examine an assessment’s adherence to basic measurement constructs.
3. Analyze concepts related to the application of assessment tools.
Learning Activities: Unit 4 Study 1
Use Principles and Applications of Assessment in Counseling to complete the following:
• Read Chapter 3, “Reliability,” pages 37–55.
• Read Chapter 4, “Validity and Item Analysis,” pages 56–77.
Use the Capella University Library to complete the following:
• Read Coffman, Guerin, and Gottfried’s 2006 article, “Reliability and Validity of the Parent-Child Relationship Inventory (PCRI): Evidence from a Longitudinal Cross-Informant Investigation,” in Psychological Assessment, volume 18, issue 2, pages 209–214.
Discussion 1: 1 page required, with a minimum of 250 words and 2 references.
Reliability and Validity
Describe the relevance of psychometric properties in psychological testing.
After using the Reliability and Validity Exercise, further distinguish the types and significance of validity and reliability in test creation, their application in counseling settings, and the potential consequences of using assessment tools. Use Coffman, Guerin, and Gottfried’s evaluation of the Parent-Child Relationship Inventory as a guide.
The transcript of the Reliability and Validity Exercise interactive appears below.
Reliability and Validity Exercise
This exercise reviews the fundamental concepts related to reliability and validity in assessment. There is one tab for reliability and one for validity. You will need to match the word or concept with the correct definition. You can use the exercise as many times as you need until you feel confident in your knowledge of these fundamental principles of reliability and validity.
Average Inter-Item Correlation
Correlations are added together and the average is computed for the groups of related items.
Average Item-Total Correlation
This approach also uses the inter-item correlations but adds the total score to the analysis. For example, a total score is computed for six items and treated as a seventh variable; each item’s correlation with the total is then averaged.
This is the mathematical equivalent of the average of all possible split-half estimates for scores of a given sample.
Internal Consistency Reliability
Used to assess the consistency of the results across items within a test. This form of reliability measures the degree to which items on a test or scale measure the same construct.
Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
Used to assess the consistency of the results of two tests constructed in the same way from the same content domain. This measure of reliability requires that there are multiple items designed to measure the same construct.
Assesses the accuracy or precision of a measurement.
Assesses correlations between two halves of a test.
Used to assess the consistency of a measure from one point in time to another.
How well scores from two different tests given at the same time correlate with each other.
How well a test or instrument measures a given psychological trait.
Degree to which an instrument contains items related to the domain it is designed to measure.
Correlation between scores obtained from different tests, which measure similar constructs.
Degree to which a test correlates with some measure of performance.
Correlation between scores obtained from different tests, which measure different constructs.
Degree to which an instrument measures what it appears to measure.
Degree to which an instrument predicts some aspect of human behavior or performance.
Discussion 2: 1 page required, with a minimum of 250 words and 2 references.
Inept Assessment Development
If you have not already done so, watch the Inept Assessment Development video in this unit’s study. Identify three errors in the creation of the presented assessment tool. Draw on your text to apply key assessment concepts that demonstrate why the identified methods were flawed.
The transcript of the Inept Assessment Development video appears below.
Inept Assessment Development
Researcher 1 (R1): I have identified an area where there are no current assessment tools. If we can add this to our catalog, there is no doubt that we can make some sales. It is a no-brainer.
Researcher 2 (R2): Sounds great. What is it?
R1: The Fall Color Appreciation Test. The F-CAT.
R2: What is it? Who will want it?
R1: Who cares? It just sounds cool. There are plenty of assessments that do not measure anything useful. Someone will want to know who really appreciates fall color.
R2: OK, keep going. How will it measure fall color appreciation?
R1: This will be a true-false test with 20 items related to fall color. We will have items that will inquire whether the test-taker likes the colors associated with fall…red, orange, yellow, burnt umber.
R2: Burnt Umber? I remember that one from crayon boxes! Love it.
R1: Next, we can see if the person likes other sensory experiences associated with fall, like the crunching of leaves and really earthy smells. The last group of items will explore whether the subject likes chilly weather.
R2: This will be a piece of cake to put together! You put together a test group and I will have the F-CAT ready to go in no time!
Two Weeks Later…
R1: This is great! The F-CAT has nearly perfect reliability. The test-retest results are in. The same test group that I used from the office next door two weeks ago just took the F-CAT again. Their scores are nearly identical! We have got a winner!
R2: Well, reliability is great, but what about validity? Did you see the faces of those people taking the test? They were completely annoyed. I heard one guy mutter that you have really lost it. If it wasn’t for the $5 gift cards I was handing out, they would not have stuck around.
R1: Huh? Aw, come on. Just throw something together and make it look good. We just need to get this thing on the Internet so people can start buying it.
R2: What does this stupid test really measure anyway? I mean, who cares if someone likes the sound of crunchy leaves? What does that have to do with their appreciation for fall color?!
R1: Statistics can say anything we want! You just fix it up to look good. This is just a trait test. It is stable so it must be valid. I even checked the split-half reliability and it is solid.