Tuesday, October 5, 2010

Item Response Theory (IRT)

Limitations of Classical Test Theory

Examinee characteristics cannot be separated from test characteristics
1) The discrimination or difficulty of an item is sample dependent.
2) It does not allow you to predict how an examinee, given an ability level, is likely to respond to a
    particular item.
3) Only three sources of error can be estimated: A. error due to the lack of internal consistency (of the
    items, coefficient alpha); B. error due to instability of a measure over repeated observations (test-retest
    reliability); C. error due to the lack of equivalence among parallel measures (correlation between parallel
    forms).
4) Comparison of individuals is limited to situations in which the same test was given to the individuals you
    want to compare. In addition, CTT makes the false assumption that error variance is the same across all
    subjects (ex: that there is no relationship between your true score and error variance).
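
Coefficient alpha, mentioned in point 3, can be computed directly from a table of item scores. The following is a minimal sketch, assuming the usual formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the tiny data set are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a score table: one row per examinee, one column per item."""
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # transpose rows into item columns
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four examinees to three dichotomous items.
demo_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(demo_scores)
print(round(alpha, 3))
```

Because the item variances and the total-score variance are computed from this particular group of examinees, the resulting alpha changes with the sample, which is the sample dependence described above.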


IRT allows for the development of items that are free from test and examinee biases.
IRT models are mathematical equations describing the association between a respondent's underlying level on a latent trait or ability and the probability of a particular item response (correct response) using a nonlinear monotonic function.
Most IRT modeling is done with unidimensional models.

IRT Theory

One can consider each examinee to have a numerical value, a score, that places him or her somewhere on the ability scale. 1) At each ability level, there will be a certain probability that an examinee with that ability will give a correct answer to the item. 2) This probability will be small for examinees of low ability and large for examinees of high ability.

Item Characteristic Curve (ICC)
1) If one plotted the probability of getting a question correct as a function of ability, the result would be a smooth S-shaped curve.
2) Each item has its own ICC.
3) The item characteristic curve is the basic building block of item response theory; all the other constructs of the theory depend upon this curve.

IRT
1) These measurement models use responses to items on a test or survey questionnaire to simultaneously scale both respondents and items on the same latent continuum (or latent space in the case of multidimensional IRT).
2) This enables one to measure individuals on the latent trait defined by the set of items (ex: ability, attitude, craving, satisfaction, quality of life, etc.) while simultaneously scaling each item on the very same dimension (ex: easy versus hard items in the case of an ability test, unfavorable versus favorable statements in the case of an attitude questionnaire).

Two families: unidimensional and multidimensional
1) unidimensional: unidimensional models require a single trait (ability) dimension.
2) multidimensional: multidimensional IRT models response data hypothesized to arise from multiple traits.

Binary vs. polytomous items
IRT models can also be categorized based on the number of scored responses.
Dichotomous: presence/absence, correct/incorrect
Polytomous: each response has a different score value (ex: Likert scaling)

Item difficulty and discrimination
There are two technical properties of an item characteristic curve that are used to describe it.
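1) item difficulty: the difficulty of an item describes where the item functions along the ability scale (ex: an easy item functions among the low-ability examinees and a hard item functions among the high-ability examinees).
2) item discrimination: the discrimination of an item describes how well the item differentiates between examinees with abilities below the item location and those with abilities above it; it is reflected in the steepness of the ICC.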
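
The effect of the two properties on the curve can be sketched in a few lines. This assumes a logistic form for the ICC, which is a common choice but is not stated in these notes; the function name `icc` and all parameter values are illustrative.

```python
import math


def icc(theta, difficulty, discrimination=1.0):
    """Probability of a correct response at ability theta (logistic ICC, assumed form)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


abilities = [-3, -2, -1, 0, 1, 2, 3]

# Difficulty shifts the curve along the ability scale.
easy = [icc(t, difficulty=-1.0) for t in abilities]
hard = [icc(t, difficulty=1.0) for t in abilities]

# Discrimination changes how steeply the curve rises around the difficulty.
flat = [icc(t, difficulty=0.0, discrimination=0.5) for t in abilities]
steep = [icc(t, difficulty=0.0, discrimination=2.0) for t in abilities]

for t, e, h, f, s in zip(abilities, easy, hard, flat, steep):
    print(f"{t:+d}  easy={e:.2f}  hard={h:.2f}  flat={f:.2f}  steep={s:.2f}")
```

At every ability level the easy item has a higher probability of a correct answer than the hard one, and the steep item separates examinees just below and just above its difficulty more sharply than the flat one.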

Number of IRT parameters:
IRT generally refers to three probabilistic measurement models:
1) 1-parameter logistic model (Rasch model) (latent trait + item difficulty). Item difficulty is defined as the logit point at which the probability of answering the item correctly is 50%; guessing is assumed to be irrelevant, and all items are equivalent in terms of discrimination.
2) 2-parameter logistic model (latent trait + item difficulty + item discrimination)
3) 3-parameter logistic model for dichotomous and polytomous responses (latent trait + item difficulty + item discrimination + guessing parameter). The guessing parameter takes into consideration guessing by candidates at the lower end of the ability continuum.
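
For reference, the three models are usually written as nested logistic functions. The symbols below are not used in these notes and are introduced only for illustration: \theta is the latent trait, b_i the difficulty of item i, a_i its discrimination, and c_i its guessing parameter.

```latex
% 3-parameter logistic model
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}

% 2-parameter logistic model: no guessing (c_i = 0)
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}

% 1-parameter logistic model: no guessing and a common discrimination a for all items
P_i(\theta) = \frac{1}{1 + e^{-a(\theta - b_i)}}
```

In the 1- and 2-parameter models, setting \theta = b_i gives P_i = 0.5, which matches the definition of difficulty as the 50% point. In the 3-parameter model the probability at \theta = b_i is (1 + c_i)/2, so the 50% interpretation no longer holds exactly.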

IRT Assumptions
1) examinee characteristics can be separated from test characteristics: the ease or difficulty of an item is sample independent; it allows you to predict how an examinee, given an ability level, is likely to respond to a particular item.
2) unidimensionality: only one ability or latent trait is measured
3) local independence
4) assuming a large pool of items, each measuring the same latent trait
5) assuming the existence of a large population of examinees, the descriptors of a test item are independent of the sample of examinees drawn for the purpose of item calibration
6) a statistic indicating the precision with which each examinee's ability is estimated is provided
7) person-free and item-free measurement.
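
Local independence (point 3) and the precision statistic (point 6) can be made concrete with a small sketch. It assumes a 2-parameter logistic model with already calibrated items, a simple grid search for the maximum-likelihood ability estimate, and the usual large-sample standard error 1/sqrt(test information); all names and item values are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2-parameter logistic ICC: a = discrimination, b = difficulty (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Under local independence the log-likelihood is a plain sum over items."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ability(responses, items):
    """Grid-search maximum-likelihood estimate of theta on [-4, 4]."""
    grid = [x / 100.0 for x in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


def standard_error(theta, items):
    """1 / sqrt(test information), with 2PL item information a^2 * P * (1 - P)."""
    info = sum(
        a * a * p_correct(theta, a, b) * (1.0 - p_correct(theta, a, b))
        for a, b in items
    )
    return 1.0 / math.sqrt(info)


# Hypothetical calibrated items as (discrimination, difficulty) pairs.
demo_items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
demo_responses = [1, 1, 0]  # correct, correct, incorrect

theta_hat = estimate_ability(demo_responses, demo_items)
print(theta_hat, round(standard_error(theta_hat, demo_items), 2))
```

With only three items the standard error is large. Adding items that are informative near the examinee's ability is what shrinks it.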

IRT Item Selection/Test construction
1) describe the shape of the desired test information over the desired range of abilities (the target information function).
2) select items with item information functions that will fill up the hard-to-fill areas under the target information function.
3) after each item is added to the test, calculate the test information function for the selected test items.
4) continue selecting items until the test information function approximates the target information function to a satisfactory degree.
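
The four steps above can be sketched as a greedy loop. This is only one possible reading of the procedure: it assumes 2-parameter logistic items, evaluates information at a handful of ability points, and measures the "hard-to-fill areas" as the total shortfall below the target. The pool, target, and tolerance values are made up for illustration.

```python
import math


def item_information(theta, a, b):
    """2PL item information a^2 * P * (1 - P) (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def shortfall(selected, target, thetas):
    """Total amount by which test information falls below the target."""
    return sum(
        max(0.0, goal - sum(item_information(t, a, b) for a, b in selected))
        for t, goal in zip(thetas, target)
    )


def build_test(pool, target, thetas, tolerance=0.05):
    """Greedily add the item that most reduces the remaining shortfall."""
    remaining = list(pool)
    selected = []
    while remaining and shortfall(selected, target, thetas) > tolerance:
        best = min(
            remaining,
            key=lambda item: shortfall(selected + [item], target, thetas),
        )
        selected.append(best)      # step 2: fill the hardest-to-fill area
        remaining.remove(best)     # step 3 happens in the next loop check
    return selected


# Step 1: hypothetical target information at three ability points.
demo_thetas = [-1.0, 0.0, 1.0]
demo_target = [0.5, 0.5, 0.5]

# Hypothetical item pool of (discrimination, difficulty) pairs.
demo_pool = [(a, b) for a in (0.8, 1.2, 1.6) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

demo_test = build_test(demo_pool, demo_target, demo_thetas)
print(demo_test, round(shortfall(demo_test, demo_target, demo_thetas), 3))
```

This sketch only penalizes falling below the target. A test that overshoots the target in some region is not penalized, so a stricter criterion would be needed if the shape has to match from both sides.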
