Thursday, September 30, 2010

Unidimensional/Multidimensional Scales

Part 2  - Take Action
Problem for student to solve:  What is my Plan?
Student Question 5: What can I do to learn what I don’t know?
            This past week we went over unidimensional/multidimensional scales, and I am somewhat familiar with this section. In the first year of my doctoral program, an experimental observation class covered this material, including the Guttman scale and the Likert scale. However, I have not seen many studies that use the Guttman scale. (Self-evaluation of current status and self-identified goals.)
Student Question 6:  What could keep me from taking action?
Maybe it is that too many things are going on at the same time. But I have to say that writing a blog is a good habit, since it helps me organize my thoughts and I can go back to check what I said in previous weeks. I also feel that writing notes here helps my thinking process and clarifies what I do not understand in the readings or classes we have covered.
Student Question 7:  What can I do to remove these barriers?
I would say that I have to set up a regular time to update my thoughts and notes.
I think I have been writing my blog since September, so I have already started developing my skills.
I also know some strategies to keep me posting here.
Student Question 8:  When will I take action?
            Develop a schedule for your action plan.
            Implement your action plan.
            Self-monitor progress

Scale
1) A complex concept cannot be measured with one question (the way a simple demographic item can)
2) Operationalization breaks the concept down into a number of indicators
3) A scale is composed of a number of questions (items) based on those indicators
4) The items are recombined to create the concept you intended to measure (see the sketch below)
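
To make step 4 concrete, here is a minimal Python sketch with hypothetical data: items standing in for the indicators are recombined, here by simple summation, into one scale score per respondent.

import numpy as np

# Hypothetical responses from 5 people to a 4-item scale (1-5 Likert format).
# Rows are respondents, columns are items (indicators of one concept).
items = np.array([
    [4, 5, 4, 3],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 1],
])

# Recombine the items into a single scale score per respondent.
scale_scores = items.sum(axis=1)
print(scale_scores)  # one total score per person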

Unidimensional/Multidimensional

1) First, distinguish between the items or questions that make up the scale and the scale itself.
2) A scale may be multidimensional, e.g., rating the connotative meaning of an abstract concept. The analysis would place the concept in a geometric space with several dimensions.

Unidimensional: Guttman Scale
1) one underlying dimension
2) series of yes/no questions
3) patterned response: once a respondent says no, the remaining answers continue as no.
4) look for the hole (breaks in the pattern)

Guttman Scale (http://en.wikipedia.org/wiki/Guttman_scale)
for example:
I believe that this country should allow more immigrants in
I would be comfortable if a new immigrant moved next door to me
I would be comfortable with new immigrants moving into my community
It would be fine with me if new immigrants moved onto my block
I would be comfortable if my child dated a new immigrant
I would permit a child of mine to marry an immigrant.
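
One way to check a Guttman pattern is to count breaks ("holes"). Below is a minimal Python sketch with hypothetical yes/no answers to the six items above, using one simple error-counting convention (predict each respondent's pattern from their total score); the resulting coefficient of reproducibility is conventionally expected to be at least .90.

import numpy as np

# Hypothetical yes(1)/no(0) answers to the six items above, ordered from
# least to most demanding endorsement (rows = respondents).
resp = np.array([
    [1, 1, 1, 1, 0, 0],   # perfect scale type: score 4
    [1, 1, 0, 1, 0, 0],   # one break in the pattern (a "hole")
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
])

n_people, n_items = resp.shape
errors = 0
for row in resp:
    score = row.sum()
    # Predicted Guttman pattern: all yes up to the total score, then all no
    ideal = np.array([1] * score + [0] * (n_items - score))
    errors += np.sum(row != ideal)

# Coefficient of reproducibility: 1 - errors / total responses
rep = 1 - errors / (n_people * n_items)
print(f"reproducibility = {rep:.3f}")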

Unidimensional: Likert Scale
1) The Likert scale is bipolar; it measures positive-to-negative responses to a statement.
2) Most commonly used type of scale
3) Likert refers to type of question format

Disadvantages: acquiescence bias. Items should be written so that some are posed to indicate a lot of the concept and others the reverse; respondents then cannot just go straight down the page agreeing, and must read each item carefully (see the reverse-scoring sketch below).
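
A minimal sketch of the reverse-scoring this implies, with hypothetical 5-point data; which item positions are negatively worded is an assumption here.

import numpy as np

# Hypothetical 5-point Likert responses; columns 1 and 3 are assumed
# to hold the negatively worded items.
responses = np.array([
    [5, 1, 4, 2],
    [4, 2, 5, 1],
    [2, 4, 1, 5],
])
neg_items = [1, 3]
scale_max, scale_min = 5, 1

scored = responses.copy()
# Reverse-score the negative items so all items point the same direction
scored[:, neg_items] = (scale_max + scale_min) - scored[:, neg_items]
totals = scored.sum(axis=1)
print(totals)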

Friday, September 24, 2010

Classical Test Theory Basic Concepts

Part I:
Student Question 1:  What do I want to learn?           
      I want to learn how to use appropriate tools to measure the scales of a study.
 Student Question 2:  What do you know now?
      Beyond my answer to question 1: I know reliability and validity (how important the roles are that reliability and validity play in scale measures) and CTT (the definition of CTT, its advantages, and its limitations). However, in class we did not cover all of the information. Sometimes it is kind of frustrating to study alone at home; I need to pay more attention when studying alone.
Student Question 3:  What must change for me to learn what I do not know? 
       1) I will do some research online and check the textbooks if I have a barrier to understanding those materials;
       2) If that still does not work, I will ask experts in this field, such as my friends who are familiar with measurement.
       3) Finally, bring the questions to class and ask Dr. Farmer.
  Student Question 4:  What can I do to make this happen?
       Based on our two class assignments, the final class project, and the annotated bibliography, these projects will guide me in learning and in applying what I have learned to the papers: 1) follow the weekly reading assignments; 2) complete the two assignments by their due dates. Sometimes I feel it is really helpful to put what we have learned into a paper. Because there is an actual sample, data, and data-analysis results, it is a good learning experience/process.
     
X (observed score) = T (true score: the true score of a person can be found by taking the mean score that the person would get on the same test over an infinite number of testing sessions) + E (error: across multiple observations of the same person, error is normally distributed and uncorrelated with the true score)

Variance & Reliability
VAR(X) = VAR(T) + VAR(E) + 2 COV(T, E); the covariance term is zero because T and E are uncorrelated, so VAR(X) = VAR(T) + VAR(E)

Reliability = VAR(T)/VAR(X)  (we cannot directly observe VAR(T))
Reliability = 1 - VAR(E)/VAR(X)
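
These identities are easy to verify by simulation. A minimal sketch under the CTT assumptions; the means and standard deviations below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate true scores and uncorrelated, zero-mean error (CTT assumptions)
T = rng.normal(loc=50, scale=10, size=n)
E = rng.normal(loc=0, scale=5, size=n)
X = T + E

# VAR(X) = VAR(T) + VAR(E) because COV(T, E) = 0
print(X.var(), T.var() + E.var())

# Reliability two ways: VAR(T)/VAR(X) and 1 - VAR(E)/VAR(X)
print(T.var() / X.var(), 1 - E.var() / X.var())
# Expected value here: 100 / (100 + 25) = .80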


What sorts of things create measurement error?
1) Error can result from the way the test is designed, factors related to the individual students, the testing situation, and many other sources. Some students may know the answers, but fatigue, distractions, and nervousness affect their ability to concentrate. Students may know correct answers but accidentally mark wrong answers on an answer sheet. Students may misunderstand the instructions on a test or misinterpret a single question. Scores can also be an overestimate of true achievement: students may make random guesses and get some questions right (Johnson et al., 2000).

2) Test-specific sources of error are another kind of measurement error.
For example, suppose the test uses reading selections as the basis for some questions. If a class happened to have previously studied the text passage being used, that class will probably do better than a class of students who have never seen the text before. For some tests, we know that changing the order of the items on the test leads to higher or lower scores. This means the order of the items is causing measurement error. Some test items may be biased in favor of or against particular groups of students. For example, if the reading passage contains a story that takes place on a farm, students from the inner city may be at a systematic disadvantage in making inferences based on the story.

Inter-rater Reliability (coefficient of agreement)
1) analogous to alternate forms
2) have two observers assess the same phenomenon, then assess consistency between the observers.
Source of measurement error: the observers of the phenomenon (observer 1 vs. observer 2)
- can cause subjective bias

Cohen's Kappa (an inter-rater reliability measure) is more sophisticated: it takes chance agreement into account.
Values range from -1 (less agreement than expected by chance) to +1 (perfect agreement)
+.75  "excellent"
.40-.75 "fair to good"
below .40 "poor"
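
A minimal sketch of the kappa computation; the rater codes are hypothetical.

from collections import Counter

def cohens_kappa(rater1, rater2):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability both raters pick a category independently
    chance = sum(c1[k] / n * c2[k] / n for k in set(c1) | set(c2))
    return (observed - chance) / (1 - chance)

# Hypothetical codes assigned by two observers to the same 10 cases
r1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
r2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
print(f"kappa = {cohens_kappa(r1, r2):.2f}")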

Reliability coefficient value:
.90 and up " excellent"
.80-.89 "good"
.70-.79 "adequate"
below .70 "may have limited applicability"

Different procedures requiring two test administrations to the same group:

Test-retest (coefficient of stability)
time 1 - time 2; source of measurement error: the time factor (e.g., an intervention program between administrations)
1). A. Test-Retest Method: If you are concerned with error factors related to the passing of time, then you
     want to know how consistently examinees respond to this form at different times. Administer, wait, and
     then re-administer. The correlation coefficient from this procedure is called the coefficient of stability.
     B. Test-Retest with Alternate Forms: Administer form 1 of the test, wait, then administer form 2. The
     correlation coefficient is known as the coefficient of stability and equivalence.
2). The reliability coefficient reported is the correlation between the two administrations. The assumption is
     that the correlation is less than perfect (not 1.00) because of error.
3). However, this technique is particularly prone to carry-over effects from one administration to another;
     reliability will be overestimated.
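
Computationally, the coefficient of stability is just the Pearson correlation between the two administrations; the same calculation on form A vs. form B scores gives the coefficient of equivalence. A minimal sketch with hypothetical scores:

import numpy as np

# Hypothetical scores for the same 8 examinees at two administrations
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])
time2 = np.array([13, 14, 10, 19, 18, 10, 15, 17])

# Coefficient of stability: Pearson correlation between the administrations
stability = np.corrcoef(time1, time2)[0, 1]
print(f"coefficient of stability = {stability:.2f}")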

Parallel or alternate-form (coefficient of equivalence)
Two supposedly equivalent forms of the same instrument are administered to the same individuals, either immediately or in delayed succession
Form A-Form B (through time 1 to time 2)
Alternate form method: To reduce the possibility of cheating, similar tests need to be given over time (e.g., a board exam). The errors of measurement that concern the test user are those due to differences in the content of the test forms. A correlation coefficient should be used to see how different the tests are; this is called the coefficient of equivalence. It is usually between .8 and .9. (http://www.smaddicts.com/2008/09/what-is-reliability_28.html)

Internal analysis (coefficient of internal consistency)-Internal consistency is a method of estimating reliability that is computed from a single administration of a test. The coefficients reflect the degree to which the items are measuring the same construct and are homogeneous. Cronbach's alpha and the Kuder-Richardson formulas are measures of the internal consistency of a test. (http://www.csus.edu/indiv/d/deaner/glossary.htm#i)
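
A minimal sketch of Cronbach's alpha from a single administration, using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of the total score); the data are hypothetical.

import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical single administration: 6 respondents x 4 items
data = [[4, 5, 4, 3],
        [2, 1, 2, 2],
        [5, 5, 4, 5],
        [3, 3, 2, 3],
        [1, 2, 1, 1],
        [4, 4, 5, 4]]
print(f"alpha = {cronbach_alpha(data):.2f}")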

Tuesday, September 21, 2010

Measurement Theories – Classical & IRT (Session 4 and 5) Part I

Part 1- Problem for students to solve: 
1) What is my goal: My goal is to get familiar with theories such as Classical Test Theory (CTT) and Item Response Theory (IRT), and with reliability and validity. To be honest, right now I am still in the process of understanding the readings and theories.
2) The way to solve the problem: That is why I created a blog and try to write down notes on what we have learned in class, and to summarize them according to the weekly reading assignments. I think it is helpful to me.
           
Measurement Theories – Classical & IRT (Session 4 and 5)
            Topics:
                        Principles of Classical Test Theory
                        - O = T + E
                        - Measurement Error
                        - Reliability
                        - Validity
                        Principles of Item Response Theory
                        - Θ (the latent trait)
                        - local independence of items
                        - item response function (IRF)


DeVellis, R.F. (2003). Scale development: Theory and applications (2nd edition). SAGE Publications, Inc.
-Guidelines in Scale Development
step 1: determine clearly what it is you want to measure
step 2: generate an item pool
step 3: determine the format for measurement
step 4: have the initial item pool reviewed by experts
step 5: consider inclusion of validation items
step 6: administer items to a development sample
step 7: evaluate the items
step 8: optimize scale length

Scale development-Part 2

Administer items to a Development Sample
A large sample: 1) reduces variance; 2) increases the stability of the covariance matrix; 3) increases representativeness (a large sample, with the full range of the attribute present in the population, and respondents similar on other factors that may influence their understanding/interpretation of the items)

Sample Diversity

Evaluate the items
Goals:
1) each item has a high correlation with the total score (latent variable)
2) items have high intercorrelations (item-item correlations)
3) items do not exhibit ceiling or floor effects.

Preliminary Analysis

1) check the distribution of items
2) check the coding of items
3) examine the item correlation matrix (make sure that all items are positively associated with each other)

Item-Scale Correlations
Item Variances
Item means
Factor analysis
Coefficient Alpha
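
A minimal item-analysis sketch covering several of the quantities listed above; the data are hypothetical, and the "corrected" item-total correlation leaves the item itself out of the total so it does not inflate its own correlation.

import numpy as np

items = np.array([[4, 5, 4, 3],
                  [2, 1, 2, 2],
                  [5, 5, 4, 5],
                  [3, 3, 2, 3],
                  [1, 2, 1, 1],
                  [4, 4, 5, 4]], dtype=float)

print("item means:    ", items.mean(axis=0))
print("item variances:", items.var(axis=0, ddof=1))

# Check that all items are positively associated with each other
print("all positive? ", (np.corrcoef(items, rowvar=False) > 0).all())

# Corrected item-total correlation: each item vs. the sum of the other items
total = items.sum(axis=1)
for j in range(items.shape[1]):
    rest = total - items[:, j]
    r = np.corrcoef(items[:, j], rest)[0, 1]
    print(f"item {j}: corrected item-total r = {r:.2f}")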

Classical Test Theory (CTT) Statistics

CTT statistics
1) item difficulty
2) item-test correlation
3) reliability coefficient
4) standard error of measurement (SEM)

Item Difficulty (ID)
A test that is too difficult or too easy reduces reliability (e.g., fewer test-takers answer correctly, or vice versa). A moderate level of difficulty increases test reliability.

1) For dichotomously scored items (1 for correct answer and 0 for incorrect answer)
2) Adjusted p-value for polytomously scored (Likert-scaled) items (computed so that the result is on a scale similar to that of the dichotomous items).
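
A minimal sketch of both cases with hypothetical data; the polytomous adjustment shown, rescaling the item mean onto a 0-1 range, is one common convention.

import numpy as np

# Dichotomous items: p-value = proportion of examinees answering correctly
scores_01 = np.array([[1, 1, 0],
                      [1, 0, 0],
                      [1, 1, 1],
                      [0, 1, 0]])
print("p-values:", scores_01.mean(axis=0))

# One common adjustment for a polytomous (Likert) item: rescale the item
# mean onto a 0-1 range so it is comparable to a dichotomous p-value
likert = np.array([4, 2, 5, 3, 1])          # a 1-5 item
p_adj = (likert.mean() - 1) / (5 - 1)
print("adjusted p:", p_adj)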

Item-Test Correlation (Item Discrimination Power)
The point-biserial correlation indicates the relation between an individual's performance on a 0/1-scored item and their performance on the total test (measure).
For polytomously scored items (Likert scaled), use the Pearson product-moment correlation coefficient.

Item Discrimination Power
1) A higher item-test correlation is desired; it indicates that high-ability examinees tend to get the item correct (have higher scores) and low-ability examinees tend to get it incorrect (lower scores).
2) The item-test correlation tends to be sensitive to item difficulty.
3) Item discrimination indices (such as the point-biserial correlation) play a more important role in item selection than item difficulty.
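
A minimal sketch of the point-biserial for a dichotomous item (hypothetical data). Since it is simply a Pearson correlation with a 0/1 variable, np.corrcoef suffices.

import numpy as np

# 0/1 item responses and total test scores for 6 hypothetical examinees
item = np.array([1, 0, 1, 1, 0, 1])
total = np.array([18, 9, 20, 15, 11, 17])

# Correlating against (total - item) keeps the item from
# inflating its own correlation
corrected_total = total - item
r_pb = np.corrcoef(item, corrected_total)[0, 1]
print(f"point-biserial = {r_pb:.2f}")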

Advantages:
1) easy to apply: hand calculations are possible, and it is widely available in statistical packages
2) widely used and easy to understand

Limitations:
1) sample dependence and test dependence of all item statistics: item statistics apply only to that group of individuals on that collection of items (point: change the sample and/or alter any items, and the psychometric properties of the measure change).
2) restrictive assumptions about error: it is normally distributed, uncorrelated with the true score, and has a mean of zero

Wednesday, September 8, 2010

Reliability and Validity

Assessing Measures
1) some measures are simple and easy to assess, like gender.
2) the more complicated the measure, the more complicated the assessment.
3) assess scales based on Classical Test Theory.

Reliability-
How consistently does the scale measure the concept it says it measures?
1) reliability is assessed by determining how much transient error is in the measure.
2) relatively simple concepts measured by one question, such as gender, are probably subject to less transient error; we rarely assess their reliability.
3) more complex concepts are measured with a number of questions (items), with greater error. There are several ways to assess the reliability of scales:
4) test-retest
5) alternate form
6) interrater/interjudge reliability
7) split half techniques assessing internal consistency
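
For the split-half technique (item 7), a minimal sketch with hypothetical data: correlate odd-item and even-item half scores, then apply the Spearman-Brown formula to estimate full-length reliability.

import numpy as np

items = np.array([[4, 5, 4, 3],
                  [2, 1, 2, 2],
                  [5, 5, 4, 5],
                  [3, 3, 2, 3],
                  [1, 2, 1, 1],
                  [4, 4, 5, 4]], dtype=float)

# Split the test into odd and even item halves and score each half
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)

# Correlate the halves, then step up with the Spearman-Brown formula,
# since the half-test correlation understates full-test reliability
r_half = np.corrcoef(odd, even)[0, 1]
r_full = 2 * r_half / (1 + r_half)
print(f"split-half reliability = {r_full:.2f}")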