Tuesday, October 19, 2010

Scale Dimensionality (Sessions 7 and 8)

Validity
How accurately does the scale measure the concept it says it measures?
How much systematic error do I have?

Face Validity
1) on the face of it, does it seem to measure what I say it does?
2) assessed by asking individuals in the field to review items.

Content Validity
1) A scale or measure has content validity when all aspects of the concept have been covered. These are frequently referred to as domains.

Criterion-Related Validity
1) researcher compares scores on the measure under development with some external criterion known to or believed to measure the same concept.
2) the researcher creating the measure determines the criterion. The closer the criterion is in concept to the measure, the better.
3) concurrent validity: the criterion is present simultaneously with the measure you are developing.
Predictive validity: the criterion lies in the future.
4) construct validity: has the unobserved construct underlying the measure being developed been measured accurately?
5) one traditional way of assessing construct validity is to look at a series of studies using the measure being developed. How well do the findings reflect the theory underlying the measure?
6) a statement of validity based on the way a measure relates to other variables within a system of theoretical relationships.

Confirmatory Factor Analysis
1) another way is through confirmatory factor analysis, in which one hypothesizes that the construct is made up of several domains and that particular items belong to one particular domain.
2) one can then test the hypotheses and the model statistically (see the sketch below).
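To make that concrete, here is a minimal sketch using the third-party semopy package; the two-domain model, the item names x1 to x6, and the data file are my own illustrative assumptions, not anything from class.

```python
# Minimal CFA sketch using the third-party semopy package (pip install semopy).
# The two hypothesized domains and the item names (x1..x6) are illustrative.
import pandas as pd
from semopy import Model

# lavaan-style model description: each item is assigned to exactly one domain
model_desc = """
DomainA =~ x1 + x2 + x3
DomainB =~ x4 + x5 + x6
"""

data = pd.read_csv("scale_items.csv")  # hypothetical file with columns x1..x6

model = Model(model_desc)
model.fit(data)         # estimates the loadings and the domain covariance
print(model.inspect())  # parameter estimates used to judge the hypothesized model
```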

Thursday, October 14, 2010

Literature review of self-worth

Currently, we have a group project. Our group is focusing on adolescents' self-worth and has been looking at a variety of literature related to adolescents' self-worth. A study conducted by Quarterly et al. (2006) examined adolescents' perceptions of social support in relationships with mothers, close friends, and romantic partners and their contributions to individual adolescent self-worth and interpersonal competence.

Less is known about links between social support and adolescent well-being. Global self-worth is one measure of well-being: contemporary conceptualizations of self-esteem emphasize a distinctive array of perceived competencies in a variety of domains. Adolescents queried about different domains of interpersonal competence indicated that support from parents is associated with global self-worth, that support from friends is associated with perceived friendship competence and social acceptance, and that support from romantic partners is associated with perceived romantic competence (Connolly & Konarski, 1994).

Global self-worth (M α = .84) provides an assessment of overall self-esteem (e.g., "Some teenagers are disappointed with themselves BUT other teenagers are pretty pleased with themselves"). Social acceptance (M α = .84) provides an assessment of competence in the peer group (e.g., "Some teens are popular with others their age BUT other teens are not very popular"). Friendship competence (M α = .76) provides an assessment of capabilities in friendships (e.g., "Some teens are able to make really close friends BUT other teens find it hard to make really close friends"). Romantic competence (M α = .74) provides an assessment of capabilities in romantic relationships (e.g., "Some teens feel that people their age will be romantically attracted to them BUT other teens worry about whether people their age will be attracted to them").

In a study conducted by Sargent et al. (2006), the relationship between contingencies of self-worth and vulnerability to depressive symptoms was investigated in a longitudinal sample of 629 freshmen over the first semester of college. Higher levels of external contingencies of self-worth, in a composite measure of four external contingencies (approval from others, appearance, competition, academics), predicted increases in depressive symptoms over the first semester of college, even after controlling for initial level of depressive symptoms, social desirability, gender, and race. Internal contingencies of self-worth (God's love, virtue) were not associated with the level of depressive symptoms. The authors conclude that external contingencies of self-worth may contribute to vulnerability to depressive symptoms.

Another study, conducted by Sanchez and Crocker (2005), examined the relationship between investment in gender ideals and well-being and the role of external contingencies of self-worth in a longitudinal survey of 677 college freshmen. The study proposed a model of how investment in gender ideals affects external contingencies and the consequences for self-esteem, depression, and symptoms of disordered eating. The study found that the negative relationship between investment in gender ideals and well-being is mediated through externally contingent self-worth. The model showed a good fit for the overall sample. Comparative model testing revealed a good fit for men and women, as well as for White Americans, Asian Americans, and African Americans.

The research examined effects of receiving negative interpersonal feedback on state self-esteem, affect, and goal pursuit as a function of trait self-esteem and contingencies of self-worth. Two same-sex participants interacted with each other and then received negative feedback. Participants then reported their state self-esteem, affect, and self-presentation goals: how they wanted to be perceived by others at the moment. Among participants who received negative feedback, those who more strongly based their self-worth on others' approval experienced lower state self-esteem, lower positive affect, and greater negative affect than those whose self-worth was less contingent on others' approval. Participants with low self-esteem showed a greater desire to appear physically attractive to others the more they based their self-worth on others' approval and received negative feedback. In contrast, participants with high self-esteem showed a greater desire to appear warm/caring/kind the more they based their self-worth on others' approval and received negative feedback.

The literature search on contingencies of self-worth leads back to William James (1890), who argued over a century ago that people derive self-esteem from succeeding in certain domains and not others. According to the contingencies of self-worth model (Crocker & Wolfe, 2001), people differ in their bases of self-esteem, which are shaped by their beliefs about what they think they need to be or do to be a person of worth. Crocker and colleagues (2003b) identified seven domains in which people may derive their self-worth: virtue, God's love, family support, academic competence, physical attractiveness, competition, and gaining others' approval. The more a person bases self-worth in a domain, the more he or she may be vulnerable to experiencing negative effects of self-threat in that domain. For example, research has shown that the more students base their self-worth on academics, the more likely they are to experience lower state self-esteem and greater negative affect and self-evaluative thoughts when they perform poorly on academic tasks, receive lower-than-expected grades, or are rejected from graduate schools.


Tuesday, October 5, 2010

Item Response Theory (IRT)

Limitations of Classical Test Theory

Examinee characteristics cannot be separated from test characteristics:
1) The discrimination or difficulty of an item is sample dependent.
2) It does not allow you to predict how an examinee, given an ability level, is likely to respond to a particular item.
3) Only three sources of error can be estimated: A. error due to the lack of internal consistency of the items (coefficient alpha); B. error due to instability of a measure over repeated observations (test-retest reliability); C. error due to the lack of equivalence among parallel measures (correlation between parallel forms).
4) Comparison of individuals is limited to situations in which the same test was given to the individuals you want to compare. CTT also makes the false assumption that error variance is the same across all subjects (i.e., that there is no relationship between true score and error variance).


IRT allows for the development of items that are free from test and examinee biases.
IRT models are mathematical equations describing the association between a respondent's underlying level on a latent trait or ability and the probability of a particular item response (e.g., a correct response), using a nonlinear monotonic function.
Most IRT modeling is done with unidimensional models.

IRT Theory

One can consider each examinee to have a numerical value, a score, that places him or her somewhere on the ability scale. 1) At each ability level, there will be a certain probability that an examinee with that ability will give a correct answer to the item. 2) This probability will be small for examinees of low ability and larger for examinees of high ability.

Item Characteristic Curve (ICC)
1) If one plotted the probability of getting a question correct as a function of ability, the result would be a smooth S-shaped curve.
2) Each item has its own ICC.
3) The item characteristic curve is the basic building block of item response theory; all the other constructs of the theory depend upon this curve.
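To make the S-shape concrete, here is a small sketch I put together (the parameter values are made up) that plots 2-parameter logistic ICCs for three items differing in difficulty.

```python
# Sketch: item characteristic curves under a 2-parameter logistic (2PL) model.
# Parameters a (discrimination) and b (difficulty) are made-up illustrations.
import numpy as np
import matplotlib.pyplot as plt

def icc_2pl(theta, a, b):
    """Probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 200)  # the ability scale
for a, b in [(1.0, -1.0), (1.0, 0.0), (1.0, 1.5)]:  # easy, medium, hard items
    plt.plot(theta, icc_2pl(theta, a, b), label=f"a={a}, b={b}")

plt.xlabel("Ability (theta)")
plt.ylabel("P(correct)")
plt.legend()
plt.show()  # each curve is a smooth S shape; difficulty b shifts it left/right
```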

IRT
1) These measurement models use responses to items on a test or survey questionnaire to locate both respondents and items on the same latent continuum (or latent space in the case of multidimensional IRT).
2) This enables one to measure individuals on the latent trait defined by the set of items (e.g., ability, attitude, craving, satisfaction, quality of life) while simultaneously scaling each item on the very same dimension (e.g., easy versus hard items in the case of an ability test, unfavorable versus favorable statements in the case of an attitude questionnaire).

Two families: unidimensional and multidimensional
1) unidimensional: unidimensional models require a single trait (ability) dimension.
2) multidimensional: multidimensional IRT models response data hypothesized to arise from multiple traits.

Binary vs. polytomous items
IRT models can also be categorized based on the number of scored responses.
Dichotomous: presence/absence, correct/incorrect
Polytomous outcomes: where each response has a different score value (e.g., Likert scaling)

Item difficulty and discrimination
There are two technical properties of an item characteristic curve that are used to describe it.
1) item difficulty: the difficulty of an item describes where the item functions along the ability scale (e.g., an easy item functions among the low-ability examinees and a hard item functions among the high-ability examinees).
2) item discrimination: describes how well an item differentiates between examinees with abilities below and above the item's difficulty; it corresponds to the steepness of the ICC.

Number of IRT parameters:
IRT generally refers to three probabilistic measurement models:
1) 1-parameter logistic model (Rasch model): latent trait + item difficulty. Item difficulty is defined as the point on the logit scale at which the probability of answering the item correctly is 50%; guessing is irrelevant, and all items are treated as equivalent in terms of discrimination.
2) 2-parameter logistic model: latent trait + item difficulty + item discrimination.
3) 3-parameter logistic model, for dichotomous and polytomous responses: latent trait + item difficulty + item discrimination + guessing parameter (this takes into consideration guessing by candidates at the lower end of the ability continuum).
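One way to see how the three models nest (a sketch with made-up numbers): the 3PL curve reduces to the 2PL when the guessing parameter c is 0, and to the 1PL/Rasch model when, additionally, every discrimination a is fixed at 1.

```python
import numpy as np

def icc_3pl(theta, b, a=1.0, c=0.0):
    """3PL item characteristic curve.
    a=1, c=0  -> 1PL / Rasch (difficulty only)
    c=0       -> 2PL (difficulty + discrimination)
    c>0       -> 3PL (adds a lower asymptote for guessing)
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# At theta equal to the difficulty b, the 1PL/2PL probability is exactly 0.50:
print(icc_3pl(theta=0.5, b=0.5))          # 0.5
# With a guessing parameter, low-ability examinees never fall below c:
print(icc_3pl(theta=-4.0, b=0.5, c=0.2))  # just above 0.2
```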

IRT Assumptions
1) examinee characteristics can be separated from test characteristics: the difficulty of an item is sample independent, and IRT allows you to predict how an examinee, given an ability level, is likely to respond to a particular item.
2) unidimensionality: only one ability or latent trait is measured.
3) local independence.
4) a large pool of items is assumed, each measuring the same latent trait.
5) assuming the existence of a large population of examinees, the descriptors of a test item are independent of the sample of examinees drawn for the purpose of item calibration.
6) a statistic indicating the precision with which each examinee's ability is estimated is provided.
7) person-free and item-free measurement.

IRT Item Selection/Test construction
1) describe the shape of the desired test information function over the desired range of abilities (the target information function).
2) select items with item information functions that will fill up the hard-to-fill areas under the target information function.
3) after each item is added to the test, calculate the test information function for the selected test items.
4) continue selecting items until the test information function approximates the target information function to a satisfactory degree (see the sketch below).
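Here is a rough sketch of these steps under a 2PL model (the item parameters are invented): for a 2PL item, the information at ability theta is a^2 * P(theta) * (1 - P(theta)), and the test information function is simply the sum of the selected items' information functions.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P); it peaks at theta = b."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

theta = np.linspace(-3, 3, 121)
selected_items = [(1.2, -0.5), (0.9, 0.0), (1.5, 1.0)]  # (a, b), illustrative

# Test information = sum of the selected items' information functions;
# recompute it after each item is added and compare it to the target.
test_info = sum(item_information(theta, a, b) for a, b in selected_items)
print(test_info.max())
```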

Thursday, September 30, 2010

Unidimensional/Multidimensional Scales

Part 2  - Take Action
Problem for student to solve:  What is my Plan?
Student Question 5: What can I do to learn what I don’t know?
            This past week we went over unidimensional/multidimensional scales. I guess I am kind of familiar with this section: in the first year of my doctoral program, an experimental observation class covered this material, such as the Guttman scale and the Likert scale. However, I have not seen many studies using the Guttman scale. (Self-evaluation of current status and self-identified goals status.)
Student Question 6:  What could keep me from taking action?
Maybe sometimes too many things are going on at the same time, but I have to say writing a blog is a good habit, since it helps me organize my thoughts and I can go back to check what I said in previous weeks. I also feel that writing notes here helps my thinking process and clarifies what I do not understand based on our readings or what we have covered in class.
Student Question 7:  What can I do to remove these barriers?
I would say that I have to set up a regular time to update my thoughts and notes.
I think I have been writing my blog since September. I have already started developing my skills.
I also know some strategies to keep me posting news here.
Student Question 8:  When will I take action?
            Develop a schedule for your action plan.
            Implement your action plan.
            Self-monitor progress

Scale
1) A complex concept can't be measured with one question (unlike demographics)
2) Operationalization breaks the concept down into a number of indicators
3) A scale is composed of a number of questions (items) based on the indicators
4) The items are recombined to create the concept you intended to measure

Unidimensional/Multidimensional

1) first distinguish between the items or questions that make up a scale and the scale itself.
2) a scale may be multidimensional, e.g., rating the connotative meaning of an abstract concept; the analysis would place the concept in a geometric space with several dimensions.

Unidimensional: Guttman Scale
1) one underlying dimension
2) series of yes/no questions
3) patterned response: once a respondent says no, he or she continues with no.
4) look for holes (breaks in the pattern; see the sketch after the example below).

Guttman Scale (http://en.wikipedia.org/wiki/Guttman_scale)
for example:
I believe that this country should allow more immigrants in
I would be comfortable if a new immigrant moved next door to me
I would be comfortable with new immigrants moving into my community
It would be fine with me if new immigrants moved onto my block
I would be comfortable if my child dated a new immigrant
I would permit a child of mine to marry an immigrant.
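As a sketch of "looking for the hole" (the response matrix below is invented): with items ordered from least to most demanding, each respondent's ideal pattern is a run of yeses followed by nos; deviations count as Guttman errors, and the coefficient of reproducibility is one common summary.

```python
import numpy as np

# Rows = respondents, columns = items ordered from least to most demanding.
# 1 = yes, 0 = no. Data are invented for illustration.
responses = np.array([
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],   # the "hole": a no followed by a yes
])

def guttman_errors(row):
    """Errors vs. the ideal pattern implied by the respondent's total score."""
    k = row.sum()
    ideal = np.array([1] * k + [0] * (len(row) - k))
    return int((row != ideal).sum())

errors = sum(guttman_errors(r) for r in responses)
reproducibility = 1 - errors / responses.size
print(reproducibility)  # values near 1 (often >= .90) suggest a Guttman scale
```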

Unidimensional Likert Scale
1) A Likert scale is bipolar; it measures positive to negative responses to a statement.
2) It is the most commonly used type of scale.
3) Likert refers to the type of question format.

Disadvantages: acquiescence bias. To counter it, write items so that some are reverse-keyed (agreement indicates less of the concept); respondents then cannot just go down the column and must read items carefully (see the reverse-scoring sketch below).
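A small sketch of the reverse-keying bookkeeping this implies (the column names and data are hypothetical): before summing a Likert scale, reflect the negatively worded items so that a high score always means more of the concept.

```python
import pandas as pd

# Hypothetical 5-point Likert items; q2 and q4 are negatively worded.
df = pd.DataFrame({
    "q1": [5, 4, 2],
    "q2": [1, 2, 4],   # reverse-keyed
    "q3": [4, 4, 1],
    "q4": [2, 1, 5],   # reverse-keyed
})

SCALE_MIN, SCALE_MAX = 1, 5
for item in ["q2", "q4"]:
    # Reflect: 1 <-> 5, 2 <-> 4, 3 stays 3
    df[item] = (SCALE_MIN + SCALE_MAX) - df[item]

df["scale_score"] = df.sum(axis=1)  # now all items point the same direction
print(df)
```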

Friday, September 24, 2010

Classical Test Theory Basic Concepts

Part I:
Student Question 1:  What do I want to learn?           
      I want to learn how to use appropriate tools to measure the scales of a study.
 Student Question 2:  What do you know now?
      Beyond my answer on question 1: I know reliability and validity (how important a role reliability and validity play in scale measurement) and CTT (the definition of CTT, its advantages, and its limitations). However, in class we did not cover all of the information, and sometimes it's kind of frustrating to study alone at home. I need to pay more attention when studying alone.
Student Question 3:  What must change for me to learn what I do not know? 
       1) I will do some research online and check the textbooks if I have trouble understanding the materials;
       2) if that still does not work, I will ask experts in this field, such as my friends who are familiar with measurement;
       3) finally, I will bring the questions to class and ask Dr. Farmer.
  Student Question 4:  What can I do to make this happen?
       Our two class assignments, the final class project and the annotated bibliography, will guide me in learning and in applying what I have learned to the papers: 1) follow the weekly reading assignments; 2) keep up with the two assignments due. Sometimes I feel it is really helpful to put what we have learned into a paper: because there is an actual sample, data, and a result of the data analysis, it's a good learning experience/process.
     
X (observed score) = T (true score: the true score of a person can be found by taking the mean score that the person would get on the same test if they had an infinite number of testing sessions) + E (error: across multiple observations of the same person, error is normally distributed and uncorrelated with the true score)

Variance & Reliability
VAR(X) = VAR(T) + VAR(E) + 2COV(T, E); since T and E are uncorrelated, COV(T, E) = 0, so VAR(X) = VAR(T) + VAR(E)

Reliability = VAR(T)/VAR(X) (we cannot directly observe VAR(T))
Reliability = 1 - VAR(E)/VAR(X)
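Because VAR(T) cannot be observed directly, a simulation (entirely made-up numbers) is a handy way to see the identity: generate true scores and independent errors, then check that VAR(T)/VAR(X) equals 1 - VAR(E)/VAR(X), and that the correlation between two parallel administrations approximates the reliability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

T = rng.normal(loc=50, scale=10, size=n)   # true scores
E1 = rng.normal(loc=0, scale=5, size=n)    # error, uncorrelated with T
E2 = rng.normal(loc=0, scale=5, size=n)    # error on a second administration
X1, X2 = T + E1, T + E2

reliability = T.var() / X1.var()            # VAR(T) / VAR(X)
print(reliability)                          # ~ 100 / 125 = 0.80
print(1 - E1.var() / X1.var())              # same quantity, ~ 0.80
print(np.corrcoef(X1, X2)[0, 1])            # parallel-forms corr ~ reliability
```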


What sorts of things create measurement error?
1) Error can result from the way the test is designed, factors related to the individual students, the testing situation, and many other sources. Some students may know the answers, but fatigue, distractions, and nervousness affect their ability to concentrate. Students may know correct answers but accidentally mark wrong answers on an answer sheet. Students may misunderstand the instructions on a test or misinterpret a single question. Scores can also be an overestimate of true achievement. Students may make random guesses and get some questions right (Johnson et al., 2000).

2) Test-specific sources of error are another kind of measurement error.
For example, suppose the test uses reading selections as the basis for some questions. If a class happened to have previously studied the text passage being used, that class will probably do better than a class of students who have never seen the text before. For some tests, we know that changing the order of the items on the test leads to higher or lower scores. This means the order of the items is causing measurement error. Some test items may be biased in favor of or against particular groups of students. For example, if the reading passage contains a story that takes place on a farm, students from the inner city may be at a systematic disadvantage in making inferences based on the story.

Inter-rater Reliability (coefficient of agreement)
1) analogous to alternate forms
2) have two observers assess the same phenomenon and assess the consistency between the observers.
Source of measurement error: the observers (observer 1 vs. observer 2) rather than the object or phenomenon itself; observer differences can cause subjective bias.

Cohen's Kappa (an inter-rater reliability measure) is more sophisticated: it takes chance agreement into account.
Values range from -1 (less agreement than expected by chance) to +1 (perfect agreement):
+.75  "excellent"
.40-.75 "fair to good"
below .40 "poor"
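A quick sketch of the chance correction (the ratings below are invented): kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance; scikit-learn computes it directly.

```python
from sklearn.metrics import cohen_kappa_score

# Invented categorical ratings of the same 10 cases by two observers
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater2 = ["yes", "no",  "no", "yes", "no", "no", "yes", "no", "no",  "yes"]

# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
print(cohen_kappa_score(rater1, rater2))  # ~ .62: "fair to good" on the guide above
```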

Reliability coefficient value:
.90 and up " excellent"
.80-.89 "good"
.70-.79 "adequate"
below .70 "may have limited applicability"

Different procedures requiring two test administrations to the same group:

Test-retest (coefficient of stability)
time 1 - time 2; source of measurement error: the time factor (e.g., an intervening program)
1) A. Test-Retest Method: If you are concerned with error factors related to the passing of time, then you want to know how consistently examinees respond to this form at different times. Administer, wait, and then re-administer. The correlation coefficient from this procedure is called the coefficient of stability.
    B. Test-Retest with Alternate Forms: Administer form 1 of the test, wait, then administer form 2. The correlation coefficient is known as the coefficient of stability and equivalence.
2) The reliability coefficient reported is the correlation between the two administrations. The assumption is that the correlation is less than perfect (not 1.00) because of error.
3) However, this technique is particularly prone to carry-over effects from one administration to another. Reliability will then be overestimated.

Parallel or alternate-form (coefficient of equivalence)
Two supposedly equivalent forms of the same instrument are administered to the same individuals, either immediately or in delayed succession.
Form A - Form B (from time 1 to time 2)
Alternate form method: To reduce the possibility of cheating, similar tests need to be given over time (e.g., board exams). The errors of measurement that concern the test user are those due to differences in the content of the test forms. A correlation coefficient should be used to see how different the tests are. This is called the coefficient of equivalence. It is usually between .8 and .9. (http://www.smaddicts.com/2008/09/what-is-reliability_28.html)

Internal analysis (coefficient of internal consistency): Internal consistency is a method of estimating reliability that is computed from a single administration of a test. The coefficients reflect the degree to which the items are measuring the same construct and are homogeneous. Cronbach's alpha and the Kuder-Richardson formulas are measures of the internal consistency of a test. (http://www.csus.edu/indiv/d/deaner/glossary.htm#i)
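As a sketch of the single-administration computation (the responses are invented): Cronbach's alpha is alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items.

```python
import numpy as np

# Invented responses: rows = respondents, columns = items of one scale
items = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(alpha)  # judge against the guide above: .70-.79 "adequate", .80+ "good"
```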

Tuesday, September 21, 2010

Measurement Theories – Classical & IRT (Session 4 and 5) Part I

Part 1- Problem for students to solve: 
1) What is my goal? My goal is to get familiar with theories such as Classical Test Theory (CTT) and Item Response Theory (IRT), as well as reliability and validity. To be honest, right now I am still in the process of understanding the readings and theories.
2) The way to solve the problem: that's why I created a blog and try to write down notes on what we have learned in class, and to summarize them according to the weekly reading assignments. I think it is helpful to me.
           
Measurement Theories – Classical & IRT (Session 4 and 5)
            Topics:
                        Principles of Classical Test Theory
- O = T + E
- Measurement Error
- Reliability
- Validity
                        Principles of Item Response Theory
- Θ (theta, the latent trait)
- Local independence of items
- Item response function (IRF)


DeVellis, R.F. (2003). Scale development: Theory and applications (2nd edition). SAGE Publications, Inc.
Guidelines in Scale Development
step 1: determine clearly what it is you want to measure
step 2: generate an item pool
step 3: determine the format for measurement
step 4: have the initial item pool reviewed by experts
step 5: consider inclusion of validation items
step 6: administer items to a development sample
step 7: evaluate the items
step 8: optimize scale length

Scale development-Part 2

Administer items to a Development Sample
The sample should be large, to 1) reduce sampling variance; 2) increase the stability of the covariance matrix; 3) increase representativeness (a large range of the attribute present in the population; respondents similar on other factors that may influence their understanding/interpretation of the items).

Sample Diversity

Evaluate the items
Goals:
1) each item has a high correlation with the total score (the latent variable)
2) items have high intercorrelations (item-item correlations)
3) items do not exhibit ceiling or floor effects.

Preliminary Analysis

1) check the distribution of items
2) check the coding of items
3) examine the item correlation matrix (make sure that all items are positively associated with each other)

Item-Scale Correlations
Item Variances
Item means
Factor analysis
Coefficient Alpha
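A sketch tying these last checks together (the data and column names are invented, and since the responses are random the printed statistics only illustrate the computations): corrected item-total correlations, item means and variances, and the inter-item correlation matrix flag items that do not hang together or that show ceiling/floor effects.

```python
import numpy as np
import pandas as pd

# Invented development-sample responses to five 5-point items
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(1, 6, size=(200, 5)),
                  columns=[f"item{i}" for i in range(1, 6)]).astype(float)

total = df.sum(axis=1)
for col in df.columns:
    # Corrected item-total correlation: item vs. total of the *other* items
    r = df[col].corr(total - df[col])
    print(f"{col}: mean={df[col].mean():.2f}, "
          f"var={df[col].var():.2f}, item-total r={r:.2f}")

# In a well-behaved scale, all inter-item correlations are positive and items
# avoid ceiling/floor effects (means not piled at 1 or 5, variance not near 0).
print(df.corr().round(2))
```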