Monday, December 13, 2010

Analysis

Research hypotheses
Hypothesis on convergent validity:
1)      There will be a positive relationship between self-worth and connection (family and school dimensions).

Hypothesis on divergent validity:
2)      There will be a negative relationship between self-worth and loneliness. 

Hypothesis on self-worth and gender:
3)      Adolescent boys will show a higher level of self-worth perception than adolescent girls.

Last week, we were in the lab running the data. For our group's construct, self-worth, we got a good reliability result of .80. In terms of convergent and divergent validity, we also got good results for both.

Tuesday, December 7, 2010

Warm up

Phase 3:  What have I learned?

This semester is almost at its end, and I have been recalling what I have learned from this class. I went back to check our first class's discussion questions, as below:

1) What are my learning objectives for this class? I guess our group final project: teamwork combined with the learning process. Building on the positive youth development research being carried out by Dr. Lerner and colleagues at Tufts University's Institute for Applied Research in Youth Development (e.g., Bowers et al., 2010; Lerner et al., 2005, 2010), we have carried out an observational study that examines the relationships among dimensions of positive youth development, weight, healthy eating habits, career aspirations, and community resources. Online surveys were used to collect data for this study. The 5-C measure of positive youth development (Lerner et al., 2005) was used to measure positive youth development. Our group focuses on the self-worth dimension of the study. The class developed measures for youth demographics, health status/healthy eating habits, career aspirations, and community resources. I think this course examines a number of approaches to data collection in social work research, such as surveys, scales, and observational techniques.

2) What is measurement?

3) What role does measurement play in the research process? According to Barker et al. (2002, Chapters 4 to 7), having formulated the research questions, the next step is to decide how to measure the psychological constructs of interest, using the term "measurement" in its broadest sense to encompass qualitative as well as quantitative approaches to data collection. Quantitative measurement covers psychometric theory (reliability; validity; generalizability theory; item response theory; standards for reliability and validity); qualitative methods cover self-report methods (open-ended and closed-ended questions; quantitative self-report methods) and observation (qualitative observation; reliability and validity issues).

4) What are the distinctions between these two interpretations of the measurement process (see Boumans, 2005)?
    • Ellis - Associative interpretation
    • Correlation interpretation - Heidelberger (1994a, 1994b)


I have learned how to develop and evaluate the psychometric properties of quantitative social science measurement tools. Theories of measurement (true score theory and item response theory), scale development, and item and scale analysis using advanced statistical procedures (e.g., factor analysis and structural equation modeling) were addressed. a) I have learned how measurement error impacts the quality of a research study, and b) how to match measurement strategies to a research design in order to reduce measurement error. Based on the two strategies above, Dr. Farmer described the use of various approaches to measurement, including standardized scales; behavioral counts and ratings; and individualized rating scales. c) I also learned classical test theory within the context of social work research, including the major assumptions, strengths, and weaknesses of classical measurement theory. I posted lots of information about Item Response Theory (IRT) in the previous sections here. Furthermore, we now understand reliability and validity better within the context of classical measurement theory and IRT. As for developing and validating a measure, I guess my group did not fully engage in the process of developing and validating one. The last step is to learn how to evaluate the dimensionality of a measure. For the past few weeks, Dr. Farmer spent much time describing EFA/CFA, about which I also posted some information here.

      Monday, November 22, 2010

      Scale Dimensionality (Sessions 7 and 8)

      Phase 3:  What have I learned?
      Student Question 9:  What actions have I taken?
We are busy writing up the literature review and introduction for our final paper. But it seems we also need to deal with different personalities in a group. The three of us have different opinions, but I believe we should choose one person to be our group leader so that we won't have too many arguments and conflicts. Because everyone is busy, we do not have much time to sit down with each other, and I feel sad and frustrated. Most of the time we communicate with each other via email. I feel like one of my goals is to get our final paper done with everyone in the group on the same page.
      Student Question 10:  What barriers have been removed? 
Gretta and I focus on the introduction and literature review. Both of us try to make time and send our own parts to each other; the purpose is to merge our sections together. I try to listen to her opinions and thoughts regarding the paper. At the same time, Areen made lots of timelines for us. Obviously, we did not really follow the schedule; she made a lot of unrealistic timelines. But I have tried to discuss this with her.
      Student Question 11:  What has changed about what I don’t know?
Gretta gave me some suggestions regarding my introduction. At the same time, Areen wants us to follow the outline for the paper.
      Student Question 12:  Do I know what I want to know?
Areen seems to want to focus on writing up Chapter 3, the methodology. Our study is aimed at youth development data collected from a Web-based self-administered questionnaire. We will run the data in SPSS, and we will have to compute a composite subscale score for self-worth.
      Traditional statistical methods normally utilize one statistical test to determine the significance of the analysis. However, Structural Equation Modeling (SEM), CFA specifically, relies on several statistical tests to determine the adequacy of model fit to the data. The chi-square test indicates the amount of difference between expected and observed covariance matrices. A chi-square value close to zero indicates little difference between the expected and observed covariance matrices. In addition, the probability level must be greater than 0.05 when chi-square is close to zero.
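To make the chi-square fit test concrete, here is a minimal Python sketch (numpy/scipy) that computes the maximum-likelihood chi-square from an observed and a model-implied covariance matrix. The matrices, sample size, and degrees of freedom below are made-up illustration values, not numbers from our study.

import numpy as np
from scipy.stats import chi2

def ml_chi_square(S, Sigma, n, df):
    # ML fit function: F = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p
    p = S.shape[0]
    F = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
         + np.trace(S @ np.linalg.inv(Sigma)) - p)
    stat = (n - 1) * F              # model chi-square
    return stat, chi2.sf(stat, df)  # p > .05 suggests adequate fit

# Hypothetical 3-item example: the model reproduces S almost exactly,
# so the expected and observed covariance matrices barely differ.
S = np.array([[1.00, 0.45, 0.40],
              [0.45, 1.00, 0.50],
              [0.40, 0.50, 1.00]])
Sigma = np.array([[1.00, 0.44, 0.41],
                  [0.44, 1.00, 0.49],
                  [0.41, 0.49, 1.00]])
stat, p_value = ml_chi_square(S, Sigma, n=300, df=1)
print(stat, p_value)   # chi-square near zero, p well above 0.05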

      Tuesday, November 16, 2010

      survey development

It's always a learning experience developing a survey for this class.
I had developed a survey through SurveyMonkey before, while I was studying for my MSW at Rutgers. I developed the survey for my field placement. My supervisor gave me some direction on how to create a survey that is easily understandable to participants. First of all, we should give participants an introduction to the purpose of the study, including confidentiality and the right to withdraw at any time. Then, in the directions, give the reader information on how to interpret the scale used in our questionnaire. Be careful to keep the questions simple, in the reader's language. Try to estimate how much time it will take to finish the survey, and do not expect that the reader will have much time or pay close attention.

      Tuesday, October 19, 2010

      Scale Dimensionality (Sessions 7 and 8)

      Validity-
      How accurately does the scale measure the concept it says it measures?
      How much systematic error do I have?

      Face Validity
1) on the face of it, does it seem to measure what I say it does?
      2) assessed by asking individuals in the field to review items.

      Content Validity
      1) A scale or measure has content validity when all aspects of the concept have been covered. These are frequently referred to as domains.

      Criterion-Related Validity
      1) researcher compares scores on the measure under development with some external criterion known to or believed to measure the same concept.
      2) researcher creating measure determines criterion. The closer the criterion is to the measure in concept the better.
3) concurrent validity: the criterion is present simultaneously with the measure you are developing.
Predictive validity: the criterion lies in the future.
4) construct validity: has the unobserved construct underlying the measure being developed been measured accurately?
      5) one traditional way of assessing construct validity is to look at series of studies using the measure being developed. How well do findings reflect the theory underlying the measure.
      6) Statement of validity on the way a measure relates to other variables within a system of theoretical relationships.

      Confirmatory Factor Analysis
      1) another way is through confirmatory factor analysis in which one hypothesizes that the construct is made up of several domains and particular items belong to one particular domain.
2) one can then test the hypotheses and the model statistically.

      Thursday, October 14, 2010

      Literature review of self-worth

Currently, we have a group project.
Our group is focusing on adolescents' self-worth and has been looking at a variety of literature related to it. A study conducted by Quarterly et al. (2006) examined adolescents' perceptions of social support in relationships with mothers, close friends, and romantic partners, and the contributions of that support to individual adolescent self-worth and interpersonal competence.

Less is known about links between social support and adolescent well-being. Global self-worth is one measure of well-being: contemporary conceptualizations of self-esteem emphasize a distinctive array of perceived competencies in a variety of domains. Adolescents queried about different domains of interpersonal competence indicated that support from parents is associated with global self-worth, that support from friends is associated with perceived friendship competence and social acceptance, and that support from romantic partners is associated with perceived romantic competence (Connolly & Konarski, 1994).

Global self-worth (M α = .84) provides an assessment of overall self-esteem (e.g., "Some teenagers are disappointed with themselves BUT other teenagers are pretty pleased with themselves"). Social acceptance (M α = .84) provides an assessment of competence in the peer group (e.g., "Some teens are popular with others their age BUT other teens are not very popular"). Friendship competence (M α = .76) provides an assessment of capabilities in friendships (e.g., "Some teens are able to make really close friends BUT other teens find it hard to make really close friends"). Romantic competence (M α = .74) provides an assessment of capabilities in romantic relationships (e.g., "Some teens feel that people their age will be romantically attracted to them BUT other teens worry about whether people their age will be attracted to them").

In a study conducted by Sargent et al. (2006), the relationship between contingencies of self-worth and vulnerability to depressive symptoms was investigated in a longitudinal sample of 629 freshmen over the first semester of college. Higher levels of external contingencies of self-worth, in a composite measure of four external contingencies of self-worth (approval from others, appearance, competition, academics), predicted increases in depressive symptoms over the first semester of college, even controlling for initial level of depressive symptoms, social desirability, gender, and race. Internal contingencies of self-worth (God's love, virtue) were not associated with the level of depressive symptoms. The authors conclude that external contingencies of self-worth may contribute to vulnerability to depressive symptoms.

Another study, conducted by Sanchez & Crocker (2005), examined the relationship between investment in gender ideals and well-being and the role of external contingencies of self-worth in a longitudinal survey of 677 college freshmen. The study proposed a model of how investment in gender ideals affects external contingencies, with consequences for self-esteem, depression, and symptoms of disordered eating, and found that the negative relationship between investment in gender ideals and well-being is mediated through externally contingent self-worth. The model showed a good fit for the overall sample. Comparative model testing revealed a good fit for men and women as well as White Americans, Asian Americans, and African Americans.

The research examined the effects of receiving negative interpersonal feedback on state self-esteem, affect, and goal pursuit as a function of trait self-esteem and contingencies of self-worth. Two same-sex participants interacted with each other and then received negative feedback. Participants then reported their state self-esteem, affect, and self-presentation goals, that is, how they wanted to be perceived by others at the moment. Among participants who received negative feedback, those who more strongly based their self-worth on others' approval experienced lower state self-esteem, lower positive affect, and greater negative affect than those whose self-worth was less contingent on others' approval. Participants with low self-esteem showed a greater desire to appear physically attractive to others the more they based self-worth on others' approval and received negative feedback. In contrast, participants with high self-esteem showed a greater desire to appear warm/caring/kind the more they based self-worth on others' approval and received negative feedback.

Looking across the literature on contingencies of self-worth: William James (1890) argued over a century ago that people derive self-esteem from succeeding in certain domains and not others. According to the contingencies of self-worth model (Crocker & Wolfe, 2001), people differ in their bases of self-esteem, which are shaped by their beliefs about what they think they need to be or do to be a person of worth. Crocker and colleagues (2003b) identified seven domains in which people may derive their self-worth: virtue, God's love, family support, academic competence, physical attractiveness, competition, and gaining others' approval. The more a person bases self-worth in a domain, the more he or she may be vulnerable to experiencing negative effects of self-threat in that domain. For example, research has shown that the more students base their self-worth on academics, the more likely they are to experience lower state self-esteem and greater negative affect and self-evaluative thoughts when they perform poorly on academic tasks, receive lower-than-expected grades, or are rejected from graduate schools.


      Tuesday, October 5, 2010

      Item Response Theory (IRT)

Limitations of Classical Test Theory

Examinee characteristics cannot be separated from test characteristics:
1) The discrimination or difficulty of an item is sample dependent.
2) It does not allow you to predict how an examinee, given an ability level, is likely to respond to a particular item.
3) Only three sources of error can be estimated: A. error due to the lack of internal consistency (of the items; coefficient alpha); B. error due to instability of a measure over repeated observations (test-retest reliability); C. error due to the lack of equivalence among parallel measures (correlation between parallel forms).
4) Comparison of individuals is limited to situations where the same test was given to the individuals you want to compare. e.g., CTT makes the false assumption that error variance is the same across all subjects (i.e., there is no relationship between your true score and error variance).


IRT allows for the development of items that are free from test and examinee biases.
IRT models are mathematical equations describing the association between a respondent's underlying level on a latent trait or ability and the probability of a particular item response (e.g., a correct response) using a nonlinear monotonic function.
Most IRT modeling is done with unidimensional models.

      IRT Theory

One can consider each examinee to have a numerical value, a score, that places him or her somewhere on the ability scale. 1) At each ability level, there will be a certain probability that an examinee with that ability will give a correct answer to the item. 2) This probability will be small for examinees of low ability and larger for examinees of high ability.

Item Characteristic Curve (ICC)
1) If one plotted the probability of getting a question correct as a function of ability, the result would be a smooth S-shaped curve (see the sketch after this list).
2) Each item has its own ICC.
3) The item characteristic curve is the basic building block of item response theory; all the other constructs of the theory depend upon this curve.
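As a quick illustration of the S-shape, here is a small Python sketch of a two-parameter logistic ICC; the a (discrimination) and b (difficulty) values are arbitrary assumptions.

import numpy as np

def icc_2pl(theta, a, b):
    # P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)        # ability levels
print(icc_2pl(theta, a=1.5, b=0.0))  # small P at low ability, large P at
                                     # high ability: a smooth S-shape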

IRT
1) These measurement models use responses to items on a test or survey questionnaire to place persons and items simultaneously on the same latent continuum (or latent space, in the case of multidimensional IRT).
2) This enables one to measure individuals on the latent trait defined by the set of items (e.g., ability, attitude, craving, satisfaction, quality of life, etc.) while simultaneously scaling each item on the very same dimension (e.g., easy versus hard items in the case of an ability test, unfavorable versus favorable statements in the case of an attitude questionnaire).

Two families: unidimensional and multidimensional
1) unidimensional: unidimensional models require a single trait (ability) dimension.
2) multidimensional: multidimensional IRT models fit response data hypothesized to arise from multiple traits.

Binary vs. polytomous items
IRT models can also be categorized based on the number of scored responses.
Dichotomous outcomes: presence/absence, correct/incorrect.
Polytomous outcomes: each response has a different score value (Likert scaling).

Item difficulty and discrimination
There are two technical properties of an item characteristic curve that are used to describe it.
1) item difficulty: the difficulty of an item describes where the item functions along the ability scale (e.g., an easy item functions among the low-ability examinees and a hard item functions among the high-ability examinees).
2) item discrimination: how steeply the probability of a correct response changes with ability, i.e., how well the item separates examinees whose abilities fall below the item location from those whose abilities fall above it.

Number of IRT parameters:
IRT generally refers to three probabilistic measurement models (a sketch of all three follows this list):
1) 1-parameter logistic model (Rasch model): latent trait + item difficulty, where item difficulty is defined as the logit point at which the probability of answering the item correctly is 50%; guessing is irrelevant, and all items are equivalent in terms of discrimination.
2) 2-parameter logistic model: latent trait + item difficulty + item discrimination.
3) 3-parameter logistic model for dichotomous and polytomous responses: latent trait + item difficulty + item discrimination + guessing parameter (this takes into consideration guessing by candidates at the lower end of the ability continuum).
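A minimal Python sketch of all three models under illustrative parameter values; the 1PL (Rasch) model is the special case with a = 1 and c = 0, and the guessing parameter c is the lower asymptote.

import numpy as np

def irt_prob(theta, b, a=1.0, c=0.0):
    # 1PL: defaults (a=1, c=0); 2PL: free a; 3PL: free a and c
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = 0.0
print(irt_prob(theta, b=0.0))                # 1PL: P = .50 at theta == b
print(irt_prob(theta, b=0.0, a=2.0))         # 2PL: steeper curve, still .50
print(irt_prob(theta, b=0.0, a=2.0, c=0.2))  # 3PL: guessing floor lifts P to .60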

IRT Assumptions
1) Examinee characteristics can be separated from test characteristics: the easiness or difficulty of an item is sample independent; IRT allows you to predict how an examinee, given an ability level, is likely to respond to a particular item.
2) unidimensionality: only one ability or latent trait is measured
3) local independence
4) assuming a large pool of items, each measuring the same latent trait
5) assuming the existence of a large population of examinees, the descriptors of a test item are independent of the sample of examinees drawn for the purpose of item calibration
6) a statistic indicating the precision with which each examinee's ability is estimated is provided
7) person-free and item-free measurement.

IRT Item Selection/Test Construction
1) describe the shape of the desired test information function over the desired range of abilities (the target information function).
2) select items with item information functions that will fill up the hard-to-fill areas under the target information function.
3) after each item is added to the test, calculate the test information function for the selected test items.
4) continue selecting items until the test information function approximates the target information function to a satisfactory degree (see the sketch after this list).
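Here is a hedged Python sketch of steps 2 and 3 for 2PL items, using the standard item information formula I(theta) = a^2 * P * (1 - P); the item bank of (a, b) values is invented for illustration.

import numpy as np

def item_information_2pl(theta, a, b):
    # Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P)
    P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * P * (1.0 - P)

theta_grid = np.linspace(-3, 3, 61)
item_bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]  # (a, b) pairs

# Test information is the sum of the selected items' information functions;
# compare its shape against the target information function after each add.
test_info = sum(item_information_2pl(theta_grid, a, b) for a, b in item_bank)
print(test_info.round(2))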

      Thursday, September 30, 2010

      Unidimensional/Multidimensional Scales

      Part 2  - Take Action
      Problem for student to solve:  What is my Plan?
      Student Question 5: What can I do to learn what I don’t know?
This past week, we went over unidimensional/multidimensional scales, and I guess I am kind of familiar with this section. In my first year of the doctoral program, an experimental observation class covered this material, such as the Guttman scale and Likert scale. However, I have not seen many studies using the Guttman scale. (Self-evaluation of current status and self-identified goal status.)
      Student Question 6:  What could keep me from taking action?
Maybe it's that sometimes too many things are going on at the same time, but I have to say writing a blog is a good habit since it helps me organize my thoughts, and I can go back to check what I said in previous weeks. I also feel like writing notes here helps my thinking process and clarifies what I do not understand in the readings or classes we have covered.
      Student Question 7:  What can I do to remove these barriers?
I would say that I have to set up a regular time to update my thoughts and notes.
I think I have been writing my blog since September. I have already started developing my skills.
I also know some strategies to keep me posting here.
      Student Question 8:  When will I take action?
                  Develop a schedule for your action plan.
                  Implement your action plan.
                  Self-monitor progress

Scale
1) A complex concept can't be measured with one question (unlike demographics)
2) Operationalization breaks the concept down into a number of indicators
3) A scale is composed of a number of questions (items) based on the indicators
4) These are recombined to create the concept you intended to measure

      Unidimensional/Multidimensional

1) first distinguish between the items or questions that make up a scale and the scale itself.
2) a scale may be multidimensional, e.g., rating the connotative meaning of an abstract concept; the analysis would place the concept in a geometric space with several dimensions.

Unidimensional: Guttman Scale
1) one underlying dimension
2) series of yes/no questions
3) patterned responses: once a respondent says no, the remaining responses continue as no.
4) look for the hole (breaks in the pattern; see the scoring sketch after the example below)

      Guttman Scale (http://en.wikipedia.org/wiki/Guttman_scale)
      for example:
      I believe that this country should allow more immigrants in
      I would be comfortable if a new immigrant moved next door to me
      I would be comfortable with new immigrants moving into my community
      It would be fine with me if new immigrants moved onto my block
      I would be comfortable if my child dated a new immigrant
I would permit a child of mine to marry an immigrant.
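To show one common way of scoring such a cumulative pattern and "looking for the hole", here is a small Python sketch of a coefficient of reproducibility; the response matrix and the simple error-counting rule are illustrative assumptions (other counting rules exist).

import numpy as np

def reproducibility(responses):
    # responses: (people x items) 0/1 array, items ordered easiest-to-endorse
    # first. A perfect pattern is a run of 1s followed by a run of 0s; every
    # deviation from that ideal pattern (a "hole") counts as an error.
    errors = 0
    for row in responses:
        k = int(row.sum())
        ideal = np.r_[np.ones(k), np.zeros(len(row) - k)]
        errors += int(np.sum(row != ideal))
    return 1.0 - errors / responses.size

resp = np.array([[1, 1, 1, 0, 0, 0],   # perfect cumulative pattern
                 [1, 1, 0, 1, 0, 0],   # a hole: breaks the pattern
                 [1, 0, 0, 0, 0, 0]])  # perfect again
print(reproducibility(resp))           # .90+ is the usual benchmark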

Unidimensional: Likert Scale
1) A Likert scale is bipolar; it measures positive to negative responses to a statement.
2) It is the most commonly used type of scale.
3) "Likert" refers to the type of question format.

Disadvantages: acquiescence bias; to counter it, write items so that some are posed to indicate a lot of the concept and some are reverse-worded, so respondents cannot just go straight down the page and must read each item carefully (a reverse-coding sketch follows below).
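As a small illustration of handling reverse-worded Likert items before scoring, here is a Python sketch; the data and the choice of which items are reversed are made up.

import numpy as np

# Hypothetical 5-point Likert responses (rows = respondents); items in
# columns 1 and 3 are negatively worded, so reverse-code them before summing.
responses = np.array([[5, 1, 4, 2],
                      [4, 2, 5, 1]])
reverse_items = [1, 3]

scored = responses.copy()
scored[:, reverse_items] = 6 - scored[:, reverse_items]  # 6 = max + min
print(scored.sum(axis=1))  # scale totals after reverse-coding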

      Friday, September 24, 2010

      Classical Test Theory Basic Concepts

      Part I:
      Student Question 1:  What do I want to learn?           
            I want to learn how to use appropriate tools to measure scales of a study
       Student Question 2:  What do you know now?
Beyond my answer to question 1: I know reliability and validity (how important roles reliability and validity play in scale measures) and CTT (the definition of CTT, its advantages, and its limitations). However, in class, we did not cover all of the information; sometimes it's kind of frustrating to study alone at home. I need to pay more attention when studying that alone.
      Student Question 3:  What must change for me to learn what I do not know? 
1) I will do some research online and check the textbooks if I have a barrier to understanding the materials;
2) If that still does not work, I will ask experts in this field, such as my friends who are familiar with measurement.
3) Finally, bring the questions to class and ask Dr. Farmer.
        Student Question 4:  What can I do to make this happen?
Based on our two class assignments, the final class project and annotated bibliography, the projects will guide me to learn and to apply what I have learned to the papers: 1) follow the weekly reading assignments; 2) meet the two assignment due dates. Sometimes, I feel like it is really helpful to put what we have learned into a paper. Because there is an actual sample, data, and data-analysis results, it's a good learning experience/process.
           
X (observed score) = T (true score: the true score of a person can be found by taking the mean score that the person would get on the same test if they had an infinite number of testing sessions) + E (error: across multiple observations of the same person, error is normally distributed and uncorrelated with the true score)

Variance & Reliability
Var(X) = Var(T) + Var(E) + 2 Cov(T, E), and the covariance term is zero because there is no correlation between T and E, so Var(X) = Var(T) + Var(E).

Reliability = Var(T) / Var(X) (we cannot directly observe Var(T))
Reliability = 1 - Var(E) / Var(X)
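A quick Python simulation, with made-up means and variances, shows the variance decomposition and both forms of the reliability formula in action.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
T = rng.normal(50, 10, n)   # true scores, Var(T) = 100
E = rng.normal(0, 5, n)     # error: mean 0, uncorrelated with T, Var(E) = 25
X = T + E                   # observed scores

print(X.var(), T.var() + E.var())  # Var(X) = Var(T) + Var(E), approximately
print(T.var() / X.var())           # reliability ~ 100 / 125 = .80
print(1 - E.var() / X.var())       # the same quantity in its other form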


      What sorts of things create measurement error?
1) Error can result from the way the test is designed, factors related to the individual students, the testing situation, and many other sources. Some students may know the answers, but fatigue, distractions, and nervousness affect their ability to concentrate. Students may know correct answers but accidentally mark wrong answers on an answer sheet. Students may misunderstand the instructions on a test or misinterpret a single question. Scores can also be an overestimate of true achievement. Students may make random guesses and get some questions right (Johnson et al., 2000).

2) Test-specific sources of error are another kind of measurement error.
      For example, suppose the test uses reading selections as the basis for some questions. If a class happened to have previously studied the text passage being used, that class will probably do better than a class of students who have never seen the text before. For some tests, we know that changing the order of the items on the test leads to higher or lower scores. This means the order of the items is causing measurement error. Some test items may be biased in favor of or against particular groups of students. For example, if the reading passage contains a story that takes place on a farm, students from the inner city may be at a systematic disadvantage in making inferences based on the story.

Inter-rater Reliability (coefficient of agreement)
1) analogous to alternate forms
2) have two observers assess the same phenomena, then assess consistency between the observers.
Source of measurement error: the observers themselves (observer 1 vs. observer 2) can introduce subjective bias.

Cohen's Kappa (an inter-rater reliability measure) is more sophisticated: it takes chance agreement into account (a computation sketch follows the bands below).
Values range from -1 (less agreement than expected by chance) to +1 (perfect agreement):
      +.75  "excellent"
      .40-.75 "fair to good"
      below .40 "poor"

      Reliability coefficient value:
      .90 and up " excellent"
      .80-.89 "good"
      .70-.79 "adequate"
      below .70 "may have limited applicability"

Different procedures require two test administrations to the same group:

      Test-retest (coefficient of stability)
      time1-time2- source of measurement error: time factor (intervention program)
1). A. Test-Retest Method: If you are concerned with error factors related to the passing of time, then you want to know how consistently examinees respond to this form at different times. Administer, wait, and then re-administer. The correlation coefficient from this procedure is called the coefficient of stability.
     B. Test-Retest with Alternate Forms: Administer form 1 of the test, wait, then administer form 2. The correlation coefficient is known as the coefficient of stability and equivalence.
2). The reliability coefficient reported is the correlation between the two administrations. The assumption is that the correlation is less than perfect (not 1.00) because of error.
3). However, this technique is particularly prone to carry-over effects from one administration to another; reliability will be overestimated.

Parallel or alternate-form (coefficient of equivalence)
Two supposedly equivalent forms of the same instrument are administered to the same individuals, either immediately or in delayed succession:
Form A - Form B (from time 1 to time 2)
Alternate form method: To reduce the possibility of cheating, similar tests need to be given over time (i.e., a board exam). The errors of measurement that concern the test user are those due to differences in the content of the test forms. A correlation coefficient should be used to see how different the tests are. This is called the coefficient of equivalence. It is usually between .8 and .9. (http://www.smaddicts.com/2008/09/what-is-reliability_28.html)

      Internal analysis (coefficient of internal consistency)-Internal consistency is a method of estimating reliability that is computed from a single administration of a test. The coefficients reflect the degree to which the items are measuring the same construct and are homogeneous. Cronbach's alpha and the Kuder-Richardson formulas are measures of the internal consistency of a test. (http://www.csus.edu/indiv/d/deaner/glossary.htm#i)
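Here is a small Python sketch of Cronbach's alpha from a single administration; the 4-item Likert data are invented for illustration.

import numpy as np

def cronbach_alpha(items):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

data = [[4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 3, 3, 2],
        [4, 4, 5, 4]]
print(cronbach_alpha(data))  # judge against the .70/.80/.90 bands above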

      Tuesday, September 21, 2010

      Measurement Theories – Classical & IRT (Session 4 and 5) Part I

      Part 1- Problem for students to solve: 
1) What is my goal: my goal is to get familiar with theories such as Classical Test Theory (CTT) & Item Response Theory (IRT), and reliability & validity. To be honest, right now I am still in the process of understanding the readings and theories.
2) The way to solve the problem: that's why I created a blog and try to write down notes on what we have learned in class, and try to summarize them according to the weekly reading assignments. I think it is helpful to me.
                 
      Measurement Theories – Classical & IRT (Session 4 and 5)
                  Topics: 
                              Principles of Classical Test Theory                            
      -          O = T + E
      -          Measurement Error
      -           Reliability
      -          Validity
                              Principles of Item Response Theory
      -          Θ
      -          Local independence of items;
      -          item response function (IRF)


DeVellis, R.F. (2003). Scale development: Theory and applications (2nd edition). SAGE Publications, Inc.
- Guidelines in Scale Development
step 1: determine clearly what it is you want to measure
step 2: generate an item pool
step 3: determine the format for measurement
step 4: have the initial item pool reviewed by experts
step 5: consider inclusion of validation items
step 6: administer items to a development sample
step 7: evaluate the items
step 8: optimize scale length

      Scale development-Part 2

Administer items to a Development Sample
A large sample: 1) reduces variance; 2) increases the stability of the covariance matrix; 3) increases representativeness (a large range of the attribute is present in the population; respondents are similar on other factors that may influence their understanding/interpretation of items)

      Sample Diversity

Evaluate the items
Goals:
1) each item has a high correlation with the total score (latent variable)
2) items have high intercorrelations (item-item correlations)
3) items do not exhibit ceiling or floor effects.

      Preliminary Analysis

      1) check the distribution of items
      2) check the coding of items
      3) examine the item correlation matrix (make sure that all items are positively associated with each other)

      Item-Scale Correlations
      Item Variances
      Item means
      Factor analysis
      Coefficient Alpha

      Classical Test Theory (CTT) Statistics

      CTT statistics
      1) item difficulty
      2) item-test correlation
      3) reliability coefficient
      4) standard error of measurement (SEM)

Item Difficulty (ID)
A test that is too difficult or too easy reduces reliability (e.g., few test-takers answer correctly, or vice versa). A moderate level of difficulty increases test reliability.

1) For dichotomously scored items (1 for a correct answer and 0 for an incorrect answer), difficulty is the p-value: the proportion answering correctly.
2) An adjusted p-value is used for polytomously scored (Likert-scaled) items (computed so that the result is on a similar scale to that of the dichotomous items).

Item-Test Correlation (Item Discrimination Power)
1) The point-biserial correlation indicates the relation between individuals' performance on a 0/1-scored item and their performance on the total test (measure).
2) For polytomously scored (Likert-scaled) items, the Pearson product-moment correlation coefficient is used.

Item Discrimination Power
1) A higher item-test correlation is desired; it indicates that high-ability examinees tend to get the item correct (have higher scores) and low-ability examinees tend to get the item incorrect (lower scores).
2) The item-test correlation tends to be sensitive to item difficulty.
3) Item discrimination indices (such as the point-biserial correlation) play a more important role in item selection than item difficulty (see the sketch after this list).
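A minimal Python sketch of both CTT item statistics for dichotomous items, using an invented response matrix; in practice a corrected item-total correlation (with the item removed from the total) is often preferred.

import numpy as np

# Rows = examinees, columns = 0/1-scored items (hypothetical data).
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])
total = X.sum(axis=1)

p_values = X.mean(axis=0)  # item difficulty: proportion answering correctly
r_pb = np.array([np.corrcoef(X[:, j], total)[0, 1]   # point-biserial:
                 for j in range(X.shape[1])])        # item vs. total score
print(p_values)  # moderate difficulty (~.5) supports reliability
print(r_pb)      # higher values indicate stronger discrimination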

Advantages:
1) easy to apply: hand calculations are possible, and it is widely available in statistical packages
2) widely used and easy to understand

Limitations:
1) sample dependence and test dependence of all item statistics: item statistics apply only to that group of individuals on that collection of items (point: change samples and/or alter any items, and the psychometric properties of the measure change).
2) restrictive assumptions about error: it is normally distributed, uncorrelated with the true score, and has a mean of zero

      Wednesday, September 8, 2010

      Reliability and Validity

      Assessing Measures
1) some measures are simple and easy to assess, like gender.
2) the more complicated the measure, the more complicated the assessment.
3) assess scales based on Classical Test Theory

Reliability-
How consistently does the scale measure the concept it says it measures?
1) reliability is assessed by determining how much transient error is in the measure.
2) relatively simple concepts measured by one question, such as gender, are probably subject to less transient error; we rarely assess their reliability.
3) more complex concepts are measured with a number of questions (items) and have greater error. There are several ways to assess the reliability of scales:
      4) test-retest
      5) alternate form
      6) interrater/interjudge reliability
7) split-half techniques assessing internal consistency (a computation sketch follows below)
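As a closing illustration, here is a small Python sketch of the split-half technique with the Spearman-Brown step-up; the odd/even split and the 6-item data are illustrative assumptions.

import numpy as np

def split_half_reliability(items):
    # Correlate odd-item and even-item half scores, then step the
    # correlation up to full test length with Spearman-Brown.
    items = np.asarray(items, dtype=float)
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

data = np.array([[4, 5, 4, 4, 5, 4],
                 [2, 3, 2, 3, 2, 2],
                 [5, 5, 4, 5, 5, 5],
                 [3, 3, 3, 2, 3, 3]])
print(split_half_reliability(data))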