
Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective
André A. Rupp, EDMS Department, University of Maryland
DRK-12 Diagnostic Assessment Panel, Dec 3, 2010

Toward a Definition of “Diagnostic Assessment Systems”

Proposed Panel Definition
The term “diagnostic” comes from a combination of dia, to split apart, and gnosi, to learn or to know. We use “diagnostic assessment (system)” to refer to assessment processes based on an explicit cognitive model, itself supported by empirical study, of proficient reasoning in a particular domain. The cognitive model must support delineation of students’ and/or teachers’ strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential to inform and assess the outcomes of instruction.

Conceptualization of Problem Space, from Stevens, Beal, & Sprang (2009)

Toward an Understanding of Frameworks & Models

The Evidence-centered Design Framework, adapted from Mislevy, Steinberg, Almond, & Lukas (2006)

Frameworks vs. Models
A “principled assessment design framework” for diagnostic assessment such as evidence-centered design is NOT a “model”. It does NOT prescribe a particular statistical modeling approach.
A “statistical / psychometric model” is a mathematical tool that plays a supporting role in generating evidence-based narratives about students’ and/or teachers’ strengths and weaknesses. Its parameters do NOT have inherent meanings.
A “cognitive model” for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how these can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a singular assessment approach.
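To make the second point concrete, consider the two-parameter logistic IRT model as one instance of such a statistical tool (an illustration added here, not from the original slides):

$$P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp[-a_j(\theta_i - b_j)]}$$

Nothing in the equation itself fixes what the latent variable $\theta_i$ or the item parameters $a_j$ and $b_j$ mean; reading them as “proficiency”, “discrimination”, and “difficulty” is part of the evidence-based narrative built around the model, not a property of the mathematics.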

Evidence-based Reasoning for “Traditional” Assessments

Traditional Construct Operationalization
[Diagram: a construct in the theoretical realm is operationalized in the empirical realm through indicators I1, I2, …, Ik, which are aggregated into a test score]
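Written out as equations (a sketch of the diagram’s logic, assuming a unidimensional linear factor model, which the slide itself does not specify): each indicator reflects a single latent construct $\theta$, and the test score aggregates the indicators,

$$I_j = \lambda_j \theta + \varepsilon_j, \qquad \text{Test Score} = \sum_{j=1}^{k} I_j,$$

so the construct belongs to the theoretical realm while the indicators and the test score are its empirical operationalization.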

Feedback Utility (Part I – Scoring Card)

Feedback Utility (Part II – Simple Progress Mapping)
[Figure: progress map annotated with Level 3 and Level 4]

Evidence-based Reasoning for “Modern” Assessments

Complex Assessment Tasks for Diagnosis (Part I), from Seeratan & Mislevy (2008)

Complex Assessment Tasks for Diagnosis (Example II), from Behrens et al. (2009)

Evidence Identification, Aggregation, & Synthesis, from Stevens, Beal, & Sprang (2009)

Proficiency Pathways, from Stevens, Beal, & Sprang (2009)

Interventional Pathways, from Stevens, Beal, & Sprang (2009)

Selected Statistical Tools for Evidence-based Reasoning

Selected Modeling Approaches for Diagnostic Assessments
Approaches Resulting in Continuous Proficiency Scales
1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
4. Structural equation models (e.g., Kline, 2010)
Approaches Resulting in Classifications of Respondents Based on Discrete Scales
1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010; see the sketch below)
3. Non-/semi-parametric classification approaches (e.g., Tatsuoka, 2009)
4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
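As a concrete instance of a parametric diagnostic classification model, here is a minimal Python sketch of the DINA model’s item response function; the Q-matrix, slip, and guess values below are invented toy numbers, not from the talk:

import numpy as np

def dina_response_prob(alpha, Q, slip, guess):
    """P(correct response) under the DINA model.
    alpha : (N, K) 0/1 array of examinee attribute-mastery profiles
    Q     : (J, K) 0/1 Q-matrix mapping items to required attributes
    slip  : (J,) per-item slip probabilities
    guess : (J,) per-item guessing probabilities
    """
    # eta[i, j] = 1 iff examinee i has mastered every attribute item j requires
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
    # Masters answer correctly unless they slip; non-masters succeed only by guessing
    return np.where(eta, 1.0 - slip, guess)

# Toy example: 2 attributes, 3 items, 2 examinees
Q = np.array([[1, 0], [0, 1], [1, 1]])
alpha = np.array([[1, 0], [1, 1]])
slip = np.array([0.1, 0.1, 0.2])
guess = np.array([0.2, 0.2, 0.1])
print(dina_response_prob(alpha, Q, slip, guess))

The diagnostic output of such a model is a classification of each respondent into one of the 2^K attribute profiles; see Rupp, Templin, & Henson (2010) for estimation details.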

Psychometric Tools for Diagnostic Assessments
New frontiers of educational measurement
1. Educational data mining for simulation-/games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
2. Diagnostic multiple-choice / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)
Useful ideas from large-scale assessment
1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
2. Item families / task variants & automatic test/form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010; see the sketch below)
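For the last item, here is a minimal Python sketch of a cyclic block rotation; it illustrates the general idea of distributing item blocks across booklets so every block appears equally often, not the specific designs discussed by Frey, Hartig, & Rupp (2009):

def cyclic_booklet_design(n_blocks, blocks_per_booklet):
    """Assign item blocks 0..n_blocks-1 to booklets cyclically.
    Every block appears in exactly blocks_per_booklet booklets and every
    booklet contains blocks_per_booklet blocks; balancing which blocks
    appear together requires a proper BIB design on top of this."""
    return [[(b + offset) % n_blocks for offset in range(blocks_per_booklet)]
            for b in range(n_blocks)]

# Toy example: 7 item blocks, 3 blocks per booklet -> 7 booklets
for i, booklet in enumerate(cyclic_booklet_design(7, 3), start=1):
    print(f"Booklet {i}: blocks {booklet}")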

Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective
André A. Rupp
EDMS Department, University of Maryland
1230-A Benjamin Building
College Park, MD 20742
Phone: (301) 405-3623
E-mail: ruppandr@umd.edu

References (Part I)
Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17, 191-204.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85-118). Cambridge, UK: Cambridge University Press.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
de Boeck, P., & Wilson, M. (2004). Explanatory item response theory models: A generalized linear and nonlinear approach. New York: Springer.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163-183.
Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50, 328-344.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3), 39-53.
Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.

References (Part II)
Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24, 417-444.
Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79-95.
Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Portsmouth, NH: Greenwood.
Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40, 808-821.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15-48). Mahwah, NJ: Erlbaum.

References (Part III)
Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Presented at the International Educational Data Mining Conference, Pittsburgh, PA.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, & Assessment, 8(4). Available online at http://escholarship.bc.edu/jtla/vol8/4/
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39, 142-151.
Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.