All-access Pass

This pass provides immediate access to all print and digital modules in the portal by registering you for each one and displaying all modules as a single collection.

  • Module 45: Mokken Scale Analysis

    Contains 1 Component(s)

    In this print module, Dr. Stefanie Wind provides an introduction to Mokken scale analysis (MSA) as a probabilistic nonparametric item response theory (IRT) framework in which to explore measurement quality with an emphasis on its application in the context of educational assessment. Keywords: item response theory, IRT, Mokken scaling, nonparametric item response theory, model fit, monotone homogeneity model, double monotonicity model, scaling coefficients

    Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple-choice physical science assessment and a rater-mediated writing assessment.

    Keywords: item response theory, IRT, Mokken scaling, nonparametric item response theory, model fit, monotone homogeneity model, double monotonicity model, scaling coefficients
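
    For readers who want a sense of what such an analysis looks like in practice, the sketch below runs the basic Mokken checks on simulated dichotomous data. The use of the mokken R package and the simulated data are illustrative assumptions, not part of the module itself.

    ```r
    # Illustrative sketch only: the 'mokken' package and the simulated data are
    # assumptions for demonstration, not the module's own example.
    library(mokken)

    set.seed(123)
    # Simulate 500 examinees answering 6 dichotomous items of increasing difficulty
    theta <- rnorm(500)
    difficulty <- seq(-1.5, 1.5, length.out = 6)
    X <- sapply(difficulty, function(b) as.integer(theta + rnorm(500) > b))
    colnames(X) <- paste0("item", 1:6)

    coefH(X)                          # scalability (Loevinger's H) coefficients
    summary(check.monotonicity(X))    # check the monotone homogeneity assumption
    aisp(X)                           # automated item selection into Mokken scales
    ```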

    Stefanie A. Wind

    Assistant Professor, Department of Educational Research, University of Alabama, Tuscaloosa, AL

    Dr. Wind conducts methodological and applied research on educational assessments with an emphasis on issues related to raters, rating scales, Rasch models, nonparametric IRT, and parametric IRT. 

    Contact Stefanie via stefanie.wind@au.edu

  • Digital Module 11: Bayesian Psychometrics

    Contains 6 Component(s) Recorded On: 01/31/2020

    In this digital ITEMS module, Dr. Roy Levy discusses how Bayesian inference is a mechanism for reasoning in a probability-modeling framework, describes how this plays out in a normal distribution model and unidimensional item response theory (IRT) models, and illustrates these steps using the JAGS software and R. Keywords: Bayesian psychometrics, Bayes theorem, dichotomous data, item response theory (IRT), JAGS, Markov-chain Monte Carlo (MCMC) estimation, normal distribution, R, unidimensional models

    In this digital ITEMS module, Dr. Roy Levy describes Bayesian approaches to psychometric modeling. He discusses how Bayesian inference is a mechanism for reasoning in a probability-modeling framework and is well suited to core problems in educational measurement: reasoning from student performances on an assessment to make inferences about their capabilities more broadly conceived, as well as fitting models to characterize the psychometric properties of tasks. The approach is first developed in the general context of estimating the mean and variance of a normal distribution before turning to the context of unidimensional item response theory (IRT) models for dichotomously scored data. Dr. Levy illustrates the process of fitting Bayesian models using the JAGS software facilitated through the R statistical environment. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as education, psychology, sociology, political science, business, health, and other social sciences. It contains audio-narrated slides, diagnostic quiz questions, and data-based activities with video solutions, as well as curated resources and a glossary.

    Keywords: Bayesian psychometrics, Bayes theorem, dichotomous data, item response theory (IRT), JAGS, Markov-chain Monte Carlo (MCMC) estimation, normal distribution, R, unidimensional models
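
    As a rough preview of the workflow the module demonstrates, the sketch below estimates the mean and variance of a normal distribution in JAGS, called from R via the rjags package. The data, priors, and sampler settings are illustrative assumptions rather than the module's own example.

    ```r
    # Minimal sketch: Bayesian estimation of a normal mean and variance with JAGS
    # called from R via 'rjags' (requires a JAGS installation). Data, priors, and
    # settings are illustrative assumptions.
    library(rjags)

    model_string <- "
    model {
      for (i in 1:N) {
        y[i] ~ dnorm(mu, tau)      # likelihood; JAGS uses precision, not variance
      }
      mu  ~ dnorm(0, 1.0E-4)       # diffuse prior on the mean
      tau ~ dgamma(0.001, 0.001)   # diffuse prior on the precision
      sigma <- 1 / sqrt(tau)       # report the standard deviation
    }"

    set.seed(42)
    y <- rnorm(100, mean = 50, sd = 10)   # fake test scores

    jags <- jags.model(textConnection(model_string),
                       data = list(y = y, N = length(y)), n.chains = 3)
    update(jags, 1000)                                      # burn-in
    samples <- coda.samples(jags, c("mu", "sigma"), n.iter = 5000)
    summary(samples)
    ```

    In a real analysis, convergence diagnostics (e.g., gelman.diag() from the coda package) would be examined before interpreting the posterior summaries.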

    Roy Levy

    Professor, Arizona State University

    Roy is a professor in the T. Denny Sanford School of Social & Family Dynamics at Arizona State University, specializing in Measurement and Statistical Analysis. He received his Ph.D. in Measurement, Statistics & Evaluation from the University of Maryland. His research and teaching interests include methodological investigations and applications in psychometrics and statistical modeling, focusing on item response theory, structural equation modeling, Bayesian networks, and Bayesian approaches to inference and modeling, as well as evidentiary principles and applications in complex assessments. He is the co-author of the book Bayesian Psychometric Modeling, and has published his work in a variety of leading methodological journals. For his work, he has received awards from the National Council on Measurement in Education, the American Educational Research Association, and the President of the United States. He currently serves on the editorial boards for Structural Equation Modeling: A Multidisciplinary Journal, Educational Measurement: Issues and Practice, Measurement: Interdisciplinary Research and Perspectives, and Educational Assessment.

    Contact Roy via email at roy.levy@asu.edu

  • Digital Module 12: Think-aloud Interviews and Cognitive Labs

    Contains 6 Component(s)

    In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes and illustrate both traditional and modern data-collection methods. Keywords: ABC tool, cognitive laboratory, cog lab, cognition, cognitive model, interrater agreement, kappa, probe, rubric, thematic analysis, think-aloud interview, verbal report

    In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze distinct types of verbal reports in the collection of evidence of test-taker response processes. The module includes details on (a) the different types of cognition that are tapped by different interviewer probes, (b) traditional interviewing methods and new automated tools for collecting verbal reports, and (c) options for analyses of verbal reports. This includes a discussion of reliability and validity issues such as potential bias in the collection of verbal reports, ways to mitigate bias, and inter-rater agreement to enhance credibility of analysis. A novel digital tool for data collection, called the ABC tool, is presented via illustrative videos. As always, the module contains audio-narrated slides, quiz questions with feedback, a glossary, and curated resources.

    Keywords: ABC tool, cognitive laboratory, cog lab, cognitive model, interrater agreement, kappa, probe, rubric, thematic analysis, think-aloud interview, verbal report
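
    The module's treatment of inter-rater agreement can be made concrete with a small calculation. The sketch below computes Cohen's kappa in base R for two hypothetical coders of verbal-report segments; the codes and data are invented for illustration.

    ```r
    # Illustrative only: Cohen's kappa for two hypothetical coders who each
    # assigned 12 verbal-report segments to one of three codes.
    rater1 <- c("recall", "reason", "reason", "guess", "recall", "reason",
                "guess", "recall", "reason", "reason", "recall", "guess")
    rater2 <- c("recall", "reason", "recall", "guess", "recall", "reason",
                "guess", "recall", "reason", "guess", "recall", "guess")

    tab   <- table(rater1, rater2)                          # confusion matrix
    p_obs <- sum(diag(tab)) / sum(tab)                      # observed agreement
    p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
    (p_obs - p_exp) / (1 - p_exp)                           # Cohen's kappa
    ```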

  • Module 44: Quality Control for Continuous Mode Tests

    Contains 1 Component(s)

    In this print module, Dr. Avi Allalouf, Dr. Tony Gutentag, and Dr. Michal Baumer discuss errors that might occur at the different stages of the continuous mode tests (CMT) process as well as the recommended quality-control (QC) procedure to reduce the incidence of each error. Keywords: automated review, computer-based testing, CBT, continuous mode tests, CMT, human review, quality control, QC, scoring, test administration, test analysis, test scoring

    Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, which has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large-scale assessment). The second type is QC for tests, usually computerized, that are administered to small population groups on many administration dates using a wide array of test forms (continuous mode tests, or CMT). Since the world of testing is headed in this direction, developing QC for CMT is crucial. In the current ITEMS module, we discuss errors that might occur at the different stages of the CMT process, as well as the recommended QC procedure to reduce the incidence of each error. An illustration from a recent study is provided, and a computerized system that applies these procedures is presented. Instructions on how to develop one’s own QC procedure are also included.

    Keywords: automated review, computer-based testing, CBT, continuous mode tests, CMT, human review, quality control, QC, scoring, test administration, test analysis, test scoring
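
    As a generic illustration of the kind of automated check a QC procedure for CMT might include (this is not the authors' system), the base-R sketch below flags administration dates whose daily mean score drifts outside control limits derived from an assumed historical baseline.

    ```r
    # Generic illustration (not the authors' system): flag administration dates
    # whose daily mean score falls outside control limits built from an assumed
    # historical baseline for the same continuous mode test.
    set.seed(1)
    baseline_mean <- 500   # assumed long-run mean of daily mean scores
    baseline_sd   <- 8     # assumed long-run SD of daily mean scores

    daily <- data.frame(
      date       = as.Date("2020-01-01") + 0:29,
      mean_score = rnorm(30, mean = 500, sd = 8)
    )
    daily$mean_score[17] <- 540    # inject an anomaly (e.g., a scoring-key error)

    daily$flag <- abs(daily$mean_score - baseline_mean) > 3 * baseline_sd
    daily[daily$flag, ]            # dates routed to human review
    ```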

    Avi Allalouf

    Director of Scoring and Equating, National Institute for Testing and Evaluation, Jerusalem, Israel

    Dr. Avi Allalouf is the director of Scoring & Equating at the National Institute for Testing and Evaluation (NITE). He received his PhD in Psychology from the Hebrew University of Jerusalem (1995) and teaches at the Academic College of Tel Aviv-Yaffo. His primary areas of research are test adaptation, DIF, test scoring, quality control, and testing and society. Dr. Allalouf leads the Exhibition on Testing & Measurement project and has served as co-editor of the International Journal of Testing (IJT).

    Tony Gutentag

    PhD student, Department of Psychology, The Hebrew University, Jerusalem, Israel

    Michal Baumer

    Computerized Test Unit, National Institute for Testing and Evaluation, Jerusalem, Israel

  • Module 43: Data Mining for Classification and Regression

    Contains 4 Component(s)

    In this print module, Dr. Sandip Sinharay provides a review of data mining techniques for classification and regression, which is accessible to a wide audience in educational measurement. Keywords: bagging, boosting, classification and regression tree, CART, cross-validation error, data mining, predicted values, random forests, supervised learning, test error, TIMSS

    Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in educational measurement, of some of these methods. The module then demonstrates, using three real-data examples, that these methods may lead to an improvement over traditionally used methods, such as linear and logistic regression, in educational measurement.

    Keywords: bagging, boosting, classification and regression tree, CART, cross-validation error, data mining, predicted values, random forests, supervised learning, test error, TIMSS
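
    To give a flavor of the comparison the module draws, the sketch below fits a logistic regression and a random forest to the same simulated classification data and compares their test errors on a holdout set. The randomForest package and the data-generating setup are illustrative assumptions.

    ```r
    # Illustrative comparison of logistic regression and a random forest on
    # simulated data; the 'randomForest' package and the setup are assumptions.
    library(randomForest)

    set.seed(7)
    n  <- 1000
    x1 <- rnorm(n); x2 <- rnorm(n)
    p  <- plogis(1.5 * x1 * x2 - 0.5 * x1^2)   # nonlinear true relationship
    y  <- factor(rbinom(n, 1, p))
    dat   <- data.frame(y, x1, x2)
    train <- dat[1:700, ]
    test  <- dat[701:1000, ]

    glm_fit <- glm(y ~ x1 + x2, data = train, family = binomial)
    rf_fit  <- randomForest(y ~ x1 + x2, data = train)

    glm_pred <- ifelse(predict(glm_fit, test, type = "response") > 0.5, "1", "0")
    rf_pred  <- predict(rf_fit, test)

    mean(glm_pred != test$y)   # test error, logistic regression
    mean(rf_pred != test$y)    # test error, random forest
    ```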

    Sandip Sinharay

    Principal Research Scientist, Educational Testing Service

    Sandip Sinharay is a principal research scientist in the Research and Development division at ETS. He received his Ph.D. degree in statistics from Iowa State University in 2001. He was editor of the Journal of Educational and Behavioral Statistics between 2011 and 2014. Sandip Sinharay has received four awards from the National Council on Measurement in Education: the Award for Technical or Scientific Contributions to the Field of Educational Measurement (in 2009 and 2015), the Jason Millman Promising Measurement Scholar Award (2006), and the Alicia Cascallar Award for an Outstanding Paper by an Early Career Scholar (2005). He received the ETS Scientist Award in 2008 and the ETS Presidential Award twice. He has coedited two published volumes and authored or coauthored more than 75 articles in peer-reviewed statistics and psychometrics journals and edited books.

  • Module 42: Simulation Studies in Psychometrics

    Contains 8 Component(s)

    In this print module, Dr. Richard A. Feinberg and Dr. Jonathan D. Rubright provide a comprehensive introduction to the topic of simulation studies in psychometrics using R that can be easily understood by measurement specialists at all levels of training and experience. Keywords: bias, experimental design, mean absolute difference, MAD, mean squared error, MSE, root mean squared error, RMSE, psychometrics, R, research design, simulation study, standard error

    Simulation studies are fundamental to psychometric discourse and play a crucial role in operational and academic research. Yet, resources for psychometricians interested in conducting simulations are scarce. This Instructional Topics in Educational Measurement Series (ITEMS) module is meant to address this deficiency by providing a comprehensive introduction to the topic of simulation that can be easily understood by measurement specialists at all levels of training and experience. Specifically, this module describes the vocabulary used in simulations, reviews their applications in recent literature, and recommends specific guidelines for designing simulation studies and presenting results. Additionally, an example (including computer code in R) is given to demonstrate how common aspects of simulation studies can be implemented in practice and to provide a template to help users build their own simulation.

    Keywords: bias, experimental design, mean absolute difference, MAD, mean squared error, MSE, root mean squared error, RMSE, psychometrics, R, research design, simulation study, standard error
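
    As a minimal illustration of the vocabulary the module defines, the following R sketch repeatedly samples from a known distribution, estimates the mean, and summarizes recovery with bias, MAD, and RMSE. The specific conditions are invented for illustration.

    ```r
    # Tiny simulation-study skeleton: generate data under a known truth, estimate,
    # and summarize recovery with bias, MAD, and RMSE. Conditions are illustrative.
    set.seed(2024)
    n_reps  <- 1000   # number of replications
    n_obs   <- 50     # sample size per replication
    true_mu <- 0.5    # known generating value

    estimates <- replicate(n_reps, mean(rnorm(n_obs, mean = true_mu, sd = 1)))

    bias <- mean(estimates - true_mu)
    mad  <- mean(abs(estimates - true_mu))       # mean absolute difference
    rmse <- sqrt(mean((estimates - true_mu)^2))  # root mean squared error

    round(c(bias = bias, MAD = mad, RMSE = rmse), 4)
    ```

    A full study would cross several simulation factors (e.g., sample size, number of items, model type) and report these outcome measures for each condition.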

    Richard A. Feinberg

    National Board of Medical Examiners, Philadelphia, PA

    Richard Feinberg is a Senior Psychometrician with NBME, where he leads and oversees the data analysis and score reporting activities for large-scale high-stakes licensure and credentialing examinations. He is also an Assistant Professor at the Philadelphia College of Osteopathic Medicine, Philadelphia, PA, where he teaches a course on Research Methods and Statistics.

    His research interests include psychometric applications in the fields of educational and psychological testing.

    He earned a PhD in Research Methodology and Evaluation from the University of Delaware, Newark, DE.

    Jonathan D. Rubright

    National Board of Medical Examiners, Philadelphia, PA

  • Module 41: Latent DIF Analysis using Mixture Item Response Models

    Contains 1 Component(s)

    In this print module, Dr. Sun-Joo Cho, Dr. Youngsuk Suh, and Dr. Woo-yeol Lee provide an introduction to differential item functioning (DIF) analysis using mixture item response theory (IRT) models, which involves comparing item profiles across latent, instead of manifest, groups. Keywords: differential item functioning, DIF, estimation, latent class, latent DIF, item response model, IRT, mixture model, model fit, model selection

    The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.

    Keywords: differential item functioning, DIF, estimation, latent class, latent DIF, item response model, IRT, mixture model, model fit, model selection
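
    To make the idea of comparing item profiles across latent rather than manifest groups concrete, the base-R sketch below simulates responses from a two-class mixture Rasch model in which one item's difficulty shifts across the latent classes. All values are invented for illustration.

    ```r
    # Illustrative simulation of latent DIF: responses follow a two-class mixture
    # Rasch model in which item 3's difficulty shifts across the latent classes.
    set.seed(99)
    n_persons <- 1000
    n_items   <- 5

    grp   <- rbinom(n_persons, 1, 0.4) + 1   # latent class membership (1 or 2)
    theta <- rnorm(n_persons)                # person ability

    # Item difficulties per latent class; item 3 exhibits latent DIF
    b <- rbind(c(-1.0, -0.5, 0.0, 0.5, 1.0),   # class 1
               c(-1.0, -0.5, 1.2, 0.5, 1.0))   # class 2 (item 3 is harder)

    prob <- plogis(outer(theta, rep(1, n_items)) - b[grp, ])
    resp <- matrix(rbinom(n_persons * n_items, 1, prob), n_persons, n_items)

    # Item proportion-correct profiles by latent class: item 3's profile differs,
    # which is the pattern a mixture IRT analysis is designed to recover
    round(rbind(class1 = colMeans(resp[grp == 1, ]),
                class2 = colMeans(resp[grp == 2, ])), 2)
    ```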

    Sun-Joo Cho

    Associate Professor, Department of Psychology and Human Development, Vanderbilt University, Nashville, TN

    Dr. Cho has collaborated with researchers from a variety of disciplines including reading education, math education, special education, psycholinguistics, clinical psychology, cognitive psychology, neuropsychology, and audiology. She serves on the editorial boards of Journal of Educational Psychology, Behavior Research Methods, and International Journal of Testing.

    Youngsuk Suh

    Department of Educational Psychology, Rutgers, The State University of New Jersey, New Brunswick, NJ

    Woo-yeol Lee

    Graduate Student, Department of Psychology and Human Development, Vanderbilt University, Nashville, TN

  • Module 40: Item Fit Statistics for Item Response Theory Models

    Contains 8 Component(s)

    In this print module, Dr. Allison J. Ames and Dr. Randall D. Penfield provide an overview of methods used for evaluating the fit of item response theory (IRT) models. Keywords: Bayesian statistics, estimation, item response theory, IRT, Markov chain Monte Carlo, MCMC, model fit, posterior distribution, posterior predictive checks

    Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model-data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

    Keywords: Bayesian statistics, estimation, item response theory, IRT, Markov chain Monte Carlo, MCMC, model fit, posterior distribution, posterior predictive checks
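
    For a quick hands-on example of a traditional item-fit check, the sketch below fits a 2PL model to simulated data with the mirt package and requests item-level fit statistics. The package choice and the data are assumptions on our part; the module itself surveys both traditional and Bayesian options.

    ```r
    # Illustrative item-fit check with the 'mirt' package (an assumption here; the
    # module itself reviews several traditional and Bayesian software options).
    library(mirt)

    set.seed(11)
    a <- matrix(rlnorm(10, 0.2, 0.2))            # discriminations for 10 items
    d <- matrix(rnorm(10))                       # intercepts
    dat <- simdata(a, d, N = 1000, itemtype = "dich")

    mod <- mirt(dat, 1, itemtype = "2PL", verbose = FALSE)

    itemfit(mod)   # S-X2 statistics; small p-values flag potentially misfitting items
    ```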

    Allison J. Ames

    Assistant Professor

    Allison is an assistant professor in the Educational Statistics and Research Methods program in the Department of Rehabilitation, Human Resources and Communication Disorders, Research Methodology, and Counseling at the University of Arkansas. There, she teaches courses in educational statistics, including a course on Bayesian inference. Allison received her Ph.D. from the University of North Carolina at Greensboro. Her research interests include Bayesian item response theory, with an emphasis on prior specification; model-data fit; and models for response processes. Her research has been published in prominent peer-reviewed journals. She enjoyed collaborating on this project with a graduate student, senior faculty member, and the Instructional Design Team.
    Contact Allison via boykin@uark.edu

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he received the 2005 Early Career Award from the National Council on Measurement in Education and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.

  • Module 39: Polytomous Item Response Theory Models: Problems with the Step Metaphor

    Contains 1 Component(s)

    In this print module, Dr. David Andrich discusses conceptual problems with the step metaphor for polytomous item response theory (IRT) models as a response to a previous ITEMS module. Keywords: graded response model, item response theory, IRT, polytomous items, polytomous Rasch model, step function, step metaphor

    Penfield’s (2014) “Instructional Module on Polytomous Item Response Theory Models” begins with a review of dichotomous response models, which he refers to as “The Building Blocks of Polytomous IRT Models: The Step Function.” The mathematics of these models and their interrelationships with the polytomous models are correct. Unfortunately, the step characterization for dichotomous responses, which he uses to explain the two most commonly used classes of polytomous models for ordered categories, is incompatible with the mathematical structure of these models. These two classes of models are referred to in Penfield’s paper as adjacent category models and cumulative models. At best, taken in the dynamic sense of taking a step, the step metaphor leads to a superficial understanding of the models as mere descriptions of the data; at worst, it leads to a misunderstanding of the models and of how they can be used to assess whether the empirical ordering of the categories is consistent with the intended ordering. The purpose of this note is to explain why the step metaphor is incompatible with both models and to summarize the distinct processes for each. It also shows, with concrete examples, how one of these models can be applied to better understand assessments in ordered categories.

    Keywords: graded response model, item response theory, IRT, polytomous items, polytomous Rasch model, step function, step metaphor
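
    For orientation only (the note itself develops the argument in full), the two classes of models can be written side by side in a common parameterization. Adjacent-categories models (e.g., the polytomous Rasch model) specify, for neighboring categories $k-1$ and $k$,

    $$\Pr(X = k \mid X \in \{k-1, k\}) = \frac{\exp(\theta - \tau_k)}{1 + \exp(\theta - \tau_k)},$$

    whereas cumulative models (e.g., the graded response model) dichotomize the full response scale:

    $$\Pr(X \ge k) = \frac{\exp\{a(\theta - b_k)\}}{1 + \exp\{a(\theta - b_k)\}}.$$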

    David Andrich

    Chapple Professor, Graduate School of Education, The University of Western Australia, Crawley, Western Australia

  • Module 38: A Simple Equation to Predict a Subscore’s Value

    Contains 1 Component(s)

    In this print module, Dr. Richard A. Feinberg and Dr. Howard Wainer help analysts determine, through the use of a simple linear equation, whether a particular subscore adds enough value to be worth reporting. Keywords: added value, classical test theory, CTT, linear equation, subscores, reliability, orthogonal, proportional reduction in mean squared error, PRMSE

    Subscores are often used to indicate test-takers’ relative strengths and weaknesses and so help focus remediation. But a subscore is not worth reporting if it is too unreliable to believe or if it contains no information that is not already contained in the total score. It is possible, through the use of a simple linear equation provided in this note, to determine if a particular subscore adds enough value to be worth reporting.

    Keywords: added value, classical test theory, CTT, linear equation, subscores, reliability, orthogonal, proportional reduction in mean squared error, PRMSE
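
    The note's own linear equation is not reproduced here, but the underlying value-added logic, in the spirit of Haberman's PRMSE comparison, can be sketched in a few lines of R: a subscore is a candidate for reporting only if it predicts its own true score better than the total score does. The reliabilities and correlation below are invented, and the formulas are a general sketch rather than the authors' equation.

    ```r
    # Hedged sketch of the value-added comparison behind subscore reporting
    # (Haberman-style PRMSE), not the note's own linear equation. Inputs invented.
    rho_s <- 0.70   # reliability of the subscore
    rho_x <- 0.92   # reliability of the total score
    r_sx  <- 0.76   # observed correlation between subscore and total score

    prmse_sub   <- rho_s                        # predicting true subscore from the subscore
    r_true      <- r_sx / sqrt(rho_s * rho_x)   # disattenuated correlation
    prmse_total <- rho_x * r_true^2             # predicting true subscore from the total

    c(PRMSE_subscore = prmse_sub,
      PRMSE_total    = prmse_total,
      added_value    = prmse_sub > prmse_total)
    ```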

    Richard A. Feinberg

    National Board of Medical Examiners, Philadelphia, PA

    Richard Feinberg is a Senior Psychometrician with NBME, where he leads and oversees the data analysis and score reporting activities for large-scale high-stakes licensure and credentialing examinations. He is also an Assistant Professor at the Philadelphia College of Osteopathic Medicine, Philadelphia, PA, where he teaches a course on Research Methods and Statistics.

    His research interests include psychometric applications in the fields of educational and psychological testing.

    He earned a PhD in Research Methodology and Evaluation from the University of Delaware, Newark, DE.

    Howard Wainer

    Retired

    Howard Wainer is an American statistician, past principal research scientist at the Educational Testing Service, adjunct professor of statistics at the Wharton School of the University of Pennsylvania, and author, known for his contributions in the fields of statistics, psychometrics, and statistical graphics.