Catalog Advanced Search

45 Results

  • Module 45: Mokken-scale Analysis

    Product not yet rated Contains 1 Component(s)

    This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment.

    Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple-choice physical science assessment and a rater-mediated writing assessment.
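
    To give a feel for how MSA checks are carried out in practice, the sketch below uses the R mokken package and its bundled acl example data (both are assumptions of this illustration, not materials from the module) to compute scalability coefficients, run an automated item selection, and check monotonicity.

        # Generic MSA sketch in R; assumes the mokken package is installed. Not the module's code.
        library(mokken)
        data(acl)                            # example polytomous data shipped with the package
        items <- acl[, 1:10]                 # a small item set for illustration
        coefH(items)                         # item-pair, item, and total scalability (H) coefficients
        aisp(items, lowerbound = 0.3)        # automated item selection into Mokken scales
        summary(check.monotonicity(items))   # evaluate the monotone homogeneity assumption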

    Keywords: Mokken scaling, nonparametric item response theory, monotone homogeneity model, double monotonicity model, scaling coefficients

    Stefanie A. Wind

    Assistant Professor, Department of Educational Research, University of Alabama, Tuscaloosa, AL

    Dr. Wind conducts methodological and applied research on educational assessments with an emphasis on issues related to raters, rating scales, Rasch models, nonparametric IRT, and parametric IRT.

  • Module 44: Quality-control for Continuous Mode Tests

    Contains 1 Component(s)

    In the current ITEMS module we discuss errors that might occur at the different stages of the continuous mode tests (CMT) process, as well as the recommended QC procedure to reduce the incidence of each error.

    Quality control (QC) in testing is paramount. QC procedures for tests can be divided into two types. The first type, one that has been well researched, is QC for tests administered to large population groups on few administration dates using a small set of test forms (e.g., large-scale assessment). The second type is QC for tests, usually computerized, that are administered to small population groups on many administration dates using a wide array of test forms (continuous mode tests, or CMT). Since the world of testing is headed in this direction, developing QC for CMT is crucial. In the current ITEMS module, we discuss errors that might occur at the different stages of the CMT process, as well as the recommended QC procedure to reduce the incidence of each error. An illustration from a recent study is provided, and a computerized system that applies these procedures is presented. Instructions on how to develop one’s own QC procedure are also included.

    Keywords: computer-based testing, continuous mode tests, quality control, scoring

    Avi Allalouf

    Director of Scoring and Equating, National Institute for Testing and Evaluation, Jerusalem, Israel

    Dr. Avi Allalouf is the Director of Scoring and Equating at the National Institute for Testing and Evaluation (NITE). He received his PhD in Psychology from the Hebrew University of Jerusalem (1995) and teaches at the Academic College of Tel Aviv-Yaffo. His primary areas of research are test adaptation, DIF, test scoring, quality control, and testing and society. Dr. Allalouf leads the Exhibition on Testing and Measurement project and has served as co-editor of the International Journal of Testing (IJT).

    Tony Gutentag

    PhD student, Department of Psychology, The Hebrew University, Jerusalem, Israel

    Michal Baumer

    Computerized Test Unit, National Institute for Testing and Evaluation, Jerusalem, Israel

  • Module 43: Data Mining for Classification and Regression

    Product not yet rated Contains 4 Component(s)

    This ITEMS module first provides a review of data mining techniques for classification and regression that should be accessible to a wide audience in educational measurement.

    Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review of some of these methods, which should be accessible to a wide audience in educational measurement. The module then demonstrates, using three real-data examples, that these methods may lead to an improvement over traditionally used methods such as linear and logistic regression in educational measurement.
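
    As a minimal, hypothetical illustration of that comparison (the simulated data and the randomForest package are assumptions here, not the module's three real-data examples), the R sketch below contrasts logistic regression with a random forest when the signal is nonlinear.

        # Hypothetical comparison on simulated data; not one of the module's real-data examples.
        library(randomForest)
        set.seed(1)
        n  <- 1000
        x1 <- rnorm(n); x2 <- rnorm(n)
        p  <- plogis(1.5 * x1 * x2 - 0.5 * x1^2)      # nonlinear signal in the class probabilities
        dat   <- data.frame(y = factor(rbinom(n, 1, p)), x1 = x1, x2 = x2)
        train <- sample(n, 700)

        glm_fit <- glm(y ~ x1 + x2, family = binomial, data = dat[train, ])
        rf_fit  <- randomForest(y ~ x1 + x2, data = dat[train, ])

        glm_class <- ifelse(predict(glm_fit, dat[-train, ], type = "response") > 0.5, "1", "0")
        rf_class  <- as.character(predict(rf_fit, dat[-train, ]))

        mean(glm_class == as.character(dat$y[-train]))  # holdout accuracy: logistic regression
        mean(rf_class  == as.character(dat$y[-train]))  # holdout accuracy: random forest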

    Keywords: bagging, boosting, classification and regression tree, random forests, data mining

    Sandip Sinharay

    Principal Research Scientist, Educational Testing Service

    Sandip Sinharay is a principal research scientist in the Research and Development division at ETS. He received his PhD in statistics from Iowa State University in 2001. He was editor of the Journal of Educational and Behavioral Statistics between 2011 and 2014. Sandip Sinharay has received four awards from the National Council on Measurement in Education: the award for Technical or Scientific Contributions to the Field of Educational Measurement (in 2009 and 2015), the Jason Millman Promising Measurement Scholar Award (2006), and the Alicia Cascallar Award for an Outstanding Paper by an Early Career Scholar (2005). He received the ETS Scientist Award in 2008 and the ETS Presidential Award twice. He has coedited two published volumes and authored or coauthored more than 75 articles in peer-reviewed statistics and psychometrics journals and edited books.

  • Module 42: Simulation Studies in Psychometrics

    Contains 8 Component(s)

    This ITEMS module provides a comprehensive introduction to the topic of simulation that can be easily understood by measurement specialists at all levels of training and experience.

    Simulation studies are fundamental to psychometric discourse and play a crucial role in operational and academic research. Yet, resources for psychometricians interested in conducting simulations are scarce. This Instructional Topics in Educational Measurement Series (ITEMS) module is meant to address this deficiency by providing a comprehensive introduction to the topic of simulation that can be easily understood by measurement specialists at all levels of training and experience. Specifically, this module describes the vocabulary used in simulations, reviews their applications in recent literature, and recommends specific guidelines for designing simulation studies and presenting results. Additionally, an example (including computer code in R) is given to demonstrate how common aspects of simulation studies can be implemented in practice and to provide a template to help users build their own simulation.
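
    For readers new to the topic, the base-R skeleton below illustrates the simulate-estimate-summarize cycle that such studies follow; the Rasch generating model, the sample sizes, and the deliberately crude difficulty estimator are all assumptions of this sketch, not the module's own example.

        # Generic simulation-study skeleton; the settings below are illustrative assumptions only.
        set.seed(123)
        n_reps   <- 100                               # replications
        n_person <- 500                               # examinees per replication
        n_item   <- 20                                # items
        true_b   <- seq(-2, 2, length.out = n_item)   # generating Rasch difficulties

        errors <- replicate(n_reps, {
          theta <- rnorm(n_person)                    # generate abilities
          prob  <- plogis(outer(theta, true_b, "-"))  # Rasch response probabilities
          x     <- matrix(rbinom(length(prob), 1, prob), n_person, n_item)
          est_b <- -qlogis(colMeans(x))               # crude difficulty estimate from p-values
          est_b - true_b                              # per-item estimation error
        })

        bias <- rowMeans(errors)                      # summarize recovery across replications
        rmse <- sqrt(rowMeans(errors^2))
        round(cbind(true_b, bias, rmse), 3)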

    Keywords: psychometrics, research design, simulation study

    Richard A. Feinberg

    National Board of Medical Examiners, Philadelphia, PA

    Richard Feinberg is a Senior Psychometrician with NBME, where he leads and oversees the data analysis and score reporting activities for large-scale high-stakes licensure and credentialing examinations. He is also an Assistant Professor at the Philadelphia College of Osteopathic Medicine, Philadelphia, PA, where he teaches a course on Research Methods and Statistics.

    His research interests include psychometric applications in the fields of educational and psychological testing.

    He earned a PhD in Research Methodology and Evaluation from the University of Delaware, Newark, DE.

    Jonathan D. Rubright

    National Board of Medical Examiners, Philadelphia, PA

  • Module 40: Item Fit Statistics for Item Response Theory Models

    Product not yet rated Contains 8 Component(s)

    This ITEMS module provides an overview of methods used for evaluating the fit of IRT models.

    Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model-data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.
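
    To illustrate the flavor of the Bayesian approach, here is a base-R sketch of a posterior predictive check for a Rasch model; the posterior draws (theta_draws, b_draws) and the choice of item proportion-correct as the discrepancy measure are assumptions of this sketch, not prescriptions from the module.

        # Posterior predictive check sketch; assumes posterior draws are already available from
        # some Bayesian fit: theta_draws (draws x persons), b_draws (draws x items), and the
        # observed 0/1 response matrix x (persons x items). Illustrative only.
        ppc_pvalues <- function(x, theta_draws, b_draws) {
          n_draw   <- nrow(theta_draws)
          obs_stat <- colMeans(x)                       # observed item proportion-correct
          rep_stat <- matrix(NA, n_draw, ncol(x))
          for (s in 1:n_draw) {
            prob  <- plogis(outer(theta_draws[s, ], b_draws[s, ], "-"))
            x_rep <- matrix(rbinom(length(prob), 1, prob), nrow(x), ncol(x))
            rep_stat[s, ] <- colMeans(x_rep)            # same statistic on replicated data
          }
          colMeans(sweep(rep_stat, 2, obs_stat, ">="))  # posterior predictive p-value per item
        }

    Other discrepancy measures (for example, item-rest correlations) can be substituted into the same loop in exactly the same way.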

    Keywords: item response theory, model-data fit, posterior predictive checks

    Allison J. Ames

    Department of Educational Research Methodology, University of North Carolina at Greensboro, NC

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he was awarded the 2005 Early Career Award by the National Council on Measurement in Education and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.

  • Module 39: Polytomous Item Response Theory Models: Problems with the Step Metaphor

    Product not yet rated Contains 1 Component(s)

    The Problem With the Step Metaphor for Polytomous Models for Ordinal Assessments

    Penfield’s (2014) “Instructional Module on Polytomous Item Response Theory Models” begins with a review of dichotomous response models, which he refers to as “The Building Blocks of Polytomous IRT Models: The Step Function.” The mathematics of these models and their interrelationships with the polytomous models are correct. Unfortunately, the step characterization for dichotomous responses, which he uses to explain the two most commonly used classes of polytomous models for ordered categories, is incompatible with the mathematical structure of these models. These two classes of models are referred to in Penfield’s paper as adjacent category models and cumulative models. At best, taken in the dynamic sense of taking a step, the step metaphor leads to a superficial understanding of the models as mere descriptions of the data; at worst, it leads to a misunderstanding of the models and of how they can be used to assess whether the empirical ordering of the categories is consistent with the intended ordering. The purpose of this note is to explain why the step metaphor is incompatible with both models and to summarize the distinct processes for each. It also shows, with concrete examples, how one of these models can be applied to better understand assessments in ordered categories.
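
    For readers who want the contrast in symbols, the two structures at issue can be written, in one common parameterization (paraphrased here, not quoted from the note), as:

        \text{adjacent-category (e.g., partial credit / polytomous Rasch):}\quad
        \log\frac{P(X = k \mid \theta)}{P(X = k-1 \mid \theta)} = \theta - \delta_k,
        \qquad k = 1, \dots, m

        \text{cumulative (e.g., graded response):}\quad
        \log\frac{P(X \ge k \mid \theta)}{P(X < k \mid \theta)} = a(\theta - b_k),
        \qquad k = 1, \dots, m

    The first compares adjacent categories only; the second dichotomizes the full response scale at each threshold, which is why the two models reflect distinct processes.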

    Keywords: graded response model, item response theory, polytomous items, polytomous Rasch model

    David Andrich

    Chapple Professor, Graduate School of Education, The University of Western Australia, Crawley, Western Australia

  • Module 38: A Simple Equation to Predict a Subscore’s Value

    Product not yet rated Contains 1 Component(s)

    This ITEMS module shows how to determine, through the use of a simple linear equation, whether a particular subscore adds enough value to be worth reporting.

    Subscores are often used to indicate test-takers’ relative strengths and weaknesses and so help focus remediation. But a subscore is not worth reporting if it is too unreliable to believe or if it contains no information that is not already contained in the total score. It is possible, through the use of a simple linear equation provided in this note, to determine if a particular subscore adds enough value to be worth reporting.
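
    For context, the value-added criterion commonly used in this literature (stated here in its general Haberman-style form, which the note's simple linear equation is designed to approximate; the equation itself is not reproduced here) is that a subscore s is worth reporting only if it predicts the true subscore \tau_s better than the total score x does:

        \mathrm{PRMSE}_s = \rho^2(s, \tau_s)\;\;(\text{the subscore's reliability}),
        \qquad
        \mathrm{PRMSE}_x = \rho^2(x, \tau_s),
        \qquad
        \text{report } s \text{ only if } \mathrm{PRMSE}_s > \mathrm{PRMSE}_x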

    Keywords: subscores, reliability, value added, orthogonality, proportional reduction in mean squared error

    Richard A. Feinberg

    National Board of Medical Examiners, Philadelphia, PA

    Richard Feinberg is a Senior Psychometrician with NBME, where he leads and oversees the data analysis and score reporting activities for large-scale high-stakes licensure and credentialing examinations. He is also an Assistant Professor at the Philadelphia College of Osteopathic Medicine, Philadelphia, PA, where he teaches a course on Research Methods and Statistics.

    His research interests include psychometric applications in the fields of educational and psychological testing.

    He earned a PhD in Research Methodology and Evaluation from the University of Delaware, Newark, DE.

    Howard Wainer

    Retired

    Howard Wainer is an American statistician, past principal research scientist at the Educational Testing Service, adjunct professor of statistics at the Wharton School of the University of Pennsylvania, and author, known for his contributions in the fields of statistics, psychometrics, and statistical graphics.

  • Module 37: Improving Subscore Value through Item Removal

    Product not yet rated Contains 1 Component(s)

    This ITEMS module shows, for a broad range of conditions of item overlap on subscores, that the value of the subscore is always improved by removing the overlapping items.

    Subscores can be of diagnostic value for tests that cover multiple underlying traits. Some items require knowledge or ability that spans more than a single trait. It is thus natural for such items to be included on more than a single subscore. Subscores only have value if they are reliable enough to justify conclusions drawn from them and if they contain information about the examinee that is distinct from what is in the total test score. In this study we show, for a broad range of conditions of item overlap on subscores, that the value of the subscore is always improved through the removal of such items.

    Keywords: empirical Bayes, overlapping items, ReliaVAR plots, simulation, value added

    Richard A. Feinberg

    National Board of Medical Examiners, Philadelphia, PA

    Richard Feinberg is a Senior Psychometrician with NBME, where he leads and oversees the data analysis and score reporting activities for large-scale high-stakes licensure and credentialing examinations. He is also an Assistant Professor at the Philadelphia College of Osteopathic Medicine, Philadelphia, PA, where he teaches a course on Research Methods and Statistics.

    His research interests include psychometric applications in the fields of educational and psychological testing.

    He earned a PhD in Research Methodology and Evaluation from the University of Delaware, Newark, DE.

    Howard Wainer

    Retired

    Howard Wainer is an American statistician, past principal research scientist at the Educational Testing Service, adjunct professor of statistics at the Wharton School of the University of Pennsylvania, and author, known for his contributions in the fields of statistics, psychometrics, and statistical graphics.

  • Module 36: Quantifying Error and Uncertainty Reductions in Scaling Functions

    Product not yet rated Contains 1 Component(s)

    This ITEMS module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results.

    This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and scaling function measures are compared in terms of their uncertainty reductions, error variances, and the contribution of true score and measurement error variances to the total error variances. The measures are also demonstrated as applied to an assessment of scaling results for a math test and a reading test. The results of these analyses illustrate the similarity of the regression and scaling measures for scaling situations when the tests have a correlation of at least .80, and also show the extent to which the measures can be adequate summaries of nonlinear regression and nonlinear scaling functions, and of heteroskedastic errors. After reading this module, readers will have a comprehensive understanding of the purposes, uses, and differences of regression and scaling functions.
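
    As background for the analogy (a standard regression identity, not a formula taken from the module), the prediction-error measure that the scaling measures parallel is the proportional reduction in error variance:

        R^2 \;=\; 1 - \frac{\operatorname{Var}(Y - \hat{Y})}{\operatorname{Var}(Y)}

    The scaling analogues described in the abstract assess the same kind of uncertainty reduction, with the X-to-Y scaling function playing the role of the regression prediction.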

    Keywords: scaling, equating, concordance, regression, prediction error, scaling error

    Tim Moses

    Chief Psychometrician and Robert L. Brennan Chair of Psychometric Research at the College Board

  • Module 35: Polytomous Item Response Theory Models

    Product not yet rated Contains 1 Component(s)

    This ITEMS module provides an accessible overview of polytomous IRT models.

    A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model’s underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
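
    For orientation, the two most widely used families take the following standard forms (generic notation, not reproduced from the module): the partial credit model builds category probabilities from adjacent-category steps, while the graded response model works from cumulative boundary curves.

        \text{Partial credit model:}\quad
        P(X = k \mid \theta) =
        \frac{\exp \sum_{j=0}^{k} (\theta - \delta_j)}
             {\sum_{r=0}^{m} \exp \sum_{j=0}^{r} (\theta - \delta_j)},
        \qquad \text{with } \sum_{j=0}^{0} (\theta - \delta_j) \equiv 0

        \text{Graded response model:}\quad
        P(X = k \mid \theta) = P_k^{*}(\theta) - P_{k+1}^{*}(\theta),
        \qquad
        P_k^{*}(\theta) = \frac{1}{1 + \exp[-a(\theta - b_k)]},\;\;
        P_0^{*} \equiv 1,\; P_{m+1}^{*} \equiv 0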

    Keywords: item response theory, polytomous items, partial credit model, graded response model, nominal response model

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he was awarded the 2005 Early Career Award by the National Council on Measurement in Education and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.