Catalog Advanced Search


54 Results

  • Module 35: Polytomous Item Response Theory Models

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module provides an accessible overview of polytomous IRT models.

    A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from each model’s underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.

    Keywords: item response theory, polytomous items, partial credit model, graded response model, nominal response model
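
    As a brief illustration of how such models are built from step functions, the sketch below states Samejima's graded response model for an item scored 0 through m; the symbols (θ for ability, a for discrimination, b_k for the category thresholds) follow common textbook notation rather than anything specific to this module.

    ```latex
    % Graded response model (Samejima): each step k has a cumulative
    % step function giving the probability of reaching category k or higher,
    P^{*}_{k}(\theta) = \frac{\exp\{a(\theta - b_{k})\}}{1 + \exp\{a(\theta - b_{k})\}},
    \qquad k = 1, \dots, m, \qquad P^{*}_{0}(\theta) \equiv 1, \quad P^{*}_{m+1}(\theta) \equiv 0,
    % and the probability of scoring exactly k is the difference of adjacent steps:
    P_{k}(\theta) = P^{*}_{k}(\theta) - P^{*}_{k+1}(\theta), \qquad k = 0, 1, \dots, m.
    ```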

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he was awarded the 2005 early career award by the National Council on Measurement in Education, and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.

  • Module 34: Automated Item Generation

    Contains 1 Component(s)

    This ITEMS module describes and illustrates a template-based method for generating test items.

    Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content-specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template-based method for generating test items. We outline a three-step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer-based algorithms to generate new items. Using this template-based approach, hundreds or even thousands of new items can be generated with a single item model.

    Keywords: automatic item generation, item model, item development, test development, technology and testing
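
    To make the three-step, template-based approach described above concrete, here is a minimal Python sketch; the item model, content lists, and answer-key rule are invented for illustration and are not drawn from the module itself.

    ```python
    from itertools import product

    # Step 1: an item model -- a stem template whose bracketed features can be
    # manipulated to produce new items (hypothetical example).
    ITEM_MODEL = ("A patient takes {dose} mg of {drug} every {interval} hours. "
                  "How many mg does the patient take in 24 hours?")

    # Step 2: content identified and structured for each feature of the model.
    CONTENT = {
        "dose": [250, 500],
        "drug": ["ibuprofen", "acetaminophen"],
        "interval": [6, 8, 12],
    }

    # Step 3: systematically manipulate the features with an algorithm to
    # generate every combination as a new item, together with its answer key.
    def generate_items(model, content):
        items = []
        keys = list(content)
        for values in product(*(content[k] for k in keys)):
            features = dict(zip(keys, values))
            stem = model.format(**features)
            key = features["dose"] * (24 // features["interval"])
            items.append({"stem": stem, "key": key})
        return items

    if __name__ == "__main__":
        # This toy model yields 2 x 2 x 3 = 12 items; operational item models
        # with more features can yield hundreds or thousands.
        for item in generate_items(ITEM_MODEL, CONTENT):
            print(item["stem"], "->", item["key"])
    ```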

    Mark J. Gierl

    Professor, Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, AB, Canada

    Dr. Gierl's research interests include educational and psychological measurement, focusing on assessment engineering (cognitive modeling, automatic item generation, automated test assembly, and automatic essay scoring); cognitive diagnostic assessment; differential item and bundle functioning; unidimensional and multidimensional item response theory; and psychometric methods for evaluating test translation and adaptation.

    Hollis Lai

    Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, AB, Canada

    Dr. Lai's research interests include technology in assessment, automatic item generation, computer adaptive testing, assessment in medical education, cognitive diagnostic assessment, learning science, and applications of artificial intelligence in education.

  • Module 33: Population Invariance in Linking and Equating

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module provides a comprehensive overview of population invariance in linking and equating and the relevant methodology developed for evaluating violations of invariance.

    A goal for any linking or equating of two or more tests is that the linking function be invariant to the population used in conducting the linking or equating. Violations of population invariance in linking and equating jeopardize the fairness and validity of test scores, and pose particular problems for test-based accountability programs that require schools, districts, and states to report annual progress on academic indicators disaggregated by demographic group membership. This instructional module provides a comprehensive overview of population invariance in linking and equating and the relevant methodology developed for evaluating violations of invariance. A numeric example is used to illustrate the comparative properties of available methods, and important considerations for evaluating population invariance in linking and equating are presented.

    Keywords: equating, invariance, linking
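
    For orientation, one commonly cited index of (lack of) population invariance, the root mean square difference between subgroup and total-group linking functions, takes roughly the form below; the notation is a simplified restatement and may differ from the module's own.

    ```latex
    % RMSD at score point x (subgroup linking functions e_g vs. the
    % total-group linking function e, with subgroup weights w_g):
    \mathrm{RMSD}(x) \;=\;
      \frac{\sqrt{\sum_{g} w_{g}\,\bigl[e_{g}(x) - e(x)\bigr]^{2}}}{\sigma_{Y}},
    % where sigma_Y is the standard deviation of the target score scale.
    % Values near zero at all x indicate an approximately population-invariant
    % linking; aggregating over x yields a single summary index.
    ```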

    Anne C. Huggins

    Research and Evaluation Methodology, University of Florida

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he was awarded the 2005 early career award by the National Council on Measurement in Education, and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.

  • Module 32: Subscores

    Product not yet rated | Contains 3 Component(s)

    This ITEMS module provides an introduction to subscores.

    The purpose of this ITEMS module is to provide an introduction to subscores. First, examples of subscores from an operational test are provided. Then, a review of methods that can be used to examine if subscores have adequate psychometric quality is provided. It is demonstrated, using results from operational and simulated data, that subscores have to be based on a sufficient number of items and have to be sufficiently distinct from each other to have adequate psychometric quality. It is also demonstrated that several operationally reported subscores do not have adequate psychometric quality. Recommendations are made for those interested in reporting subscores for educational tests.

    Keywords: augmented subscore, classical test theory, diagnostic score, item response theory, mean squared error
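
    One widely used screening rule in this literature (associated with Haberman) can be summarized as follows; this is a loose restatement under standard classical test theory assumptions, not the module's own derivation.

    ```latex
    % A subscore S is worth reporting only if it predicts its own true score
    % S_T better than the total score X does, in terms of the proportional
    % reduction in mean squared error (PRMSE):
    \mathrm{PRMSE}(S_{T} \mid S) \;>\; \mathrm{PRMSE}(S_{T} \mid X),
    % where PRMSE(S_T | S) equals the reliability of the subscore. Subscores
    % based on few items, or not sufficiently distinct from the rest of the
    % test, tend to fail this comparison.
    ```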

    Sandip Sinharay

    Principal Research Scientist, Educational Testing Service

    Sandip Sinharay is a principal research scientist in the Research and Development division at ETS. He received his Ph.D. degree in statistics from Iowa State University in 2001. He was editor of the Journal of Educational and Behavioral Statistics between 2011 and 2014. Sandip Sinharay has received four awards from the National Council on Measurement in Education: the award for Technical or Scientific Contributions to the Field of Educational Measurement (in 2009 and 2015), the Jason Millman Promising Measurement Scholar Award (2006), and the Alicia Cascallar Award for an Outstanding Paper by an Early Career Scholar (2005). He received the ETS Scientist award in 2008 and the ETS Presidential award twice. He has coedited two published volumes and authored or coauthored more than 75 articles in peer-reviewed statistics and psychometrics journals and edited books.

    Gautam Puhan

    Principal Psychometrician, Educational Testing Service

    Gautam Puhan is a Principal Psychometrician in the SAT team at Educational Testing Service. Gautam received a Ph.D. in educational psychology from the University of Alberta, Canada, and an M.A. in psychology from the University of Delhi, India. His research interests include test score equating, diagnostic subscores, test reliability, differential item functioning (DIF) and test fairness.

    Shelby J. Haberman

  • Module 31: Scaling

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of information into a score scale, and introduces vertical scaling and its related designs and methodologies as a special type of scaling.

    Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of information into a score scale, and introduces vertical scaling and its related designs and methodologies as a special type of scaling. After completion of this module, the reader should be able to understand the relationship between various types of raw scores, understand the relationship between raw scores and scale scores, construct a scale with desired properties, evaluate an existing score scale, understand how content and standards information are built into a scale, and understand how vertical scales are developed and used in practice.

    Keywords: raw score, scale score, scaling, vertical scaling
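
    As a small worked illustration of building desired properties into a score scale, the linear transformation below fixes the scale-score mean and standard deviation at chosen values; the symbols are generic and not taken from the module.

    ```latex
    % Linear raw-to-scale transformation with target mean mu_S and target
    % standard deviation sigma_S:
    S(x) = A\,x + B, \qquad
    A = \frac{\sigma_{S}}{\sigma_{X}}, \qquad
    B = \mu_{S} - A\,\mu_{X},
    % where mu_X and sigma_X are the raw-score mean and standard deviation.
    % For example, raw scores with mean 30 and SD 6 mapped to a scale with
    % mean 500 and SD 100 use A = 100/6 and B = 500 - (100/6)(30) = 0.
    ```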

    Ye Tong

    Vice President, Pearson, Boston

    Michael J. Kolen

    Professor of Educational Measurement, University of Iowa, Iowa City, IA

    Michael J. Kolen is a Professor of Educational Measurement at the University of Iowa. Dr. Kolen received his doctorate from the University of Iowa in 1979, his MA degree from the University of Arizona in 1975, and his BS degree from the University of Iowa in 1973. He served on the faculty at Hofstra University from 1979 to 1981, and he worked at American College Testing (ACT) from 1981 to 1997, including serving as Director of Measurement Research at ACT from 1990 to 1997.

    Dr. Kolen co-authored the book Test Equating: Methods and Practices, published by Springer-Verlag. He has published numerous articles and book chapters on various topics in educational measurement and statistics, including test equating and scaling.

    Dr. Kolen has been President of the National Council on Measurement in Education (NCME), and is past editor of the Journal of Educational Measurement. He is a Fellow of Division 5 of the American Psychological Association, a Fellow of the American Educational Research Association, and a member of various other professional organizations. Dr. Kolen received the 1997 NCME Award for Outstanding Technical Contribution to the Field of Educational Measurement and the 2008 NCME Award for Career Contributions to Educational Measurement.

  • Module 30: Booklet Designs in Large-Scale Assessments

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module describes the construction of booklet designs as the task of allocating items to booklets under context-specific constraints.

    In most large-scale assessments of student achievement, several broad content domains are tested. Because more items are needed to cover the content domains than can be presented in the limited testing time to each individual student, multiple test forms or booklets are utilized to distribute the items to the students. The construction of an appropriate booklet design is a complex and challenging endeavor that has far-reaching implications for data calibration and score reporting. This module describes the construction of booklet designs as the task of allocating items to booklets under context-specific constraints. Several types of experimental designs are presented that can be used as booklet designs. The theoretical properties and construction principles for each type of design are discussed and illustrated with examples. Finally, the evaluation of booklet designs is described and future directions for researching, teaching, and reporting on booklet designs for large-scale assessments of student achievement are identified.

    Keywords: booklet design, experimental design, item response theory, large-scale assessments, measurement
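
    The Python sketch below builds one classic experimental design that can serve as a booklet design, a balanced incomplete block design constructed cyclically from a difference set, and then checks its balance properties; the sizes (seven item clusters, seven booklets of three clusters each) are chosen purely for illustration.

    ```python
    from itertools import combinations
    from collections import Counter

    # Balanced incomplete block design via the difference set {0, 1, 3} mod 7:
    # seven item clusters are allocated to seven booklets of three clusters each.
    N_CLUSTERS = 7
    BASE_BOOKLET = (0, 1, 3)

    booklets = [tuple(sorted((c + shift) % N_CLUSTERS for c in BASE_BOOKLET))
                for shift in range(N_CLUSTERS)]

    # Evaluate the design: how often each cluster appears, and how often each
    # pair of clusters is administered together in the same booklet.
    cluster_counts = Counter(c for b in booklets for c in b)
    pair_counts = Counter(p for b in booklets for p in combinations(b, 2))

    print("Booklets:", booklets)
    print("Cluster replication:", set(cluster_counts.values()))   # {3}
    print("Pairwise co-occurrence:", set(pair_counts.values()))   # {1} -> balanced
    ```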

    Andre A. Rupp

    Research Director

    Dr. Rupp is Research Director of the Integrated Scoring Research (iSCORE) group in the Psychometric Analysis and Research area at the Educational Testing Service (ETS) in Princeton, New Jersey. Dr. Rupp currently leads a research team whose work focuses on evidentiary reasoning for digitally-delivered performance-based assessments, specifically evaluations of human scoring processes and automated scoring systems for written, spoken, and multimodal performances. He has published widely on a variety of educational measurement topics, including applications of evidence-centered design, cognitive diagnostic measurement, and automated scoring, often with a didactic and conceptual synthesis approach. Notable larger volumes include a co-written book entitled Diagnostic Measurement: Theory, Methods, and Applications (2010), which won the Significant Contribution to Educational Measurement and Research Methodology award from AERA Division D in 2012 (with Jonathan Templin and Robert Henson), and the co-edited Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications (2016), which won the Outstanding Contribution to Practice award from the associated AERA SIG (with Jacqueline P. Leighton). He is currently working on a co-edited Handbook of Automated Scoring (with Duanli Yan and Peter Foltz). He is a reviewer for many well-known measurement journals and is the lead editor and developer of the ITEMS portal for NCME (2016-2019). He has extensive teaching experience from his prior positions in academia, most recently as an associate professor with tenure in the Quantitative Methodology Program in the Department of Human Development and Quantitative Methodology at the University of Maryland in College Park, Maryland.

    Andreas Frey

    Full Professor for Educational Research Methods, Friedrich Schiller University Jena, Germany

    Dr. Frey's research interests include:

    • Empirical educational research
    • Innovative pedagogical-psychological diagnostics
    • Computerized Adaptive Testing (CAT)
    • Item Response Theory

    Johannes Hartig

    Research Professor, Educational Quality and Evaluation, German Institute for International Educational Research (DIPF)

    Johannes Hartig works in the Educational Quality and Evaluation department at the German Institute for International Educational Research (DIPF). His research focuses on educational psychology, psychometrics, and differential psychology.

  • Module 29: Differential Step Functioning for Polytomous Items

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items.

    Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item-level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has recently been proposed, whereby measurement invariance is examined within each step underlying the polytomous response variable. The examination of DSF can provide valuable information concerning the nature of the DIF effect (i.e., is the DIF an item-level effect or an effect isolated to specific score levels), the location of the DIF effect (i.e., precisely which score levels are manifesting the DIF effect), and the potential causes of a DIF effect (i.e., what properties of the item stem or task are potentially biasing). This article presents a didactic overview of the DSF framework and provides specific guidance and recommendations on how DSF can be used to enhance the examination of DIF in polytomous items. An example with real testing data is presented to illustrate the comprehensive information provided by a DSF analysis.

    Keywords: differential item functioning, polytomous items, graded response model
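
    Schematically, and in generic notation rather than the article's own, DSF dichotomizes the polytomous response at each step and compares groups step by step, for example via a conditional log odds ratio:

    ```latex
    % Effect at step k (reaching category k or higher), conditional on the
    % matching/ability variable, for reference (R) and focal (F) groups:
    \lambda_{k} \;=\; \log\!\left[
      \frac{P_{R}(X \ge k \mid \theta)\,/\,\{1 - P_{R}(X \ge k \mid \theta)\}}
           {P_{F}(X \ge k \mid \theta)\,/\,\{1 - P_{F}(X \ge k \mid \theta)\}}
    \right].
    % An item-level DIF index effectively pools the lambda_k across steps;
    % DSF inspects each lambda_k separately, revealing whether the effect is
    % roughly constant across steps or concentrated at particular score levels.
    ```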

    Randall D. Penfield

    Professor, Educational Research Methodology, University of North Carolina at Greensboro, NC

    Dr. Penfield is Dean of the School of Education and a Professor of educational measurement and assessment. His research focuses on issues of fairness in testing, validity of test scores, and the advancement of methods and statistical models used in the field of assessment. In recognition of his scholarly productivity, he was awarded the 2005 early career award by the National Council on Measurement in Education, and was named a Fellow of the American Educational Research Association in 2011. In addition, he has served as co-principal investigator or consultant on numerous federal grants funded by the National Science Foundation and the Department of Education.

    Karina Gattamorta

    Research Associate Professor at the School of Nursing and Health Studies, University of Miami

    Karina Gattamorta is a Research Associate Professor at the School of Nursing and Health Studies at the University of Miami. She earned her PhD in Educational Research, Measurement, and Evaluation in 2009 from The School of Education at UM and an EdS in School Psychology in 2005 from Florida International University. In her current role, she teaches courses in introductory and intermediate statistics, measurement, and research methods in both graduate and undergraduate programs. In 2013 she was awarded a Diversity Supplement that allowed her to expand on her interests tackling health disparities among Hispanic adolescents, and in particular, the interconnectedness of family functioning, mental health, and substance abuse. More recently, she began pursuing research interests examining the relationships between family functioning, mental health, substance abuse, and risky sexual behaviors in Hispanic lesbian, gay, bisexual, and transgender (LGBT) adolescents. Her current research examines the role of families and the coming out experiences of Hispanic sexual minorities. Her research aims to understand and ultimately help reduce health disparities in mental health, substance abuse, and HIV risk among sexual minorities.

    Ruth A. Childs

    Ontario Research Chair, Ontario Institute for Studies in Education of the University of Toronto

    Ruth Childs conducts research on the design and equity of large-scale assessments, admissions processes, and other evaluation systems. Her most recent large research projects investigated how elementary students deal with uncertainty when answering multiple-choice questions and what Ontario's universities are doing to improve access for underrepresented groups. 

  • Module 28: Raju’s Differential Functioning of Items and Tests

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module explains DFIT and shows how this methodology can be utilized in a variety of DIF applications.

    Nambury S. Raju (1937–2005) developed two model-based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju’s area measures (Raju, 1988) and Raju’s DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach provides an intuitive and flexible methodology for assessing DIF. The purpose of this tutorial is to explain DFIT and show how this methodology can be utilized in a variety of DIF applications.

    Keywords: differential item functioning (DIF), differential test functioning (DTF), measurement equivalence, item response theory (IRT)
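
    For orientation, the two families of indices named above can be written, up to notational details that vary across papers, roughly as follows; treat this as a hedged summary rather than a substitute for the tutorial.

    ```latex
    % Raju's area measures: signed and unsigned area between the reference-
    % and focal-group item characteristic functions P_R and P_F:
    \mathrm{SA} = \int \bigl[P_{R}(\theta) - P_{F}(\theta)\bigr]\,d\theta, \qquad
    \mathrm{UA} = \int \bigl|P_{R}(\theta) - P_{F}(\theta)\bigr|\,d\theta .

    % DFIT: noncompensatory DIF for item i and differential test functioning,
    % as expectations over the focal-group ability distribution (T is the test
    % characteristic, i.e., expected total score, function):
    \mathrm{NCDIF}_{i} = E_{F}\bigl[\{P_{iF}(\theta) - P_{iR}(\theta)\}^{2}\bigr], \qquad
    \mathrm{DTF} = E_{F}\bigl[\{T_{F}(\theta) - T_{R}(\theta)\}^{2}\bigr].
    ```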

    Chris Oshima

    Professor, Georgia State University

    Chris Oshima is a full professor in the Department of Educational Policy Studies at Georgia State University. She graduated from the University of Florida in 1989 with a Ph.D. in foundations of education, specializing in research and evaluation design, testing and measurement, and data analysis methods. During her tenure at Georgia State University, she has taught numerous courses in quantitative methods and measurement, including Quantitative Methods and Analysis I, II, III, Educational Measurement, Introduction to Item Response Theory, and Advanced Item Response Theory. Her primary research interests are in educational measurement and statistics, especially in the area of item response theory (IRT) and differential item functioning (DIF).

    Scott Morris

    Professor of Psychology, Lewis College of Human Sciences, Illinois Institute of Technology

    Scott Morris received a Ph.D. in Industrial-Organizational Psychology in 1994 from the University of Akron. He earned a B.A. in Psychology from the University of Northern Iowa in 1987. He is a fellow of the Society for Industrial and Organizational Psychology, and serves on the editorial boards of Journal of Applied Psychology, Organizational Research Methods, Journal of Business and Psychology, and International Journal of Testing.

    Morris teaches courses in personnel selection, covering topics such as job analysis, test development and validation, and legal issues. He also teaches courses in basic and multivariate statistics and multilevel data analysis.

    Morris is actively involved in research on applied statistics and personnel selection. Much of his work involves the development of statistical methods. This includes methods of meta-analysis for program evaluation research, statistics for assessing adverse impact in employee selection systems, and applications of advanced psychometric models (e.g., computer adaptive testing). He also conducts research exploring issues of validity and discrimination in employee selection systems.

  • Module 27: Markov Chain Monte Carlo Methods for Item Response Theory Models

    Product not yet rated | Contains 1 Component(s)

    This ITEMS module provides an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models.

    The purpose of this ITEMS module is to provide an introduction to Markov chain Monte Carlo (MCMC) estimation for item response models. A brief description of Bayesian inference is followed by an overview of the various facets of MCMC algorithms, including discussion of prior specification, sampling procedures, and methods for evaluating chain convergence. Model comparison and fit issues in the context of MCMC are also considered. Finally, an illustration is provided in which a two-parameter logistic (2PL) model is fit to item response data from a university mathematics placement test through MCMC using the WinBUGS 1.4 software. While MCMC procedures are often complex and can be easily misused, it is suggested that they offer an attractive methodology for experimentation with new and potentially complex IRT models, as are frequently needed in real-world applications in educational measurement.

    Keywords: Bayesian estimation, goodness-of-fit, item response theory models, Markov chain Monte Carlo, model comparison
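
    The module's own illustration uses WinBUGS; as a complement, here is a minimal random-walk Metropolis-within-Gibbs sketch for a 2PL model on simulated data, with simple normal priors and tuning constants chosen only to make the sampler run, not to be efficient.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Simulate 2PL responses: P(Y = 1) = logistic(a_i * (theta_p - b_i)).
    n_persons, n_items = 300, 10
    theta_true = rng.normal(0, 1, n_persons)
    a_true = rng.uniform(0.8, 2.0, n_items)
    b_true = rng.normal(0, 1, n_items)
    Y = rng.binomial(1, logistic(a_true * (theta_true[:, None] - b_true)))

    def loglik_person(th, y, a, b):
        pr = logistic(a * (th - b))
        return np.sum(y * np.log(pr) + (1 - y) * np.log(1 - pr))

    def loglik_item(a_i, b_i, y, theta):
        pr = logistic(a_i * (theta - b_i))
        return np.sum(y * np.log(pr) + (1 - y) * np.log(1 - pr))

    n_iter = 2000
    theta = np.zeros(n_persons)
    a, b = np.ones(n_items), np.zeros(n_items)
    a_draws = np.zeros((n_iter, n_items))
    b_draws = np.zeros((n_iter, n_items))

    for it in range(n_iter):
        # Update each ability with a random-walk proposal; prior theta ~ N(0, 1).
        for p in range(n_persons):
            prop = theta[p] + rng.normal(0, 0.5)
            log_r = (loglik_person(prop, Y[p], a, b) - 0.5 * prop ** 2) \
                  - (loglik_person(theta[p], Y[p], a, b) - 0.5 * theta[p] ** 2)
            if np.log(rng.uniform()) < log_r:
                theta[p] = prop
        # Update each item's (log a, b); priors log a ~ N(0, 1), b ~ N(0, 1).
        for i in range(n_items):
            la_prop = np.log(a[i]) + rng.normal(0, 0.1)
            b_prop = b[i] + rng.normal(0, 0.2)
            log_r = (loglik_item(np.exp(la_prop), b_prop, Y[:, i], theta)
                     - 0.5 * la_prop ** 2 - 0.5 * b_prop ** 2) \
                  - (loglik_item(a[i], b[i], Y[:, i], theta)
                     - 0.5 * np.log(a[i]) ** 2 - 0.5 * b[i] ** 2)
            if np.log(rng.uniform()) < log_r:
                a[i], b[i] = np.exp(la_prop), b_prop
        a_draws[it], b_draws[it] = a, b

    # Discard burn-in and compare posterior means with the generating values.
    burn = n_iter // 2
    print("a (true):", a_true.round(2))
    print("a (post):", a_draws[burn:].mean(axis=0).round(2))
    print("b (true):", b_true.round(2))
    print("b (post):", b_draws[burn:].mean(axis=0).round(2))
    ```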

    Jee-Seon Kim

    Professor, Department of Educational Psychology, University of Wisconsin-Madison

    Jee-Seon Kim is Professor in the Department of Educational Psychology and is also affiliated with the Interdisciplinary Training Program in the Education Sciences and the Center for Health Enhancement Systems Studies at the University of Wisconsin-Madison. Her research focuses on developing and applying statistical models to address practical issues in the behavioral sciences. Her research interests include multilevel modeling, imputation of missing data, longitudinal data analysis, latent variable modeling, and propensity score analysis. She has been a fellow of the Spencer Foundation, consulting editor for Psychological Methods, and book review editor for Psychometrika.

    Daniel M. Bolt

    Professor, Department of Educational Psychology, University of Wisconsin-Madison

    Dr. Bolt joined the department in the spring of 1999, coming from the Laboratory for Educational and Psychological Measurement at the University of Illinois. In addition to his own research, he collaborates on various projects related to the development and statistical analysis of educational and psychological tests. Dr. Bolt teaches courses in test theory, factor analysis, and hierarchical linear modeling.

  • Module 26: Structural Equation Modeling

    Product not yet rated | Contains 1 Component(s)

    This module focuses on foundational issues to inform readers of the potentials as well as the limitations of SEM.

    Structural equation modeling (SEM) is a versatile statistical modeling tool. Its estimation techniques, modeling capacities, and breadth of applications are expanding rapidly. This module introduces some common terminologies. General steps of SEM are discussed along with important considerations in each step. Simple examples are provided to illustrate some of the ideas for beginners. In addition, several popular specialized SEM software programs are briefly discussed with regard to their features and availability. The intent of this module is to focus on foundational issues to inform readers of the potentials as well as the limitations of SEM. Interested readers are encouraged to consult additional references for advanced model types and more application examples.

    Keywords: structural equation modeling, path model, measurement model
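
    For readers new to the notation, a structural equation model is often written as a pair of measurement models plus a structural model, as in the generic (LISREL-style) sketch below; the symbols are standard textbook conventions, not the module's specific example.

    ```latex
    % Measurement models: observed indicators x and y load on latent
    % exogenous (xi) and endogenous (eta) variables, with measurement errors:
    x = \Lambda_{x}\,\xi + \delta, \qquad y = \Lambda_{y}\,\eta + \varepsilon .

    % Structural model: relations among the latent variables themselves:
    \eta = B\,\eta + \Gamma\,\xi + \zeta ,
    % where Lambda_x and Lambda_y contain factor loadings, B and Gamma contain
    % structural coefficients, and delta, epsilon, and zeta are error terms.
    ```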

    Pui-Wa Lei

    Professor of Education, Pennsylvania State University

    Dr. Lei’s teaching and research interests are in the areas of advanced statistical methods and measurement theories. Her research has focused on applications of item response theory (IRT) and methodological issues of multivariate statistical analyses. Currently, she is studying issues related to applications of structural equation modeling (SEM), multilevel modeling, and IRT modeling.

    Qiong Wu