Module 25: Multistage Testing
This ITEM module describes multistage tests, including two-stage and testlet-based tests, and discusses the relative advantages and disadvantages of multistage testing as well as considerations and steps in creating such tests.
Multistage tests are those in which sets of items are administered adaptively and are scored as a unit. These tests have all of the advantages of adaptive testing, with more efficient and precise measurement across the proficiency scale as well as time savings, without many of the disadvantages of an item-level adaptive test. As a seemingly balanced compromise between linear paper-and-pencil and item-level adaptive tests, the development and use of multistage tests are increasing. This module describes multistage tests, including two-stage and testlet-based tests, and discusses the relative advantages and disadvantages of multistage testing as well as considerations and steps in creating such tests.
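As a rough illustration of the two-stage design mentioned above, the sketch below scores a routing set of items as a unit and uses that score to choose a second-stage set. The function name, the number-correct routing rule, the cut points (4 and 8), and the three difficulty labels are all hypothetical choices made for this example; the module itself does not prescribe them, and operational programs may route on other statistics.

```python
def route_second_stage(routing_responses, cuts=(4, 8)):
    """Pick a second-stage item set from scored routing responses.

    routing_responses: list of 0/1 item scores from the first-stage set.
    cuts: hypothetical (low, high) number-correct cut points.
    """
    low, high = cuts
    # The routing set is scored as a unit: one number-correct score.
    score = sum(routing_responses)
    if score < low:
        return "easy"
    if score < high:
        return "medium"
    return "hard"


# Hypothetical examinee: 5 correct out of 10 routing items.
second_stage = route_second_stage([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
```

The adaptation here happens once, between sets of items, rather than after every item as in an item-level adaptive test.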
Keywords: adaptive, multistage, testlet
Senior Director, Psychometrics at The College Board
Amy Hendrickson is an adjunct Assistant Professor in the Department of Measurement, Statistics and Evaluation at College Park, and currently works for the College Board. She received her M.S. in Educational Psychology from Iowa State University in 1997 and her Ph.D. in Educational Measurement and Statistics in 2002 from the University of Iowa. Her research interests include test equating and scaling, polytomous item response theory, and computerized adaptive testing.
Module 24: Quality Control for Scoring, Equating, and Reporting
This ITEM module describes quality control as a formal systematic process designed to ensure that expected quality standards are achieved during scoring, equating, and reporting of test scores.
There is significant potential for error in long production processes that consist of sequential stages, each of which is heavily dependent on the previous stage, such as the SER (Scoring, Equating, and Reporting) process. Quality control procedures are required in order to monitor this process and to reduce the number of mistakes to a minimum. In the context of this module, quality control is a formal systematic process designed to ensure that expected quality standards are achieved during scoring, equating, and reporting of test scores. The module divides the SER process into 11 steps. For each step, possible mistakes that might occur are listed, followed by examples and quality control procedures for avoiding, detecting, or dealing with these mistakes. Most of the listed quality control procedures are also relevant for Internet-delivered and scored testing. Lessons from other industries are also discussed. The motto of this module is: There is a reason for every mistake. If you can identify the mistake, you can identify the reason it happened and prevent it from recurring.
Keywords: scoring, equating, score reporting, quality control, standards
Director of Scoring and Equating, National Institute for Testing and Evaluation, Jerusalem, Israel
Dr. Avi Allalouf is the director of Scoring & Equating at the National Institute for Testing and Evaluation (NITE). He received his PhD in Psychology from the Hebrew University in Jerusalem (1995). He teaches at the Academic College of Tel-Aviv-Yaffo. His primary areas of research are test adaptation, DIF, test scoring, quality control, and testing & society. Dr. Allalouf leads the Exhibition on Testing & Measurement project and served as co-editor of the International Journal of Testing (IJT).
Module 23: Practice Analysis Questionnaires: Design and Administration
This ITEM module describes procedures for developing practice analysis surveys with emphasis on task inventory questionnaires.
The purpose of a credentialing examination is to assure the public that individuals who work in an occupation or profession have met certain standards. To be consistent with this purpose, credentialing examinations must be job related, and this requirement is typically met by developing test plans based on an empirical job or practice analysis. The purpose of this module is to describe procedures for developing practice analysis surveys, with emphasis on task inventory questionnaires. Editorial guidelines for writing task statements are presented, followed by a discussion of issues related to the development of scales for rating tasks and job responsibilities. The module also offers guidelines for designing and formatting both mail-out and Internet-based questionnaires. It concludes with a brief overview of the types of data analyses useful for practice analysis questionnaires.
Keywords: licensure, certification, job analysis
Mark R. Raymond
Module 22: Standard Setting: Contemporary Methods
This ITEM module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification.
This module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification. Upon completing the module, readers will be able to: describe what standard setting is; understand why standard setting is necessary; recognize some of the purposes of standard setting; calculate cut scores using various methods; and identify elements to be considered when evaluating standard-setting procedures. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.
Keywords: cut scores, performance standards, standard setting
Gregory J. Cizek
Michael B. Bunch
Module 21: Multidimensional Item Response Theory
This ITEM module illustrates how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process.
Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses (from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically) are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.
Keywords: dimensionality, multidimensional item response theory, test development and analysis
Module 20: Rule-space Methodology
This ITEM module examines the logic of Tatsuoka's rule-space model, as it applies to test development and analysis.
K. Tatsuoka's rule-space model is a statistical method for classifying examinees' test item responses into a set of attribute-mastery patterns associated with different cognitive skills. A fundamental assumption in the model resides in the idea that test items may be described by specific cognitive skills called attributes, which can include distinct procedures, skills, or processes possessed by an examinee. The rule-space model functions by collecting and ordering information about the attributes required to solve test items and then statistically classifying examinees' test item responses into a set of attribute-mastery patterns, each one associated with a unique cognitive blueprint. The logic of Tatsuoka's rule-space model, as it applies to test development and analysis, is examined in this module. Controversies and unresolved issues are also presented and discussed.
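To make the link between attributes and item responses concrete, the sketch below derives the expected ("ideal") response pattern for each attribute-mastery pattern from a small Q-matrix. It assumes a conjunctive rule, under which an item is answered correctly only when every attribute it requires has been mastered. The Q-matrix, the three attributes, and the function name are hypothetical, and the statistical classification of observed responses described above is deliberately left out.

```python
from itertools import product

# Hypothetical Q-matrix: rows are items, columns are attributes.
# A 1 means the item requires that attribute.
Q = [
    (1, 0, 0),  # item 1 requires attribute A
    (1, 1, 0),  # item 2 requires attributes A and B
    (0, 1, 1),  # item 3 requires attributes B and C
    (0, 0, 1),  # item 4 requires attribute C
]


def ideal_response(attributes, q_matrix):
    """Expected 0/1 item scores for one attribute-mastery pattern.

    Conjunctive assumption: an item is correct only if every
    required attribute is mastered.
    """
    return tuple(
        int(all(has >= needs for has, needs in zip(attributes, row)))
        for row in q_matrix
    )


# Map every possible mastery pattern to its ideal response pattern.
patterns = {a: ideal_response(a, Q) for a in product((0, 1), repeat=3)}
```

With this Q-matrix, an examinee who has mastered attributes A and C but not B would be expected to answer only items 1 and 4 correctly.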
Keywords: Rule-space methodology, Q-matrix, attribute, cognitive diagnosis
Module 19: Differential Item Functioning
This ITEM module prepares the reader to use statistical procedures to detect differentially functioning test items.
This module is intended to prepare the reader to use statistical procedures to detect differentially functioning test items. To provide background, differential item functioning (DIF) is distinguished from item and test bias, and the importance of DIF screening within the overall test development process is discussed. The Mantel-Haenszel statistic, logistic regression, SIBTEST, the Standardization procedure, and various IRT-based approaches are presented. For each of these procedures, the theoretical framework is presented, the relative strengths and weaknesses of the method are highlighted, and guidance is provided for interpretation of the resulting statistical indices. Numerous technical decisions are required in order for the practitioner to appropriately implement these procedures. These decisions are discussed in some detail, as are the policy decisions necessary to implement an operational DIF detection program. The module also includes an annotated bibliography and a self-test.
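As one small illustration of the procedures listed above, the following sketch computes the Mantel-Haenszel common odds ratio for a single item across matched score levels and converts it to the delta-scale index commonly reported as MH D-DIF (-2.35 times the natural log of the odds ratio). The function names and the counts are hypothetical, and the sketch omits the significance test and the classification rules an operational program would add.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    strata: iterable of tuples
        (ref_correct, ref_incorrect, focal_correct, focal_incorrect),
    one tuple per matched total-score level.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den


def mh_d_dif(alpha):
    """Delta-scale transformation of the common odds ratio."""
    return -2.35 * math.log(alpha)


# Hypothetical counts at three matched score levels.
strata = [
    (20, 20, 10, 30),
    (30, 10, 20, 20),
    (36, 4, 30, 10),
]
alpha = mantel_haenszel_odds_ratio(strata)
delta = mh_d_dif(alpha)
```

An odds ratio above 1 (and so a negative delta value) indicates that, among examinees matched on total score, the reference group had higher odds of answering the item correctly than the focal group.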
Keywords: differential item functioning, Mantel-Haenszel, logistic regression, SIBTEST, standardization procedure, item response theory
Module 18: Setting Passing Scores
This ITEM module describes standard setting for achievement measures used in education, licensure, and certification.
This module describes standard setting for achievement measures used in education, licensure, and certification. On completing the module, readers will be able to: describe what standard setting is, why it is necessary, what some of the purposes of standard setting are, and what professional guidelines apply to the design and conduct of a standard-setting procedure; differentiate among different models of standard setting; calculate a cutting score using various methods; identify appropriate sources of validity evidence and threats to the validity of a standard-setting procedure; and list some elements to be considered when evaluating the success of a standard-setting procedure. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.
Keywords: standard setting, cut score, Bookmark method
Module 17: Item Bank Development
This ITEM module is designed to help those who develop assessments of any kind to understand the process of item banking, to analyze their needs, and to find or develop programs and materials that meet those needs.
Use of item banking technology can provide much relief for the chores associated with preparing assessments; it may also enhance the quality of the items and improve the quality of the assessments. Item banking programs provide for item entry and storage, item retrieval and test creation, and maintenance of the item history. Some programs also provide companion programs for scoring, analysis, and reporting. There are many item banking programs that may be purchased or leased, and there are banks of items available for purchase. This module is designed to help those who develop assessments of any kind to understand the process of item banking, to analyze their needs, and to find or develop programs and materials that meet those needs. It should be useful to teachers at all levels of education and to school-district test directors who are responsible for developing district-wide tests. It may also provide some useful information for those who are responsible for large-scale assessment programs of all types.
Keywords: item bank, assessment development, test development
Module 16: Comparison of Classical Test Theory and Item Response Theory
This ITEM module provides a nontechnical comparison of classical test theory and item response theory.
There are two currently popular statistical frameworks for addressing measurement problems such as test development, test-score equating, and the identification of biased test items: classical test theory and item response theory (IRT). In this module, both theories and models associated with each are described and compared, and the ways in which test development generally proceeds within each framework are discussed. The intent of this module is to provide a nontechnical comparison of classical test theory and item response theory.
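One way to see the difference between the two frameworks is to compare how each describes a single item. The sketch below contrasts the classical difficulty index (the proportion of examinees in a sample who answered correctly) with an item response function that gives the probability of a correct answer at a given ability level. The two-parameter logistic model is used only as a familiar example and is an assumption of this sketch, as are the function names and parameter values.

```python
import math


def ctt_difficulty(item_scores):
    """Classical difficulty: proportion correct in the observed sample."""
    return sum(item_scores) / len(item_scores)


def irt_2pl(theta, a, b):
    """Two-parameter logistic item response function.

    theta: examinee ability
    a: item discrimination
    b: item difficulty (on the same scale as theta)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Classical view: one number that depends on who took the item.
p_value = ctt_difficulty([1, 1, 0, 1])

# IRT view: a probability for each ability level, given item parameters.
p_at_average = irt_2pl(theta=0.0, a=1.2, b=0.0)
p_at_high = irt_2pl(theta=1.0, a=1.2, b=0.0)
```

The classical index changes when the sample of examinees changes, whereas the item response function describes the item as a curve over the ability scale.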
Keywords: classical test theory, item response theory, IRT, statistical framework