Deck 5: Technical Adequacy: Reliability and Validity

1
To evaluate the content validity of a portfolio assessment, it should be determined that the student's work that is included in the portfolio represents

A) the best work that the student has done in the domain.
B) all important dimensions within the domain.
C) all of the student's work during the previous year.
D) just those areas where the student continues to have difficulty.
Answer: B
2
Which of the following statements concerning test validity is most accurate?

A) A test cannot be valid unless it is reliable.
B) A test cannot be reliable unless it is valid.
C) A test cannot be standardized unless it is valid.
D) A test cannot be reliable unless it is standardized.
Answer: A
3
The extent to which a person's score on a test is related to performance on a criterion measure is best described as evidence based on

A) test content.
B) test structure.
C) relations to other variables.
D) response processes.
Answer: C
4
When different groups of test takers consistently experience disparate levels of success on specific items, there is a problem in

A) differential item effectiveness.
B) group selection.
C) administration errors.
D) reliability.
5
If a test measures something consistently but does not measure what it was designed to measure, then the test is

A) reliable but not valid.
B) reliable but not standardized.
C) standardized but not reliable.
D) valid but not reliable.
6
The higher the reliability coefficient, the lower the

A) coefficient of regression.
B) standard error of measurement.
C) validity.
D) standard deviation.
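The relationship between the reliability coefficient and the standard error of measurement can be sketched with the classical test theory formula SEM = SD * sqrt(1 - r). The function name and the scale with a standard deviation of 15 are illustrative assumptions, not values taken from this deck.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical scale with SD = 15, evaluated at several reliabilities.
for r in (0.60, 0.80, 0.90, 0.97):
    sem = standard_error_of_measurement(15, r)
    print(f"reliability = {r:.2f} -> SEM = {sem:.2f}")
```

With the standard deviation held constant, the printed SEM shrinks as the reliability rises, and it reaches zero only at a reliability of 1.00.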
7
In the context of assessment, "enabling behaviors" are those behaviors that

A) help the tester attract the subject's attention.
B) are extraneous to the requirements of the test situation.
C) focus the assessment on qualitative analyses of performance.
D) are required by the assessment to demonstrate the target knowledge.
8
Because a person's true abilities can change between two administrations of a test, it is generally true that

A) test-retest procedures cannot produce good reliability estimates.
B) the shorter the time between the two administrations, the higher the reliability.
C) the length of time between two administrations has little effect on reliability.
D) a test developer needs to calculate coefficient alpha to estimate stability.
9
An individual reported a reliability coefficient of 1.25 for an intelligence test. It was obtained by correlating the results of a given group on Form A with the group's results on Form B. This coefficient indicates that

A) the test is unusually reliable.
B) the test is unusually valid.
C) there are no errors of measurement.
D) a mistake was made in computing the coefficient.
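An alternate-form reliability coefficient of the kind described here is a correlation between the two sets of scores, so it may help to see how such a coefficient is computed. The sketch below is a plain Pearson correlation; the function name and the Form A and Form B scores are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for the same five examinees on two forms.
form_a = [98, 105, 110, 121, 130]
form_b = [101, 103, 112, 118, 133]
print(round(pearson_r(form_a, form_b), 3))
```

By the Cauchy-Schwarz inequality the numerator can never exceed the denominator in absolute value, so a correctly computed correlation always falls between -1.00 and +1.00.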
10
Coefficient alpha is most closely linked to

A) test-retest reliability.
B) percentage of agreement.
C) stability.
D) internal consistency.
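As a study aid, coefficient alpha can be computed from a single administration using the usual formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the small response matrix below are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha from rows of item scores (one row per examinee)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across examinees.
    item_variance_sum = sum(pvariance(col) for col in zip(*item_scores))
    # Variance of each examinee's total score.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 examinees, 4 items scored 0-3.
responses = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 0, 1],
    [3, 2, 3, 3],
    [0, 1, 0, 0],
]
print(round(coefficient_alpha(responses), 3))
```

Because only one set of item responses is needed, no second form or second testing occasion is involved in the calculation.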
11
To determine the stability of a test, the recommended interval between administrations of the test is __________.

A) 2 days
B) 2 weeks
C) 2 months
D) 2 years
12
A statistic that enables an examiner to establish confidence intervals for the true scores of examinees is the

A) Kuder-Richardson predictive index.
B) validity coefficient.
C) standard error of measurement.
D) mode.
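A confidence interval around a score can be sketched as the score plus or minus a z value times the SEM. The version below assumes normally distributed errors and centers the interval on the observed score; some texts center it on an estimated true score instead. The function name and the example values are illustrative.

```python
from statistics import NormalDist

def true_score_interval(observed: float, sem: float, confidence: float = 0.95):
    """Return (low, high) bounds: observed +/- z * SEM for the given confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided z value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return observed - z * sem, observed + z * sem

# Hypothetical observed score of 100 on tests with a small and a large SEM.
for sem in (2.0, 6.0):
    low, high = true_score_interval(100, sem, confidence=0.68)
    print(f"SEM = {sem}: {low:.1f} to {high:.1f}")
```

At a fixed confidence level, a larger SEM produces a wider interval, which is why less can be said about a true score when the SEM is high.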
13
Kiana is going to evaluate the concurrent criterion-related validity of a self-report assessment of classroom problem behaviors. The most appropriate criterion measure would be

A) a test of intelligence.
B) classroom observation.
C) grades in math.
D) court records.
14
T-scores for student X on an achievement test battery standardized on the same population are spelling 35, math 62, social studies 50, and English grammar 52. Each test has a SEM of 2; the tests are not intercorrelated. We conclude that

A) X is strongest in spelling.
B) X is strongest in math.
C) there are no substantial differences in X's achievements in these four areas.
D) it is not possible to compare X's performance on these subtests.
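One common way to compare two scores that are on the same scale is to express their difference in units of the standard error of the difference, sqrt(SEM_a^2 + SEM_b^2). This sketch assumes the measurement errors on the two tests are independent; the function names and the example scores are hypothetical and are not the values in the item above.

```python
import math

def standard_error_of_difference(sem_a: float, sem_b: float) -> float:
    """SE of the difference between two scores with independent errors."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

def difference_in_se_units(score_a: float, score_b: float,
                           sem_a: float, sem_b: float) -> float:
    """How many standard errors apart two scores are."""
    return (score_a - score_b) / standard_error_of_difference(sem_a, sem_b)

# Hypothetical standard scores of 60 and 45 on tests with SEMs of 3 and 4.
print(standard_error_of_difference(3, 4))    # 5.0
print(difference_in_se_units(60, 45, 3, 4))  # 3.0
```

A difference of only a fraction of one standard error is easily explained by measurement error, whereas a difference of several standard errors is unlikely to be due to error alone.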
15
A stability coefficient is used for measuring the reliability of

A) a test administered at two different times.
B) the first 50 items, compared with the last 50 items in a 100-item test.
C) standard error of measurement.
D) alternate forms of a test.
16
The results of an achievement test are considered to be invalid if

A) reliability is less than 0.95.
B) the teacher has not taught the content being tested.
C) the student did not listen when the subject matter was taught.
D) validity is less than 0.95.
17
The reliability of a test refers to its relative

A) validity.
B) power.
C) consistency.
D) inappropriateness.
18
Method of measurement, enabling behaviors, and administrative errors are all considered to be

A) types of reliability.
B) signs of validity.
C) sources of systematic bias.
D) test development problems.
19
The means for both Test A and Test B are 50. A 50% confidence interval for a score at the mean is 44-55 for Test A and 42-58 for Test B. Which of the following statements is true?

A) Test A is more reliable than Test B.
B) Test B is more reliable than Test A.
C) Test A has a larger SEM than Test B.
D) Test B has a larger SEM than Test A.
20
The Dairy County School District appropriately uses a test that has a reliability of 0.89 to

A) place children in special education if they earn a score below an established criterion.
B) move children to another school building to receive services for gifted children if they score above a certain point.
C) decide whether students should be placed in the Rainbow reading group or the Rainstorm reading group.
D) decide to conduct further assessment procedures.
21
The absence of __________ required for performance on a test invalidates the test results.
22
For individual test data, where a test score is used to make a tracking or placement decision for an individual student, the recommended level of required reliability is __________.
23
For group test data that are used for administrative purposes and reported only by group, the recommended level of required reliability is __________.
24
If one wants to generalize to different times, one should examine the test's __________.
25
Failure to administer a test according to standardized procedures is considered

A) appropriate if the subject is young.
B) a form of rapport building that may be necessary.
C) a source of systematic bias.
D) a random error that varies from one subject to the next.
26
A test has a norm sample that is not representative of the population. Inferences made on the basis of a student's performance on this test are

A) likely to indicate lower performance than the true score.
B) invalid.
C) unreliable.
D) considered to be reasonable for qualitative comparisons.
27
If one wants to generalize to different item samples, one should examine the test's __________. ​
28
Both unreliability (unsystematic error) and systematic error (bias) threaten __________.
29
The most likely explanation for items having __________ for different groups of people is differential exposure to test content.
30
The validity of a particular test can never exceed the __________ of that test. ​
31
When we test, we are interested in __________ what we see today under one set of conditions to other occasions.
32
An estimate of the likelihood that a person's true score may be found within a range of scores is provided by the __________.
33
The completeness of the item sample is one of the factors to consider in determining __________ validity.
34
For individual test data, where a test score is used to make a screening decision, the recommended level of required reliability is __________.
35
In order for evidence of high concurrent validity to be meaningful, the criterion measures must be __________.
36
A reliability coefficient of 1.00 indicates __________ reliability. ​
37
Criteria for how high a test's reliability must be are determined in part by the specific __________ of assessment.
38
Validity evidence based on __________ __________ reflects the extent to which a test's items represent the domain or universe to be measured.
39
A method of estimating the reliability of a test that does not have two forms is to calculate the __________.
40
A test with a reliability coefficient of .97 has relatively little __________.
41
The test manual for the Culture-Fair Intelligence Test reports correlations with the Stanford-Binet and the Goodenough-Harris tests. What type of validity is the author trying to demonstrate?
42
Dr. Qubert has developed a test for which there is not an adequate criterion measure or construct with which to evaluate validity. She therefore decides to present complete content validity data. What three factors must she consider when determining content validity?
43
To the extent that a norm sample is systematically unrepresentative, the inferences based on such scores are incorrect and __________.
44
Sixty percent of a test's variance is caused by the variance of true scores, whereas 40% of the variance is caused by error. What is the test's reliability?
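In classical test theory, reliability is the proportion of observed-score variance that is true-score variance, that is, true variance divided by the sum of true and error variance. The sketch below applies that definition to different, made-up variance components so that the item above is left for the reader; the function name is an assumption.

```python
def reliability_from_variances(true_variance: float, error_variance: float) -> float:
    """Reliability = true-score variance / (true-score variance + error variance)."""
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances cannot be negative")
    total = true_variance + error_variance
    if total == 0:
        raise ValueError("total variance must be positive")
    return true_variance / total

# Hypothetical test: 80 units of true-score variance, 20 units of error variance.
print(reliability_from_variances(80, 20))
```

The same function works whether the components are given as raw variances or as percentages of the total, because only their ratio matters.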
45
Test results would be of little value if we were unable to generalize what was observed in one situation to other situations. Identify and discuss three types of generalizations that can be made from reliable test results.
46
Joseph was tested on an instrument for which the SEM was relatively high. How sure can we be of Joseph's score?
47
Explain in your own words the relationship between reliability and validity.
48
Validity evidence based on the consequences of testing is a concept adopted by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. However, it has not been widely accepted in education. Discuss the reason evidence based on the consequences of testing has not been accepted in education.
49
Compare and contrast the two major approaches to estimating the extent to which we can generalize from different samples of items.
50
Unless a test is administered according to the __________ the results are invalid.
51
Annette was tested on an instrument for which the SEM was quite small. How sure can we be of Annette's score?