Deck 9: Predictive Data Mining

1
Misclassifying an actual __________ observation as a(n) __________ observation is known as a false positive.

A)Class 0, Class 1
B)Class 1, Class 0
C)error, accuracy
D)false, true
Answer: A) Class 0, Class 1
2
__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
Answer: B) Data partitioning
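The three-way split described in this card can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `partition`, the 50/30/20 proportions, and the fixed seed are all assumptions made for the example.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and validation sets are filled becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Every record lands in exactly one of the three sets, so no record used to build the model is reused to assess it.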
3
A(n) __________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables.

A)record
B)data point
C)classification
D)location
Answer: A) record
4
The percent of misclassified records out of the total records in the validation data is known as the

A)overall error rate.
B)error.
C)accuracy.
D)class.
5
As we increase the cutoff value, _______ error will decrease and _________ error will rise.

A)Class 0, Class 1
B)Class 1, Class 0
C)false, true
D)None of these are correct.
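A small experiment can help in reasoning about this card. The sketch below counts both kinds of error at several cutoffs. The records and probabilities are made up for illustration, and the rule "predict Class 1 when the estimated probability is at least the cutoff" is an assumption about how the cutoff is applied.

```python
def error_counts(actual, prob_class1, cutoff):
    """Return (class0_errors, class1_errors) for a given cutoff.

    Assumes a record is predicted Class 1 when its estimated
    probability of Class 1 is at least the cutoff.
    """
    class0_errors = 0  # actual Class 0 predicted as Class 1
    class1_errors = 0  # actual Class 1 predicted as Class 0
    for y, p in zip(actual, prob_class1):
        predicted = 1 if p >= cutoff else 0
        if y == 0 and predicted == 1:
            class0_errors += 1
        elif y == 1 and predicted == 0:
            class1_errors += 1
    return class0_errors, class1_errors

# Hypothetical records, invented for this example.
actual = [1, 1, 1, 0, 1, 0, 0, 0]
prob_class1 = [0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1]

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, error_counts(actual, prob_class1, cutoff))
```

Comparing the printed pairs across cutoffs shows which error count moves in which direction as the cutoff rises.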
6
__________ involves descriptive statistics, data visualization, and clustering.

A)Data exploration
B)Data partitioning
C)Data preparation
D)Model assessment
7
__________ is one minus the Class 0 error rate.

A)Sensitivity
B)Specificity
C)Accuracy
D)Cutoff value
8
__________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
9
__________ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment
10
__________ is NOT a step of the data mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Supervised learning
11
Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of

A)classification of a categorical outcome.
B)estimation of a continuous outcome.
C)prediction of a categorical outcome.
D)unsupervised learning.
12
A characteristic or quantity of interest that can take on different values is a(n)

A)variable.
B)observation.
C)record.
D)quality.
13
__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.

A)Supervised learning
B)Unsupervised learning
C)Dimension reduction
D)Data sampling
14
Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of

A)data exploration.
B)data partitioning.
C)data preparation.
D)model assessment.
15
Data mining methods for classifying or estimating an outcome based on a set of input variables are referred to as

A)supervised learning.
B)unsupervised learning.
C)dimension reduction.
D)data sampling.
16
The set of recorded values of variables associated with a single entity is a(n)

A)observation.
B)data point.
C)classification.
D)location.
17
Estimation methods are also referred to as

A)prediction methods.
B)clustering methods.
C)association methods.
D)supervised methods.
18
Data used to build a data mining model is called

A)validation data.
B)training data.
C)test data.
D)exploration data.
19
Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)

A)overall error rate.
B)error.
C)accuracy.
D)class.
20
__________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Model assessment
21
A __________ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.

A)regression tree
B)scatter chart
C)classification tree
D)confusion matrix
22
The x-axis of a lift chart shows

A)the number of actual Class 1 records identified.
B)the ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.
23
A(n) __________ matrix displays a model's correct and incorrect classifications.

A)cumulative lift
B)confusion
C)decile-wise lift chart
D)ROC curve
24
_________ attempts to classify a categorical outcome as a linear function of explanatory variables.

A)Linear regression
B)Logistic regression
C)Classification model
D)Supervised learning
25
How many Class 1's are correctly classified as Class 1 in the table below?

\begin{array}{|l|c|c|}
\hline
\text{Confusion Matrix} & & \\
\hline
 & \text{Predicted Class} & \\
\hline
\text{Actual Class} & \mathbf{1} & \mathbf{0} \\
\hline
\mathbf{1} & 221 & 100 \\
\hline
\mathbf{0} & 30 & 3{,}000 \\
\hline
\end{array}

A)221
B)100
C)30
D)3,000
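The counts in this card's confusion matrix can also be used to work out the overall error rate and accuracy that other cards in the deck refer to. The sketch below stores the four counts in a dictionary keyed by (actual class, predicted class); that layout and the variable names are assumptions chosen for readability.

```python
# Counts copied from the confusion matrix in this card;
# keys are (actual class, predicted class).
confusion = {(1, 1): 221, (1, 0): 100, (0, 1): 30, (0, 0): 3000}

total_records = sum(confusion.values())
# A record is misclassified when the predicted class differs from the actual class.
misclassified = confusion[(1, 0)] + confusion[(0, 1)]

overall_error_rate = misclassified / total_records
accuracy = 1 - overall_error_rate

print(f"records: {total_records}, misclassified: {misclassified}")
print(f"overall error rate: {overall_error_rate:.4f}, accuracy: {accuracy:.4f}")
```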
26
Which of the following is a commonly used supervised learning method?

A)k-means clustering
B)k-nearest neighbors
C)hierarchical clustering
D)association rule development
27
The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for

A)regression trees.
B)time-series plots.
C)classification trees.
D)cumulative lift charts.
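To make variance-based impurity concrete, the sketch below compares the variance of the outcome in one group with the size-weighted variance after splitting it into two groups. The outcome values are invented, and weighting each group by its number of observations is an assumption about how a split would be scored, not a statement of any particular software's rule.

```python
from statistics import pvariance

def split_impurity(left, right):
    """Size-weighted average of the outcome variance in two groups."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n

# Hypothetical continuous outcome values for six observations.
outcomes = [10, 12, 11, 30, 32, 31]

before = pvariance(outcomes)                     # impurity of the unsplit group
after = split_impurity(outcomes[:3], outcomes[3:])  # impurity after one split

print(f"before split: {before:.3f}, after split: {after:.3f}")
```

A split that separates low outcomes from high outcomes leaves each group far more homogeneous, which shows up as a much smaller variance.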
28
How many Class 1's are incorrectly classified as Class 0?

\begin{array}{|l|c|c|}
\hline
\text{Confusion Matrix} & & \\
\hline
 & \text{Predicted Class} & \\
\hline
\text{Actual Class} & \mathbf{1} & \mathbf{0} \\
\hline
\mathbf{1} & 221 & 100 \\
\hline
\mathbf{0} & 30 & 3{,}000 \\
\hline
\end{array}

A)221
B)100
C)30
D)3,000
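The same matrix also yields the separate per-class error rates. The sketch below assumes the usual definitions: the Class 1 error rate is the share of actual Class 1 records predicted as Class 0, the Class 0 error rate is the share of actual Class 0 records predicted as Class 1, and sensitivity and specificity are one minus those rates. The helper name `class_error_rates` is hypothetical.

```python
def class_error_rates(confusion):
    """Return (class1_error, class0_error) from counts keyed by (actual, predicted)."""
    actual1 = confusion[(1, 1)] + confusion[(1, 0)]
    actual0 = confusion[(0, 1)] + confusion[(0, 0)]
    class1_error = confusion[(1, 0)] / actual1  # actual Class 1 predicted as Class 0
    class0_error = confusion[(0, 1)] / actual0  # actual Class 0 predicted as Class 1
    return class1_error, class0_error

# Counts copied from the confusion matrix in this card.
confusion = {(1, 1): 221, (1, 0): 100, (0, 1): 30, (0, 0): 3000}

class1_error, class0_error = class_error_rates(confusion)
sensitivity = 1 - class1_error
specificity = 1 - class0_error

print(f"Class 1 error: {class1_error:.4f}, sensitivity: {sensitivity:.4f}")
print(f"Class 0 error: {class0_error:.4f}, specificity: {specificity:.4f}")
```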
29
An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed a(n)

A)false negative.
B)false positive.
C)residual.
D)outlier.
30
The y-axis of a decile chart shows

A)number of important class records identified.
B)ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.
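One way to see what a decile-wise chart plots is to compute the bar heights by hand. The sketch below ranks records by estimated probability, cuts them into ten equal groups, and divides each group's mean outcome by the overall mean. The data are invented, and the function assumes at least ten records and at least one actual Class 1 record.

```python
def decile_lift(actual, prob_class1):
    """Return ten lift values: mean outcome per decile divided by the overall mean.

    Records are ranked from highest to lowest estimated probability first.
    """
    pairs = sorted(zip(prob_class1, actual), key=lambda pair: pair[0], reverse=True)
    ranked = [y for _, y in pairs]
    n = len(ranked)
    overall_mean = sum(ranked) / n
    lifts = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(group) / len(group)) / overall_mean)
    return lifts

# Hypothetical data: 20 records, already listed from most to least likely Class 1.
actual = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
prob_class1 = [(20 - i) / 20 for i in range(20)]

lifts = decile_lift(actual, prob_class1)
print([round(value, 2) for value in lifts])
```

Values above 1 mark deciles where the model finds Class 1 records at a higher rate than the data set as a whole.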
31
__________ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability with the number identified if randomly classified.

A)Cumulative lift
B)Confusion
C)Decile-wise lift chart
D)ROC curve
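The comparison this card describes can be tabulated directly. In the sketch below, each row gives the number of records examined, the number of actual Class 1 records found when working down the ranked list, and the number expected if records were picked at random. The data and the function name are illustrative assumptions.

```python
def cumulative_comparison(actual, prob_class1):
    """Return (records_examined, class1_found, class1_expected_at_random) rows."""
    pairs = sorted(zip(prob_class1, actual), key=lambda pair: pair[0], reverse=True)
    ranked = [y for _, y in pairs]
    base_rate = sum(ranked) / len(ranked)  # share of Class 1 in the whole data set
    rows = []
    found = 0
    for examined, y in enumerate(ranked, start=1):
        found += y
        rows.append((examined, found, examined * base_rate))
    return rows

# Hypothetical data: 20 records, already listed from most to least likely Class 1.
actual = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
prob_class1 = [(20 - i) / 20 for i in range(20)]

rows = cumulative_comparison(actual, prob_class1)
for examined, found, expected in rows[:5]:
    print(examined, found, round(expected, 2))
```

The gap between the second and third columns is what the chart visualizes; both columns meet at the total number of Class 1 records once every record has been examined.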
32
__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

A)Underfitting
B)Overfitting
C)Oversampling
D)Undersampling
33
A test set is the data set used to

A)build the data mining model.
B)estimate performance of candidate models on unseen data.
C)estimate performance of the final model on unseen data.
D)show counts of actual versus predicted class values.
34
One minus the overall error rate is often referred to as the __________ of the model.

A)sensitivity
B)accuracy
C)specificity
D)cutoff value
35
Separate error rates with respect to the false negative and false positive cases are computed to take into account the

A)asymmetric costs in misclassification.
B)symmetric weights of these two cases.
C)distortions due to outliers.
D)effect of sampling error.
36
__________ is a measure of the heterogeneity of observations in a classification tree.

A)Sensitivity
B)Specificity
C)Accuracy
D)Impurity
37
__________ is a generalization of linear regression for predicting a categorical outcome variable.

A)Multiple linear regression
B)Logistic regression
C)Discriminant analysis
D)Cluster analysis
38
In the k-nearest neighbors method, when the value of k is set to 1

A)the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B)the new observation's class is naïvely assigned to the most common class in the training set.
C)the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D)the classification or prediction of a new observation is subject to the smallest possible classification error.
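A minimal sketch of the k = 1 case may help when working through this card. It measures similarity with Euclidean distance, which is an assumption; the training records and the function name `nearest_neighbor_predict` are invented for the example.

```python
import math

def nearest_neighbor_predict(training, new_point):
    """Predict the label of new_point using k-nearest neighbors with k = 1.

    training is a list of (features, label) pairs; similarity is measured
    with Euclidean distance (an assumption made for this sketch).
    """
    _, label = min(training, key=lambda record: math.dist(record[0], new_point))
    return label

# Hypothetical training set: two input variables and a class label per record.
training = [
    ((1.0, 1.0), 0),
    ((1.5, 2.0), 0),
    ((5.0, 5.0), 1),
]

print(nearest_neighbor_predict(training, (4.0, 4.5)))
print(nearest_neighbor_predict(training, (1.2, 1.4)))
```

With k = 1 the outcome of exactly one training record determines the prediction, so a single unusual record near the new observation can change the result.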