Question 1

Misclassifying an actual __________ observation as a(n) __________ observation is known as a false positive.

A)Class 0, Class 1
B)Class 1, Class 0
C)error, accuracy
D)false, true

Accepted Answer

A false positive occurs when an actual Class 0 (negative) observation is incorrectly classified as a Class 1 (positive) observation.

Question 2

__________ is dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

Data partitioning involves dividing the sample data into three sets for training, validation, and testing of the data mining algorithm performance. Data sampling refers to the process of selecting a representative subset from the larger dataset. Data preparation involves cleaning, transforming, and pre-processing data. Model assessment refers to evaluating the predictive power and accuracy of the model.
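As a rough illustration of data partitioning, the sketch below shuffles record indices and splits them into training, validation, and test sets. The 50/30/20 proportions, the fixed seed, and the function name `partition_indices` are assumptions made for this example; the question does not prescribe any of them.

```python
import random

def partition_indices(n_records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle record indices and split them into three disjoint sets.

    The fractions are illustrative defaults, not values from the source.
    Whatever is left after training and validation becomes the test set.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * train_frac)
    n_valid = round(n_records * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

train, valid, test = partition_indices(100)
print(len(train), len(valid), len(test))
```

Every record lands in exactly one of the three sets, which is what allows the validation and test sets to act as unseen data.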

Question 3

A(n) __________ is often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables.

A)record
B)data point
C)classification
D)location

Accepted Answer

A record is the complete set of values recorded for a single entity, often displayed as a row of values in a spreadsheet or database in which the columns correspond to the variables. Each cell in the row holds that entity's value for one variable.

Question 4

The percent of misclassified records out of the total records in the validation data is known as the

A)overall error rate.
B)error.
C)accuracy.
D)class.

Accepted Answer

The overall error rate is calculated as the percent of misclassified records out of the total records in the validation data. This is a common metric used to evaluate the performance of a machine learning model. Option B, error, is too general and does not specify the type or scope of the error. Option C, accuracy, is the complement of the error rate, meaning it measures the percent of correctly classified records. Option D, class, is not relevant to this context.
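The calculation can be sketched in a few lines of Python. The counts used here are the ones in the confusion matrix shown in Question 25 of this deck; the nested-dictionary layout and the function name `overall_error_rate` are assumptions chosen for readability.

```python
def overall_error_rate(confusion):
    """Fraction of misclassified records.

    confusion[actual][predicted] holds the number of records with that
    actual class and that predicted class.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    misclassified = sum(
        count
        for actual, row in confusion.items()
        for predicted, count in row.items()
        if actual != predicted
    )
    return misclassified / total

# Counts taken from the confusion matrix in Question 25.
confusion = {1: {1: 221, 0: 100}, 0: {1: 30, 0: 3000}}
error = overall_error_rate(confusion)
accuracy = 1 - error  # accuracy is the complement of the overall error rate
print(f"overall error rate: {error:.4f}, accuracy: {accuracy:.4f}")
```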

Question 5

As we increase the cutoff value, _______ error will decrease and _________ error will rise.

A)Class 0, Class 1
B)Class 1, Class 0
C)false, true
D)None of these are correct.

Accepted Answer

Increasing the cutoff value means fewer observations are classified as Class 1 and more are classified as Class 0. Fewer actual Class 0 observations are then misclassified as Class 1, so the Class 0 error decreases. At the same time, more actual Class 1 observations are misclassified as Class 0, so the Class 1 error rises.
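The trade-off can be seen on a toy data set. In the sketch below, the eight (actual class, estimated probability) pairs are invented purely for illustration, and the rule "classify as Class 1 when the estimated probability is at least the cutoff" is an assumed convention.

```python
def class_error_rates(records, cutoff):
    """Return (Class 0 error rate, Class 1 error rate) for a given cutoff.

    A record is classified as Class 1 when its estimated probability of
    being Class 1 is at least the cutoff, and as Class 0 otherwise.
    """
    false_pos = sum(1 for actual, p in records if actual == 0 and p >= cutoff)
    false_neg = sum(1 for actual, p in records if actual == 1 and p < cutoff)
    n_class0 = sum(1 for actual, _ in records if actual == 0)
    n_class1 = sum(1 for actual, _ in records if actual == 1)
    return false_pos / n_class0, false_neg / n_class1

# Hypothetical (actual class, estimated P(Class 1)) pairs, made up for this example.
records = [(1, 0.9), (1, 0.7), (1, 0.4), (1, 0.2),
           (0, 0.8), (0, 0.5), (0, 0.3), (0, 0.1)]

for cutoff in (0.25, 0.5, 0.75):
    e0, e1 = class_error_rates(records, cutoff)
    print(f"cutoff={cutoff}: Class 0 error={e0:.2f}, Class 1 error={e1:.2f}")
```

On this toy data the Class 0 error falls from 0.75 to 0.25 as the cutoff rises from 0.25 to 0.75, while the Class 1 error rises from 0.25 to 0.75. In general the changes are monotone but need not be strict at every step.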

Question 6

__________ involves descriptive statistics, data visualization, and clustering.

A)Data exploration
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

Data exploration involves using descriptive statistics, data visualization, and clustering to gain a better understanding of the data and identify any patterns or relationships. Data partitioning involves dividing the data set into subsets for training and testing a model. Data preparation involves cleaning, transforming, and encoding the data for analysis. Model assessment involves testing the performance of a model on a new set of data.

Question 7

__________ is one minus the Class 0 error rate.

A)Sensitivity
B)Specificity
C)Accuracy
D)Cutoff value

Accepted Answer

Specificity is one minus the Class 0 error rate, as it measures the proportion of actual negatives that are correctly identified.
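A short numeric check of this identity, using the actual Class 0 row of the confusion matrix in Question 25 (30 records predicted as Class 1, 3,000 predicted as Class 0). The function names are assumptions for this sketch.

```python
def class0_error_rate(true_neg, false_pos):
    """Share of actual Class 0 records misclassified as Class 1."""
    return false_pos / (true_neg + false_pos)

def specificity(true_neg, false_pos):
    """Share of actual Class 0 records correctly classified as Class 0."""
    return true_neg / (true_neg + false_pos)

# Actual Class 0 row from the confusion matrix in Question 25.
true_neg, false_pos = 3000, 30

err0 = class0_error_rate(true_neg, false_pos)
spec = specificity(true_neg, false_pos)
print(f"Class 0 error rate: {err0:.4f}, specificity: {spec:.4f}")
```

Sensitivity is the analogous quantity for Class 1: one minus the Class 1 error rate.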

Question 8

__________ is the manipulation of the data with the goal of putting it in a form suitable for formal modeling.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

Data preparation involves cleaning, transforming, and structuring the data so that it can be used for analysis and modeling. This includes tasks such as handling missing values, encoding categorical variables, and scaling numerical variables. Data sampling is the extraction of a representative subset of data relevant to the problem, data partitioning is the division of that sample into training, validation, and test sets, and model assessment is the evaluation of a model's performance on a given dataset.

Question 9

__________ is the step in data mining that includes addressing missing and erroneous data, reducing the number of variables, defining new variables, and data exploration.

A)Data sampling
B)Data partitioning
C)Data preparation
D)Model assessment

Accepted Answer

Data preparation is the step in data mining that involves handling missing and erroneous data, reducing the number of variables, defining new variables, and exploring the data to check for any patterns or trends. Of the other options, data sampling (A) comes before data preparation, while data partitioning (B) and model assessment (D) come after it.

Question 10

__________ is NOT a step of the data mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Supervised learning

Accepted Answer

Supervised learning is a category of data mining techniques, not a step of the data mining process. Supervised learning methods are applied during the model construction step. Data sampling, data partitioning, and model construction are all steps of the process.

Question 11

Determine a freshman's likely first-year grade point average from the student's Scholastic Aptitude Test (SAT) score, high school grade point average, and number of extra-curricular activities. This is an example of

A)classification of a categorical outcome.
B)estimation of a continuous outcome.
C)prediction of a categorical outcome.
D)unsupervised learning.

Accepted Answer

The answer of Determine a freshman's likely first-year grade point...

Question 12

A characteristic or quantity of interest that can take on different values is a(n)

A)variable.
B)observation.
C)record.
D)quality.

Accepted Answer

The answer of A characteristic or quantity of interest that...

Question 13

__________ is a category of data mining techniques in which an algorithm learns how to classify or estimate an outcome variable of interest.

A)Supervised learning
B)Unsupervised learning
C)Dimension reduction
D)Data sampling

Accepted Answer

The answer of __________ is a category of data mining...

Question 14

Applying descriptive statistics and data visualization to the training set to understand the data and assist in the selection of an appropriate technique is a part of

A)data exploration.
B)data partitioning.
C)data preparation.
D)model assessment.

Accepted Answer

The answer of Applying descriptive statistics and data visualization to...

Question 15

Data mining methods for classifying or estimating an outcome based on a set of input variables are referred to as

A)supervised learning.
B)unsupervised learning.
C)dimension reduction.
D)data sampling.

Accepted Answer

The answer of Data mining methods for classifying or estimating...

Question 16

The set of recorded values of variables associated with a single entity is a(n)

A)observation.
B)data point.
C)classification.
D)location.

Accepted Answer

The answer of The set of recorded values of variables...

Question 17

Estimation methods are also referred to as

A)prediction methods.
B)clustering methods.
C)association methods.
D)supervised methods.

Accepted Answer

The answer of Estimation methods are also referred to as A)prediction...

Question 18

Data used to build a data mining model is called

A)validation data.
B)training data.
C)test data.
D)exploration data.

Accepted Answer

The answer of Data used to build a data mining...

Question 19

Classifying a record as belonging to one class when it belongs to another class is referred to as a(n)

A)overall error rate.
B)error.
C)accuracy.
D)class.

Accepted Answer

The answer of Classifying a record as belonging to one...

Question 20

__________ is a method of extracting data relevant to the business problem under consideration. It is the first step in the data mining process.

A)Data sampling
B)Data partitioning
C)Model construction
D)Model assessment

Accepted Answer

The answer of __________ is a method of extracting data...

Question 21

A __________ classifies a categorical outcome variable by splitting observations into groups via a sequence of hierarchical rules.

A)regression tree
B)scatter chart
C)classification tree
D)confusion matrix

Accepted Answer

The answer of A __________ classifies a categorical outcome variable...

Question 22

The x-axis of a lift chart shows

A)the number of actual Class 1 records identified.
B)the ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.

Accepted Answer

The answer of The x-axis of a lift chart shows A)the...

Question 23

A(n) __________ matrix displays a model's correct and incorrect classification.

A)cumulative lift
B)confusion
C)decile-wise lift chart
D)ROC curve

Accepted Answer

The answer of A(n) __________ matrix displays a model's correct...

Question 24

_________ attempts to classify a categorical outcome as a linear function of explanatory variables.

A)Linear regression
B)Logistic regression
C)Classification model
D)Supervised learning

Accepted Answer

The answer of _________ attempts to classify a categorical outcome...

Question 25

How many Class 1's are correctly classified as Class 1 in the table below?

$\begin{array}{|l|c|c|}
\hline
\text{Confusion Matrix} \\ \hline
 & \text{Predicted Class} \\ \hline
\text{Actual Class} & \mathbf{1} & \mathbf{0} \\ \hline
\mathbf{1} & 221 & 100 \\ \hline
\mathbf{0} & 30 & 3{,}000 \\ \hline
\end{array}$

A)221
B)100
C)30
D)3,000

Accepted Answer

The answer of How many Class 1's are correctly classified...

Question 26

Which of the following is a commonly used supervised learning method?

A)k-means clustering
B)k-nearest neighbors
C)hierarchical clustering
D)association rule development

Accepted Answer

The answer of Which of the following is a commonly...

Question 27

The impurity of a group of observations is based on the variance of the outcome value for the observations in the group for

A)regression trees.
B)time-series plots.
C)classification trees.
D)cumulative lift charts.

Accepted Answer

The answer of The impurity of a group of observations...

Question 28

How many Class 1's are incorrectly classified as Class 0 in the table below?

$\begin{array}{|l|c|c|}
\hline
\text{Confusion Matrix} \\ \hline
 & \text{Predicted Class} \\ \hline
\text{Actual Class} & \mathbf{1} & \mathbf{0} \\ \hline
\mathbf{1} & 221 & 100 \\ \hline
\mathbf{0} & 30 & 3{,}000 \\ \hline
\end{array}$

A)221
B)100
C)30
D)3,000

Accepted Answer

The answer of How many Class 1's are incorrectly classified...

Question 29

An observation classified as part of a group with a characteristic when it actually does not have the characteristic is termed as a(n)

A)false negative.
B)false positive.
C)residual.
D)outlier.

Accepted Answer

The answer of An observation classified as part of a...

Question 30

The y-axis of a decile chart shows

A)number of important class records identified.
B)ratio of decile mean to overall mean.
C)the number of actual Class 1 records.
D)the ratio of the overall mean to the decile mean.

Accepted Answer

The answer of The y-axis of a decile chart shows A)number...

Question 31

__________ compares the number of actual Class 1 observations identified if considered in decreasing order of their estimated probability of being in Class 1 with the number identified if observations are randomly classified.

A)Cumulative lift
B)Confusion
C)Decile-wise lift chart
D)ROC curve

Accepted Answer

The answer of __________ compares the number of actual Class...

Question 32

__________ refers to the scenario in which the analyst builds a model that does a great job of explaining the sample of data on which it is based but fails to accurately predict outside the sample data.

A)Underfitting
B)Overfitting
C)Oversampling
D)Undersampling

Accepted Answer

The answer of __________ refers to the scenario in which...

Question 33

A test set is the data set used to

A)build the data mining model.
B)estimate performance of candidate models on unseen data.
C)estimate performance of the final model on unseen data.
D)show counts of actual versus predicted class values.

Accepted Answer

The answer of A test set is the data set...

Question 34

One minus the overall error rate is often referred to as the __________ of the model.

A)sensitivity
B)accuracy
C)specificity
D)cutoff value

Accepted Answer

The answer of One minus the overall error rate is...

Question 35

Separate error rates with respect to the false negative and false positive cases are computed to take into account the

A)asymmetric costs in misclassification.
B)symmetric weights of these two cases.
C)distortions due to outliers.
D)effect of sampling error.

Accepted Answer

The answer of Separate error rates with respect to the...

Question 36

__________ is a measure of the heterogeneity of observations in a classification tree.

A)Sensitivity
B)Specificity
C)Accuracy
D)Impurity

Accepted Answer

The answer of __________ is a measure of the heterogeneity...

Question 37

__________ is a generalization of linear regression for predicting a categorical outcome variable.

A)Multiple linear regression
B)Logistic regression
C)Discriminant analysis
D)Cluster analysis

Accepted Answer

The answer of __________ is a generalization of linear regression...

Question 38

In the k-nearest neighbors method, when the value of k is set to 1

A)the classification or prediction of a new observation is based solely on the single most similar observation from the training set.
B)the new observation's class is naïvely assigned to the most common class in the training set.
C)the new observation's prediction is used to estimate the anticipated error rate on future data over the entire training set.
D)the classification or prediction of a new observation is subject to the smallest possible classification error.

Accepted Answer

The answer of In the k-nearest neighbors method, when the...

Deck 9: Predictive Data Mining