Question 1

For a sufficiently large value of k, the k-nearest neighbors classification approach will always result in a lower misclassification rate than the simple branch splitting approach of the classification tree.

Accepted Answer

Neither the classification tree approach nor the k-nearest neighbors approach is guaranteed to yield the lower misclassification rate, regardless of the value of k.

Question 2

A quantitative variable which can have only the values of zero (0) or one (1) and which is used to represent a qualitative variable is known as a (1, 0) dummy variable.

Accepted Answer

A (1, 0) dummy variable is a quantitative variable used to represent a qualitative variable.
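
As a minimal illustration, a two-level qualitative variable can be coded as a dummy variable in a few lines of Python. The variable name `owns_home` and its values are hypothetical and chosen only for the example.

```python
# Hypothetical qualitative variable with two levels.
owns_home = ["yes", "no", "no", "yes"]

# Code "yes" as 1 and "no" as 0 so the variable can enter a numeric model.
dummy = [1 if value == "yes" else 0 for value in owns_home]

print(dummy)  # [1, 0, 0, 1]
```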

Question 3

To predict a qualitative, or categorical, response variable we could use a classification tree.

Accepted Answer

A classification tree is used to predict a qualitative, or categorical, response variable.

Question 4

The optimal value of k to use for the k-nearest neighbors approach to predicting a quantitative response variable is the value of k that minimizes RMSE (the square root of the mean of the squared deviations of the predicted values from the observed values).

Accepted Answer

The optimal value of k to use for the k-nearest neighbors approach to predicting a quantitative response variable is the value of k that minimizes RMSE (the square root of the mean of the squared deviations of the predicted values from the observed values).
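
A rough sketch of this selection rule follows. It assumes a single predictor, absolute difference as the distance, and RMSE computed on a separate validation set; none of these choices come from the quiz, and the data are made up.

```python
import math

def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(train, validation, k):
    """Square root of the mean squared deviation of predicted from observed."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical (predictor, response) pairs.
train = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0), (5, 5.0), (6, 6.0)]
validation = [(1.5, 1.5), (3.5, 3.5), (5.5, 5.5)]

# Try every k from 1 to the training set size and keep the one with lowest RMSE.
best_k = min(range(1, len(train) + 1), key=lambda k: rmse(train, validation, k))
print(best_k)
```

On this toy data, k = 2 predicts each validation point exactly, so it is the value selected.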

Question 5

To predict a quantitative response variable, we could use a regression tree.

Accepted Answer

A regression tree is used to predict a quantitative response variable.

Question 6

Naive Bayes' Theorem assumes that the events that the predictor variables take on the values x₁, x₂, …, x_k are highly correlated for observations that fall into the particular category and statistically independent for observations that do not fall into the particular category.

Accepted Answer

Naive Bayes' Theorem assumes that the events that the predictor variables take on the values x₁, x₂, …, x_k are statistically independent for observations that fall into the particular category and statistically independent for observations that do not fall into the particular category.
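
To see what the independence assumption buys, the sketch below multiplies the individual conditional probabilities within each group and then applies Bayes' theorem. The function name and the probability values are hypothetical; only the independence assumption itself comes from the answer above.

```python
from math import prod

def naive_bayes_posterior(prior, likelihoods_in, likelihoods_out):
    """P(category | x1, ..., xk), assuming the x's are independent among
    observations in the category and independent among those outside it."""
    in_score = prior * prod(likelihoods_in)          # P(cat) * product of P(x_i | cat)
    out_score = (1 - prior) * prod(likelihoods_out)  # P(not cat) * product of P(x_i | not cat)
    return in_score / (in_score + out_score)

# Hypothetical inputs: P(category), then P(x_i | category) and P(x_i | not category).
posterior = naive_bayes_posterior(0.5, [0.8, 0.5], [0.2, 0.5])
print(posterior)
```

With these inputs the posterior probability of the category comes out at about 0.8.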

Question 7

Because different trust levels may be appropriate for different techniques, ensemble estimates may use a weighted average of the different results given by the different techniques.

Accepted Answer

Different trust in different techniques (based on historical RSquare values, misclassification rates, confusion matrices, and/or other metrics) may be a basis for using a weighted average of the different results given by the different techniques.
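
A weighted average of this kind might be computed as in the sketch below. The three predictions and the trust weights are hypothetical; in practice the weights would be derived from metrics such as those listed above.

```python
def weighted_average(predictions, weights):
    """Combine predictions, giving more influence to more trusted techniques."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Hypothetical predictions from three techniques and the trust placed in each.
predictions = [10.0, 12.0, 20.0]
weights = [2.0, 1.0, 1.0]

estimate = weighted_average(predictions, weights)
print(estimate)  # 13.0
```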

Question 8

The process of assigning items to prespecified categories is known as classification.

Accepted Answer

Classification involves assigning items to prespecified categories, or classes.

Question 9

The confusion matrix for a classification tree shows which combinations of predictor variables cannot be used to predict the response variable.

Accepted Answer

The confusion matrix for a classification tree shows the number of observed response values that are (or are not) classified correctly by the classification tree from their associated predictor variables.
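
For a concrete picture, the sketch below tallies a small confusion matrix from observed and predicted classes. The labels are made up, and the predictions could come from any classifier, not only a classification tree.

```python
from collections import Counter

# Hypothetical observed classes and the classes a model predicted for them.
observed  = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

# Each (observed, predicted) pair is one cell of the confusion matrix.
matrix = Counter(zip(observed, predicted))

# Cells where observed == predicted are the correct classifications.
correct = sum(n for (obs, pred), n in matrix.items() if obs == pred)
misclassification_rate = 1 - correct / len(observed)

for cell, count in sorted(matrix.items()):
    print(cell, count)
print(misclassification_rate)
```

Four of the six observations land in the matching cells, so the misclassification rate here is one third.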

Question 10

The confusion matrix shows the number of observed response values that are classified correctly.

Accepted Answer

The confusion matrix shows the number of observed response values that are (or are not) classified correctly.

Question 11

One approach to avoid overfitting a classification tree is to use a validation data set to identify valid splits and a training data set to train the classification tree on when to stop making splits.

Accepted Answer

The training data set is used to make splits, while the validation data set is used to determine when to stop making splits.
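
One way the validation data set could signal when to stop is sketched below. The stopping rule (stop at the first split that fails to lower the validation misclassification rate) is an assumption, not something the quiz specifies, and the rates are invented for illustration.

```python
# Hypothetical misclassification rates after each successive split.
training_rates   = [0.30, 0.22, 0.15, 0.10, 0.06, 0.03]
validation_rates = [0.32, 0.25, 0.19, 0.18, 0.21, 0.24]

def splits_to_keep(rates):
    """Number of splits made before the validation rate first stops improving."""
    for i in range(1, len(rates)):
        if rates[i] >= rates[i - 1]:
            return i
    return len(rates)

print(splits_to_keep(validation_rates))  # 4
```

The training rate keeps falling with every split, which is exactly why it cannot be used to decide when to stop.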

Question 12

To "overfit" the data is to adjust the data until it matches our desired classification tree.

Accepted Answer

We do not "overfit" the data by adjusting the data. Rather, we "overfit" the data by developing an overly complex classification tree (or other model) that fits the observed data too closely and thus fails to capture the real underlying data patterns that would help to accurately predict and/or classify future observations.

Question 13

Because different classification techniques will perform better for different data sets, ensemble models consider multiple classification techniques before selecting the best classification technique to use for a particular data set.

Accepted Answer

Ensemble models look for the predominant classification from multiple classification techniques.
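
A simple majority vote captures this idea. In the sketch below, the technique names and their predicted classes are hypothetical.

```python
from collections import Counter

def predominant_class(votes):
    """Return the class predicted by the largest number of techniques."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical classifications of one item by three techniques.
votes = {
    "classification tree": "B",
    "k-nearest neighbors": "A",
    "naive Bayes": "B",
}

ensemble = predominant_class(votes.values())
print(ensemble)  # B
```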

Question 14

The confusion matrix shows the number of observed response values that are inaccurately classified.

Accepted Answer

The confusion matrix shows the number of observed response values that are (or are not) classified correctly.

Question 15

A regression tree is used for predicting a qualitative response variable.

Accepted Answer

A classification tree is used to predict a qualitative, or categorical, response variable.

Question 16

Classification involves identifying common traits in items in order to develop broad classes into which the items may be grouped based on those traits.

Accepted Answer

Classification involves assigning items to prespecified categories, or classes.

Question 17

The nearest neighbors to an observation are determined by measuring the distance between the set of predictor variables for that observation and the set of predictor variables for every other observation.

Accepted Answer

The nearest neighbors to an observation are determined by measuring the distance between the set of predictor variables for that observation and the set of predictor variables for every other observation.
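
The sketch below finds nearest neighbors this way. Euclidean distance is an assumption here, since the quiz does not name a distance measure, and the predictor values are made up.

```python
import math

def nearest_neighbors(target, others, k):
    """Return the k rows of `others` closest to `target` by Euclidean distance."""
    return sorted(others, key=lambda row: math.dist(target, row))[:k]

# Hypothetical predictor values (two predictors per observation).
others = [(1.0, 1.0), (5.0, 5.0), (2.0, 1.0), (9.0, 0.0)]

print(nearest_neighbors((1.0, 2.0), others, k=2))
# [(1.0, 1.0), (2.0, 1.0)]
```

In practice the predictors are often standardized first so that no single variable dominates the distance, but that step is omitted here.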

Question 18

The best value of k to use for the k-nearest neighbors approach to classifying a qualitative response variable is the largest value of k for which all distances between neighbors are less than some prespecified distance.

Accepted Answer

The best value of k is the one which results in the smallest misclassification rate.
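
As an illustration, the sketch below compares the misclassification rate for a few candidate values of k. It assumes a single predictor, a separate validation set, and odd values of k to avoid tied votes; the data and these choices are hypothetical.

```python
from collections import Counter

def knn_classify(train, x, k):
    """Majority class among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def misclassification_rate(train, validation, k):
    """Share of validation points whose predicted class is wrong."""
    wrong = sum(knn_classify(train, x, k) != label for x, label in validation)
    return wrong / len(validation)

# Hypothetical (predictor, class) pairs; the point at 3 is a noisy "B".
train = [(1, "A"), (2, "A"), (3, "B"), (4, "A"), (7, "B"), (8, "B"), (9, "B")]
validation = [(2.6, "A"), (3.4, "A"), (7.5, "B")]

rates = {k: misclassification_rate(train, validation, k) for k in (1, 3, 5)}
best_k = min(rates, key=rates.get)  # ties go to the smallest k tried
print(rates, best_k)
```

Here k = 1 is misled by the noisy training point, while k = 3 and k = 5 classify every validation point correctly, so the smaller of the two is chosen.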

Question 19

The confusion matrix is not a good indicator of a classification tree's accuracy.

Accepted Answer

The confusion matrix is a good indicator of a classification tree's accuracy.

Question 20

A classification tree is useful for predicting a quantitative response variable.

Accepted Answer

A classification tree is used to predict a qualitative, or categorical, response variable.

Quiz 5: Predictive Analytics I: Trees, K-Nearest Neighbors, Naive Bayes,
