## Quiz 5: Predictive Analytics I: Trees, K-Nearest Neighbors, Naive Bayes,

For a sufficiently large value of k, the k-nearest neighbors classification approach will always result in a lower misclassification rate than the simple branch splitting approach of the classification tree.
True False

False
Explanation: Neither the classification tree approach nor the k-nearest neighbors approach is guaranteed to give the lower misclassification rate, whatever the value of k. Which approach performs better depends on the data set.

A quantitative variable which can have only the values of zero (0) or one (1) and which is used to represent a qualitative variable is known as a (1, 0) dummy variable.
True False

True
Explanation: A (1, 0) dummy variable is a quantitative variable used to represent a qualitative variable.
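
As a minimal sketch of the (1, 0) coding described above, the snippet below converts a qualitative variable into a dummy variable. The function name `to_dummy` and the `owns_home` data are hypothetical, chosen only for illustration.

```python
def to_dummy(values, category):
    """Return 1 where the value equals `category`, otherwise 0."""
    return [1 if v == category else 0 for v in values]


# Hypothetical qualitative variable with two categories.
owns_home = ["yes", "no", "yes", "yes", "no"]

# The dummy variable is quantitative: it takes only the values 1 and 0.
dummy = to_dummy(owns_home, "yes")
print(dummy)  # [1, 0, 1, 1, 0]
```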

To predict a qualitative, or categorical, response variable we could use a classification tree.
True False

True
Explanation: A classification tree is used to predict a qualitative, or categorical, response variable.

The optimal value of k to use for the k-nearest neighbors approach to predicting a quantitative response variable is the value of k that minimizes RMSE (the square root of the mean of the squared deviations of the predicted values from the observed values).
True False
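
The selection rule in the statement above can be sketched as follows: compute RMSE on held-out observations for each candidate k and keep the k with the smallest value. This is an illustration under stated assumptions, not a reference implementation: the data are made up, there is a single predictor, distance is the absolute difference, and the helper names `knn_predict` and `rmse` are hypothetical.

```python
import math


def knn_predict(train, x, k):
    """Predict by averaging the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, validation, k):
    """Square root of the mean squared deviation of predicted from observed values."""
    squared = [(y - knn_predict(train, x, k)) ** 2 for x, y in validation]
    return math.sqrt(sum(squared) / len(squared))


# Made-up (predictor, response) pairs, for illustration only.
train = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 6.0)]
validation = [(1.5, 1.5), (3.5, 3.5), (5.5, 5.5)]

# Evaluate every candidate k and keep the one with the smallest RMSE.
rmse_by_k = {k: rmse(train, validation, k) for k in range(1, len(train) + 1)}
best_k = min(rmse_by_k, key=rmse_by_k.get)

for k, value in rmse_by_k.items():
    print(f"k={k}: RMSE={value:.3f}")
print("k with the smallest RMSE:", best_k)
```

Evaluating RMSE on observations that were not used to form the predictions is an assumption of this sketch; scoring on the training points themselves would always favor k = 1.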

To predict a quantitative response variable, we could use a regression tree.
True False

Naive Bayes' Theorem assumes that the events that the predictor variables take on the values x1, x2, …, xk are highly correlated for observations that fall into the particular category and statistically independent for observations that do not fall into the particular category.
True False

Because different trust levels may be appropriate for different techniques, ensemble estimates may use a weighted average of the different results given by the different techniques.
True False
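
The weighted average mentioned in the statement above might look like the sketch below. The three estimates and the weights are invented for illustration, and `weighted_ensemble` is a hypothetical helper name; how the weights are actually chosen is not specified in the quiz.

```python
def weighted_ensemble(estimates, weights):
    """Combine estimates from several techniques as a weighted average."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total


# Hypothetical estimated probabilities of the same category from three techniques.
estimates = [0.80, 0.60, 0.70]  # e.g. classification tree, k-nearest neighbors, naive Bayes
weights = [2, 1, 1]             # assumed trust levels; the tree is trusted twice as much

combined = weighted_ensemble(estimates, weights)
print(round(combined, 3))  # 0.725
```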

The process of assigning items to prespecified categories is known as classification.
True False

The confusion matrix for a classification tree shows which combinations of predictor variables cannot be used to predict the response variable.
True False

The confusion matrix shows the number of observed response variables which are classified correctly.
True False
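
A confusion matrix cross-tabulates observed categories against predicted categories. The sketch below builds one as a table of counts and reads off the numbers of correct and incorrect classifications. The `observed` and `predicted` lists are made up for illustration, and `confusion_matrix` is a hypothetical helper, not a library function.

```python
from collections import Counter


def confusion_matrix(observed, predicted):
    """Count each (observed category, predicted category) combination."""
    return Counter(zip(observed, predicted))


# Made-up observed and predicted categories for six observations.
observed = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

matrix = confusion_matrix(observed, predicted)

# Cells where observed and predicted agree are correct classifications.
correct = sum(n for (obs, pred), n in matrix.items() if obs == pred)
incorrect = sum(n for (obs, pred), n in matrix.items() if obs != pred)

print(dict(matrix))
print("correct:", correct, "incorrect:", incorrect)  # correct: 4 incorrect: 2
```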

One approach to avoid overfitting a classification tree is to use a validation data set to identify valid splits and a training data set to train the classification tree on when to stop making splits.
True False

To "overfit" the data is to adjust the data until it matches our desired classification tree.
True False

Because different classification techniques will perform better for different data sets, ensemble models consider multiple classification techniques before selecting the best classification technique to use for a particular data set.
True False

The confusion matrix shows the number of observed response variables which are inaccurately classified.
True False

A regression tree is used for predicting a qualitative response variable.
True False

Classification involves identifying common traits in items in order to develop broad classes into which the items may be grouped based on those traits.
True False