Deck 8: Introduction to Data Mining
1
Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases. Of the six, which one represents the phase where data wrangling occurs?
A) Deployment
B) Modeling
C) Data understanding
D) Data preparation
Answer: D) Data preparation
2
Using the Manhattan distance between pairwise observations, which pair of observations is most similar?
A) Observations 1 & 2
B) Observations 2 & 3
C) Observations 1 & 3
D) Both Observations 1 & 2 and 2 & 3
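The table for this card is not reproduced here, so the following is only a minimal sketch of the calculation. It assumes the Manhattan distance is the sum of absolute differences across variables, and the three observations are hypothetical values chosen for illustration, not the card's data.

```python
def manhattan_distance(a, b):
    """Sum of absolute differences across the variables of two observations."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical observations (NOT the values from the card's table).
observations = {1: (3, 4), 2: (5, 1), 3: (9, 6)}

pairs = [(1, 2), (2, 3), (1, 3)]
distances = {
    pair: manhattan_distance(observations[pair[0]], observations[pair[1]])
    for pair in pairs
}

# The most similar pair is the one with the smallest distance.
most_similar = min(distances, key=distances.get)
print(distances, most_similar)
```

The smaller the distance, the more similar the two observations are.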

Answer: A) Observations 1 & 2
3
Using the Manhattan distance between pairwise observations, which pair of observations is most similar?
A) Observations 2 & 3
B) Observations 1 & 3
C) Observations 1 & 2
D) Both Observations 2 & 3 and 1 & 3

Answer: A) Observations 2 & 3
4
Using the Euclidean distance between pairwise observations, which pair of observations is most dissimilar?
A) Observations 1 & 3
B) Observations 2 & 3 and 1 & 3
C) Observations 2 & 3
D) Observations 1 & 2 and 2 & 3
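As a rough illustration of the Euclidean calculation, the sketch below takes the square root of the summed squared differences. The observations are hypothetical because the card's table is not available here.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences between two observations."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical observations (NOT the values from the card's table).
points = {1: (0, 0), 2: (3, 4), 3: (6, 8)}

pair_distances = {
    (i, j): euclidean_distance(points[i], points[j])
    for i, j in [(1, 2), (2, 3), (1, 3)]
}

# The most dissimilar pair is the one with the largest distance.
most_dissimilar = max(pair_distances, key=pair_distances.get)
print(pair_distances, most_dissimilar)
```

Note that two pairs can tie, which is why some answer options list more than one pair.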

5
Using the Euclidean distance between pairwise observations, which pair of observations is most dissimilar?
A) Observations 1 & 3
B) Observations 2 & 3 and 1 & 3
C) Observations 2 & 3
D) Observations 1 & 2 and 2 & 3

6
The partial data set in the table shows online hours spent shopping, by age and income. The average and standard deviation for the full data set are $47,701 and $14,362, respectively. Using z-scores to standardize the observations, what is the average standard deviation of Income for the three observations provided?
A) 0.926
B) 0.3780
C) 0.8640
D) 0.7637
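A minimal sketch of z-score standardization, using the mean and standard deviation stated in the card. The three income values are hypothetical placeholders, since the card's table is not reproduced here.

```python
def z_score(value, mean, std_dev):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (value - mean) / std_dev

MEAN_INCOME = 47_701  # from the card
STD_INCOME = 14_362   # from the card

# Hypothetical incomes (NOT the values from the card's table).
incomes = [62_063, 47_701, 33_339]

standardized = [z_score(x, MEAN_INCOME, STD_INCOME) for x in incomes]
average_z = sum(standardized) / len(standardized)
print(standardized, average_z)
```

A positive z-score marks an income above the mean, and a negative one marks an income below it.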

7
The partial data set in the table shows online hours spent shopping, by age and income. The average and standard deviation for the full data set are $47,667 and $14,292, respectively. Using z-scores to standardize the observations, what is the average standard deviation of Income for the three observations provided?
A) 1.003
B) 0.7320
C) 0.2410
D) 0.6997

8
The partial data set in the table shows online hours spent shopping, by age and income. Use the min-max transformation to normalize the observations for Income. What is the average standard deviation of Income for the observations provided?

A) 0.686
B) 0.6147
C) 0
D) 0.6727
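The min-max transformation can be sketched as follows, assuming the usual form (x - min) / (max - min), which rescales every value into the range 0 to 1. The incomes are hypothetical, not the card's data.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(x - lowest) / (highest - lowest) for x in values]

# Hypothetical incomes (NOT the values from the card's table).
incomes = [30_000, 45_000, 60_000]

normalized = min_max_normalize(incomes)
print(normalized)
```

The smallest observation always maps to 0 and the largest to 1.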
9
The partial data set in the table shows online hours spent shopping, by age and income. Use the min-max transformation to normalize the observations for Income. What is the average standard deviation of Income for the observations provided?
A) 1
B) 6417
C) 0
D) 0.6997

10
The following table is a segment of Loan Data from a bank for car loans. Compute the matching coefficient between Pairs 1 and 4. 
A) Matching coefficient is 0.25.
B) Matching coefficient is 0.13.
C) Matching coefficient is 0.88.
D) Matching coefficient is 0.35.

11
The following table is a segment of Loan Data from a bank for car loans. Compute the matching coefficient between Pairs 1 and 4. 
A) Matching coefficient is 0.50.
B) Matching coefficient is 0.25.
C) Matching coefficient is 0.75.
D) Matching coefficient is 0.40.

12
Sandra began collecting transaction details to see if the same items were in each sales transaction. Compute the matching coefficient and Jaccard's coefficient for transactions 1 & 2.
A) Matching Coefficient = 0.60 and Jaccard's = 0.40
B) Matching Coefficient = 0.40 and Jaccard's = 0.50
C) Matching Coefficient = 0.60 and Jaccard's = 0.33
D) Matching Coefficient = 0.40 and Jaccard's = 0.40
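The two similarity measures can be sketched for binary (0/1) item indicators as below. This assumes the common definitions: the matching coefficient counts both 1-1 and 0-0 matches over all variables, while Jaccard's coefficient ignores 0-0 matches. The transactions are hypothetical, not Sandra's data.

```python
def matching_coefficient(a, b):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def jaccard_coefficient(a, b):
    """1-1 matches divided by the variables that are not 0-0 matches."""
    both_one = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    both_zero = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    considered = len(a) - both_zero
    if considered == 0:
        raise ValueError("Jaccard's coefficient is undefined when every pair is 0-0")
    return both_one / considered

# Hypothetical transactions: 1 = item purchased, 0 = item not purchased.
transaction_1 = [1, 0, 1, 0, 0, 1]
transaction_2 = [1, 0, 0, 0, 1, 1]

mc = matching_coefficient(transaction_1, transaction_2)
jc = jaccard_coefficient(transaction_1, transaction_2)
print(round(mc, 3), round(jc, 3))
```

Jaccard's coefficient is the usual choice when a shared absence (0-0) carries little information, as with items that neither transaction contains.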

13
Sandra began collecting transaction details to see if the same items were in each sales transaction. Compute the matching coefficient and Jaccard's coefficient for transactions 1 & 2.
A) Matching Coefficient = 0.60 and Jaccard's = 0.40
B) Matching Coefficient = 0.40 and Jaccard's = 0.50
C) Matching Coefficient = 0.60 and Jaccard's = 0.50
D) Matching Coefficient = 0.40 and Jaccard's = 0.40

14
The process of dividing a data set into a training, a validation, and an optional test data set is called ________.
A) overfitting
B) oversampling
C) optional testing
D) data partitioning
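One way such a split might be done is a random shuffle followed by slicing. The 50/30/20 proportions, the seed, and the function name below are assumptions made for illustration; the card does not prescribe them.

```python
import random

def partition(records, train_frac=0.5, validation_frac=0.3, seed=1):
    """Randomly split records into training, validation, and test sets.

    Whatever is left after the training and validation slices becomes the
    test set (it is empty if the two fractions sum to 1).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test

train, validation, test = partition(range(100))
print(len(train), len(validation), len(test))
```

Every record lands in exactly one partition, so performance measured on the validation or test set is based on data the model did not see during training.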
15
Cameron is performing a study on the IQ of groups in various areas. He has calculated that the average IQ of Group A is 108 with a standard deviation of 8. What is the z-score for someone with an IQ of 112?
A) -0.50
B) 0.50
C) -0.30
D) 1.30
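The z-score expresses how many standard deviations a value lies from the mean. Worked with the figures stated in this card (mean 108, standard deviation 8, IQ 112):

```latex
z = \frac{x - \mu}{\sigma} = \frac{112 - 108}{8} = 0.50
```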
16
Cameron is performing a study on the IQ of groups in various areas. He has calculated that the average IQ of Group A is 105 with a standard deviation of 10. What is the z-score for someone with an IQ of 98?
A) 0.7
B) -0.7
C) 0.9
D) 0.1
17
Cameron is performing a study on the IQ of groups in various areas. He has calculated that the average IQ of Group B is 148 with a standard deviation of 10. What is the z-score for someone with an IQ of 155?
A) 1.10
B) -0.70
C) 0.22
D) 0.70
18
Cameron is performing a study on the IQ of groups in various areas. He has calculated that the average IQ of Group B is 118 with a standard deviation of 12. What is the z-score for someone with an IQ of 125?
A) 0.98
B) -0.58
C) 0.10
D) 0.58
19
When a predictive model is made overly complex to fit the quirks of given sample data, it is called ______.
A) oversampling
B) overfitting
C) partitioning
D) distribution
20
Molly e-mailed her clients offering a free 30-minute massage for referrals. In the following validation set of 100, Class 1 reflects the clients predicted to provide referrals and Class 0 reflects the clients predicted to not provide referrals.
Based on the confusion matrix, what was the number of True Positives (TP), that is, current clients who provided referrals for a free massage?
A) 36
B) 10
C) 26
D) 54
21
Molly e-mailed her clients offering a free 30-minute massage for referrals. In the following validation set of 100, Class 1 reflects the clients predicted to provide referrals and Class 0 reflects the clients predicted to not provide referrals.
Based on the confusion matrix, what was the number of True Positives (TP), that is, current clients who provided referrals for a free massage?
A) 29
B) 11
C) 18
D) 60
22
Molly e-mailed her clients offering a free 30-minute massage for referrals. In the following validation set of 100, Class 1 reflects the clients predicted to provide referrals and Class 0 reflects the clients predicted to not provide referrals.
Based on the confusion matrix, what was the number of False Negatives (FN) among current clients who provided referrals for a free massage?
A) 29
B) 18
C) 11
D) 60
23
The sensitivity, also called recall, is computed using which equation?
A) Sensitivity = TP ÷ (TP + FN).
B) Sensitivity = TN ÷ (TP + FN).
C) Sensitivity = (TP + TN) ÷ (TP + TN + FP + FN).
D) Sensitivity = TP ÷ (TP + FP).
24
The precision, also called positive predictive value, is computed using which equation?
A) Precision = TP ÷ (TP + FN).
B) Precision = TN ÷ (TP + FN).
C) Precision = (TP + TN) ÷ (TP + TN + FP + FN).
D) Precision = TP ÷ (TP + FP).
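Sensitivity and precision share the numerator TP and differ only in the denominator. The sketch below uses the standard definitions (sensitivity divides by all actual Class 1 cases, precision by all predicted Class 1 cases). The confusion-matrix counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Recall: share of actual Class 1 cases predicted as Class 1, TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Positive predictive value: share of predicted Class 1 cases that are correct, TP / (TP + FP)."""
    return tp / (tp + fp)

# Hypothetical confusion-matrix counts (not taken from any card in this deck).
tp, fn, fp, tn = 30, 10, 20, 40

sens = sensitivity(tp, fn)
prec = precision(tp, fp)
print(sens, prec)
```

TN does not enter either measure, which is why both can look strong even when Class 0 cases are classified poorly.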
25
Based on the following confusion matrix with a validation set of 100, Class 1 reflects the members targeted who purchased services and Class 0 reflects the non-targeted respondents who did not purchase services. Calculate the specificity rate. 
A) 72%
B) 88%
C) 84%
D) 84.4%
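Specificity is the Class 0 counterpart of sensitivity. A minimal sketch, assuming the standard definition TN / (TN + FP) and hypothetical counts, since the card's confusion matrix is not reproduced here:

```python
def specificity(tn, fp):
    """Share of actual Class 0 cases predicted as Class 0, TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts (NOT the values from the card's confusion matrix).
true_negatives, false_positives = 45, 15

spec = specificity(true_negatives, false_positives)
print(f"{spec:.0%}")
```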

26
Based on the following confusion matrix with a validation set of 100, Class 1 reflects the members targeted who purchased services and Class 0 reflects the non-targeted respondents who did not purchase services. Calculate the specificity rate.
A) 62%
B) 78%
C) 84%
D) 84.5%

27
Based on the following confusion matrix with a validation set of 100, Class 1 reflects the members targeted who purchased services and Class 0 reflects the non-targeted respondents who did not purchase services. Calculate the sensitivity rate. 
A) 86%
B) 85%
C) 70%
D) 40%

28
Based on the following confusion matrix with a validation set of 100, Class 1 reflects the members targeted who purchased services and Class 0 reflects the non-targeted respondents who did not purchase services. Calculate the sensitivity rate. 
A) 78%
B) 84%
C) 62%
D) 32%

29
The calculated average error measuring the average magnitude of errors in predictive performance measures is called _______.
A) mean percentage error
B) root mean square error
C) mean error
D) mean absolute deviation
30
The _______ error is the average absolute percentage error, shown as a percentage of the actual value, displaying the magnitude of the errors in performance measures.
A) RMSE
B) MAD
C) MAPE
D) ME
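The prediction-error measures named in these cards can be compared side by side. The sketch below assumes the error is defined as actual minus predicted (some texts use the opposite sign) and uses hypothetical values.

```python
import math

def error_measures(actual, predicted):
    """Return ME, MAD, RMSE, MPE, and MAPE for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    n = len(errors)
    return {
        "ME": sum(errors) / n,                                # average error; signs can cancel
        "MAD": sum(abs(e) for e in errors) / n,               # average absolute error
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),    # penalizes large errors more
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Hypothetical actual and predicted values.
actual = [100, 200, 400]
predicted = [110, 190, 400]

measures = error_measures(actual, predicted)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this example ME is zero even though two predictions are off, which illustrates why the absolute or squared measures are used to judge the magnitude of errors.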
31
Which is the best-fit definition for the use of Principal Component Analysis (PCA)?
A) The elimination of the number of components in a data set to identify errors.
B) The transformation of a small set of correlated variables into larger uncorrelated subsets.
C) The transformation of a large number of correlated variables into a smaller number of uncorrelated variables.
D) The driving of pattern recognition in a supervised data mining set used to visualize data methods.
32
The following table displays the weights for computing the principal components and the data for two Observations. The mean and standard deviation for x1 are 3.60 and 1.70, respectively. What is the z-score of x1 for Observation 1?
A) 0.42
B) 0.41
C) 0.97
D) 0.45

33
The following table displays the weights for computing the principal components and the data for two Observations. The mean and standard deviation for x1 are 4.2 and 1.4, respectively. What is the z-score of x1 for Observation 1?
A) 0.18
B) 0.17
C) 0.73
D) 0.21

34
The following table displays the weights for computing the principal components and the data for two Observations. The mean and standard deviation for x1 are 4.2 and 1.9, respectively. The mean and standard deviation for x2 are 6.2 and 4.8, respectively. Compute the first principal component score for Observation 1.
A) -0.533
B) -0.084
C) 0.740
D) 0.403
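A principal component score is a weighted sum of the standardized variables. The sketch below uses the means and standard deviations stated in this card, but the weights and the Observation 1 values are hypothetical because the card's table is not reproduced here.

```python
def z_score(value, mean, std_dev):
    """Standardize a value using the variable's mean and standard deviation."""
    return (value - mean) / std_dev

def principal_component_score(z_values, weights):
    """Weighted linear combination of standardized variables."""
    return sum(w * z for w, z in zip(weights, z_values))

# Means and standard deviations as stated in the card.
MEAN_X1, STD_X1 = 4.2, 1.9
MEAN_X2, STD_X2 = 6.2, 4.8

# Hypothetical values (NOT from the card's table).
x1, x2 = 6.1, 1.4          # Observation 1
pc1_weights = (0.6, 0.8)   # first principal component weights

z1 = z_score(x1, MEAN_X1, STD_X1)
z2 = z_score(x2, MEAN_X2, STD_X2)
pc1_score = principal_component_score((z1, z2), pc1_weights)
print(round(z1, 3), round(z2, 3), round(pc1_score, 3))
```

Standardizing first keeps a variable with a large scale from dominating the score.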

35
The following table displays the weights for computing the principal components and the data for two Observations. The mean and standard deviation for x1 are 4.2 and 1.4, respectively. The mean and standard deviation for x2 are 6.8 and 4.8, respectively. Compute the first principal component score for Observation 1.
A) -0.293
B) -0.742
C) 0.949
D) 0.612

36
Of the following selections, which is not a descriptor of principal component analysis?
A) The first principal component is not suitable for analysis.
B) Principal components are uncorrelated variables.
C) The first principal component accounts for most of the variability.
D) Principal component variables are weighted linear combinations of the original variables.
37
Calculate the misclassification rate for the following confusion matrix. 
A) 0.68
B) 0.18
C) 0.74
D) 0.24
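The misclassification rate and the accuracy rate are complements of each other. A minimal sketch with hypothetical counts, since the card's confusion matrix is not reproduced here:

```python
def misclassification_rate(tp, tn, fp, fn):
    """Share of all cases classified incorrectly: (FP + FN) / total."""
    return (fp + fn) / (tp + tn + fp + fn)

def accuracy_rate(tp, tn, fp, fn):
    """Share of all cases classified correctly: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts (NOT the values from the card's confusion matrix).
counts = {"tp": 40, "tn": 35, "fp": 15, "fn": 10}

misclassification = misclassification_rate(**counts)
accuracy = accuracy_rate(**counts)
print(misclassification, accuracy)
```

Because the two rates always sum to 1, computing one gives the other.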

38
Calculate the misclassification rate for the following confusion matrix. 
A) 0.72
B) 0.22
C) 0.78
D) 0.28

39
Calculate the accuracy rate for the following confusion matrix. 
A) 0.78
B) 0.28
C) 0.72
D) 0.34

40
Calculate the accuracy rate for the following confusion matrix. 
A) 0.72
B) 0.22
C) 0.78
D) 0.28

41
Tess is tasked with analyzing a data set with multiple variables with various scales. To reduce the difference in scale in variables, she is following a common process called ______ to make the observations unit-free.
A) data mining
B) Euclidean distance
C) Manhattan distance
D) standardizing
42
Which chart allows for a visual representation to determine a point where a model's predictions become less useful?
A) ROC Curve
B) Cumulative Lift Chart
C) Decile-wise Lift Chart
D) Sensitivity Measure
43
Which chart is a bar chart displayed in 10 equal-sized intervals, or every 10% of the observations?
A) ROC Curve
B) Cumulative Lift Chart
C) Decile-wise Lift Chart
D) Sensitivity Measure
44
When using PCA, all the following are disadvantages except
A) PCA results are difficult to interpret clearly.
B) components are weighted linear combinations and abstract.
C) PCA only works with numerical data.
D) PCA significantly increases the dimension of the data.
45
The process of applying a set of analytical techniques for the development of machine learning and artificial intelligence is called data mining.
46
The key distinction between supervised and unsupervised data mining is that the target variable is identified in supervised data mining.
47
Common applications of unsupervised learning include dimension reduction and prediction models.
48
Normalization is the process that makes the numerical data independent of scale.
49
Jaccard's coefficient is appropriate when it is more informative to match negative outcomes between observations.
50
Oversampling involves intentionally selecting more samples from one class than from other classes to adjust the class distribution of a data set.
51
A diagram that represents the information in equal-sized intervals, deciles, is called a cumulative lift chart.
52
In real-world situations, data sets contain many variables. If some variables are eliminated, valuable information may be lost.
53
Principal component analysis (PCA) is a dimension reduction technique used to reduce the number of variables without removing any of the original variables.
54
In Excel, Analytic Solver only provides the covariance matrix for performing principal component analysis (PCA).