Question 1

Association rule support is defined as

A) the percentage of instances that contain the antecedent conditional items listed in the association rule.
B) the percentage of instances that contain the consequent conditions listed in the association rule.
C) the percentage of instances that contain all items listed in the association rule.
D) the percentage of instances in the database that contain at least one of the antecedent conditional items listed in the association rule.

Accepted Answer

Support in association rule mining is defined as the percentage of instances in the dataset that contain all items (both antecedent and consequent) listed in the association rule.
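As a minimal sketch of this definition, the snippet below counts the instances that contain every item of a rule and divides by the total number of instances. The `instances` data and the `support` helper are hypothetical, invented here for illustration; they do not come from the deck.

```python
# Hypothetical market-basket data, invented for illustration only.
instances = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]


def support(instances, antecedent, consequent):
    """Fraction of instances that contain ALL items of the rule."""
    # Support looks at antecedent and consequent items together.
    rule_items = set(antecedent) | set(consequent)
    matches = sum(1 for inst in instances if rule_items <= inst)
    return matches / len(instances)


# Rule: IF milk THEN bread. Three of the five instances hold both items.
rule_support = support(instances, {"milk"}, {"bread"})
print(rule_support)
```

Note that the instance holding only bread and eggs does not count toward support, because it lacks the antecedent item.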

Question 2

This approach is best when we are interested in finding all possible interactions among a set of attributes.

A) decision tree
B) association rules
C) K-Means algorithm
D) genetic learning

Accepted Answer

Association rule mining discovers relationships among the attributes in a dataset. It is designed to find all possible interactions among a set of attributes, which makes it the best choice here. Decision trees and the K-Means algorithm have different goals (predicting a single output attribute and clustering instances, respectively) and are not suited to this task. Genetic learning evolves a population of candidate solutions over multiple generations, which is not needed for this task.

Question 3

Use these tables to answer questions 5 and 6.

One two-item set rule that can be generated from the tables above is: If Magazine Promo = Yes Then Life Ins promo = Yes
The confidence for this rule is:

A) 5 / 7
B) 5 / 12
C) 7 / 12
D) 1

Accepted Answer

The confidence is 5 / 7 (option A). Magazine Promo = Yes and Life Ins Promo = Yes occur together in 5 instances, while Magazine Promo = Yes occurs in 7 instances, so the confidence of the rule is 5 / 7.

Question 4

Which statement is true about the K-Means algorithm?

A) All attribute values must be categorical.
B) The output attribute must be categorical.
C) Attribute values may be either categorical or numeric.
D) All attributes must be numeric.

Accepted Answer

The K-Means algorithm requires all attributes to be numeric because it calculates distances between data points and cluster centers (centroids). Categorical data does not support the arithmetic operations needed for these calculations.
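The point about arithmetic can be illustrated with a small sketch. It assumes Euclidean distance as the distance measure, which is a common choice for K-Means; the `euclidean` helper and the sample values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Numeric attributes: subtraction and squaring are well defined.
instance = (1.0, 2.0)
centroid = (4.0, 6.0)
distance = euclidean(instance, centroid)
print(distance)

# Categorical attributes: subtracting one label from another is undefined,
# so the same distance computation fails.
try:
    euclidean(("red", "small"), ("blue", "large"))
    categorical_ok = True
except TypeError:
    categorical_ok = False
print(categorical_ok)
```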

Question 5

Which statement is true about the decision tree attribute selection process described in your book?

A) A categorical attribute may appear in a tree node several times but a numeric attribute may appear at most once.
B) A numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once.
C) Both numeric and categorical attributes may appear in several tree nodes.
D) Numeric and categorical attributes may appear in at most one tree node.

Accepted Answer

According to the information provided in the book, a numeric attribute may appear in several tree nodes but a categorical attribute may appear at most once.

Question 6

An evolutionary approach to data mining.

A) backpropagation learning
B) genetic learning
C) decision tree learning
D) linear regression

Accepted Answer

Genetic learning is the evolutionary approach to data mining. It mimics natural selection: the most promising population elements are selected and used to create new generations through crossover and mutation, so candidate solutions tend to improve from one generation to the next. Backpropagation learning, decision tree learning, and linear regression are all useful techniques in their own right, but none of them is based on an evolutionary model.
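The selection, crossover, and mutation cycle can be sketched on a toy problem. Everything below is an illustrative assumption rather than the book's method: the bit-string encoding, the `fitness` function (count of 1 bits), the "keep the fitter half" selection rule, and all parameter values.

```python
import random


def fitness(bits):
    """Toy fitness function: the number of 1 bits in the element."""
    return sum(bits)


def evolve(pop_size=20, length=12, generations=30, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover: splice the head of one parent onto the tail of another.
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # Mutation: occasionally flip one randomly chosen bit.
            if rng.random() < 0.1:
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            children.append(child)
        population = survivors + children
    return initial_best, max(map(fitness, population))


initial_best, final_best = evolve()
print(initial_best, final_best)
```

Because the survivors are carried over unchanged, the best fitness in the population can never decrease between generations in this sketch.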

Question 7

Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that

A) Y is true when X is known to be true.
B) X is true when Y is known to be true.
C) Y is false when X is known to be false.
D) X is false when Y is known to be false.

Accepted Answer

Rule confidence is defined as the conditional probability that Y is true given that X is known to be true. It measures the strength of the association between X and Y. Option B talks about the inverse probability and options C and D talk about the complement probabilities, which are not relevant to rule confidence.
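Written as a formula, the definition above reads as follows. The notation count(·), meaning the number of instances that satisfy a condition, is introduced here for illustration and is not taken from the deck.

```latex
\[
\text{confidence}(X \Rightarrow Y) \;=\; P(Y \mid X)
\;=\; \frac{\text{count}(X \text{ and } Y)}{\text{count}(X)}
\]
```

Swapping X and Y in the denominator would give P(X | Y), the inverse probability described in option B.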

Question 8

The K-Means algorithm terminates when

A) a user-defined minimum value for the summation of squared error differences between instances and their corresponding cluster center is seen.
B) the cluster centers for the current iteration are identical to the cluster centers for the previous iteration.
C) the number of instances in each cluster for the current iteration is identical to the number of instances in each cluster of the previous iteration.
D) the number of clusters formed for the current iteration is identical to the number of clusters formed in the previous iteration.

Accepted Answer

The K-Means algorithm terminates when the cluster centers for the current iteration are identical to the cluster centers for the previous iteration, indicating that the algorithm has converged and further iterations will not change the cluster assignments.
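The termination test can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `k_means` function, the sample points, and the initial centers are all hypothetical, and squared Euclidean distance is assumed for the assignment step.

```python
def k_means(points, centers):
    """Repeat assign/update until the centers stop changing."""
    iterations = 0
    while True:
        iterations += 1
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        # Termination: centers identical to those of the previous iteration.
        if new_centers == centers:
            return centers, clusters, iterations
        centers = new_centers


# Hypothetical 2-D points and starting centers, invented for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters, iterations = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
print(centers, iterations)
```

The exact equality check mirrors the "identical centers" wording of the answer. Practical implementations often stop when the centers move by less than a small tolerance instead.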

Question 9

Use the confusion matrix for Model X and confusion matrix for Model Y to answer questions 4 through 6.

A data mining algorithm is unstable if

A) test set accuracy depends on the ordering of test set instances.
B) the algorithm builds models unable to classify outliers.
C) the algorithm is highly sensitive to small changes in the training data.
D) test set accuracy depends on the choice of input attributes.

Accepted Answer

An unstable algorithm is highly sensitive to small changes in the training data, meaning that small variations in the training data can result in very different models being generated. This can lead to unpredictable and inconsistent classification results, making the algorithm unreliable. The other options presented do not necessarily describe an unstable algorithm.

Question 10

Use these tables to answer questions 5 and 6.

One two-item set rule that can be generated from the tables above is: If Magazine Promo = Yes Then Life Ins promo = Yes
The confidence for this rule is:

A) 5 / 7
B) 5 / 12
C) 7 / 12
D) 1

Accepted Answer

The confidence of a rule is calculated by dividing the number of times the rule is observed by the number of times the condition of the rule is met. For the rule "If Magazine Promo = Yes Then Life Ins Promo = Yes," it is observed 5 times (as indicated by the two-item set for Magazine Promo = Yes & Life Ins Promo = Yes) out of the 7 times Magazine Promo = Yes (as indicated by the single item set for Magazine Promo = Yes). Therefore, the confidence is 5 / 7.
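The arithmetic can be checked with a short sketch. The original tables are not reproduced here, so the `records` list below is hypothetical: it is built only to match the two counts stated above (7 instances with Magazine Promo = Yes, 5 of which also have Life Ins Promo = Yes). The rows with Magazine Promo = No are invented and do not affect the result.

```python
from fractions import Fraction

# Hypothetical (magazine_promo, life_ins_promo) records matching the stated counts.
records = (
    [("Yes", "Yes")] * 5    # rule observed: both conditions hold
    + [("Yes", "No")] * 2   # antecedent holds, consequent does not
    + [("No", "Yes")] * 1   # invented rows; irrelevant to confidence
    + [("No", "No")] * 2
)

# Count instances where the antecedent holds, and where the whole rule holds.
antecedent_count = sum(1 for mag, _ in records if mag == "Yes")
both_count = sum(1 for mag, life in records if mag == "Yes" and life == "Yes")

confidence = Fraction(both_count, antecedent_count)
print(confidence)
```

Only instances that satisfy the antecedent enter the denominator, which is why the total number of records plays no role in the confidence value.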

Question 11

Construct a decision tree with root node Type from the data in the table below. The first row contains attribute names. Each row after the first represents the values for one data instance. The output attribute is Class.

Accepted Answer

The answer of Construct a decision tree with root node...

Question 12

The computational complexity as well as the explanation offered by a genetic algorithm is largely determined by the

A) fitness function
B) techniques used for crossover and mutation
C) training data
D) population of elements

Accepted Answer

The answer of The computational complexity as well as the...

Question 13

A genetic learning operation that creates new population elements by combining parts of two or more existing elements.

A) selection
B) crossover
C) mutation
D) absorption

Accepted Answer

The answer of A genetic learning operation that creates new...

Deck 3: Basic Data Mining Techniques