Question 1

Megan is examining the likelihood of people riding the subway. The dependent variable takes on the value of 1 if the individual rides the subway and 0 otherwise. Therefore, she could use logistic regression to examine this question.

Accepted Answer

Logistic regression is used for predicting the probability of a binary outcome (1 or 0, yes or no, true or false), making it suitable for examining the likelihood of people riding the subway as described.
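As a minimal sketch of the idea, assuming a single hypothetical explanatory variable (commute distance in miles) and made-up coefficients that do not come from the question, the logistic function turns a linear score into a probability between 0 and 1:

```python
import math


def logistic(score: float) -> float:
    """Map any real-valued score to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients, chosen only for illustration (not estimated from data).
INTERCEPT = -1.0
COEF_DISTANCE = 0.5


def prob_rides_subway(distance_miles: float) -> float:
    """Probability that the dependent variable equals 1 (rides the subway)."""
    score = INTERCEPT + COEF_DISTANCE * distance_miles
    return logistic(score)


for d in (0.0, 2.0, 10.0):
    print(f"distance={d} -> P(rides subway) = {prob_rides_subway(d):.3f}")
```

Because the assumed coefficient on distance is positive, the predicted probability rises with distance but never leaves the interval from 0 to 1.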

Question 2

When using data partitioning, the second subset, which usually contains the records that were not included in the training data, is called the prediction data set.

Accepted Answer

The second subset is typically called the validation or test dataset, not the prediction dataset. It is used to evaluate the model's performance on unseen data.
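A partition of this kind can be sketched as follows. The function name `partition`, the 70/30 split, and the fixed seed are assumptions made for illustration, not part of the question:

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 data rows
training, validation = partition(records)
print(len(training), len(validation))
```

The model is fitted on `training` only; `validation` holds the records it has not seen and is used to check performance.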

Question 3

Cluster analysis tries to group observations into clusters so that observations within a cluster are different and observations in different clusters are similar.

Accepted Answer

Cluster analysis tries to group observations into clusters so that observations within a cluster are similar and observations in different clusters are different.

Question 4

The K in K-Means refers to the number of clusters.

Accepted Answer

In K-Means clustering, the "K" represents the number of clusters into which the data is to be grouped.
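To make the role of K concrete, here is a minimal one-dimensional sketch of the usual assign-then-update loop. The helper name `k_means_1d`, the naive initialization, and the sample values are all assumptions for illustration; real implementations handle multiple dimensions and smarter starting points:

```python
def k_means_1d(values, k, iterations=20):
    """Simple K-Means on a list of numbers; returns (centroids, labels)."""
    # Naive initialization (an assumption): k evenly spaced values from the sorted data.
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centroids = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.0, 9.1, 8.9]
centroids, labels = k_means_1d(values, k=3)
print(centroids, labels)
```

Changing `k` changes how many groups the same data is divided into; the algorithm does not choose K on its own.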

Question 5

A data mart is typically smaller than a data warehouse.

Accepted Answer

A data mart is a subset of a data warehouse, designed to serve the specific needs of a particular department or group of users, hence it is typically smaller in size.

Question 6

A neural network methodology attempts to mimic
A) the complex behavior of children.
B) the complex behavior of the human brain.
C) human emotion.
D) quantifiable random processes.

Accepted Answer

Neural network methodology is designed to replicate the complex behavior of the human brain, particularly in terms of processing information, recognizing patterns, and making decisions.

Question 7

Segmentation is also known as clustering, and involves trying to group entities into similar clusters.

Accepted Answer

Segmentation and clustering both involve grouping similar entities based on certain characteristics, and the two terms are often used interchangeably in data analysis contexts.

Question 8

Which of the following statements about logistic regression is false?
A) Logistic regression estimates the probability that an individual is in a particular category.
B) Logistic regression uses a nonlinear function of the explanatory variables for classification.
C) Logistic regression is essentially regression with a binary dependent variable.
D) Logistic regression requires that the error terms are uniformly distributed.

Accepted Answer

Statement D is false. Logistic regression does not require that the error terms are uniformly distributed. Instead, it models the probability that the dependent variable equals 1 using the logistic function, and it treats the binary outcome as following a Bernoulli (binomial) distribution rather than a uniform one.
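In symbols, a common textbook formulation looks like this (the notation $y_i$, $p_i$, $x_{ij}$, $\beta_j$ and the count $m$ of explanatory variables are assumed here and do not appear in the question):

```latex
y_i \sim \mathrm{Bernoulli}(p_i), \qquad
\ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_m x_{im}
```

The left-hand side of the second expression is the log of the odds, which is linear in the explanatory variables even though $p_i$ itself is a nonlinear function of them.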

Question 9

Mya is investigating the factors that impact soda consumption. She examines a host of variables that help explain the amount consumed. Which type of data mining methodology is she most likely to use?
A) market basket analysis
B) prediction
C) classification analysis
D) forecasting

Accepted Answer

Prediction is the most suitable data mining methodology for Mya's investigation because it involves analyzing current and historical data to make predictions about future or unknown events, such as the amount of soda consumed based on various factors.

Question 10

Clustering is considered a supervised data mining technique.

Accepted Answer

Clustering is an unsupervised data mining technique because it does not rely on pre-labeled data or outcomes for its process.

Question 11

Lift is the increase in the number of purchasers over the typical number of purchasers.

Accepted Answer

The answer of Lift is the increase in the number...

Question 12

Logistic regression and neural networks use complex nonlinear functions to capture the relationship between explanatory variables and categorical dependent variables.

Accepted Answer

The answer of Logistic regression and neural networks use complex...

Question 13

The logarithm of the odds ratio is called the
A) logit.
B) logos.
C) lods.
D) logodra.

Accepted Answer

The answer of The logarithm of the odds ratio is...

Question 14

The testing set in data partitioning is the
A) first subset of data, which usually contains 70% of the records.
B) second subset of data, which usually contains 30% or less of the records.
C) initial dataset from which subsets are created.
D) first subset of data, which usually contains 30% of the records.

Accepted Answer

The answer of The testing set in data partitioning is...

Question 15

Data mining is used to examine known, expected patterns and relationships among variables.

Accepted Answer

The answer of Data mining is used to examine known,expected...

Question 16

Which methodology is used to group products that customers purchase together?
A) market basket analysis
B) prediction
C) classification analysis
D) forecasting

Accepted Answer

The answer of Which methodology is used to group products...

Question 17

When using data partitioning, the first subset, usually with about 70% to 80% of the records, is called the training data set.

Accepted Answer

The answer of When using data partitioning,the first subset,usually with...

Question 18

Which of the following is not a methodology useful for data mining?
A) Classification analysis
B) Prediction
C) Cluster analysis
D) Stock market analysis

Accepted Answer

The answer of Which of the following is not a...

Question 19

Classification analysis attempts to find variables that are related to a quantitative variable.

Accepted Answer

The answer of Classification analysis attempts to find variables that...

Question 20

It is very useful to partition a large data set into all of the following subsets, except the _____ data set.
A) training
B) data
C) explanatory
D) prediction

Accepted Answer

The answer of It is very useful to partition a...

Question 21

Unsupervised methods have no
A) dependent variable.
B) clustering.
C) segmentation.
D) association analysis.

Accepted Answer

The answer of Unsupervised methods have no A) dependent variable. B) clustering. C) segmentation. D) association analysis....

Question 22

If the regression coefficient estimate from a logistic regression is positive, the probability of the dependent variable taking on a value of 1
A) decreases.
B) approaches zero.
C) increases.
D) remains constant.

Accepted Answer

The answer of If the regression coefficient estimate from a...

Question 23

Clustering tries to group entities into _____ clusters, based on the value of their variables.
A) trier
B) similar
C) nontrier
D) discovery

Accepted Answer

The answer of Clustering tried to group entities into _____...

Question 24

The predicted value from a logistic regression will be
A) between 0 and 1.
B) between -1 and 1.
C) less than 0.
D) greater than 1.

Accepted Answer

The answer of The predicted value from a logistic regression...

Question 25

Melody is a department store manager and wants to examine whether or not female shoppers are more likely than male shoppers to use a department credit card. "Female = 1" indicates the individual is a female. "Credit Card = 1" indicates the individual used a credit card to make the purchase. "Amount spent" is in dollars. Melody runs a logistic regression. If the estimate on the female variable is positive, what does this indicate about credit card usage?

Accepted Answer

The answer of Melody is a department store manager and...

Question 26

Once a dissimilarity measure is developed, a clustering algorithm attempts to find
A) clusters of rows where rows within a cluster are dissimilar and rows in different clusters are dissimilar.
B) clusters of rows where rows within a cluster are similar and rows in different clusters are similar.
C) clusters of rows where rows within a cluster are dissimilar and rows in different clusters are similar.
D) clusters of rows where rows within a cluster are similar and rows in different clusters are dissimilar.

Accepted Answer

The answer of Once a dissimilarity measure is developed,a clustering...

Question 27

The higher the "score" for a particular member in logistic regression, the
A) higher the likelihood that member is in category 1.
B) lower the likelihood that member is in category 1.
C) higher the likelihood that member is in category 0.
D) higher the likelihood that member is not in a category.

Accepted Answer

The answer of The higher the "score" for a particular...

Question 28

Bridget has partitioned data into two subsets. The original file contains 300,000 observations. The subset she is currently working with has 60,000 observations. Which subset is she most likely to be using?
A) The training set
B) The original set
C) The testing set
D) The prediction set

Accepted Answer

The answer of Bridget has partitioned data into two subsets.The...

Question 29

Suppose the odds of Team A winning are 5 to 1. Then, the odds ratio is
A) 5/1.
B) 1/5.
C) 6/1.
D) 1/6.

Accepted Answer

The answer of Suppose the odds of Team A winning...

Question 30

In K-Means clustering, K refers to the
A) size of the population.
B) size of the sample.
C) number of clusters.
D) size of each cluster.

Accepted Answer

The answer of In K-Means clustering, K refers to the A) size of...

Deck 17: Data Mining