Deck 17: Data Mining

1
Megan is examining the likelihood of people riding the subway. The dependent variable takes on the value of 1 if the individual rides the subway and 0 otherwise. Therefore, she could use logistic regression to examine this question.
True
2
When using data partitioning, the second subset, which usually contains the records that were not included in the training data, is called the prediction data set.
False
3
Cluster analysis tries to group observations into clusters so that observations within a cluster are different and observations in different clusters are similar.
False
4
The K in K-Means refers to the number of clusters.
5
A data mart is typically smaller than a data warehouse.
6
A neural network methodology attempts to mimic

A) the complex behavior of children.
B) the complex behavior of the human brain.
C) human emotion.
D) quantifiable random processes.
7
Segmentation is also known as clustering, and involves trying to group entities into similar clusters.
8
Which of the following statements about logistic regression is false?

A) Logistic regression estimates the probability that an individual is in a particular category.
B) Logistic regression uses a nonlinear function of the explanatory variables for classification.
C) Logistic regression is essentially regression with a binary dependent variable.
D) Logistic regression requires that the error terms are uniformly distributed.
9
Mya is investigating the factors that impact soda consumption. She examines a host of variables that help explain the amount consumed. Which type of data mining methodology is she most likely to use?

A) market basket analysis
B) prediction
C) classification analysis
D) forecasting
10
Clustering is considered a supervised data mining technique.
11
Lift is the increase in the number of purchasers over the typical number of purchasers.
12
Logistic regression and neural networks use complex nonlinear functions to capture the relationship between explanatory variables and categorical dependent variables.
13
The logarithm of the odds ratio is called the

A) logit.
B) logos.
C) lods.
D) logodra.
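
For readers who want to see the quantity this card describes, here is a minimal sketch. It assumes "odds ratio" in the card means the odds p / (1 - p) of an event with probability p. The function names `log_odds` and `inverse_log_odds` are illustrative and do not come from the deck.

```python
import math


def log_odds(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_log_odds(z: float) -> float:
    """Map a log-odds value back to the probability it came from."""
    return 1.0 / (1.0 + math.exp(-z))


# Even odds (p = 0.5) give a log-odds of zero.
print(log_odds(0.5))  # 0.0
```

Probabilities above 0.5 give positive log-odds and probabilities below 0.5 give negative log-odds, so the transformed value is no longer confined to the interval from 0 to 1.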
14
The testing set in data partitioning is the

A) first subset of data, which usually contains 70% of the records.
B) second subset of data, which usually contains 30% or less of the records.
C) initial dataset from which subsets are created.
D) first subset of data, which usually contains 30% of the records.
15
Data mining is used to examine known, expected patterns and relationships among variables.
16
Which methodology is used to group products that customers purchase together?

A) market basket analysis
B) prediction
C) classification analysis
D) forecasting
17
When using data partitioning, the first subset, usually with about 70% to 80% of the records, is called the training data set.
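
The partitioning step this card refers to could be sketched as below. This is only an illustration: the helper name `partition`, the 80/20 split, and the fixed seed are assumptions chosen for the example, and the second subset is called "testing" to match the wording of card 14.

```python
import random


def partition(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into (first, second) subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data: 1,000 record IDs split 80/20.
training, testing = partition(range(1000), train_fraction=0.8)
print(len(training), len(testing))  # 800 200
```

Shuffling before cutting matters when the file is sorted (for example by date or customer ID), because otherwise the two subsets would not resemble each other.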
18
Which of the following is not a methodology useful for data mining?

A) Classification analysis
B) Prediction
C) Cluster analysis
D) Stock market analysis
19
Classification analysis attempts to find variables that are related to a quantitative variable.
20
It is very useful to partition a large data set into all of the following subsets, except the _____ data set.

A) training
B) data
C) explanatory
D) prediction
21
Unsupervised methods have no

A) dependent variable.
B) clustering.
C) segmentation.
D) association analysis.
22
If the regression coefficient estimate from a logistic regression is positive, the probability of the dependent variable taking on a value of 1

A) decreases.
B) approaches zero.
C) increases.
D) remains constant.
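
To explore how the predicted probability responds to an explanatory variable, one can evaluate the logistic function directly. The sketch below assumes a single explanatory variable; the intercept of -1.0 and the positive coefficient of 0.5 are made-up values for illustration, not estimates from any data set.

```python
import math


def predicted_probability(x, intercept=-1.0, slope=0.5):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Evaluate the model on a grid of x values from -4 to 4.
probs = [predicted_probability(x) for x in range(-4, 5)]

# With a positive slope, each step up in x raises the predicted probability.
increasing = all(a < b for a, b in zip(probs, probs[1:]))
# Every prediction stays strictly between 0 and 1.
bounded = all(0.0 < p < 1.0 for p in probs)
print(increasing, bounded)  # True True
```

Rerunning the sketch with a negative `slope` reverses the direction of the change, while the predictions still stay between 0 and 1.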
23
Clustering tries to group entities into _____ clusters, based on the value of their variables.

A) trier
B) similar
C) nontrier
D) discovery
24
The predicted value from a logistic regression will be

A) between 0 and 1.
B) between -1 and 1.
C) less than 0.
D) greater than 1.
25
Melody is a department store manager and wants to examine whether or not female shoppers are more likely than male shoppers to use a department credit card. "Female = 1" indicates the individual is a female. "Credit Card = 1" indicates the individual used a credit card to make the purchase. "Amount spent" is in dollars. Melody runs a logistic regression. If the estimate on the female variable is positive, what does this indicate about credit card usage?
26
Once a dissimilarity measure is developed, a clustering algorithm attempts to find

A) clusters of rows where rows within a cluster are dissimilar and rows in different clusters are dissimilar.
B) clusters of rows where rows within a cluster are similar and rows in different clusters are similar.
C) clusters of rows where rows within a cluster are dissimilar and rows in different clusters are similar.
D) clusters of rows where rows within a cluster are similar and rows in different clusters are dissimilar.
27
The higher the "score" for a particular member in logistic regression, the

A) higher the likelihood that member is in category 1.
B) lower the likelihood that member is in category 1.
C) higher the likelihood that member is in category 0.
D) higher the likelihood that member is not in a category.
28
Bridget has partitioned data into two subsets. The original file contains 300,000 observations. The subset she is currently working with has 60,000 observations. Which subset is she most likely to be using?

A) The training set
B) The original set
C) The testing set
D) The prediction set
29
Suppose the odds of Team A winning are 5 to 1. Then, the odds ratio is

A) 5/1.
B) 1/5.
C) 6/1.
D) 1/6.
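
The arithmetic linking odds and probabilities can be checked with exact fractions. The sketch below assumes that "5 to 1" is read as odds in favor of the event, and that the odds of an event with probability p are p / (1 - p). The function names are illustrative.

```python
from fractions import Fraction


def odds_to_probability(in_favor: int, against: int) -> Fraction:
    """Odds of 'in_favor to against' correspond to p = in_favor / (in_favor + against)."""
    return Fraction(in_favor, in_favor + against)


def probability_to_odds(p: Fraction) -> Fraction:
    """Odds of an event with probability p, defined as p / (1 - p)."""
    return p / (1 - p)


p = odds_to_probability(5, 1)
print(p)                       # 5/6
print(probability_to_odds(p))  # 5
```

Using `Fraction` rather than floats keeps the round trip exact, which makes it easy to compare the result against the answer choices.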
30
In K-Means clustering, K refers to the

A) size of the population.
B) size of the sample.
C) number of clusters.
D) size of each cluster.
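
To see where K enters the procedure, here is a minimal sketch of the standard K-Means iteration (Lloyd's algorithm) on one-dimensional data. The function name `k_means_1d`, the way the starting centers are chosen, and the sample values are all assumptions made for this illustration; the deck itself does not specify an implementation.

```python
def k_means_1d(values, k, iterations=20):
    """Lloyd's algorithm on 1-D data; returns (centers, labels)."""
    ordered = sorted(values)
    # Assumption: start from k evenly spaced points of the sorted data.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its current members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


# Hypothetical data with three visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.0]
centers, labels = k_means_1d(data, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The distance used in the assignment step plays the role of the dissimilarity measure mentioned in card 26: values close to the same center end up sharing a label.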