Deck 14: Big Data Analytics and NoSQL
1
The ability to graphically present data in a way that makes it understandable is the concept of value.
False
2
Key-value and document databases are structurally similar.
True
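A minimal sketch of the similarity, assuming plain Python dictionaries as hypothetical stand-ins for the two stores (no real product or API is implied, and the sample record is invented for illustration). Both map a key to a value; the difference is that the document value is tagged text (JSON here), so its attributes can be inspected.

```python
import json

# Hypothetical in-memory stand-ins; not the API of any real database product.
# Key-value store: the value is an opaque blob as far as the store is concerned.
key_value_store = {"cust:1": b"Lopez,Ana,Austin"}

# Document store: same key-to-value shape, but the value is a tagged document.
document_store = {
    "cust:1": json.dumps({"lname": "Lopez", "fname": "Ana", "city": "Austin"})
}

def find_by_attribute(store, attribute, wanted):
    """Return the keys whose JSON document has attribute == wanted."""
    return [key for key, doc in store.items()
            if json.loads(doc).get(attribute) == wanted]

matches = find_by_attribute(document_store, "lname", "Lopez")
```

The same lookup is not possible against `key_value_store` without the application parsing the blob itself, which is the practical difference between the two models.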
3
Flume is a tool for converting data back and forth between a relational database and the HDFS.
False
4
For a data set to be considered Big Data, it must display all the "3 Vs" - volume, velocity, and variety.
5
One tenet of Big Data is that all data that is capable of being captured should be.
6
Most NoSQL products run only in a Linux or Unix environment.
7
The analysis of data to produce actionable results is feedback loop processing.
8
A column-family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component.
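The layout described in this card can be sketched as a nested mapping. This is an illustration only, assuming a hypothetical in-memory table and invented sample rows; it is not the API of any real column-family product.

```python
# Hypothetical in-memory sketch of a column-family layout.
# Each row key maps to a set of columns held in the value component,
# and rows are not required to share the same columns.
customers = {
    "cust:1": {"fname": "Ana", "lname": "Lopez", "city": "Austin"},
    "cust:2": {"fname": "Raj", "lname": "Patel"},  # no "city" column here
}

def get_column(table, row_key, column, default=None):
    """Look up one column inside the value component of a row key."""
    return table.get(row_key, {}).get(column, default)

city_1 = get_column(customers, "cust:1", "city")
city_2 = get_column(customers, "cust:2", "city")
```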
9
Under the HDFS system, using a write-once, read-many model simplifies concurrency issues.
10
In many ways, the issues associated with volume and velocity are the same.
11
Characteristics that are important in working with data in the relational database model also apply to Big Data.
12
Hadoop is a high-level tool that requires little effort to create, manage, and use.
13
Interest in graph databases can be tied to the area of social networks.
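To illustrate why social networks are a natural fit, here is a small sketch of relationship-centric data. The names, the `follows` mapping, and the `suggestions` helper are all hypothetical; a real graph database would store and query this very differently.

```python
# Hypothetical "follows" relationships in a tiny social network.
follows = {
    "ana": {"raj", "li"},
    "raj": {"li", "sam"},
    "li": {"sam"},
    "sam": set(),
}

def suggestions(graph, person):
    """People followed by those `person` follows, minus existing follows and self."""
    direct = graph.get(person, set())
    second = set()
    for friend in direct:
        second |= graph.get(friend, set())
    return second - direct - {person}

ana_suggestions = suggestions(follows, "ana")
```

The question being asked is about the relationships themselves ("who is connected to whom"), not about attributes of any single record.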
14
Relational databases rely on unstructured data.
15
A block report is used to let the name node know that the data node is still available.
16
Hadoop is a database that has become the de facto standard for most Big Data storage and processing.
17
A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result.
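The behavior described in this card can be sketched in plain Python. This is not Hadoop code: the sample pairs, the grouping step, and the `reduce_sum` helper are assumptions made for illustration, and summing is only one possible way to summarize.

```python
from collections import defaultdict

def reduce_sum(key, values):
    """Summarize all values that share one key into a single result."""
    return key, sum(values)

# Hypothetical key-value pairs, as an earlier processing step might emit them.
pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("apple", 1), ("pear", 1)]

# Group the values by key so each reduce call sees one key at a time.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

results = dict(reduce_sum(k, v) for k, v in grouped.items())
```

Each call to `reduce_sum` sees only the values for a single key, which is what lets the work be spread across many machines.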
18
Hive is a good choice for jobs that require a small subset of data to be returned very quickly.
19
Scaling out is keeping the same number of systems, but migrating each system to a larger one.
20
Much ambiguity exists in defining Big Data.
21
___ was the first SQL-on-Hadoop application.
A)Flume
B)Pig
C)Sqoop
D)Impala
22
_____ is keeping the same number of systems, but migrating each system to a larger system.
A)Clustering
B)Scaling up
C)Streaming
D)Scaling out
23
Explanatory analytics uses predictive analytics as a stepping stone to create explanatory models.
24
A(n) _____ is a process or set of operations in a calculation.
A)algorithm
B)feedback loop
C)stream
D)structure
25
Which of the following is NOT a key assumption of the Hadoop Distributed File System?
A)High volume
B)Write-many, read-once
C)Streaming access
D)Fault-tolerance
26
Data mining focuses on the discovery and explanation stages of knowledge acquisition.
27
When using MapReduce, best practices suggest that the number of mappers on a given node should be:
A)100 or more
B)100 or less
C)50 or less
D)at least 300
28
_____ focuses on filtering data as it enters the system to determine which data to keep and which to discard.
A)Scaling up
B)Feedback loop processing
C)Stream processing
D)Scaling out
29
_____ is NOT one of the "3 Vs" of Big Data.
A)Volume
B)Velocity
C)Validation
D)Variety
30
When using MapReduce, a _____ function takes a collection of data and sorts and filters it into a set of key-value pairs.
A)reduce
B)map
C)data
D)block
31
___ is a tool for converting data back and forth between a relational database and the HDFS.
A)Flume
B)Pig
C)Sqoop
D)Impala
32
In the context of Big Data, _____ relates to differences in meaning.
A)variety
B)variability
C)veracity
D)viability
33
Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and _____.
A)Flume
B)Pig
C)Sqoop
D)Impala
34
Big Data:
A)relies on the use of structured data
B)captures data in whatever format it naturally exists
C)relies on the use of unstructured data
D)imposes a structure on data when it is captured
35
When using HDFS, the _____ node creates new files by communicating with the _____ node.
A)client, name
B)name, client
C)client, data
D)data, client
36
In the context of Big Data, _____ refers to the trustworthiness of a set of data.
A)value
B)variability
C)veracity
D)viability
37
Which of the following is NOT one of the standard NoSQL categories?
A)document databases
B)column-oriented databases
C)graph databases
D)chart databases
38
_____ processing occurs when a program runs from beginning to end without any user interaction.
A)Hadoop
B)Block
C)Hive
D)Batch
39
By default, Hadoop uses a replication factor of:
A)one
B)two
C)three
D)four
40
When using HDFS, a heartbeat is sent every _____ to notify the name node that the data node is still available.
A)3 hours
B)3 seconds
C)6 hours
D)6 seconds
41
A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude is referred to as _____ analysis.
42
Scaling out is also referred to as _____.
43
_____ uses statistical analysis to answer questions about the how and why of relationships.
A)Explanatory analytics
B)Data mining
C)Predictive analytics
D)Knowledge acquisition
44
Within Hadoop, _____ can transfer data in both directions - into and out of HDFS.
45
_____ languages allow the user to specify what they want, not how to get it, which is very useful for query processing.
46
Most organizations that use Hadoop also use a set of other related products that interact and complement each other to produce an entire ______ of applications and tools.
47
The goal of the _____ phase of data mining is to identify common data characteristics or patterns.
A)data preparation
B)data analysis and classification
C)knowledge acquisition
D)prognosis
48
_____ uses statistical tools to answer questions about future data occurrences.
A)Explanatory analytics
B)Data mining
C)Predictive analytics
D)Knowledge acquisition
49
Within Hadoop, _____ is used for producing data pipeline tasks that transform data in a series of steps.
50
Most BI vendors are dropping the term "data mining" and replacing it with the term:
A)explanatory analytics
B)data analytics
C)predictive analytics
D)knowledge acquisition
51
Document databases group documents into logical groups called:
A)buckets
B)sets
C)collections
D)blocks
52
_____ is the coexistence of a variety of data storage and data management technologies within an organization's infrastructure.
53
______ refers to the analysis of the data to produce actionable results.
54
To query the value component of the pair when using a key-value database, use get or:
A)store
B)fetch
C)retrieve
D)gather
55
_____ databases simply store data with no attempt to understand the contents of the value component or its meaning.
56
_____ minimizes the number of disk reads necessary to retrieve a row of data.
A)Column-oriented database
B)Row-centric storage
C)Column-family database
D)Column-centric storage
57
The end user decides what techniques to apply to the data when using the _____ mode of data mining.
A)guided
B)prognosis
C)directed
D)automated
58
Within MapReduce, a _____ runs map and reduce functions.
59
Modeling and storing data about relationships is the focus of:
A)key-value databases
B)column-oriented databases
C)document databases
D)graph databases
60
_____ is the Big Data "3 V" that relates to the speed at which data is entering the system.
61
In the _____ phase of data mining, findings are used to predict future behavior and forecast business outcomes.
62
Discuss the need for a Hadoop ecosystem and identify the key components.
63
In a column family database, a column that is composed of a group of other related columns is called a(n) _____.
64
Explain the concept of data analytics. What are the various tools of data analytics?
65
Discuss the "3 Vs" of Big Data. How has the definition of Big Data regarding these items changed over time?
66
Discuss NewSQL and what it attempts to do.
67
The origins of ____ can be traced back to the banking and credit card industries.
68
In a graph database, the representation of a relationship between nodes is called a(n) _____.
69
A query in a graph database is called a(n) _____.
70
_____ is a continuous spectrum of knowledge acquisition that goes from discovery to explanation to prediction.
71
A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure is _____.
72
What is NoSQL and what are the major NoSQL approaches (categories)?
73
_____ do not store relationships as perceived in the relational model and generally have no support for join operations.
74
_____ is a human-readable text format for data interchange that defines attributes and values in a document.
75
Define the four key assumptions of the Hadoop Distributed File System (HDFS).
76
_____ refers to traditional, relational database technologies that use column-centric, not row-centric storage.