Deck 14: Big Data and NoSQL

Question
Hadoop is a database that has become the de facto standard for most Big Data storage and processing.
Question
In many ways, the issues associated with volume and velocity are the same.​
Question
Characteristics that are important in working with data in the relational database model also apply to Big Data.
Question
A block report is used to let the name node know that the data node is still available.
Question
​Scaling out is keeping the same number of systems, but migrating each system to a larger one.
Question
A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result.
Question
​​​Flume is a tool for converting data back and forth between a relational database and the HDFS.
Question
The ability to graphically present data in a way that makes it understandable is the concept of value.
Question
Hive is a good choice for jobs that require a small subset of data to be returned very quickly.​
Question
​​Relational databases rely on unstructured data.
Question
​Most NoSQL products run only in a Linux or Unix environment.
Question
Under the HDFS system, using a write-once, read-many model simplifies concurrency issues.
Question
Key-value and document databases are structurally similar.
Question
Big Data processing imposes a structure on the data as needed for applications as a part of retrieval and processing.
Question
A column family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component.
Question
Interest in graph databases can be tied to the area of social networks.​
Question
The name, MongoDB, comes from the word humongous as its developers intended their new product to support extremely large data sets.
Question
For a data set to be considered Big Data, it must display only one of the 3Vs (volume, velocity and variety).
Question
The analysis of data to produce actionable results is feedback loop processing.
Question
Lack of specificity is what leads to ambiguity in defining Big Data.
Question
Which of the following is NOT one of the standard NoSQL categories?

A) Document databases
B) Column-oriented databases
C) Graph databases
D) Chart databases
Question
______ was the first SQL on Hadoop application.

A) Flume
B) Pig
C) Sqoop
D) Impala
Question
When using MapReduce, best practices suggest that the number of mappers on a given node should be ______.

A) 50 or less
B) over 100 but less than 300
C) 100 or less
D) at least 300
Question
______ processing occurs when a program runs from beginning to end without any user interaction.

A) Hadoop
B) Block
C) Hive
D) Batch
Question
When using HDFS, a heartbeat is sent every ______ to notify the name node that the data node is still available.

A) 3 hours
B) 3 seconds
C) 6 hours
D) 6 seconds
Question
______ is NOT one of the "3 Vs" of Big Data.

A) Volume
B) Velocity
C) Validation
D) Variety
Question
Which of the following is NOT a key assumption of the Hadoop Distributed File System?

A) High volume
B) Write-many, read-once
C) Streaming access
D) Fault-tolerance
Question
______ is a tool for converting data back and forth between a relational database and the HDFS.

A) ​Flume
B) ​Pig
C) ​Sqoop
D) ​Impala
Question
In the context of Big Data, ______ relates to changes in meaning.

A) variety
B) variability
C) veracity
D) viability
Question
Big Data ______.

A) relies on the use of structured data
B) captures data in whatever format it naturally exists
C) relies on the use of unstructured data
D) imposes a structure on data when it is captured
Question
By default, Hadoop uses a replication factor of ______.

A) one
B) two
C) three
D) four
Question
​A(n) ______ is a process or set of operations in a calculation.

A) ​algorithm
B) ​feedback loop
C) ​stream
D) ​structure
Question
In the context of Big Data, ______ refers to the trustworthiness of a set of data.

A) value
B) variability
C) veracity
D) viability
Question
______ is keeping the same number of systems, but migrating each system to a larger system.

A) Clustering
B) Scaling up
C) Streaming
D) Scaling out
Question
When using HDFS, the ______ node creates new files by communicating with the ______ node.

A) client; name
B) data; name
C) data; client
D) host; client
Question
Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and ______.

A) Flume
B) Pig
C) Sqoop
D) Impala
Question
To query the value component of the pair when using a key-value database, use get or ______.

A) store
B) fetch
C) retrieve
D) gather
Question
______ focuses on filtering data as it enters the system to determine which data to keep and which to discard.​

A) ​Scaling up
B) ​Feedback loop processing
C) ​Stream processing
D) ​Scaling out
Question
When using MapReduce, a ______ function takes a collection of data and sorts and filters it into a set of key-value pairs.

A) ​reduce
B) ​map
C) ​data
D) ​block
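Study note: the map and reduce cards in this deck can be illustrated with a small word count. This is a minimal sketch in plain Python, not Hadoop's API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration. The map step turns raw text into key-value pairs, and the reduce step summarizes all pairs that share a key into a single result.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) key-value pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key so each reduce call sees all values for one key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: summarize the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

In a real cluster the map and reduce calls would run on different nodes; here everything runs in one process so the data flow is easy to follow.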
Question
Document databases group documents into logical groups called ______.

A) buckets
B) sets
C) collections
D) blocks
Question
A(n) ______ is a tag that is used to associate a collection of nodes as being of the same type or belonging to the same group.

A) edge
B) key
C) label
D) bucket
Question
A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude is referred to as ______ analysis.
Question
______ is the coexistence of a variety of data storage and data management technologies within an organization's infrastructure.
Question
______ languages allow the user to specify what they want, not how to get it, which is very useful for query processing.
Question
Data collected or aggregated around a central topic or entity is said to be ______ aware.

A) aggregate
B) transversally
C) feedback
D) visually
Question
______ refers to the analysis of the data to produce actionable results.
Question
Graph theory is a mathematical and computer science field that models relationships, or edges, between objects called ______.

A) maps
B) scales
C) buckets
D) nodes
Question
______ minimizes the number of disk reads necessary to retrieve a row of data.

A) Column-oriented database
B) ​Row-centric storage
C) Column-family database
D) ​Column-centric storage
Question
Within Hadoop, ______ is used for producing data pipeline tasks that transform data in a series of steps.
Question
Within Hadoop, ______ can transfer data in both directions: into and out of HDFS.
Question
In MongoDB, ______ method retrieves objects from a collection that match the restrictions provided.

A) count*
B) read*
C) review[]
D) find()
Question
A ______ is a programmed function within an object used to manipulate the data in that same object.

A) batch
B) method
C) block
D) node
Question
Neo4j is a ______ database.

A) graph
B) column family
C) key-value
D) row-centric
Question
Modeling and storing data about relationships is the focus of ______ databases.

A) key-value
B) column-oriented
C) document
D) graph
Question
Within MapReduce, a(n) ______ runs map and reduce tasks on nodes.
Question
Most organizations that use Hadoop also use a set of other related products that interact and complement each other to produce an entire ______ of applications and tools.
Question
______ is the Big Data 3 V that relates to the speed at which data is entering the system.
Question
In MongoDB, the ______ method is used to improve the readability of retrieved documents through the use of line breaks and indentation.

A) pretty()
B) clean*
C) break[]
D) filter+
Question
Scaling out is also referred to as _______.
Question
A query in a graph database is called a ______.

A) schema
B) hierarchy
C) traversal
D) script
Question
In a column family database, a column that is composed of a group of other related columns is called a(n) ______.
Question
Discuss NewSQL and what it attempts to do.​
Question
A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure is ______.
Question
______ is used to extract knowledge from sources of data (NoSQL databases, Hadoop data stores, and data warehouses) to provide decision support to all organizational users.
Question
Discuss the need for a Hadoop ecosystem and identify the key components.
Question
_____ are like attributes; they are the data that we need to store about the node.
Question
In a graph database, the representation of a relationship between nodes is called a(n) ______.
Question
Discuss the 3 Vs of Big Data. How has the definition of Big Data regarding these items changed over time?
Question
______ do not store relationships as perceived in the relational model and generally have no support for join operations.
Question
______ is a human-readable text format for data interchange that defines attributes and values in a document.
Question
______ databases simply store data with no attempt to understand the contents of the value component or its meaning.
Question
The interactive, declarative query language in Neo4j is called ______.
Question
What is NoSQL and what are the major NoSQL approaches (categories)?
Question
Define the four key assumptions of the Hadoop Distributed File System (HDFS).
Question
______ refers to traditional, relational database technologies that use column-centric, not row-centric, storage.
1
Hadoop is a database that has become the de facto standard for most Big Data storage and processing.
True
2
In many ways, the issues associated with volume and velocity are the same.​
True
3
Characteristics that are important in working with data in the relational database model also apply to Big Data.
True