Question 1

Hadoop is a database that has become the de facto standard for most Big Data storage and processing.

Accepted Answer

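Hadoop is not a database; it is a framework for distributing and processing data across clusters of computers, built around the Hadoop Distributed File System (HDFS) and MapReduce. It has, however, become the most widely used platform for storing and processing Big Data, making it the de facto standard in the industry.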

Question 2

In many ways, the issues associated with volume and velocity are the same.

Accepted Answer

Volume refers to the amount of data, and velocity refers to the speed at which it arrives and must be processed. In practice the two go together: businesses must handle large amounts of data arriving quickly, often in real time, so the issues associated with each are quite similar.

Question 3

Characteristics that are important in working with data in the relational database model also apply to Big Data.

Accepted Answer

The relational database model and Big Data share many characteristics, such as the need for data integrity, consistency, scalability, and the ability to handle large volumes of data efficiently. Therefore, the principles and best practices used in the relational database model can be applied to Big Data as well.

Question 4

A block report is used to let the name node know that the data node is still available.

Accepted Answer

A block report is used to let the NameNode know which blocks are currently stored on a DataNode, not to indicate the availability of the DataNode itself.
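The difference between the two messages can be sketched with a toy model. This is only an illustration, not how Hadoop implements it: the class `ToyNameNode`, its method names, and the 30-second `timeout` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a name node; not Hadoop's actual API."""

    # data node id -> time (in seconds) of the most recent heartbeat
    last_heartbeat: dict = field(default_factory=dict)
    # data node id -> set of block ids that node reported holding
    block_map: dict = field(default_factory=dict)

    def receive_heartbeat(self, node_id, now):
        # A heartbeat only says "I am still here".
        self.last_heartbeat[node_id] = now

    def receive_block_report(self, node_id, block_ids):
        # A block report says "these are the blocks I store".
        self.block_map[node_id] = set(block_ids)

    def is_available(self, node_id, now, timeout=30.0):
        # Availability is judged from heartbeats alone (timeout is an assumed value).
        seen = self.last_heartbeat.get(node_id)
        return seen is not None and now - seen <= timeout

    def nodes_holding(self, block_id):
        # Block locations are judged from block reports alone.
        return sorted(n for n, blocks in self.block_map.items() if block_id in blocks)


nn = ToyNameNode()
nn.receive_heartbeat("dn1", now=0.0)
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_2"])
```

In this sketch `dn2` has sent a block report but no heartbeat, so the toy name node knows which blocks it holds yet does not treat it as available.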

Question 5

Scaling out is keeping the same number of systems, but migrating each system to a larger one.

Accepted Answer

Scaling out, also known as horizontal scaling, involves adding more systems (e.g., servers) to a pool to handle increased load, rather than upgrading the existing systems to larger ones, which is known as scaling up or vertical scaling.

Question 6

A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result.

Accepted Answer

This is the definition of a reduce function, which is commonly used in data processing and analysis. The key-value pairs with the same key value are combined using an operation specified by the user, such as summing up the values or finding the maximum value. The result is a single value that represents the summarized information.
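As a minimal sketch of the idea in plain Python (not the Hadoop MapReduce API), the example below groups key-value pairs by key and then applies a user-supplied operation to each group. The helper name `reduce_by_key` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def reduce_by_key(pairs, combine):
    """Group values that share a key, then summarize each group with `combine`."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # One summarized result per distinct key.
    return {key: combine(values) for key, values in grouped.items()}


# Sample key-value pairs, such as a map step might emit for a word count.
word_pairs = [("hadoop", 1), ("hive", 1), ("hadoop", 1), ("pig", 1), ("hadoop", 1)]

totals = reduce_by_key(word_pairs, sum)                         # sum the values per key
largest = reduce_by_key([("a", 3), ("b", 7), ("a", 9)], max)    # maximum value per key
```

Swapping `sum` for `max` changes only the summarizing operation; the grouping by key stays the same.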

Question 7

Flume is a tool for converting data back and forth between a relational database and the HDFS.

Accepted Answer

Flume is primarily used for ingesting unstructured data from various sources into HDFS or other destinations; it does not deal directly with relational databases. The tool for converting data back and forth between a relational database and HDFS is Sqoop. However, there are plugins available for Flume that can help with extracting data from databases and sending it to HDFS.

Question 8

The ability to graphically present data in a way that makes it understandable is the concept of value.

Accepted Answer

The ability to graphically present data in a way that makes it understandable is referred to as data visualization, not the concept of value.

Question 9

Hive is a good choice for jobs that require a small subset of data to be returned very quickly.

Accepted Answer

Hive is designed for processing large datasets and is optimized for batch processing. It may not be the best choice for jobs that require quick response times or real-time processing of small subsets of data.

Question 10

Relational databases rely on unstructured data.

Accepted Answer

Relational databases rely on structured data, meaning data that is organized in a predefined structure such as tables with columns and rows. Unstructured data, on the other hand, refers to data that does not have a predefined structure such as emails, videos, and images. While some relational databases may have the capability to store and analyze unstructured data, their primary focus is on structured data.

Question 11

Most NoSQL products run only in a Linux or Unix environment.

Accepted Answer

The answer of Most NoSQL products run only in a...

Question 12

Under the HDFS system, using a write-once, read-many model simplifies concurrency issues.

Accepted Answer

The answer of Under the HDFS system, using a write-once,...

Question 13

Key-value and document databases are structurally similar.

Accepted Answer

The answer of Key-value and document databases are structurally similar....

Question 14

Big Data processing imposes a structure on the data as needed for applications as a part of retrieval and processing.

Accepted Answer

The answer of Big Data processing imposes a structure on...

Question 15

A column family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component.

Accepted Answer

The answer of A column family database is a NoSQL...

Question 16

Interest in graph databases can be tied to the area of social networks.

Accepted Answer

The answer of Interest in graph databases can be tied...

Question 17

The name, MongoDB, comes from the word humongous as its developers intended their new product to support extremely large data sets.

Accepted Answer

The answer of The name, MongoDB, comes from the word...

Question 18

For a data set to be considered Big Data, it must display only one of the 3Vs (volume, velocity and variety).

Accepted Answer

The answer of For a data set to be considered...

Question 19

The analysis of data to produce actionable results is feedback loop processing.

Accepted Answer

The answer of The analysis of data to produce actionable...

Question 20

Lack of specificity is what leads to ambiguity in defining Big Data.

Accepted Answer

The answer of Lack of specificity is what leads to...

Question 21

Which of the following is NOT one of the standard NoSQL categories?
A) Document databases
B) Column-oriented databases
C) Graph databases
D) Chart databases

Accepted Answer

The answer of Which of the following is NOT one...

Question 22

______ was the first SQL on Hadoop application.
A) Flume
B) Pig
C) Sqoop
D) Impala

Accepted Answer

The answer of ______ was the first SQL on Hadoop...

Question 23

When using MapReduce, best practices suggest that the number of mappers on a given node should be ______.
A) 50 or less
B) over 100 but less than 300
C) 100 or less
D) at least 300

Accepted Answer

The answer of When using MapReduce, best practices suggest that the...

Question 24

______ processing occurs when a program runs from beginning to end without any user interaction.
A) Hadoop
B) Block
C) Hive
D) Batch

Accepted Answer

The answer of ______ processing occurs when a program runs from...

Question 25

When using HDFS, a heartbeat is sent every ______ to notify the name node that the data node is still available.
A) 3 hours
B) 3 seconds
C) 6 hours
D) 6 seconds

Accepted Answer

The answer of When using HDFS, a heartbeat is...

Question 26

______ is NOT one of the "3 Vs" of Big Data.
A) Volume
B) Velocity
C) Validation
D) Variety

Accepted Answer

The answer of ______ is NOT one of the "3 Vs"...

Question 27

Which of the following is NOT a key assumption of the Hadoop Distributed File System?
A) High volume
B) Write many, read-once
C) Streaming access
D) Fault-tolerance

Accepted Answer

The answer of Which of the following is NOT a...

Question 28

______ is a tool for converting data back and forth between a relational database and the HDFS.
A) Flume
B) Pig
C) Sqoop
D) Impala

Accepted Answer

The answer of ______ is a tool for converting data...

Question 29

In the context of Big Data, ______ relates to changes in meaning.
A) variety
B) variability
C) veracity
D) viability

Accepted Answer

The answer of In the context of Big Data, ______...

Question 30

Big Data ______.
A) relies on the use of structured data
B) captures data in whatever format it naturally exists
C) relies on the use of unstructured data
D) imposes a structure on data when it is captured

Accepted Answer

The answer of Big Data ______. A) relies on the use...

Question 31

By default, Hadoop uses a replication factor of ______.
A) one
B) two
C) three
D) four

Accepted Answer

The answer of By default, Hadoop uses a replication factor...

Question 32

A(n) ______ is a process or set of operations in a calculation.
A) algorithm
B) feedback loop
C) stream
D) structure

Accepted Answer

The answer of A(n) ______ is a process or set...

Question 33

In the context of Big Data, ______ refers to the trustworthiness of a set of data.
A) value
B) variability
C) veracity
D) viability

Accepted Answer

The answer of In the context of Big Data, ______...

Question 34

______ is keeping the same number of systems, but migrating each system to a larger system.
A) Clustering
B) Scaling up
C) Streaming
D) Scaling out

Accepted Answer

The answer of ______ is keeping the same number of...

Question 35

When using HDFS, the ______ node creates new files by communicating with the ______ node.
A) client; name
B) data; name
C) data; client
D) host; client

Accepted Answer

The answer of When using HDFS, the ______ node creates...

Question 36

Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and ______.
A) Flume
B) Pig
C) Sqoop
D) Impala

Accepted Answer

The answer of Two of the most popular applications to...

Question 37

To query the value component of the pair when using a key-value database, use get or ______.
A) store
B) fetch
C) retrieve
D) gather

Accepted Answer

The answer of To query the value component of the...

Question 38

______ focuses on filtering data as it enters the system to determine which data to keep and which to discard.
A) Scaling up
B) Feedback loop processing
C) Stream processing
D) Scaling out

Accepted Answer

The answer of ______ focuses on filtering data as it...

Question 39

When using MapReduce, a ______ function takes a collection of data and sorts and filters it into a set of key-value pairs.
A) reduce
B) map
C) data
D) block

Accepted Answer

The answer of When using MapReduce, a ______ function takes...

Question 40

Document databases group documents into logical groups called ______.
A) buckets
B) sets
C) collections
D) blocks

Accepted Answer

The answer of Document databases group documents into logical groups...

Question 41

A(n) ______ is a tag that is used to associate a collection of nodes as being of the same type or belonging to the same group.
A) edge
B) key
C) label
D) bucket

Accepted Answer

The answer of A(n) ______ is a tag that is used...

Question 42

A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude is referred to as ______ analysis.

Accepted Answer

The answer of A method of text analysis that attempts...

Question 43

______ is the coexistence of a variety of data storage and data management technologies within an organization's infrastructure.

Accepted Answer

The answer of ______ is the coexistence of a variety of...

Question 44

______ languages allow the user to specify what they want, not how to get it, which is very useful for query processing.

Accepted Answer

The answer of ______ languages allow the user to specify...

Question 45

Data collected or aggregated around a central topic or entity is said to be ______ aware.
A) aggregate
B) transversally
C) feedback
D) visually

Accepted Answer

The answer of Data collected or aggregated around a central...

Question 46

______ refers to the analysis of the data to produce actionable results.

Accepted Answer

The answer of ______ refers to the analysis of the data...

Question 47

Graph theory is a mathematical and computer science field that models relationships, or edges, between objects called ______.
A) maps
B) scales
C) buckets
D) nodes

Accepted Answer

The answer of Graph theory is a mathematical and computer...

Question 48

______ minimizes the number of disk reads necessary to retrieve a row of data.
A) Column-oriented database
B) Row-centric storage
C) Column-family database
D) Column-centric storage

Accepted Answer

The answer of ______ minimizes the number of disk reads necessary...

Question 49

Within Hadoop, ______ is used for producing data pipeline tasks that transform data in a series of steps.

Accepted Answer

The answer of Within Hadoop, ______ is used for producing data pipeline...

Question 50

Within Hadoop, ______ can transfer data in both directions: into and out of HDFS.

Accepted Answer

The answer of Within Hadoop, ______ can transfer data in both directions...

Question 51

In MongoDB, the ______ method retrieves objects from a collection that match the restrictions provided.
A) count*
B) read*
C) review[]
D) find()

Accepted Answer

The answer of In MongoDB, ______ method retrieves objects from...

Question 52

A ______ is a programmed function within an object used to manipulate the data in that same object.
A) batch
B) method
C) block
D) node

Accepted Answer

The answer of A ______ is a programmed function within...

Question 53

Neo4j is a ______ database.
A) graph
B) column family
C) key-value
D) row-centric

Accepted Answer

The answer of Neo4j is a ______ database. A) graph B) column...

Question 54

Modeling and storing data about relationships is the focus of ______ databases.
A) key-value
B) column-oriented
C) document
D) graph

Accepted Answer

The answer of Modeling and storing data about relationships is...

Question 55

Within MapReduce, a(n) ______ runs map and reduce tasks on nodes.

Accepted Answer

The answer of Within MapReduce, a(n) ______ runs map and...

Question 56

Most organizations that use Hadoop also use a set of other related products that interact and complement each other to produce an entire ______ of applications and tools.

Accepted Answer

The answer of Most organizations that use Hadoop also use a...

Question 57

______ is the Big Data 3 V that relates to the speed at which data is entering the system.

Accepted Answer

The answer of ______ is the Big Data 3 V...

Question 58

In MongoDB, the ______ method is used to improve the readability of retrieved documents through the use of line breaks and indentation.
A) pretty()
B) clean*
C) break[]
D) filter+

Accepted Answer

The answer of In MongoDB, the ______ method is used to...

Question 59

Scaling out is also referred to as _______.

Accepted Answer

The answer of Scaling out is also referred to as...

Question 60

A query in a graph database is called a ______.
A) schema
B) hierarchy
C) traversal
D) script

Accepted Answer

The answer of A query in a graph database is...

Question 61

In a column family database, a column that is composed of a group of other related columns is called a(n) ______.

Accepted Answer

The answer of In a column family database, a column that...

Question 62

Discuss NewSQL and what it attempts to do.

Accepted Answer

The answer of Discuss NewSQL and what it attempts to...

Question 63

A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure is ______.

Accepted Answer

The answer of A database model that attempts to provide ACID-compliant...

Question 64

______ is used to extract knowledge from sources of data (NoSQL databases, Hadoop data stores, and data warehouses) to provide decision support to all organizational users.

Accepted Answer

The answer of ______ is used to extract knowledge from...

Deck 14: Big Data and NoSQL