You want to build a managed Hadoop system as your data lake. The data transformation process consists of a series of Hadoop jobs executed in sequence. To separate storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediate data. However, you noticed that one Hadoop job runs much more slowly on Cloud Dataproc than in the on-premises bare-metal Hadoop environment (8-core nodes with 100 GB of RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?
A) Allocate sufficient memory to the Hadoop cluster, so that the intermediate data of that particular Hadoop job can be held in memory
B) Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
C) Allocate more CPU cores to the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up
D) Allocate an additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage
Correct Answer:
Verified
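To make option B concrete, here is a minimal Python sketch of how a job chain might route one job's intermediate data to native HDFS while keeping input and final output in Cloud Storage. It illustrates that option only and does not mark it as the answer. The function `plan_stage_paths`, the `StagePaths` class, the bucket name, the stage names, and the `hdfs:///tmp/pipeline` root are all hypothetical and not part of any Dataproc API. The sketch also assumes that "intermediate data" means the output a job hands to the next job in the sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePaths:
    """Input and output locations for one job in the chain (hypothetical type)."""
    input_uri: str
    output_uri: str


def plan_stage_paths(stages, bucket, io_heavy, hdfs_root="hdfs:///tmp/pipeline"):
    """Assign URIs to a linear chain of jobs.

    Assumption: jobs listed in `io_heavy` write their intermediate output to
    cluster-local HDFS; everything else goes through the Cloud Storage
    connector (gs://). The final output always lands in Cloud Storage so it
    outlives the cluster.
    """
    plans = {}
    previous_output = f"gs://{bucket}/input"
    for index, stage in enumerate(stages):
        is_last = index == len(stages) - 1
        if is_last:
            output_uri = f"gs://{bucket}/output"
        elif stage in io_heavy:
            output_uri = f"{hdfs_root}/{stage}"
        else:
            output_uri = f"gs://{bucket}/intermediate/{stage}"
        plans[stage] = StagePaths(previous_output, output_uri)
        previous_output = output_uri
    return plans


if __name__ == "__main__":
    plans = plan_stage_paths(["extract", "join", "report"], "my-lake", {"join"})
    for name, paths in plans.items():
        print(name, paths.input_uri, "->", paths.output_uri)
```

In this sketch only the `join` stage touches HDFS, so the cluster needs enough persistent disk for that one job's intermediate data, while the data lake itself stays in Cloud Storage.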