You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
- Executing the transformations on a schedule
- Enabling non-developer analysts to modify transformations
- Providing a graphical tool for designing transformations
What should you do?
A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
B) Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
C) Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes
D) Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a DataFrame. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
Correct Answer: A (Verified)
Cloud Dataprep provides a graphical interface for designing transformation recipes, lets non-developer analysts build and modify those recipes without writing code, and supports running them on a schedule, so it satisfies all three requirements. Options B, C, and D each require SQL or programming skills and offer no graphical design tool, and the quarterly schema change would force developers to rewrite code or queries each time.
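To see why the quarterly schema change favors a self-service tool, here is a minimal Python sketch of the remapping logic that analysts would otherwise depend on developers to maintain under option C. The `CANONICAL` columns, `ALIASES` table, and `normalize` function are hypothetical stand-ins for the third party's changing file layouts:

```python
import csv
import io

# Hypothetical canonical schema and per-revision column aliases; in practice
# the alias table would grow every third month as the vendor's schema drifts.
CANONICAL = ["customer_id", "order_date", "amount"]
ALIASES = {
    "cust_id": "customer_id", "CustomerID": "customer_id",
    "date": "order_date", "OrderDate": "order_date",
    "amt": "amount", "TotalAmount": "amount",
}

def normalize(csv_text: str) -> list:
    """Map a drifted CSV header onto the canonical schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        out = {}
        for col, value in row.items():
            canonical = ALIASES.get(col, col)
            if canonical in CANONICAL:
                out[canonical] = value
        rows.append(out)
    return rows

# A file using this quarter's renamed columns still normalizes cleanly.
sample = "cust_id,OrderDate,amt\nC1,2024-01-31,19.99\n"
print(normalize(sample))
# → [{'customer_id': 'C1', 'order_date': '2024-01-31', 'amount': '19.99'}]
```

In Cloud Dataprep the equivalent rename steps live in a recipe that analysts edit through the UI, so a schema change becomes a point-and-click update instead of a code change and redeploy.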