A company has a business unit that uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to discover the data and create the tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason within the same day, duplicate records are introduced into the Amazon Redshift table. Which solution will update the Redshift table without introducing duplicates when jobs are rerun?
A) Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
B) Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
C) Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
D) Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.
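For reference, the staging-table approach described in option A can be sketched as follows. This is a minimal illustration, not a definitive implementation: it only builds the `connection_options` dictionary (with the `preactions` and `postactions` SQL) that a Glue job would hand to the Redshift writer. The helper name `build_merge_options`, the table names, the key column, the connection name, and the S3 path are all hypothetical, and the sketch assumes the target table already exists and that `order_id` uniquely identifies a row.

```python
def build_merge_options(database, target, staging, key_columns):
    """Build Redshift connection options that load into a staging table,
    then replace matching rows in the target table (hypothetical helper)."""
    if not key_columns:
        raise ValueError("at least one key column is required to match rows")

    # Rows match when every key column is equal in target and staging.
    match = " AND ".join(f"{target}.{col} = {staging}.{col}" for col in key_columns)

    # Runs before the load: start from an empty staging table shaped like the target.
    preactions = (
        f"DROP TABLE IF EXISTS {staging}; "
        f"CREATE TABLE {staging} (LIKE {target});"
    )

    # Runs after the load: delete the rows being replaced, insert the fresh
    # copies, and drop the staging table, all inside one transaction.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target} USING {staging} WHERE {match}; "
        f"INSERT INTO {target} SELECT * FROM {staging}; "
        f"DROP TABLE {staging}; "
        "END;"
    )

    return {
        "database": database,
        "dbtable": staging,  # Glue writes into the staging table, not the target
        "preactions": preactions,
        "postactions": postactions,
    }


# Assumed example values; none of these names come from the question.
options = build_merge_options("dev", "sales", "sales_staging", ["order_id"])
print(options["postactions"])

# Inside a Glue job (the awsglue package is only available in the Glue
# runtime), the options would be passed to the writer roughly like this:
#
# glue_context.write_dynamic_frame.from_jdbc_conf(
#     frame=processed_frame,                     # assumed DynamicFrame
#     catalog_connection="redshift-connection",  # assumed connection name
#     connection_options=options,
#     redshift_tmp_dir="s3://example-bucket/tmp/",  # assumed temp path
# )
```

Because the delete and insert run in a single transaction keyed on the matching columns, rerunning the job with the same input replaces the earlier rows rather than appending a second copy.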