A credit card company wants to build a credit scoring model to predict whether a new credit card applicant will default on a payment. The company has collected data from a large number of sources with thousands of raw attributes. Early experiments to train a classification model revealed that many attributes are highly correlated, that the large number of features significantly slows down training, and that there are some overfitting issues. The Data Scientist on this project would like to speed up model training without losing much of the information in the original dataset. Which feature engineering technique should the Data Scientist use to meet these objectives?
A) Run self-correlation on all features and remove highly correlated features
B) Normalize all numerical values to be between 0 and 1
C) Use an autoencoder or principal component analysis (PCA) to replace original features with new features
D) Cluster raw data using k-means and use sample data from each cluster to build a new dataset
Correct Answer: C
Both an autoencoder and principal component analysis (PCA) reduce the dimensionality of the feature space while preserving most of the information in the original attributes. Replacing thousands of correlated raw features with a smaller set of derived features speeds up training and also helps with the overfitting, which is exactly what the Data Scientist needs. Removing correlated features (A) or sampling from clusters (D) discards information, and normalization (B) does not reduce the number of features at all.
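As an illustration only (not part of the original question), the following is a minimal scikit-learn sketch of how PCA might be applied in this scenario; the synthetic feature matrix, the 95% explained-variance threshold, and the logistic regression classifier are all hypothetical choices.

```python
# Minimal sketch: reduce thousands of correlated features with PCA before
# training a classifier. The data, variance threshold, and classifier below
# are illustrative assumptions, not part of the original question.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2000))     # hypothetical: 5,000 applicants x 2,000 raw attributes
y = rng.integers(0, 2, size=5000)     # hypothetical default / no-default labels

# Scale features, then keep enough principal components to explain ~95% of the variance.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print("Components kept:", model.named_steps["pca"].n_components_)
```

In practice the explained-variance threshold (or a fixed component count) would be tuned against model accuracy; Amazon SageMaker also provides a built-in PCA algorithm that serves the same purpose at scale.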