Question 1

Which of the following statements a), b) or c) is false?
A) Supervised machine learning falls into two categories: classification and regression.
B) You train machine-learning models on datasets that consist of rows and columns. Each row represents a data feature. Each column represents a sample of that feature.
C) In supervised machine learning, each sample has an associated label called a target (like "spam" or "not spam" for classifying e-mails). This is the value you're trying to predict for new data that you present to your models.
D) All of the above statements are true.

Accepted Answer

Explanation: The correct description for datasets in machine learning is that each row represents a sample or instance, and each column represents a feature of that sample. Statement B incorrectly swaps these definitions.
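
You can check the row/column convention on a bundled dataset. The sketch below assumes scikit-learn is installed. The Iris dataset is only an illustration and is not part of the question:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Rows are samples (one flower each); columns are features (four measurements).
n_samples, n_features = iris.data.shape
print(n_samples, n_features)  # 150 4

# The target array holds one label per sample, so its length matches the row count.
print(iris.target.shape)  # (150,)
```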

Question 2

Which of the following statements a), b) or c) is false?
A) The amount of data that's available today is already enormous and continues to grow exponentially; the data produced in the world in the last few years alone equals the amount produced up to that point since the dawn of civilization.
B) People used to say, "I'm drowning in data and I don't know what to do with it." With machine learning, we now say, "Flood me with big data so I can use machine-learning technology to extract insights and make predictions from it."
C) The big data phenomenon is occurring at a time when computing power is exploding and computer memory and secondary storage are exploding in capacity while costs dramatically decline. This enables us to think differently about solution approaches.
D) All of the above statements are true.

Accepted Answer

All the statements provided accurately reflect current trends and sentiments in the field of big data, machine learning, and computing technology. Statement A highlights the exponential growth of data, Statement B discusses the shift in perspective towards big data with the advent of machine learning, and Statement C points out the significant advancements in computing power and storage capacities alongside decreasing costs.

Question 3

Which of the following statements a), b) or c) is false?
A) "Toy" datasets generally have a small number of samples with a limited number of features. In the world of big data, datasets commonly have millions and billions of samples, or even more.
B) There's an enormous number of free and open datasets available for data science studies. Libraries like scikit-learn bundle popular datasets for you to experiment with and provide mechanisms for loading datasets from various repositories (such as openml.org).
C) Governments, businesses and other organizations worldwide offer datasets on a vast range of subjects.
D) All of the above statements are true.

Accepted Answer

All three statements are true. Toy datasets are smaller in size, there are many free and open datasets available for data science, and various organizations offer datasets on different subjects.

Question 4

With regard to our code that displays 24 digit images, which of the following statements a), b) or c) is false?
A) The following call to function subplots creates a 6-by-4 inch Figure (specified by the figsize=(6, 4) keyword argument) containing 24 subplots arranged in 6 rows and 4 columns:
import matplotlib.pyplot as plt
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(6, 4))
B) Each subplot has its own Axes object.
C) Function subplots returns the Axes objects in a two-dimensional NumPy array.
D) All of the above are true.

Accepted Answer

B and C are correct statements. A is false because the call to plt.subplots incorrectly specifies the number of rows and columns; it should be `nrows=6, ncols=4` for 24 subplots arranged in 6 rows and 4 columns, not `nrows=4, ncols=6` as stated.
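
To confirm which argument controls rows and which controls columns, you can inspect the returned array. The sketch below assumes Matplotlib is installed. It selects the non-interactive Agg backend so it runs without a display. The tick and title settings are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# nrows comes first, then ncols: this grid has 4 rows and 6 columns.
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(6, 4))

print(type(axes).__name__, axes.shape)  # ndarray (4, 6)

# Each of the 24 subplots is its own Axes object; ravel() flattens the
# 2D array so every Axes can be visited in row-major order.
for index, axis in enumerate(axes.ravel()):
    axis.set_xticks([])
    axis.set_yticks([])
    axis.set_title(str(index), fontsize=6)

plt.close(figure)
```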

Question 5

Which of the following statements is false?&#10;A) K-means clustering works through the data attempting to divide it into that many clusters.

Accepted Answer

K-means clustering is an iterative algorithm, not recursive. It updates the cluster centroids in each iteration based on the mean of the points assigned to each cluster until convergence, rather than calling itself recursively.

Question 6

Which of the following statements about scikit-learn and the machine-learning models you'll build with it is false?
A) It's difficult to know in advance which model(s) will perform best on your data, so you typically try many models and pick the one that performs best; scikit-learn makes this convenient for you.
B) You'll rarely get to know the details of the complex mathematical algorithms in the scikit-learn estimators, but with experience, you'll be able to intuit the best model for each new dataset.
C) It generally takes at most a few lines of code for you to create and use each scikit-learn model.
D) The models report their performance so you can compare the results and pick the model(s) with the best performance.

Accepted Answer

Scikit-learn is designed to be accessible and to abstract away the complexities of the underlying algorithms, making it easier for users to apply machine learning models without needing deep mathematical knowledge. However, understanding the details of the algorithms can significantly enhance a user's ability to choose and tune models effectively. The statement suggests that users will rarely know the details and rely on intuition, which is misleading as gaining insight into the algorithms is encouraged and beneficial for effective model selection and application.

Question 7

Which of the following statements is false?
A) Classification in supervised machine learning attempts to predict the distinct class to which a sample belongs.
B) If you have images of dogs and images of cats, you can classify each image as a "dog" or a "cat." This is a binary classification problem.
C) When classifying digit images from the Digits dataset bundled with scikit-learn, our goal is to predict which digit an image represents. Since there are 10 possible digits (the classes), this is a multi-classification problem.
D) You train a classification model using unlabeled data.

Accepted Answer

It is not possible to train a classification model using unlabeled data because in classification, we need to have labeled data to train the model.

Question 8

Which of the following statements is false?
A) The following code tests a linear regression model using the data in X_test and checks some of the predictions throughout the dataset by displaying the predicted and expected values for every ________ element:
predicted = linear_regression.predict(X_test)
expected = y_test
for p, e in zip(predicted[::5], expected[::5]):
    print(f'predicted: {p:.2f}, expected: {e:.2f}')

A) second
B) fifth
C) pth
D) eth

Accepted Answer

The LinearRegression estimator requires numerical features to perform predictions. It cannot directly use nonnumerical (categorical) features without preprocessing them into a numerical format, such as through one-hot encoding.
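
Here is a minimal sketch of that preprocessing step, assuming scikit-learn's OneHotEncoder. The color/price data is hypothetical and chosen so the fitted model is easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature (a color name) and a numeric target.
colors = np.array([['red'], ['green'], ['blue'], ['green'], ['red'], ['blue']])
prices = np.array([10.0, 12.0, 15.0, 12.0, 10.0, 15.0])

# OneHotEncoder replaces the string column with one 0/1 column per distinct value.
encoder = OneHotEncoder()
X = encoder.fit_transform(colors).toarray()  # convert the sparse result to a dense array
print(encoder.categories_)  # the learned categories, in sorted order
print(X.shape)              # (6, 3)

# With purely numeric features, LinearRegression can now be trained and used.
model = LinearRegression().fit(X, prices)
new_sample = encoder.transform(np.array([['green']])).toarray()
print(model.predict(new_sample))  # approximately [12.]
```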

Question 9

Which of the following statements is false?
A) Scikit-learn's machine-learning algorithms require samples to be stored in a one-dimensional array of floating-point values (or one-dimensional array-like collection, such as a list).
B) To represent every sample as one row, multi-dimensional data must be flattened into a one-dimensional array.
C) If you work with a dataset containing categorical features (typically represented as strings, such as 'spam' or 'not-spam'), you have to preprocess those features into numerical values.
D) Scikit-learn's sklearn.preprocessing module provides capabilities for converting categorical data to numeric data.

Accepted Answer

A) Scikit-learn algorithms require input data to be in the form of a two-dimensional array (or array-like structure), where rows represent samples and columns represent features, not a one-dimensional array.
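
Scikit-learn's bundled Digits dataset shows this flattening in practice. The sketch below assumes scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each Digits image is an 8-by-8 grid, so the images array is three-dimensional.
print(digits.images.shape)  # (1797, 8, 8)

# Flattening each image into one row of 64 values gives the two-dimensional
# samples-by-features layout that scikit-learn estimators expect.
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)  # (1797, 64)

# The bundled data attribute already holds exactly this flattened form.
print(np.array_equal(flattened, digits.data))  # True
```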

Question 10

Which of the following statements a), b) or c) is false?
A) The simplest supervised machine-learning algorithm we use is k-means clustering.
B) In k-means clustering, each cluster's centroid is the cluster's center point.
C) You'll often run multiple clustering estimators to compare their ability to divide a dataset's samples effectively into clusters.
D) All of the above statements are true.

Accepted Answer

K-means clustering is actually an unsupervised machine learning algorithm, not a supervised one.
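
In scikit-learn, the unsupervised nature of k-means shows up in the fit call, which receives samples but no targets. Below is a minimal sketch using KMeans on the Iris data. The dataset, n_clusters=3 and the seed are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

# Unsupervised: only the samples are passed to fit; iris.target is never used.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=11)
kmeans.fit(iris.data)

print(kmeans.labels_.shape)           # (150,)  one cluster label per sample
print(kmeans.cluster_centers_.shape)  # (3, 4)  one centroid (center point) per cluster
```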

Question 11

Which of the following statements a), b) or c) is false?
A) Scikit-learn conveniently packages the most effective machine-learning algorithms as evaluators.
B) Each scikit-learn algorithm is encapsulated, so you don't see its intricate details, including any heavy mathematics.
C) With scikit-learn and a small amount of Python code, you can create powerful models quickly for analyzing data, extracting insights from the data and making predictions.
D) All of the above statements are true.

Question 12

Which of the following statements a), b) or c) is false?
A) We can make machines learn.
B) The "secret sauce" of machine learning is data, and lots of it.
C) With machine learning, rather than programming expertise into our applications, we program them to learn from data.
D) All of the above statements are true.

Question 14

Unsupervised machine learning uses ________ algorithms.
A) classification
B) clustering
C) regression
D) None of the above

Question 16

Which of the following statements is false?
A) The two main types of machine learning are supervised machine learning, which works with unlabeled data, and unsupervised machine learning, which works with labeled data.
B) If you're developing a computer vision application to recognize dogs and cats, you'll train your model on lots of dog photos labeled "dog" and cat photos labeled "cat." If your model is effective, when you put it to work processing unlabeled photos it will recognize dogs and cats it has never seen before. The more photos you train with, the greater the chance that your model will accurately predict which new photos are dogs and which are cats.
D) In this era of big data and massive, economical computer power, you should be able to build some pretty accurate machine learning models.

Question 17

Which of the following are not steps in a typical machine-learning case study?
A) loading the dataset and exploring the data with pandas and visualizations
B) transforming your data (converting non-numeric data to numeric data because scikit-learn requires numeric data) and splitting the data for training and testing
C) creating, training and testing the model; tuning the model, evaluating its accuracy and making predictions on live data that the model hasn't seen before.
D) All of the above are steps in a typical machine-learning case study.

Question 18

Which of the following statements is false?
A) Even though k-nearest neighbors is one of the most complex classification algorithms, because of its superior prediction accuracy we use it to analyze the Digits dataset bundled with scikit-learn.
B) Classification algorithms predict the discrete classes (categories) to which samples belong.
C) Binary classification uses two classes, such as "spam" or "not spam" in an e-mail classification application. Multi-classification uses more than two classes, such as the 10 classes, 0 through 9, in the Digits dataset.
D) A classification scheme looking at movie descriptions might try to classify them as "action," "adventure," "fantasy," "romance," "history" and the like.

Question 19

Which of the following statements is false?
A) In machine learning, a model implements a machine-learning algorithm. In scikit-learn, models are called estimators.
B) There are two parameter types in machine learning: those the estimator calculates as it learns from the data you provide and those you specify in advance when you create the scikit-learn estimator object that represents the model.
C) The machine-learning parameters the estimator calculates as it learns from the data are called hyperparameters; in the k-nearest neighbors algorithm, k is a hyperparameter.
D) For simplicity, we use scikit-learn's default hyperparameter values. In real-world machine-learning studies, you'll want to experiment with different values of k to produce the best possible models for your studies; this process is called hyperparameter tuning.

Question 20

Which of the following are related to compressing a dataset's large number of features down to two for visualization purposes?
A) dimensionality reduction
B) TSNE estimator
C) PCA estimator
D) All of the above.
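
As an illustration of dimensionality reduction, this sketch (assuming scikit-learn is installed) uses PCA to compress the 64 Digits features to 2. The seed is an arbitrary choice.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# Compress each 64-feature sample down to 2 features so the whole
# dataset can be drawn in a two-dimensional scatter plot.
pca = PCA(n_components=2, random_state=11)
reduced = pca.fit_transform(digits.data)

print(digits.data.shape)  # (1797, 64)
print(reduced.shape)      # (1797, 2)
```

sklearn.manifold.TSNE follows the same fit_transform pattern with n_components=2. It is considerably slower on this dataset, so it is left out of the sketch.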

Question 21

Which of the following statements a), b) or c) is false?
A) The following code uses function cross_val_score to train and test a model:
from sklearn.model_selection import cross_val_score
scores = cross_val_score(estimator=knn, X=digits.data,
                         y=digits.target, cv=kfold)
B) The keyword arguments in Part (a) are:
• estimator=knn, which specifies the estimator you'd like to validate.
• X=digits.data, which specifies the samples to use for training and testing.
• y=digits.target, which specifies the targets for the samples.
• cv=kfold, which specifies the cross-validation generator that defines how to split the samples and targets for training and testing.
C) Function cross_val_score returns a single overall accuracy score for the model.
D) All of the above statements are true.
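
Running cross_val_score directly shows what its result looks like. The sketch below assumes the Digits dataset, a default KNeighborsClassifier and a shuffled 10-fold KFold, all of which are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
knn = KNeighborsClassifier()
kfold = KFold(n_splits=10, random_state=11, shuffle=True)

# cross_val_score returns an array with one accuracy score per fold.
scores = cross_val_score(estimator=knn, X=digits.data,
                         y=digits.target, cv=kfold)

print(len(scores))  # 10
print(f'Mean accuracy: {scores.mean():.2%}')
print(f'Standard deviation: {scores.std():.2%}')
```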

Question 22

Which of the following statements a), b) or c) is false?
A) The KNeighborsClassifier estimator (module sklearn.neighbors) implements the k-nearest neighbors algorithm.
B) The following code creates a KNeighborsClassifier estimator object:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
C) The internal details of how a KNeighborsClassifier object implements the k-nearest neighbors algorithm are hidden in the object. You simply call its methods.
D) All of the above statements are true.

Question 23

Which of the following statements a), b) or c) is false?
A) You typically train a machine-learning model with a subset of a dataset.
B) Generally, you should train your model with the smallest amount of data that makes the model perform well.
C) It's important to set aside a portion of your data for testing, so you can evaluate a model's performance using data that the model has not yet seen. Once you're confident that the model is performing well, you can use it to make predictions using new data.
D) All of the above statements are true.

Question 24

Which of the following statements is false?
A) By default, train_test_split reserves 75% of the data for training and 25% for testing.
B) To specify different splits, you can set the sizes of the testing and training sets with the train_test_split function's keyword arguments test_size and train_size. Use floating-point values from 0.0 through 100.0 to specify the percentages of the data to use for each.
C) You can use integer values to set the precise numbers of samples.
D) If you specify one of the keyword arguments test_size and train_size, the other is inferred. For example, the statement
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11, test_size=0.20)
specifies that 20% of the data is for testing, so train_size is inferred to be 0.80.
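
You can check the split sizes on a small array. In this sketch (assuming scikit-learn and NumPy), the 100-sample dataset is hypothetical, which keeps the resulting counts easy to read:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with 2 features each, plus 100 targets.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Default split: 75% training, 25% testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)
default_sizes = (len(X_train), len(X_test))
print(default_sizes)  # (75, 25)

# A float test_size is a fraction between 0.0 and 1.0; train_size is inferred.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=11, test_size=0.20)
fraction_sizes = (len(X_train), len(X_test))
print(fraction_sizes)  # (80, 20)

# An int test_size is an exact number of test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=11, test_size=30)
count_sizes = (len(X_train), len(X_test))
print(count_sizes)  # (70, 30)
```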

Question 25

Which of the following statements a), b) or c) is false?
A) The LinearRegression estimator is in the sklearn.linear_model module.
B) By default, LinearRegression uses all the numeric features in a dataset, performing a multiple linear regression.
C) Simple linear regression uses one feature as the independent variable.
D) All of the above statements are true.

Question 26

Scikit-learn estimators require their training and testing data to be two-dimensional arrays (or two-dimensional array-like data, such as lists of lists or pandas DataFrames). Which of the following statements is false?
A) To transform a one-dimensional array into two dimensions, we call an array's ________ method.

A) transform
B) switch
C) convert
D) reshape

Question 27

Consider the confusion matrix for the Digits dataset's predictions:
array([[45,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 45,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, 54,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 42,  0,  1,  0,  1,  0,  0],
       [ 0,  0,  0,  0, 49,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  0, 38,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0, 45,  0,  0],
       [ 0,  1,  1,  2,  0,  0,  0,  0, 39,  1],
       [ 0,  0,  0,  0,  1,  0,  0,  0,  1, 41]])
Which of the following statements is false?

A) The correct predictions are shown on the diagonal from top-left to bottom-right; this is called the principal diagonal.
B) The nonzero values that are not on the principal diagonal indicate incorrect predictions (that is, misses).
C) Each row represents one distinct class, that is, one of the digits 0-9.
D) The columns within a row specify how many of the test samples were classified incorrectly into each distinct class 0-9.
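
A few NumPy operations summarize the matrix above. The sketch copies the matrix from the question. It treats rows as actual digits and columns as predicted digits, which is scikit-learn's confusion_matrix convention.

```python
import numpy as np

# The confusion matrix from the question (rows: actual digit, columns: predicted digit).
confusion = np.array([
    [45,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 45,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0, 54,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0, 42,  0,  1,  0,  1,  0,  0],
    [ 0,  0,  0,  0, 49,  0,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  0, 38,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0, 42,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 45,  0,  0],
    [ 0,  1,  1,  2,  0,  0,  0,  0, 39,  1],
    [ 0,  0,  0,  0,  1,  0,  0,  0,  1, 41]])

correct = np.trace(confusion)   # hits on the principal diagonal
total = confusion.sum()         # every test sample appears exactly once
misses = total - correct        # everything off the diagonal

print(correct, total, misses)              # 440 450 10
print(f'accuracy: {correct / total:.2%}')  # accuracy: 97.78%

# Row 8 holds the test samples whose actual digit is 8.
print(confusion[8].sum(), confusion[8, 8])  # 44 39
```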

Question 29

Which of the following statements a), b) or c) is false?
A) In real-world machine-learning applications, it can often take minutes, hours, days or even months to train your models; special-purpose, high-performance hardware called GPUs and TPUs can significantly reduce model training time.
B) The fit method returns the estimator object.
C) For simplicity, we generally use the default estimator settings; by default, a KNeighborsClassifier looks at the three nearest neighbors to make its predictions.
D) All of the above statements are true.
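
Statements B and C are quick to check interactively. The sketch below assumes scikit-learn and splits the Digits data with an arbitrary seed:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11)

knn = KNeighborsClassifier()
print(knn.n_neighbors)  # 5, the default number of neighbors

# fit returns the estimator itself, so calls can be chained.
returned = knn.fit(X=X_train, y=y_train)
print(returned is knn)  # True
```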

Question 30

Which of the following statements a), b) or c) is false?
A) It's difficult to know in advance which machine learning model(s) will perform best for a given dataset, especially when they hide the details of how they operate from their users.
B) Even though the KNeighborsClassifier predicts digit images with a high degree of accuracy, it's possible that other scikit-learn estimators are even more accurate.
C) Scikit-learn provides many models with which you can quickly train and test your data. This encourages you to run multiple models to determine which is the best for a particular machine learning study.
D) All of the above statements are true.
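
One common way to act on statement C is to loop over several estimators with the same cross-validation setup. The sketch below assumes scikit-learn. The three estimators are illustrative picks. The printed accuracies can vary with library version and seed, so none are quoted here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

digits = load_digits()

# Candidate models chosen for illustration; other scikit-learn classifiers work the same way.
estimators = {
    'KNeighborsClassifier': KNeighborsClassifier(),
    'SVC': SVC(gamma='scale'),
    'GaussianNB': GaussianNB(),
}

# The same folds for every model, so the comparison is like-for-like.
kfold = KFold(n_splits=10, random_state=11, shuffle=True)
results = {}
for name, estimator in estimators.items():
    scores = cross_val_score(estimator=estimator, X=digits.data,
                             y=digits.target, cv=kfold)
    results[name] = scores.mean()
    print(f'{name:>20}: mean accuracy={scores.mean():.2%}; '
          f'standard deviation={scores.std():.2%}')

best = max(results, key=results.get)
print(f'Best on this run: {best}')
```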

Question 31

Which of the following statements is false?
A) Another way to check a classification estimator's accuracy is via a confusion matrix, which shows only the incorrect predicted values (also known as the misses) for a given class.
B) To create a confusion matrix, simply call the function confusion_matrix from the sklearn.metrics module, passing the expected classes and the predicted classes as arguments, as in:
from sklearn.metrics import confusion_matrix
confusion = confusion_matrix(y_true=expected, y_pred=predicted)
C) The y_true keyword argument in Part (b) specifies the test samples' actual classes.
D) The y_pred keyword argument in Part (b) specifies the predicted classes for the test samples.
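
A tiny example you can check by hand shows that confusion_matrix records hits as well as misses. The expected and predicted lists are hypothetical:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical expected and predicted classes for five test samples.
expected = [0, 1, 2, 2, 1]
predicted = [0, 2, 2, 2, 1]

confusion = confusion_matrix(y_true=expected, y_pred=predicted)
print(confusion)
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]
# The diagonal holds the 4 correct predictions; the single 1 off the
# diagonal is the sample whose actual class 1 was predicted as 2.
```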

Question 32

Which of the following statements a), b) or c) is false?
A) You should first break your data into a training set and a testing set to prepare to train and test a model.
B) The function train_test_split from the sklearn.model_selection module simply splits in order the dataset's samples and target values into training and testing sets. This helps ensure that the training and testing sets have similar characteristics.
C) Function train_test_split provides the keyword argument random_state for reproducibility. When you run the code in the future with the same seed value, train_test_split will select the same data for the training set and the same data for the testing set. In machine-learning studies, this helps others confirm your results by working with the same randomly selected data.
D) All of the above statements are true.
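
The following sketch uses hypothetical ordered data and assumes scikit-learn. It checks two things: a fixed random_state makes the split reproducible, and train_test_split shuffles by default (its shuffle parameter defaults to True).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical ordered data: sample i has feature value i and target i.
X = np.arange(20).reshape(20, 1)
y = np.arange(20)

first = train_test_split(X, y, random_state=11)
second = train_test_split(X, y, random_state=11)

# Same seed, same split: every returned array matches.
print(all(np.array_equal(a, b) for a, b in zip(first, second)))  # True

# The samples are shuffled rather than taken in order, so the 15
# training targets are not simply 0 through 14.
y_train = first[2]
print(y_train)
```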

Question 33

Which of the following statements is false?
A) With K-fold cross-validation, you use all of your data at once for training your model.
B) K-fold cross-validation splits the dataset into k equal-size folds.
C) You then repeatedly train your model with k - 1 folds and test the model with the remaining fold.
D) Consider using k = 10 with folds numbered 1 through 10. With 10 folds, we'd do 10 successive training and testing cycles:
• First, we'd train with folds 1-9, then test with fold 10.
• Next, we'd train with folds 1-8 and 10, then test with fold 9.
• Next, we'd train with folds 1-7 and 9-10, then test with fold 8.
This training and testing cycle continues until each fold has been used to test the model.
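
Scikit-learn's KFold produces these train/test index sets. In this sketch the 50-sample array is hypothetical. Shuffling is turned off so the folds are consecutive blocks. KFold tests the first fold first rather than the last, which does not change the outcome.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical dataset of 50 samples (only the number of rows matters here).
X = np.arange(100).reshape(50, 2)

kfold = KFold(n_splits=10)  # no shuffling, so each fold is a consecutive block
tested = []
for cycle, (train_index, test_index) in enumerate(kfold.split(X), start=1):
    # Each cycle trains on k - 1 = 9 folds and tests on the remaining fold.
    print(f'cycle {cycle}: train on {len(train_index)} samples, '
          f'test on samples {test_index.tolist()}')
    tested.extend(test_index.tolist())

# Across all 10 cycles, every sample is used for testing exactly once.
print(sorted(tested) == list(range(len(X))))  # True
```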

Question 34

Which of the following statements a), b) or c) is false?
A) Scikit-learn has separate classes for simple linear regression and multiple linear regression.
B) To find the best fitting regression line for the data in a simple linear regression, a LinearRegression estimator iteratively adjusts the slope and intercept values to minimize the sum of the squares of the data points' distances from the line.
C) Once LinearRegression is finished performing a simple linear regression, you can use the slope and intercept in the y = mx + b calculation to make predictions. The slope is stored in the estimator's coeff_ attribute (m in the equation) and the intercept is stored in the estimator's intercept_ attribute (b in the equation).
D) All of the above are true.
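
Here is a minimal sketch of a simple linear regression on hypothetical points that lie exactly on y = 2x + 1. It assumes scikit-learn and NumPy and shows where the fitted slope and intercept are stored:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical points lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Estimators expect 2D samples, so reshape the single feature into one column.
X = x.reshape(-1, 1)

# The same LinearRegression class handles simple (one feature)
# and multiple (many features) linear regression.
linear_regression = LinearRegression()
linear_regression.fit(X=X, y=y)

m = linear_regression.coef_[0]    # slope, stored in coef_
b = linear_regression.intercept_  # intercept, stored in intercept_
print(f'y = {m:.2f}x + {b:.2f}')  # y = 2.00x + 1.00

# predict gives the same result as computing m * x + b by hand.
print(linear_regression.predict([[10.0]]))  # approximately [21.]
```

coef_ holds one slope per feature, so with a single feature the slope is coef_[0].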

Question 35

Which of the following statements a), b) or c) is false?
A) Once we've loaded our data into a KNeighborsClassifier, we can use it with the test samples to make predictions. Calling the estimator's predict method with the test samples (X_test) as an argument returns an array containing the predicted class of each sample:
predicted = knn.predict(X=X_test)
B) If predicted and expected are arrays containing the predictions and expected target values, respectively, evaluating the following code snippets in IPython interactive mode displays the predicted and expected target values for the first 20 test samples:
predicted[:20]
expected[:20]
C) If predicted and expected are arrays containing the predictions and expected target values, respectively, the following list comprehension locates all the incorrect predictions for the entire test set, that is, the cases in which the predicted and expected values do not match:
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]
D) All of the above statements are true.
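
The three snippets above combine into one runnable workflow. The sketch below assumes the Digits dataset, split with random_state=11, and a default KNeighborsClassifier:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11)

knn = KNeighborsClassifier()
knn.fit(X=X_train, y=y_train)

predicted = knn.predict(X=X_test)  # one predicted class per test sample
expected = y_test

print(predicted[:20])
print(expected[:20])

# (predicted, expected) pairs for every incorrect prediction.
wrong = [(p, e) for (p, e) in zip(predicted, expected) if p != e]
print(f'{len(wrong)} of {len(expected)} test samples were misclassified')
```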

Question 36

Consider the following code and output:
In [57]: for k in range(1, 20, 2):
    ...:     kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    ...:     knn = KNeighborsClassifier(n_neighbors=k)
    ...:     scores = cross_val_score(estimator=knn,
    ...:         X=digits.data, y=digits.target, cv=kfold)
    ...:     print(f'k={k:<2}; mean accuracy={scores.mean():.2%}; ' +
    ...:           f'standard deviation={scores.std():.2%}')
    ...:
k=1 ; mean accuracy=98.83%; standard deviation=0.58%
k=3 ; mean accuracy=98.78%; standard deviation=0.78%
k=5 ; mean accuracy=98.72%; standard deviation=0.75%
k=7 ; mean accuracy=98.44%; standard deviation=0.96%
k=9 ; mean accuracy=98.39%; standard deviation=0.80%
k=11; mean accuracy=98.39%; standard deviation=0.80%
k=13; mean accuracy=97.89%; standard deviation=0.89%
k=15; mean accuracy=97.89%; standard deviation=1.02%
k=17; mean accuracy=97.50%; standard deviation=1.00%
k=19; mean accuracy=97.66%; standard deviation=0.96%
Which of the following statements is false?

A) The loop creates KNeighborsClassifiers with odd k values from 1 through 19 and performs k-fold cross-validation on each.
B) The k value 7 in kNN produces the most accurate predictions for the Digits dataset.
C) The accuracy tends to decrease for higher k values.
D) Compute time grows with k, because k-NN needs to perform many more calculations to find the nearest neighbors.

Question 37

Which of the following statements a), b) or c) is false?
A) Each estimator has a score method that returns an indication of how well the estimator performs for the test data you pass as arguments.
B) For classification estimators, the score method returns the prediction accuracy for the test data.
C) You can perform hyperparameter tuning to try to determine the optimal value for k.
D) All of the above statements are true.
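
For a classifier, you can reproduce the value that score returns by hand. The sketch below uses the same illustrative Digits/KNeighborsClassifier setup:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11)

knn = KNeighborsClassifier().fit(X=X_train, y=y_train)

# For classifiers, score returns the fraction of test samples predicted correctly.
accuracy = knn.score(X_test, y_test)
manual = np.mean(knn.predict(X_test) == y_test)

print(f'{accuracy:.2%}')
print(np.isclose(accuracy, manual))  # True
```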

Question 38

Which of the following statements is false?
A) Scikit-learn provides the KFold class and the cross_val_score function (both in the module sklearn.model_selection) to help you perform the training and testing cycles.
B) The following code creates a KFold object:
from sklearn.model_selection import KFold
kfold = KFold(n_folds=10, random_state=11, shuffle=True)
C) The keyword argument random_state=11 seeds the random number generator for reproducibility.
D) The keyword argument shuffle=True causes the KFold object to randomize the data by shuffling it before splitting it into folds. This is particularly important if the samples might be ordered or grouped.

Question 39

Which of the following statements a), b) or c) is false?
A) The following call to the KNeighborsClassifier object's fit method loads the training set's samples (X_train) and targets (y_train) into the estimator:
knn.fit(X=X_train, y=y_train)
B) After the KNeighborsClassifier's fit method loads the data into the estimator, it uses that data to perform complex calculations behind the scenes that learn from the data and train the model.
C) The KNeighborsClassifier estimator is said to be lazy because its work is performed only when you use it to make predictions.
D) All of the above statements are true.

Question 40

The sklearn.metrics module's classification_report function produces a table of classification metrics based on the expected and predicted values, as in:
from sklearn.metrics import classification_report
names = [str(digit) for digit in digits.target_names]
print(classification_report(expected, predicted,
                            target_names=names))
A) The precision column shows the total number of correct predictions for a given digit divided by the total number of predictions for that digit. You can confirm the precision by looking at each column in the confusion matrix.
B) The recall column is the total number of correct predictions for a given digit divided by the total number of samples that should have been predicted as that digit. You can confirm the recall by looking at each row in the confusion matrix.
C) The f1-score column is the average of the precision and the recall. The support column is the number of samples with a given expected value; for example, 50 samples were labeled as 4s, and 38 samples were labeled as 5s.
D) All of the above are true.
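
You can derive precision, recall, f1-score and support from a confusion matrix. This sketch assumes the report came from the same predictions as the confusion matrix in Question 27. That matrix's row totals for 4 and 5 are 50 and 38, which match the supports mentioned in C. Scikit-learn computes the f1-score as the harmonic mean of precision and recall.

```python
import numpy as np

# Confusion matrix for the Digits predictions (rows: actual, columns: predicted).
confusion = np.array([
    [45,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 45,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0, 54,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0, 42,  0,  1,  0,  1,  0,  0],
    [ 0,  0,  0,  0, 49,  0,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  0, 38,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0, 42,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 45,  0,  0],
    [ 0,  1,  1,  2,  0,  0,  0,  0, 39,  1],
    [ 0,  0,  0,  0,  1,  0,  0,  0,  1, 41]])

correct = np.diag(confusion)
precision = correct / confusion.sum(axis=0)         # divide by each column total
recall = correct / confusion.sum(axis=1)            # divide by each row total
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
support = confusion.sum(axis=1)                     # samples per actual class

for digit in range(10):
    print(f'{digit}: precision={precision[digit]:.2f} '
          f'recall={recall[digit]:.2f} f1={f1[digit]:.2f} '
          f'support={support[digit]}')
```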

Question 41

Which of the following statements is false?
A) You load the California Housing dataset using the sklearn.datasets module's fetch_california_housing function, which returns a Bunch object.
B) The Bunch object's data and target attributes are NumPy arrays containing the 20,640 samples and their target values, respectively.
C) To confirm the number of samples (rows) and features (columns), look at the data array's shape attribute, which shows that there are 20,640 rows and 8 columns, as in:
In [4]: california.data.shape
Out[4]: (20640, 8)
Similarly, you can see that the number of target values (the median house values) matches the number of samples by looking at the target array's shape, as in:
In [5]: california.target.shape
Out[5]: (20640,)
D) The Bunch's features attribute contains the names that correspond to each column in the data array.

Accepted Answer


Question 42

The following code tests a linear regression model using the data in X_test and checks some of the predictions throughout the dataset by displaying the predicted and expected values for every ________ element:
predicted = linear_regression.predict(X_test)
expected = y_test
for p, e in zip(predicted[::5], expected[::5]):
    print(f'predicted: {p:.2f}, expected: {e:.2f}')

A) second
B) fifth
C) pth
D) eth

Accepted Answer
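The blank hinges on Python's extended slice syntax. A quick stdlib-only check of what a step of 5 selects, using made-up stand-in lists (hypothetical values, not the California Housing predictions):

```python
# Hypothetical stand-ins for the predicted and expected arrays.
predicted = [1.2, 2.3, 2.0, 1.9, 2.5, 0.8, 1.1, 3.0, 2.2, 1.7, 0.9, 2.6]
expected = [0.8, 1.7, 1.1, 1.4, 1.9, 0.9, 1.0, 2.8, 2.0, 1.5, 1.2, 2.4]

# A slice with step 5 keeps indices 0, 5, 10, ...: every fifth element.
print(list(range(len(predicted)))[::5])  # [0, 5, 10]

for p, e in zip(predicted[::5], expected[::5]):
    print(f'predicted: {p:.2f}, expected: {e:.2f}')
```

The same slicing works on the NumPy arrays that predict returns.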


Question 43

Which of the following statements about the k-means clustering algorithm is false?
A) Each cluster of samples is grouped around a centroid, the cluster's center point.
B) Initially, the algorithm chooses k centroids at random from the dataset's samples. Then the remaining samples are placed in the cluster whose centroid is the closest.
C) The centroids are iteratively recalculated and the samples re-assigned to clusters until, for all clusters, the distances from a given centroid to the samples in its cluster are maximized.
D) The algorithm's results are a one-dimensional array of labels indicating the cluster to which each sample belongs, and a two-dimensional array of centroids representing the center of each cluster.

Accepted Answer
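The assign-then-update loop described in statements B and C can be sketched in pure Python as Lloyd's algorithm. This is a minimal sketch, assuming 2D points and a fixed seed; the kmeans and assign functions and the sample points are hypothetical and much simpler than scikit-learn's KMeans (which, for instance, uses k-means++ initialization by default rather than picking samples uniformly at random). Each pass moves samples to their closest centroid, so within-cluster distances shrink rather than grow.

```python
import math
import random


def assign(samples, centroids):
    """Label each sample with the index of its closest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(s, centroids[c]))
            for s in samples]


def kmeans(samples, k, max_iter=100, seed=11):
    """Lloyd's-algorithm sketch; returns (1D labels, 2D centroids)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct samples at random as the first centroids.
    centroids = [list(c) for c in rng.sample(samples, k)]
    for _ in range(max_iter):
        labels = assign(samples, centroids)  # assignment step
        new_centroids = []
        for c in range(k):
            # Update step: move each centroid to the mean of its cluster's samples.
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped changing
        centroids = new_centroids
    return assign(samples, centroids), centroids


points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels, centroids = kmeans(points, k=2)
print(labels)     # one label for the first three points, another for the last three
print(centroids)
```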


Question 44

Which of the following statements a), b) or c) is false?
A) Scikit-learn provides many metrics functions for evaluating how well estimators predict results and for comparing estimators to choose the best one(s) for your particular study.
B) Scikit-learn's metrics vary by estimator type.
C) Functions confusion_matrix and classification_report (from the module sklearn.metrics) are two of many metrics functions specifically for evaluating regression estimators.
D) All of the above statements are true.

Accepted Answer


Question 45

Which of the following statements a), b) or c) is false?
A) Unsupervised machine learning and visualization can help you get to know your data by finding patterns and relationships among unlabeled samples.
B) Using Matplotlib, Seaborn and other visualization libraries, you can plot datasets with two or three variables using 2D and 3D visualizations, respectively.
C) In the Digits dataset, every sample has 64 features (and a target value), so there is no way to visualize the dataset.
D) All of the above statements are true.

Accepted Answer


Question 46

Which of the following statements a), b) or c) is false?
A) It's difficult for humans to think about data with large numbers of dimensions. This is called the curse of dimensionality.
B) If data has closely correlated features, some could be eliminated via dimensionality reduction to improve the training performance.
C) Eliminating features with dimensionality reduction improves the accuracy of the model.
D) All of the above statements are true.

Accepted Answer


Question 47

Consider the following code that imports pandas and sets some display options:
import pandas as pd
pd.set_option('display.precision', 4)
pd.set_option('display.max_columns', 9)
pd.set_option('display.width', None)
Which of the following statements a), b) or c) about the set_option calls is false?

A) 'display.precision' is the maximum number of digits to display to the right of each decimal point.
B) 'display.max_columns' is the maximum number of columns to display when you output the DataFrame's string representation. In IPython interactive mode, by default, pandas displays all of the columns left-to-right. The 'display.max_columns' setting enables pandas to show all the columns using multiple rows of output.
C) 'display.width' specifies the width in characters of your Command Prompt (Windows), Terminal (macOS/Linux) or shell (Linux). The value None tells pandas to auto-detect the display width when formatting string representations of Series and DataFrames.
D) All of the above statements are true.

Accepted Answer


Question 48

Which of the following statements is false?
A) When creating a model, a key goal is to ensure that it is capable of making accurate predictions for data it has not yet seen. Two common problems that prevent accurate predictions are overfitting and underfitting.
B) Underfitting occurs when a model is too simple to make accurate predictions, based on its training data. An example of underfitting is using a linear model, such as simple linear regression, when in fact, the problem really requires a more sophisticated non-linear model.
C) Overfitting occurs when your model is too complex. In the most extreme case of overfitting, a model memorizes its training data.
D) When you make predictions with an overfit model, the model won't know what to do with new data that matches the training data, but the model will make excellent predictions with data it has never seen.

Accepted Answer
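Taken to the extreme described in statement C, an overfit model is little more than a lookup table. The MemorizingModel class below is a hypothetical illustration (no real estimator works this way): it answers training samples perfectly and has no answer at all for samples it has never seen.

```python
class MemorizingModel:
    """Hypothetical model that overfits completely by memorizing its training data."""

    def fit(self, X, y):
        # Store each training sample (as a hashable tuple) with its target.
        self.table_ = {tuple(x): target for x, target in zip(X, y)}
        return self

    def predict(self, X):
        # Seen samples come back exactly; unseen samples get None (no idea).
        return [self.table_.get(tuple(x)) for x in X]


model = MemorizingModel().fit([[1, 2], [3, 4]], [10, 20])
print(model.predict([[1, 2], [3, 4]]))  # [10, 20]: perfect on training data
print(model.predict([[1, 3]]))          # [None]: useless on new data
```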


Question 49

Which of the following statements a), b) or c) is false?
A) In big data, samples can have hundreds, thousands or even millions of features.
B) To visualize a dataset with many features (that is, many dimensions), you must first reduce the data to two or three dimensions. This requires a supervised machine learning technique called dimensionality reduction.
C) When you graph the resulting data after dimensionality reduction, you might see patterns in the data that will help you choose the most appropriate machine learning algorithms to use. For example, if the visualization contains clusters of points, it might indicate that there are distinct classes of information within the dataset.
D) All of the above statements are true.

Accepted Answer


Question 50

In the context of the California Housing dataset, which of the following statements is false?
A) The following code creates a LinearRegression estimator and invokes its fit method to train the estimator using X_train (the samples) and y_train (the targets):
from sklearn.linear_model import LinearRegression
linear_regression = LinearRegression()
linear_regression.fit(X=X_train, y=y_train)
B) Multiple linear regression produces separate coefficients for each feature (stored in coef_) in the dataset and one intercept (stored in intercept_).
C) For positive coefficients, the median house value increases as the feature value increases. For negative coefficients, the median house value decreases as the feature value decreases.
D) You can use the coefficient and intercept values with the following equation to make predictions:
y = m₁x₁ + m₂x₂ + … + mₙxₙ + b
where
• m₁, m₂, …, mₙ are the feature coefficients
• b is the intercept
• x₁, x₂, …, xₙ are the feature values (that is, the values of the independent variables)
• y is the predicted value (that is, the dependent variable)

Accepted Answer
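The prediction equation in statement D can be evaluated directly for one sample. This stdlib-only sketch assumes hypothetical coefficient and intercept values; for a fitted LinearRegression, the corresponding values live in coef_ and intercept_, and predict performs the same computation for every row of its input.

```python
def linear_predict(coefficients, intercept, features):
    """Evaluate y = m1*x1 + m2*x2 + ... + mn*xn + b for one sample."""
    if len(coefficients) != len(features):
        raise ValueError('need exactly one coefficient per feature')
    return sum(m * x for m, x in zip(coefficients, features)) + intercept


# Hypothetical coefficients: one positive, one negative.
coef = [0.5, -0.25]
b = 1.0
print(linear_predict(coef, b, [2.0, 4.0]))  # 0.5*2 - 0.25*4 + 1 = 1.0
# Raising the feature that has a negative coefficient lowers the prediction.
print(linear_predict(coef, b, [2.0, 8.0]))  # 0.0
```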


Question 52

Which of the following statements a), b) or c) is false?
A) The following code tests a model by calling the estimator's predict method with the test samples as an argument:
predicted = linear_regression.predict(X_test)
B) Assuming the array expected contains the expected values for the samples used to make predictions in Part (a)'s snippet, evaluating the following snippets displays the first five predictions and their corresponding expected values:
In [32]: predicted[:5]
Out[32]: array([1.25396876, 2.34693107, 2.03794745, 1.8701254 , 2.53608339])
In [33]: expected[:5]
Out[33]: array([0.762, 1.732, 1.125, 1.37 , 1.856])
C) With classification, we saw that the predictions were distinct classes that matched existing classes in the dataset. With regression, it's tough to get exact predictions, because you have continuous outputs. Every possible value of x₁, x₂, …, xₙ in the calculation y = m₁x₁ + m₂x₂ + … + mₙxₙ + b predicts a different value.
D) All of the above statements are true.

Accepted Answer


Question 53

Which of the following statements a), b) or c) is false?
A) By default, a LinearRegression estimator uses all the features in the dataset's data array to perform a multiple linear regression.
B) An error occurs if any of the features passed to a LinearRegression estimator for training are categorical rather than numeric. If a dataset contains categorical data, you must exclude the categorical features from the training process.
C) A benefit of working with scikit-learn's bundled datasets is that they're already in the correct format for machine learning using scikit-learn's models.
D) All of the above statements are true.

Accepted Answer
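Excluding categorical features is not the only option: a common alternative is to convert them to numbers first. Below is a minimal one-hot encoding sketch; the one_hot helper and the color values are hypothetical, and in practice pandas.get_dummies or scikit-learn's OneHotEncoder (in sklearn.preprocessing) handle this.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector (sorted category order)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one indicator is set per sample
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot(['red', 'green', 'red', 'blue'])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```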


Question 54

Which of the following statements a), b) or c) is false?
A) Another common metric for regression models is the mean squared error, which
• calculates the difference between each expected and predicted value (this is called the error),
• squares each difference and
• calculates the average of the squared values.
B) To calculate a regression estimator's mean squared error, call function mean_squared_error (from module sklearn.metrics) with the arrays representing the expected and predicted results, as in:
In [46]: metrics.mean_squared_error(expected, predicted)
Out[46]: 0.5350149774449119
C) When comparing estimators with the mean squared error metric, the one with the value closest to 1 best fits your data.
D) All of the above statements are true.

Accepted Answer
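The three steps listed in statement A translate directly into code. This is a minimal stdlib-only sketch; the function borrows the name of sklearn.metrics.mean_squared_error but is our own hypothetical re-implementation. Lower values mean a closer fit, and 0.0 means every prediction was exact.

```python
def mean_squared_error(expected, predicted):
    """Average of the squared differences between expected and predicted values."""
    if len(expected) != len(predicted):
        raise ValueError('arrays must have the same length')
    errors = [e - p for e, p in zip(expected, predicted)]  # step 1: the errors
    squared = [err ** 2 for err in errors]                  # step 2: square them
    return sum(squared) / len(squared)                      # step 3: average them


print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 4/3, about 1.333
print(mean_squared_error([1.0, 2.0], [1.0, 2.0]))            # 0.0 (perfect fit)
```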


Question 55

Which of the following statements a), b) or c) is false?
A) Among the many metrics for regression estimators is the model's coefficient of determination, which is also called the R² score.
B) To calculate an estimator's R² score, use the sklearn.metrics module's r2_score function with the arrays representing the expected and predicted results, as in:
In [44]: from sklearn import metrics
In [45]: metrics.r2_score(expected, predicted)
Out[45]: 0.6008983115964333
C) R² scores range from 0.0 to 1.0 with 1.0 being the best. An R² score of 1.0 indicates that the estimator perfectly predicts the independent variable's value, given the dependent variable(s) value(s). An R² score of 0.0 indicates the model cannot make predictions with any accuracy, based on the independent variables' values.
D) All of the above statements are true.

Accepted Answer
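The R² score compares the model's squared errors with those of a baseline that always predicts the mean of the expected values: R² = 1 - SS_res / SS_tot. This stdlib-only sketch is a hypothetical re-implementation for illustration and assumes the expected values are not all identical (otherwise SS_tot is zero). As scikit-learn's documentation for r2_score notes, the score can also be negative when a model does worse than that baseline.

```python
def r2_score(expected, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(expected) / len(expected)
    # Residual sum of squares: the model's squared errors.
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    # Total sum of squares: squared errors of always predicting the mean.
    ss_tot = sum((e - mean) ** 2 for e in expected)
    return 1 - ss_res / ss_tot


y_true = [1, 2, 3, 4]
print(r2_score(y_true, [1, 2, 3, 4]))  # 1.0: perfect predictions
print(r2_score(y_true, [2.5] * 4))     # 0.0: no better than predicting the mean
print(r2_score(y_true, [4, 3, 2, 1]))  # -3.0: worse than predicting the mean
```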


Question 56

Which of the following statements a), b) or c) is false?
A) Dimensionality reduction in scikit-learn typically involves two steps: training the estimator with the dataset, then using the estimator to transform the data into the specified number of dimensions.
B) The steps mentioned in Part (a) can be performed separately with the TSNE methods fit and transform, or they can be performed in one statement using the fit_transform method, as in:
In [5]: reduced_data = tsne.fit_transform(digits.data)
C) TSNE's fit_transform method takes some time to train the estimator then perform the reduction. When the method completes its task, it returns an array with the same number of rows as digits.data, but only the number of columns specified by the n_components argument when you created the estimator object. You can confirm this by checking reduced_data's shape.
D) All of the above statements are true.

Accepted Answer
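The fit / transform / fit_transform protocol from statements A and B can be sketched with a much simpler reducer. Assumption: the hypothetical TinyRandomProjection class below uses a Gaussian random projection, not t-SNE; it only illustrates that fit_transform amounts to fit followed by transform and that the output keeps the row count while having n_components columns. As far as we know, scikit-learn's TSNE itself offers fit and fit_transform but no separate transform method for new data.

```python
import random


class TinyRandomProjection:
    """Hypothetical reducer showing the fit / transform / fit_transform protocol."""

    def __init__(self, n_components=2, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X):
        # "Training": draw one Gaussian direction per output dimension,
        # sized from the number of input columns.
        rng = random.Random(self.random_state)
        n_features = len(X[0])
        self.components_ = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                            for _ in range(self.n_components)]
        return self

    def transform(self, X):
        # Project each row onto each direction: same row count, n_components columns.
        return [[sum(w * x for w, x in zip(comp, row)) for comp in self.components_]
                for row in X]

    def fit_transform(self, X):
        # The one-statement form is simply fit followed by transform.
        return self.fit(X).transform(X)


data = [[float(i + j) for j in range(64)] for i in range(10)]  # 10 samples x 64 features
reduced = TinyRandomProjection(n_components=2, random_state=11).fit_transform(data)
print(len(reduced), len(reduced[0]))  # 10 2
```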


Question 57

Which of the following statements a), b) or c) is false?
A) The California Housing dataset (bundled with scikit-learn) has 20,640 samples, each with eight numerical features.
B) The LinearRegression estimator performs multiple linear regression by default using all of a dataset's numeric features.
C) You should expect more meaningful results from simple linear regression than from multiple linear regression on the dataset.
D) All of the above statements are true.

Accepted Answer


Question 58

Which of the following statements a), b) or c) is false?
A) It's helpful to visualize your data by plotting the target value against each feature (in the case of the California Housing Prices dataset, to see how the median home value relates to each feature).
B) DataFrame method sample can randomly select a percentage of a DataFrame's data (specified by the keyword argument frac), as in:
sample_df = california_df.sample(frac=0.1, random_state=17)
C) The keyword argument random_state in Part (b)'s snippet enables you to seed the random number generator. Each time you use the same seed value, method sample selects a similar random subset of the DataFrame's rows.
D) All of the above statements are true.

Accepted Answer
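The effect of a fixed seed can be illustrated with the standard library. This sketch is an analogy only: sample_rows and the 100 stand-in row labels are hypothetical, and pandas' DataFrame.sample uses NumPy's random number generation, so it would pick different rows than this code. The principle is the same: the same seed yields the same subset.

```python
import random


def sample_rows(rows, frac, random_state):
    """Hypothetical helper: pick round(frac * len(rows)) distinct rows with a seeded RNG."""
    rng = random.Random(random_state)
    return rng.sample(rows, round(frac * len(rows)))


rows = list(range(100))  # stand-in for a DataFrame's 100 row labels
first = sample_rows(rows, frac=0.1, random_state=17)
second = sample_rows(rows, frac=0.1, random_state=17)
print(len(first), first == second)  # 10 True
```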


Question 59

Which of the following statements a), b) or c) is false?
A) We can use a TSNE estimator (from the sklearn.manifold module) to perform dimensionality reduction. This estimator analyzes a dataset's features and reduces them to the specified number of dimensions.
B) The following code creates a TSNE object for reducing a dataset's features to two dimensions, as specified by the keyword argument n_components:
In [3]: from sklearn.manifold import TSNE
In [4]: tsne = TSNE(n_components=2, random_state=11)
C) When using TSNE on the Digits dataset bundled with scikit-learn, the TSNE estimator's random_state keyword argument in Part (b) ensures the reproducibility of the "render sequence" when we display the digit clusters, for example.
D) All of the above statements are true.

Accepted Answer


Question 60

Which of the following statements a), b) or c) is false?
A) The Iris dataset bundled with scikit-learn is commonly analyzed with both classification and clustering.
B) Although the Iris dataset is labeled, we can ignore those labels to demonstrate clustering. Then, we can use the labels to determine how well the k-means algorithm clusters the samples.
C) The Iris dataset is referred to as a "toy dataset" because it has only 150 samples and four features. The dataset describes 50 samples for each of three Iris flower species: Iris setosa, Iris versicolor and Iris virginica.
D) All of the above statements are true.

Accepted Answer


Question 61

Which of the following statements a), b) or c) is false?
A) Because the Iris dataset is labeled, we can look at its target array values to get a sense of how well the k-means algorithm clustered the samples for the three Iris species.
B) In the Iris dataset, the first 50 samples are Iris setosa, the next 50 are Iris versicolor, and the last 50 are Iris virginica.
C) If the KMeans estimator chose the Iris dataset clusters perfectly, then each group of 50 elements in the estimator's labels_ array should have mostly the same label.
D) All of the above statements are true.

Accepted Answer
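One way to check cluster quality against the ordered Iris targets is to count the most common label in each block of 50. The labels list below is made up for illustration (it is not real KMeans output), and majority_per_group is a hypothetical helper. Because k-means numbers its clusters arbitrarily, a perfect result would give each block a single, distinct label, not necessarily 0, 1 and 2 in that order.

```python
from collections import Counter

# Made-up labels_ values for 150 samples ordered like the Iris targets.
labels = [1] * 50 + [0] * 48 + [2] * 2 + [2] * 36 + [0] * 14


def majority_per_group(labels, group_size=50):
    """Return (most common label, its count) for each consecutive group."""
    groups = [labels[i:i + group_size] for i in range(0, len(labels), group_size)]
    return [Counter(group).most_common(1)[0] for group in groups]


print(majority_per_group(labels))  # [(1, 50), (0, 48), (2, 36)]
```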


Question 62

Which of the following statements a), b) or c) is false?
A) We train the KMeans estimator by calling the object's fit method; this performs the k-means algorithm.
B) As with the other estimators, the fit method returns the estimator object.
C) When the training completes, the KMeans object contains a labels_ array with values from 0 to n_clusters - 1 (in the Iris dataset example, 0-2), indicating the clusters to which the samples belong, and a cluster_centers_ array in which each row represents a cluster.
D) All of the above statements are true.

Accepted Answer


Question 64

Which of the following statements a), b) or c) is false?
A) One way to learn more about your data is to see how the features relate to one another.
B) The samples in the Iris dataset each have four features.
C) We cannot graph one feature against the other three in a single graph. But we can plot pairs of features against one another in a pairplot.
D) All of the above statements are true.

Accepted Answer
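A pairplot sidesteps the four-dimensional problem by plotting features two at a time. This quick count assumes the feature names that scikit-learn's load_iris reports; n features give n(n - 1)/2 distinct pairs, while Seaborn's pairplot draws a full n x n grid by default (each pair appears twice, mirrored, with single-feature plots on the diagonal).

```python
from itertools import combinations

# Feature names as scikit-learn's load_iris reports them (assumed here).
features = ['sepal length (cm)', 'sepal width (cm)',
            'petal length (cm)', 'petal width (cm)']
pairs = list(combinations(features, 2))
for a, b in pairs:
    print(f'{a} vs. {b}')
print(len(pairs), 'distinct pairs;', len(features) ** 2, 'panels in a full n x n grid')
```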


Deck 15: Machine Learning: Classification, Regression and Clustering