A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.

Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?

A.

fmin

B.

SparkTrials

C.

quniform

D.

search_space

E.

objective_function

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference, and the predictions and actual label values are in the Spark DataFrame preds_df.

They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

A.

They should exponentiate the computed RMSE value

B.

They should take the log of the predictions before computing the RMSE

C.

They should evaluate the MSE of the log predictions to compute the RMSE

D.

They should exponentiate the predictions before computing the RMSE
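
To make the unit issue in this question concrete, here is a minimal pure-Python sketch (not Spark code). The numbers are hypothetical stand-ins for the two columns of preds_df, and the helper rmse is an illustrative name, not part of any library. It compares an RMSE computed on the log(price) scale with one computed after converting both columns back to the price scale.

```python
import math

def rmse(predictions, labels):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values on the log(price) scale, standing in for preds_df.
log_labels = [math.log(100.0), math.log(200.0), math.log(400.0)]
log_predictions = [math.log(110.0), math.log(180.0), math.log(400.0)]

# RMSE computed directly on the log scale: not expressed in price units.
rmse_log = rmse(log_predictions, log_labels)

# Convert both columns back to the price scale, then compute the RMSE.
price_labels = [math.exp(v) for v in log_labels]
price_predictions = [math.exp(v) for v in log_predictions]
rmse_price = rmse(price_predictions, price_labels)

print(f"RMSE on log scale:      {rmse_log:.4f}")
print(f"exp(RMSE on log scale): {math.exp(rmse_log):.4f}")
print(f"RMSE on price scale:    {rmse_price:.4f}")
```

In this sketch, exponentiating the log-scale RMSE afterwards does not reproduce the price-scale RMSE, because the square root and mean are taken over different quantities.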

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space.

As a result, they have the following code block:

Which of the following changes do they need to make to the above code block in order to accomplish the task?

A.

Change SparkTrials() to Trials()

B.

Reduce num_evals to be less than 10

C.

Change fmin() to fmax()

D.

Remove the trials=trials argument

E.

Remove the algo=tpe.suggest argument

A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference.

In which situation will the machine learning engineer be correct?

A.

When the new solution requires if-else logic determining which model to use to compute each prediction

B.

When the new solution's models have an average latency that is larger than the size of the original model

C.

When the new solution requires the use of fewer feature variables than the original model

D.

When the new solution requires that each model computes a prediction for every record

E.

When the new solution's models have an average size that is larger than the size of the original model

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

A.

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

B.

pandas API on Spark DataFrames are more performant than Spark DataFrames

C.

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

D.

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

E.

pandas API on Spark DataFrames are unrelated to Spark DataFrames

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.

Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

A.

Logistic regression

B.

Spark ML cannot distribute linear regression training

C.

Iterative optimization

D.

Least-squares method

E.

Singular value decomposition
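
As an illustration of what an iterative approach to linear regression looks like, here is a minimal pure-Python sketch using batch gradient descent. This is an assumption chosen for simplicity: it is not Spark ML's implementation, and the function name fit_linear_gd, the learning rate, and the data are all hypothetical. The point it shows is that each gradient is a sum of per-record terms, which is the kind of computation that can be split across partitions and combined.

```python
def fit_linear_gd(xs, ys, learning_rate=0.05, num_iterations=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(num_iterations):
        # Each record contributes an independent term to these sums, so the
        # sums could be computed per partition and then added together.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_gd(xs, ys)
print(f"slope={w:.6f}, intercept={b:.6f}")
```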

A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column’s median value.

They have developed the following code block to accomplish this task:

The code block is not accomplishing the task.

Which reason describes why the code block is not accomplishing the imputation task?

A.

It does not impute both the training and test data sets.

B.

The inputCols and outputCols need to be exactly the same.

C.

The fit method needs to be called instead of transform.

D.

It does not fit the imputer on the data to create an ImputerModel.
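
The code block for this question is not shown, so the following is only a sketch of the general estimator/model pattern the options refer to, written in plain Python rather than PySpark. The classes MedianImputer and MedianImputerModel are hypothetical and exist only for this illustration; they mimic the split between an object that learns statistics with fit and a fitted object that applies them with transform. The sketch uses an exact median, which may differ from how a distributed library computes it.

```python
from statistics import median

class MedianImputerModel:
    """Holds learned medians and applies them (the fitted, transformer role)."""

    def __init__(self, medians):
        self.medians = medians

    def transform(self, rows):
        # Replace None with the learned median for each imputed column.
        return [
            {
                col: (self.medians[col] if col in self.medians and val is None else val)
                for col, val in row.items()
            }
            for row in rows
        ]

class MedianImputer:
    """Learns per-column medians from data (the estimator role)."""

    def __init__(self, input_cols):
        self.input_cols = input_cols

    def fit(self, rows):
        medians = {}
        for col in self.input_cols:
            observed = [row[col] for row in rows if row[col] is not None]
            medians[col] = median(observed)
        return MedianImputerModel(medians)

# Hypothetical rows standing in for features_df; None marks a missing value.
features = [
    {"age": 30.0, "income": 50.0},
    {"age": None, "income": 70.0},
    {"age": 50.0, "income": None},
    {"age": 40.0, "income": 90.0},
]

imputer = MedianImputer(input_cols=["age", "income"])
imputer_model = imputer.fit(features)          # learn the medians first
imputed = imputer_model.transform(features)    # then apply them
print(imputed)
```

The unfitted MedianImputer has no transform method in this sketch, which mirrors the idea that the medians must be learned from data before they can be applied.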

A data scientist is using the following code block to tune hyperparameters for a machine learning model:

Which change can they make to the above code block to improve the likelihood of a more accurate model?

A.

Increase num_evals to 100

B.

Change fmin() to fmax()

C.

Change SparkTrials() to Trials()

D.

Change tpe.suggest to random.suggest

Which of the following machine learning algorithms typically uses bagging?

A.

Gradient boosted trees

B.

K-means

C.

Random forest

D.

Decision tree
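
For readers unfamiliar with the term, bagging (bootstrap aggregating) trains several models on bootstrap resamples of the data and combines their predictions by voting or averaging. The sketch below is a minimal pure-Python illustration under simplifying assumptions: the base learner is a one-threshold "stump" on a single feature, and the names fit_stump, fit_bagged_stumps, and bagged_predict, along with the data, are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold t with the fewest errors for: predict 1 if x >= t."""
    xs = [x for x, _ in sample]
    # Candidate thresholds: every observed value, plus one above the maximum
    # so the stump can also predict 0 for everything in the sample.
    candidates = sorted(set(xs)) + [max(xs) + 1.0]

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in sample)

    return min(candidates, key=errors)

def fit_bagged_stumps(data, num_models, rng):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(num_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_stump(bootstrap))
    return models

def bagged_predict(models, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(1 if x >= t else 0 for t in models)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D data: label 0 for x <= 5, label 1 for x > 5.
data = [(float(x), 0 if x <= 5 else 1) for x in range(1, 11)]

rng = random.Random(0)  # fixed seed so the run is repeatable
models = fit_bagged_stumps(data, num_models=25, rng=rng)
print("thresholds:", sorted(models))
print("prediction for x=1: ", bagged_predict(models, 1.0))
print("prediction for x=10:", bagged_predict(models, 10.0))
```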

A data scientist is using Spark SQL to import their data into a machine learning pipeline. Once the data is imported, the data scientist performs machine learning tasks using Spark ML.

Which of the following compute tools is best suited for this use case?

A.

Single Node cluster

B.

Standard cluster

C.

SQL Warehouse

D.

None of these compute tools support this task