Databricks Databricks-Machine-Learning-Associate Free Certification Exam Questions Answer Feb 2026 update

Question # 11

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.

Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?

fmin

SparkTrials

quniform

search_space

objective_function

Question # 12

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df.

They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

They should exponentiate the computed RMSE value

They should take the log of the predictions before computing the RMSE

They should evaluate the MSE of the log predictions to compute the RMSE

They should exponentiate the predictions before computing the RMSE
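
One way to see how the options differ is to compute the RMSE on both scales by hand. The sketch below uses plain Python lists rather than a Spark DataFrame, and the prices are hypothetical values chosen for illustration, not data from the question. It compares the RMSE computed on log(price) with the RMSE computed after both predictions and labels are mapped back to the price scale.

```python
import math

def rmse(predictions, labels):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices (assumption for illustration only).
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 420.0]

# The model in the question works on the log scale.
log_labels = [math.log(p) for p in actual_prices]
log_predictions = [math.log(p) for p in predicted_prices]

# RMSE on the log scale: its units are log(price), not price.
rmse_log_scale = rmse(log_predictions, log_labels)

# RMSE on the price scale: exponentiate both columns first, then evaluate.
rmse_price_scale = rmse(
    [math.exp(p) for p in log_predictions],
    [math.exp(y) for y in log_labels],
)

print(rmse_log_scale, math.exp(rmse_log_scale), rmse_price_scale)
```

Because the square root of a mean of squares is not preserved by exp, applying exp to the log-scale RMSE after the fact gives a different number from the price-scale RMSE in this example.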

Question # 13

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space.

As a result, they have the following code block:

Which of the following changes do they need to make to the above code block in order to accomplish the task?

Change SparkTrials() to Trials()

Reduce num_evals to be less than 10

Change fmin() to fmax()

Remove the trials=trials argument

Remove the algo=tpe.suggest argument

Question # 14

A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference.

In which situation will the machine learning engineer be correct?

When the new solution requires if-else logic determining which model to use to compute each prediction

When the new solution's models have an average latency that is larger than the size of the original model

When the new solution requires the use of fewer feature variables than the original model

When the new solution requires that each model computes a prediction for every record

When the new solution's models have an average size that is larger than the size of the original model

Question # 15

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

pandas API on Spark DataFrames are more performant than Spark DataFrames

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

pandas API on Spark DataFrames are unrelated to Spark DataFrames

Question # 16

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.

Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

Logistic regression

Spark ML cannot distribute linear regression training

Iterative optimization

Least-squares method

Singular value decomposition
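
For readers unfamiliar with the term, iterative optimization for linear regression can be sketched as a gradient-descent loop. This is a minimal single-machine illustration in plain Python, not Spark ML's solver; the function name fit_line, the learning rate, the iteration count, and the data are all assumptions made for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Residuals of the current model on every row.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Each gradient is a sum over rows, so in a distributed setting the per-row terms can be computed on separate partitions and then added together, which is why this family of methods lends itself to large datasets.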

Question # 17

A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column’s median value.

They have developed the following code block to accomplish this task:

The code block is not accomplishing the task.

Which reason describes why the code block is not accomplishing the imputation task?

It does not impute both the training and test data sets.

The inputCols and outputCols need to be exactly the same.

The fit method needs to be called instead of transform.

It does not fit the imputer on the data to create an ImputerModel.

Question # 18

A data scientist is using the following code block to tune hyperparameters for a machine learning model:

Which change can they make to the above code block to improve the likelihood of a more accurate model?

Increase num_evals to 100

Change fmin() to fmax()

Change SparkTrials() to Trials()

Change tpe.suggest to random.suggest

Question # 19

Which of the following machine learning algorithms typically uses bagging?

Gradient boosted trees

K-means

Random forest

Decision tree
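
Bagging (bootstrap aggregating) trains each model on a resample of the data drawn with replacement and then combines the predictions. The sketch below shows only that mechanism in plain Python: the base learner is a deliberately trivial mean predictor, and the helper names, labels, seed, and ensemble size are hypothetical choices for the example. A random forest applies the same idea with decision trees as base learners and also samples a random subset of features at each split.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_mean_model(sample):
    """Trivial base learner: always predicts the mean of its training sample."""
    mean = sum(sample) / len(sample)
    return lambda: mean

def bagged_predict(models):
    """Aggregate by averaging the predictions of all base models."""
    predictions = [model() for model in models]
    return sum(predictions) / len(predictions)

rng = random.Random(0)  # fixed seed so the resamples are repeatable
labels = [3.0, 5.0, 4.0, 6.0, 2.0]  # hypothetical training labels

# Each model sees a different bootstrap resample of the same data.
models = [train_mean_model(bootstrap_sample(labels, rng)) for _ in range(25)]
ensemble_prediction = bagged_predict(models)
print(ensemble_prediction)
```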

Question # 20

A data scientist is using Spark SQL to import their data into a machine learning pipeline. Once the data is imported, the data scientist performs machine learning tasks using Spark ML.

Which of the following compute tools is best suited for this use case?

Single Node cluster

Standard cluster

SQL Warehouse

None of these compute tools support this task
