Your dependent variable Y is a count, ranging from 0 to infinity. Because Y is approximately log-normally distributed, you decide to log-transform the data prior to performing a linear regression.

What should you do before log-transforming Y?

A.

Add 1 to all of the Y values.

B.

Divide all the Y values by the standard deviation of Y.

C.

Explore the data for outliers.

D.

Subtract the mean of Y from all the Y values.
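
One practical detail with count data is that zeros are possible, and the logarithm of 0 is undefined. The following is a minimal Python sketch, assuming a hypothetical helper `log_transform_counts` and made-up counts, of how a constant offset keeps every transformed value finite:

```python
import math

def log_transform_counts(counts, offset=1):
    """Return log(y + offset) for each count y (hypothetical helper)."""
    return [math.log(y + offset) for y in counts]

counts = [0, 1, 9, 99]  # made-up example data that includes a zero

# Taking the log of a raw zero fails in Python's math module.
try:
    math.log(counts[0])
    raw_log_failed = False
except ValueError:
    raw_log_failed = True

# With an offset of 1, a count of 0 maps to log(1) = 0.
shifted = log_transform_counts(counts)
```

The offset of 1 is a common convention rather than the only choice; any positive constant avoids the undefined value.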

An AI practitioner incorporates risk considerations into a deployment plan and decides to log and store historical predictions for potential future access requests.

Which ethical principle is this an example of?

A.

Fairness

B.

Privacy

C.

Safety

D.

Transparency

You have a dataset with many features that you are using to classify a dependent variable. Because the sample size is small, you are worried about overfitting. Which algorithm is ideal to prevent overfitting?

A.

Decision tree

B.

Logistic regression

C.

Random forest

D.

XGBoost

Which two encoders can be used to transform categorical data into numerical features? (Select two.)

A.

Count Encoder

B.

Log Encoder

C.

Mean Encoder

D.

Median Encoder

E.

One-Hot Encoder

Your dependent variable data is a proportion. The observed range of your data is 0.01 to 0.99. The instrument used to generate the dependent variable data is known to generate low-quality data for values close to 0 and close to 1. A colleague suggests performing a logit transformation on the data prior to performing a linear regression. Which of the following is a concern with this approach?

Definition of logit-transformation

If p is the proportion: logit(p) = log(p / (1 - p))

A.

After logit-transformation, the data may violate the assumption of independence.

B.

Noisy data could become more influential in your model.

C.

The model will be more likely to violate the assumption of normality.

D.

Values near 0.5 before logit-transformation will be near 0 after.
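
To get a feel for how the logit definition above treats different parts of the 0 to 1 range, it can help to compare how far the same small change in p moves the transformed value. This is a minimal Python sketch; the helper `logit_shift` and the step size of 0.01 are assumptions chosen for illustration:

```python
import math

def logit(p):
    """logit(p) = log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1 - p))

def logit_shift(p, delta=0.01):
    """Change in logit when p moves by delta (hypothetical helper)."""
    return logit(p + delta) - logit(p)

near_edge = logit_shift(0.01)  # p moves from 0.01 to 0.02
near_mid = logit_shift(0.50)   # p moves from 0.50 to 0.51

# The same step in p produces a much larger step on the logit scale
# near the edge of the range than near the middle.
ratio = near_edge / near_mid
```

The transform stretches the scale near 0 and 1 and compresses it around 0.5, which is worth keeping in mind when measurement quality differs across the range.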

For each of the last 10 years, your team has been collecting data from a group of subjects, including their age and numerous biomarkers collected from blood samples. You are tasked with creating a prediction model of age using the biomarkers as input. You start by performing a linear regression using all of the data over the 10-year period, with age as the dependent variable and the biomarkers as predictors.

Which assumption of linear regression is being violated?

A.

Equality of variance (Homoscedasticity)

B.

Independence

C.

Linearity

D.

Normality

Which of the following regressions helps when near-linear relationships exist among the independent variables (collinearity)?

A.

Clustering

B.

Linear regression

C.

Polynomial regression

D.

Ridge regression
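
The effect of collinearity on coefficient estimates can be illustrated with two nearly identical predictors. This is a minimal pure-Python sketch, not a production implementation: the helper names, the made-up data, and the penalty value of 1.0 are all assumptions, and the model has no intercept to keep the algebra to a 2x2 system.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def fit_two_predictors(x1, x2, y, penalty=0.0):
    """Least squares on two predictors, no intercept (hypothetical helper).

    penalty = 0 gives ordinary least squares; penalty > 0 adds the
    ridge term to the diagonal of X'X.
    """
    s11 = sum(u * u for u in x1) + penalty
    s22 = sum(v * v for v in x2) + penalty
    s12 = sum(u * v for u, v in zip(x1, x2))
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    return solve_2x2(s11, s12, s12, s22, r1, r2)

# Made-up data: x2 is almost a copy of x1, and y is roughly 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 3.01, 3.99]
y = [2.1, 3.9, 6.2, 7.8]

ols = fit_two_predictors(x1, x2, y)                 # large, opposite-signed
ridge = fit_two_predictors(x1, x2, y, penalty=1.0)  # small, similar-sized
```

With these numbers the unpenalized fit splits the effect into large coefficients of opposite sign, while the penalized fit shares it roughly evenly between the two predictors.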

When should the model be retrained in the ML pipeline?

A.

A new monitoring component is added.

B.

Concept drift is detected in the pipeline.

C.

More data become available for the training phase.

D.

Some outliers are detected in live data.

In addition to understanding model performance, what does continuous monitoring of bias and variance help ML engineers to do?

A.

Detect hidden attacks

B.

Prevent hidden attacks

C.

Recover from hidden attacks

D.

Respond to hidden attacks

Which of the following is the definition of accuracy?

A.

(True Positives + False Positives) / Total Predictions

B.

(True Positives + True Negatives) / Total Predictions

C.

True Positives / (True Positives + False Negatives)

D.

True Positives / (True Positives + False Positives)
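
Ratios like the ones in the options above can be computed directly from confusion-matrix counts. This is a minimal Python sketch, assuming hypothetical counts; the function name `confusion_metrics` is illustrative, and the dictionary keys use the conventional names for these ratios:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute common classification ratios from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Share of actual positives that were found.
        "recall": tp / (tp + fn),
        # Share of predicted positives that were correct.
        "precision": tp / (tp + fp),
    }

# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

The sketch assumes every denominator is nonzero; real code would need to handle a class with no positive predictions or no positive labels.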