New Year Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

A.

pyspark.sql.types.DateType

B.

datetime

C.

pyspark.sql.types.TimestampType

D.

Cron syntax

E.

There is no way to represent and submit this information programmatically

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when It is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

O They can reduce the cluster size of the SQL endpoint.

B.

Q They can turn on the Auto Stop feature for the SQL endpoint.

C.

O They can set up the dashboard's SQL endpoint to be serverless.

D.

0 They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The cade block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

A.

trigger("5 seconds")

B.

trigger()

C.

trigger(once="5 seconds")

D.

trigger(processingTime="5 seconds")

E.

trigger(continuous="5 seconds")

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

A.

Parquet files can be partitioned

B.

CREATE TABLE AS SELECT statements cannot be used on files

C.

Parquet files have a well-defined schema

D.

Parquet files have the ability to be optimized

E.

Parquet files will become Delta tables

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance

_____ AS

SELECT id,

firstName,

lastName

FROM customerLocations

WHERE country = ’FRANCE’;

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (Pll).

Which line of code fills in the above blank to successfully complete the task?

A.

COMMENT "Contains PIT

B.

511

C.

"COMMENT PII"

D.

TBLPROPERTIES PII

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.

Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

A.

if day_of_week = 1 and review_period:

B.

if day_of_week = 1 and review_period = "True":

C.

if day_of_week == 1 and review_period == "True":

D.

if day_of_week == 1 and review_period:

E.

if day_of_week = 1 & review_period: = "True":

A data engineer is inspecting an ETL pipeline based on a Pyspark job that consistently encounters performance bottlenecks. Based on developer feedback, the data engineer assumes the job is low on compute resources. To pinpoint the issue, the data engineer observes the Spark Ul and finds out the job has a high CPU time vs Task time.

Which course of action should the data engineer take?

A.

High CPU time vs Task time means an under-utilized cluster. The data engineer may need to repartition data to spread the jobs more evenly throughout the cluster.

B.

High CPU time vs Task time means efficient use of cluster and no change needed

C.

High CPU time vs Task time means over-utilized memory and the need to increase parallelism

D.

High CPU time vs Task time means a CPU over-utilized job. The data engineer may need to consider executor and core tuning or resizing the cluster

Which TWO items are characteristics of the Gold Layer?

Choose 2 answers

A.

Read-optimized

B.

Normalised

C.

Raw Data

D.

Historical lineage

E.

De-normalised

Which of the following tools is used by Auto Loader process data incrementally?

A.

Checkpointing

B.

Spark Structured Streaming

C.

Data Explorer

D.

Unity Catalog

E.

Databricks SQL