
A Python file is ready to go into production, and the client wants to use the cheapest but most efficient type of cluster possible. The workload is quite small, processing only about 10 GB of data with simple joins and no complex aggregations or wide transformations.

Which cluster meets the requirement?

A.

Job cluster with Photon enabled

B.

Interactive cluster

C.

Job cluster with spot instances disabled

D.

Job cluster with spot instances enabled
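
For context, spot instances are enabled per job cluster in the cluster spec. A minimal sketch of the relevant part of a Jobs API new_cluster definition on AWS; the runtime version, node type, and sizing values are illustrative assumptions:

new_cluster = {
    "spark_version": "13.3.x-scala2.12",   # illustrative runtime version
    "node_type_id": "i3.xlarge",           # illustrative node type
    "num_workers": 2,
    "aws_attributes": {
        "first_on_demand": 1,                  # keep the driver on-demand for reliability
        "availability": "SPOT_WITH_FALLBACK",  # cheaper spot workers, fall back to on-demand if reclaimed
    },
}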

A data engineer needs to optimize the data layout and query performance of an e-commerce transactions Delta table. The table is partitioned by "purchase_date", a date column, which helps with time-based queries but does not optimize searches on "customer_id", a high-cardinality column.

The table is usually queried with filters on "customer_id" within specific date ranges, but because this data is spread across multiple files in each partition, such queries result in full partition scans, increasing runtime and cost.

How should the data engineer optimize the data layout for efficient reads?

A.

Alter the table, implementing liquid clustering on "customer_id" while keeping the existing partitioning.

B.

Alter the table to partition by "customer_id".

C.

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.

Alter the table implementing liquid clustering by "customer_id" and "purchase_date".
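
For context, liquid clustering cannot be layered on top of Hive-style partitioning, so a partitioned table is typically rewritten with a CLUSTER BY clause. A minimal sketch, with column names taken from the question and table names assumed:

# Rewrite the partitioned table as a liquid-clustered one (table names are illustrative).
spark.sql("""
    CREATE TABLE ecommerce_transactions_clustered
    CLUSTER BY (customer_id, purchase_date)
    AS SELECT * FROM ecommerce_transactions
""")

# OPTIMIZE reclusters data incrementally as new files arrive.
spark.sql("OPTIMIZE ecommerce_transactions_clustered")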

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation cause the job to fail.
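
For context, the SQL CONSTRAINT above has a direct equivalent in the DLT Python API; a minimal sketch, with dataset and source names as illustrative assumptions:

import dlt

@dlt.table
@dlt.expect_or_drop("valid_timestamp", "timestamp > '2020-01-01'")  # violating rows are dropped and counted in the event log
def validated_events():
    return spark.readStream.table("raw_events")  # illustrative source table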

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

A.

Databricks Repos automatically saves development progress

B.

Databricks Repos supports the use of multiple branches

C.

Databricks Repos allows users to revert to previous versions of a notebook

D.

Databricks Repos provides the ability to comment on specific changes

E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

A.

There was a type mismatch between the specific schema and the inferred schema

B.

JSON data is a text-based format

C.

Auto Loader only works with string data

D.

All of the fields had at least one null value

E.

Auto Loader cannot infer the schema of ingested data
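
For context, Auto Loader's JSON schema inference defaults every column to string unless cloudFiles.inferColumnTypes is enabled. A minimal sketch, with placeholder paths:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")  # placeholder path
      .option("cloudFiles.inferColumnTypes", "true")               # infer float/boolean types instead of defaulting to string
      .load("/mnt/landing/events"))                                # placeholder path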

A data engineer is working on a personal laptop and needs to perform complex transformations on data stored in a Delta Lake on cloud storage. The engineer decides to use Databricks Connect to interact with Databricks clusters and work in their local IDE.

How does Databricks Connect enable the engineer to develop, test, and debug code seamlessly on their local machine while interacting with Databricks clusters?

A.

By allowing direct execution of Spark jobs from the local machine without needing a network connection

B.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using a specific IDE that is required by Databricks

C.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using their preferred IDE

D.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code only through Databricks' own web interface
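
For context, with current Databricks Connect (Databricks Runtime 13+) a script in a local IDE builds a session against a remote cluster roughly like this; connection details are assumed to come from the local Databricks configuration, and the table name is illustrative:

from databricks.connect import DatabricksSession

# The session behaves like a normal SparkSession, but Spark jobs
# execute on the remote Databricks cluster, not on the laptop.
spark = DatabricksSession.builder.getOrCreate()

df = spark.read.table("samples.nyctaxi.trips")  # illustrative table
print(df.count())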

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error. The engineer thought that it was possible to use a Python variable in a SELECT statement.

Why does the command fail?

A.

Databricks supports multiple languages but only one per notebook.

B.

Databricks supports language interoperability in the same cell, but only between Scala and SQL.

C.

Databricks supports language interoperability but only if a special character is used.

D.

Databricks supports one language per cell.
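
For context, a notebook cell executes in a single language; from a Python cell, a Python variable reaches SQL through spark.sql rather than a bare SELECT statement. A minimal sketch, with an illustrative table name:

min_date = "2020-01-01"

# A %sql cell cannot see Python variables; spark.sql() can, via string formatting.
df = spark.sql(f"SELECT * FROM sales WHERE sale_date > '{min_date}'")
df.show()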

Which two components function in the Databricks platform architecture's control plane? (Choose two.)

A.

Virtual Machines

B.

Compute Orchestration

C.

Serverless Compute

D.

Compute

E.

Unity Catalog

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
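
For context, the two modes in play map to fields in the pipeline's JSON configuration; a minimal sketch of just those fields, with other required settings omitted:

pipeline_settings = {
    "continuous": True,    # Continuous Pipeline Mode: datasets keep updating until the pipeline is stopped
    "development": False,  # Production mode: compute is provisioned for the update and released when it stops
}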

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

A.

Database

B.

Function

C.

View

D.

Temporary view

E.

Table
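
For context, of these entities only a table is both persisted to a physical storage location and available to other users across sessions; views and temporary views store only the query definition. A minimal sketch, with illustrative table names:

# CTAS persists the joined result to storage for use in other sessions.
spark.sql("""
    CREATE TABLE combined_orders AS
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")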