A departing platform owner currently holds ownership of multiple catalogs and controls storage credentials and external locations. A data engineer has been asked to ensure continuity: transfer catalog ownership to the platform team group, delegate ongoing privilege management, and retain the ability to receive and share data via Delta Sharing.

Which role must be in place to perform these actions across the metastore?

A.

Metastore Admin, because metastore admins can transfer ownership and manage privileges across all metastore objects, including shares and recipients.

B.

Account Admin, because account admins can only create metastores but cannot change ownership of catalogs.

C.

Workspace Admin, because workspace admins can transfer ownership of any Unity Catalog object.

D.

Catalog Owner, because catalog owners can transfer ownership of any object in any catalog in the metastore.
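
For context (not the answer key): a metastore admin can run ownership transfers and manage Delta Sharing objects across the metastore. A minimal sketch, assuming hypothetical catalog, group, share, and recipient names:

```python
# Sketch only; "finance_catalog", "platform-team", "sales_share", and
# "partner_recipient" are hypothetical placeholders.

# Transfer catalog ownership to the platform team group.
spark.sql("ALTER CATALOG finance_catalog OWNER TO `platform-team`")

# Create and grant a Delta Sharing share to keep data sharing running.
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_recipient")
```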

Which distribution format does Databricks support for installing custom Python code packages?

A.

sbt

B.

CRAN

C.

CRAM

D.

npm

E.

Wheels

F.

jars
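
For reference, custom Python packages are distributed to Databricks as wheel (.whl) files. A minimal sketch of a notebook-scoped install, with a hypothetical file path:

```python
# %pip is the Databricks notebook magic for notebook-scoped library
# installs; the wheel path below is a hypothetical placeholder.
%pip install /Volumes/main/default/libs/my_package-0.1.0-py3-none-any.whl
```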

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

Which command should be removed from the notebook before scheduling it as a job?

A.

Cmd 2

B.

Cmd 3

C.

Cmd 4

D.

Cmd 5

E.

Cmd 6

An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

A.

Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.

B.

Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.

C.

Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.

D.

Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.

E.

Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
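
The original ingestion code is not reproduced in this dump. A plausible reconstruction, assuming the shape the question implies (a batch read of the previous day's directory, an in-batch dropDuplicates, and an append), is sketched below; the path and date value are hypothetical:

```python
# Hypothetical reconstruction: dropDuplicates() removes duplicates *within*
# the incoming batch, but mode("append") never compares against rows
# already present in the target table.
date = "2024-01-15"  # placeholder for the previous day's date variable

(spark.read
    .format("parquet")
    .load(f"/mnt/raw_orders/{date}")
    .dropDuplicates(["customer_id", "order_id"])
    .write
    .format("delta")
    .mode("append")
    .saveAsTable("orders"))
```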

The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as "unmanaged") Delta Lake tables.

Which approach will ensure that this requirement is met?

A.

When a database is being created, make sure that the LOCATION keyword is used.

B.

When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.

C.

When data is saved to a table, make sure that a full file path is specified alongside the Delta format.

D.

When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.

E.

When the workspace is being configured, make sure that external cloud object storage has been mounted.
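
For context: on Databricks, a table becomes external when its storage path is supplied explicitly rather than managed by the metastore. A minimal sketch, with a hypothetical table name and cloud path:

```python
# Supplying LOCATION makes this a path-based (external) Delta table;
# the table name and storage path are hypothetical placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_external
    USING DELTA
    LOCATION 'abfss://container@account.dfs.core.windows.net/tables/sales'
""")
```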

A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using display() calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively.

Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?

A.

Scala is the only language that can be accurately tested using interactive notebooks; because the best performance is achieved by using Scala code compiled to JARs, all PySpark and Spark SQL logic should be refactored.

B.

The only way to meaningfully troubleshoot code execution times in development notebooks is to use production-sized data and production-sized clusters with Run All execution.

C.

Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.

D.

Calling display() forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.

E.

The Jobs UI should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.
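
The lazy-evaluation behavior described in option D can be seen in a short sketch; the table and filter below are hypothetical:

```python
df = spark.read.table("events")        # no Spark job triggered yet
filtered = df.filter("status = 'ok'")  # still only extends the query plan

display(filtered)  # an action: a Spark job actually runs here
display(filtered)  # repeats may hit cached data, so interactive re-runs
                   # understate how the code will perform in production
```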

A data engineer is configuring Delta Sharing for a Databricks-to-Databricks scenario to optimize read performance. The recipient needs to perform time travel queries and streaming reads on shared sales data.

Which configuration will provide the optimal performance while enabling these capabilities?

A.

Share tables WITH HISTORY, ensure tables don't have partitioning enabled, and enable CDF before sharing.

B.

Share tables WITHOUT HISTORY and enable partitioning for better query performance.

C.

Share the entire schema WITHOUT HISTORY and rely on recipient-side caching for performance.

D.

Use the open sharing protocol instead of Databricks-to-Databricks sharing for better performance.
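
For reference, sharing a table with its history is what enables time travel and streaming reads on the recipient side. A minimal sketch, assuming hypothetical share and table names:

```python
# WITH HISTORY shares the table's transaction history so the recipient
# can run time travel queries and streaming reads; names are placeholders.
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.transactions WITH HISTORY")
```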

The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes.

A junior engineer has written the following code to add CHECK constraints to the Delta Lake table:

A senior engineer has confirmed the above logic is correct and the valid ranges for latitude and longitude are provided, but the code fails when executed.

Which statement explains the cause of this failure?

A.

Because another team uses this table to support a frequently running application, two-phase locking is preventing the operation from committing.

B.

The activity_details table already exists; CHECK constraints can only be added during initial table creation.

C.

The activity_details table already contains records that violate the constraints; all existing data must pass CHECK constraints in order to add them to an existing table.

D.

The activity_details table already contains records; CHECK constraints can only be added prior to inserting values into a table.

E.

The current table schema does not contain the field valid_coordinates; schema evolution will need to be enabled before altering the table to add a constraint.
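
The junior engineer's code is not reproduced in this dump, but a command of the shape the question describes is sketched below; the constraint name is hypothetical, the bounds are the standard valid ranges, and the statement fails if any existing row violates the predicate:

```python
# Hypothetical reconstruction: ADD CONSTRAINT validates all existing rows,
# so it fails if the table already contains out-of-range coordinates.
spark.sql("""
    ALTER TABLE activity_details
    ADD CONSTRAINT valid_coordinates
    CHECK (latitude  >= -90  AND latitude  <= 90
       AND longitude >= -180 AND longitude <= 180)
""")
```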

The data engineering team maintains the following code:

Assuming that this code produces logically correct results and the data in the source table has been de-duplicated and validated, which statement describes what will occur when this code is executed?

A.

The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.

B.

A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.

C.

The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.

D.

An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.

E.

An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.
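
The maintained code is not reproduced in this dump. A plausible reconstruction of a batch job of the shape the options describe (a full read of the silver table, an aggregation, and an overwrite of the gold table) is sketched below; the aggregated column name is a hypothetical placeholder:

```python
# Hypothetical reconstruction: a full batch read and a complete overwrite,
# not an incremental or stateful update.
(spark.read.table("silver_customer_sales")
    .groupBy("customer_id")
    .agg({"sales_amount": "sum"})
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("gold_customer_lifetime_sales_summary"))
```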

A data engineer created a daily batch ingestion pipeline using a cluster with the latest DBR version to store banking transaction data, and persisted it in a MANAGED DELTA table called prod.gold.all_banking_transactions_daily. The data engineer is constantly receiving complaints from business users who query this table ad hoc through a SQL Serverless Warehouse about poor query performance. Upon analysis, the data engineer identified that these users frequently use high-cardinality columns as filters. The engineer now seeks to implement a data layout optimization technique that is incremental, easy to maintain, and can evolve over time.

Which command should the data engineer implement?

A.

Alter the table to use Hive-Style Partitions + Z-ORDER and implement a periodic OPTIMIZE command.

B.

Alter the table to use Liquid Clustering and implement a periodic OPTIMIZE command.

C.

Alter the table to use Hive-Style Partitions and implement a periodic OPTIMIZE command.

D.

Alter the table to use Z-ORDER and implement a periodic OPTIMIZE command.
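
For reference, Liquid Clustering is the layout technique that is incremental, low-maintenance, and able to evolve as query patterns change. A minimal sketch, assuming hypothetical clustering columns:

```python
# CLUSTER BY enables Liquid Clustering; the columns below are hypothetical
# examples of the high-cardinality filters the business users apply.
spark.sql("""
    ALTER TABLE prod.gold.all_banking_transactions_daily
    CLUSTER BY (account_id, transaction_id)
""")

# A periodic OPTIMIZE incrementally clusters newly ingested data.
spark.sql("OPTIMIZE prod.gold.all_banking_transactions_daily")
```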