Pre-Summer Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

A.

org.apache.spark.sql.jdbc

B.

autoloader

C.

DELTA

D.

sqlite

E.

org.apache.spark.sql.sqlite

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

A.

Database

B.

Function

C.

View

D.

Temporary view

E.

Table

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

A.

pyspark.sql.types.DateType

B.

datetime

C.

pyspark.sql.types.TimestampType

D.

Cron syntax

E.

There is no way to represent and submit this information programmatically

A data engineer is building a nightly batch ETL pipeline that processes very large volumes of raw JSON logs from a data lake into Delta tables for reporting. The data arrives in bulk once per day, and the pipeline takes several hours to complete. Cost efficiency is important , but performance and reliable completion of the pipeline are the highest priorities.

Which type of Databricks cluster should the data engineer configure?

A.

A job cluster configured to autoscale across multiple workers during the pipeline run

B.

A lightweight single-node cluster with a low worker node count to reduce costs

C.

A high-concurrency cluster designed for interactive SQL workloads

D.

An all-purpose cluster that always runs to ensure low-latency job startup times

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance

_____ AS

SELECT id,

firstName,

lastName

FROM customerLocations

WHERE country = ’FRANCE’;

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (Pll).

Which line of code fills in the above blank to successfully complete the task?

A.

COMMENT " Contains PIT

B.

511

C.

" COMMENT PII "

D.

TBLPROPERTIES PII

A data engineer manages multiple external tables linked to various data sources. The data engineer wants to manage these external tables efficiently and ensure that only the necessary permissions are granted to users for accessing specific external tables.

How should the data engineer manage access to these external tables?

A.

Create a single user role with full access to all external tables and assign it to all users.

B.

Use Unity Catalog to manage access controls and permissions for each external table individually.

C.

Set up Azure Blob Storage permissions at the container level, allowing access to all external tables.

D.

Grant permissions on the Databricks workspace level, which will automatically apply to all external tables.

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by " purchase_date " a date column which helps with time-based queries but does not optimize searches on user statistics " customer_id " , a high-cardinality column.

The table is usually queried with filters on " customer_i

d " within specific date ranges, but since this data is spread across multiple files in each partition, it results in full partition scans and increased runtime and costs.

How should the data engineer optimize the Data Layout for efficient reads?

A.

Alter table implementing liquid clustering on " customerid " while keeping the existing partitioning.

B.

Alter the table to partition by " customer_id " .

C.

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.

Alter the table implementing liquid clustering by " customer_id " and " purchase_date " .

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The cade block used by the data engineer is below:

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

A.

trigger( " 5 seconds " )

B.

trigger()

C.

trigger(once= " 5 seconds " )

D.

trigger(processingTime= " 5 seconds " )

E.

trigger(continuous= " 5 seconds " )

A data engineering team is using Kafka to capture event data and then ingest it into Databricks. The team wants to be able to see these historical events. Medallion architecture is already in place. The team wants to be mindful of costs.

Where should this historical event data be stored?

A.

Gold

B.

Silver

C.

Bronze

D.

Raw layer

A data engineer is writing a script that is meant to ingest new data from cloud storage. In the event of the Schema change, the ingestion should fail. It should fail until the changes downstream source can be found and verified as intended changes.

Which command will meet the requirements?

A.

addNewColumns

B.

failOnNewColumns

C.

rescue

D.

none