Pre-Summer Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

The Delta transaction log for the ‘students’ tables is shown using the ‘DESCRIBE HISTORY students’ command. A Data Engineer needs to query the table as it existed before the UPDATE operation listed in the log.

Which command should the Data Engineer use to achieve this? (Choose two.)

A.

SELECT * FROM students@v4

B.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T 14:32:47.000+00:00’

C.

SELECT * FROM students FROM HISTORY VERSION AS OF 3

D.

SELECT * FROM students VERSION AS OF 5

E.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T 14:32:58.000+00:00’

A data engineer is designing an ETL pipeline to process both streaming and batch data from multiple sources The pipeline must ensure data quality, handle schema evolution, and provide easy maintenance. The team is considering using Delta Live Tables (DLT) in Databricks to achieve these goals. They want to understand the key features and benefits of DLT that make it suitable for this use case.

Why is Delta Live Tables (DLT) an appropriate choice?

A.

Automatic data quality checks, built-in support for schema evolution, and declarative pipeline development

B.

Manual schema enforcement, high operational overhead, and limited scalability

C.

Requires custom code for data quality checks, no support for streaming data, and complex pipeline maintenance

D.

Supports only batch processing, no data versioning, and high infrastructure costs

Identify how the count_if function and the count where x is null can be used

Consider a table random_values with below data.

What would be the output of below query?

select count_if(col > 1) as count_a. count(*) as count_b.count(col1) as count_c from random_values col1

0

1

2

NULL -

2

3

A.

3 6 5

B.

4 6 5

C.

3 6 6

D.

4 6 6

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

A.

They could submit a feature request with Databricks to add this functionality.

B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.

They could only run the entire program on Sundays.

D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.

They could redesign the data model to separate the data used in the final query into a new table.

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

A.

Cloud-specific integrations

B.

Simplified governance

C.

Ability to scale storage

D.

Ability to scale workloads

E.

Avoiding vendor lock-in

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

A.

Replace predict with a stream-friendly prediction function

B.

Replace schema(schema) with option ( " maxFilesPerTrigger " , 1)

C.

Replace " transactions " with the path to the location of the Delta table

D.

Replace format( " delta " ) with format( " stream " )

E.

Replace spark.read with spark.readStream

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?

A.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsINNER JOIN SELECT * FROM april_transactions;

B.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsUNION SELECT * FROM april_transactions;

C.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsOUTER JOIN SELECT * FROM april_transactions;

D.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsINTERSECT SELECT * from april_transactions;

E.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsMERGE SELECT * FROM april_transactions;

What Databricks feature can be used to check the data sources and tables used in a workspace?

A.

Do not use the lineage feature as it only tracks activity from the last 3 months and will not provide full details on dependencies.

B.

Use the lineage feature to visualize a graph that highlights where the table is used only in notebooks,

C.

Use the lineage feature to visualize a graph that highlights where the table is used only in reports.

D.

Use the lineage feature to visualize a graph that shows all dependencies, including where the table is used in notebooks, other tables, and reports.

Which of the following data workloads will utilize a Gold table as its source?

A.

A job that enriches data by parsing its timestamps into a human-readable format

B.

A job that aggregates uncleaned data to create standard summary statistics

C.

A job that cleans data by removing malformatted records

D.

A job that queries aggregated data designed to feed into a dashboard

E.

A job that ingests raw data from a streaming source into the Lakehouse

Which file format is used for storing Delta Lake Table?

A.

Parquet

B.

Delta

C.

SV

D.

JSON