Pre-Summer Sale Special - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

Which code will generate the expected result?

A.

region_sales = sales_df.groupBy( " region " ).agg(sum( " sales_amountM).alias( " total_sales_amount " ))

B.

region_sales = sales_df. sum ( " salen_aiTiount " ) . groupBy ( " region " ) .alias ( " total_sale3_amount " )

C.

region_sales= sales_df.groupBy( " category " ).sum(nsales_amount " ).alias( " t_otal_sales_amounl " )

D.

region sales - sales_df.agg(sum( " sales_amount " ).groupBy( " region " ).alias( " total sales amount " ))

A data engineer streams customer orders into a Kafka topic (orders_topic) and is currently writing the ingestion script of a DLT pipeline. The data engineer needs to ingest the data from Kafka brokers to DLT using Databricks

What is the correct code for ingesting the data?

A)

B)

C)

D)

A.

Option A

B.

Option B

C.

Option C

D.

Option D

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run.

Which of the following tools can the data engineer use to solve this problem?

A.

Unity Catalog

B.

Delta Lake

C.

Databricks SQL

D.

Data Explorer

E.

Auto Loader

A Databricks single-task workflow fails at the last task due to an error in a notebook. The data engineer fixes the mistake in the notebook. What should the data engineer do to rerun the workflow?

A.

Repair the task

B.

Rerun the pipeline

C.

Restart the Cluster

D.

Switch the cluster

A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their project using Databricks Repos.

Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?

A.

Databricks Repos automatically saves development progress

B.

Databricks Repos supports the use of multiple branches

C.

Databricks Repos allows users to revert to previous versions of a notebook

D.

Databricks Repos provides the ability to comment on specific changes

E.

Databricks Repos is wholly housed within the Databricks Lakehouse Platform

A company uses Delta Sharing to collaborate with partners across different cloud providers and geographic regions. What will result in additional costs due to cross-region or egress fees?

A.

Transferring data via Delta Sharing across clouds and across different geographic regions

B.

Sharing data within the same cloud provider and region

C.

Utilizing Delta Sharing for internal data analytics within a single cloud environment

D.

Accessing Delta Sharing data using a VPN within the same data center

A data engineer is managing a data pipeline in Databricks, where multiple Delta tables are used for various transformations. The team wants to track how data flows through the pipeline, including identifying dependencies between Delta tables, notebooks, jobs, and dashboards. The data engineer is utilizing the Unity Catalog lineage feature to monitor this process.

How does Unity Catalog’s data lineage feature support the visualization of relationships between Delta tables, notebooks, jobs, and dashboards?

A.

Unity Catalog lineage visualizes dependencies between Delta tables, notebooks, and jobs, but does not provide column-level tracing or relationships with dashboards.

B.

Unity Catalog lineage only supports visualizing relationships at the table level and does not extend to notebooks, jobs, or dashboards.

C.

Unity Catalog lineage provides an interactive graph that tracks dependencies between tables and notebooks but excludes any job-related dependencies or dashboard visualizations.

D.

Unity Catalog provides an interactive graph that visualizes the dependencies between Delta tables, notebooks, jobs, and dashboards, while also supporting column-level tracking of data transformations.

A data engineer needs to parse only png files in a directory that contains files with different suffixes. Which code should the data engineer use to achieve this task?

A)

B)

C)

D)

A.

Option A

B.

Option B

C.

Option C

D.

Option D

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

A.

There was a type mismatch between the specific schema and the inferred schema

B.

JSON data is a text-based format

C.

Auto Loader only works with string data

D.

All of the fields had at least one null value

E.

Auto Loader cannot infer the schema of ingested data

A data engineer needs to develop integration tests for an ETL process and deploy a version-controlled, packaged workflow into production using an external job scheduler.

Which tool should the data engineer use for this job?

A.

Databricks Command Line Interface

B.

Databricks Asset Bundles

C.

Databricks Connect

D.

Databricks Software Development Kit