A platform team is creating a standardized template for Databricks Asset Bundles to support CI/CD. The template must specify defaults for artifacts, workspace root paths, and a run identity, while allowing a “dev” target to be the default and override specific paths.
How should the team use databricks.yml to satisfy these requirements?
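For orientation, a minimal databricks.yml sketch of such a template, assuming illustrative names for the bundle, artifact, service principal, and paths (all are assumptions, not a definitive layout):

# databricks.yml -- illustrative sketch; names and paths are assumptions
bundle:
  name: standard_pipeline_template

artifacts:
  default_wheel:              # assumed artifact key
    type: whl
    path: ./src

workspace:
  root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}

run_as:
  service_principal_name: "sp-ci-cd"   # assumed run identity

targets:
  dev:
    default: true             # makes "dev" the default target
    workspace:
      # target-level settings override the top-level defaults
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev
  prod:
    workspace:
      root_path: /Workspace/Shared/.bundle/${bundle.name}/prod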
A data engineer is using Lakeflow Declarative Pipelines to build a pipeline that processes healthcare claims data. A metadata JSON file defines data quality rules for multiple tables, including:
{
  "claims": [
    { "name": "valid_patient_id", "constraint": "patient_id IS NOT NULL" },
    { "name": "non_negative_amount", "constraint": "claim_amount >= 0" }
  ]
}
The pipeline must dynamically apply these rules to the claims table without hardcoding the rules.
How should the data engineer achieve this?
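A minimal sketch of one way to apply such metadata-driven expectations, assuming the rules file sits at an illustrative DBFS FUSE path and the source table is named claims_raw (both are assumptions):

import json
import dlt

# Load the metadata file and build a {name: constraint} dict for one table.
# The path below is an illustrative assumption.
def get_rules(table_name):
    with open("/dbfs/mnt/metadata/rules.json") as f:
        all_rules = json.load(f)
    return {rule["name"]: rule["constraint"] for rule in all_rules.get(table_name, [])}

@dlt.table(name="claims_clean")
@dlt.expect_all_or_drop(get_rules("claims"))   # rules applied dynamically, not hardcoded
def claims_clean():
    return spark.read.table("claims_raw")      # assumed source table

Because dlt.expect_all_or_drop accepts a dict of rule names to constraints, new rules added to the JSON file take effect on the next pipeline update without any code change.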
A data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing of both the code and the data produced by executing it, and wants to develop and test against data that is as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that for tasks in a particular stage, the minimum and median task durations are roughly the same, while the maximum task duration is roughly 100 times as long as the minimum.
Which situation is causing increased duration of the overall job?
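This Min/Median/Max pattern is commonly associated with a skewed key, where one task processes far more data than its peers. A sketch of a salting mitigation in PySpark, with assumed table and column names:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("transactions")   # assumed skewed source table

SALT_BUCKETS = 16   # spread the hot key across this many tasks

# Stage 1: pre-aggregate per (key, salt) so no single task owns the hot key.
partial = (df
    .withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))
    .groupBy("customer_id", "salt")
    .agg(F.sum("amount").alias("partial_sum")))

# Stage 2: combine the partial sums per key.
totals = (partial
    .groupBy("customer_id")
    .agg(F.sum("partial_sum").alias("total_amount")))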
Which method can be used to determine the total wall-clock time it took to execute a query?
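For context, wall-clock time can be inspected in the Spark UI or query history; a minimal programmatic sketch, where the table name is an assumption:

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

start = time.perf_counter()
spark.sql("SELECT count(*) FROM some_table").collect()   # illustrative query
print(f"Wall-clock time: {time.perf_counter() - start:.2f} s")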
Which statement describes integration testing?
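By way of illustration, an integration test exercises multiple components together as one flow rather than a single unit in isolation. A minimal pytest sketch, where the pipeline stages are hypothetical stand-ins:

from pyspark.sql import SparkSession

# Hypothetical pipeline stages; a real project would import these
# from the pipeline's own modules.
def ingest(spark):
    return spark.createDataFrame([(1, 100.0), (2, -5.0)], ["claim_id", "amount"])

def clean(df):
    return df.filter("amount >= 0")

def test_ingest_and_clean_together():
    # Integration test: runs ingest and clean end to end and checks the
    # combined result, rather than testing each function separately.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    result = clean(ingest(spark))
    assert result.count() == 1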
A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.
The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours.
Which approach would simplify the identification of these changed records?
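One pattern that fits this scenario is Delta Lake Change Data Feed, assuming the nightly overwrite is replaced with an incremental MERGE so that only genuinely changed rows appear in the feed. A hedged sketch of reading the last 24 hours of changes (the table name comes from the scenario; the rest is an assumption):

from datetime import datetime, timedelta
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Requires the table property to be set once, e.g.:
#   ALTER TABLE customer_churn_params
#   SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

start_ts = (datetime.utcnow() - timedelta(hours=24)).strftime("%Y-%m-%d %H:%M:%S")

changes_24h = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingTimestamp", start_ts)
    .table("customer_churn_params"))

# _change_type distinguishes inserts, updates, and deletes; drop the
# pre-update images so each changed record appears once.
recent = changes_24h.filter("_change_type != 'update_preimage'")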
A data engineer wants to enforce the principle of least privilege when configuring ACLs for Databricks jobs in a collaborative workspace.
Which approach should the data engineer use?
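For illustration, a hedged sketch of granting minimal job permissions through the Databricks Permissions REST API, where the host, token, job ID, group names, and user are all assumptions:

import requests

HOST = "https://<workspace-host>"   # assumption
TOKEN = "<pat-token>"               # assumption
JOB_ID = "123"                      # assumption

# Grant only what each principal needs: viewers can see runs, on-call
# engineers can trigger runs, and a single owner can manage the job.
acl = {
    "access_control_list": [
        {"group_name": "analysts", "permission_level": "CAN_VIEW"},
        {"group_name": "oncall", "permission_level": "CAN_MANAGE_RUN"},
        {"user_name": "owner@example.com", "permission_level": "IS_OWNER"},
    ]
}

resp = requests.patch(
    f"{HOST}/api/2.0/permissions/jobs/{JOB_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=acl,
)
resp.raise_for_status()

PATCH adds or updates only the listed entries, which suits incremental, least-privilege grants better than PUT, which replaces the whole list.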
Which of the following technologies can be used to identify key areas of text when parsing Spark Driver log4j output?
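For context, a minimal sketch of regular-expression parsing of log4j-style driver output; the line format shown is a common default, not guaranteed for every configuration:

import re

# Typical log4j line: "21/07/01 12:00:01 ERROR TaskSetManager: Lost task ..."
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>[\w.$]+): (?P<message>.*)$"
)

def parse_line(line):
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = "21/07/01 12:00:01 ERROR TaskSetManager: Lost task 3.0 in stage 7.0"
print(parse_line(sample))
# {'ts': '21/07/01 12:00:01', 'level': 'ERROR', 'source': 'TaskSetManager',
#  'message': 'Lost task 3.0 in stage 7.0'}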
A data governance team at a large enterprise is improving data discoverability across its organization. The team has hundreds of tables in their Databricks Lakehouse with thousands of columns that lack proper documentation. Many of these tables were created by different teams over several years, with missing context about column meanings and business logic. The data governance team needs to quickly generate comprehensive column descriptions for all existing tables to meet compliance requirements and improve data literacy across the organization. They want to leverage modern capabilities to automatically generate meaningful descriptions rather than manually documenting each column, which would take months to complete.
Which approach should the team use in Databricks to automatically generate column comments and descriptions for existing tables?