Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Free Certification Exam Questions Answer Jul 2025 update

Question # 21

A data scientist has identified that some records in the user profile table contain null values in any of the fields, and such records should be removed from the dataset before processing. The schema includes fields like user_id, username, date_of_birth, created_ts, etc.

The schema of the user profile table looks like this:

Which block of Spark code can be used to achieve this requirement?

Options:

filtered_df = users_raw_df.na.drop(thresh=0)

filtered_df = users_raw_df.na.drop(how='all')

filtered_df = users_raw_df.na.drop(how='any')

filtered_df = users_raw_df.na.drop(how='all', thresh=None)

Question # 22

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Use an RDD action like reduce() to compute the maximum time

Use an accumulator to record the maximum time on the driver

Broadcast a variable to share the maximum time among workers

Configure the Spark UI to automatically collect maximum times

Question # 23

A data analyst builds a Spark application to analyze finance data and performs the following operations:filter,select,groupBy, andcoalesce.

Which operation results in a shuffle?

groupBy

filter

select

coalesce

Question # 24

A data scientist is working with a Spark DataFrame called customerDF that contains customer information.The DataFrame has a column named email with customer email addresses. The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

customerDF.select(

col("email").substr(0, 5).alias("username"),

col("email").substr(-5).alias("domain")

)

customerDF.withColumn("username", split(col("email"), "@").getItem(0)) \

.withColumn("domain", split(col("email"), "@").getItem(1))

customerDF.withColumn("username", substring_index(col("email"), "@", 1)) \

.withColumn("domain", substring_index(col("email"), "@", -1))

customerDF.select(

regexp_replace(col("email"), "@", "").alias("username"),

regexp_replace(col("email"), "@", "").alias("domain")

)

Question # 25

A Spark DataFramedfis cached using theMEMORY_AND_DISKstorage level, but the DataFrame is too large to fit entirely in memory.

What is the likely behavior when Spark runs out of memory to store the DataFrame?

Spark duplicates the DataFrame in both memory and disk. If it doesn't fit in memory, the DataFrame is stored and retrieved from the disk entirely.

Spark splits the DataFrame evenly between memory and disk, ensuring balanced storage utilization.

Spark will store as much data as possible in memory and spill the rest to disk when memory is full, continuing processing with performance overhead.

Spark stores the frequently accessed rows in memory and less frequently accessed rows on disk, utilizing both resources to offer balanced performance.

Weekend Sale - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sntaclus

Free Practice Questions for Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation:

The Answer Is:

Explanation: