Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Free Certification Exam Questions and Answers (June 2025 Update)

Question # 31

Which of the following DataFrame methods is classified as a transformation?

DataFrame.count()

DataFrame.show()

DataFrame.select()

DataFrame.foreach()

DataFrame.first()
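The question hinges on lazy evaluation: a transformation such as select() only describes a new DataFrame, while actions such as count(), show(), first() and foreach() trigger actual computation. The plain-Python toy below illustrates that split under simplifying assumptions. LazyFrame and its methods are hypothetical names, not part of Spark's API, and real Spark builds and optimizes a query plan rather than a list of Python functions.

```python
class LazyFrame:
    """Hypothetical toy DataFrame: transformations are recorded, actions execute them."""

    def __init__(self, rows, plan=()):
        self._rows = rows      # source data as a list of dicts
        self._plan = plan      # tuple of pending steps (functions on row lists)
        self.executions = 0    # how many times an action has run this frame's plan

    def select(self, *cols):
        """Transformation: return a new LazyFrame with one more pending step."""
        def project(rows):
            return [{c: row[c] for c in cols} for row in rows]
        return LazyFrame(self._rows, self._plan + (project,))

    def _run(self):
        """Execute every recorded step; only actions call this."""
        self.executions += 1
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows

    def count(self):
        """Action: run the plan and return the number of rows."""
        return len(self._run())

    def first(self):
        """Action: run the plan and return the first row (or None if empty)."""
        rows = self._run()
        return rows[0] if rows else None


items = LazyFrame([{"itemId": 1, "supplier": "A"}, {"itemId": 2, "supplier": "B"}])
projected = items.select("itemId")   # transformation: nothing is computed yet
print(projected.executions)          # 0
print(projected.count())             # 2
print(projected.first())             # {'itemId': 1}
print(projected.executions)          # 2
```

Nothing runs when select() is called; the plan is executed only when an action asks for a result.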

Question # 32

Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?

print(itemsDf.columns)

print(itemsDf.types)

itemsDf.printSchema()

spark.schema(itemsDf)

itemsDf.rdd.printSchema()

itemsDf.print.schema()

Question # 33

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

itemsDf.persist(StorageLevel.MEMORY_ONLY)

itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

itemsDf.store()

itemsDf.cache()

itemsDf.write.option('destination', 'memory').save()
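In Spark 3.0, DataFrame.cache() is shorthand for persist() with the default MEMORY_AND_DISK storage level: partitions that fit are kept in executor memory, and partitions that do not fit are stored on disk in serialized form and read back from there when needed. The toy model below mimics that spill logic in plain Python. cache_partitions, read_partition, the record-count size measure and the memory budget are hypothetical simplifications, and pickle merely stands in for Spark's serializer.

```python
import pickle


def cache_partitions(partitions, memory_budget):
    """Toy MEMORY_AND_DISK model: keep partitions in memory while the budget
    allows, serialize (pickle) the remaining ones into a simulated disk store."""
    memory, disk = {}, {}
    used = 0
    for idx, part in enumerate(partitions):
        size = len(part)  # toy size measure: number of records
        if used + size <= memory_budget:
            memory[idx] = part                # deserialized, in "executor memory"
            used += size
        else:
            disk[idx] = pickle.dumps(part)    # serialized bytes on "disk"
    return memory, disk


def read_partition(idx, memory, disk):
    """Read a cached partition: from memory if present, else deserialize from disk."""
    if idx in memory:
        return memory[idx]
    return pickle.loads(disk[idx])


parts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
mem, dsk = cache_partitions(parts, memory_budget=5)
print(sorted(mem), sorted(dsk))       # [0, 1] [2]
print(read_partition(2, mem, dsk))    # [6, 7, 8, 9]
```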

Question # 34

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf whose value in column itemId matches a value in column transactionId of DataFrame transactionsDf. Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

The join statement is incomplete.

The union method should be used instead of join.

The join method is inappropriate.

The merge method should be used instead of join.

The join expression is malformed.
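The erroneous call passes only a join condition, but DataFrame.join() also needs the other DataFrame as its first argument, as in itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId); without a how argument it performs an inner join. To make inner-join semantics concrete, here is a toy hash join in plain Python. inner_join and the sample rows are hypothetical, and Spark picks among several physical join strategies rather than running this exact algorithm.

```python
from collections import defaultdict


def inner_join(left, right, left_key, right_key):
    """Toy equi-join with inner semantics: build a hash table on the right
    side, then emit one combined row per matching (left, right) pair."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[right_key]].append(right_row)
    result = []
    for left_row in left:
        # .get() does not insert keys into the defaultdict; unmatched rows are dropped
        for match in index.get(left_row[left_key], []):
            result.append({**left_row, **match})
    return result


items = [{"itemId": 1, "itemName": "tent"}, {"itemId": 2, "itemName": "lamp"}]
transactions = [{"transactionId": 1, "value": 9.99}, {"transactionId": 3, "value": 4.5}]
print(inner_join(items, transactions, "itemId", "transactionId"))
# [{'itemId': 1, 'itemName': 'tent', 'transactionId': 1, 'value': 9.99}]
```

Item 2 has no matching transaction, so an inner join leaves it out of the result.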

Question # 35

Which of the following describes the most efficient way to resize a DataFrame from 16 to 8 partitions?

Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.

Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.

Use a narrow transformation to reduce the number of partitions.

Use a wide transformation to reduce the number of partitions.

Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.
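Reducing the partition count can be done with a narrow transformation: coalesce() forms each new partition by merging existing ones, so records do not have to be redistributed, whereas repartition() performs a full shuffle. The plain-Python sketch below models that difference. The functions, the contiguous grouping and the hash-based placement are simplifying assumptions, not Spark's actual partitioning code.

```python
def coalesce(partitions, n):
    """Toy narrow merge: each output partition concatenates a contiguous
    group of input partitions, so no record moves between groups."""
    size = len(partitions)
    out = []
    for i in range(n):
        start, end = i * size // n, (i + 1) * size // n
        out.append([rec for part in partitions[start:end] for rec in part])
    return out


def repartition(partitions, n):
    """Toy wide shuffle: every record is reassigned by hashing, so an output
    partition may receive records from many input partitions."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for rec in part:
            out[hash(rec) % n].append(rec)
    return out


def source(rec):
    """Index of the input partition a record came from (4 records per partition)."""
    return rec // 4


sixteen = [[p * 4 + j for j in range(4)] for p in range(16)]  # 16 partitions x 4 records
narrow = coalesce(sixteen, 8)
wide = repartition(sixteen, 8)
print([len({source(r) for r in part}) for part in narrow])  # [2, 2, 2, 2, 2, 2, 2, 2]
print([len({source(r) for r in part}) for part in wide])    # [8, 8, 8, 8, 8, 8, 8, 8]
```

Each coalesced partition depends on exactly two inputs, while each shuffled partition pulls records from eight of the sixteen inputs, which is the data movement a shuffle costs.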

Question # 36

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

itemsDf.withColumn(["supplier", "manufacturer"])

itemsDf.withColumn("supplier").alias("manufacturer")

itemsDf.withColumnRenamed("supplier", "manufacturer")

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the DataFrame method withColumnRenamed to rename column supplier to manufacturer.

Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not yet familiar with Spark. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in a DataFrame therefore necessarily returns a copy, or a new version, of it with the changes applied.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Incorrect. Spark 3.0's DataFrame API does not have a withColumnsRenamed() method; a variant that takes a dictionary of old-to-new column names was only added in Spark 3.4.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

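No. Watch out: although col() works as an argument to many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings, with the existing column name first and the new name second. This option also passes the two names in the wrong order.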

itemsDf.withColumn(["supplier", "manufacturer"])

Wrong. While DataFrame.withColumn() exists in Spark, it serves a different purpose than renaming columns. withColumn is typically used to add a column to a DataFrame (or replace an existing column of the same name), taking the name of the new column as the first argument and a Column as the second. Learn more via the documentation linked below.

itemsDf.withColumn("supplier").alias("manufacturer")

No. While DataFrame.withColumn() exists, it requires two arguments. Furthermore, the alias() method on DataFrames would not help with renaming a column. DataFrame.alias() is useful for referring to the inputs of join statements, but that is far outside the scope of this question. If you are curious nevertheless, check out the link below.

More info: pyspark.sql.DataFrame.withColumnRenamed (PySpark 3.1.1 documentation), pyspark.sql.DataFrame.withColumn (PySpark 3.1.1 documentation), and pyspark.sql.DataFrame.alias (PySpark 3.1.2 documentation) (https://bit.ly/3aSB5tm , https://bit.ly/2Tv4rbE , https://bit.ly/2RbhBd2)

Static notebook | Dynamic notebook: See test 1, Question 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html , https://bit.ly/sparkpracticeexams_import_instructions)

Question # 37

Which of the following describes Spark's way of managing memory?

Spark uses a subset of the reserved system memory.

Storage memory is used for caching partitions derived from DataFrames.

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.
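Since Spark 1.6, executor memory follows a unified model: a fixed 300 MB is reserved, a fraction of the remainder (spark.memory.fraction, default 0.6 in Spark 3.0) is shared by execution (shuffles, joins, sorts, aggregations) and storage (cached data), and spark.memory.storageFraction (default 0.5) sets the part of that shared region in which cached blocks are immune to eviction. The arithmetic below is a sketch using those defaults; the function name and the 4096 MB heap are arbitrary examples.

```python
RESERVED_MB = 300  # fixed reserved memory in Spark's unified memory manager


def unified_memory(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap (in MB) following the unified memory model."""
    usable = heap_mb - RESERVED_MB          # memory left after the reserved part
    unified = usable * memory_fraction      # shared execution + storage region
    storage = unified * storage_fraction    # storage share protected from eviction
    return {
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,     # initial execution share; the two can borrow from each other
        "user": usable - unified,           # user data structures and internal metadata
    }


print(unified_memory(4096))
```

With a 4096 MB heap this gives roughly 2277.6 MB of unified memory, split evenly between storage and execution, plus about 1518.4 MB of user memory.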

Question # 38

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

1. select, 2. "storeId", 3. print_schema()

1. limit, 2. 1, 3. columns

1. select, 2. "storeId", 3. printSchema()

1. limit, 2. "storeId", 3. printSchema()

1. select, 2. storeId, 3. dtypes

Question # 39

Which of the following describes Spark actions?

Writing data to disk is the primary purpose of actions.

Actions are Spark's way of exchanging data between executors.

The driver receives data upon request by actions.

Stage boundaries are commonly established by actions.

Actions are Spark's way of modifying RDDs.

Question # 40

Which of the following statements about executors is correct, assuming that one can consider each of the JVMs working as executors as a pool of task execution slots?

Slot is another name for executor.

There must be fewer executors than tasks.

An executor runs on a single core.

There must be more slots than tasks.

Tasks run in parallel via slots.
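Each executor offers as many task slots as spark.executor.cores divided by spark.task.cpus (default 1), and the tasks of a stage run in parallel, one per free slot; when a stage has more tasks than slots, the tasks run in successive waves. The arithmetic sketch below assumes a hypothetical cluster of 4 executors with 4 cores each and equally long tasks; task_waves is an illustrative name, not a Spark API.

```python
import math


def task_waves(num_tasks, executors, cores_per_executor, cpus_per_task=1):
    """Return (total slots, number of waves) for one stage, assuming equal task durations."""
    slots = executors * (cores_per_executor // cpus_per_task)
    waves = math.ceil(num_tasks / slots)
    return slots, waves


print(task_waves(num_tasks=200, executors=4, cores_per_executor=4))  # (16, 13)
print(task_waves(num_tasks=10, executors=4, cores_per_executor=4))   # (16, 1)
```

Neither more slots than tasks nor fewer executors than tasks is required: 200 tasks simply take 13 waves on 16 slots, and 10 tasks finish in one wave with 6 slots idle.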
