
Which of the following DataFrame methods is classified as a transformation?

A.

DataFrame.count()

B.

DataFrame.show()

C.

DataFrame.select()

D.

DataFrame.foreach()

E.

DataFrame.first()
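Note: in Spark, a transformation is lazy and returns a new DataFrame, while an action triggers computation and returns a result to the caller. The sketch below is a toy model of that distinction in plain Python. The class `LazyFrame` and its methods are hypothetical stand-ins, not the PySpark API.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical; not the PySpark API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded steps, not yet executed

    def select(self, *names):
        # Transformation: only records a step and returns a new LazyFrame.
        step = lambda row: {n: row[n] for n in names}
        return LazyFrame(self._rows, self._plan + (step,))

    def _run(self):
        # Apply every recorded step to every row, one row at a time.
        for row in self._rows:
            for step in self._plan:
                row = step(row)
            yield row

    def count(self):
        # Action: executes the recorded plan and returns a plain value.
        return sum(1 for _ in self._run())

    def first(self):
        # Action: runs the plan just far enough to return one row.
        return next(self._run(), None)


items = LazyFrame([{"itemId": 1, "supplier": "A"},
                   {"itemId": 2, "supplier": "B"}])
selected = items.select("itemId")  # nothing has been computed yet
row_count = selected.count()       # computation happens here
first_row = selected.first()
```

In this model, `select` hands back another frame-like object, whereas `count` and `first` hand back ordinary Python values.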

Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?

A.

print(itemsDf.columns)
print(itemsDf.types)

B.

itemsDf.printSchema()

C.

spark.schema(itemsDf)

D.

itemsDf.rdd.printSchema()

E.

itemsDf.print.schema()

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

A.

itemsDf.persist(StorageLevel.MEMORY_ONLY)

B.

itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

C.

itemsDf.store()

D.

itemsDf.cache()

E.

itemsDf.write.option('destination', 'memory').save()

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf whose value in column itemId matches a value in column transactionId of DataFrame transactionsDf. Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

A.

The join statement is incomplete.

B.

The union method should be used instead of join.

C.

The join method is inappropriate.

D.

The merge method should be used instead of join.

E.

The join expression is malformed.

Which of the following describes the most efficient way to resize a DataFrame from 16 to 8 partitions?

A.

Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.

B.

Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.

C.

Use a narrow transformation to reduce the number of partitions.

D.

Use a wide transformation to reduce the number of partitions.

E.

Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.
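Note: when reducing the partition count, Spark's coalesce merges existing partitions without a full shuffle, whereas repartition always performs a full shuffle. The following is a toy model in plain Python, with partitions represented as lists. The functions `coalesce` and `repartition` here are hypothetical illustrations; the real implementation chooses which partitions to merge based on data locality rather than simple adjacency.

```python
def coalesce(partitions, n):
    """Merge neighbouring partitions; each existing partition stays intact."""
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map old partition i onto one of the n new partitions.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every row individually (models a full shuffle)."""
    rows = [row for part in partitions for row in part]
    shuffled = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        shuffled[i % n].append(row)  # round-robin assignment
    return shuffled


# 16 partitions of 3 rows each
sixteen = [[p * 10 + k for k in range(3)] for p in range(16)]
eight_merged = coalesce(sixteen, 8)
eight_shuffled = repartition(sixteen, 8)
```

In the merged result, the rows of each original partition stay together. In the shuffled result, they are spread across the new partitions, which in a cluster would mean moving data between executors.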

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

A.

itemsDf.withColumn(["supplier", "manufacturer"])

B.

itemsDf.withColumn("supplier").alias("manufacturer")

C.

itemsDf.withColumnRenamed("supplier", "manufacturer")

D.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

E.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Which of the following describes Spark's way of managing memory?

A.

Spark uses a subset of the reserved system memory.

B.

Storage memory is used for caching partitions derived from DataFrames.

C.

As a general rule for garbage collection, Spark performs better on many small objects than on a few big objects.

D.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E.

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.

The code block shown below should show information about the data type that column storeId of DataFrame transactionsDf contains. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__

A.

1. select

2. "storeId"

3. print_schema()

B.

1. limit

2. 1

3. columns

C.

1. select

2. "storeId"

3. printSchema()

D.

1. limit

2. "storeId"

3. printSchema()

E.

1. select

2. storeId

3. dtypes

Which of the following describes Spark actions?

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

Which of the following statements about executors is correct, assuming that one can consider each of the JVMs working as executors as a pool of task execution slots?

A.

Slot is another name for executor.

B.

There must be fewer executors than tasks.

C.

An executor runs on a single core.

D.

There must be more slots than tasks.

E.

Tasks run in parallel via slots.
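Note: the "pool of task execution slots" picture from the question can be sketched with a thread pool from the Python standard library. This is only an analogy, not how Spark schedules work. `SLOTS`, `tasks`, and `run_task` are hypothetical names, and the numbers are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

SLOTS = 4                 # hypothetical: slots offered by one executor
tasks = list(range(10))   # hypothetical: one task per partition


def run_task(partition_id):
    # Report which worker thread ("slot") handled this task.
    return partition_id, threading.current_thread().name


# At most SLOTS tasks run at the same time; the rest wait for a free slot.
with ThreadPoolExecutor(max_workers=SLOTS) as pool:
    results = list(pool.map(run_task, tasks))

finished = [pid for pid, _ in results]
threads_used = {name for _, name in results}
```

All ten tasks complete even though only four slots exist, because a slot picks up the next waiting task as soon as it becomes free.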