Which of the following statements about Spark's execution hierarchy is correct?

A.

In Spark's execution hierarchy, a job may span multiple stage boundaries.

B.

In Spark's execution hierarchy, manifests are one layer above jobs.

C.

In Spark's execution hierarchy, a stage comprises multiple jobs.

D.

In Spark's execution hierarchy, executors are the smallest unit.

E.

In Spark's execution hierarchy, tasks are one layer above slots.

The code block displayed below contains an error. The code block should configure Spark to split data into 20 parts when exchanging data between executors for joins or aggregations. Find the error.

Code block:

spark.conf.set(spark.sql.shuffle.partitions, 20)

A.

The code block uses the wrong command for setting an option.

B.

The code block sets the wrong option.

C.

The code block expresses the option incorrectly.

D.

The code block sets the incorrect number of parts.

E.

The code block is missing a parameter.

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

A.

transactionsDf.groupBy(col(storeId).avg())

B.

transactionsDf.groupBy("storeId").avg(col("value"))

C.

transactionsDf.groupBy("storeId").agg(avg("value"))

D.

transactionsDf.groupBy("storeId").agg(average("value"))

E.

transactionsDf.groupBy("value").average()

Which of the following describes Spark's standalone deployment mode?

A.

Standalone mode uses a single JVM to run Spark driver and executor processes.

B.

Standalone mode means that the cluster does not contain the driver.

C.

Standalone mode is how Spark runs on YARN and Mesos clusters.

D.

Standalone mode uses only a single executor per worker per application.

E.

Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))

A.

1. sample

2. True

3. 0.15

4. filter

B.

1. sample

2. False

3. 0.15

4. select

C.

1. sample

2. 0.85

3. False

4. select

D.

1. fraction

2. 0.15

3. True

4. where

E.

1. fraction

2. False

3. 0.85

4. select

Which of the following is not a feature of Adaptive Query Execution?

A.

Replace a sort merge join with a broadcast join, where appropriate.

B.

Coalesce partitions to accelerate data processing.

C.

Split skewed partitions into smaller partitions to avoid differences in partition processing time.

D.

Reroute a query in case of an executor failure.

E.

Collect runtime statistics during query execution.

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

A.

transactionsDf.withColumn("storeId", convert("storeId", "string"))

B.

transactionsDf.withColumn("storeId", col("storeId", "string"))

C.

transactionsDf.withColumn("storeId", col("storeId").convert("string"))

D.

transactionsDf.withColumn("storeId", col("storeId").cast("string"))

E.

transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

A.

transactionsDf.select("storeId").dropDuplicates().count()

B.

transactionsDf.select(count("storeId")).dropDuplicates()

C.

transactionsDf.select(distinct("storeId")).count()

D.

transactionsDf.dropDuplicates().agg(count("storeId"))

E.

transactionsDf.distinct().select("storeId").count()

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.

Sample of DataFrame itemsDf:

+------+--------------------+--------------------+-------------------+
|itemId|            itemName|          attributes|           supplier|
+------+--------------------+--------------------+-------------------+
|     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
|     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
|     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
+------+--------------------+--------------------+-------------------+

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()

Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?

A.

from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

B.

evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(storeId))

C.

from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.IntegerType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

D.

evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

E.

from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))