
The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns
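
For reference, a minimal PySpark sketch of counting the columns of a CSV file while skipping comment lines; the file path is just a placeholder.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# comment='#' tells the CSV reader to ignore lines starting with '#'
df = spark.read.csv("/FileStore/data.csv", comment="#")
# df.columns is a Python list of column names, so len() gives the column count
print(len(df.columns))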

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

A.

DataFrame.repartition(12)

B.

DataFrame.coalesce(6).shuffle()

C.

DataFrame.coalesce(6)

D.

DataFrame.coalesce(6, shuffle=True)

E.

DataFrame.repartition(6)
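
As background, a short sketch of the difference between repartition() and coalesce() on an arbitrary DataFrame df:

# repartition(n) always performs a full shuffle and yields exactly n partitions;
# it can both increase and decrease the partition count
df_shuffled = df.repartition(6)

# coalesce(n) only merges existing partitions and avoids a full shuffle;
# it can reduce the partition count but never increase it
df_merged = df.coalesce(6)

print(df_shuffled.rdd.getNumPartitions(), df_merged.rdd.getNumPartitions())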

The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.

Code block:

itemsDf.filter(Column('supplier').isin('et'))

A.

The Column operator should be replaced by the col operator and instead of isin, contains should be used.

B.

The expression inside the filter parenthesis is malformed and should be replaced by isin('et', 'supplier').

C.

Instead of isin, it should be checked whether column supplier contains the letters et, so isin should be replaced with contains. In addition, the column should be accessed using col['supplier'].

D.

The expression only returns a single column and filter should be replaced by select.
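
For context, a minimal sketch of a substring filter on a string column using col() and contains(), with itemsDf and its supplier column as given in the question:

from pyspark.sql.functions import col

# keep only rows whose supplier value contains the substring 'et'
itemsDf.filter(col("supplier").contains("et")).show()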

Which of the following describes Spark's Adaptive Query Execution?

A.

Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.

B.

Adaptive Query Execution is enabled in Spark by default.

C.

Adaptive Query Execution reoptimizes queries at execution points.

D.

Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.

E.

Adaptive Query Execution applies to all kinds of queries.
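
For reference, the main Adaptive Query Execution switches are exposed through the Spark SQL configuration and can be toggled at runtime (spark denotes an existing SparkSession):

# master switch for Adaptive Query Execution
spark.conf.set("spark.sql.adaptive.enabled", "true")
# AQE sub-features: coalescing shuffle partitions and skew-join optimization
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")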

The code block shown below should return all rows of DataFrame itemsDf that have at least 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Example of DataFrame itemsDf:

+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+

Code block:

itemsDf.__1__(__2__(__3__)__4__)

A.

1. select

2. count

3. col("itemNameElements")

4. >3

B.

1. filter

2. count

3. itemNameElements

4. >=3

C.

1. select

2. count

3. "itemNameElements"

4. >3

D.

1. filter

2. size

3. "itemNameElements"

4. >=3

(Correct)

E.

1. select

2. size

3. "itemNameElements"

4. >3
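
For reference, a minimal sketch of filtering on the length of an array column with size(), using the itemsDf example shown above:

from pyspark.sql.functions import size

# size() returns the number of elements in the array column itemNameElements
itemsDf.filter(size("itemNameElements") >= 3).show()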

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows of DataFrame itemsDf in which column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A.

1. filter

2. array_contains("cozy")

3. select

4. "itemId"

5. explode

6. "attributes"

B.

1. where

2. "array_contains(attributes, 'cozy')"

3. select

4. itemId

5. explode

6. attributes

C.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. map

6. "attributes"

D.

1. filter

2. "array_contains(attributes, cozy)"

3. select

4. "itemId"

5. explode

6. "attributes"

E.

1. filter

2. "array_contains(attributes, 'cozy')"

3. select

4. "itemId"

5. explode

6. "attributes"

Which of the following code blocks reads JSON file imports.json into a DataFrame?

A.

spark.read().mode("json").path("/FileStore/imports.json")

B.

spark.read.format("json").path("/FileStore/imports.json")

C.

spark.read("json", "/FileStore/imports.json")

D.

spark.read.json("/FileStore/imports.json")

E.

spark.read().json("/FileStore/imports.json")
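
For reference, a minimal sketch of reading a JSON file into a DataFrame, using the path from the question and assuming an existing SparkSession named spark:

# shortcut reader method
df = spark.read.json("/FileStore/imports.json")
# equivalent long form via format()/load()
df = spark.read.format("json").load("/FileStore/imports.json")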

Which of the following describes characteristics of the Dataset API?

A.

The Dataset API does not support unstructured data.

B.

In Python, the Dataset API mainly resembles Pandas' DataFrame API.

C.

In Python, the Dataset API's schema is constructed via type hints.

D.

The Dataset API is available in Scala, but it is not available in Python.

E.

The Dataset API does not provide compile-time type safety.

Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

A.

itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

B.

itemsDf.sample(fraction=0.1, seed=87238)

C.

itemsDf.sample(fraction=1000, seed=98263)

D.

itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

E.

itemsDf.sample(fraction=0.1)
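
For context, a short sketch of DataFrame.sample(): without withReplacement=True each row is sampled at most once, and fixing the seed makes the result reproducible as long as the input partitioning does not change (the fraction and seed below are arbitrary):

# roughly 10% of the rows, no duplicates, deterministic for a given seed
sampled = itemsDf.sample(fraction=0.1, seed=87238)
print(sampled.count())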

The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns in an ordered fashion: first by column value, showing smaller numbers at the top and greater numbers at the bottom, and then by column predError, for which all values should be arranged in the inverse of the order used for column value. Find the error.

Code block:

transactionsDf.orderBy('value', asc_nulls_first(col('predError')))

A.

Two orderBy statements with calls to the individual columns should be chained, instead of having both columns in one orderBy statement.

B.

Column value should be wrapped by the col() operator.

C.

Column predError should be sorted in a descending way, putting nulls last.

D.

Column predError should be sorted by desc_nulls_first() instead.

E.

Instead of orderBy, sort should be used.
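
For reference, a minimal sketch of a two-key sort in PySpark, ascending on one column and descending on the other; null placement can be controlled with the *_nulls_first / *_nulls_last variants:

from pyspark.sql.functions import col, desc_nulls_first

# value ascending (the default), predError descending with nulls first
transactionsDf.orderBy("value", desc_nulls_first(col("predError"))).show()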