
Which of the following describes characteristics of the Spark driver?

A.

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C.

The Spark driver processes partitions in an optimized, distributed fashion.

D.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate format for this kind of data?

A.

spark.read.schema(
    StructType(
        StructField("transactionId", IntegerType(), True),
        StructField("predError", IntegerType(), True)
    )).load(filePath)

B.

spark.read.schema([
    StructField("transactionId", NumberType(), True),
    StructField("predError", IntegerType(), True)
]).load(filePath)

C.

spark.read.schema(
    StructType([
        StructField("transactionId", StringType(), True),
        StructField("predError", IntegerType(), True)]
    )).parquet(filePath)

D.

spark.read.schema(
    StructType([
        StructField("transactionId", IntegerType(), True),
        StructField("predError", IntegerType(), True)]
    )).format("parquet").load(filePath)

E.

spark.read.schema([
    StructField("transactionId", IntegerType(), True),
    StructField("predError", IntegerType(), True)
]).load(filePath, format="parquet")
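For comparison with the options above, here is a minimal sketch of a schema-based Parquet read in PySpark. It assumes an existing SparkSession is passed in as `spark` and that the file holds the two columns named in the question. The helper name `read_transactions` and the DDL-string form of the schema are illustrative choices, not part of the question; `DataFrameReader.schema` accepts either a `StructType` or a DDL-formatted string.

```python
# Hypothetical helper: read a Parquet file with an explicit integer schema.
# The DDL string is an assumed equivalent of a StructType holding two
# nullable IntegerType fields.
SCHEMA_DDL = "transactionId INT, predError INT"


def read_transactions(spark, file_path):
    """Return a DataFrame read from file_path using the schema above."""
    return (
        spark.read
        .schema(SCHEMA_DDL)   # explicit schema, so no inference is needed
        .format("parquet")    # name the source format explicitly
        .load(file_path)
    )
```

Whether `INT` is the right width depends on how the file was written; if the Parquet columns are stored as 64-bit integers, `BIGINT` would be the matching type.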

Which of the following statements about Spark's DataFrames is incorrect?

A.

Spark's DataFrames are immutable.

B.

Spark's DataFrames are equal to Python's DataFrames.

C.

Data in DataFrames is organized into named columns.

D.

RDDs are at the core of DataFrames.

E.

The data in DataFrames may be split into multiple chunks.

Which of the following describes a narrow transformation?

A.

A narrow transformation is an operation in which data is exchanged across partitions.

B.

A narrow transformation is a process in which data from multiple RDDs is used.

C.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

D.

A narrow transformation is an operation in which data is exchanged across the cluster.

E.

A narrow transformation is an operation in which no data is exchanged across the cluster.
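The difference between operations that exchange data across partitions and those that do not can be illustrated without Spark. The sketch below is a plain-Python model, not Spark code: partitions are ordinary lists, and the function names `narrow_map` and `wide_group_by_key` are hypothetical. It only shows which input partitions each output partition depends on.

```python
from collections import defaultdict

# Two "partitions", modeled as plain lists (stand-ins for Spark partitions).
partitions = [[1, 2, 3], [4, 5, 6]]


def narrow_map(parts, fn):
    """Narrow: each output partition is built from exactly one input partition."""
    return [[fn(x) for x in part] for part in parts]


def wide_group_by_key(parts, key_fn, num_out):
    """Wide: rows are re-bucketed by key, so an output partition may
    receive rows from every input partition (a shuffle)."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for x in part:
            k = key_fn(x)
            out[hash(k) % num_out][k].append(x)
    return [dict(d) for d in out]


doubled = narrow_map(partitions, lambda x: x * 2)
grouped = wide_group_by_key(partitions, lambda x: x % 2, num_out=2)

print(doubled)  # [[2, 4, 6], [8, 10, 12]]
print(grouped)  # [{0: [2, 4, 6]}, {1: [1, 3, 5]}]
```

In the grouped result, each output partition contains values that started in both input lists, which is the data movement a narrow transformation avoids.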