A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.
Which two characteristics of Apache Spark's execution model explain this behavior?
Choose 2 answers:
An engineer has a large ORC file located at/file/test_data.orcand wants to read only specific columns to reduce memory usage.
Which code fragment will select the columns, i.e.,col1,col2, during the reading process?
Given a CSV file with the content:
And the following code:
from pyspark.sql.types import *
schema = StructType([
StructField("name", StringType()),
StructField("age", IntegerType())
])
spark.read.schema(schema).csv(path).collect()
What is the resulting output?
A Spark engineer is troubleshooting a Spark application that has been encountering out-of-memory errors during execution. By reviewing the Spark driver logs, the engineer notices multiple "GC overhead limit exceeded" messages.
Which action should the engineer take to resolve this issue?
A data engineer is working with a large JSON dataset containing order information. The dataset is stored in a distributed file system and needs to be loaded into a Spark DataFrame for analysis. The data engineer wants to ensure that the schema is correctly defined and that the data is read efficiently.
Which approach should the data scientist use to efficiently load the JSON data into a Spark DataFrame with a predefined schema?
Given a DataFramedfthat has 10 partitions, after running the code:
result = df.coalesce(20)
How many partitions will the result DataFrame have?
A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.
How should this issue be resolved?
Which feature of Spark Connect is considered when designing an application to enable remote interaction with the Spark cluster?
A data engineer is working ona Streaming DataFrame streaming_df with the given streaming data:
Which operation is supported with streamingdf ?
A developer wants to test Spark Connect with an existing Spark application.
What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)