Explanation
A narrow transformation is an operation in which no data is exchanged across the cluster.
Correct! Narrow transformations exchange no data across the cluster, because computing each output partition requires no data from outside the partition the transformation is applied to. Typical narrow transformations include filter, drop, and coalesce.
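The partition-local behavior described above can be sketched with a toy model in plain Python. This is not Spark's API: the list-of-lists `partitions` layout and the helper `narrow_filter` are hypothetical names chosen for illustration, assuming a dataset can be pictured as a list of independent partitions.

```python
# Toy model (not Spark): a dataset is a list of partitions,
# and each partition is a list of records.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def narrow_filter(parts, predicate):
    """Filter each partition using only that partition's own records."""
    # Every output partition is built from exactly one input partition,
    # so no record ever moves to a different partition.
    return [[x for x in part if predicate(x)] for part in parts]


evens = narrow_filter(partitions, lambda x: x % 2 == 0)
print(evens)  # [[2], [4, 6], [8]]
```

The number of partitions is unchanged and each surviving record stays where it started, which is what makes the operation narrow in this model.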
A narrow transformation is an operation in which data is exchanged across partitions.
No, that describes a wide transformation, not a narrow one. Wide transformations typically cause a shuffle, in which data is exchanged across partitions, executors, and the cluster.
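For contrast, a shuffle can be sketched in the same plain-Python style. Again, this is only an illustrative model and not how Spark implements shuffles: `shuffle_by_key` is a hypothetical helper, and assigning a record to the output partition `key % num_partitions` is an assumption made to keep the example small.

```python
# Toy model (not Spark): regroup records by key, as a wide transformation would.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def shuffle_by_key(parts, key_fn, num_partitions):
    """Send each record to the output partition selected by its key."""
    out = [[] for _ in range(num_partitions)]
    for part in parts:
        for record in part:
            # The target partition depends on the key, not on where the
            # record currently lives, so records cross partition boundaries.
            out[key_fn(record) % num_partitions].append(record)
    return out


shuffled = shuffle_by_key(partitions, lambda x: x % 3, 3)
print(shuffled)  # [[3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each output partition here contains records drawn from all three input partitions, which is the kind of exchange a narrow transformation avoids.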
A narrow transformation is an operation in which data is exchanged across the cluster.
No, see the explanation just above this one.
A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.
No, type conversion has nothing to do with narrow transformations in Spark.
A narrow transformation is a process in which data from multiple RDDs is used.
No. A resilient distributed dataset (RDD) can be described as a collection of partitions. In a narrow transformation, no data is exchanged between partitions. Thus, no data is exchanged between RDDs.
One could say, though, that a narrow transformation, and in fact any transformation, results in a new RDD being created. A transformation describes a change to an existing RDD (RDDs are the foundation of other Spark data structures, like DataFrames). But since RDDs are immutable, the change cannot be applied in place, so a new RDD is created to reflect it.
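The immutability point can be illustrated with another small plain-Python sketch. It is only an analogy, assuming tuples stand in for immutable partitions; `map_partitions` is a hypothetical helper, not a Spark function.

```python
# Toy model (not Spark): tuples stand in for an immutable dataset.
original = ((1, 2, 3), (4, 5, 6))


def map_partitions(parts, fn):
    """Return a new dataset; the input cannot be modified in place."""
    return tuple(tuple(fn(x) for x in part) for part in parts)


doubled = map_partitions(original, lambda x: x * 2)
print(original)  # ((1, 2, 3), (4, 5, 6))
print(doubled)   # ((2, 4, 6), (8, 10, 12))
```

The transformation yields a separate object and leaves its input untouched, mirroring how a transformation on an RDD produces a new RDD.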
More info: "Spark Transformation and Action: A Deep Dive" by Misbah Uddin, CodeX (Medium)