
Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question 4

The code block displayed below contains an error. The code block is intended to add a column itemNameElements to DataFrame itemsDf that includes an array of all words in column itemName. Find the error.

Sample of DataFrame itemsDf:

+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+

Code block:

itemsDf.withColumnRenamed("itemNameElements", split("itemName"))

Options:

A.

All column names need to be wrapped in the col() operator.

B.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.

C.

Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.

D.

Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.

E.

The expressions "itemNameElements" and split("itemName") need to be swapped.
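As a plain-Python analogy for the result the question is after (this is not the Spark API itself), the sketch below splits each sample itemName on a single space and stores the words in a new itemNameElements field. The dict-based row layout and the helper name add_item_name_elements are assumptions made purely for illustration; in PySpark the corresponding building block is pyspark.sql.functions.split, which takes a column and a pattern.

```python
# Sample rows copied from the itemsDf excerpt above, modelled as plain dicts.
items = [
    {"itemId": 1, "itemName": "Thick Coat for Walking in the Snow", "supplier": "Sports Company Inc."},
    {"itemId": 2, "itemName": "Elegant Outdoors Summer Dress", "supplier": "YetiX"},
    {"itemId": 3, "itemName": "Outdoors Backpack", "supplier": "Sports Company Inc."},
]


def add_item_name_elements(rows, delimiter=" "):
    """Return new rows with an extra itemNameElements list (hypothetical helper)."""
    # Build new dicts instead of mutating the input, similar in spirit to how
    # adding a column yields a new DataFrame rather than changing the old one.
    return [{**row, "itemNameElements": row["itemName"].split(delimiter)} for row in rows]


items_with_elements = add_item_name_elements(items)
for row in items_with_elements:
    print(row["itemId"], row["itemNameElements"])
```

Note that the delimiter is passed explicitly; the word list only comes out as expected because the names are separated by single spaces.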

Question 5

Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?

Schema of first partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

Schema of second partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)

Options:

A.

spark.read.parquet(filePath, mergeSchema='y')

B.

spark.read.option("mergeSchema", "true").parquet(filePath)

C.

spark.read.parquet(filePath)

D.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df

E.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df

Question 6

Which of the following code blocks generally causes a great amount of network traffic?

Options:

A.

DataFrame.select()

B.

DataFrame.coalesce()

C.

DataFrame.collect()

D.

DataFrame.rdd.map()

E.

DataFrame.count()

Question 7

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned DataFrame?

Options:

A.

transactionsDf.resample(0.15, False, 3142)

B.

transactionsDf.sample(0.15, False, 3142)

C.

transactionsDf.sample(0.15)

D.

transactionsDf.sample(0.85, 8429)

E.

transactionsDf.sample(True, 0.15, 8261)
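To make "about 150 rows, where any row can appear more than once" concrete, here is a minimal plain-Python sketch of sampling a fraction with replacement. It assumes each row is kept a Poisson-distributed number of times with mean equal to the fraction, which is one common way to implement this kind of sampling; the function name sample_with_replacement and the integer stand-in rows are hypothetical and this is not Spark's own code.

```python
import math
import random


def sample_with_replacement(rows, fraction, seed):
    """Keep each row k times, with k drawn from Poisson(fraction) (illustrative sketch)."""
    rng = random.Random(seed)
    threshold = math.exp(-fraction)
    sampled = []
    for row in rows:
        # Knuth's method: count how many uniform draws can be multiplied
        # together before the running product drops to the threshold.
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        sampled.extend([row] * count)
    return sampled


rows = list(range(1000))  # stand-in for the 1000 rows of transactionsDf
sampled = sample_with_replacement(rows, fraction=0.15, seed=8261)
print(len(sampled))                      # close to 150, but not guaranteed to be exactly 150
print(len(sampled) - len(set(sampled)))  # number of repeated picks
```

The result size is only approximate because every row is drawn independently; fixing the seed makes a given run repeatable.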

Question 8

The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:

A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns

Question 9

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should be listed only once.

Sample of DataFrame itemsDf:

+------+--------------------+--------------------+-------------------+
|itemId|            itemName|          attributes|           supplier|
+------+--------------------+--------------------+-------------------+
|     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
|     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
|     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
+------+--------------------+--------------------+-------------------+

Options:

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()

Question 10

Which of the following is a characteristic of the cluster manager?

Options:

A.

Each cluster manager works on a single partition of data.

B.

The cluster manager receives input from the driver through the SparkContext.

C.

The cluster manager does not exist in standalone mode.

D.

The cluster manager transforms jobs into DAGs.

E.

In client mode, the cluster manager runs on the edge node.

Question 11

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

The column names should be listed directly as arguments to the operator and not as a list.

B.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.

C.

The select operator should be replaced by a drop operator.

D.

The column names should be listed directly as arguments to the operator and not as a list and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.

E.

The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.

Question 12

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)

Question 13

The code block shown below should return an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Options:

A.

transactionsDf.remove(transactionsDf.storeId==25)

B.

transactionsDf.where(transactionsDf.storeId!=25)

C.

transactionsDf.filter(transactionsDf.storeId==25)

D.

transactionsDf.drop(transactionsDf.storeId==25)

E.

transactionsDf.select(transactionsDf.storeId!=25)

Question 14

Which of the following describes how Spark achieves fault tolerance?

Options:

A.

Spark helps fast recovery of data in case of a worker fault by providing the MEMORY_AND_DISK storage level option.

B.

If an executor on a worker node fails while calculating an RDD, that RDD can be recomputed by another executor using the lineage.

C.

Spark builds a fault-tolerant layer on top of the legacy RDD data system, which by itself is not fault tolerant.

D.

Due to the mutability of DataFrames after transformations, Spark reproduces them using observed lineage in case of worker node failure.

E.

Spark is only fault-tolerant if this feature is specifically enabled via the spark.fault_recovery.enabled property.
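The idea of recomputing lost data from lineage can be illustrated with a small toy model. The sketch below is plain Python, not Spark: the class LineageRDD and its methods are hypothetical names invented for this illustration. It records the chain of transformations instead of storing results, so any "executor" can rebuild a lost partition by replaying that chain on the immutable source data.

```python
class LineageRDD:
    """Toy dataset that remembers how it was derived instead of storing copies."""

    def __init__(self, source_partitions, transformations=()):
        self.source_partitions = source_partitions      # immutable input data
        self.transformations = tuple(transformations)   # recorded lineage

    def map(self, func):
        # A transformation only extends the lineage; nothing is computed yet.
        return LineageRDD(self.source_partitions, self.transformations + (func,))

    def compute_partition(self, index):
        # Any worker can rebuild a partition by replaying the lineage.
        data = list(self.source_partitions[index])
        for func in self.transformations:
            data = [func(x) for x in data]
        return data


rdd = LineageRDD([(1, 2, 3), (4, 5, 6)]).map(lambda x: x * 10).map(lambda x: x + 1)

results = {0: rdd.compute_partition(0), 1: rdd.compute_partition(1)}
del results[1]                          # simulate losing one executor's result
results[1] = rdd.compute_partition(1)   # another executor recomputes it from lineage
print(results)
```

Because the source tuples are never modified, replaying the same functions always yields the same partition, which is what makes this kind of recovery safe.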

Question 15

Which of the following describes a narrow transformation?

Options:

A.

A narrow transformation is an operation in which data is exchanged across partitions.

B.

A narrow transformation is a process in which data from multiple RDDs is used.

C.

A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.

D.

A narrow transformation is an operation in which data is exchanged across the cluster.

E.

A narrow transformation is an operation in which no data is exchanged across the cluster.
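The difference between transformations that keep data in place and those that move it can be sketched with lists standing in for partitions. This is a plain-Python illustration under stated assumptions, not Spark code: narrow_map, wide_group_by_key, and the byte-sum partitioner are hypothetical helpers chosen only to keep the example deterministic.

```python
from collections import defaultdict

# Two "partitions" of key/value records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]


def narrow_map(parts, func):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[func(record) for record in part] for part in parts]


def wide_group_by_key(parts, num_partitions):
    """Wide: records are redistributed (shuffled) by key across partitions."""
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            # Deterministic toy partitioner: sum of the key's bytes.
            target = sum(key.encode()) % num_partitions
            shuffled[target][key].append(value)
    return [dict(bucket) for bucket in shuffled]


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
grouped = wide_group_by_key(partitions, num_partitions=2)
print(doubled)  # records stay in their original partitions
print(grouped)  # values for key "a" came from both input partitions
```

In the grouped result, the two values for key "a" end up together even though they started in different partitions, which is the data movement that a narrow operation avoids.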

Question 16

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:

A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))

Question 17

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

Options:

A.

itemsDf.persist(StorageLevel.MEMORY_ONLY)

B.

itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

C.

itemsDf.store()

D.

itemsDf.cache()

E.

itemsDf.write.option('destination', 'memory').save()

Question 18

Which of the following is a problem with using accumulators?

Options:

A.

Only unnamed accumulators can be inspected in the Spark UI.

B.

Only numeric values can be used in accumulators.

C.

Accumulator values can only be read by the driver, but not by executors.

D.

Accumulators do not obey lazy evaluation.

E.

Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.

Question 19

The code block displayed below contains an error. The code block should display the schema of DataFrame transactionsDf. Find the error.

Code block:

transactionsDf.rdd.printSchema

Options:

A.

There is no way to print a schema directly in Spark, since the schema can be printed easily through using print(transactionsDf.columns), so that should be used instead.

B.

The code block should be wrapped into a print() operation.

C.

printSchema is only accessible through the spark session, so the code block should be rewritten as spark.printSchema(transactionsDf).

D.

printSchema is a method and should be written as printSchema(). It is also not callable through transactionsDf.rdd, but should be called directly from transactionsDf.

(Correct)

E.

printSchema is not a method of transactionsDf.rdd. Instead, the schema should be printed via transactionsDf.print_schema().

Question 20

Which of the following code blocks returns a new DataFrame with only columns predError and value of every second row of DataFrame transactionsDf?

Entire DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])

B.

transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")

C.

transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")

D.

transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value")

(Correct)

E.

transactionsDf.createOrReplaceTempView("transactionsDf")
spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")

F.

transactionsDf.filter(col(transactionId).isin([3,4,6]))
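For reference, the result the question asks for can be worked out by hand from the table. The following plain-Python sketch (an analogy, not Spark code) keeps every second row by testing whether transactionId is even and then retains only predError and value. The dict-based rows and the variable names are assumptions for illustration, and None stands in for null.

```python
# Relevant columns of transactionsDf from the table above; None stands in for null.
transactions = [
    {"transactionId": 1, "predError": 3, "value": 4},
    {"transactionId": 2, "predError": 6, "value": 7},
    {"transactionId": 3, "predError": 3, "value": None},
    {"transactionId": 4, "predError": None, "value": None},
    {"transactionId": 5, "predError": None, "value": None},
    {"transactionId": 6, "predError": 3, "value": 2},
]

# Filter first (even transactionId), then keep only the two requested columns.
every_second = [
    {"predError": row["predError"], "value": row["value"]}
    for row in transactions
    if row["transactionId"] % 2 == 0
]

for row in every_second:
    print(row)
```

The filter has to run before the projection here, because transactionId is no longer available once only predError and value are kept.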

Question 21

Which of the following describes characteristics of the Dataset API?

Options:

A.

The Dataset API does not support unstructured data.

B.

In Python, the Dataset API mainly resembles Pandas' DataFrame API.

C.

In Python, the Dataset API's schema is constructed via type hints.

D.

The Dataset API is available in Scala, but it is not available in Python.

E.

The Dataset API does not provide compile-time type safety.

Question 22

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

Options:

A.

transactionsDf.sort("storeId", asc("productId"))

B.

transactionsDf.sort(col(storeId)).desc(col(productId))

C.

transactionsDf.order_by(col(storeId), desc(col(productId)))

D.

transactionsDf.sort("storeId", desc("productId"))

E.

transactionsDf.sort("storeId").sort(desc("productId"))
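What "storeId ascending, then productId descending, in this priority" means for the row order can be checked with a small plain-Python sketch. It is an analogy rather than Spark code: the rows are the fully populated storeId/productId pairs from the transactionsDf table shown earlier, and negating productId in the sort key is just a Python idiom for a descending secondary key.

```python
# Rows with non-null storeId taken from the transactionsDf sample (subset of columns).
rows = [
    {"transactionId": 1, "storeId": 25, "productId": 1},
    {"transactionId": 2, "storeId": 2, "productId": 2},
    {"transactionId": 3, "storeId": 25, "productId": 3},
    {"transactionId": 6, "storeId": 25, "productId": 2},
]

# A single sort with a composite key: storeId ascending has priority,
# ties are broken by productId descending.
ordered = sorted(rows, key=lambda r: (r["storeId"], -r["productId"]))

for row in ordered:
    print(row["storeId"], row["productId"], row["transactionId"])
```

Both keys are applied in one sort call, so the secondary key only decides the order among rows that share the same storeId.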

Question 23

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

Options:

A.

itemsDf.withColumn(["supplier", "manufacturer"])

B.

itemsDf.withColumn("supplier").alias("manufacturer")

C.

itemsDf.withColumnRenamed("supplier", "manufacturer")

D.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

E.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Question 24

The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.

Code block:

transactionsDf.agg("storeId").avg("value")

Options:

A.

Instead of avg("value"), avg(col("value")) should be used.

B.

The avg("value") should be specified as a second argument to agg() instead of being appended to it.

C.

All column names should be wrapped in col() operators.

D.

agg should be replaced by groupBy.

E.

"storeId" and "value" should be swapped.

Question 25

Which of the following options describes the responsibility of the executors in Spark?

Options:

A.

The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

B.

The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

C.

The executors accept tasks from the driver, execute those tasks, and return results to the driver.

D.

The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

E.

The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.

Question 26

Which of the following code blocks produces the following output, given DataFrame transactionsDf?

Output:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:

A.

transactionsDf.schema.print()

B.

transactionsDf.rdd.printSchema()

C.

transactionsDf.rdd.formatSchema()

D.

transactionsDf.printSchema()

E.

print(transactionsDf.schema)

Question 27

Which is the highest level in Spark's execution hierarchy?

Options:

A.

Task

B.

Executor

C.

Slot

D.

Job

E.

Stage

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: May 23, 2024
Questions: 180