Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Question 4

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

PIVOT

CONVERT

WHERE

TRANSFORM

SUM
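
To illustrate what "long to wide" means, the minimal Python sketch below reshapes long-format rows (one row per key/attribute pair) into a wide layout (one row per key, one column per attribute). This is the reshaping that the PIVOT clause performs in Spark SQL. The sample data and the helper name `pivot_long_to_wide` are hypothetical, and the sketch uses plain Python rather than Spark.

```python
from collections import defaultdict


def pivot_long_to_wide(rows, key, column, value):
    """Reshape long rows into wide rows, summing duplicate (key, column) values.

    Hypothetical helper: mimics the effect of a SQL PIVOT with a SUM aggregate.
    """
    wide = defaultdict(dict)
    for row in rows:
        cell = wide[row[key]]
        # Aggregate when the same key/column pair appears more than once.
        cell[row[column]] = cell.get(row[column], 0) + row[value]
    return {k: dict(v) for k, v in wide.items()}


# Long format: one row per (store, month) measurement.
long_rows = [
    {"store": "A", "month": "Mar", "amount": 10},
    {"store": "A", "month": "Apr", "amount": 15},
    {"store": "B", "month": "Mar", "amount": 7},
    {"store": "A", "month": "Mar", "amount": 5},
]

wide = pivot_long_to_wide(long_rows, key="store", column="month", value="amount")
print(wide)  # {'A': {'Mar': 15, 'Apr': 15}, 'B': {'Mar': 7}}
```

In Spark SQL, a roughly equivalent query against a hypothetical table `sales_long(store, month, amount)` would be `SELECT * FROM sales_long PIVOT (SUM(amount) FOR month IN ('Mar', 'Apr'))`. Spark would return NULL for a missing store/month pair, whereas the sketch simply omits it.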

Question 5

A data engineer is attempting to write Python and SQL in the same command cell and is running into an error. The engineer thought that it was possible to use a Python variable in a SELECT statement.

Why does the command fail?

Options:

Databricks supports multiple languages but only one per notebook.

Databricks supports language interoperability in the same cell but only between Scala and SQL.

Databricks supports language interoperability but only if a special character is used.

Databricks supports one language per cell.

Question 6

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.

Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

Options:

Databricks account representative

This transfer is not possible

Workspace administrator

New lead data engineer

Original data engineer

Question 7

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using Continuous pipeline mode.

What is the expected outcome after clicking Start to update the pipeline, assuming previously unprocessed data exists and all definitions are valid?

Options:

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Question 8

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A key-value pair configuration

The preferred DBU/hour cost

A path to a cloud storage location for the written data

A location of a target database for the written data

At least one notebook library to be executed

Question 9

Which of the following tools is used by Auto Loader to process data incrementally?

Options:

Checkpointing

Spark Structured Streaming

Data Explorer

Unity Catalog

Databricks SQL

Question 10

Which of the following commands will return the number of null values in the member_id column?

Options:

SELECT count(member_id) FROM my_table;

SELECT count(member_id) - count_null(member_id) FROM my_table;

SELECT count_if(member_id IS NULL) FROM my_table;

SELECT null(member_id) FROM my_table;

SELECT count_null(member_id) FROM my_table;
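
A minimal sketch of counting the nulls in a column, using Python's built-in sqlite3 module as a stand-in for Databricks SQL. SQLite has no `count_if` function, so the sketch expresses the same condition with a CASE expression and cross-checks it with `count(*) - count(member_id)`, which works because `count(column)` skips nulls. The table contents are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical my_table containing some null member_id values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (member_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table (member_id) VALUES (?)",
    [(1,), (None,), (3,), (None,), (None,)],
)

# Count rows where the predicate is true (what count_if(member_id IS NULL) expresses).
null_count = conn.execute(
    "SELECT SUM(CASE WHEN member_id IS NULL THEN 1 ELSE 0 END) FROM my_table"
).fetchone()[0]

# Cross-check: count(*) counts every row, while count(member_id) skips nulls.
diff_count = conn.execute(
    "SELECT count(*) - count(member_id) FROM my_table"
).fetchone()[0]

print(null_count, diff_count)  # 3 3
conn.close()
```

In Databricks SQL, `count_if(member_id IS NULL)` expresses the same count directly, without the CASE expression.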

Question 11

A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the quality schema.

Which set of SQL commands will grant the group manufacturing-team the least privileges needed to create tables in a schema named production whose parent catalog is named manufacturing?

Options:

Option A

Option B

Option C

Option D

Question 12

Which of the following data workloads will utilize a Gold table as its source?

Options:

A job that enriches data by parsing its timestamps into a human-readable format

A job that aggregates uncleaned data to create standard summary statistics

A job that cleans data by removing malformatted records

A job that queries aggregated data designed to feed into a dashboard

A job that ingests raw data from a streaming source into the Lakehouse

Question 13

A data engineer and a data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

The pipeline can have different notebook sources in SQL & Python.

The pipeline will need to be written entirely in SQL.

The pipeline will need to be written entirely in Python.

The pipeline will need to use a batch source in place of a streaming source.

Question 14

A data engineer manages multiple external tables linked to various data sources. The data engineer wants to manage these external tables efficiently and ensure that only the necessary permissions are granted to users for accessing specific external tables.

How should the data engineer manage access to these external tables?

Options:

Create a single user role with full access to all external tables and assign it to all users.

Use Unity Catalog to manage access controls and permissions for each external table individually.

Set up Azure Blob Storage permissions at the container level, allowing access to all external tables.

Grant permissions on the Databricks workspace level, which will automatically apply to all external tables.

Question 15

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

Checkpointing and Write-ahead Logs

Structured Streaming cannot record the offset range of the data being processed in each trigger.

Replayable Sources and Idempotent Sinks

Write-ahead Logs and Idempotent Sinks

Checkpointing and Idempotent Sinks

Question 16

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A data lakehouse provides storage solutions for structured and unstructured data.

A data lakehouse supports ACID-compliant transactions.

A data lakehouse allows the use of SQL queries to examine data.

A data lakehouse stores data in open formats.

A data lakehouse enables machine learning and artificial intelligence workloads.

Question 17

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

There was a type mismatch between the specified schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data

Question 18

Which SQL code snippet correctly demonstrates a Data Definition Language (DDL) operation used to create a table?

Options:

DROP TABLE employees;

INSERT INTO employees (id, name) VALUES (1, 'Alice');

CREATE TABLE employees (id INT, name STRING);

ALTER TABLE employees add column salary DECIMAL(10,2);
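
To illustrate the difference between DDL and DML, the sketch below runs statements like those in the options through Python's built-in sqlite3 module, used here only as a stand-in for Databricks SQL. SQLite accepts the type names `STRING` and `DECIMAL(10,2)` but does not give them Spark's type semantics. `CREATE TABLE`, `ALTER TABLE`, and `DROP TABLE` change the schema (DDL), while `INSERT INTO` changes the rows (DML). The helper `table_columns` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_columns(name):
    """Hypothetical helper: return a table's column names via SQLite's PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]


# DDL: CREATE TABLE defines a new table and its schema.
conn.execute("CREATE TABLE employees (id INT, name STRING)")
cols_after_create = table_columns("employees")

# DML: INSERT INTO adds a row but leaves the schema unchanged.
conn.execute("INSERT INTO employees (id, name) VALUES (1, 'Alice')")
cols_after_insert = table_columns("employees")

# DDL: ALTER TABLE modifies the schema of an existing table.
conn.execute("ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2)")
cols_after_alter = table_columns("employees")

# DDL: DROP TABLE removes the table definition entirely.
conn.execute("DROP TABLE employees")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(cols_after_create, cols_after_insert, cols_after_alter, remaining)
conn.close()
```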

Question 19

A data engineer is getting a partner organization up to speed with a Databricks account. Both teams share some business use cases. The data engineer has to share some Unity Catalog-managed Delta tables, along with the notebook jobs that create those tables, with the partner organization.

How can the data engineer seamlessly share the required information?

Options:

Zip all the code and share via email and allow data ingestion from your data lake

Data and Notebooks can be shared simply using Unity Catalog.

Share access to the codebase via GitHub and allow them to ingest datasets from your data lake.

Share required datasets and notebooks via Delta Sharing. Manage permissions via Unity Catalog.

Question 20

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?