Pre-Summer Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: cramtick70

Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

A Delta Live Table pipeline includes two datasets defined using streaming live table. Three datasets are defined against Delta Lake table sources using live table.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Buy Now
Questions 5

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.

They could submit a feature request with Databricks to add this functionality.

B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.

They could only run the entire program on Sundays.

D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.

They could redesign the data model to separate the data used in the final query into a new table.

Buy Now
Questions 6

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

Buy Now
Questions 7

A Databricks single-task workflow fails at the last task due to an error in a notebook. The data engineer fixes the mistake in the notebook. What should the data engineer do to rerun the workflow?

Options:

A.

Repair the task

B.

Rerun the pipeline

C.

Restart the Cluster

D.

Switch the cluster

Buy Now
Questions 8

A data architect has determined that a table of the following format is necessary:

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 9

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Development mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

Buy Now
Questions 10

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:

A.

Manually programming in an alert system in each cell of the Notebook

B.

Setting up an Alert in the Job page

C.

Setting up an Alert in the Notebook

D.

There is no way to notify the Job owner in the case of Job failure

E.

MLflow Model Registry Webhooks

Buy Now
Questions 11

In which of the following file formats is data from Delta Lake tables primarily stored?

Options:

A.

Delta

B.

CSV

C.

Parquet

D.

JSON

E.

A proprietary, optimized format specific to Databricks

Buy Now
Questions 12

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?

Options:

A.

None of these

B.

Data lake

C.

Data warehouse

D.

All of these

E.

Data lakehouse

Buy Now
Questions 13

A data engineer needs to conduct Exploratory Data Analysis (EDA) on data residing in a database within the company’s custom-defined cloud network . The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Options:

A.

All-purpose compute cluster

B.

Pro SQL Warehouse

C.

SQL Serverless Warehouse

D.

Classic SQL Warehouse

Buy Now
Questions 14

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

Options:

A.

The VACUUM command was run on the table

B.

The TIME TRAVEL command was run on the table

C.

The DELETE HISTORY command was run on the table

D.

The OPTIMIZE command was nun on the table

E.

The HISTORY command was run on the table

Buy Now
Questions 15

A global retail company sells products across multiple categories (e.g.. Electronics, Clothing) and regions (e.g.. North. South, East. West). The sales team has provided the data engineer with a PySpark dataframe named sales_df as below and the team wants the data engineer to analyze the sales data to help them make strategic decisions.

Options:

A.

Category_sales = sales df.groupBy( " category " ).agg(sum( " sales amount " ) .alias ( " total sales amount " ))

B.

Category_sales = sales_df.sum( " 3ales_amount " ). g-1- upBy( " categcryn).alias( " toLal_sales_amount))

C.

Category_sale: .es df -agg (sum ( " sales amount " ) .-;r*i:rRy ( " category " ) .alias ( " total sa.en amount " ))

D.

Category_sales = sales_df.groupBy( " reqion " ). agq(sum( " sales_amountn).alias(ntotal_sales_amount ' ' ))

Buy Now
Questions 16

A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical location.

Which of the following data entities should the data engineer create?

Options:

A.

Database

B.

Function

C.

View

D.

Temporary view

E.

Table

Buy Now
Questions 17

A data engineer is setting up access control in Unity Catalog and needs to ensure that a group of data analysts can query tables but not modify data.

Which permission should the data engineer grant to the data analysts?

Options:

A.

SELECT

B.

INSERT

C.

MODIFY

D.

ALL PRIVILEGES

Buy Now
Questions 18

A data engineer has joined an existing project and they see the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS

SELECT customer_id -

FROM STREAM(LIVE.customers)

WHERE loyalty_level = ' high ' ;

Which of the following describes why the STREAM function is included in the query?

Options:

A.

The STREAM function is not needed and will cause an error.

B.

The table being created is a live table.

C.

The customers table is a streaming live table.

D.

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

E.

The data in the customers table has been updated since its last run.

Buy Now
Questions 19

A company is collaborating with a partner that does not use Databricks but needs access to a large historical dataset stored in Delta format. The data engineer needs to ensure that the partner can access the data securely, without the need for them to set up an account, and with read-only access.

How should the data be shared?

Options:

A.

Share the dataset using Delta Sharing, which allows your partner to access the data using a secure, read-only URL without requiring a Databricks account, ensuring that they cannot modify the data.

B.

Share the dataset using Unity Catalog, ensuring that both teams have full write access to the data within the same organization.

C.

Share the dataset by exporting it to a CSV file and manually transferring the file to the partner ' s system.

D.

Grant your partner access to your Databricks workspace and assign them full write permissions to the Delta table, enabling them to modify the dataset.

Buy Now
Questions 20

A data engineer is migrating pipeline tasks to reduce operational toil. The workspace uses Unity Catalog and is in a region that supports serverless. The engineer wants Databricks to auto-select instance types, manage scaling, apply Photon, and handle runtime upgrades automatically for job runs.

How should the data engineer meet this requirement while adhering to Databricks constraints?

Options:

A.

Use a Pro SQL warehouse and schedule Python notebook tasks to execute as pipeline steps.

B.

Use an all-purpose cluster with cluster policies to enforce standard sizes and enable autoscaling.

C.

Create a job with a single-task job cluster and manually set the instance families and minimum/maximum workers.

D.

Run the job on a serverless compute for workflows configuration, ensuring Unity Catalog is enabled and regional support is available.

Buy Now
Questions 21

A data engineer needs to combine sales data from an on-premises PostgreSQL database with customer data in Azure Synapse for a comprehensive report. The goal is to avoid data duplication and ensure up-to-date information

How should the data engineer achieve this using Databricks?

Options:

A.

Develop custom ETL pipelines to ingest data into Databricks

B.

Use Lakehouse Federation to query both data sources directly

C.

Manually synchronize data from both sources into a single database

D.

Export data from both sources to CSV files and upload them to Databricks

Buy Now
Questions 22

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A.

The table’s data was larger than 10 GB

B.

The table’s data was smaller than 10 GB

C.

The table was external

D.

The table did not have a location

E.

The table was managed

Buy Now
Questions 23

An organization plans to share a large dataset stored in a Databricks workspace on AWS with a partner organization whose Databricks workspace is hosted on Azure. The data engineer wants to minimize data transfer costs while ensuring secure and efficient data sharing.

Which strategy will reduce data egress costs associated with cross-cloud data sharing?

Options:

A.

Sharing data via pre-signed URLs without monitoring egress costs

B.

Migrating the dataset to Cloudflare R2 object storage before sharing

C.

Configure VPN connection between AWS and Azure for faster data sharing

D.

Using Delta Sharing without any additional configurations

Buy Now
Questions 24

A data engineer is getting a partner organization up to speed with Databricks account. Both teams share some business use cases. The data engineer has to share some of your Unity-Catalog managed delta tables and the notebook jobs creating those tables with the partner organization.

How can the data engineer seamlessly share the required information?

Options:

A.

Zip all the code and share via email and allow data ingestion from your data lake

B.

Data and Notebooks can be shared simply using Unity Catalog.

C.

Share access to codebase via Github and allow them to ingest datasets from your Datalake.

D.

Share required datasets and notebooks via Delta Sharing. Manage permissions via Unity Catalog.

Buy Now
Questions 25

A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table:

Which of the following changes needs to be made so this code block will work when the transactions table is a stream source?

Options:

A.

Replace predict with a stream-friendly prediction function

B.

Replace schema(schema) with option ( " maxFilesPerTrigger " , 1)

C.

Replace " transactions " with the path to the location of the Delta table

D.

Replace format( " delta " ) with format( " stream " )

E.

Replace spark.read with spark.readStream

Buy Now
Questions 26

Which compute option should be chosen in a scenario where small-scale ad hoc Python scripts need to be run at high frequency and should wind down quickly after these queries have finished running?

Options:

A.

All-purpose cluster

B.

Job cluster

C.

Serverless compute

D.

SQL Warehouse

Buy Now
Questions 27

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.

Which of the following Git operations does the data engineer need to run to accomplish this task?

Options:

A.

Merge

B.

Push

C.

Pull

D.

Commit

E.

Clone

Buy Now
Questions 28

A data engineer that is new to using Python needs to create a Python function to add two integers together and return the sum?

Which of the following code blocks can the data engineer use to complete this task?

A)

B)

C)

D)

E)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 29

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?

Options:

A.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsINNER JOIN SELECT * FROM april_transactions;

B.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsUNION SELECT * FROM april_transactions;

C.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsOUTER JOIN SELECT * FROM april_transactions;

D.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsINTERSECT SELECT * from april_transactions;

E.

CREATE TABLE all_transactions ASSELECT * FROM march_transactionsMERGE SELECT * FROM april_transactions;

Buy Now
Questions 30

Which of the following commands will return the location of database customer360?

Options:

A.

DESCRIBE LOCATION customer360;

B.

DROP DATABASE customer360;

C.

DESCRIBE DATABASE customer360;

D.

ALTER DATABASE customer360 SET DBPROPERTIES ( ' location ' = ' /user ' };

E.

USE DATABASE customer360;

Buy Now
Questions 31

A data engineer wants to create a new table containing the names of customers that live in France.

They have written the following command:

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

There is no way to indicate whether a table contains PII.

B.

" COMMENT PII "

C.

TBLPROPERTIES PII

D.

COMMENT " Contains PII "

E.

PII

Buy Now
Questions 32

A data engineer is standardizing repository layouts for multiple teams adopting Databricks Asset Bundles. The engineer wants to ensure every project has a single authoritative configuration file at the repository root that defines the bundle name, targets, workspace settings, permissions, and resource mappings (for jobs and pipelines).

Which strategy should the data engineer use to meet this goal?

Options:

A.

Place multiple databricks.yml files under each subfolder (for example, jobs/, pipelines/, workspace/) and merge them at deploy time using the include mapping.

B.

Place exactly one databricks.yml at the repository root; it is the main configuration file and may reference additional configuration files via the include mapping.

C.

Place a databricks.yml in a .databricks/ hidden folder at the repository root; only hidden locations are valid for bundle configs.

D.

Place a databricks.yml at the repository root and optional databricks.yml in subfolders; the CLI prefers .yaml over .yml when both exist.

Buy Now
Questions 33

Which Databricks Asset Bundle format is valid?

Options:

A.

resources:

jobs:

hello-job:

name: hello-job

tasks:

- task_key: hello-task

existing_cluster_id: 1234-567890-abcde123

notebook_task:

notebook_path: ./hello.py

B.

{

" resources " : {

" jobs " : {

" name " : " hello-job " ,

" tasks " : {

" task_key " : " hello-task " ,

" existing_cluster_id " : " 1234-567890-abcde123 " ,

" notebook_task " : {

" notebook_path " : " ./hello.py "

}

}

}

}

}

C.

configuration = {

" resources " : {

" jobs " : {

" name " : " hello-job " ,

" tasks " : {

" task_key " : " hello-task " ,

" existing_cluster_id " : " 1234-567890-abcde123 " ,

" notebook_task " : {

" notebook_path " : " ./hello.py "

}

}

}

}

}

D.

resources {

jobs {

name = " hello-job "

tasks {

task_key = " hello-task "

existing_cluster_id = " 1234-567890-abcde123 "

notebook_task {

notebook_path = " ./hello.py "

}

}

}

}

Buy Now
Questions 34

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when It is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

O They can reduce the cluster size of the SQL endpoint.

B.

Q They can turn on the Auto Stop feature for the SQL endpoint.

C.

O They can set up the dashboard ' s SQL endpoint to be serverless.

D.

0 They can ensure the dashboard ' s SQL endpoint matches each of the queries ' SQL endpoints.

Buy Now
Questions 35

A company uses Delta Sharing to collaborate with partners across different cloud providers and geographic regions. What will result in additional costs due to cross-region or egress fees?

Options:

A.

Transferring data via Delta Sharing across clouds and across different geographic regions

B.

Sharing data within the same cloud provider and region

C.

Utilizing Delta Sharing for internal data analytics within a single cloud environment

D.

Accessing Delta Sharing data using a VPN within the same data center

Buy Now
Questions 36

Which method should a Data Engineer apply to ensure Workflows are being triggered on schedule?

Options:

A.

Scheduled Workflows require an always-running cluster, which is more expensive but reduces processing latency.

B.

Scheduled Workflows process data as it arrives at configured sources.

C.

Scheduled Workflows can reduce resource consumption and expense since the cluster runs only long enough to execute the pipeline.

D.

Scheduled Workflows run continuously until manually stopped.

Buy Now
Questions 37

A data engineer needs to create a table in Databricks using data from their organization ' s existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360

USING

OPTIONS (

url " jdbc:sqlite:/customers.db " , dbtable " customer360 "

)

Which line of code fills in the above blank to successfully complete the task?

Options:

A.

autoloader

B.

org.apache.spark.sql.jdbc

C.

sqlite

D.

org.apache.spark.sql.sqlite

Buy Now
Questions 38

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

A.

Worker node

B.

JDBC data source

C.

Databricks web application

D.

Databricks Filesystem

E.

Driver node

Buy Now
Questions 39

A data engineer needs to conduct Exploratory Analysis on data residing in a database that is within the company ' s custom-defined network in the cloud. The data engineer is using SQL for this task.

Which type of SQL Warehouse will enable the data engineer to process large numbers of queries quickly and cost-effectively?

Options:

A.

Serverless compute for notebooks

B.

Serverless SQL Warehouse

C.

Classic SQL Warehouse

D.

Pro SQL Warehouse

Buy Now
Questions 40

A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.

Which of the following actions can the data engineer perform to improve the start up time for the clusters used for the Job?

Options:

A.

They can use endpoints available in Databricks SQL

B.

They can use jobs clusters instead of all-purpose clusters

C.

They can configure the clusters to be single-node

D.

They can use clusters that are from a cluster pool

E.

They can configure the clusters to autoscale for larger data sizes

Buy Now
Questions 41

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

grant all privileges on table sales TO team;

B.

GRANT SELECT ON TABLE sales TO team;

C.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

D.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Buy Now
Questions 42

A Data Engineer is building a simple data pipeline using Delta Live Tables (DLT) in Databricksto ingest customer data. The raw customer data is stored in a cloud storage location in JSON format. The task is to create a DLT pipeline that reads the rawJSON data and writes it into a Delta table for further processing.

Which code snippet will correctly ingest the raw JSON data and create a Delta table using DLT?

A)

B)

C)

D)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Buy Now
Questions 43

A data engineer is onboarding a new Bronze ingestion pipeline in Databricks with Unity Catalog. The team wants Databricks to handle storage layout, apply platform optimizations over time, and simplify lifecycle management so that when a table is dropped, its underlying data is also cleaned up according to Databricks-managed retention policies.

Which table type should the data engineer create for these ingestion tables?

Options:

A.

Managed tables so that Unity Catalog manages both metadata and underlying data lifecycle

B.

External tables with a LOCATION pointing to an external volume for full control of file layout

C.

Foreign tables federated from an external catalog to delegate optimization to the source system

D.

Temporary views over files to avoid table-level governance and lifecycle coupling

Buy Now
Questions 44

What is the functionality of AutoLoader in Databricks?

Options:

A.

Auto Loader automatically ingests and processes new files from cloud storage, handling batch data with support for schema evolution.

B.

Auto Loader automatically ingests and processes new files from cloud storage, handling only streaming data with no support for schema evolution.

C.

Auto Loader automatically ingests and processes new files from cloud storage, handling batch and streaming data with no support for schema evolution.

D.

Auto Loader automatically ingests and processes new files from cloud storage, handling both batch and streaming data with support for schema evolution.

Buy Now
Questions 45

A data engineer is building a nightly batch ETL pipeline that processes very large volumes of raw JSON logs from a data lake into Delta tables for reporting. The data arrives in bulk once per day, and the pipeline takes several hours to complete. Cost efficiency is important , but performance and reliable completion of the pipeline are the highest priorities.

Which type of Databricks cluster should the data engineer configure?

Options:

A.

A job cluster configured to autoscale across multiple workers during the pipeline run

B.

A lightweight single-node cluster with a low worker node count to reduce costs

C.

A high-concurrency cluster designed for interactive SQL workloads

D.

An all-purpose cluster that always runs to ensure low-latency job startup times

Buy Now
Questions 46

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?

Options:

A.

B.

C.

D.

E.

Buy Now
Questions 47

An organization has implemented a data pipeline in Databricks and needs to ensure it can scale automatically based on varying workloads without manual cluster management. The goal is to meet the company’s Service Level Agreements (SLAs), which require high availability and minimal downtime, while Databricks automatically handles resource allocation and optimization.

Which approach fulfills these requirements?

Options:

A.

Use Serverless compute in Databricks to automatically scale and provision resources with minimal manual intervention

B.

Deploy job clusters with fixed configurations, dedicated to specific tasks, without automatic scaling

C.

Use spot instances to allocate resources dynamically while minimizing costs, with potential interruptions

D.

Use interactive clusters in Databricks, adjusting cluster sizes manually based on workload demands

Buy Now
Questions 48

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > ' 2020-01-01 ' ) ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation cause the job to fail.

Buy Now
Questions 49

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

Options:

A.

PIVOT

B.

CONVERT

C.

WHERE

D.

TRANSFORM

E.

SUM

Buy Now
Questions 50

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT

Buy Now
Questions 51

Which two components function in the DB platform architecture’s control plane? (Choose two.)

Options:

A.

Virtual Machines

B.

Compute Orchestration

C.

Serverless Compute

D.

Compute

E.

Unity Catalog

Buy Now
Questions 52

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by " purchase_date " a date column which helps with time-based queries but does not optimize searches on user statistics " customer_id " , a high-cardinality column.

The table is usually queried with filters on " customer_i

d " within specific date ranges, but since this data is spread across multiple files in each partition, it results in full partition scans and increased runtime and costs.

How should the data engineer optimize the Data Layout for efficient reads?

Options:

A.

Alter table implementing liquid clustering on " customerid " while keeping the existing partitioning.

B.

Alter the table to partition by " customer_id " .

C.

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.

Alter the table implementing liquid clustering by " customer_id " and " purchase_date " .

Buy Now
Exam Name: Databricks Certified Data Engineer Associate Exam
Last Update: May 13, 2026
Questions: 176
Databricks-Certified-Data-Engineer-Associate pdf

Databricks-Certified-Data-Engineer-Associate PDF

$25.5  $84.99
Databricks-Certified-Data-Engineer-Associate Engine

Databricks-Certified-Data-Engineer-Associate Testing Engine

$30  $99.99
Databricks-Certified-Data-Engineer-Associate PDF + Engine

Databricks-Certified-Data-Engineer-Associate PDF + Testing Engine

$40.5  $134.99