Databricks-Machine-Learning-Professional Databricks Certified Machine Learning Professional Questions and Answers

Questions 4

A data scientist has written a function to track the runs of their random forest model. The data scientist is changing the number of trees in the forest across each run.

Which of the following MLflow operations is designed to log single values like the number of trees in a random forest?

Options:

mlflow.log_artifact

mlflow.log_model

mlflow.log_metric

mlflow.log_param

There is no way to store values like this.


Questions 5

A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.

Which of the following lines of code can they use to accomplish this task?

Options:

mlflow.sklearn.autolog()

mlflow.spark.autolog()

spark.conf.set("autologging", True)

It is not possible to automatically log MLflow runs.

mlflow.autolog()


Questions 6

Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?

Options:

Streaming

Batch

Edge/on-device

None of these strategies will accomplish the task.

Real-time


Questions 7

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

Options:

Z-Ordering

Bin-packing

Write as a Parquet file

Data skipping

Tuning the file size


Answer: A (Z-Ordering)

Explanation:

Z-Ordering is an optimization technique that can speed up the query by colocating similar records while considering values in multiple columns. It organizes data in storage based on the values of one or more columns by mapping multidimensional data to one dimension while preserving the locality of the data points. Rows with similar values for the specified columns are therefore stored close together in the same set of files. This improves the performance of queries that filter on those columns, as they can skip over irrelevant files or data blocks. Z-Ordering also enhances data skipping and caching, as it reduces the number of distinct values per file for the chosen columns [1]. The other options are incorrect because:

Option B: Bin-packing is an optimization technique that compacts small files into larger ones, but does not colocate similar records based on multiple columns. Bin-packing can improve the performance of queries by reducing the number of files that need to be read, but it does not affect the data layout within the files [2].
Option C: Writing as a Parquet file is not an optimization technique, but a file format choice. Parquet is a columnar storage format that supports efficient compression and encoding schemes. Parquet can improve the performance of queries by reducing the storage footprint and the amount of data transferred, but it does not colocate similar records based on multiple columns [3].
Option D: Data skipping is an optimization technique that skips over files or data blocks that do not match the query predicates, but does not colocate similar records based on multiple columns. Data skipping can improve the performance of queries by avoiding unnecessary data scans, but it depends on the data layout and the metadata collected for each file [4].
Option E: Tuning the file size is an optimization technique that adjusts the size of the data files to a target value, but does not colocate similar records based on multiple columns. Tuning the file size can improve the performance of queries by balancing the trade-off between parallelism and overhead, but it does not affect the data layout within the files [5].

References: Z-Ordering (multi-dimensional clustering), Compaction (bin-packing), Parquet, Data skipping, Tuning file sizes
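
The locality-preserving mapping described in the explanation above can be illustrated with a small, pure-Python sketch. It interleaves the bits of two integer column values (a Morton code) and sorts rows by the result. The function name, the sample rows, and the two-way "file" split are all hypothetical, and this is not a claim about how Delta Lake implements Z-Ordering internally.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return a Z-order (Morton) value for two non-negative integers."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical (x, y) column values for eight rows, in arrival order.
rows = [(0, 0), (3, 3), (0, 1), (2, 3), (1, 0), (3, 2), (1, 1), (2, 2)]

# Sorting by the interleaved value keeps rows that are close in BOTH columns together.
ordered = sorted(rows, key=lambda r: interleave_bits(*r))

# Split the sorted rows into two illustrative "files" of four rows each.
files = [ordered[:4], ordered[4:]]
```

After sorting, the first "file" holds only rows with both values in 0..1 and the second only rows with both values in 2..3, so a filter on either column (or both) can rule out a whole file from its min/max statistics. On Databricks the layout itself is produced by running OPTIMIZE with a ZORDER BY clause on the Delta table.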

Questions 8

A data scientist has developed a scikit-learn random forest model named model, but they have not yet logged model with MLflow. They want to obtain the input schema and the output schema of the model so they can document what type of data is expected as input.

Which of the following MLflow operations can be used to perform this task?

Options:

mlflow.models.schema.infer_schema

mlflow.models.signature.infer_signature

mlflow.models.Model.get_input_schema

mlflow.models.Model.signature

There is no way to obtain the input schema and the output schema of an unlogged model.


Questions 9

Which of the following is a benefit of logging a model signature with an MLflow model?

Options:

The model will have a unique identifier in the MLflow experiment

The schema of input data can be validated when serving models

The model can be deployed using real-time serving tools

The model will be secured by the user that developed it

The schema of input data will be converted to match the signature


Questions 10

A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.

Which of the following deployment strategies can be used to meet these requirements?

Options:

Edge/on-device

Streaming

None of these strategies will meet the requirements.

Batch

Real-time


Questions 11

Which of the following is a simple statistic to monitor for categorical feature drift?

Options:

Mode

None of these

Mode, number of unique values, and percentage of missing values

Percentage of missing values

Number of unique values


Questions 12

Which of the following describes concept drift?

Options:

Concept drift is when there is a change in the distribution of an input variable

Concept drift is when there is a change in the distribution of a target variable

Concept drift is when there is a change in the relationship between input variables and target variables

Concept drift is when there is a change in the distribution of the predicted target given by the model

None of these describe Concept drift


Questions 13

A machine learning engineering manager has asked all of the engineers on their team to add text descriptions to each of the model projects in the MLflow Model Registry. They are starting with the model project "model" and they'd like to add the text in the model_description variable.

The team is using the following line of code:

Which of the following changes does the team need to make to the above code block to accomplish the task?

Options:

Replace update_registered_model with update_model_version

There are no changes necessary

Replace description with artifact

Replace client.update_registered_model with mlflow

Add a Python model as an argument to update_registered_model


Answer: B (There are no changes necessary)

Explanation:

The code block that the team is using is correct and does not need any changes to accomplish the task. The update_registered_model method of the MlflowClient class updates the description of a registered model. The method takes the following parameters:

name: The name of the registered model to update.
description: The new description for the registered model.

The method returns a RegisteredModel object that represents the updated registered model [1]. Renaming a registered model is handled by a separate method, rename_registered_model, which takes name and new_name.

The other options are incorrect because:

A. Replacing update_registered_model with update_model_version would not update the metadata of the registered model, but rather the metadata of a specific model version. The update_model_version method updates the description of a model version and takes the model name, the version, and the new description as parameters; stage changes are made with transition_model_version_stage instead. The method returns a ModelVersion object that represents the updated model version [2].

C. Replacing description with artifact would not update the description of the registered model, but rather raise an error, as artifact is not a valid parameter for the update_registered_model method. Artifact locations are supplied when logging or saving a model, for example through the artifact_path argument of log_model or the path argument of save_model [3][4].
D. Replacing client.update_registered_model with mlflow would not update the registered model, but rather raise an error, as mlflow is not a callable method, but a module that contains various submodules and functions. The mlflow module does not have an update_registered_model function; the method belongs to the MlflowClient class [5].
E. Adding a Python model as an argument to update_registered_model would not update the registered model, but rather raise an error, as a Python model is not a valid argument for the update_registered_model method. The method does not take a model as an argument, only the name and description of the registered model to update. To add a Python model to the Model Registry, the create_model_version method of the MlflowClient class can be used, which takes the registered model name and the source location of the model artifacts, among other parameters. The method returns a ModelVersion object that represents the created model version [6].

References:

mlflow.tracking.client.MlflowClient.update_registered_model — MLflow 2.9.1 documentation
mlflow.tracking.client.MlflowClient.update_model_version — MLflow 2.9.1 documentation
mlflow..log_model — MLflow 2.9.1 documentation
mlflow..save_model — MLflow 2.9.1 documentation
mlflow — MLflow 2.9.1 documentation
mlflow.tracking.client.MlflowClient.create_model_version — MLflow 2.9.1 documentation
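
The question's original code line is not reproduced on this page, so the following is only a minimal sketch of the call the explanation describes, not the team's actual code. The helper name set_model_description is hypothetical; it accepts any object that exposes update_registered_model(name, description), which is the signature of MlflowClient.update_registered_model.

```python
def set_model_description(client, model_name: str, description: str):
    """Set the description of a registered model.

    `client` is expected to behave like mlflow.tracking.MlflowClient;
    the helper itself is a hypothetical wrapper for illustration.
    """
    return client.update_registered_model(name=model_name, description=description)


# With MLflow installed and a tracking server configured, usage would look like:
#     from mlflow.tracking import MlflowClient
#     client = MlflowClient()
#     set_model_description(client, "model", model_description)
```

Passing the client in as an argument keeps the sketch independent of any particular tracking server and makes it easy to exercise with a stub.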

Questions 14

Which of the following is a probable response to identifying drift in a machine learning application?

Options:

None of these responses

Retraining and deploying a model on more recent data

All of these responses

Rebuilding the machine learning application with a new label variable

Sunsetting the machine learning application


Answer: C (All of these responses)

Explanation:

Drift is a change over time in the statistical properties of the data relative to the data that was used to train a machine learning model. This can cause the model to become less accurate or perform differently than it was designed to [1]. Drift can be detected by monitoring the statistics of the input and output data over time and comparing them with the baseline statistics from the training data [2]. Depending on the type and severity of the drift, different responses may be appropriate. Some possible responses are:

Retraining and deploying a model on more recent data: This can help the model adapt to the changes in the data and improve its performance. However, this may require frequent retraining and deployment cycles, which can be costly and time-consuming. Also, retraining may not be sufficient if the drift is caused by a change in the underlying concept or relationship between the input and output variables [3].
Rebuilding the machine learning application with a new label variable: This can help the model capture the new concept or relationship that has emerged in the data. However, this may require a significant redesign of the application and the data pipeline, as well as collecting and labeling new data. Also, rebuilding may not be feasible if the concept or relationship is constantly changing or unknown [3].
Sunsetting the machine learning application: This can help avoid the risks and costs of maintaining a model that is no longer reliable or useful. However, this may mean losing the benefits and value of the application and the data. Also, sunsetting may not be an option if the application is critical or mandatory for the business or the users [3].

Therefore, all of these responses are probable, depending on the situation and the trade-offs involved. References:

Databricks Machine Learning Professional Exam Guide, Section 4: Solution and Data Monitoring, p. 5
Databricks Machine Learning Documentation, Monitoring ML Models, Data Drift Detection, p. 2-3
A Gentle Introduction to Concept Drift in Machine Learning, Types of Concept Drift, p. 3-4
Understanding Data Drift and Model Drift: Drift Detection in Python, Types of Drift, p. 2-3
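
The monitoring step mentioned in the explanation above, comparing recent statistics against a training baseline, might look like the following pure-Python sketch. The function names, the sample values, and the threshold are hypothetical choices for illustration; production monitoring would normally rely on proper statistical tests rather than a fixed cutoff.

```python
from collections import Counter
from statistics import mean, stdev


def numeric_shift(baseline, recent):
    """Shift of the recent mean, in units of the baseline standard deviation."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


def categorical_summary(values):
    """Mode, number of unique values, and share of missing (None) values."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0] if present else None
    return {
        "mode": mode,
        "n_unique": len(set(present)),
        "pct_missing": 1 - len(present) / len(values),
    }


# Hypothetical numeric feature: training baseline vs. a recent serving window.
baseline_age = [30, 32, 34, 36, 38]
recent_age = [40, 42, 44, 46, 48]

THRESHOLD = 2.0  # hypothetical cutoff, in baseline standard deviations
drifted = numeric_shift(baseline_age, recent_age) > THRESHOLD
```

Here the recent mean sits more than two baseline standard deviations away from the training mean, so drifted is True. For categorical features, the summary can be computed for both windows and compared field by field.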

Questions 15

A machine learning engineer wants to move their model version model_version for the MLflow Model Registry model model from the Staging stage to the Production stage using MLflow Client client.

Which of the following code blocks can they use to accomplish the task?

Options:

Option A

Option B

Option C

Option D

Option E


Questions 16

Which of the following MLflow operations can be used to delete a model from the MLflow Model Registry?

Options:

client.transition_model_version_stage

client.delete_model_version

client.update_registered_model

client.delete_model

client.delete_registered_model


Questions 17

Which of the following statements describes streaming with Spark as a model deployment strategy?

Options:

The inference of batch processed records as soon as a trigger is hit

The inference of all types of records in real-time

The inference of batch processed records as soon as a Spark job is run

The inference of incrementally processed records as soon as a trigger is hit

The inference of incrementally processed records as soon as a Spark job is run


Answer: D (The inference of incrementally processed records as soon as a trigger is hit)

Explanation:

Streaming with Spark as a model deployment strategy means applying a machine learning model to data streams that are processed incrementally and continuously by Spark Structured Streaming. Spark Structured Streaming is a scalable and fault-tolerant stream processing engine that enables complex analytics on live data streams using the Dataset/DataFrame API [1]. It supports various sources and sinks for streaming data, such as Kafka, Kinesis, TCP sockets, and Delta tables [2], as well as operations such as aggregations, windowing, joins, and stateful transformations [3].

To deploy a machine learning model on streaming data, you can use the MLflow Model Registry to manage the model lifecycle and versioning [4]. You can also use MLflow Model Serving to serve the model as a REST API endpoint that can be invoked from a Structured Streaming job [5]. Alternatively, you can apply the model to the streaming data within Spark Structured Streaming as a UDF (user-defined function) [6].

The inference of incrementally processed records as soon as a trigger is hit describes streaming with Spark as a model deployment strategy. A trigger defines when a streaming query processes the data that has arrived since the previous run. In Structured Streaming, a trigger can be the default (the next micro-batch starts as soon as the previous one finishes), a fixed processing-time interval, a one-time run over the data currently available, or an experimental continuous mode. The trigger ensures that the streaming query is executed incrementally, and the model inference is applied to the latest available data. The other options are incorrect because:

Option A: The inference of batch processed records as soon as a trigger is hit does not describe streaming with Spark, but rather batch processing with Spark. Batch processing means applying a machine learning model to a finite set of data that is processed as a single job. Batch processing does not require a trigger, as the results are written to the output sink when the job is completed.
Option B: The inference of all types of records in real-time does not describe streaming with Spark, but rather a generic definition of real-time processing. Real-time processing means applying a machine learning model to data as soon as it arrives, with minimal latency. Real-time processing does not necessarily use Spark Structured Streaming, as other frameworks and tools can support it, such as Apache Flink and Apache Storm.
Option C: The inference of batch processed records as soon as a Spark job is run does not describe streaming with Spark, but rather batch processing with Spark. Batch processing means applying a machine learning model to a finite set of data that is processed as a single job. Batch inference is also not tied to Spark, since it can be run outside of Spark, for example through a REST API endpoint or a command-line tool.
Option E: The inference of incrementally processed records as soon as a Spark job is run does not describe streaming with Spark, but rather a contradiction. Incrementally processed records imply stream processing, while running inference whenever a Spark job is launched describes batch processing. Stream processing and batch processing are different paradigms of data processing, and cannot be mixed in this way.

References: Structured Streaming Programming Guide, Input Sources and Output Sinks, Operations on streaming DataFrames/Datasets, MLflow Model Registry, MLflow Model Serving, Apply machine learning models, [Triggers], [Trigger Types], [Batch Processing], [Real-time Processing], [Real-time Data Processing Frameworks], [Deploy machine learning models], [Batch vs Streaming Processing]
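
The idea of scoring only newly arrived records each time a trigger fires can be illustrated without a cluster. The sketch below is a pure-Python simulation, not Spark code: the function names, the offset bookkeeping, and the stand-in predict function are all hypothetical. In Structured Streaming the equivalent timing is configured on the query, for example with writeStream.trigger(processingTime="10 seconds").

```python
def run_streaming_inference(arrivals_per_trigger, predict):
    """Simulate trigger-driven incremental inference.

    `arrivals_per_trigger` lists the records that arrive before each trigger.
    Each trigger scores only records that have not been scored yet.
    """
    log = []      # everything that has arrived so far (the "stream")
    offset = 0    # index of the first record not yet scored
    results = []
    for arrivals in arrivals_per_trigger:
        log.extend(arrivals)          # new data lands between triggers
        new_records = log[offset:]    # trigger fires: take only unprocessed records
        results.append([predict(r) for r in new_records])
        offset = len(log)             # remember progress, like a checkpoint
    return results


def predict(x):
    """Hypothetical stand-in for a trained model."""
    return x * 2


batches = run_streaming_inference([[1, 2], [], [3]], predict)
```

Each inner list in batches holds the predictions produced by one trigger. The second trigger produces an empty list because no new records arrived, which is the difference from re-scoring the full dataset in a batch job.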

Questions 18

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Options:

Indent the child run blocks within the parent run block

Add the nested=True argument to the parent run

Remove the nested=True argument from the child runs

Provide the same name to the run_name parameter for all three run blocks

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs



Exam Code: Databricks-Machine-Learning-Professional

Exam Name: Databricks Certified Machine Learning Professional

Last Update: Jul 2, 2025

Questions: 60
