MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

Options:

LightGBM

Linear learner

К-means clustering

Neural Topic Model (NTM)

Buy Now

Questions 5

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

Options:

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Buy Now

Questions 6

A company is exploring generative AI and wants to add a new product feature. An ML engineer is making API calls from existing Amazon EC2 instances to Amazon Bedrock.

The EC2 instances are in a private subnet and must remain private during the implementation. The EC2 instances have a security group that allows access to all IP addresses in the private subnet.

What should the ML engineer do to establish a connection between the EC2 instances and Amazon Bedrock?

Options:

Modify the security group to allow inbound and outbound traffic to and from Amazon Bedrock.

Use AWS PrivateLink to access Amazon Bedrock through an interface VPC endpoint.

Configure Amazon Bedrock to use the private subnet where the EC2 instances are deployed.

Use AWS Direct Connect to link the VPC to Amazon Bedrock.

Buy Now

Questions 7

A company is developing an internal cost-estimation tool that uses an ML model in Amazon SageMaker AI. Users upload high-resolution images to the tool.

The model must process each image and predict the cost of the object in the image. The model also must notify the user when processing is complete.

Which solution will meet these requirements?

Options:

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Buy Now

Questions 8

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

Options:

Increased anomaly identification frequency and increased recall

Decreased anomaly identification frequency and decreased recall

Increased anomaly identification frequency and decreased recall

Decreased anomaly identification frequency and increased recall

Buy Now

Questions 9

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

Options:

Apply statistics from a well-known dataset to normalize the production samples.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.

Buy Now

Questions 10

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.

The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.

Which solution will meet these requirements?

Options:

Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Buy Now

Questions 11

A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.

The company needs to use the dataset in a solution to determine if a model can predict the target variable.

Which solution will provide this information with the LEAST development effort?

Options:

Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Buy Now

Questions 12

A company uses AWS CodePipeline to orchestrate a continuous integration and continuous delivery (CI/CD) pipeline for ML models and applications.

Select and order the steps from the following list to describe a CI/CD process for a successful deployment. Select each step one time. (Select and order FIVE.)

. CodePipeline deploys ML models and applications to production.

· CodePipeline detects code changes and starts to build automatically.

. Human approval is provided after testing is successful.

. The company builds and deploys ML models and applications to staging servers for testing.

. The company commits code changes or new training datasets to a Git repository.

Options:

Buy Now

Questions 13

Case study

The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.

Which action will meet this requirement with the LEAST operational overhead?

Options:

Use AWS Glue to transform the categorical data into numerical data.

Use AWS Glue to transform the numerical data into categorical data.

Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Buy Now

Answer:

Explanation:

Preparing a training dataset that includes both categorical and numerical data is essential for maximizing the accuracy of a machine learning model. Transforming categorical data into numerical format is a critical step, as most ML algorithms require numerical input.

Why Transform Categorical Data into Numerical Data?

Model Compatibility: Many ML algorithms cannot process categorical data directly and require numerical representations.

Improved Performance: Proper encoding of categorical variables can enhance model accuracy and convergence speed.

Why Use Amazon SageMaker Data Wrangler?

Amazon SageMaker Data Wrangler offers a visual interface with over 300 built-in data transformations, including tools for encoding categorical variables.

Implementation Steps:

Import Data:

Load the dataset into SageMaker Data Wrangler from sources like Amazon S3 or on-premises databases.

Identify Categorical Features:

Use Data Wrangler's data type inference to detect categorical columns.

Apply Categorical Encoding:

Choose appropriate encoding techniques (e.g., one-hot encoding or ordinal encoding) from Data Wrangler's transformation options.

Apply the selected transformation to convert categorical features into numerical format.

Validate Transformations:

Review the transformed dataset to ensure accuracy and completeness.

Advantages of Using SageMaker Data Wrangler:

Ease of Use: Provides a user-friendly interface for data transformation without extensive coding.

Operational Efficiency: Integrates data preparation steps, reducing the need for multiple tools and minimizing operational overhead.

Flexibility: Supports various data sources and transformation techniques, accommodating diverse datasets.

By utilizing SageMaker Data Wrangler to transform categorical data into numerical format, the ML engineer can efficiently prepare the dataset, thereby enhancing the model's accuracy with minimal operational overhead.

Transform Data - Amazon SageMaker

Prepare ML Data with Amazon SageMaker Data Wrangler

Questions 14

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The dataset contains two string variables: location and job_seniority_level.

The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

Options:

Apply tokenization to location. Apply ordinal encoding to job_seniority_level.

Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level.

Apply binning to location. Apply standard scaling to job_seniority_level.

Apply one-hot encoding to location. Apply standard scaling to job_seniority_level.

Buy Now

Questions 15

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Buy Now

Questions 16

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

Options:

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

Create a model group for each category. Move the existing models into these category model groups.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Buy Now

Questions 17

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Buy Now

Questions 18

A company has multiple models that are hosted on Amazon SageMaker Al. The models need to be re-trained. The requirements for each model are different, so the company needs to choose different deployment strategies to transfer all requests to a new model.

Select the correct strategy from the following list for each requirement. Select each strategy one time. (Select THREE.)

. Canary traffic shifting

. Linear traffic shifting guardrail

. All at once traffic shifting

Options:

Buy Now

Questions 19

A healthcare analytics company wants to segment patients into groups that have similar risk factors to develop personalized treatment plans. The company has a dataset that includes patient health records, medication history, and lifestyle changes. The company must identify the appropriate algorithm to determine the number of groups by using hyperparameters.

Which solution will meet these requirements?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to control tree complexity for risk groups.

Use the Amazon SageMaker k-means clustering algorithm. Set k to specify the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to determine the number of training iterations for risk groups.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set a contamination hyperparameter for risk anomaly detection.

Buy Now

Questions 20

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

Options:

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Buy Now

Questions 21

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

Options:

Buy Now

Questions 22

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company's website, and 2,000 available airports. The data has several categorical features with a target column that is expected to have a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

Options:

Use the CatBoost algorithm to recommend the next airport destination.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

Use the Factorization Machines algorithm to recommend the next airport destination.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.

Buy Now

Questions 23

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

CSV files compressed with Snappy

JSON objects in JSONL format

JSON files compressed with gzip

Apache Parquet files

Buy Now

Questions 24

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

Options:

Spot Instances

Reserved Instances

On-Demand Instances

Dedicated Instances

Buy Now

Questions 25

A company has trained and deployed an ML model by using Amazon SageMaker. The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The solution also must provide a notification when the number of API call events breaches a threshold.

Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.

Which solution will meet these requirements?

Options:

Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.

Use SageMaker Debugger to track the inferences and to report metrics. Use the tensor_variance built-in rule to provide a notification when the threshold is breached.

Log all the endpoint invocation API events by using AWS CloudTrail. Use an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Add the Invocations metric to an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Buy Now

Questions 26

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Buy Now

Questions 27

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

Concept drift occurred in the underlying customer data that was used for predictions.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

The original baseline data had a data quality issue of missing values.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Buy Now

Questions 28

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.

Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.

Buy Now

Questions 29

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Buy Now

Questions 30

A company ingests sales transaction data using Amazon Data Firehose into Amazon OpenSearch Service. The Firehose buffer interval is set to 60 seconds.

The company needs sub-second latency for a real-time OpenSearch dashboard.

Which architectural change will meet this requirement?

Options:

Use zero buffering in the Firehose stream and tune the PutRecordBatch batch size.

Replace Firehose with AWS DataSync and enhanced fan-out consumers.

Increase the Firehose buffer interval to 120 seconds.

Replace Firehose with Amazon SQS.

Buy Now

Questions 31

A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.

The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.

Which metric should the ML engineer use for the model recalibration?

Options:

Accuracy

Precision

Recall

Specificity

Buy Now

Questions 32

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

Set up SageMaker Debugger and create a custom rule.

Set up blue/green deployments with all-at-once traffic shifting.

Set up blue/green deployments with canary traffic shifting.

Set up shadow testing with a shadow variant of the new model.

Buy Now

Questions 33

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account В in the same Region.

Which solution will meet this requirement with the LEAST development effort?

Options:

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

Use AWS DataSync to replicate the model from Account A to Account B.

Create an AWS Site-to-Site VPN connection between Account A and Account В to transfer the model.

Buy Now

Questions 34

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

Enable network isolation.

Configure traffic encryption by using security groups.

Enable inter-container traffic encryption.

Enable VPC flow logs.

Buy Now

Questions 35

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

Options:

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Buy Now

Questions 36

A company is developing ML models by using PyTorch and TensorFlow estimators with Amazon SageMaker AI. An ML engineer configures the SageMaker AI estimator and now needs to initiate a training job that uses a training dataset.

Which SageMaker AI SDK method can initiate the training job?

Options:

fit method

create_model method

deploy method

predict method

Buy Now

Questions 37

A company uses a training job on Amazon SageMaker Al to train a neural network. The job first trains a model and then evaluates the model's performance ag

test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

. Change the epoch count.

. Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

. Use early stopping on the training job.

· Use the SageMaker Al distributed data parallelism (SMDDP) library.

. Stop the training job.

Options:

Buy Now

Questions 38

An ML engineer is training a simple neural network model. The model’s performance improves initially and then degrades after a certain number of epochs.

Which solutions will mitigate this problem? (Select TWO.)

Options:

Enable early stopping on the model.

Increase dropout in the layers.

Increase the number of layers.

Increase the number of neurons.

Investigate and reduce the sources of model bias.

Buy Now

Questions 39

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

Grid search

Random search

Bayesian optimization

Hyperband

Buy Now

Questions 40

A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups.

The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker.

Which solution will provide the ML engineers with the appropriate access?

Options:

Enable S3 bucket versioning.

Configure S3 Object Lock settings for each user.

Add cross-origin resource sharing (CORS) policies to the S3 buckets.

Create IAM policies. Attach the policies to IAM users or IAM roles.

Buy Now

Questions 41

A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.

Which solution will set up the required online validation with the LEAST operational overhead?

Options:

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Buy Now

Answer:

Explanation:

Scenario: The company wants to perform online validation of a new ML model on 10% of the traffic before fully deploying the model in production. The setup must have minimal operational overhead.

Why Use SageMaker Production Variants?

Built-In Traffic Splitting: Amazon SageMaker endpoints support production variants, allowing multiple models to run on a single endpoint. You can direct a percentage of incoming traffic to each variant by adjusting the variant weights.

Ease of Management: Using production variants eliminates the need for additional infrastructure like separate endpoints or custom ALB configurations.

Monitoring with CloudWatch: SageMaker automatically integrates with CloudWatch, enabling real-time monitoring of model performance and invocation metrics.

Steps to Implement:

Deploy the New Model as a Production Variant:

Update the existing SageMaker endpoint to include the new model as a production variant. This can be done via the SageMaker console, CLI, or SDK.

Example SDK Code:

import boto3

sm_client = boto3.client('sagemaker')

response = sm_client.update_endpoint_weights_and_capacities(

EndpointName='existing-endpoint-name',

DesiredWeightsAndCapacities=[

{'VariantName': 'current-model', 'DesiredWeight': 0.9},

{'VariantName': 'new-model', 'DesiredWeight': 0.1}

]

)

Set the Variant Weight:

Assign a weight of 0.1 to the new model and 0.9 to the existing model. This ensures 10% of traffic goes to the new model while the remaining 90% continues to use the current model.

Monitor the Performance:

Use Amazon CloudWatch metrics, such as InvocationCount and ModelLatency, to monitor the traffic and performance of each variant.

Validate the Results:

Analyze the performance of the new model based on metrics like accuracy, latency, and failure rates.

Why Not the Other Options?

Option B: Setting the weight to 1 directs all traffic to the new model, which does not meet the requirement of splitting traffic for validation.

Option C: Creating a new endpoint introduces additional operational overhead for traffic routing and monitoring, which is unnecessary given SageMaker's built-in production variant capability.

Option D: Configuring the ALB to route traffic requires manual setup and lacks SageMaker's seamless variant monitoring and traffic splitting features.

Conclusion:

Using production variants with a weight of 0.1 for the new model on the existing SageMaker endpoint provides the required traffic split for online validation with minimal operational overhead.

[References:, Amazon SageMaker Endpoints, SageMaker Production Variants, Monitoring SageMaker Endpoints with CloudWatch, , , ]

Questions 42

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

Options:

Use Amazon Made to categorize the sensitive data.

Prepare the data by using AWS Glue DataBrew.

Run an AWS Batch job to change the sensitive data to random values.

Run an Amazon EMR job to change the sensitive data to random values.

Buy Now

Questions 43

A company has significantly increased the amount of data stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than before.

An ML engineer must implement a solution to optimize the data for query performance with the LEAST operational overhead.

Which solution will meet this requirement?

Options:

Configure an AWS Lambda function to split the .csv files into smaller objects.

Configure an AWS Glue job to drop string-type columns and save the results to S3.

Configure an AWS Glue ETL job to convert the .csv files to Apache Parquet format.

Configure an Amazon EMR cluster to process the data in S3.

Buy Now

Questions 44

A company is training a deep learning model to detect abnormalities in images. The company has limited GPU resources and a large hyperparameter space to explore. The company needs to test different configurations and avoid wasting computation time on poorly performing models that show weak validation accuracy in early epochs.

Which hyperparameter optimization strategy should the company use?

Options:

Grid search across all possible combinations

Bayesian optimization with early stopping

Manual tuning of each parameter individually

Exhaustive search without early stopping

Buy Now

Questions 45

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

Use a multi-model serverless endpoint. Enable caching.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Buy Now

Questions 46

A company is using Amazon SageMaker AI to build an ML model to predict customer behavior. The company needs to explain the bias in the model to an auditor. The explanation must focus on demographic data of the customers.

Which solution will meet these requirements?

Options:

Use SageMaker Clarify to generate a bias report. Send the report to the auditor.

Use AWS Glue DataBrew to create a job to detect drift in the model's data quality. Send the job output to the auditor.

Use Amazon QuickSight integration with SageMaker AI to generate a bias report. Send the report to the auditor.

Use Amazon CloudWatch metrics from the SageMaker AI namespace to create a bias dashboard. Share the dashboard with the auditor.

Buy Now

Questions 47

A gaming company needs to deploy a natural language processing (NLP) model to moderate a chat forum in a game. The workload experiences heavy usage during evenings and weekends but minimal activity during other hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

Use an Amazon SageMaker AI batch transform job with fixed capacity.

Use Amazon SageMaker Serverless Inference.

Use a single Amazon EC2 GPU instance with reserved capacity.

Use Amazon SageMaker Asynchronous Inference.

Buy Now

Questions 48

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

Accuracy

Area Under the ROC Curve (AUC)

F1 score

Mean absolute error (MAE)

Buy Now

Questions 49

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Options:

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

Use the SageMaker Model Registry and model groups to catalog the models.

Use the SageMaker Model Registry and unique tags for each model version.

Buy Now

Questions 50

A company needs to analyze a large dataset that is stored in Amazon S3 in Apache Parquet format. The company wants to use one-hot encoding for some of the columns.

The company needs a no-code solution to transform the data. The solution must store the transformed data back to the same S3 bucket for model training.

Which solution will meet these requirements?

Options:

Configure an AWS Glue DataBrew project that connects to the data. Use the DataBrew interactive interface to create a recipe that performs the one-hot encoding transformation. Create a job to apply the transformation and write the output back to an S3 bucket.

Use Amazon Athena SQL queries to perform the one-hot encoding transformation.

Use an AWS Glue ETL interactive notebook to perform the transformation.

Use Amazon Redshift Spectrum to perform the transformation.

Buy Now

Questions 51

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a "Disease" target column.

How should the ML engineer configure the data to train the model?

Options:

Remove the patient ID column.

Remove the age column.

Remove the medical conditions and test results columns.

Remove the "Disease" target column.

Buy Now

Questions 52

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Buy Now

Questions 53

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

Options:

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Buy Now

Questions 54

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.

Buy Now

Questions 55

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

Options:

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

Use an LLM from Amazon Bedrock with zero-shot learning.

Use the Amazon Comprehend DetectEntities API.

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Buy Now

Answer:

Explanation:

The key requirement in this scenario is detecting customer intent without having any training data. According to AWS Machine Learning and Generative AI documentation, zero-shot learning is specifically designed for situations where labeled training data is unavailable. Zero-shot learning allows a pre-trained large language model (LLM) to perform tasks it has not been explicitly trained on by leveraging its general knowledge and language understanding.

Amazon Bedrock provides fully managed access to foundation models (FMs) and LLMs that support zero-shot and few-shot learning. By using an LLM from Amazon Bedrock, the company can directly infer customer intent from natural language inputs without building, training, or fine-tuning a custom model. This approach is ideal for conversational interfaces where rapid deployment and scalability are required.

Option A is incorrect because fine-tuning a sequence-to-sequence (seq2seq) model in Amazon SageMaker JumpStart still requires labeled training data. Since the company explicitly does not have training data, this option does not meet the requirement.

Option C is also incorrect because the Amazon Comprehend DetectEntities API is designed for named entity recognition (NER), such as detecting names, dates, locations, or monetary values. It does not perform intent detection and is not suitable for conversational AI intent classification.

Option D is partially misleading. While it is technically possible to run an LLM on Amazon EC2, this does not inherently solve the problem of intent detection without training data. Additionally, Amazon Bedrock already abstracts infrastructure management, scaling, and model hosting, making direct EC2 deployment unnecessary and less efficient.

Therefore, using an LLM from Amazon Bedrock with zero-shot learning is the most appropriate, scalable, and AWS-recommended solution for intent detection without training data.

Questions 56

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Options:

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

Use AWS Glue Data Quality to monitor bias.

Use SageMaker notebooks to compare the bias.

Buy Now

Answer:

Explanation:

Monitoring bias drift in deployed machine learning models is crucial to ensure fairness and accuracy over time. Amazon SageMaker Clarify provides tools to detect bias in ML models, both during training and after deployment. To monitor bias drift for models deployed to real-time endpoints, an effective approach involves orchestrating SageMaker Clarify jobs using AWS Lambda functions.

Implementation Steps:

Set Up Data Capture:

Enable data capture on the SageMaker endpoint to record input data and model predictions. This captured data serves as the basis for bias analysis.

Develop a Lambda Function:

Create an AWS Lambda function configured to initiate a SageMaker Clarify job. This function will process the captured data to assess bias metrics.

Schedule or Trigger the Lambda Function:

Configure the Lambda function to run on-demand or at scheduled intervals using Amazon CloudWatch Events or EventBridge. This setup allows for regular bias monitoring as per the application's requirements.

Analyze and Respond to Results:

After each Clarify job completes, review the generated bias reports. If bias drift is detected, take appropriate actions, such as retraining the model or adjusting data preprocessing steps.

Advantages of This Approach:

Automation: Utilizing AWS Lambda for orchestrating Clarify jobs enables automated and scalable bias monitoring without manual intervention.

Cost-Effectiveness: AWS Lambda's serverless nature ensures that you only pay for the compute time consumed during the execution of the function, optimizing resource usage.

Flexibility: The solution can be tailored to specific monitoring needs, allowing for adjustments in monitoring frequency and analysis parameters.

By implementing this solution, the company can effectively monitor bias drift in real-time, ensuring that the AI application maintains fairness and accuracy throughout its lifecycle.

[References:, Bias drift for models in production - Amazon SageMaker, Schedule Bias Drift Monitoring Jobs - Amazon SageMaker, , ]

Questions 57

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Buy Now

Questions 58

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Buy Now

Questions 59

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

Options:

Perform ordinal encoding to represent categories of the feature.

Perform similarity encoding to represent categories of the feature.

Perform one-hot encoding to represent categories of the feature.

Perform target encoding to represent categories of the feature.

Buy Now

Questions 60

A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset.

Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

Options:

Hyperbaric!

Grid search

Bayesian optimization

Random search

Buy Now

Questions 61

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy on EC2 Auto Scaling behind an ALB.

Deploy to a SageMaker AI real-time endpoint.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

Deploy to Amazon ECS on EC2.

Buy Now

Questions 62

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm

Options:

Buy Now

AWS Certified Associate |

Exam Code: MLA-C01

Exam Name: AWS Certified Machine Learning Engineer - Associate

Last Update: Feb 8, 2026

Questions: 207

MLA-C01 PDF

$25.5 ~~$84.99~~

Add to Cart

MLA-C01 Testing Engine

$30 ~~$99.99~~

Add to Cart

MLA-C01 PDF + Testing Engine

$40.5 ~~$134.99~~

Add to Cart

Weekend Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: cramtick70

cramtick logo

Navigation:

Hot Vendors:

MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options: