A legal research company has a Retrieval Augmented Generation (RAG) application that uses Amazon Bedrock and Amazon OpenSearch Service. The application stores 768-dimensional vector embeddings for 15 million legal documents, including statutes, court rulings, and case summaries.
The company's current chunking strategy segments text into fixed-length blocks of 500 tokens. This approach often splits contextually linked information, such as legal arguments, court opinions, or statute references, across separate chunks. Researchers report that generated outputs frequently omit key context or cite outdated legal information.
Recent application logs show a 40% increase in response times. The p95 latency metric exceeds 2 seconds. The company expects storage needs for the application to grow from 90 GB to 360 GB within a year.
The company needs a solution to improve retrieval relevance and system performance at scale.
Which solution will meet these requirements?
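For context on the chunking problem described above, the following is a minimal sketch of token-overlap chunking, one common way to keep contextually linked passages in adjacent chunks. The 500-token block size matches the scenario; the overlap value and pseudo-token input are assumptions for illustration.

def chunk_with_overlap(tokens, chunk_size=500, overlap=100):
    """Split a token list into fixed-size chunks that share an overlap region.

    Overlapping windows reduce the chance that a legal argument or statute
    reference is cut cleanly in half at a chunk boundary.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunk = tokens[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Example: 1,200 pseudo-tokens produce 500-token chunks with 100-token overlap.
document_tokens = [f"tok{i}" for i in range(1200)]
print([len(c) for c in chunk_with_overlap(document_tokens)])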
A company has a recommendation system. The system's applications run on Amazon EC2 instances. The applications make API calls to Amazon Bedrock foundation models (FMs) to analyze customer behavior and generate personalized product recommendations.
The system is experiencing intermittent issues. Some recommendations do not match customer preferences. The company needs an observability solution to monitor operational metrics and detect patterns of operational performance degradation compared to established baselines. The solution must also generate alerts with correlation data within 10 minutes when FM behavior deviates from expected patterns.
Which solution will meet these requirements?
A company is developing a generative AI (GenAI) application that analyzes customer service calls in real time and generates suggested responses for human customer service agents. The application must process 500,000 concurrent calls during peak hours with less than 200 ms end-to-end latency for each suggestion. The company uses its existing architecture to transcribe customer call audio streams. The application must not exceed a predefined monthly compute budget and must maintain auto scaling capabilities.
Which solution will meet these requirements?
A financial services company uses an AI application to process financial documents by using Amazon Bedrock. During business hours, the application handles approximately 10,000 requests each hour, which requires consistent throughput.
The company uses the CreateProvisionedModelThroughput API to purchase provisioned throughput. Amazon CloudWatch metrics show that the provisioned capacity is unused while on-demand requests are being throttled. The company finds the following code in the application:
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps(payload)
)
The company needs the application to use the provisioned throughput and to resolve the throttling issues.
Which solution will meet these requirements?
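For reference, the following is a minimal sketch of how an InvokeModel call can target purchased provisioned throughput by passing the provisioned model ARN as the model identifier. The ARN value and the prompt payload are hypothetical placeholders.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Hypothetical placeholder ARN returned by the CreateProvisionedModelThroughput API.
provisioned_model_arn = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abc123example"

payload = {
    "prompt": "\n\nHuman: Summarize the attached financial document.\n\nAssistant:",
    "max_tokens_to_sample": 500,
}

# Passing the provisioned model ARN as modelId directs the request to the
# purchased provisioned throughput instead of the on-demand quota.
response = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn,
    body=json.dumps(payload),
)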
A pharmaceutical company is developing a Retrieval Augmented Generation application that uses an Amazon Bedrock knowledge base. The knowledge base uses Amazon OpenSearch Service as a data source for more than 25 million scientific papers. Users report that the application produces inconsistent answers that cite irrelevant sections of papers when queries span methodology, results, and discussion sections of the papers.
The company needs to improve the knowledge base to preserve semantic context across related paragraphs on the scale of the entire corpus of data.
Which solution will meet these requirements?
An ecommerce company operates a global product recommendation system that needs to switch between multiple foundation models (FMs) in Amazon Bedrock based on regulations, cost optimization, and performance requirements. The company must apply custom controls based on proprietary business logic, including dynamic cost thresholds, AWS Region-specific compliance rules, and real-time A/B testing across multiple FMs. The system must be able to switch between FMs without deploying new code. The system must route user requests based on complex rules including user tier, transaction value, regulatory zone, and real-time cost metrics that change hourly and require immediate propagation across thousands of concurrent requests.
Which solution will meet these requirements?
A financial services company is deploying a generative AI (GenAI) application that uses Amazon Bedrock to help customer service representatives provide personalized investment advice to customers. The company must implement a comprehensive governance solution that follows responsible AI practices and meets regulatory requirements.
The solution must detect and prevent hallucinations in recommendations. The solution must have safety controls for customer interactions. The solution must also monitor model behavior drift in real time and maintain audit trails of all prompt-response pairs for regulatory review. The company must deploy the solution within 60 days. The solution must integrate with the company's existing compliance dashboard and respond to customers within 200 ms.
Which solution will meet these requirements with the LEAST operational overhead?
A company is using Amazon Bedrock to develop an AI-powered application that uses a foundation model (FM) that supports cross-Region inference and provisioned throughput. The application must serve users in Europe and North America with consistently low latency. The application must comply with data residency regulations that require European user data to remain within Europe-based AWS Regions.
During testing, the application experiences service degradation when Regional traffic spikes reach service quotas. The company needs a solution that maintains application resilience and minimizes operational complexity.
Which solution will meet these requirements?
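As an illustration of the cross-Region inference pattern mentioned above, the following is a minimal sketch of invoking a model through a geography-scoped inference profile with the Converse API. The EU inference profile ID and the prompt are assumptions for illustration.

import boto3

# Client pinned to a Europe-based Region so request data stays within EU Regions.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="eu-central-1")

# Assumed EU cross-Region inference profile ID; the profile lets Amazon Bedrock
# distribute traffic across Europe-based Regions when one Region nears its quotas.
EU_INFERENCE_PROFILE_ID = "eu.amazon.nova-pro-v1:0"

response = bedrock_runtime.converse(
    modelId=EU_INFERENCE_PROFILE_ID,
    messages=[{"role": "user", "content": [{"text": "Summarize the account terms."}]}],
)
print(response["output"]["message"]["content"][0]["text"])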
A company has a customer service application that uses Amazon Bedrock to generate personalized responses to customer inquiries. The company needs to establish a quality assurance process to evaluate prompt effectiveness and model configurations across updates. The process must automatically compare outputs from multiple prompt templates, detect response quality issues, provide quantitative metrics, and allow human reviewers to give feedback on responses. The process must prevent configurations that do not meet a predefined quality threshold from being deployed.
Which solution will meet these requirements?
A financial services company is developing a customer service AI assistant application that uses a foundation model (FM) in Amazon Bedrock. The application must provide transparent responses by documenting reasoning and by citing sources that are used for Retrieval Augmented Generation (RAG). The application must capture comprehensive audit trails for all responses to users. The application must be able to serve up to 10,000 concurrent users and must respond to each customer inquiry within 2 seconds.
Which solution will meet these requirements with the LEAST operational overhead?
A publishing company is developing a chat assistant that uses a containerized large language model (LLM) that runs on Amazon SageMaker AI. The architecture consists of an Amazon API Gateway REST API that routes user requests to an AWS Lambda function. The Lambda function invokes a SageMaker AI real-time endpoint that hosts the LLM.
Users report uneven response times. Analytics show that a high number of chats are abandoned after 2 seconds of waiting for the first token. The company wants a solution to ensure that p95 latency is under 800 ms for interactive requests to the chat assistant.
Which combination of solutions will meet this requirement? (Select TWO.)
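One pattern relevant to the time-to-first-token issue above is response streaming from a SageMaker AI real-time endpoint, so the first tokens reach the user quickly. The following is a minimal sketch; the endpoint name and payload shape are assumptions, and the hosted container must support streaming.

import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name; the LLM container must implement response streaming.
response = sagemaker_runtime.invoke_endpoint_with_response_stream(
    EndpointName="llm-chat-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "Suggest a title for a mystery novel.", "stream": True}),
)

# Emit tokens as they arrive instead of waiting for the full completion.
for event in response["Body"]:
    part = event.get("PayloadPart")
    if part:
        print(part["Bytes"].decode("utf-8"), end="", flush=True)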
An ecommerce company is using Amazon Bedrock to build a generative AI (GenAI) application. The application uses AWS Step Functions to orchestrate a multi-agent workflow to produce detailed product descriptions. The workflow consists of three sequential states: a description generator, a technical specifications validator, and a brand voice consistency checker. Each state produces intermediate reasoning traces and outputs that are passed to the next state. The application uses an Amazon S3 bucket for process storage and output storage.
During testing, the company discovers that outputs between Step Functions states frequently exceed the 256 KB quota and cause workflow failures. A GenAI Developer needs to revise the application architecture to efficiently handle the Step Functions 256 KB quota and maintain workflow observability. The revised architecture must preserve the existing multi-agent reasoning and acting (ReAct) pattern.
Which solution will meet these requirements with the LEAST operational overhead?
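A common way to stay under the Step Functions payload quota is to offload large intermediate outputs to Amazon S3 and pass only a lightweight reference between states. The following is a minimal sketch of one state's Lambda handler; the bucket name, the workflow_id input field, and the output structure are assumptions for illustration.

import json
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "genai-workflow-artifacts"  # assumed bucket name

def handler(event, context):
    # Placeholder for the description-generator agent output, which can
    # exceed the 256 KB Step Functions payload quota.
    agent_output = {"description": "(generated text)", "reasoning_trace": "(trace)"}

    # Persist the full output to Amazon S3.
    key = f"traces/{event['workflow_id']}/{uuid.uuid4()}.json"  # workflow_id is an assumed input field
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(agent_output))

    # Return only a small reference; the next state reads the object from S3.
    return {"workflow_id": event["workflow_id"], "output_s3_key": key}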
A company uses Amazon Bedrock to generate technical content for customers. The company has recently experienced a surge in hallucinated outputs when the company’s model generates summaries of long technical documents. The model outputs include inaccurate or fabricated details. The company’s current solution uses a large foundation model (FM) with a basic one-shot prompt that includes the full document in a single input.
The company needs a solution that will reduce hallucinations and meet factual accuracy goals. The solution must process more than 1,000 documents each hour and deliver summaries within 3 seconds for each document.
Which combination of solutions will meet these requirements? (Select TWO.)
A financial services company needs to build a document analysis system that uses Amazon Bedrock to process quarterly reports. The system must analyze financial data, perform sentiment analysis, and validate compliance across batches of reports. Each batch contains 5 reports. Each report requires multiple foundation model (FM) calls. The solution must finish the analysis within 10 seconds for each batch. Current sequential processing takes 45 seconds for each batch.
Which solution will meet these requirements?
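Because the 45-second batch time above comes from serial FM calls, fanning the per-report calls out concurrently is the usual lever. The following is a minimal sketch using a thread pool; the model ID and prompt construction are assumptions for illustration.

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-v2"  # assumed model

def analyze_report(report_text):
    # One FM call per report; sentiment and compliance checks could be
    # additional calls fanned out the same way.
    body = json.dumps({
        "prompt": f"\n\nHuman: Analyze this quarterly report:\n{report_text}\n\nAssistant:",
        "max_tokens_to_sample": 500,
    })
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(response["body"].read())

def analyze_batch(reports):
    # Process the 5 reports in a batch concurrently instead of sequentially.
    with ThreadPoolExecutor(max_workers=len(reports)) as pool:
        return list(pool.map(analyze_report, reports))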
A company configures a landing zone in AWS Control Tower. The company handles sensitive data that must remain within the European Union. The company must use only the eu-central-1 Region. The company uses service control policies (SCPs) to enforce data residency policies. GenAI developers at the company are assigned IAM roles that have full permissions for Amazon Bedrock.
The company must ensure that GenAI developers can use the Amazon Nova Pro model through Amazon Bedrock only by using cross-Region inference (CRI) and only in eu-central-1. The company enables model access for the GenAI developer IAM roles in Amazon Bedrock. However, when a GenAI developer attempts to invoke the model through the Amazon Bedrock Chat/Text playground, the GenAI developer receives the following error:
User: arn:aws:sts::123456789012:assumed-role/AssumedDevRole/DevUserName
Action: bedrock:InvokeModelWithResponseStream
On resource(s): arn:aws:bedrock:eu-west-3::foundation-model/amazon.nova-pro-v1:0
Context: a service control policy explicitly denies the action
The company needs a solution to resolve the error. The solution must retain the company's existing governance controls and must provide precise access control. The solution must comply with the company's existing data residency policies.
Which combination of solutions will meet these requirements? (Select TWO.)
A medical company is building a generative AI (GenAI) application that uses Retrieval Augmented Generation (RAG) to provide evidence-based medical information. The application uses Amazon OpenSearch Service to retrieve vector embeddings. Users report that searches frequently miss results that contain exact medical terms and acronyms and return too many semantically similar but irrelevant documents. The company needs to improve retrieval quality and maintain low end-user latency, even as the document collection grows to millions of documents.
Which solution will meet these requirements with the LEAST operational overhead?
An enterprise application uses an Amazon Bedrock foundation model (FM) to process and analyze 50 to 200 pages of technical documents. Users are experiencing inconsistent responses and receiving truncated outputs when processing documents that exceed the FM's context window limits.
Which solution will resolve this problem?
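One standard approach for documents that exceed a model's context window is to split the document into windows, summarize each window, and then summarize the partial summaries. The following is a minimal sketch; the chunk sizing, model ID, and prompt format are assumptions for illustration.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-v2"  # assumed model

def summarize(text, max_tokens=400):
    body = json.dumps({
        "prompt": f"\n\nHuman: Summarize the following text:\n{text}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(response["body"].read())["completion"]

def summarize_long_document(document, chunk_chars=12000):
    # Map step: summarize each window that fits within the context window.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: combine the partial summaries into one final summary.
    return summarize("\n".join(partial_summaries))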
A company is building a multicloud generative AI (GenAI)-powered secret resolution application that uses Amazon Bedrock and Agent Squad. The application resolves secrets from multiple sources, including key stores and hardware security modules (HSMs). The application uses AWS Lambda functions to retrieve secrets from the sources. The application uses AWS AppConfig to implement dynamic feature gating. The application supports secret chaining and detects secret drift. The application handles short-lived and expiring secrets. The application also supports prompt flows for templated instructions. The application uses AWS Step Functions to orchestrate agents to resolve the secrets and to manage secret validation and drift detection.
The company finds multiple issues during application testing. The application does not refresh expired secrets in time for agents to use. The application sends alerts for secret drift, but agents still use stale data. Prompt flows within the application reuse outdated templates, which cause cascading failures. The company must resolve the performance issues.
Which solution will meet this requirement?
A healthcare company is using Amazon Bedrock to build a Retrieval Augmented Generation (RAG) application that helps practitioners make clinical decisions. The application must achieve high accuracy for patient information retrievals, identify hallucinations in generated content, and reduce human review costs.
Which solution will meet these requirements?
A book publishing company wants to build a book recommendation system that uses an AI assistant. The AI assistant will use ML to generate a list of recommended books from the company's book catalog. The system must suggest books based on conversations with customers.
The company stores the text of the books, customers' and editors' reviews of the books, and extracted book metadata in Amazon S3. The system must support low-latency responses and scale efficiently to handle more than 10,000 concurrent users.
Which solution will meet these requirements?
A specialty coffee company has a mobile app that generates personalized coffee roast profiles by using Amazon Bedrock with a three-stage prompt chain. The prompt chain converts user inputs into structured metadata, retrieves relevant logs for coffee roasts, and generates a personalized roast recommendation for each customer.
Users in multiple AWS Regions report inconsistent roast recommendations for identical inputs, slow inference during the retrieval step, and unsafe recommendations such as brewing at excessively high temperatures. The company must improve the stability of outputs for repeated inputs. The company must also improve app performance and the safety of the app's outputs. The updated solution must ensure 99.5% output consistency for identical inputs and achieve inference latency of less than 1 second. The solution must also block unsafe or hallucinated recommendations by using validated safety controls.
Which solution will meet these requirements?
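Two levers referenced in this scenario are deterministic sampling for repeated inputs and Amazon Bedrock Guardrails applied at invocation time. The following is a minimal sketch; the guardrail identifier and version, the model ID, and the prompt are assumptions for illustration.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

body = json.dumps({
    "prompt": "\n\nHuman: Recommend a roast profile for a fruity Ethiopian bean.\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0,  # deterministic sampling for repeatable recommendations
    "top_p": 1,
})

# Hypothetical guardrail identifier and version; the guardrail can block
# unsafe outputs such as excessively high brewing temperatures.
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=body,
    guardrailIdentifier="gr-roast-safety",
    guardrailVersion="1",
)
print(json.loads(response["body"].read())["completion"])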
A company uses Amazon Bedrock to build a Retrieval Augmented Generation (RAG) system. The RAG system uses an Amazon Bedrock knowledge base with an Amazon S3 bucket as the data source for emergency news video content. The system retrieves transcripts, archived reports, and related documents from the S3 bucket.
The RAG system uses state-of-the-art embedding models and a high-performing retrieval setup. However, users report slow responses and irrelevant results, which cause decreased user satisfaction. The company notices that vector searches are evaluating too many documents across too many content types and over long periods of time.
The company determines that the underlying models will not benefit from additional fine-tuning. The company must improve retrieval accuracy by applying smarter constraints and wants a solution that requires minimal changes to the existing architecture.
Which solution will meet these requirements?
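Retrieval constraints of the kind described here are often expressed as metadata filters on the knowledge base retrieval call. The following is a minimal sketch using the Retrieve API; the knowledge base ID, metadata field names, and filter values are assumptions for illustration.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Hypothetical knowledge base ID and metadata attributes (content_type,
# published_year) attached to the S3 source documents.
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",
    retrievalQuery={"text": "coastal flooding emergency coverage"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,
            "filter": {
                "andAll": [
                    {"equals": {"key": "content_type", "value": "transcript"}},
                    {"greaterThanOrEquals": {"key": "published_year", "value": 2023}},
                ]
            },
        }
    },
)
for result in response["retrievalResults"]:
    print(result["content"]["text"][:120])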
A healthcare company is using Amazon Bedrock to develop a real-time patient care AI assistant to respond to queries for separate departments that handle clinical inquiries, insurance verification, appointment scheduling, and insurance claims. The company wants to use a multi-agent architecture.
The company must ensure that the AI assistant is scalable and can onboard new features for patients. The AI assistant must be able to handle thousands of parallel patient interactions. The company must ensure that patients receive appropriate domain-specific responses to queries.
Which solution will meet these requirements?
A company is developing a customer support application that uses Amazon Bedrock foundation models (FMs) to provide real-time AI assistance to the company’s employees. The application must display AI-generated responses character by character as the responses are generated. The application needs to support thousands of concurrent users with minimal latency. The responses typically take 15 to 45 seconds to finish.
Which solution will meet these requirements?
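The character-by-character display described above maps to response streaming. The following is a minimal sketch using InvokeModelWithResponseStream; the model ID and prompt payload are assumptions for illustration.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

body = json.dumps({
    "prompt": "\n\nHuman: Draft a reply to a shipping-delay complaint.\n\nAssistant:",
    "max_tokens_to_sample": 500,
})

response = bedrock_runtime.invoke_model_with_response_stream(
    modelId="anthropic.claude-v2",
    body=body,
)

# Each chunk arrives as soon as the model produces it, so the UI can render
# the response incrementally instead of waiting 15 to 45 seconds for completion.
for event in response["body"]:
    chunk = event.get("chunk")
    if chunk:
        print(json.loads(chunk["bytes"])["completion"], end="", flush=True)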
A bank is building a generative AI (GenAI) application that uses Amazon Bedrock to assess loan applications by using scanned financial documents. The application must extract structured data from the documents. The application must redact personally identifiable information (PII) before inference. The application must use foundation models (FMs) to generate approvals. The application must route low-confidence document extraction results to human reviewers who are within the same AWS Region as the loan applicant.
The company must ensure that the application complies with strict Regional data residency and auditability requirements. The application must be able to scale to handle 25,000 applications each day and provide 99.9% availability.
Which combination of solutions will meet these requirements? (Select THREE.)
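One piece of this workflow, redacting PII before inference, can be sketched with Amazon Comprehend PII detection. The replacement logic shown is a simplified assumption for illustration.

import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text, language_code="en"):
    """Replace detected PII spans with their entity type before FM inference."""
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)

    # Replace from the end of the string so earlier offsets stay valid.
    redacted = text
    for entity in sorted(entities["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
        redacted = (
            redacted[: entity["BeginOffset"]]
            + f"[{entity['Type']}]"
            + redacted[entity["EndOffset"]:]
        )
    return redacted

print(redact_pii("Applicant John Doe, SSN 123-45-6789, requests a loan."))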
A company is building a generative AI (GenAI) application that produces content based on a variety of internal and external data sources. The company wants to ensure that the generated output is fully traceable. The application must support data source registration and enable metadata tagging to attribute content to its original source. The application must also maintain audit logs of data access and usage throughout the pipeline.
Which solution will meet these requirements?
A financial services company is developing a generative AI (GenAI) application that serves both premium customers and standard customers. The application uses AWS Lambda functions behind an Amazon API Gateway REST API to process requests. The company needs to dynamically switch between AI models based on which customer tier each user belongs to. The company also wants to perform A/B testing for new features without redeploying code. The company needs to validate model parameters like temperature and maximum token limits before applying changes.
Which solution will meet these requirements with the LEAST operational overhead?
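Dynamic model selection and A/B testing without redeploying code is typically driven by a runtime configuration store. The following is a minimal sketch of a Lambda function reading AWS AppConfig data; the application, environment, and profile identifiers and the flag schema are assumptions for illustration.

import json
import boto3

appconfigdata = boto3.client("appconfigdata")

def get_model_config():
    # Start a session and pull the latest deployed configuration; in a real
    # Lambda function the polling token would be cached between invocations.
    session = appconfigdata.start_configuration_session(
        ApplicationIdentifier="genai-app",          # assumed identifiers
        EnvironmentIdentifier="prod",
        ConfigurationProfileIdentifier="model-routing",
    )
    config = appconfigdata.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    return json.loads(config["Configuration"].read())

def handler(event, context):
    config = get_model_config()
    tier = event.get("customer_tier", "standard")
    # Example flag schema: {"premium": {"modelId": "...", "temperature": 0.2}, ...}
    return config[tier]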
A retail company is using Amazon Bedrock to develop a customer service AI assistant. Analysis shows that 70% of customer inquiries are simple product questions that a smaller model can effectively handle. However, 30% of inquiries are complex return policy questions that require advanced reasoning.
The company wants to implement a cost-effective model selection framework to automatically route customer inquiries to appropriate models based on inquiry complexity. The framework must maintain high customer satisfaction and minimize response latency.
Which solution will meet these requirements with the LEAST implementation effort?
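A lightweight version of complexity-based routing can be expressed as a simple classifier in front of two model IDs. The following is a minimal sketch; the keyword heuristic, model choices, and payload shapes are assumptions for illustration.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Assumed model choices: a small, low-cost model for simple product questions
# and a larger model for return-policy reasoning.
SMALL_MODEL_ID = "amazon.titan-text-lite-v1"
LARGE_MODEL_ID = "anthropic.claude-v2"

def route_inquiry(inquiry):
    # Simplified heuristic standing in for a real complexity classifier.
    complex_keywords = ("return", "refund", "policy", "warranty")
    return LARGE_MODEL_ID if any(k in inquiry.lower() for k in complex_keywords) else SMALL_MODEL_ID

def answer(inquiry):
    model_id = route_inquiry(inquiry)
    if model_id == SMALL_MODEL_ID:
        body = json.dumps({"inputText": inquiry})
        response = bedrock_runtime.invoke_model(modelId=model_id, body=body)
        return json.loads(response["body"].read())["results"][0]["outputText"]
    body = json.dumps({
        "prompt": f"\n\nHuman: {inquiry}\n\nAssistant:",
        "max_tokens_to_sample": 400,
    })
    response = bedrock_runtime.invoke_model(modelId=model_id, body=body)
    return json.loads(response["body"].read())["completion"]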