aws-qa-appsync-opensearch


Stability: Experimental

All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


| Language | Package |
| --- | --- |
| TypeScript | @cdklabs/generative-ai-cdk-constructs |
| Python | cdklabs.generative_ai_cdk_constructs |

Overview

This construct provides a question answering workflow (RAG + long context window) using Amazon Bedrock and a provisioned Amazon OpenSearch cluster. Additionally, the construct leverages Anthropic's Claude-3 Sonnet model through Amazon Bedrock to allow visual question answering capabilities.

PDF Q&A

  • If a PDF document is provided as an input to the AppSync query, the AWS Lambda function first checks the length of the document. If the document size exceeds the maximum number of tokens for the selected model, the Lambda queries the knowledge base (similarity search) and filters by document name. This assumes that the chunks of text stored in the knowledge base have the document name as metadata. Otherwise, the content of the document is provided to the LLM as part of the context.
  • If no document is provided as input, the Lambda will perform a similarity search against the entire knowledge base.
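The routing described in the two bullets above can be sketched as a small pure function. This is only an illustration under stated assumptions: `chooseStrategy`, `Strategy`, and the token-count parameters are hypothetical names, not the construct's actual Lambda code, and an empty filename is treated as "no document provided".

```typescript
// Hypothetical sketch of the PDF Q&A routing; names are illustrative only.
type Strategy =
  | { kind: 'LONG_CONTEXT'; filename: string }
  | { kind: 'RAG'; filterByFilename?: string };

function chooseStrategy(
  filename: string | undefined,
  documentTokens: number,
  modelMaxTokens: number,
): Strategy {
  if (!filename) {
    // No document: similarity search across the entire knowledge base.
    return { kind: 'RAG' };
  }
  if (documentTokens > modelMaxTokens) {
    // Document too large for the model: similarity search restricted to
    // chunks whose metadata carries this document name.
    return { kind: 'RAG', filterByFilename: filename };
  }
  // Document fits: its full content is passed to the LLM as context.
  return { kind: 'LONG_CONTEXT', filename };
}
```

For example, `chooseStrategy('report.txt', 5000, 1000)` returns a RAG strategy filtered on `report.txt`, while the same file with 500 tokens is answered from the full document.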

Image Q&A

  • Images can be passed as inputs to the AppSync queries, which invoke AWS Lambda functions that call Anthropic's anthropic.claude-3-sonnet-20240229-v1:0 model through Amazon Bedrock for visual question answering. The Lambda functions can also use an Idefics model from Hugging Face, deployed on Amazon SageMaker, as another option for visual question answering. For details on deploying the Idefics model from Hugging Face to SageMaker, refer to the "AWS Model Deployment on SageMaker" guide using Hugging Face models: aws-model-deployment-sagemake.

The construct uses Amazon Bedrock as the large language model provider.

  • amazon.titan-embed-text-v1 is used as the embeddings model for text.
  • amazon.titan-embed-image-v1 is used as the embeddings model for images.
  • anthropic.claude-v2:1 is used for question answering on PDFs.
  • anthropic.claude-3-sonnet-20240229-v1:0 is used for question answering on images.

Depending on your solution, make sure the models listed above are enabled in your account. Please follow the Amazon Bedrock User Guide for steps related to enabling model access.

The input document must be stored in the input Amazon Simple Storage Service bucket in text format (.txt). Another construct is available to ingest and process files to text format and store them in a knowledge base: aws-rag-appsync-stepfn-opensearch.

This construct builds a Lambda function from a Docker image, so you need to have Docker Desktop running on your machine.

Here is a minimal deployable pattern definition:

TypeScript

import { Construct } from 'constructs';
import { Stack, StackProps, Aws } from 'aws-cdk-lib';
import * as os from 'aws-cdk-lib/aws-opensearchservice';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import { QaAppsyncOpensearch } from '@cdklabs/generative-ai-cdk-constructs';

// SampleStack is a placeholder name; the construct must be created inside a Stack
export class SampleStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // get an existing OpenSearch provisioned cluster
    const osDomain = os.Domain.fromDomainAttributes(this, 'osdomain', {
      domainArn: 'arn:' + Aws.PARTITION + ':es:us-east-1:XXXXXX',
      domainEndpoint: 'https://XXXXX.us-east-1.es.amazonaws.com',
    });

    // get an existing user pool
    const cognitoPoolId = 'us-east-1_XXXXX';
    const userPoolLoaded = cognito.UserPool.fromUserPoolId(this, 'myuserpool', cognitoPoolId);

    // deploy the question answering workflow against the existing resources
    const ragSource = new QaAppsyncOpensearch(this, 'QaAppsyncOpensearch', {
      existingOpensearchDomain: osDomain,
      openSearchIndexName: 'demoindex',
      cognitoUserPool: userPoolLoaded,
    });
  }
}

Python

from constructs import Construct
from aws_cdk import (
    Stack,
    aws_opensearchservice as os,
    aws_cognito as cognito,
)
from cdklabs.generative_ai_cdk_constructs import QaAppsyncOpensearch


# SampleStack is a placeholder name; the construct must be created inside a Stack
class SampleStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # get an existing OpenSearch provisioned cluster
        os_domain = os.Domain.from_domain_attributes(
            self,
            'osdomain',
            domain_arn='arn:aws:es:us-east-1:XXXXXX:resource-id',
            domain_endpoint='https://XXXXX.us-east-1.es.amazonaws.com',
        )

        # get an existing user pool
        cognito_pool_id = 'us-east-1_XXXXX'
        user_pool_loaded = cognito.UserPool.from_user_pool_id(
            self,
            'myuserpool',
            user_pool_id=cognito_pool_id,
        )

        # deploy the question answering workflow against the existing resources
        rag_source = QaAppsyncOpensearch(
            self,
            'QaAppsyncOpensearch',
            existing_opensearch_domain=os_domain,
            open_search_index_name='demoindex',
            cognito_user_pool=user_pool_loaded,
        )

After deploying the CDK stack, the QA process can be invoked using GraphQL APIs. The API Schema details are present here: resources/gen-ai/aws-qa-appsync-opensearch/schema.graphql.

The code below provides an example of a mutation call and associated subscription to trigger a question and get response notifications. The subscription call will wait for mutation requests to send the notifications.

Subscription call to get notifications about the question answering process:

subscription MySubscription {
  updateQAJobStatus(jobid: "123") {
    sources
    question
    answer
    jobstatus
  }
}
____________________________________________________________________
Expected response:

{
  "data": {
    "updateQAJobStatus": {
      "sources": [
        ""
      ],
      "question": "<base 64 encoded question>",
      "answer": "<base 64 encoded answer>",
      "jobstatus": "Succeed"
    }
  }
}

Where:

  • jobid: id which can be used to filter subscriptions on client side
  • answer: response to the question from the large language model as a base64 encoded string
  • sources: sources from the knowledge base used as context to answer the question
  • jobstatus: status update of the question answering process for the file specified
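Because the question and answer arrive base64 encoded, a subscriber has to decode them before display. The following is a minimal sketch assuming a Node.js client; `QaJobUpdate` and `decodeUpdate` are hypothetical names that simply mirror the fields requested in the subscription above, and the sketch assumes the payload is UTF-8 text.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical type mirroring the fields selected in the subscription.
interface QaJobUpdate {
  sources: string[];
  question: string; // base64 encoded
  answer: string; // base64 encoded
  jobstatus: string;
}

// Returns a copy of the update with question and answer decoded to UTF-8 text.
function decodeUpdate(update: QaJobUpdate): QaJobUpdate {
  const fromBase64 = (s: string): string =>
    Buffer.from(s, 'base64').toString('utf8');
  return {
    ...update,
    question: fromBase64(update.question),
    answer: fromBase64(update.answer),
  };
}
```

With the question used elsewhere in this document, `d2hhdCBpcyBwcm9qZW4/` decodes to `what is projen?`.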

Mutation call to trigger the question:

mutation MyMutation {
  postQuestion(
    embeddings_model: {
      modality: "Text",
      modelId: "amazon.titan-embed-text-v1",
      provider: "Bedrock",
      streaming: true
    },
    filename: "projen_cdk_blog.txt",
    jobid: "123",
    jobstatus: "",
    qa_model: {
      provider: "Bedrock",
      modality: "Text",
      modelId: "anthropic.claude-v2:1",
      streaming: true,
      model_kwargs: "{\"temperature\":0.5,\"top_p\":0.9,\"max_tokens_to_sample\":250}"
    },
    question: "d2hhdCBpcyBwcm9qZW4/",
    responseGenerationMethod: RAG,
    retrieval: {
      max_docs: 10
    },
    verbose: false
  ) {
    jobid
    question
    verbose
    filename
    answer
    jobstatus
    responseGenerationMethod
  }
}

____________________________________________________________________
Expected response:

{
  "data": {
    "postQuestion": {
      "jobid": null,
      "question": null,
      "verbose": null,
      "filename": null,
      "answer": null,
      "jobstatus": null,
      "responseGenerationMethod": null
    }
  }
}

Where:

  • jobid: id which can be used to filter subscriptions on client side
  • jobstatus: this field will be used by the subscription to update the status of the question answering process for the file specified
  • qa_model.modality/embeddings_model.modality: applicable values are Text or Image
  • qa_model.modelId/embeddings_model.modelId: model used to process the Q&A, for example anthropic.claude-v2:1 or anthropic.claude-3-sonnet-20240229-v1:0
  • retrieval.max_docs: maximum number of documents (chunks) retrieved from the knowledge base if the Retrieval Augmented Generation (RAG) approach is used
  • question: question to ask as a base64 encoded string
  • verbose: boolean indicating if the LangChain chain call verbosity should be enabled or not
  • streaming: boolean indicating if the streaming capability of Bedrock is used. If set to true, tokens will be sent back to the subscriber as they are generated. If set to false, the entire response will be sent back to the subscriber once generated.
  • filename: optional. Name of the file stored in the input S3 bucket, in txt format.
  • responseGenerationMethod: optional. Method used to generate the response. Can be either RAG or LONG_CONTEXT. If not provided, the default value is LONG_CONTEXT.
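Two of the mutation arguments are string-encoded: `question` is base64 and `model_kwargs` is a JSON string. A minimal sketch of preparing both on a Node.js client follows; `encodeQuestionArgs` and `ModelKwargs` are hypothetical helper names, and the keys in `ModelKwargs` are taken from the example mutation above rather than from a documented schema.

```typescript
import { Buffer } from 'node:buffer';

// Keys as they appear in the example mutation; other models may accept others.
interface ModelKwargs {
  temperature: number;
  top_p: number;
  max_tokens_to_sample: number;
}

// Returns the two string-encoded arguments used by postQuestion.
function encodeQuestionArgs(question: string, kwargs: ModelKwargs) {
  return {
    // UTF-8 text to base64, as required for the question argument
    question: Buffer.from(question, 'utf8').toString('base64'),
    // model_kwargs is passed as a JSON string rather than a nested object
    model_kwargs: JSON.stringify(kwargs),
  };
}

const encodedArgs = encodeQuestionArgs('what is projen?', {
  temperature: 0.5,
  top_p: 0.9,
  max_tokens_to_sample: 250,
});
console.log(encodedArgs.question); // d2hhdCBpcyBwcm9qZW4/
```

The printed value matches the `question` argument in the example mutation.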

Initializer

new QaAppsyncOpensearch(scope: Construct, id: string, props: QaAppsyncOpensearchProps)

Parameters

  • scope Construct
  • id string
  • props QaAppsyncOpensearchProps

Pattern Construct Props

Note: One of either existingOpensearchDomain or existingOpensearchServerlessCollection must be specified, but not both.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| existingOpensearchDomain | opensearchservice.IDomain | Optional | Existing domain for the OpenSearch Service. Mutually exclusive with existingOpensearchServerlessCollection: only one should be specified. |
| existingOpensearchServerlessCollection | openSearchServerless.CfnCollection | Optional | Existing Amazon OpenSearch Serverless collection. Mutually exclusive with existingOpensearchDomain: only one should be specified. |
| openSearchIndexName | string | Required | Index name for the Amazon OpenSearch Service. If it doesn't exist, the pattern will create the index in the cluster. |
| cognitoUserPool | cognito.IUserPool | Required | Cognito user pool used for authentication. |
| openSearchSecret | secret.ISecret | Optional | Secret containing credentials to authenticate to the existing Amazon OpenSearch domain if fine-grained access control is configured. If not provided, the Lambda function will use AWS Signature Version 4. |
| vpcProps | ec2.VpcProps | Optional | Custom properties for a VPC the construct will create. This VPC will be used by the Lambda functions the construct creates. Providing both this and existingVpc will result in an error. |
| existingVpc | ec2.IVpc | Optional | An existing VPC in which to deploy the construct. Providing both this and vpcProps will result in an error. |
| existingSecurityGroup | ec2.ISecurityGroup | Optional | Existing security group allowing access to OpenSearch. Used by the Lambda functions built by this construct. If not provided, the construct will create one. |
| existingBusInterface | events.IEventBus | Optional | Existing instance of an Amazon EventBridge bus. If not provided, the construct will create one. |
| existingInputAssetsBucketObj | s3.IBucket | Optional | Existing instance of S3 Bucket object. Providing both this and bucketInputsAssetsProps will result in an error. |
| bucketInputsAssetsProps | s3.BucketProps | Optional | User provided props to override the default props for the S3 Bucket. Providing both this and existingInputAssetsBucketObj will result in an error. |
| stage | string | Optional | Value appended to resource names. |
| existingMergedApi | appsync.CfnGraphQLApi | Optional | Existing Merged API instance. The Merged API provides a federated schema over source API schemas. |
| observability | boolean | Optional | Enables observability on all services used. Warning: the services used have an associated cost. Enabling it is a best practice. Defaults to true. |
| lambdaProvisionedConcurrency | number | Optional | Allows a user to configure Lambda provisioned concurrency for consistent performance. |
| customDockerLambdaProps | DockerLambdaCustomProps | Optional | Allows providing custom question answering Lambda code and settings instead of the default construct implementation. |
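The mutual-exclusion rules stated in the table can be checked before synthesizing a stack. The following is a hedged sketch, not the construct's own validation: `findPropConflicts` and `PropsSubset` are hypothetical, and the property types are reduced to `unknown` so the example runs without aws-cdk-lib.

```typescript
// Hypothetical pre-flight check; property names match the props table,
// types are simplified so the sketch has no CDK dependency.
interface PropsSubset {
  existingOpensearchDomain?: unknown;
  existingOpensearchServerlessCollection?: unknown;
  vpcProps?: unknown;
  existingVpc?: unknown;
  existingInputAssetsBucketObj?: unknown;
  bucketInputsAssetsProps?: unknown;
}

function findPropConflicts(p: PropsSubset): string[] {
  const errors: string[] = [];
  const has = (v: unknown): boolean => v !== undefined;

  // Exactly one of the two OpenSearch options must be set.
  if (has(p.existingOpensearchDomain) === has(p.existingOpensearchServerlessCollection)) {
    errors.push('Specify exactly one of existingOpensearchDomain or existingOpensearchServerlessCollection');
  }
  // These pairs may both be omitted, but must not both be provided.
  if (has(p.vpcProps) && has(p.existingVpc)) {
    errors.push('Provide either vpcProps or existingVpc, not both');
  }
  if (has(p.existingInputAssetsBucketObj) && has(p.bucketInputsAssetsProps)) {
    errors.push('Provide either existingInputAssetsBucketObj or bucketInputsAssetsProps, not both');
  }
  return errors;
}
```

An empty array means the combination is consistent with the rules in the table; each returned string describes one violated rule.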

Pattern Properties

| Name | Type | Description |
| --- | --- | --- |
| vpc | ec2.IVpc | The VPC used by the construct (whether created by the construct or provided by the client) |
| securityGroup | ec2.ISecurityGroup | The security group used by the construct (whether created by the construct or provided by the client) |
| qaBus | events.IEventBus | The event bus used by the construct (whether created by the construct or provided by the client) |
| s3InputAssetsBucketInterface | s3.IBucket | Returns an instance of s3.IBucket created by the construct |
| s3InputAssetsBucket | s3.Bucket | Returns an instance of s3.Bucket created by the construct. IMPORTANT: If existingInputAssetsBucketObj was provided in Pattern Construct Props, this property will be undefined |
| graphqlApi | appsync.IGraphqlApi | Returns an instance of appsync.IGraphqlApi created by the construct |
| qaLambdaFunction | lambda.DockerImageFunction | Returns an instance of lambda.DockerImageFunction used for the question answering job created by the construct |

Supported models

Question answering

| Provider | Model id | Modalities | Streaming | Notes |
| --- | --- | --- | --- | --- |
| Bedrock | anthropic.claude-v2 | Text | | |
| Bedrock | anthropic.claude-v2:1 | Text | | Default model if none selected |
| Bedrock | anthropic.claude-3-haiku-20240307-v1:0 | Text, Image | | |
| Bedrock | anthropic.claude-3-sonnet-20240229-v1:0 | Text, Image | | |
| Bedrock | anthropic.claude-instant-v1 | Text | | |
| Bedrock | amazon.titan-text-lite-v1 | Text | | |
| Bedrock | amazon.titan-text-express-v1 | Text | | |
| SageMaker | idefics | Text, Image | | The model is not deployed as part of the construct and must be provisioned separately |

Embeddings

| Provider | Model id | Modalities | Notes |
| --- | --- | --- | --- |
| Bedrock | amazon.titan-embed-image-v1 | Text | |
| Bedrock | amazon.titan-embed-text-v1 | Text | Default model if none selected |

Default properties

Out-of-the-box implementation of the construct without any override will set the following defaults:

Authentication

  • Primary authentication method for the AppSync GraphQL API is Amazon Cognito User Pool.
  • Secondary authentication method for the AppSync GraphQL API is IAM role.

Networking

  • Set up a VPC
    • Uses existing VPC if provided, otherwise creates a new one
  • Set up a security group used by the AWS Lambda functions
    • Uses existing security group, otherwise creates a new one

Amazon S3 Bucket

  • Uses existing S3 bucket if provided, otherwise creates a new one

Observability

By default the construct will enable logging and tracing on all services which support those features. Observability can be turned off by setting the pattern property observability to false.

  • AWS Lambda: AWS X-Ray, Amazon CloudWatch Logs
  • AWS AppSync GraphQL API: AWS X-Ray, Amazon CloudWatch Logs

Troubleshooting

| Error Code Message | Description | Fix |
| --- | --- | --- |
| Failed to load information about the requested file | This error happens when the Lambda function was not able to load metadata about the file provided as input parameter | Ensure the file is present in the input bucket |
| Working on the question | The Lambda function started the question processing | Not an error, informational only |
| Exception during prediction | An issue happened during the prediction process (call to the large language model via Amazon Bedrock) | Verify the Lambda CloudWatch Logs to get access to the related error. One common issue is throttling. |
| Done | The process ended successfully | Not an error, informational only |
| Failed to load document content | This error happens when the Lambda function was not able to load the content of the file provided as input parameter | Ensure the file is present in the input bucket |
| Failed to load the LLM | Internal error related to loading the large language model client | Check the Lambda error logs to get a detailed description of the issue |
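On the client side, the messages in the table can be separated into informational updates and real failures. The sketch below assumes the messages reach the client verbatim (for example in the `jobstatus` field of a subscription update), which the table does not state explicitly; `classifyStatus` is a hypothetical helper.

```typescript
// Messages copied from the troubleshooting table above.
const INFORMATIONAL_MESSAGES = new Set<string>([
  'Working on the question',
  'Done',
]);

const ERROR_MESSAGES = new Set<string>([
  'Failed to load information about the requested file',
  'Exception during prediction',
  'Failed to load document content',
  'Failed to load the LLM',
]);

// Classifies a status message; anything not listed in the table is 'unknown'.
function classifyStatus(message: string): 'info' | 'error' | 'unknown' {
  if (INFORMATIONAL_MESSAGES.has(message)) return 'info';
  if (ERROR_MESSAGES.has(message)) return 'error';
  return 'unknown';
}
```

A client could, for instance, keep a job open while it receives `info` messages and surface the CloudWatch Logs hint from the table when it receives an `error`.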

Architecture

Architecture Diagram

Cost

You are responsible for the cost of the AWS services used while running this construct. As of this revision, the cost for running this construct with the default settings in the US East (N. Virginia) Region is approximately $58.60 per month.

We recommend creating a budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this solution.

The following table provides a sample cost breakdown for deploying this solution with the default parameters in the US East (N. Virginia) Region for one month.

PDF Q&A

| AWS Service | Dimensions | Cost [USD] |
| --- | --- | --- |
| Amazon Virtual Private Cloud | | 0.00 |
| AWS AppSync | 15 requests per hour to trigger questions + (15 x 4 calls to notify clients through subscriptions) = 54,000 requests per month | 0.22 |
| Amazon EventBridge | 15 requests per hour = 10,800 custom events per month | 0.01 |
| AWS Lambda | 15 q/a requests per hour through 1 Lambda function with 7076 MB of memory allocated and 512 MB of ephemeral storage allocated and an average run time of 30 seconds = 10,800 requests per month | 30.65 |
| Amazon Simple Storage Service | 15 transformed files to text format added every hour with an average size of 1 MB = 21.6 GB per month in S3 Standard Storage | 0.50 |
| Amazon Bedrock | Prompt template is 1,500 characters (~400 tokens), OpenSearch returns 200 tokens per excerpt and only uses top 5 documents (~1,000 tokens), user inputs average 1 sentence long (~20 tokens), LLM outputs average 8 sentences (~160 tokens). Using those assumptions: Input tokens = promptTemplate + context + query -> Input tokens = 1,900 and Output tokens = 160. Using Anthropic Claude V2.1 for question answering and Amazon Titan for embeddings, with 360 (15 x 24h) transactions a day, daily cost is (2K tokens / 1000 * $0.01102 + 1K tokens / 1000 * $0.03268) = $0.05472 * 360 = 19.70 | 19.70 |
| Amazon CloudWatch | 15 metrics using 5 GB data ingested for logs | 7.02 |
| AWS X-Ray | 100,000 requests per month through AppSync and Lambda calls | 0.50 |
| Total monthly cost | | 58.60 |
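The arithmetic behind the table can be reproduced with a few lines of code. This sketch only recomputes the figures exactly as the table states them (request volumes over a 30-day month, the Bedrock formula, and the sum of the line items); the variable names are illustrative and the prices are the ones quoted in the table, not current AWS pricing.

```typescript
// Request volumes, assuming a 30-day month as in the table.
const hoursPerMonth = 24 * 30;
const appSyncRequests = (15 + 15 * 4) * hoursPerMonth; // questions + notifications
const eventBridgeEvents = 15 * hoursPerMonth;

// Bedrock formula as written in the table: rounded token counts and 360 transactions.
const bedrockPerTransaction = (2000 / 1000) * 0.01102 + (1000 / 1000) * 0.03268;
const bedrockCost = bedrockPerTransaction * 360;

// Line items from the Cost [USD] column, in table order.
const lineItems: number[] = [0.0, 0.22, 0.01, 30.65, 0.5, 19.7, 7.02, 0.5];
const totalCost = lineItems.reduce((sum, cost) => sum + cost, 0);

console.log(appSyncRequests); // 54000
console.log(eventBridgeEvents); // 10800
console.log(bedrockCost.toFixed(2)); // 19.70
console.log(totalCost.toFixed(2)); // 58.60
```

Changing the request rate or the per-token prices in this sketch gives a quick first estimate for a different workload, before turning to the pricing pages of each service.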

The resources not created by this construct (Amazon Cognito User Pool, Amazon OpenSearch provisioned cluster, AppSync Merged API, AWS Secrets Manager secret) do not appear in the table above. Refer to the dedicated pricing page of each of those services to estimate the related cost.

Note: You can share the Amazon OpenSearch provisioned cluster between use cases, but this can drive up the number of queries per index and additional charges will apply.

Security

When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, virtualization layer, and physical security of the facilities in which the services operate. For more information about AWS security, visit AWS Cloud Security.

This construct requires you to provide an existing Amazon Cognito User Pool and a provisioned Amazon OpenSearch cluster. Please refer to the official documentation of each service for best practices to secure it.

Optionally, you can provide existing resources to the construct (marked optional in the construct pattern props). If you choose to do so, please refer to the official documentation of each service for best practices to secure it.

If you grant access to a user to your account where this construct is deployed, this user may access information stored by the construct (Amazon Simple Storage Service bucket, Amazon OpenSearch cluster, Amazon CloudWatch logs). To help secure your AWS resources, please follow the best practices for AWS Identity and Access Management (IAM).

AWS CloudTrail provides a number of security features to consider as you develop and implement your own security policies. Please follow the related best practices through the official documentation.

Note: This construct requires you to provide documents in the input assets bucket. Validate each file in the bucket before using this construct, following file input validation best practices. Ensure you only ingest the appropriate documents into your knowledge base. Any result returned by the knowledge base is eligible for inclusion in the prompt and can therefore be sent to the LLM. If using a third-party LLM, ensure you audit the documents contained within your knowledge base. This construct provides several configurable options for logging. Please consider security best practices when enabling or disabling logging and related features. Verbose logging, for instance, may log the content of API calls. You can disable this functionality by setting the observability flag to false.

Supported AWS Regions

This solution optionally uses Amazon Bedrock and Amazon OpenSearch Service, which are not currently available in all AWS Regions. You must launch this construct in an AWS Region where these services are available. For the most current availability of AWS services by Region, see the AWS Regional Services List.

Note: You need to explicitly enable access to models before they are available for use in Amazon Bedrock. Please follow the Amazon Bedrock User Guide for steps related to enabling model access.

Quotas

Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account.

Make sure you have sufficient quota for each of the services implemented in this solution. For more information, refer to AWS service quotas.

To view the service quotas for all AWS services in the documentation without switching pages, view the information in the Service endpoints and quotas page in the PDF instead.

Clean up

When deleting your stack which uses this construct, do not forget to go over the following instructions to avoid unexpected charges:

  • empty and delete the Amazon Simple Storage Service bucket created by this construct if you didn't provide an existing one during the construct creation
  • if the observability flag is turned on, delete all the associated logs created by the different services in Amazon CloudWatch Logs

© Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.