Tag Archives: AWS

Amazon Nova Multimodal Embeddings: State-of-the-art embedding model for agentic RAG and semantic search

This post was originally published on this site

Today, we’re introducing Amazon Nova Multimodal Embeddings, a state-of-the-art multimodal embedding model for agentic retrieval-augmented generation (RAG) and semantic search applications, available in Amazon Bedrock. It is the first unified embedding model that supports text, documents, images, video, and audio through a single model to enable crossmodal retrieval with leading accuracy.

Embedding models convert textual, visual, and audio inputs into numerical representations called embeddings. These embeddings capture the meaning of the input in a way that AI systems can compare, search, and analyze, powering use cases such as semantic search and RAG.

Organizations are increasingly seeking solutions to unlock insights from the growing volume of unstructured data that is spread across text, image, document, video, and audio content. For example, an organization might have product images, brochures that contain infographics and text, and user-uploaded video clips. Embedding models are able to unlock value from unstructured data, however traditional models are typically specialized to handle one content type. This limitation drives customers to either build complex crossmodal embedding solutions or restrict themselves to use cases focused on a single content type. The problem also applies to mixed-modality content types such as documents with interleaved text and images or video with visual, audio, and textual elements where existing models struggle to capture crossmodal relationships effectively.

Nova Multimodal Embeddings supports a unified semantic space for text, documents, images, video, and audio for use cases such as crossmodal search across mixed-modality content, searching with a reference image, and retrieving visual documents.

Evaluating Amazon Nova Multimodal Embeddings performance
We evaluated the model on a broad range of benchmarks, and it delivers leading accuracy out-of-the-box as described in the following table.

Amazon Nova Embeddings benchmarks

Nova Multimodal Embeddings supports a context length of up to 8K tokens, text in up to 200 languages, and accepts inputs via synchronous and asynchronous APIs. Additionally, it supports segmentation (also known as “chunking”) to partition long-form text, video, or audio content into manageable segments, generating embeddings for each portion. Lastly, the model offers four output embedding dimensions, trained using Matryoshka Representation Learning (MRL) that enables low-latency end-to-end retrieval with minimal accuracy changes.

Let’s see how the new model can be used in practice.

Using Amazon Nova Multimodal Embeddings
Getting started with Nova Multimodal Embeddings follows the same pattern as other models in Amazon Bedrock. The model accepts text, documents, images, video, or audio as input and returns numerical embeddings that you can use for semantic search, similarity comparison, or RAG.

Here’s a practical example using the AWS SDK for Python (Boto3) that shows how to create embeddings from different content types and store them for later retrieval. For simplicity, I’ll use Amazon S3 Vectors, a cost-optimized storage with native support for storing and querying vectors at any scale, to store and search the embeddings.

Let’s start with the fundamentals: converting text into embeddings. This example shows how to transform a simple text description into a numerical representation that captures its semantic meaning. These embeddings can later be compared with embeddings from documents, images, videos, or audio to find related content.

To make the code easy to follow, I’ll show a section of the script at a time. The full script is included at the end of this walkthrough.

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

Now we’ll process visual content using the same embedding space using a photo.jpg file in the same folder as the script. This demonstrates the power of multimodality: Nova Multimodal Embeddings is able to capture both textual and visual context into a single embedding that provides enhanced understanding of the document.

Nova Multimodal Embeddings can generate embeddings that are optimized for how they are being used. When indexing for a search or retrieval use case, embeddingPurpose can be set to GENERIC_INDEX. For the query step, embeddingPurpose can be set depending on the type of item to be retrieved. For example, when retrieving documents, embeddingPurpose can be set to DOCUMENT_RETRIEVAL.

# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")

To process video content, I use the asynchronous API. That’s a requirement for videos that are larger than 25MB when encoded as Base64. First, I upload a local video to an S3 bucket in the same AWS Region.

aws s3 cp presentation.mp4 s3://my-video-bucket/videos/

This example shows how to extract embeddings from both visual and audio components of a video file. The segmentation feature breaks longer videos into manageable chunks, making it practical to search through hours of content efficiently.

# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
S3_EMBEDDING_DESTINATION_URI = "s3://my-embedding-destination-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"nJob failed: {job.get('failureMessage', 'Unknown error')}")

With our embeddings generated, we need a place to store and search them efficiently. This example demonstrates setting up a vector store using Amazon S3 Vectors, which provides the infrastructure needed for similarity search at scale. Think of this as creating a searchable index where semantically similar content naturally clusters together. When adding an embedding to the index, I use the metadata to specify the original format and the content being indexed.

# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")

This final example demonstrates the capability of searching across different content types with a single query, finding the most similar content regardless of whether it originated from text, images, videos, or audio. The distance scores help you understand how closely related the results are to your original query.

# Text to query
query_text = "foundation models"  

print(f"nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()

Crossmodal search is one of the key advantages of multimodal embeddings. With crossmodal search, you can query with text and find relevant images. You can also search for videos using text descriptions, find audio clips that match certain topics, or discover documents based on their visual and textual content. For your reference, the full script with all previous examples merged together is here:

import json
import base64
import time
import boto3

MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072

# Initialize Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

print(f"Generating text embedding with {MODEL_ID} ...")

# Text to embed
text = "Amazon Nova is a multimodal foundation model"

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text},
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")
# Read and encode image
print(f"Generating image embedding with {MODEL_ID} ...")

with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")

# Create embedding
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {
            "format": "jpeg",
            "source": {"bytes": image_bytes}
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json",
)

# Extract embedding
response_body = json.loads(response["body"].read())
embedding = response_body["embeddings"][0]["embedding"]

print(f"Generated embedding with {len(embedding)} dimensions")
# Initialize Amazon S3 client
s3 = boto3.client("s3", region_name="us-east-1")

print(f"Generating video embedding with {MODEL_ID} ...")

# Amazon S3 URIs
S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"

# Amazon S3 output bucket and location
S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"

# Create async embedding job for video with audio
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": S3_VIDEO_URI}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15-second chunks
            },
        },
    },
}

response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={
        "s3OutputDataConfig": {
            "s3Uri": S3_EMBEDDING_DESTINATION_URI
        }
    },
)

invocation_arn = response["invocationArn"]
print(f"Async job started: {invocation_arn}")

# Poll until job completes
print("nPolling for job completion...")
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]
    print(f"Status: {status}")

    if status != "InProgress":
        break
    time.sleep(15)

# Check if job completed successfully
if status == "Completed":
    output_s3_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    print(f"nSuccess! Embeddings at: {output_s3_uri}")

    # Parse S3 URI to get bucket and prefix
    s3_uri_parts = output_s3_uri[5:].split("/", 1)  # Remove "s3://" prefix
    bucket = s3_uri_parts[0]
    prefix = s3_uri_parts[1] if len(s3_uri_parts) > 1 else ""

    # AUDIO_VIDEO_COMBINED mode outputs to embedding-audio-video.jsonl
    # The output_s3_uri already includes the job ID, so just append the filename
    embeddings_key = f"{prefix}/embedding-audio-video.jsonl".lstrip("/")

    print(f"Reading embeddings from: s3://{bucket}/{embeddings_key}")

    # Read and parse JSONL file
    response = s3.get_object(Bucket=bucket, Key=embeddings_key)
    content = response['Body'].read().decode('utf-8')

    embeddings = []
    for line in content.strip().split('n'):
        if line:
            embeddings.append(json.loads(line))

    print(f"nFound {len(embeddings)} video segments:")
    for i, segment in enumerate(embeddings):
        print(f"  Segment {i}: {segment.get('startTime', 0):.1f}s - {segment.get('endTime', 0):.1f}s")
        print(f"    Embedding dimension: {len(segment.get('embedding', []))}")
else:
    print(f"nJob failed: {job.get('failureMessage', 'Unknown error')}")
# Initialize Amazon S3 Vectors client
s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Configuration
VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"

# Create vector bucket and index (if they don't exist)
try:
    s3vectors.get_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Vector bucket {VECTOR_BUCKET} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
    print(f"Created vector bucket: {VECTOR_BUCKET}")

try:
    s3vectors.get_index(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME)
    print(f"Vector index {INDEX_NAME} already exists")
except s3vectors.exceptions.NotFoundException:
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=INDEX_NAME,
        dimension=EMBEDDING_DIMENSION,
        dataType="float32",
        distanceMetric="cosine"
    )
    print(f"Created index: {INDEX_NAME}")

texts = [
    "Machine learning on AWS",
    "Amazon Bedrock provides foundation models",
    "S3 Vectors enables semantic search"
]

print(f"nGenerating embeddings for {len(texts)} texts...")

# Generate embeddings using Amazon Nova for each text
vectors = []
for text in texts:
    response = bedrock_runtime.invoke_model(
        body=json.dumps({
            "taskType": "SINGLE_EMBEDDING",
            "singleEmbeddingParams": {
                "embeddingPurpose": "GENERIC_INDEX",
                "embeddingDimension": EMBEDDING_DIMENSION,
                "text": {"truncationMode": "END", "value": text}
            }
        }),
        modelId=MODEL_ID,
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response["body"].read())
    embedding = response_body["embeddings"][0]["embedding"]

    vectors.append({
        "key": f"text:{text[:50]}",  # Unique identifier
        "data": {"float32": embedding},
        "metadata": {"type": "text", "content": text}
    })
    print(f"  ✓ Generated embedding for: {text}")

# Add all vectors to store in a single call
s3vectors.put_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    vectors=vectors
)

print(f"nSuccessfully added {len(vectors)} vectors to the store in one put_vectors call!")
# Text to query
query_text = "foundation models"  

print(f"nGenerating embeddings for query '{query_text}' ...")

# Generate embeddings
response = bedrock_runtime.invoke_model(
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_RETRIEVAL",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": query_text}
        }
    }),
    modelId=MODEL_ID,
    accept="application/json",
    contentType="application/json"
)

response_body = json.loads(response["body"].read())
query_embedding = response_body["embeddings"][0]["embedding"]

print(f"Searching for similar embeddings...n")

# Search for top 5 most similar vectors
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)

# Display results
print(f"Found {len(response['vectors'])} results:n")
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']}")
    print(f"   Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
    print()

For production applications, embeddings can be stored in any vector database. Amazon OpenSearch Service offers native integration with Nova Multimodal Embeddings at launch, making it straightforward to build scalable search applications. As shown in the examples before, Amazon S3 Vectors provides a simple way to store and query embeddings with your application data.

Things to know
Nova Multimodal Embeddings offers four output dimension options: 3,072, 1,024, 384, and 256. Larger dimensions provide more detailed representations but require more storage and computation. Smaller dimensions offer a practical balance between retrieval performance and resource efficiency. This flexibility helps you optimize for your specific application and cost requirements.

The model handles substantial context lengths. For text inputs, it can process up to 8,192 tokens at once. Video and audio inputs support segments of up to 30 seconds, and the model can segment longer files. This segmentation capability is particularly useful when working with large media files—the model splits them into manageable pieces and creates embeddings for each segment.

The model includes responsible AI features built into Amazon Bedrock. Content submitted for embedding goes through Amazon Bedrock content safety filters, and the model includes fairness measures to reduce bias.

As described in the code examples, the model can be invoked through both synchronous and asynchronous APIs. The synchronous API works well for real-time applications where you need immediate responses, such as processing user queries in a search interface. The asynchronous API handles latency insensitive workloads more efficiently, making it suitable for processing large content such as videos.

Availability and pricing
Amazon Nova Multimodal Embeddings is available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. For detailed pricing information, visit the Amazon Bedrock pricing page.

To learn more, see the Amazon Nova User Guide for comprehensive documentation and the Amazon Nova model cookbook on GitHub for practical code examples.

If you’re using an AI–powered assistant for software development such as Amazon Q Developer or Kiro, you can set up the AWS API MCP Server to help the AI assistants interact with AWS services and resources and the AWS Knowledge MCP Server to provide up-to-date documentation, code samples, knowledge about the regional availability of AWS APIs and CloudFormation resources.

Start building multimodal AI-powered applications with Nova Multimodal Embeddings today, and share your feedback through AWS re:Post for Amazon Bedrock or your usual AWS Support contacts.

Danilo

AWS Weekly Roundup: AWS RTB Fabric, AWS Customer Carbon Footprint Tool, AWS Secret-West Region, and more (October 27, 2025)

This post was originally published on this site

This week started with challenges for many using services in the the North Virginia (us-east-1) Region. On Monday, we experienced a service disruption affecting DynamoDB and several other services due to a DNS configuration problem. The issue has been fully resolved, and you can read the full details in our official summary. As someone who works closely with developers, I know how disruptive these incidents can be to your applications and your users. The teams are learning valuable lessons from this event that will help improve our services going forward.

Last week’s launches

On a brighter note, I’m excited to share some launches and updates from this past week that I think you’ll find interesting.

AWS RTB Fabric is now generally available — If you’re working in advertising technology, you’ll be interested in AWS RTB Fabric, a fully managed service for real-time bidding workloads. It connects AdTech partners like SSPs, DSPs, and publishers through a private, high-performance network that delivers single-digit millisecond latency—critical for those split-second ad auctions. The service reduces networking costs by up to 80% compared to standard cloud solutions with no upfront commitments, and includes three built-in modules to optimize traffic, improve bid efficiency, and increase bid response rates. AWS RTB Fabric is available in US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore and Tokyo), and Europe (Frankfurt and Ireland).

Customer Carbon Footprint Tool now includes Scope 3 emissions data — Understanding the full environmental impact of your cloud usage just got more comprehensive. The AWS Customer Carbon Footprint Tool (CCFT) now covers all three industry-standard emission scopes as defined by the Greenhouse Gas Protocol. This update adds Scope 3 emissions—covering the lifecycle carbon impact from manufacturing servers, powering AWS facilities, and transporting equipment to data centers—plus Scope 1 natural gas and refrigerants. With historical data available back to January 2022, you can track your progress over time and make informed decisions about your cloud strategy to meet sustainability goals. Access the data through the CCFT dashboard or AWS Billing and Cost Management Data Exports.

Additional updates

I thought these projects, blog posts, and news items were also interesting:

AWS Secret-West Region is now available — AWS launched its second Secret Region in the western United States, capable of handling mission-critical workloads at the Secret U.S. security classification level. This new region provides enhanced performance for latency-sensitive workloads and offers multi-region resiliency with geographic separation for Intelligence Community and Department of Defense missions. The infrastructure features data centers and network architecture designed, built, accredited, and operated for security compliance with Intelligence Community Directive requirements.

Amazon CloudWatch now generates incident reports — CloudWatch investigations can now automatically generate comprehensive incident reports that include executive summaries, timeline of events, impact assessments, and actionable recommendations. The feature collects and correlates telemetry data along with investigation actions to help teams identify patterns and implement preventive measures through structured post-incident analysis.

Amazon Connect introduces threaded email views — Amazon Connect email now displays exchanges in a threaded format and automatically includes prior conversation context when agents compose responses. These enhancements make it easier for both agents and customers to maintain context and continuity across interactions, delivering a more natural and familiar email experience.

Amazon EC2 I8g instances expand to additional regions — Storage Optimized I8g instances are now available in Europe (London), Asia Pacific (Singapore), and Asia Pacific (Tokyo). Powered by AWS Graviton4 processors and third-generation AWS Nitro SSDs, these instances deliver up to 60% better compute performance and 65% better real-time storage performance per TB compared to previous generation I4g instances, with storage I/O latency reduced by up to 50%.

AWS Location Service adds enhanced map styling — Developers can now incorporate terrain visualization, contour lines, real-time traffic overlays, and transportation-specific routing details through the GetStyleDescriptor API. The new styling parameters enable tailored maps for specific applications—from outdoor navigation to logistics planning.

CloudWatch Synthetics introduces multi-check canaries — You can now bundle up to 10 different monitoring steps in a single canary using JSON configuration without custom scripts. The multi-check blueprints support HTTP endpoints with authentication, DNS validation, SSL certificate monitoring, and TCP port checks, making API monitoring more cost-effective.

Amazon S3 Tables now generates CloudTrail events — S3 Tables now logs AWS CloudTrail events for automatic maintenance operations, including compaction and snapshot expiration. This enables organizations to audit the maintenance activities that S3 Tables automatically performs to enhance query performance and reduce operational costs.

AWS Lambda increases asynchronous invocation payload size to 1 MB — Lambda has quadrupled the maximum payload size for asynchronous invocations from 256 KB to 1 MB across all AWS Commercial and GovCloud (US) Regions. This expansion streamlines architectures by allowing comprehensive data to be included in a single event, eliminating the need for complex data chunking or external storage solutions. Use cases now better supported include large language model prompts, detailed telemetry signals, complex ML output structures, and complete user profiles. The update applies to asynchronous invocations through the Lambda API or push-based events from services like S3, CloudWatch, SNS, EventBridge, and Step Functions. Pricing remains at 1 request charge for the first 256 KB, with 1 additional charge per 64 KB chunk thereafter.

Upcoming AWS events

Keep a look out and be sure to sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities. Registration is now open.

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse for upcoming in-person and virtual developer-focused events in your area.

That’s all for this week. Check back next Monday for another Weekly Roundup!

~ micah

Introducing AWS RTB Fabric for real-time advertising technology workloads

This post was originally published on this site

Today, we’re announcing AWS RTB Fabric, a fully managed service purpose built for real-time bidding (RTB) advertising workloads. The service helps advertising technology (AdTech) companies seamlessly connect with their supply and demand partners, such as Amazon Ads, GumGum, Kargo, MobileFuse, Sovrn, TripleLift, Viant, Yieldmo and more, to run high-volume, latency-sensitive RTB workloads on Amazon Web Services (AWS) with consistent single-digit millisecond performance and up to 80% lower networking costs compared to standard networking costs.

AWS RTB Fabric provides a dedicated, high-performance network environment for RTB workloads and partner integrations without requiring colocated, on-premises infrastructure or upfront commitments. The following diagram shows the high-level architecture of RTB Fabric.

AWS RTB Fabric also includes modules, a capability that helps customers bring their own and partner applications securely into the compute environment used for real-time bidding. Modules support containerized applications and foundation models (FMs) that can enhance transaction efficiency and bidding effectiveness. At launch, AWS RTB Fabric includes modules for optimizing traffic management, improving bid efficiency, and increasing bid response rates, all running inline within the service for consistent low-latency execution.

The growth of programmatic advertising has created a need for low-latency, cost-efficient infrastructure to support RTB workloads. AdTech companies process millions of bid requests per second across publishers, supply-side platforms (SSPs), and demand-side platforms (DSPs). These workloads are highly sensitive to latency because most RTB auctions must complete within 200–300 milliseconds and require reliable, high-speed exchange of OpenRTB requests and responses among multiple partners. Many companies have addressed this by deploying infrastructure in colocation data centers near key partners, which reduces latency but adds operational complexity, long provisioning cycles, and high costs. Others have turned to cloud infrastructure to gain elasticity and scale, but they often face complex provisioning, partner-specific connectivity, and long-term commitments to achieve cost efficiency. These gaps add operational overhead and limit agility. AWS RTB Fabric solves these challenges by providing a managed private network built for RTB workloads that delivers consistent performance, simplifies partner onboarding, and achieves predictable cost efficiency without the burden of maintaining colocation or custom networking setups.

Key capabilities
AWS RTB Fabric introduces a managed foundation for running RTB workloads at scale. The service provides the following key capabilities:

  • Simplified connectivity to AdTech partners – When you register an RTB Fabric gateway, the service automatically generates secure endpoints that can be shared with selected partners. Using the AWS RTB Fabric API, you can create optimized, private connections to exchange RTB traffic securely across different environments. External Links are also available to connect with partners who aren’t using RTB Fabric, such as those operating on premises or in third-party cloud environments. This approach shortens integration time and simplifies collaboration among AdTech participants.
  • Dedicated network for low-latency advertising transactions – AWS RTB Fabric provides a managed, high-performance network layer optimized for OpenRTB communication. It connects AdTech participants such as SSPs, DSPs, and publishers through private, high-speed links that deliver consistent single-digit millisecond latency. The service automatically optimizes routing paths to maintain predictable performance and reduce networking costs, without requiring manual peering or configuration.
  • Pricing model aligned with RTB economics – AWS RTB Fabric uses a transaction-based pricing model designed to align with programmatic advertising economics. Customers are billed per billion transactions, providing predictable infrastructure costs that align with how advertising exchanges, SSPs, and DSPs operate.
  • Built-in traffic management modules – AWS RTB Fabric includes configurable modules that help AdTech workloads operate efficiently and reliably. Modules such as Rate Limiter, OpenRTB Filter, and Error Masking help you control request volume, validate message formats, and manage response handling directly in the network path. These modules execute inline within the AWS RTB Fabric environment, maintaining network-speed performance without adding application-level latency. All configurations are managed through the AWS RTB Fabric API, so you can define and update rules programmatically as your workloads scale.

Getting started
Today, you can start building with AWS RTB Fabric using the AWS Management Console, AWS Command Line Interface (AWS CLI), or infrastructure-as-code (IaC) tools such as AWS CloudFormation and Terraform.

The console provides a visual entry point to view and manage RTB gateways and links, as shown on the Dashboard of the AWS RTB Fabric console.

You can also use the AWS CLI to configure gateways, create links, and manage traffic programmatically. When I started building with AWS RTB Fabric, I used the AWS CLI to configure everything from gateway creation to link setup and traffic monitoring. The setup ran inside my Amazon Virtual Private Cloud (Amazon VPC) endpoint while AWS managed the low-latency infrastructure that connected workloads.

To begin, I created a requester gateway to send bid requests and a responder gateway to receive and process bid responses. These gateways act as secure communication points within the AWS RTB Fabric.

# Create a requester gateway with required parameters
aws rtbfabric create-requester-gateway 
  --description "My RTB requester gateway" 
  --vpc-id vpc-12345678 
  --subnet-ids subnet-abc12345 subnet-def67890 
  --security-group-ids sg-12345678 
  --client-token "unique-client-token-123"
# Create a responder gateway with required parameters
aws rtbfabric create-responder-gateway 
  --description "My RTB responder gateway" 
  --vpc-id vpc-01f345ad6524a6d7 
  --subnet-ids subnet-abc12345 subnet-def67890 
  --security-group-ids sg-12345678 
  --dns-name responder.example.com 
  --port 443 
  --protocol HTTPS

After both gateways were active, I created a link from the requester to the responder to establish a private, low-latency communication path for OpenRTB traffic. The link handled routing and load balancing automatically.

# Requester account creating a link from requester gateway to a responder gateway
aws rtbfabric create-link 
  --gateway-id rtb-gw-requester123 
  --peer-gateway-id rtb-gw-responder456 
  --log-settings '{"applicationLogs:{"sampling":"errorLog":10.0,"filterLog":10.0}}'
# Responder account accepting a link from requester gateway to responder gateway
aws rtbfabfic accept-link 
  --gateway-id rtb-gw-responder456 
  --link-id link-reqtoresplink789 
  --log-settings '{"applicationLogs:{"sampling":"errorLog":10.0,"filterLog":10.0}}'

I also connected with external partners using External Links, which extended my RTB workloads to on-premises or third-party environments while maintaining the same latency and security characteristics.

# Create an inbound external link endpoint for an external partner to send bid requests to
aws rtbfabric create-inbound-external-link 
  --gateway-id rtb-gw-responder456
# Create an outbound external link for sending bid requests to an external partner
aws rtbfabric create-outbound-external-link 
  --gateway-id rtb-gw-requester123 
  --public-endpoint "https://my-external-partner-responder.com"

To manage traffic efficiently, I added modules directly into the data path. The Rate Limiter module controlled request volume, and the OpenRTB Filter validated message formats inline at network speed.

# Attach a rate limiting module
aws rtbfabric update-link-module-flow 
  --gateway-id rtb-gw-responder456 
  --link-id link-toresponder789 
  --modules '{"name":"RateLimiter":"moduleParameters":{"rateLimiter":{"tps":10000}}}'

Finally, I used Amazon CloudWatch to monitor throughput, latency, and module performance, and I exported logs to Amazon Simple Storage Service (Amazon S3) for auditing and optimization.

All configurations can also be automated with AWS CloudFormation or Terraform, allowing consistent, repeatable deployment across multiple environments. With RTB Fabric, I could focus on optimizing bidding logic while AWS maintained predictable, single-digit millisecond performance across my AdTech partners.

For more details, refer to the AWS RTB Fabric User Guide.

Now available
AWS RTB Fabric is available today in the following AWS Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland).

AWS RTB Fabric is continually evolving to address the changing needs of the AdTech industry. The service expands its capabilities to support secure integration of advanced applications and AI-driven optimizations in real-time bidding workflows that help customers simplify operations and improve performance on AWS. To learn more about AWS RTB Fabric, visit the AWS RTB Fabric page.

Betty

Customer Carbon Footprint Tool Expands: Additional emissions categories including Scope 3 are now available

This post was originally published on this site

Since it launched in 2022, the Customer Carbon Footprint Tool (CCFT) has supported our customers’ sustainability journey to track, measure, and review their carbon emissions by providing the estimated carbon emissions associated with their usage of Amazon Web Services (AWS) services.

In April, we made major updates in the CCFT, including easier access to carbon emissions data, visibility into emissions by AWS Region, inclusion of location-based emissions (LBM), an updated, independently-verified methodology as well as moving to a dedicated page in the AWS Billing console.

The CCFT is informed by the Greenhouse Gas (GHG) Protocol’s classification of emissions, which classifies a company’s emissions. Today, we’re announcing the inclusion of Scope 3 emissions data and an update to Scope 1 emissions in the CCFT. The new emission categories complement the existing Scope 1 and 2 data, and they’ll give our customers a comprehensive look into their carbon emissions data.

In this updated methodology we incorporate new emissions categories. We’ve added Scope 1 refrigerants and natural gas, alongside the existing Scope 1 emissions from fuel combustion in emergency backup generators (diesel). Although Scope 1 emissions represent a small share of overall emissions, we provide our customers with a complete image of their carbon emissions.

To decide which categories of Scope 3 to include in our model we looked at how material each of them were to the overall carbon impact and confirmed the vast majority of emissions were represented. With that in mind, the methodology now includes:

  • Fuel- and energy-related activities (“FERA” under the GHG Protocol) – This includes upstream emissions from purchased fuels, upstream emissions of purchased electricity, and transmission and distribution (T&D) losses. AWS calculates these emissions using both LBM and the market-based method (MBM).

  • IT hardware – AWS uses a comprehensive cradle-to-gate approach that tracks emissions from raw material extraction through manufacturing and transportation to AWS data centers. We use four calculation pathways: process-based life cycle assessment (LCA) with engineering attributes, extrapolation, representative category average LCA, and economic input-output LCA. AWS prioritizes the most detailed and accurate methods for components that contribute significantly to overall emissions.

  • Buildings and equipment – AWS follows established whole building life cycle assessment (wbLCA) standards, considering emissions from construction, use, and end-of-life phases. The analysis covers data center shells, rooms, and long-lead equipment such as air handling units and generators. The methodology uses both process-based life cycle assessment models and economic input-output analysis to provide comprehensive coverage.

The Scope 3 emissions are then amortized over the assets’ service life (6 years for IT hardware, 50 years for buildings) to calculate monthly emissions that can be allocated to customers. This amortization means that we fairly distribute the total embodied carbon of each asset across its operational lifetime, accounting for scenarios such as early retirement or extended use.

All these updates are part of methodology version 3.0.0 and are explained in detail in our methodology document, which has been independently verified by a third party.

How to access the CCFT
To get started, go to the AWS Billing and Cost Management console and choose Customer Carbon Footprint Tool under Cost and Usage Analysis. You can access your carbon emissions data in the dashboard, download a csv file, or export all data using basic SQL and visualize your data by integrating with AWS Data Exports and Amazon Quick Sight.

To ensure you can make meaningful year-over-year comparisons, we’ve recalculated historical data back to January 2022 using version 3 of the methodology. All the data displayed in the CCFT now uses version 3. To see historical data using v3, choose Create custom data export. A new data export now includes new columns breaking down emissions by Scope 1, 2, and 3.

You can see estimated AWS emissions and estimated emissions savings. The tool shows emissions calculated using the MBM for 38 months of data by default. You can find your emissions calculated using the LBM by choosing LBM in the Calculation method filter on the dashboard. The unit of measurement for carbon emissions is metric tons of carbon dioxide equivalent (MTCO2e), an industry-standard measure.

In the Carbon emissions summary, it shows trends of your carbon emissions over time. You can also find emissions resulting from your usage of AWS services and across all AWS Regions. To learn more, visit Viewing your carbon footprint in the AWS documentation.

Voice of the customer
Some of our customers had early access to these updates. This is what they shared with us:

Sunya Norman, senior vice president, Impact at Salesforce shared “Effective decarbonization begins with visibility into our carbon footprint, especially in Scope 3 emissions. Industry averages are only a starting point. The granular carbon data we get from cloud providers like AWS are critical to helping us better understand the actual emissions associated with our cloud infrastructure and focus reductions where they matter most.”

Gerhard Loske, Head of Environmental Management at SAP said “The latest updates to the CCFT are a big step forward in helping us managing SAP’s sustainability goals. With new Region-specific data, we can now see better where emissions are coming from and take targeted action. The upcoming addition of Scope 3 emissions will give us a much fuller picture of our carbon footprint across AWS workloads. These improvements make it easier for us to turn data into meaningful climate action.”

Pinterest’s Global Sustainability Lead, Mia Ketterling highlighted the benefits of the Scope 3 emission data, saying, “By including Scope 3 emissions data in their CCFT, AWS empowers customers like Pinterest to more accurately measure and report the full carbon footprint of our digital operations. Enhanced transparency helps us drive meaningful climate action across our value chain.”

If you’re attending AWS re:Invent in person in December, join technical leaders from AWS, Adobe, and Salesforce as they reveal how the Customer Carbon Footprint Tool supports their environmental initiatives.

Now available
With Scope 1, 2, and 3 coverage in the CCFT, you can track your emissions over time to understand how you’re trending towards your sustainability goals and see the impact of any carbon reduction projects you’ve implemented. To learn more, visit the Customer Carbon Footprint Tool (CCFT) page.

Give these new features a try in the AWS Billing and Cost Management console and send feedback to AWS re:Post for the CCFT or through your usual AWS Support contacts.

Channy

AWS Weekly Roundup: Kiro waitlist, EBS Volume Clones, EC2 Capacity Manager, and more (October 20, 2025)

This post was originally published on this site

I’ve been inspired by all the activities that tech communities around the world have been hosting and participating in throughout the year. Here in the southern hemisphere we’re starting to dream about our upcoming summer breaks and closing out on some of the activities we’ve initiated this year. The tech community in South Africa is participating in Amazon Q Developer coding challenges that my colleagues and I are hosting throughout this month as a fun way to wind down activities for the year. The first one was hosted in Johannesburg last Friday with Durban and Cape Town coming up next.

Last week’s launches
These are the launches from last week that caught my attention:

Additional updates
I thought these projects, blog posts, and news items were also interesting:

Upcoming AWS events
Keep a look out and be sure to sign up for these upcoming events:

AWS re:Invent 2025 (December 1-5, 2025, Las Vegas) — AWS flagship annual conference offering collaborative innovation through peer-to-peer learning, expert-led discussions, and invaluable networking opportunities.

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Veliswa.

Monitor, analyze, and manage capacity usage from a single interface with Amazon EC2 Capacity Manager

This post was originally published on this site

Today, I’m happy to announce Amazon EC2 Capacity Manager, a centralized solution to monitor, analyze, and manage capacity usage across all accounts and AWS Regions from a single interface. This service aggregates capacity information with hourly refresh rates and provides prioritized optimization opportunities, streamlining capacity management workflows that previously required custom automation or manual data collection from multiple AWS services.

Organizations using Amazon Elastic Compute Cloud (Amazon EC2) at scale operate hundreds of instance types across multiple Availability Zones and accounts, using On-Demand Instances, Spot Instances, and Capacity Reservations. This complexity means customers currently access capacity data through various AWS services including the AWS Management Console, Cost and Usage Reports, Amazon CloudWatch, and EC2 describe APIs. This distributed approach can create operational overhead through manual data collection, context switching between tools, and the need for custom automation to aggregate information for capacity optimization analysis.

EC2 Capacity Manager helps you overcome these operational complexities by consolidating all capacity data into a unified dashboard. You can now view cross-account and cross-Region capacity metrics for On-Demand Instances, Spot Instances, and Capacity Reservations across all commercial AWS Regions from a single location, eliminating the need to build custom data collection tools or navigate between multiple AWS services.

This consolidated visibility can help you discover cost savings by highlighting underutilized Capacity Reservations, analyzing usage patterns across instance types, and providing insights into Spot Instance interruption patterns. By having access to comprehensive capacity data in one place, you can make more informed decisions about rightsizing your infrastructure and optimizing your EC2 spending.

Let me show you the capabilities of EC2 Capacity Manager in detail.

Getting started with EC2 Capacity Manager
On the AWS Management Console, I navigate to Amazon EC2 and select Capacity Manager from the navigation pane. I enable EC2 Capacity Manager through the service settings. The service aggregates historical data from the previous 14 days during initial setup.

The main Dashboard displays capacity utilization across all instance types through a comprehensive overview section that presents key metrics at a glance. The capacity overview cards for Reservations, Usage, and Spot show trend indicators and percentage changes to help you identify capacity patterns quickly. You can apply filtering through the date filter controls, which include date range selection, time zone configuration, and interval settings.

You can select different units to analyze data by vCPUs, instance counts, or estimated costs to understand resource consumption patterns. Estimated costs are based on published On-Demand rates and do not include Savings Plans or other discounts. This pricing reference helps you compare the relative impact of underutilized capacity across different instance types—for example, 100 vCPU hours of unused p5 reservations represents a larger cost impact than 100 vCPU hours of unused t3 reservations.

The dashboard includes detailed Usage metrics with both total usage visualization and usage over time charts. The total usage section shows the breakdown between reserved usage, unreserved usage, and Spot usage. The usage over time chart provides visualization that tracks capacity trends over time, helping you identify usage patterns and peak demand periods.

Under Reservation metrics, Reserved capacity trends visualizes used and unused reserved capacity across the selected period, showing the proportion of reserved vCPU hours that remain unutilized compared with those actively consumed, helping you track reservation efficiency patterns and identify periods of consistent low utilization. This visibility can help you reduce costs by identifying underutilized reservations and helping you to make informed decisions about capacity adjustments.

The Unused capacity section lists underutilized capacity reservations by instance type and Availability Zone combinations, displaying specific utilization percentages and instance types across different Availability Zones. This prioritized list helps you identify potential savings with direct visibility into unused capacity costs.

The Usage tab provides detailed historical trends and usage statistics across all AWS Regions for Spot Instances, On-Demand Instances, Capacity Reservations, Reserved Instances, and Savings Plans. Dedicated Hosts usage is not included. The Dimension filter helps you group by and filter capacity data by Account ID, Region, Instance Family, Availability Zone, and Instance Type, creating custom views that reveal usage patterns across your accounts and AWS Organizations. This helps you analyze specific configurations and compare performance across accounts or Regions.

The Aggregations section provides a comprehensive usage table across EC2 and Spot Instances. You can select different units to analyze data by vCPUs, instance counts, or estimated costs to understand resource consumption patterns. The table shows instance family breakdowns with total usage statistics, reserved usage hours, unreserved usage hours, and Spot usage data. Each row includes a View breakdown action for a detailed analysis.

The Capacity usage or estimated cost trends section visualizes usage trends, reserved usage, unreserved usage, and Spot usage. You can filter the displayed data and adjust the unit of measurement to view historical patterns. These filtering and analysis tools help you identify usage trends, compare costs across dimensions, and make informed decisions for capacity planning and optimization.

When you choose View breakdown from the Aggregations table, you access detailed Usage breakdown based on the dimension filters you selected. This breakdown view shows usage patterns for individual instance types within the selected family and Availability Zone combinations, helping you identify specific optimization opportunities.

The Reservations tab displays capacity reservation utilization with automated analysis capabilities that generate prioritized lists of optimization opportunities. Similar to the Usage tab, you can apply dimension filters by Account ID, Region, Instance Family, Availability Zone, and Instance Type along with additional options related to the reservation details. On each of the tabs you can drill down to see data for individual line items. For reservations specifically, you can view specific reservations and access detailed information about On-Demand Capacity Reservations (ODCRs), including utilization history, configuration parameters, and current status. When the ODCR exists in the same account as Capacity Manager, you can modify reservation parameters directly from this interface, eliminating the need to navigate to separate EC2 console sections for reservation management.

The Statistics section provides summary metrics, including total reservations count, overall utilization percentage, reserved capacity totals, used and unused capacity volumes, average scheduled reservations, and counts of accounts, instance families, and Regions with reservations.

This consolidated view helps you understand reservation distribution and utilization patterns across your infrastructure. For example, you might discover that your development accounts consistently show 30% reservation utilization while production accounts exceed 95%, indicating an opportunity to redistribute or modify reservations. Similarly, you could identify that specific instance families in certain Regions have sustained low utilization rates, suggesting candidates for reservation adjustments or workload optimization. These insights help you make data-driven decisions about reservation purchases, modifications, or cancellations to better align your reserved capacity with actual usage patterns.

The Spot tab focuses on Spot Instance usage and displays the amount of time your Spot instances run before being interrupted. This analysis of Spot Instance usage patterns helps you identify optimization opportunities for Spot Instance workloads. You can use Spot placement score recommendations to improve workload flexibility.

For organizations requiring data export capabilities, Capacity Manager includes data exports to Amazon Simple Storage Service (Amazon S3) buckets for capacity analysis. You can view and manage your data exports through the Data exports tab, which helps you create new exports, monitor delivery status, and configure export schedules to analyze capacity data outside the AWS Management Console.

Data exports extend your analytical capabilities by storing capacity data beyond the 90-day retention period available through the console and APIs. This extended retention enables long-term trend analysis and historical capacity planning. You can also integrate exported data with existing analytics workflows, business intelligence tools, or custom reporting systems to incorporate EC2 capacity metrics into broader infrastructure analysis and decision-making processes.

The Settings section provides configuration options for AWS Organizations integration, enabling centralized capacity management across multiple accounts. Organization administrators can enable enterprise-wide capacity visibility or delegate access to specific accounts while maintaining appropriate permissions and access controls.

Now available
EC2 Capacity Manager eliminates the operational overhead of collecting and analyzing capacity data from multiple sources. The service provides automated optimization opportunities, centralized multi-account visibility, and direct access to capacity management tools. You can reduce manual analysis time while improving capacity utilization and cost optimization across your EC2 infrastructure.

Amazon EC2 Capacity Manager is available at no additional cost. To begin using Amazon EC2 Capacity Manager, visit the Amazon EC2 console or access the service APIs. The service is available in all commercial AWS Regions.

To learn more, visit the EC2 Capacity Manager documentation.

— Esra

Introducing Amazon EBS Volume Clones: Create instant copies of your EBS volumes

This post was originally published on this site

 

As someone that used to work at Sun Microsystems, where ZFS was invented, I’ve always loved working with storage systems that offer instant volume copies for my development and testing needs.

Today, I’m excited to share that AWS is bringing similar capabilities to Amazon Elastic Block Store (Amazon EBS) with the launch of Amazon EBS Volume Clones, a new capability that lets you create instant point-in-time copies of your EBS volumes within the same Availability Zone.

Many customers need to create copies of their production data to support development and testing activities in a separate nonproduction environment. Until now, this process required taking an EBS snapshot (stored in Amazon Simple Storage Service (Amazon S3)) and then creating a new volume from that snapshot. Although this approach works, the process creates operational overhead due to multiple steps.

With Amazon EBS Volume Clones, you can now create copies of your EBS volumes with a single API call or console click. The copied volumes are available within seconds and provide immediate access to your data with single-digit millisecond latency. This makes Volume Clones particularly useful for quickly setting up test environments with production data or creating temporary copies of databases for development purposes.

Let me show you how Volume Clones works
For this post, I created a small Amazon Elastic Compute Cloud (Amazon EC2) instance, with an attached volume. I created a file on the root file system with the command echo "Hello CopyVolumes" > hello.txt.

To initiate the copy, I open a browser on the AWS Management Console and I navigate to EC2, Elastic Block Store, Volumes. I select the volume I want to copy.

Note that, at the time of publication of this post, only encrypted volumes can be copied.

On the Actions menu, I choose the Copy Volume option.

Copy Volume - initiate

Next, I choose the details of the target volume. I can change the Volume type and adjust the Size, IOPS, and Throughput parameters. I choose Copy volume to start the Volume Clone operation.

Copy Volume - Parameters

The copied volume enters the Creating state and becomes available within seconds. I can then attach it to an EC2 instance and start using it immediately.

Data blocks are copied from the source volume and written to the volume copy in the background. The volume remains in the Initializing state until the process is complete. I can monitor its progress with the describe-volume-status API. The initializing operation doesn’t affect the performance of the source volume. I can continue using it normally during the copy process.

I love that the copied volume is available immediately. I don’t need to wait for its initialization to complete. During the initialization phase, my copied volume delivers performance based on the lowest of: a baseline of 3,000 IOPS and 125 MiB/s, the source volume’s provisioned performance, or the copied volume’s provisioned performance.

After initialization is completed, the copied volume becomes fully independent of the source volume and delivers its full provisioned performance.

Copy Volume - InitializingAlternatively, I can use the AWS Command Line Interface (AWS CLI) to initiate the copy:

aws ec2 copy-volumes                          
     --source-volume-id vol-1234567890abcdef0 
     --size 500                               
     --volume-type gp3

After the volume copy is created, I attach it to my EC2 instance and mount it. I can check the file I created at start is present.

First, I attach the volume from my laptop, using the attach-volume command:

aws ec2 attach-volume 
         --volume-id 'vol-09b700e3a23a9b4ad' 
         --instance-id 'i-079e6504ad25b029e'   
         --device '/dev/sdb'

Then, I connect to the instance, and I type these commands:

$ sudo lsblk -f
NAME          FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme0n1                                                                              
├─nvme0n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd    6.2G    22% /
└─nvme0n1p128 vfat   FAT16       3105-2F44                               8.6M    14% /boot/efi
nvme1n1                                                                              
├─nvme1n1p1   xfs          /     49e26d9d-0a9d-4667-b93e-a23d1de8eacd                
└─nvme1n1p128 vfat   FAT16       3105-2F44     

$ sudo mount -t xfs /dev/nvme1n1p1 /data

$ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          4.0M     0  4.0M   0% /dev
tmpfs             924M     0  924M   0% /dev/shm
tmpfs             370M  476K  369M   1% /run
/dev/nvme0n1p1    8.0G  1.8G  6.2G  22% /
tmpfs             924M     0  924M   0% /tmp
/dev/nvme0n1p128   10M  1.4M  8.7M  14% /boot/efi
tmpfs             185M     0  185M   0% /run/user/1000
/dev/nvme1n1p1    8.0G  1.8G  6.2G  22% /data

$ cat /data/home/ec2-user/hello.txt 
Hello CopyVolumes

Things to know
Volume Clones creates copies within the same Availability Zone as your source volume. You can create copies from encrypted volumes only, and the size of your copy must be equal to or greater than the source volume.

Volume Clones creates crash-consistent copies of your volumes, exactly like snapshots. For application consistency, you need to pause application I/O operations before creating the copy. For example, with PostgreSQL databases, you can use the pg_start_backup() and pg_stop_backup() functions to pause writes and create a consistent copy. At the operating system level on Linux with XFS, you can use the xfs_freeze command to temporarily suspend and resume access to the file system and ensure all cached updates are written to disk.

Although Volume Clones creates point-in-time copies, it complements rather than replaces EBS snapshots for backup purposes. EBS snapshots remain the recommended solution for data backup and protection against AZ-level and volume failures. Snapshots provide incremental backups to Amazon S3 with 11 nines of durability, compared to Volume Clones which maintains EBS volume durability (99.999% for io2, 99.9% for other volume types). Consider using Volume Clones specifically for test and development environment scenarios where you need instant access to volume copies.

Copied volumes exist independently of their source volumes and continue to incur standard EBS volume charges until you delete them. To manage costs effectively, implement governance rules to identify and remove copied volumes that are no longer needed for your development or testing activities.

Pricing and availability
Volume Clones supports all EBS volume types and works with volumes in the same AWS account and Availability Zone. This new capability is available in all AWS commercial Regions, selected Local Zones, and in the AWS GovCloud (US).

For pricing, you’re charged a one-time fee per GiB of data on the source volume at initiation and standard EBS pricing for the new volume.

I find Volume Clones particularly valuable for database workloads and continuous integration (CI) scenarios. For instance, you can quickly create a copy of your production database for testing new features or troubleshooting issues without impacting your production environment or waiting for data to hydrate from Amazon S3.

To get started with Amazon EBS Volume Clones, visit the Amazon EBS section on the console or check out the EBS documentation. I look forward to hearing how you use this capability to improve your development workflows.

— seb

AWS Transfer Family SFTP connectors now support VPC-based connectivity

This post was originally published on this site

Many organizations rely on the Secure File Transfer Protocol (SFTP) as the industry standard for exchanging critical business data. Traditionally, securely connecting to private SFTP servers required custom infrastructure, manual scripting, or exposing endpoints to the public internet.

Today, AWS Transfer Family SFTP connectors now support connectivity to remote SFTP servers through Amazon Virtual Private Cloud (Amazon VPC) environments. You can transfer files between Amazon Simple Storage Service (Amazon S3) and private or public SFTP servers while applying the security controls and network configurations already defined in your VPC. This capability helps you integrate data sources across on-premises environments, partner-hosted private servers, or internet-facing endpoints, with the operational simplicity of a fully managed Amazon Web Services (AWS) service.

New capabilities with SFTP connectors
The following are the key enhancements:

  • Connect to private SFTP servers – SFTP connectors can now reach endpoints that are only accessible within your AWS VPC connection. These include servers hosted in your VPC or a shared VPC, on-premises systems connected over AWS Direct Connect, and partner-hosted servers connected through VPN tunnels.
  • Security and compliance – All file transfers are routed through the security controls already applied in your VPC, such as AWS Network Firewall or centralized ingress and egress inspection. Private SFTP servers remain private and don’t need to be exposed to the internet. You can also present static Elastic IP or bring your own IP (BYOIP) addresses to meet partner allowlist requirements.
  • Performance and simplicity – By using your own network resources such as NAT gateways, AWS Direct Connect or VPN connections, connectors can take advantage of higher bandwidth capacity for large-scale transfers. You can configure connectors in minutes through the AWS Management Console,  AWS Command Line Interface (AWS CLI), or AWS SDKs without building custom scripts or third-party tools.

How VPC- based SFTP connections work
SFTP connectors use Amazon VPC Lattice resources to establish secure connectivity through your VPC. Key constructs include a resource configuration and a resource gateway. The resource configuration represents the target SFTP server, which you specify using a private IP address or public DNS name. The resource gateway provides SFTP connector access to these configurations, enabling file transfers to flow through your VPC and its security controls.

The following architecture diagram illustrates how traffic flows between Amazon S3 and remote SFTP servers. As shown in the architecture, traffic flows from Amazon S3 through the SFTP connector into your VPC. A resource gateway is the entry point that handles inbound connections from the connector to your VPC resources. Outbound traffic is routed through your configured egress path, using Amazon VPC NAT gateways with Elastic IPs for public servers or AWS Direct Connect and VPN connections for private servers. You can use existing IP addresses from your VPC CIDR range, simplifying partner server allowlists. Centralized firewalls in the VPC enforce security policies, and customer-owned NAT gateways provide higher bandwidth for large-scale transfers.

When to use this feature
With this capability, developers and IT administrators can simplify workflows while meeting security and compliance requirements across a range of scenarios:

  • Hybrid environments – Transfer files between Amazon S3 and on-premises SFTP servers using AWS Direct Connect or AWS Site-to-Site VPN, without exposing endpoints to the internet.
  • Partner integrations – Connect with business partners’ SFTP servers that are only accessible through private VPN tunnels or shared VPCs. This avoids building custom scripts or managing third-party tools, reducing operational complexity.
  • Regulated industries – Route file transfers through centralized firewalls and inspection points in VPCs to comply with financial services, government, or healthcare security requirements.
  • High-throughput transfers – Use your own network configurations such as NAT gateways, AWS Direct Connect, or VPN connections with Elastic IP or BYOIP to handle large-scale, high-bandwidth transfers while retaining IP addresses already on partner allowlists.
  • Unified file transfer solution – Standardize on Transfer Family for both internal and external SFTP connectivity, reducing fragmentation across file transfer tools.

Start building with SFTP connectors
To begin transferring files with SFTP connectors through my VPC environment, I follow these steps:

First, I configure my VPC Lattice resources. In the Amazon VPC console, under PrivateLink and Lattice in the navigation pane, I choose Resource gateways, choose Create resource gateway to create one to act as the ingress point into my VPC. Next, under PrivateLink and Lattice in the navigation pane, I choose Resource configuration and choose Create resource configuration to create a resource configuration for my target SFTP server. Specify the private IP address or public DNS name, and the port (typically 22).

Then, I configure AWS Identity and Access Management (IAM) permissions. I ensure that the IAM role used for connector creation has transfer:* permissions, and VPC Lattice permissions (vpc-lattice:CreateServiceNetworkResourceAssociation, vpc-lattice:GetResourceConfiguration, vpc-lattice:AssociateViaAWSService). I update the trust policy on the IAM role to specify transfer.amazonaws.com as a trusted principal. This enables AWS Transfer Family to assume the role when creating and managing my SFTP connectors.

After that, I create an SFTP connector through the AWS Transfer Family console. I choose SFTP Connectors and then choose Create SFTP connector. In the Connector configuration section, I select VPC Lattice as the egress type, then provide the Amazon Resource Name (ARN) of the Resource Configuration, Access role, and Connector credentials. Optionally, include a trusted host key for enhanced security, or override the default port if my SFTP server uses a nonstandard port.

Next, I test the connection. On the Actions menu, I choose Test connection to confirm that the connector can reach the target SFTP server.

Finally, after the connector status is ACTIVE, I can begin file operations with my remote SFTP server programmatically by calling Transfer Family APIs such as StartDirectoryListing, StartFileTransfer, StartRemoteDelete, or StartRemoteMove. All traffic is routed through my VPC using my configured resources such as NAT gateways, AWS Direct Connect, or VPN connections together with my IP addresses and security controls.

For the complete set of options and advanced workflows, refer to the AWS Transfer Family documentation.

Now available

SFTP connectors with VPC-based connectivity are now available in 21 AWS Regions. Check the AWS Services by Region for the latest supported AWS Regions. You can now securely connect AWS Transfer Family SFTP connectors to private, on-premises, or internet-facing servers using your own VPC resources such as NAT gateways, Elastic IPs, and network firewalls.

Betty

AWS Weekly Roundup: Amazon Quick Suite, Amazon EC2, Amazon EKS, and more (October 13, 2025)

This post was originally published on this site

This week I was at the inaugural AWS AI in Practice meetup from the AWS User Group UK. AI-assisted software development and agents were the focus of the evening! Next week I’ll be in Italy for Codemotion (Milan) and an AWS User Group meetup (Rome). I am also excited to try the new Amazon Quick Suite that brings AI-powered research, business intelligence, and automation capabilities into a single workspace.

Last week’s launches
Here are the launches that got my attention this week:

Additional updates
Here are some additional projects, blog posts, and news items that I found interesting:

Upcoming AWS events
Check your calendars so that you can sign up for these upcoming events:

  • AWS AI Agent Global Hackathon – This is your chance to dive deep into our powerful generative AI stack and create something truly awesome. From September 8th to October 20th, you have the opportunity to create AI agents using AWS suite of AI services, competing for over $45,000 in prizes and exclusive go-to-market opportunities.
  • AWS Gen AI Lofts – You can learn AWS AI products and services with exclusive sessions, meet industry-leading experts, and have valuable networking opportunities with investors and peers. Register in your nearest city: Paris (October 7–21), London (Oct 13–21), and Tel Aviv (November 11–19).
  • AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Budapest (October 16).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here upcoming in-person events, developer-focused events, and events for startups.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Danilo

Announcing Amazon Quick Suite: your agentic teammate for answering questions and taking action

This post was originally published on this site

Today, we’re announcing Amazon Quick Suite, a new agentic teammate that quickly answers your questions at work and turns those insights into actions for you. Instead of switching between multiple applications to gather data, find important signals and trends, and complete manual tasks, Quick Suite brings AI-powered research, business intelligence, and automation capabilities into a single workspace. You can now analyze data through natural language queries, find critical information across enterprise and external sources in minutes, and automate processes from simple tasks to complex multi-department workflows.

Here’s a look into Quick Suite.

Business users often need to gather data across multiple applications—pulling customer details, checking performance metrics, reviewing internal product information, and performing competitive intelligence. This fragmented process often requires consultation with specialized teams to analyze advanced datasets, and in some cases, must be repeated regularly, reducing efficiency and leading to incomplete insights for decision-making.

Quick Suite helps you overcome these challenges by combining agentic teammates for research, business intelligence, and automation into a unified digital workspace for your day-to-day work.

Integrated capabilities that power productivity 
Quick Suite includes the following integrated capabilities:

  • Research – Quick Research accelerates complex research by combining enterprise knowledge, premium third-party data, and data from the internet for more comprehensive insights.
  • Business intelligence – Quick Sight provides AI-powered business intelligence capabilities that transform data into actionable insights through natural language queries and interactive visualizations, helping everyone make faster decisions and achieve better business outcomes.
  • Automation – Quick Flows and Quick Automate help users and technical teams to automate any business process from simple, routine tasks to complex multi-department workflows, enabling faster execution and reducing manual work across the organization.

Let’s dive into some of these key capabilities.

Quick Index: Your unified knowledge foundation
Quick Index creates a secure, searchable repository that consolidates documents, files, and application data to power AI-driven insights and responses across your organization.

As a foundational component of Quick Suite, Quick Index operates in the background to bring together all your data—from databases and data warehouses to documents and email. This creates a single, intelligent knowledge base that makes AI responses more accurate and reduces time spent searching for information.

Quick Index automatically indexes and prepares any uploaded files or unstructured data you add to your Quick Suite, enabling efficient searching, sorting, and data access. For example, when you search for a specific project update, Quick Index instantly returns results from uploaded documents, meeting notes, project files, and reference materials—all from one unified search instead of checking different repositories and file systems.

To learn more, visit the Quick Index overview page.

Quick Research: From complex business challenges to expert-level insights
Quick Research is a powerful agent that conducts comprehensive research across your enterprise data and external sources to deliver contextualized, actionable insights in minutes or hours — work that previously could take longer.

Quick Research systematically breaks down complex questions into organized research plans. Starting with a simple prompt, it automatically creates detailed research frameworks that outline the approach and data sources needed for comprehensive analysis.

After Quick Research creates the plan, you can easily refine it through natural language conversations. When you are happy with the plan, it works in the background to gather information from multiple sources, using advanced reasoning to validate findings and provide thorough analysis with citations.

Quick Research integrates with your enterprise data connected to Quick Suite, the unified knowledge foundation that connects to your dashboards, documents, databases, and external sources, including Amazon S3, Snowflake, Google Drive, and Microsoft SharePoint. Quick Research grounds key insights to original sources and reveals clear reasoning paths, helping you verify accuracy, understand the logic behind recommendations, and present findings with confidence. You can trace findings back to their original sources and validate conclusions through source citations. This makes it ideal for complex topics requiring in-depth analysis.

To learn more, visit the Quick Research overview page.

Quick Sight: AI-powered business intelligence
Quick Sight provides AI-powered business intelligence capabilities that transform data into actionable insights through natural language queries and interactive visualizations.

You can create dashboards and executive summaries using conversational prompts, reducing dashboard development time while making advanced analytics accessible without specialized skills.

Quick Sight helps you ask questions about your data in natural language and receive instant visualizations, executive summaries, and insights. This generative AI integration provides you with answers from your dashboards and datasets without requiring technical expertise.

Using the scenarios capability, you can perform what-if analysis in natural language with step-by-step guidance, exploring complex business scenarios and finding answers faster than before.

Additionally, you can respond to insights with one-click actions by creating tickets, sending alerts, updating records, or triggering automated workflows directly from your dashboards without switching applications.

To learn more, visit Quick Sight overview page.

Quick Flows: Automation for everyone
With Quick Flows, any user can automate repetitive tasks by describing their workflow using natural language without requiring any technical knowledge. Quick Flows fetches information from internal and external sources, takes action in business applications, generates content, and handles process-specific requirements.

Starting with straightforward business requirements, it creates a multi-step flow including input steps for gathering information, reasoning groups for AI-powered processing, and output steps for generating and presenting results.

After the flow is configured, you can share it with a single click to your coworkers and other teams. To execute the flow, users can open it from the library or invoke it from chat, provide the necessary inputs, and then chat with the agent to refine the outputs and further customize the results.

To learn more, visit the Quick Flows overview page.

Quick Automate: Enterprise-scale process automation
Quick Automate helps technical teams build and deploy sophisticated automation for complex, multistep processes that span departments, systems, and third-party integrations. Using AI-powered natural language processing, Quick Automate transforms complex business processes into multi-agent workflows that can be created merely by describing what you want to automate or uploading process documentation.

While Quick Flows handles straightforward workflows, Quick Automate is designed for comprehensive and complex business processes like customer onboarding, procurement automations, or compliance procedures that involve multiple approval steps, system integrations, and cross-departmental coordination. Quick Automate offers advanced orchestration capabilities with extensive monitoring, debugging, versioning, and deployment features.

Quick Automate then generates a comprehensive automation plan with detailed steps and actions. You will find a UI agent that understands natural language instructions to autonomously navigate websites, complete form inputs, extract data, and produces structured outputs for downstream automation steps.

Additionally, you can define a custom agent, complete with instructions, knowledge, and tools, to complete process-specific tasks using the visual building experience – no code required.

Quick Automate includes enterprise-grade features such as user role management and human-in-the-loop capabilities that route specific tasks to users or groups for review and approval before continuing workflows. The service provides comprehensive observability with real-time monitoring, success rate tracking, and audit trails for compliance and governance.

To learn more, visit the Quick Automate overview page.

Additional foundational capabilities
Quick Suite includes other foundational capabilities that deliver seamless data organization and contextual AI interactions across your enterprise.

Spaces – Spaces provide a straightforward way for every business user to add their own context by uploading files or connecting to specific datasets and repositories specific to their work or to a particular function. For example, you might create a space for quarterly planning that includes budget spreadsheets, market research reports, and strategic planning documents. Or you could set up a product launch space that connects to your project management system and customer feedback databases. Spaces can scale from personal use to enterprise-wide deployment while maintaining access permissions and seamless integration with Quick Suite capabilities.

Chat agents – Quick Suite includes insights agents that you can use to interact with your data and workflows through natural language. Quick Suite includes a built-in agent to answer questions across all of your data and custom chat agents that you can configure with specific expertise and business context. Custom chat agents can be tailored for particular departments or use cases—such as a sales agent connected to your product catalog data and pricing information stored in a space or a compliance agent configured with your regulatory requirements and actions to request approvals.

Additional things to know
If you’re an existing Amazon QuickSight customer – Amazon QuickSight customers will be upgraded to Quick Suite, a unified digital workspace that includes all your existing QuickSight business intelligence capabilities (now called “Quick Sight”) plus new agentic AI capabilities. This is an interface and capability change—your data connectivity, user access, content, security controls, user permissions, and privacy settings remain exactly the same. No data is moved, migrated, or changed.

Quick Suite offers per-user subscription-based pricing with consumption-based charges for the Quick Index and other optional features. You can find more detail on the Quick Suite pricing page.

Now available
Amazon Quick Suite gives you a set of agentic teammates that helps you get the answers you need using all your data and move instantly from answers to action so you can focus on high value activities that drive better business and customer outcomes.

Visit the getting started page to start using Amazon Quick Suite today.

Happy building
— Esra and Donnie