Another AWS re:Invent 2025 Is in the Books

In this post I share my experience attending the AWS re:Invent 2025 Conference in person.

Chris Ebert standing in the Sports Expo space at Caesars Forum during AWS re:Invent 2025.

The AWS re:Invent conference is one of the largest cloud conferences in the world, drawing more than 60,000 attendees from across the United States and around the world. While the event centers on AWS services and technologies, it also covers broader topics such as system design, resiliency, artificial intelligence, languages, and modern application architecture (even if you don't use AWS, there are relevant topics if you're building cloud applications). This year's conference ran from December 1–5. I had the privilege of attending for the third year in a row. In this post, I'll highlight the sessions and keynotes I attended, share the AWS services and ideas I plan to explore in the next year, and reflect on the key takeaways that stood out to me.

I primarily work on a cloud-native serverless application today, but my organization is also aggressively lifting, shifting, and replatforming workloads as part of our broader cloud strategy. We're beginning to embrace AI to streamline development, accelerate cloud migrations, and improve customer experiences. Reliability and systems design are top priorities due to the critical workloads we are responsible for. There's also a desire to reevaluate if we can do things better and faster. These factors shape my takeaways and perspective when I reflect on this year's content.

Navigating such a massive conference feels easier each time I attend. This year, my employer sent considerably more people to the conference than in years past, so I was able to network with colleagues I don't regularly interact with in person, which made the experience even better. Additionally, one of my coworkers surprised me by getting my room upgraded for free.

Before re:Invent, I shared a blog post benchmarking the performance of AWS Lambdas across different architectures and runtimes. While I was at re:Invent, a few Community Builders proactively reached out to share their insights on the post along with new testing strategies. David Behroozi observed cold-start differences between older and newer Node runtime versions in his benchmarks; I plan to review his results and compare them to mine to understand the variation. I also received good suggestions to consider leveraging CloudWatch Insights instead of DynamoDB for storing benchmark results and to use Lambda's newly announced tenant isolation feature to benchmark multiple "cold" Lambdas across multiple tenants.

Before I dive into the sessions I attended and my takeaways, I wanted to highlight that even if you weren't able to attend re:Invent in person, a good amount of the content is available online. All of the keynotes are on the AWS Events YouTube channel, and aside from "Chalk Talk" and workshop sessions, most sessions were recorded and are already available on YouTube. It's definitely worth spending some time watching the keynotes and sessions that interest you (plus you can watch them at 1.25–2x speed).

Sessions I Attended

Note: Unless a session was a "Chalk Talk", workshop, or otherwise unrecorded, the session titles in this section link to the YouTube recording for that session.

Monday (12/1)

  • Applied Data Modeling with Amazon DynamoDB (DAT402-R)
    • This workshop session was a practical deep dive into turning business requirements into DynamoDB access patterns. Even though I work with DynamoDB today, it was good to review first principles when designing a database for an e-commerce app with a product catalog and user cart functionality. The workshop emphasized storing data the way you access it, along with choosing efficient partition and sort keys. Also, for items with content that doesn't change frequently (e.g., product descriptions), you can reduce costs considerably by splitting write-heavy attributes (such as inventory counts) into much smaller adjacent items, so frequent writes don't repeatedly consume capacity for the large, static item.
  • An Unexpected Journey Building AWS MCP Servers (OPN401)
    • I attended this session as a simulcast between sessions at the Content Hub. The session shared AWS's learnings from building MCP servers for its customers, highlighting the benefits and types of MCP servers. It also covered design patterns, such as keeping separate, isolated MCP servers for specific tasks/domains, and how to compose them to deliver more value to end users.
    • One interesting recommendation I took away from this session is that tools should generally compose multiple APIs or services to avoid 1:1 tool-to-API mappings (which helps keep context size smaller).
  • Reimagine Work with Amazon Quick Suite (BIZ202)
    • I also attended BIZ202 as a simulcast. I honestly did not enjoy this session, as it was more business-focused and less technical. It is noteworthy that Amazon QuickSight is being rebranded as Amazon Quick Suite. The session covered how Quick Suite can help make information and insights more available to business analysts.
  • Building Multi-Tenant RAG and MCP Servers (SAS306)
    • SAS306 was a great chalk talk session for anyone interested in building multi-tenant MCP servers. The session covered different strategies for tenant isolation and request routing. One theme I saw in this session, as well as in many others at re:Invent, was to adopt attribute-based access control (ABAC) whenever possible when using AWS services. ABAC helps enforce strong logical separation between tenants while keeping costs lower by sharing resources.
  • Build a Multi-Region, Active-Active Rewards App with Aurora DSQL (DAT404-R)
    • Amazon Aurora DSQL was announced at re:Invent 2024. Amazon Aurora DSQL is a distributed, serverless SQL database that supports multi-region active-active workloads and Postgres syntax. This was my first time using DSQL hands-on. The workshop did an excellent job highlighting how conflicts are handled across regions, how indexes are handled, and how to work with DSQL's limitations, such as no foreign key support, that are enforced to ensure the system remains performant. AWS team members who work on DSQL attended the workshop and were willing to address any questions. It was clear to me how excited DSQL developers were about this service. I would consider DSQL for new workloads that need a relational database and high performance and availability.
    • Unfortunately, it does not sound like active-active DSQL will be available in AWS GovCloud anytime soon. Because GovCloud only has two regions and DSQL requires an additional independent region to serve as a witness for active-active coordination, DSQL cannot currently support active-active deployments in GovCloud.
  • Supercharge DevOps with AI-Driven Observability (DEV304)
    • This session helped provide realistic scenarios where AI can help team members in DevOps roles. Examples of how AI can be used to prevent outages early in the CI/CD process and how agents can diagnose and suggest corrections for production issues were provided.
  • re:Architecture Rodeo — Serverless Showdown (GHJ202)
    • This was a fun and unique session that was a great end to day one. The session was so much fun, it gave me another burst of energy. My friend Troy Dieter was one of the many AWS Solution Architects who helped organize this collaborative event. Working with a team of attendees seated at my table, we were given a realistic business scenario for a travel agency with a problematic monolith that needed to be decomposed and refactored to improve performance and reliability using serverless AWS services. After collaborating on a design, each team presented its design to real AWS Solution Architects for feedback and judging. I had a great time and met a lot of new connections I wouldn't have otherwise. I hope AWS continues to have fun events like this in the future.
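The item-splitting idea from the DynamoDB workshop (DAT402) can be sketched in a few lines. This is a minimal illustration with hypothetical table keys and item shapes (not taken from the session): the large, rarely-changing product details live in one item, while the frequently updated inventory count lives in a tiny adjacent item under the same partition key, so each stock update only consumes write capacity proportional to the small item.

```python
# Sketch of splitting write-heavy attributes into a small adjacent item.
# The PK/SK names and item shapes here are hypothetical, for illustration only.

def build_product_items(product_id: str, description: str, count: int) -> list[dict]:
    """Return two DynamoDB items that share a partition key."""
    detail_item = {
        "PK": f"PRODUCT#{product_id}",
        "SK": "DETAILS",                # large item, rarely written
        "description": description,
    }
    inventory_item = {
        "PK": f"PRODUCT#{product_id}",
        "SK": "INVENTORY",              # tiny item, written on every stock change
        "count": count,
    }
    return [detail_item, inventory_item]

items = build_product_items("42", "A very long product description...", 17)
# A single Query on PK="PRODUCT#42" fetches both items together, while
# stock updates only rewrite the small INVENTORY item.
```

Because DynamoDB write capacity is billed by item size, keeping the hot attribute on its own small item is what makes the frequent writes cheap.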

Tuesday (12/2)

  • Opening Keynote — Matt Garman
    • If you only have time to watch one keynote, this is the most important one to get a sense of where AWS is moving and what it sees as important. Matt Garman highlighted AWS's continued growth as a cloud provider.
    • One of the major product announcements was the release of new Graviton5 Arm-based CPUs. As I highlighted recently in my blog post, Amazon's arm-based offering consistently delivers better performance and lower costs across workloads. Interestingly, Graviton (arm) accounts for more than half of AWS's CPU capacity for the third year in a row.
    • Amazon Bedrock AgentCore is a composable framework that simplifies the deployment of production-caliber agents, while Amazon's Strands Agents SDK is a solution for developers who want more control over and responsibility for the deployment process. Amazon Bedrock AgentCore was prominently featured in the keynote, and this may very well be the year of agents on AWS. The focus is shifting towards AI agents as the next frontier for delivering real business value for all organizations. These agents become even more powerful as they gain greater, secure access to organizations' proprietary data.
    • Strands was also briefly mentioned, as well as AWS's newer spec-driven agentic coding tool, Kiro. AWS's future focus is on Kiro, and Amazon Q Developer will eventually be discontinued. Several product and hardware improvements were announced, along with Amazon's continued investment and improvement in its own Nova Foundation Models.
  • Evolve Amazon DynamoDB Data Models with No Application Impact (DAT449-R1)
    • This was an excellent "Chalk Talk" session presented by AWS Hero and author of The DynamoDB Book, Alex DeBrie. He highlighted strategies for updating DynamoDB schemas to support new access patterns while avoiding application downtime.
    • Alex grouped DynamoDB schema changes into three categories: 1. changes that don't impact data access, 2. new indexes on existing attributes, and 3. evolutions that change existing data. He worked through realistic scenarios for each category, which became increasingly complex, and shared different approaches and tools that could be leveraged to help roll out these changes. DynamoDB's recently announced support for multiple-attribute composite keys in secondary indexes can help simplify some of this work.
    • Alex provided a good reminder (which I tend to forget) that when you add a new Global Secondary Index (GSI), data cannot be accessed with the new index until all items have been backfilled in the background. One last excellent recommendation from Alex was to consider using Zod to define and validate DynamoDB schemas when reading from or writing to DynamoDB.
  • Build Modern Applications with Aurora DSQL (DEV308)
    • This session compared Aurora DSQL to DynamoDB and other serverless relational options and explained the engineering decisions that make DSQL highly available and performant. A significant amount of engineering work went into making the service scalable and cost-effective: disaggregated building blocks, separated data storage and compute, atomic clocks for coordination, and an implementation written entirely in Rust. This is a great session to watch if you want a solid technical introduction to DSQL. It was mentioned a few times that it's an AWS best practice to have a separate database per microservice, which aligns with well-architected guidance for other database technologies, too.
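Alex's recommendation to validate items at the DynamoDB boundary is worth internalizing. Zod is a TypeScript library, so as a rough stand-in, here is the same idea sketched in Python with a hand-rolled validator and a hypothetical schema; the point is that every item is checked against a declared shape before it touches the table or your application code.

```python
# Validating items at the database boundary, in the spirit of the Zod
# recommendation from DAT449. The SCHEMA below is hypothetical; a real
# schema would mirror your table's actual attributes.

SCHEMA = {"PK": str, "SK": str, "count": int}  # required attribute -> type

def validate_item(item: dict) -> dict:
    """Raise ValueError if the item doesn't match SCHEMA; return it otherwise."""
    for attr, expected_type in SCHEMA.items():
        if attr not in item:
            raise ValueError(f"missing attribute: {attr}")
        if not isinstance(item[attr], expected_type):
            raise ValueError(f"{attr} should be {expected_type.__name__}")
    return item

# An item that conforms to the schema passes through unchanged.
valid = validate_item({"PK": "PRODUCT#42", "SK": "INVENTORY", "count": 17})
```

In a Node codebase, `z.object({...}).parse(item)` gives you the same guarantee plus inferred static types, which is what makes Zod especially attractive there.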

Wednesday (12/3)

  • The Future of Agentic AI is Here Keynote — Swami Sivasubramanian
    • I had a conflicting session at the time of this keynote, so I watched it in my room afterwards. This was another session that emphasized the transformative power of agentic AI. Amazon provides tools like Amazon Bedrock AgentCore that make building agentic software easier by removing undifferentiated heavy lifting. Swami highlighted how Amazon Bedrock Reinforcement Fine-Tuning makes it easier for even smaller organizations to create fine-tuned models for specific use cases with improved accuracy and cost. He also highlighted improvements to Amazon SageMaker AI that make it easier to run reinforcement learning and model distillation tasks.
  • Architecting Reliable Deployments — Amazon.com’s Blue/Green Patterns (ARC336-R1)
    • This was another "Chalk Talk" session that dove into Amazon's own deployment practices for an internal tool called Weblab, which manages feature flags and internal A/B testing on Amazon.com. This service was created early in Amazon's existence as a classic 3-tier architecture service. However, as Amazon continued to grow, the service began experiencing scaling and disruption issues, ultimately leading to a Weblab failure on June 9, 2017, that took down the online store! The presenters shared how they redesigned this service to be more performant and reliable while also simplifying its architecture.
    • The Weblab team adopted some commonly recommended AWS best practices, such as the constant-work pattern, in which a system performs the same amount of work at all times to avoid unexpected load spikes during high-demand periods or when a layer like a cache becomes invalidated. The team also shared a simple yet robust architecture for making configuration changes available to internal clients using the S3 SDK. Weblab uses Route53 entries to route each internal tenant to an S3 Access Point, which can then be used to download the latest configuration. This ensures that no single bucket comes close to exceeding its read request limits while maintaining a simple, elegant design. Another important design principle of the improved service was ensuring that the data plane can continue to function during a momentary outage of the control plane. Amazon conducts regular quarterly and annual design reviews of its systems, an exercise I'd like to start doing more of at work.
  • Deep Dive on AWS Lambda Durable Functions (CNS380)
    • AWS Lambda Durable Functions were announced during the opening keynote, and I was able to grab a seat for this session once it opened up after the launch. Durable Functions introduce a new programming model that enables Lambdas to maintain state across long-running workflows and resume execution later. This brings orchestration capabilities directly into the Lambda runtime without requiring Step Functions.
    • AWS already provides strong SAM support, CloudWatch integration, and SDK libraries to streamline creating and deploying Durable Functions. From a usability standpoint, this model feels far more natural for developers who prefer imperative code instead of building JSON-based Step Functions definitions.
    • Going into the conference, I was personally hoping this feature might be similar to Cloudflare’s Durable Objects in terms of per-object state storage, but that’s not the direction of the current release. Durable Functions are explicitly workflow-oriented, not object-oriented, and they’re designed to help teams simplify long-running orchestration logic. They do not try to offer stateful “actor-like” primitives.
  • Transforming Monoliths with Cell-Based Architecture (MAM301)
    • This "Chalk Talk" provided a thorough breakdown of the cellular architecture pattern. Cellular architectures are a great fit when resiliency and uptime are paramount: by isolating tenants into separate logical deployments, pushing updates to one cell should not affect tenants in other cells. The session did a great job highlighting possible approaches for defining cells. It also noted that while cellular architectures offer benefits, they come with trade-offs; they are complex and expensive to run at scale, so it's critical to help the business make an informed decision before adopting this pattern. AWS provides some great initial guidance in the Guidance for Cell-Based Architecture in the AWS Solutions Library.
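The checkpoint-and-resume model behind Durable Functions (CNS380) is easy to illustrate in miniature. To be clear, this is not the actual AWS Lambda Durable Functions SDK; the names below (`checkpoints`, `run_step`) are hypothetical, and real durable state would live in a managed store rather than a dict. The sketch shows the core idea: each step's result is persisted before moving on, so a resumed invocation replays completed steps from stored results instead of re-executing them.

```python
# Conceptual sketch of checkpoint/resume for long-running workflows.
# NOT the AWS Durable Functions API; names and storage are stand-ins.

checkpoints: dict[str, object] = {}  # stand-in for durable storage

def run_step(name: str, fn):
    """Execute fn once; on resume, replay the persisted result instead."""
    if name in checkpoints:
        return checkpoints[name]     # replayed from durable state
    result = fn()
    checkpoints[name] = result       # persisted before moving on
    return result

def workflow(order_id: str) -> str:
    payment = run_step("charge", lambda: f"charged:{order_id}")
    shipment = run_step("ship", lambda: f"shipped:{order_id}")
    return f"{payment}|{shipment}"

first = workflow("A1")    # executes both steps and checkpoints them
resumed = workflow("A1")  # replays both steps from checkpoints
```

This replay-from-checkpoint approach is why the model feels like ordinary imperative code to the developer while still surviving interruptions, which is the contrast the session drew against JSON-based Step Functions definitions.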

Thursday (12/4)

  • Vendor meetings
  • Community Builders Networking
    • I spent the rest of my Thursday morning networking with other AWS Community Builders. It was great to learn what others are doing. One consistent theme I observed was that the rapid adoption of AI, along with agentic workflows, helped teams deploy faster, more automatically, and more efficiently while still maintaining guardrails and reliability. Kiro appeared to be a popular agentic development tool for many builders, and I heard overwhelmingly positive feedback. I've had good success using Claude Code and Speckit for similar results, but I'll have to give Kiro a try in the future.
  • Geo-Fencing & Real-Time Geospatial Alerts with Valkey (DAT408)
    • This workshop was directly relevant to the use cases we deal with at work. We have hundreds of thousands of vehicles transmitting positional data across the United States, and historically, our siloed tenancy systems only had to manage this data on a per-tenant basis. As we move to multi-tenant deployments and customers now want to share data, processing these updates becomes more challenging. Valkey’s geospatial features are high-performance and, frankly, far more cost-effective than running similar workloads on commercial mapping products at scale. The lab convinced me that Valkey is worth serious consideration when we eventually build a multi-tenant version of our unit location service.
  • Unleash Rust's Potential on AWS (DEV307)
    • Aj Stuyvenberg and Darko Mesaros are phenomenal and engaging presenters. If you ever need inspiration for presenting a technical session, they're an excellent reference for building engaging sessions.
    • This session explains the performance and efficiency improvements achieved with Rust at Datadog and AWS. Darko argued that, with the rise of agentic development tools, Rust may be a better language option than pure Node or Python for LLMs, as Rust's extensive type safety, linting, and compilation help provide LLMs with the additional context they need to produce better results.
    • Interestingly, the Amazon Aurora DSQL service was entirely rewritten in Rust to reduce garbage-collection tail latency and improve overall performance. By switching to Rust, the DSQL team achieved over a 10x performance improvement.
    • Datadog has completely replaced Go with Rust within its AWS Lambda extension, effectively eliminating the cold-start overhead the extension previously incurred in Lambdas. Both teams observed impressive efficiency improvements, which echoes the findings from my own recent Lambda benchmarks on Arm.
  • Dr. Werner Vogels' Last Keynote
    • This year ended with an emotional keynote in which Dr. Werner Vogels announced it would be his last. He noted that he’ll continue working at Amazon but wants to make space for newer, more diverse voices on the re:Invent stage. His keynotes have always been timeless, and this one was no different.
    • Werner spoke directly about AI’s impact on software development, not with pessimism but with optimism. Rather than viewing AI as a threat, he emphasized that it's a new tool and we remain responsible for the output. AI gives us the freedom to experiment more, fail faster, and push the boundaries of what’s possible. Work will change quickly, roles will shift, and the landscape will look very different in the coming years. However, the best thing we can do is stay curious and keep evolving. Werner's message resonates deeply with me and aligns with the mindset I’m trying to adopt in my own career.
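The geo-fencing workshop (DAT408) boils down to a radius check over positional data. Valkey's GEOADD/GEOSEARCH commands do this at scale using geohash-encoded sorted sets; as a pure-Python sketch of the underlying computation, here is a haversine distance and fence check. The coordinates and fence radius below are made up for illustration, not from the workshop.

```python
import math

# Pure-Python sketch of the radius check behind a geo-fence alert.
# Valkey's GEOADD/GEOSEARCH perform this far faster over geohash-encoded
# sorted sets; all coordinates here are hypothetical.

def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def inside_fence(vehicle, fence_center, radius_m) -> bool:
    """True when the vehicle is within radius_m of the fence center."""
    return haversine_m(*vehicle, *fence_center) <= radius_m

depot = (41.8781, -87.6298)        # hypothetical fence center (Chicago)
truck_near = (41.8790, -87.6300)   # roughly 100 m from the center
truck_far = (40.7128, -74.0060)    # New York, far outside the fence

near = inside_fence(truck_near, depot, 500)
far = inside_fence(truck_far, depot, 500)
```

With Valkey, the equivalent query is a single `GEOSEARCH ... FROMLONLAT ... BYRADIUS 500 m` call, which is what makes it attractive for hundreds of thousands of vehicles streaming position updates.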

TL;DR: Key Takeaways From re:Invent 2025

Below is a list of personal takeaways I had after attending re:Invent in person this year:

  1. 2025 is the year of agentic workflows on AWS
    1. Nearly every keynote and technical session pointed toward AI agents becoming a first-class part of cloud applications. Bedrock AgentCore, Kiro, Strands, and multi-tenant MCP patterns all reinforce that agents will increasingly participate in development, operations, and business workflows. It's time to start learning and embracing these tools.
  2. Rust continues to prove itself as the performance language of the cloud
    1. Sessions showed huge Rust wins; for example, AWS achieved a 10× performance improvement after moving Aurora DSQL to Rust. Rust’s safety and predictability also make it easier for LLMs to generate better results.
  3. Arm (Graviton) is now the default choice for cost and performance
    1. Graviton5 reinforced what many of us already know: Arm is not only viable, but superior for most workloads. Arm now powers more than half of AWS’s compute capacity, and nearly every engineering team presenting showcased arm-based improvements.
  4. ABAC is becoming the preferred access-control strategy for multi-tenant SaaS
    1. Multiple sessions I attended highlighted ABAC as the most flexible and cost-efficient way to isolate tenants at scale. It enables strong logical separation without forcing unnecessary infrastructure duplication.
  5. AI is reshaping how teams are structured, deploy, operate, and troubleshoot systems
    1. This theme came through repeatedly in Werner Vogels’ keynote. Smaller teams that embrace agents for deployment analysis, failure handling, code generation, and reliability reviews will ship faster and more safely. It's time to start envisioning how teams are structured organizationally to remove barriers and to accelerate adoption.
  6. Durable Functions unlock a new class of workflow patterns in Lambda
    1. Durable Functions make AWS Lambda a viable option for workflow orchestration when Step Functions aren't a compelling fit.
  7. Valkey is a serious contender for real-time geospatial workloads
    1. Its speed and cost profile make it compelling for large-scale positional data, especially as customers move from siloed to multi-tenant architectures.
  8. Learn from Amazon’s internal engineering patterns
    1. Sessions like Weblab’s blue/green architecture and Amazon.com’s resilience reviews reinforce just how rigorously Amazon designs for failure. If you want to improve your architecture, start studying the AWS Builder’s Library. You can learn from their mistakes and process improvements at an unimaginable scale.
  9. Multi-tenant GenAI has new architectural demands
    1. RAG and MCP sessions highlighted new isolation patterns, routing strategies, and onboarding workflows that differ from traditional SaaS architectures. AI workloads require new mental models for state, security, and cost.
  10. My own work aligns well with where AWS is headed
    1. I'm striving to embrace the agentic world because I see it as the future of software development, which was a recurring theme of the conference.
  11. Reevaluate my Lambda benchmarking techniques
    1. I received some great suggestions from other builders on how to benchmark Lambdas, including using CloudWatch Insights and using Lambda tenant isolation to test more cold Lambda invocations than were previously possible. These are all great suggestions I can incorporate in the future.
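The ABAC pattern from takeaway 4 is worth a tiny illustration. In IAM, ABAC typically means a policy condition comparing a principal tag to a resource tag (e.g., `aws:PrincipalTag/tenant` against a resource's `tenant` tag) instead of minting one role per tenant. The sketch below is a toy simulation of that matching rule, not IAM's real policy-evaluation logic, and the tag key `tenant` is an arbitrary example.

```python
# Toy illustration of the ABAC idea: access is allowed when the caller's
# principal tag matches the resource's tag, rather than by enumerating
# per-tenant roles. This is NOT IAM's actual evaluation engine.

def abac_allows(principal_tags: dict, resource_tags: dict, key: str = "tenant") -> bool:
    """Allow only when both sides carry the same value for the tag key."""
    return (
        key in principal_tags
        and key in resource_tags
        and principal_tags[key] == resource_tags[key]
    )

same_tenant = abac_allows({"tenant": "acme"}, {"tenant": "acme"})
cross_tenant = abac_allows({"tenant": "acme"}, {"tenant": "globex"})
```

Because one policy covers every tenant, onboarding a new tenant requires tagging, not new infrastructure, which is where the cost savings come from.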

Summary

I had a wonderful time at re:Invent 2025. For me, the conference centered on three big themes: agents, performance, and resilient design. AgentCore, Strands, and MCP patterns showed how AI agents will become core to development and operations. If you're not already looking into Rust, Arm, or ABAC, you're doing yourself a disservice. If you attended re:Invent, I'd love to read your takeaways too.