
Chaplin's SQL Agent Catches the 768 Health Events RAG Missed

RAG counted 190 end-of-life events; the actual number was 958. AWS Chaplin combines deterministic SQL queries with agentic AI to give ops teams exact numbers and contextual analysis.

aws, amazon bedrock, model context protocol, strands agents, aws health, operational analytics

RAG-based analysis reported 190 end-of-life AWS Health events when the actual count was 958. That 768-event shortfall (958 minus 190) is exactly what happens when you ask a probabilistic system to count structured metadata.

The source blog from AWS engineers Aurelio DeSimone, Chitresh Saxena, and Mike Dennis spells out the problem clearly. Teams across 50+ accounts get bombarded with Amazon Linux 2 end-of-life notices, RDS version deprecations, EC2 instance retirements, and more. Without self-service analytics, they wait on TAMs to interpret each event. Dashboards with fixed schemas can't handle ad hoc questions like "What's the monthly spend at risk from RDS deprecation in Tier-1 production accounts?"

Enter Chaplin (Customer Health and Planned Lifecycle Intelligence Nexus), an open-source solution that sidesteps the RAG problem entirely.

Pattern-First, Then AI: A Cost-Optimized Architecture

Chaplin processes health events through three layers. First, a rule-based classifier uses regex patterns to map events into five categories: Migration Requirements, Security & Compliance, Maintenance & Updates, Cost Impact Events, and Operational Notifications. That step covers the majority of events without incurring AI inference costs.
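A minimal sketch of what a pattern-first classifier could look like. The five category names come from the article; the regex patterns, the `classify` function, and the sample event type codes are illustrative assumptions, not Chaplin's actual rules.

```python
import re

# Category names are from the article. The patterns are hypothetical
# stand-ins; Chaplin's real rule set is not shown in the source.
CATEGORY_PATTERNS = [
    ("Migration Requirements", re.compile(r"end[- ]of[- ]life|deprecat|retire", re.I)),
    ("Security & Compliance", re.compile(r"security|vulnerab|certificate|compliance", re.I)),
    ("Maintenance & Updates", re.compile(r"maintenance|update[_ ]available|patch", re.I)),
    ("Cost Impact Events", re.compile(r"billing|pricing|cost", re.I)),
    ("Operational Notifications", re.compile(r"operational|redundancy|notification", re.I)),
]


def classify(event_type_code, description=""):
    """Return the first matching category, or None if no rule matches."""
    text = f"{event_type_code} {description}"
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(text):
            return category
    # Unmatched events would be the ones worth sending to a model.
    return None


maintenance = classify("AWS_ELASTICACHE_UPDATE_AVAILABLE")
operational = classify("AWS_VPN_REDUNDANCY_LOSS")
migration = classify("AWS_RDS_NOTICE", "This engine version reaches end-of-life soon")
unmatched = classify("UNKNOWN_THING")
```

Rule order matters in a first-match design like this one: an event that mentions both a retirement and a patch lands in whichever category is listed first, so the most specific rules belong at the top.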

The second layer handles structured queries: a Natural Language to SQL Agent converts plain English into precise DynamoDB queries. When you ask "Show me open EC2 retirement events in production accounts," it produces exact filters on event_type and affected_accounts, not fuzzy vector matches. No hallucinated counts.
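To illustrate why exact filters give exact counts, here is a small sketch that uses an in-memory SQLite table purely as a stand-in. The article does not show the generated queries or the table schema, so the `health_events` table, its columns other than `event_type`, and the sample rows are all assumptions.

```python
import sqlite3


def count_open_events(conn, event_type, environment):
    """Count open events with exact-match filters (no similarity search)."""
    row = conn.execute(
        "SELECT COUNT(*) FROM health_events "
        "WHERE event_type = ? AND status = 'open' AND environment = ?",
        (event_type, environment),
    ).fetchone()
    return row[0]


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE health_events ("
    "event_id TEXT PRIMARY KEY, event_type TEXT, status TEXT, "
    "account_id TEXT, environment TEXT)"
)
conn.executemany(
    "INSERT INTO health_events VALUES (?, ?, ?, ?, ?)",
    [
        ("e1", "EC2_RETIREMENT", "open", "111111111111", "production"),
        ("e2", "EC2_RETIREMENT", "open", "222222222222", "production"),
        ("e3", "EC2_RETIREMENT", "closed", "111111111111", "production"),
        ("e4", "EC2_RETIREMENT", "open", "333333333333", "staging"),
        ("e5", "RDS_DEPRECATION", "open", "111111111111", "production"),
    ],
)

open_prod_retirements = count_open_events(conn, "EC2_RETIREMENT", "production")
```

The count comes from the database engine scanning every row, so it cannot drift with how many chunks a retriever happened to return.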

The third layer uses Amazon Bedrock with Claude Sonnet 4.5 (OpenAI, Anthropic, or local models are also supported) to interpret unstructured event descriptions against your business context: environment tags, business units, ownership. This agent doesn't count; it reasons about impact.
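One way to keep the model out of the counting business is to hand it numbers that were already computed deterministically. The sketch below only assembles a prompt and makes no model call. The function name, field names, and prompt wording are hypothetical; the article does not show Chaplin's actual prompts.

```python
import json


def build_analysis_prompt(event, account_context, verified_counts):
    """Combine an event, its business context, and precomputed counts.

    All three arguments are plain dicts. The counts are expected to come
    from the structured-query layer, not from the model.
    """
    payload = {
        "event": event,
        "business_context": account_context,
        "verified_counts": verified_counts,
    }
    return (
        "Assess the business impact of this AWS Health event. "
        "Treat the verified counts as facts; do not recompute or estimate them.\n"
        + json.dumps(payload, indent=2, sort_keys=True)
    )


# Illustrative inputs only.
prompt = build_analysis_prompt(
    event={"event_type": "RDS_DEPRECATION", "description": "Engine version nearing end of support."},
    account_context={"environment": "production", "business_unit": "payments", "owner": "team-db"},
    verified_counts={"open_events": 3, "affected_accounts": 2},
)
```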

Real Numbers From the Walkthrough

The blog post includes a complete walkthrough using Kiro CLI as the MCP client. Some highlights:

  • ElastiCache UPDATE_AVAILABLE events: 2,145 across 140 accounts, all months past due. That's $717K/mo at risk from an auto-patching gap.
  • VPN REDUNDANCY_LOSS: 647 events across 14 accounts, with one untagged account alone having 327 events.
  • EC2 retirements and scheduled stops: 351 events affecting 62 accounts.
  • RDS PostgreSQL deprecation impact: 6 Tier-1 accounts past due, $304K/mo at risk.
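Figures like these are per-event-type rollups: an event count, a distinct-account count, and a spend figure. The article does not say how spend at risk is calculated, so the sketch below assumes it is the summed monthly spend of the distinct affected accounts. The function, field names, and toy data are hypothetical.

```python
from collections import defaultdict


def summarize(events, monthly_spend):
    """Roll up (event_type, account_id) pairs into per-type totals.

    Assumption: spend at risk is the summed monthly spend of the distinct
    affected accounts. Accounts with no spend record contribute 0.
    """
    counts = defaultdict(int)
    accounts = defaultdict(set)
    for event_type, account_id in events:
        counts[event_type] += 1
        accounts[event_type].add(account_id)
    return {
        event_type: {
            "events": counts[event_type],
            "accounts": len(accounts[event_type]),
            "spend_at_risk": sum(monthly_spend.get(a, 0) for a in accounts[event_type]),
        }
        for event_type in counts
    }


# Toy data, not taken from the walkthrough.
events = [
    ("UPDATE_AVAILABLE", "a1"),
    ("UPDATE_AVAILABLE", "a1"),
    ("UPDATE_AVAILABLE", "a2"),
    ("REDUNDANCY_LOSS", "a3"),
]
monthly_spend = {"a1": 1000, "a2": 500}  # "a3" has no spend record

summary = summarize(events, monthly_spend)
```

Counting distinct accounts separately from events is what surfaces cases like a single account generating a large share of one event type.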

The AI agent doesn't stop at counts. It provides a remediation plan: enable auto-minor-version-upgrade on ElastiCache, fix single-tunnel VPN architectures, migrate EC2 to ECS Fargate, tag untagged accounts.

MCP Makes It Composable

Chaplin exposes its capabilities as Model Context Protocol tools. That means any MCP-compatible client (Claude Code, Kiro CLI, Cursor, VS Code) can call summary tools, detail tools, and AI analysis tools in the same session alongside Jira, GitHub, or ServiceNow. No custom front end. The architecture uses AWS IAM for auth, TLS 1.2+ in transit, AES-256 at rest, and CloudTrail logging.
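MCP tool invocations travel as JSON-RPC 2.0 messages with the method `tools/call`. The standard-library sketch below shows the shape of that exchange for a single summary tool. It is not Chaplin's server code: the tool name `get_event_summary`, its argument, and its toy data are hypothetical, and a real server would normally be built on an MCP SDK rather than hand-rolled dispatch.

```python
import json


# Hypothetical tool; a real one would query the health event store.
def get_event_summary(event_type):
    toy_counts = {"EC2_RETIREMENT": 2}  # illustrative data only
    return {"event_type": event_type, "open_events": toy_counts.get(event_type, 0)}


TOOLS = {"get_event_summary": get_event_summary}


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


def handle_request(raw):
    """Dispatch one JSON-RPC request shaped like an MCP tools/call."""
    request = json.loads(raw)
    request_id = request.get("id")
    if request.get("method") != "tools/call":
        return _error(request_id, -32601, "Method not found")
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(request_id, -32602, f"Unknown tool: {params.get('name')}")
    result = tool(**params.get("arguments", {}))
    # Tool output is returned as a text content item.
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": {
                "content": [{"type": "text", "text": json.dumps(result)}],
                "isError": False,
            },
        }
    )


request = json.dumps(
    {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "get_event_summary", "arguments": {"event_type": "EC2_RETIREMENT"}},
    }
)
response = json.loads(handle_request(request))
```

Because every client speaks the same request shape, the same tool can be called from an IDE, a CLI, or another agent without a bespoke integration for each.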

Deployment is fully scripted. Option A runs the MCP server locally using your AWS credentials. Option B deploys it as a Lambda function for team-wide access without local dependencies.

The Next Step: Autonomous Operations

The blog teases integration with AWS DevOps Agent, which can consume Chaplin's health event intelligence during incident investigations. When an incident fires, DevOps Agent queries Chaplin for related health events, correlates them with application topology and telemetry, and produces a prioritized mitigation plan. That shifts ops from reactive triage to proactive event-driven orchestration.

Chaplin is on GitHub now. Clone it, deploy the data pipeline, and run your first query before another 768 events slip through the RAG gap.


Source: Build self-service AWS Health analytics to find actionable health insights with AI agents powered by Amazon Bedrock
Domain: aws.amazon.com
