ai
  • Crypto News
  • Ai
  • eSports
  • Bitcoin
  • Ethereum
  • Blockchain
Home»Ai»Anthropic Launches Claude Sonnet 4.5 with New Coding and Agentic State-of-the-Art Results
Ai

Anthropic Launches Claude Sonnet 4.5 with New Coding and Agentic State-of-the-Art Results

Share
Facebook Twitter LinkedIn Pinterest Email

Anthropic released Claude Sonnet 4.5 and sets a new benchmark for end-to-end software engineering and real-world computer use. The update also ships concrete product surface changes (Claude Code checkpoints, a native VS Code extension, API memory/context tools) and an Agent SDK that exposes the same scaffolding Anthropic uses internally. Pricing remains unchanged from Sonnet 4 ($3 input / $15 output per million tokens).

What’s actually new?

  • SWE-bench Verified record. Anthropic reports 77.2% accuracy on the 500-problem SWE-bench Verified dataset using a simple two-tool scaffold (bash + file edit), averaged over 10 runs, no test-time compute, 200K “thinking” budget. A 1M-context setting reaches 78.2%, and a higher-compute setting with parallel sampling and rejection raises this to 82.0%.
  • Computer-use SOTA. On OSWorld-Verified, Sonnet 4.5 leads at 61.4%, up from Sonnet 4’s 42.2%, reflecting stronger tool control and UI manipulation for browser/desktop tasks.
  • Long-horizon autonomy. The team observed >30 hours of uninterrupted focus on multi-step coding tasks — a practical jump over earlier limits and directly relevant to agent reliability.
  • Reasoning/math. The release notes “substantial gains” across common reasoning and math evals; exact per-bench numbers (e.g., AIME config). Safety posture is ASL-3 with strengthened defenses against prompt-injection.
https://www.anthropic.com/news/claude-sonnet-4-5

What’s there for agents?

Sonnet 4.5 targets the brittle parts of real agents: extended planning, memory, and reliable tool orchestration. Anthropic’s Claude Agent SDK exposes their production patterns (memory management for long-running tasks, permissioning, sub-agent coordination) rather than just a bare LLM endpoint. That means teams can reproduce the same scaffolding used by Claude Code (now with checkpoints, a refreshed terminal, and VS Code integration) to keep multi-hour jobs coherent and reversible.

On measured tasks that simulate “using a computer,” the 19-point jump on OSWorld-Verified is notable; it tracks with the model’s ability to navigate, fill spreadsheets, and complete web flows in Anthropic’s browser demo. For enterprises experimenting with agentic RPA-style work, higher OSWorld scores usually correlate with lower intervention rates during execution.

Where you can run it?

  • Anthropic API & apps. Model ID claude-sonnet-4-5; price parity with Sonnet 4. File creation and code execution are now available directly in Claude apps for paid tiers.
  • AWS Bedrock. Available via Bedrock with integration paths to AgentCore; AWS highlights long-horizon agent sessions, memory/context features, and operational controls (observability, session isolation).
  • Google Cloud Vertex AI. GA on Vertex AI with support for multi-agent orchestration via ADK/Agent Engine, provisioned throughput, 1M-token analysis jobs, and prompt caching.
  • GitHub Copilot. Public preview rollout across Copilot Chat (VS Code, web, mobile) and Copilot CLI; organizations can enable via policy, and BYO key is supported in VS Code.

Summary

With a documented 77.2% SWE-bench Verified score under transparent constraints, a 61.4% OSWorld-Verified computer-use lead, and practical updates (checkpoints, SDK, Copilot/Bedrock/Vertex availability), Claude Sonnet 4.5 is developed for long-running, tool-heavy agent workloads rather than short demo prompts. Independent replication will determine how durable the “best for coding” claim is, but the design targets (autonomy, scaffolding, and computer control) are aligned with real production pain points today.

Introducing Claude Sonnet 4.5—the best coding model in the world.

It’s the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains on tests of reasoning and math. pic.twitter.com/7LwV9WPNAv

— Claude (@claudeai) September 29, 2025


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

🔥[Recommended Read] NVIDIA AI Open-Sources ViPE (Video Pose Engine): A Powerful and Versatile 3D Video Annotation Tool for Spatial AI

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Posts

The Download: AI to detect child abuse images, and what to expect from our 2025 Climate Tech Companies to Watch list

septembre 29, 2025

Coming soon: Our 2025 list of Climate Tech Companies to Watch

septembre 29, 2025

How to Design an Interactive Dash and Plotly Dashboard with Callback Mechanisms for Local and Online Deployment?

septembre 29, 2025

This AI Research Proposes an AI Agent Immune System for Adaptive Cybersecurity: 3.4× Faster Containment with

septembre 28, 2025
Add A Comment

Comments are closed.

Top Posts

SwissCryptoDaily.ch delivers the latest cryptocurrency news, market insights, and expert analysis. Stay informed with daily updates from the world of blockchain and digital assets.

We're social. Connect with us:

Facebook X (Twitter) Instagram Pinterest YouTube
Top Insights

BitMine Grows Holdings, Bit Digital Eyes $100 Million Raise

septembre 30, 2025

Sci-Fi Turn-Based Strategy Game Menace Enters Early

septembre 30, 2025

Swift To Develop Blockchain Payments Solution With Consensys

septembre 30, 2025
Get Informed

Subscribe to Updates

Get the latest creative news from FooBar about art, design and business.

Facebook X (Twitter) Instagram Pinterest
  • About us
  • Get In Touch
  • Cookies Policy
  • Privacy-Policy
  • Terms and Conditions
© 2025 Swisscryptodaily.ch.

Type above and press Enter to search. Press Esc to cancel.