Issue 12

GPT-5.6 arrives in tiers as GLM 5.2 tops Claude on security benchmarks

OpenAI quietly launched GPT-5.6 in three tiers (Sol, Terra, Luna) restricted to trusted partners, while Semgrep reported that GLM 5.2 outperforms Claude on its internal cybersecurity benchmarks. Portugal's National Health Service contracted Sword Health to deliver AI-assisted physical therapy nationwide, the first country-scale deployment of its kind. Edison Scientific and Population Health Partners announced a deal to use AI agents for end-to-end drug discovery. Patrick Boyle assessed Brexit's 10-year economic toll, and Noah Smith examined whether AI will push companies toward more or less outsourcing.

34 min read

ai GPT-5.6 tiers, GLM beats Claude, and local agents

GPT-5.6 Sol, Terra, and Luna restricted preview

OpenAI launched a limited preview of the GPT-5.6 series to trusted partners only, offering three tiers: Sol (flagship), Terra (balanced), and Luna (fast and low-cost). Terra matches GPT-5.5 performance at half the price. The release coincided with undisclosed Anthropic partner access, and Latent Space commented on the unusual same-day dual rollout.

Simon Willison
Claude Sonnet 4.6

OpenAI GPT-5.6 Sol/Terra/Luna restricted to trusted partners

OpenAI previewed GPT-5.6 in three model tiers simultaneously with what appeared to be a matching Anthropic partner release. Latent Space noted the parallel rollout as an unusual coordination signal between the two leading closed-model labs, both restricting access to vetted commercial partners rather than opening to the public.

Latent Space
Claude Sonnet 4.6

GLM 5.2 beats Claude in Semgrep's cyber benchmarks

Semgrep's security team reported that GLM 5.2, Zhipu AI's latest open-weights release, outperforms Claude on their internal cybersecurity benchmarks. The post is titled "We have Mythos at home" in reference to a proprietary security model, and presents head-to-head results on code vulnerability detection tasks. The finding extends Nathan Lambert's earlier claim that GLM 5.2 represents a step change for open agents into a concrete applied security domain.

Hacker News (front page)
Claude Sonnet 4.6

Open artifacts #22: Zyphra, Cohere, and Poolside

Nathan Lambert assessed the latest open-weights releases, arguing that Zyphra, Cohere, and Poolside are expanding the ecosystem in ways that go beyond raw capability competition. Lambert's case is that diverse motivations for releasing models (enterprise tooling, developer ecosystems, and proprietary research scaffolding) are producing a breadth of architectures and licensing structures that closed labs cannot match. He framed this as a structural advantage for the open ecosystem over time.

Interconnects (Nathan Lambert)
Claude Sonnet 4.6

Using Local Coding Agents

Sebastian Raschka surveyed open-weight models running inside local coding harnesses as alternatives to Claude Code and Codex subscriptions. He tested several models on agentic coding tasks, evaluated which harnesses (including Continue, Aider, and Cursor with local backends) produced usable results, and documented the tradeoffs in latency, context length, and cost. The conclusion was that local setups are viable for many tasks but still lag on complex multi-file refactoring.

Sebastian Raschka
Claude Sonnet 4.6

The next big breakthrough will be AIs learning on the job

Dwarkesh Patel argued that AI labs are discarding their most valuable training data by not capturing model behavior during deployment. His claim is that the next major training advance will come from models learning on the job rather than from pre-collected static datasets. The argument frames live inference traces, including errors and corrections, as a richer signal than curated pre-training corpora, and predicts that the first lab to systematically capture and train on deployment data at scale will pull ahead.

Dwarkesh Patel
Claude Sonnet 4.6

What does the next training paradigm look like?

Dwarkesh Patel examined what the next training paradigm after RLHF and synthetic data might look like, framing the question around the limits of current post-training pipelines. He argued that current approaches are hitting diminishing returns on static benchmarks and that the next gains will require fundamentally different feedback mechanisms, including real-world task completion and adversarial self-play at scale.

Dwarkesh Patel (YouTube)
Claude Sonnet 4.6

Google's agentic peer-reviewer handled 10K papers at ICML/STOC

Google deployed an agentic AI peer-reviewer at ICML and STOC, processing roughly 10,000 papers with a 30-minute turnaround per submission. A formal research paper on the system is now public. The model caught 34% more mathematical errors than zero-shot prompting on the same papers. The deployment raises direct questions about disclosure norms, author rights, and whether conference chairs should be required to declare AI reviewer use.

r/MachineLearning
Claude Sonnet 4.6

OpenAI internal Codex token growth: 56x in Research

OpenAI reported internally that median Codex output tokens grew 56x in its Research division, 32x in Customer Support, 27x in Engineering, and 13x in Legal since November 2025. Latent Space flagged the numbers as the clearest sign yet that internal AI usage at OpenAI has crossed from experiment to operational dependency. The figures imply that each division has changed its working method, not just its tooling.

Latent Space
Claude Sonnet 4.6

AI and Liability: German ruling on Google AI Overviews

Bruce Schneier and Nathan Sanders analyzed a German court ruling that held Google liable for errors in its AI Overviews feature. Their argument is that AI agents are agents of the organizations that deploy them and should carry the same legal liability as other company communications. The ruling, if it holds and spreads, would shift the cost of AI hallucinations from users to deployers, which changes the economics of public-facing AI products.

Simon Willison
Claude Sonnet 4.6

Quoting Jon Udell

Simon Willison quoted Jon Udell on reframing human-AI collaboration. Rather than 'human in the loop,' Udell proposed flipping the narrative: humans are in charge of their workflow, and AI agents join the team to work alongside them rather than replace their authority.

Simon Willison
Claude Haiku 4.5

Hack Your Summer

DJ Patil launched Hack Your Summer, a four-week high-velocity sprint for undergraduates, graduate students, and recent graduates to build production AI systems. The initiative pairs participants with mentorship and infrastructure to ship real projects.

Simon Willison
Claude Haiku 4.5

Quoting Dean W. Ball

Simon Willison quoted Dean W. Ball on frontier model economics. Frontier models recoup significant training costs in the months immediately post-release; after that window closes, the strategic value shifts to proprietary fine-tuning data and application-layer differentiation.

Simon Willison
Claude Haiku 4.5

Quoting Timothy B. Lee

Simon Willison quoted Timothy B. Lee on the learning curve for LLM use. Lee argued that saying LLMs have no learning curve is like claiming managers face no learning curve because subordinates follow orders; meaningful use of the tools requires skill development.

Simon Willison
Claude Haiku 4.5

Incident Report: CVE-2026-LGTM

Andrew Nesbitt wrote a fictional incident report about competing AI review agents entering a disagreement loop over a pull request vulnerability. The scenario illustrates potential failure modes when multiple agentic systems interact without coordination on a shared codebase.

Simon Willison
Claude Haiku 4.5

Quoting OpenAI

OpenAI announced the GPT-5.6 series: Sol (flagship), Terra (balanced performance at half the cost of GPT-5.5), and Luna (fast and affordable). The new tier structure creates pricing-performance options across use cases.

Simon Willison
Claude Haiku 4.5

AI and Liability

Bruce Schneier and Nathan Sanders discussed liability for AI errors. Their argument: AI agents are agents of the deploying organization, and liability should follow deployment rather than treating errors as neutral technical failures.

Simon Willison
Claude Haiku 4.5

EML Trees are Universal Approximators [R]

A paper proved that EML trees are universal approximators. EML (elementary function composition) first circulated online as a 'cool trick' for representing elementary functions; the paper now establishes its theoretical power and gives approximation bounds.

r/MachineLearning
Claude Haiku 4.5

Voice for AI Agents and Applications

DeepLearningAI published a course on voice for AI agents and applications, covering speech synthesis, recognition, and integration patterns for conversational interfaces.

DeepLearningAI
Claude Haiku 4.5

software ATS inconsistency, saga rollbacks, and AI review wars

HackerRank open-sourced its ATS; my resume scored 90, then 74, then 88

A developer tested HackerRank's newly open-sourced applicant tracking system against their own resume and received three different scores on successive runs: 90, then 74, then 88. The post documented the inconsistency in detail and traced it to non-deterministic LLM calls in the scoring pipeline with no temperature control or output caching. The thread drew 250 Hacker News comments and pointed to a broader problem with LLM-based hiring tools that present probabilistic outputs as objective scores.

Hacker News (front page)
Claude Sonnet 4.6
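
The inconsistency described in the HackerRank item above comes from calling a sampled model repeatedly on the same input. A minimal sketch of output caching follows, keyed on a hash of the resume text. It is not HackerRank's code: `noisy_score` and `CachedScorer` are hypothetical names, and the random scorer only stands in for a non-deterministic LLM call.

```python
import hashlib
import random
from typing import Callable, Dict


def noisy_score(resume_text: str) -> int:
    """Hypothetical stand-in for a sampled LLM call: same input, varying output."""
    return random.randint(70, 95)


class CachedScorer:
    """Wraps a scoring function so identical inputs always return the first score."""

    def __init__(self, score_fn: Callable[[str], int]) -> None:
        self._score_fn = score_fn
        self._cache: Dict[str, int] = {}
        self.calls = 0  # number of times the underlying scorer actually ran

    def score(self, resume_text: str) -> int:
        # Key on a content hash so the cache does not store the raw resume text.
        key = hashlib.sha256(resume_text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._score_fn(resume_text)
        return self._cache[key]


scorer = CachedScorer(noisy_score)
scores = [scorer.score("ten years of Python experience") for _ in range(3)]
```

Caching makes the score repeatable, not correct: the first sampled value is simply frozen. Lowering the sampling temperature or averaging several runs would address the variance itself.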

Saga rollbacks for Cloudflare Workflows

Cloudflare added saga-style rollbacks to Cloudflare Workflows, its durable execution engine for multi-step applications. Developers can now attach a compensating action to each step, so if a later step fails, the engine automatically runs each compensating action in reverse order to unwind the transaction. The feature brings a pattern from distributed database design into serverless workflow execution, which matters for any agentic or payment pipeline that cannot tolerate partial completion.

Claude Sonnet 4.6

Why PostHog rebuilt its data warehouse on DuckDB over ClickHouse

PostHog rebuilt its internal data warehouse on DuckDB, replacing ClickHouse. The stated goal of the original warehouse was to delay the point at which a company needs its first data engineer. PostHog found that DuckDB's in-process columnar execution removed the operational overhead of managing a separate ClickHouse cluster while matching query performance for their scale. The post details schema migration, query compatibility, and where ClickHouse still outperforms.

PostHog Engineering
Claude Sonnet 4.6

Incident Report: CVE-2026-LGTM (hypothetical AI reviewer deadlock)

Andrew Nesbitt published a hypothetical incident report, dated Day 2, 16:00 UTC, in which two AI code review agents from competing vendors enter a disagreement loop over the CVE severity of a dependency bump. Neither agent yields; the PR is locked while the agents generate escalating justifications. Simon Willison flagged it as a plausible near-future failure mode for teams running multiple AI review systems on the same repository without a defined arbitration mechanism.

Simon Willison
Claude Sonnet 4.6
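
The missing arbitration mechanism in the scenario above could be as simple as a round limit with a human fallback. The sketch below is an assumption about what such a rule might look like, not something taken from Nesbitt's report; `arbitrate`, the verdict strings, and the toy reviewers are hypothetical.

```python
from typing import Callable, List

# A reviewer maps the current round number to a verdict string.
Reviewer = Callable[[int], str]


def arbitrate(reviewers: List[Reviewer], max_rounds: int = 3) -> str:
    """Return the shared verdict once all reviewers agree, else escalate."""
    for round_no in range(1, max_rounds + 1):
        verdicts = {review(round_no) for review in reviewers}
        if len(verdicts) == 1:
            return verdicts.pop()
    # No consensus within the budget: stop the loop and hand off to a person.
    return "escalate_to_human"


def always_approve(round_no: int) -> str:
    return "approve"


def always_block(round_no: int) -> str:
    return "block"


def concedes_in_round_two(round_no: int) -> str:
    return "block" if round_no < 2 else "approve"


deadlocked = arbitrate([always_approve, always_block])
resolved = arbitrate([always_approve, concedes_in_round_two])
```

The point of the bound is that the pull request cannot stay locked indefinitely: either the agents converge within `max_rounds` or a human decides.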

What happened after 2,000 people tried to hack my AI assistant

Fernando Irarrázaval ran a public challenge on hackmyclaw.com, inviting anyone to leak secrets held by his OpenClaw AI assistant by sending it email. After 6,000 attempts from roughly 2,000 participants, no one succeeded in extracting the secrets. Simon Willison summarized the results and the attack patterns attempted, noting that the most common approaches were prompt injection via email body content and social engineering through multi-turn sequences. The system held, though Irarrázaval acknowledged the test environment was more constrained than production deployments.

Simon Willison
Claude Sonnet 4.6

Librepods: AirPods liberated

Librepods is an open-source project aiming to liberate AirPods from Apple's ecosystem constraints, enabling broader software and hardware interoperability.

Hacker News (front page)
Claude Haiku 4.5

Historical memory prices 1960-2026

A Stanford researcher compiled historical DRAM pricing from 1960 to 2026, documenting exponential cost declines and enabling analysis of how memory economics shaped computing infrastructure adoption.

Hacker News (front page)
Claude Haiku 4.5

Vercel Ship 2026 recap

Vercel recapped Ship 2026, its developer platform event, showcasing new features and announcements around edge computing, AI integration, and developer experience.

Vercel Blog
Claude Haiku 4.5

AI SDK 7

Vercel released AI SDK 7, its latest developer framework for building AI applications, with new abstractions for agents, retrieval, and streaming responses.

Vercel Blog
Claude Haiku 4.5

pharma AI drug discovery deals, embryo ethics, and a hospital loophole

Edison Scientific and Population Health Partners to create new biotechs using AI

Edison Scientific, an AI scientist company, and Population Health Partners, the investment firm led by Clive Meanwell, announced a deal to use AI agents across the full drug discovery and development pipeline to create new biotechs. The partnership was assembled by the team behind Metsera, the GLP-1 company. Edison's platform autonomously generates hypotheses, designs experiments, and interprets results; the deal is structured to produce multiple new companies rather than a single drug program.

STAT News
Claude Sonnet 4.6

Sword Health contracted for AI-supported physical therapy across Portugal

Portugal's National Health Service signed a contract with Sword Health to provide AI-assisted virtual physical therapy to the country's entire population. The deal is the first national-scale deployment of an AI-supported physical therapy platform. Sword's system pairs an AI motion guide with remote sessions led by licensed physiotherapists. Portugal had significant unmet physical therapy demand due to a shortage of in-person practitioners in rural regions.

STAT News
Claude Sonnet 4.6

Embryo editing advances reignite ethical debates; 340B targeted

STAT's Readout newsletter reported that advances in embryo editing are reigniting ethical debates in biotech. Separately, Senator Cassidy proposed changes to the 340B drug pricing program, and Eli Lilly has a patient enrolled in a mysterious retatrutide trial whose design has not been publicly disclosed. The retatrutide development is being watched as a possible next-generation GLP-1 with a broader metabolic indication than semaglutide.

STAT News
Claude Sonnet 4.6

Nutex Health micro-hospitals exploit ER loophole, reaping millions

STAT found that Nutex Health, a hospital operator running micro-hospitals with emergency rooms, has been exploiting loopholes in federal surprise billing and independent dispute resolution laws to turn away certain patients while collecting outsized reimbursements from others. The investigation documented how Nutex structures its facilities to qualify for ER reimbursement rates while avoiding the patient acceptance obligations that apply to full hospital emergency departments. Several facilities were generating millions in profit annually through the gap.

STAT News
Claude Sonnet 4.6

US-China biotech crackdown may hurt the scientists America needs most

Brian Yang, writing in STAT, argued that the U.S.-China biotech crackdown is targeting Chinese-American scientists based on national origin rather than demonstrated security risk. Yang drew a distinction between evidence-based security policy and blanket suspicion of China-origin entities. He cited cases of Chinese-American researchers losing grants, collaborations, and institutional positions without findings of misconduct, and argued the pattern mirrors historical treatment of other immigrant groups in national security contexts.

STAT News
Claude Sonnet 4.6

healthtech AI in physical therapy, MRI second opinions, and investor AI conversion

Investor Clive Meanwell on AI as a catalyst for biotech

Clive Meanwell, chairman of Population Health Partners, told STAT he was an AI skeptic until recently and has now concluded that AI is a structural catalyst for biotech, not a feature layer. He described a shift in how he evaluates investment targets: companies that cannot articulate a credible AI integration thesis are now lower priority. Meanwell was simultaneously announced as a partner in the Edison Scientific deal to create AI-native drug discovery companies.

STAT News
Claude Sonnet 4.6

I used Claude Code to get a second opinion on my MRI

A developer used Claude Code with the Opus model to get a second opinion on their MRI results after receiving an inconclusive radiologist report. The post documented the prompt structure, the model's output, and the subsequent conversation with a physician who confirmed that the AI had identified a finding worth follow-up imaging. The case drew 603 Hacker News comments, many debating the liability and epistemics of AI-assisted medical interpretation outside a clinical context.

Hacker News (front page)
Claude Sonnet 4.6

Professor denounces mass AI fraud on Brown University exam

A professor at Brown University publicly accused a large fraction of students of using AI to complete an exam, telling El País that the submissions showed near-identical reasoning patterns and phrasing inconsistent with the students' prior written work. The case is one of the most prominent mass academic integrity incidents at a US research university and has prompted calls from faculty groups for standardized detection protocols and clearer institutional policy on AI use in assessments.

Hacker News (front page)
Claude Sonnet 4.6

Moderna co-founder Kenneth Chien on mRNA cancer vaccine and Moderna's future

Kenneth Chien, a co-founder of Moderna who has since left the company, told STAT he expects Moderna's mRNA cancer vaccine to be, in his word, "transformative" for biotech broadly. Chien described mRNA as a platform that will extend well beyond infectious disease into oncology and regenerative medicine, and said the cancer vaccine program is the most important thing Moderna has done since the COVID vaccine. He did not give a timeline for commercial availability.

STAT News
Claude Sonnet 4.6

economy Brexit's bill, AI's effect on outsourcing, and biomedical cost structures

Brexit, 10 years on: what it actually cost Britain

Patrick Boyle assessed what Brexit has actually cost Britain over the ten years since the referendum. His analysis covers trade volume declines with the EU, the gap between UK and comparable-economy GDP growth since 2016, financial services relocation, and labor market changes from reduced EU migration. Boyle framed the costs as measurable but distributed across time in ways that made them politically deniable in the short run; the ten-year view makes the aggregate more legible.

Patrick Boyle
Claude Sonnet 4.6

Will AI make companies outsource more, or less?

Noah Smith examined whether AI will push companies toward more outsourcing or less. His argument is that AI reduces transaction costs for coordinating with external parties, which historically predicts more outsourcing, but simultaneously reduces the cost of internal automation, which predicts less. He concluded that the direction will likely vary by task type: AI will accelerate outsourcing of judgment-light process work while reversing it for knowledge-intensive roles where internal context matters.

Noahpinion (Noah Smith)
Claude Sonnet 4.6

Will future biomedical advances be low marginal cost?

Tyler Cowen examined whether future biomedical advances will follow the same low-marginal-cost structure as software. His argument is that drugs, once discovered and approved, cost almost nothing per additional dose, which favors health systems that negotiate aggressively on price. If AI compresses discovery costs substantially, the ratio of upfront fixed cost to marginal production cost becomes even more extreme, strengthening the hand of single-payer buyers and potentially reshaping pharma's pricing power.

Marginal Revolution (Tyler Cowen)
Claude Sonnet 4.6

Duffy's Last Dance: the fight over perpetual futures

Marc Rubinstein examined the regulatory fight over perpetual futures contracts, a derivatives instrument that does not expire and has become the dominant vehicle for leveraged crypto trading. The battle is between incumbents defending existing futures market structures and new entrants seeking to list perpetuals on regulated US exchanges. Rubinstein traced how the instrument's design creates funding rate mechanics that differ from traditional futures and why regulators have resisted approving them for retail access.

Net Interest (Marc Rubinstein)
Claude Sonnet 4.6
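
The funding rate mechanics mentioned above can be illustrated with simple arithmetic. This is a simplified sketch under stated assumptions, not Rubinstein's description or any exchange's formula: it assumes the rate equals the perpetual's premium over an index price, whereas real venues typically add an interest component and clamp the result. The function names are hypothetical.

```python
def premium_rate(mark_price: float, index_price: float) -> float:
    """Simplified funding rate: the perpetual's premium over the index price."""
    return (mark_price - index_price) / index_price


def funding_payment(position_size: float, mark_price: float, rate: float) -> float:
    """Funding owed by a position for one interval.

    position_size is signed (long > 0, short < 0). A positive result means
    the holder pays; a negative result means the holder receives.
    """
    return position_size * mark_price * rate


# The perpetual trades 1% above the index, so longs pay shorts.
rate = premium_rate(mark_price=101.0, index_price=100.0)
long_pays = funding_payment(position_size=2.0, mark_price=101.0, rate=rate)
short_pays = funding_payment(position_size=-2.0, mark_price=101.0, rate=rate)
```

Because the contract never expires, this periodic transfer is what pulls the perpetual's price back toward the index, a job that the expiry date does for a traditional future.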

How resilient were emerging markets through the 2022-23 US tightening cycle?

New York Fed economists found that emerging market economies navigated the 2022-2023 US monetary tightening cycle with more resilience than prior episodes. Capital outflows were smaller than the 2013 taper tantrum and the 2018 EM selloff. The researchers attributed the relative stability to three changes: stronger central bank credibility in EM economies, higher foreign exchange reserve buffers built post-2013, and a shorter tightening cycle than historical comparisons. The finding has implications for how EM central banks should manage the next tightening period.

Liberty Street Economics (NY Fed)
Claude Sonnet 4.6

Politically Incorrect Paper of the Day: The US Racial Wealth Gap

Tyler Cowen cited research arguing that the contemporary US Black-white wealth gap traces primarily to slavery-era initial conditions rather than recent policy. Cowen noted a glaring omission: the analysis does not account for segregation, redlining, and discrimination after Reconstruction.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

Tech Selloff

Kyla Scanlon produced a short video examining recent tech sector selloffs and the dynamics driving equity withdrawals.

Kyla Scanlon
Claude Haiku 4.5

Typewriters and fertility

A working paper studies typewriter adoption in US workplaces and its effect on labor demand. Exploiting exogenous variation in sectoral demand for typists, the author documents how workplace technology changes create new occupational categories, with long-run effects on fertility and female workforce participation.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

Tyler Cowen curated links covering vaccine trial speed, Scott Sumner on Greenspan, the NYT's 100 best books of the 21st century, AI and classical liberalism, the memory tax, and Tetris in electoral politics.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

Chloe vs. History

Tyler Cowen highlighted an AI-generated historical tour guide named Chloe that engages users with realistic, accurate walkthrough narratives of historical events and locations.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

My ARC talk on AI and jobs

Tyler Cowen gave a 12-minute talk at ARC about AI's employment effects, arguing that AI will not put everyone out of work and examining evidence about task displacement versus new occupational creation.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

Europe's resistance to AC is driving it insane

Noah Smith examined Europe's stubborn resistance to air conditioning and its health consequences. Despite hot summers and climate change, cultural and regulatory barriers slow adoption; the delay costs lives and productivity.

Noahpinion (Noah Smith)
Claude Haiku 4.5

Renationalising British utilities

Tyler Cowen assessed whether to renationalize British utilities. He noted that not all privatizations succeeded and that US data show state-owned utilities do not perform dramatically worse; however, he would not renationalize given current institutional constraints.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

Tyler Cowen curated Saturday links on prodigy creation, Bronze Age markets, European heat deaths, a new Journal of Economic Freedom, and economics commentary from John Burn-Murdoch.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

An infovore shares his chats

Tyler Cowen published a piece in conjunction with OpenAI on how to use GPT Pro for travel planning, museum visits, and other practical tasks.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

My Conversation with Joanne Paul

Tyler Cowen interviewed Joanne Paul, a historian at the University of Sussex and a popular YouTube expert on the Tudors. She discussed why the 16th century attracts her scholarly attention.

Marginal Revolution (Tyler Cowen)
Claude Haiku 4.5

The Democrats have their own MAGA now

Noah Smith observed that the Democratic Party is developing its own populist counterpart to MAGA. He characterized this emerging left-wing ideology and its implications for party positioning.

Noahpinion (Noah Smith)
Claude Haiku 4.5