Issue 06

DeepSeek tops GPT-5.5 as npm backdoors hit Claude Code and obesity drug race heats up at ADA

DeepSeek V4 Pro beat GPT-5.5 Pro on precision benchmarks this week, marking the latest round in the US-China model rivalry. A malware campaign hit 32 npm packages with roughly 117,000 weekly downloads, planting backdoors inside Claude Code startup settings. Pfizer's monthly obesity drug berobenatide continued to show promise in mid-stage data while Boehringer Ingelheim's survodutide disappointed on overall weight loss. Patrick Boyle examined what happens when a housing boom turns to bust, and Marc Rubinstein at Net Interest looked at the IPO readiness of SpaceX, Anthropic, and OpenAI.

33 min read

ai DeepSeek tops GPT-5.5, npm backdoors, and sandbox progress

DeepSeek V4 Pro beats GPT-5.5 Pro on precision

DeepSeek V4 Pro outperformed GPT-5.5 Pro on precision benchmarks, according to testing reported by Runtime Wire. The result landed on Hacker News with 311 points and 156 comments, making it one of the most-discussed items in the current window. The finding adds to a pattern of Chinese labs releasing models that match or exceed OpenAI's flagship offerings on specific evaluation criteria.

Hacker News (front page)
Claude Sonnet 4.6

Active attack planting backdoors inside Claude Code via npm packages

A malware campaign hit 32 npm packages under the @redhat-cloud-services namespace, covering about 117,000 weekly downloads. If a developer installed an affected version, the malware inserted itself into Claude Code startup settings and VS Code project configuration files. The attack gave the malware persistent execution on every subsequent Claude Code session, potentially exposing credentials and project secrets without further interaction from the developer.
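
One way a developer might check for this kind of persistence is to list every shell command a settings file references and review the unfamiliar ones. The sketch below is illustrative only: the `command` key name, the shape of the sample settings, and the loader path are assumptions for the example, not details taken from the report.

```python
import json
from typing import Any, Iterator


def find_commands(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, command) for every string stored under a 'command' key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "command" and isinstance(value, str):
                yield child, value
            else:
                yield from find_commands(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_commands(item, f"{path}[{index}]")


# Hypothetical settings content. A real audit would json.load() the
# settings files that the tools on the machine actually read.
sample = json.loads("""
{
  "hooks": {
    "SessionStart": [
      {"hooks": [{"type": "command", "command": "node ./.cache/loader.js"}]}
    ]
  },
  "theme": "dark"
}
""")

findings = list(find_commands(sample))
for location, command in findings:
    print(f"{location}: {command}")
```

A listing like this only surfaces candidates for review; it does not decide whether a given command is malicious.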

r/ClaudeAI
Claude Sonnet 4.6

micropython-wasm 0.1a2

Simon Willison released micropython-wasm 0.1a2, adding a CLI to the package and documenting it as a way for readers to try the sandbox themselves. The release is part of his ongoing work to run agent-generated code in a MicroPython WebAssembly sandbox that blocks access to the host filesystem and network. The CLI addition makes the sandbox easier to test outside of Datasette, widening its potential use in AI coding workflows.

Simon Willison
Claude Sonnet 4.6

Running Python code in a sandbox with MicroPython and WASM

Simon Willison released micropython-wasm 0.1a2 alongside a companion post explaining his approach to sandboxed Python execution for AI agents. After several years of experimenting with isolation strategies, he describes the MicroPython-in-WASM approach as the first that satisfies all the properties he was looking for: no host filesystem access, no network access, and no ability for agent-generated code to escape the sandbox. The alpha package is now available on PyPI.

Simon Willison
Claude Sonnet 4.6

OpenAI Lockdown Mode rolls out to personal and business accounts

OpenAI launched Lockdown Mode, rolling it out to Free, Go, Plus, Pro, and self-serve ChatGPT Business accounts. The feature is designed to help prevent account takeover and unauthorized access, according to the product documentation. OpenAI had teased the mode in February; the full rollout completes a cycle that began after a series of high-profile social engineering incidents against AI accounts.

Simon Willison
Claude Sonnet 4.6

Ladybird closes public pull requests as AI erodes the effort signal

Andreas Kling, creator of the Ladybird browser, announced that the project will no longer accept public pull requests. His reasoning: a substantial patch used to imply substantial effort, and that effort was a reasonable proxy for good faith. That assumption no longer holds with AI-assisted coding. Whether code was written by hand is beside the point; the signal that effort once provided is now absent, making the cost of reviewing low-quality AI-generated contributions too high relative to the benefit.

Simon Willison
Claude Sonnet 4.6

AI enthusiasts race against time; AI skeptics race against entropy

Charity Majors framed the AI divide in engineering teams as a race between two groups with opposite risk models. Enthusiasts are racing against time: they believe AI capabilities will continue to compound and that teams that fail to adopt now will fall behind permanently. Skeptics are racing against entropy: they believe AI-generated code accumulates hidden complexity that will eventually require expensive cleanup, and that the short-term gains are a debt instrument. Simon Willison quoted the framing and noted both groups are often on the same team.

Simon Willison
Claude Sonnet 4.6

LLM Research Papers: The 2026 List (January to May)

Sebastian Raschka compiled a list of notable LLM research papers published from January through May 2026. The roundup covers work on inference efficiency, long-context handling, reasoning, and alignment. Raschka's annotations identify which papers he considers most practically significant for practitioners building on top of language models, distinguishing them from papers that advance benchmark scores without clear downstream utility.

Sebastian Raschka
Claude Sonnet 4.6

Google quietly removed 'humans in the loop' from its AI oversight statement

Google asked 404 Media to publish a revised version of a statement after an initial story went live. The revision removed the phrase "it's critical that we maintain humans in the loop" from the company's position on AI oversight. Simon Willison quoted the correction as documented by Emanuel Maiberg. Google did not explain the change publicly.

Simon Willison
Claude Sonnet 4.6

DFlash speculative decoding and KV cache compression yield 3.26x speedup on RTX 5090

A benchmark post on r/LocalLLaMA tested DFlash speculative decoding combined with KV cache compression on an RTX 5090 running Qwen3.6-27B via the BeeLlama.cpp framework. The combination produced a 3.26x speedup over the baseline. The poster offered to share full benchmark scripts, raw data, and configuration files on request. The result is notable because speculative decoding and KV compression have historically been evaluated separately; combining them on a single consumer GPU suggests the techniques stack.

r/LocalLLaMA
Claude Sonnet 4.6

datasette-agent-edit 0.1a0

Simon Willison released datasette-agent-edit 0.1a0, a plugin enabling Datasette Agent to edit existing text collaboratively. The tool supports Markdown editing, SQL query updates, and SVG file modifications, expanding agentic capabilities beyond code generation into document workflows.

Simon Willison
Claude Haiku 4.5

Open-source runtime for agent workflows: tidebase

A developer released tidebase, an open-source runtime backend for agent workflows that handles checkpointing, retries, and run tracking. The developer built it after repeatedly reimplementing these primitives across projects; the alpha release targets production-scale agentic systems.
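
For readers unfamiliar with these primitives, here is a minimal sketch of checkpointing plus retries. It is not tidebase's API: `run_workflow`, the JSON checkpoint file, and the step functions are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_workflow(steps, checkpoint: Path, max_attempts: int = 3) -> dict:
    """Run named steps in order, saving results so a rerun skips finished steps."""
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, step in steps:
        if name in state:
            continue  # completed in an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                state[name] = step(state)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # out of retries; the checkpoint keeps earlier steps
        checkpoint.write_text(json.dumps(state))
    return state


calls = {"fetch": 0}


def fetch(state):
    calls["fetch"] += 1
    if calls["fetch"] == 1:
        raise RuntimeError("transient failure")  # fails once, then succeeds
    return "raw data"


def summarize(state):
    return state["fetch"].upper()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run.json"
    steps = [("fetch", fetch), ("summarize", summarize)]
    first = run_workflow(steps, path)
    second = run_workflow(steps, path)  # resumes from the checkpoint
```

The second call reads the checkpoint and reruns nothing, which is the property that makes long agent runs resumable after a crash.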

r/LLMDevs
Claude Haiku 4.5

vllm-doctor: Diagnostic CLI for inference servers

vllm-doctor is a CLI tool that diagnoses vLLM inference server issues by reading metrics and running rule-based checks. It detects queue pressure, latency issues, and KV cache problems, and reports findings at pod-level granularity.
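
A rule-based check over per-pod metrics could look roughly like the sketch below. This is not vllm-doctor's code: the metric names, thresholds, and rule names are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    fired: Callable[[dict[str, float]], bool]  # True means the rule fired


# Metric names and thresholds are illustrative assumptions.
RULES = [
    Rule("queue_pressure", lambda m: m.get("requests_waiting", 0) > 10),
    Rule("kv_cache_full", lambda m: m.get("kv_cache_usage", 0.0) > 0.9),
    Rule("slow_first_token", lambda m: m.get("ttft_p95_seconds", 0.0) > 2.0),
]


def diagnose(pods: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Return the names of the rules that fired, keyed by pod."""
    return {
        pod: [rule.name for rule in RULES if rule.fired(metrics)]
        for pod, metrics in pods.items()
    }


pods = {
    "pod-a": {"requests_waiting": 25, "kv_cache_usage": 0.95, "ttft_p95_seconds": 0.4},
    "pod-b": {"requests_waiting": 0, "kv_cache_usage": 0.30, "ttft_p95_seconds": 0.2},
}
report = diagnose(pods)
print(report)
```

Keeping each rule as a small named predicate makes it easy to report exactly which check fired on which pod.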

r/LocalLLaMA
Claude Haiku 4.5

software Cursor forks VS Code; Cloudflare buys VoidZero; Linear on speed

Cursor ditches VS Code, not everyone is happy

Cursor announced it is forking away from VS Code, building its own editor foundation rather than continuing to build on top of Microsoft's code editor. Fireship covered the announcement and noted the move has divided users: some see it as necessary for Cursor to ship the low-latency, AI-native editing experience the team wants, while others are concerned about losing the VS Code extension ecosystem and compatibility guarantees. Microsoft has not commented on the fork.

Fireship
Claude Sonnet 4.6

VoidZero is joining Cloudflare

Cloudflare acquired VoidZero, the team behind Vite, Vitest, Rolldown, Oxc, and Vite+. Cloudflare said the tools will remain open source and vendor-agnostic. The acquisition brings the maintainers of the most widely used JavaScript build toolchain into Cloudflare's engineering organization, giving the company significant influence over the direction of the front-end build ecosystem without immediately changing the tools' governance.

Cloudflare Blog
Claude Sonnet 4.6

Cloudflare AI Gateway adds real-time spend limits

Cloudflare added real-time spend limits to AI Gateway, allowing companies to set hard caps on token spending across multiple AI providers before bills accrue. The feature integrates with Cloudflare Access so organizations can apply identity-driven budget policies, capping spending per user or per team rather than just per account. The announcement positions AI Gateway as a cost-control layer between enterprise applications and underlying model APIs.
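
The core idea of a hard, identity-scoped cap can be sketched in a few lines. This is a conceptual illustration, not Cloudflare's implementation or API: `SpendLimiter`, the identity strings, and the cap values are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push an identity past its cap."""


class SpendLimiter:
    """Tracks spend per identity in integer cents and rejects over-cap requests."""

    def __init__(self, caps_cents: dict[str, int]) -> None:
        self.caps = dict(caps_cents)
        self.spent = {identity: 0 for identity in caps_cents}

    def authorize(self, identity: str, estimated_cents: int) -> None:
        """Record the spend, or raise before the provider call is made."""
        if self.spent[identity] + estimated_cents > self.caps[identity]:
            raise BudgetExceeded(identity)
        self.spent[identity] += estimated_cents


limiter = SpendLimiter({"user:alice": 500, "team:data": 2000})  # hypothetical caps
limiter.authorize("user:alice", 300)
try:
    limiter.authorize("user:alice", 300)  # 600 > 500, so this is refused
    blocked = False
except BudgetExceeded:
    blocked = True
```

The check runs before the spend is recorded, which is what makes the cap hard rather than a report after the bill arrives.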

Cloudflare Blog
Claude Sonnet 4.6

How Linear achieves its speed: a technical breakdown

A technical breakdown of Linear's performance appeared on Hacker News with 434 points and 200 comments. The post traces the application's sub-50ms interaction latency to a combination of client-side SQLite for local state, optimistic UI updates, and a sync engine that reconciles server state without blocking the UI thread. The analysis focuses on architectural decisions made at Linear's founding that are difficult to retrofit into applications built on traditional server-round-trip models.
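
The optimistic-update pattern described above can be sketched as follows. This is a toy model under stated assumptions, not Linear's code: a plain dict stands in for the client-side database, and `LocalStore`, `set_title`, and the server rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LocalStore:
    """Local state that applies edits immediately and syncs them later."""

    issues: dict[str, str] = field(default_factory=dict)
    pending: list[tuple[str, str, Optional[str]]] = field(default_factory=list)

    def set_title(self, issue_id: str, title: str) -> None:
        previous = self.issues.get(issue_id)
        self.issues[issue_id] = title  # the UI sees the change at once
        self.pending.append((issue_id, title, previous))

    def sync(self, server_accepts: Callable[[str, str], bool]) -> None:
        """Send queued edits; roll back any the server rejects."""
        for issue_id, title, previous in self.pending:
            if not server_accepts(issue_id, title):
                if previous is None:
                    self.issues.pop(issue_id, None)
                else:
                    self.issues[issue_id] = previous
        self.pending.clear()


store = LocalStore()
store.set_title("ISS-1", "Fix login bug")
visible_before_sync = store.issues["ISS-1"]  # readable with no round trip
store.set_title("ISS-2", "")
store.sync(lambda issue_id, title: title != "")  # hypothetical server rule
```

A real sync engine also has to order conflicting edits to the same record, which this sketch does not attempt.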

Hacker News (front page)
Claude Sonnet 4.6

Autonomous LLM coding only works when you have automated verification

A developer on r/ExperiencedDevs argued that autonomous LLM coding is only viable when automated verification exists. After multiple attempts at vibe-coding that produced code she could not trust, she concluded the bottleneck is not code generation but testing and verification. The post drew agreement from other experienced engineers who noted that CI systems with high test coverage make LLM autonomy practical, while codebases with sparse tests amplify the risk of accepting incorrect output without realizing it.
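
The argument amounts to putting a verification gate between generation and acceptance. A minimal sketch, assuming candidates arrive as plain Python callables and the checks are simple predicates (both are illustrative choices, not anything from the thread):

```python
from typing import Callable, Iterable, Optional


def accept_first_passing(
    candidates: Iterable[Callable],
    checks: list[Callable[[Callable], bool]],
) -> Optional[Callable]:
    """Return the first candidate that passes every check, or None."""
    for candidate in candidates:
        try:
            if all(check(candidate) for check in checks):
                return candidate
        except Exception:
            continue  # a crashing candidate counts as a failure
    return None


def buggy_median(xs):
    return sorted(xs)[len(xs) // 2]  # wrong for even-length input


def correct_median(xs):
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


checks = [
    lambda f: f([3, 1, 2]) == 2,
    lambda f: f([4, 1, 3, 2]) == 2.5,
]
accepted = accept_first_passing([buggy_median, correct_median], checks)
```

The buggy candidate passes the odd-length check and fails only on the even-length one, which illustrates the thread's point: sparse tests would have let it through.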

r/ExperiencedDevs
Claude Sonnet 4.6

Using AI as a sanity check for poor code

A developer noted that, for all the skepticism about AI, some coworkers write code that fails basic sanity checks an AI tool would catch. The post argues for using AI as a junior team member for baseline validation rather than dismissing it outright.

r/ExperiencedDevs
Claude Haiku 4.5

Kill switches for agents arrive too late

A developer warned that kill switches for autonomous agents are ineffective because agents can cause damage before operators react. If the harm is already done by the time an operator sees it, the system needs prevention, not intervention.
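
Prevention in this sense means checking an action before it runs instead of stopping the agent afterwards. A minimal sketch, assuming an allowlist policy and hypothetical action names:

```python
from typing import Callable

ALLOWED = {"read_file", "run_tests"}  # hypothetical action names


class ActionBlocked(Exception):
    """Raised when an agent requests an action outside the allowlist."""


def guarded(action: str, handler: Callable[[], str], audit: list[str]) -> str:
    """Run handler only if the action is allowlisted; otherwise refuse up front."""
    if action not in ALLOWED:
        audit.append(f"blocked:{action}")
        raise ActionBlocked(action)
    audit.append(f"ran:{action}")
    return handler()


audit: list[str] = []
result = guarded("run_tests", lambda: "ok", audit)
try:
    guarded("delete_branch", lambda: "deleted", audit)
except ActionBlocked:
    pass  # the handler never ran, so there is nothing to undo
```

The blocked handler is never called, so no operator reaction time is involved.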

r/LLMDevs
Claude Haiku 4.5

Landscape of memory solutions for AI workflows

A developer surveyed second-brain and memory solutions for AI-native workflows, comparing ChatGPT memory, Claude projects, GBrain, Obsidian setups, and newer agent memory systems. The landscape review identifies tools for augmenting agent context.

r/LLMDevs
Claude Haiku 4.5

pharma Obesity drug race at ADA; hepatitis B functional cure; myeloma in China

New drug functionally cures many hepatitis B infections

A report on a new drug described as functionally curing many hepatitis B virus infections appeared on Science.org and reached Hacker News with 201 points. The drug clears the virus to undetectable levels in a substantial share of patients, a result that standard hepatitis B treatments do not achieve. Hepatitis B affects roughly 300 million people globally and has no curative standard of care; the result, if it holds in larger trials, would represent a meaningful shift in management options.

Hacker News (front page)
Claude Sonnet 4.6

Multiple myeloma may finally have a cure, discovered abroad

Multiple myeloma, a blood cancer with no historical curative standard of care in Western countries, may have a cure identified in China, according to Works in Progress. The piece traces how regulatory inertia in the US slowed adoption of a treatment approach that produced durable remissions abroad. American patients have largely been unable to access the therapy through standard channels, despite the evidence base accumulating over several years.

Works in Progress
Claude Sonnet 4.6

Pfizer's monthly obesity drug continues to show promise in mid-stage data

Pfizer's monthly obesity drug berobenatide, acquired from Metsera, continued to show promise in detailed mid-stage data presented at the American Diabetes Association meeting. The drug is designed for once-monthly dosing, which would differentiate it from weekly GLP-1 injections. STAT reported that phase 2 data supported the dosing interval and showed continued weight loss over the study period.

STAT News
Claude Sonnet 4.6

New data may cast doubt on competitiveness of Boehringer's obesity drug

New phase 3 data on Boehringer Ingelheim's obesity drug survodutide showed the compound cutting liver fat effectively but performing less impressively on overall weight loss relative to competing GLP-1 agents. STAT reported the results may cast doubt on survodutide's ability to compete directly with Novo Nordisk's semaglutide and Lilly's tirzepatide on the primary endpoint that payers and prescribers most closely watch.

STAT News
Claude Sonnet 4.6

Lilly shares safety and tolerability data on next-gen obesity drug retatrutide

Eli Lilly presented safety and tolerability data on retatrutide, its triple-hormone receptor agonist, at the ADA Scientific Sessions in New Orleans. Retatrutide targets GLP-1, GIP, and glucagon receptors simultaneously, a mechanism that has produced the highest weight loss numbers seen in any obesity drug to date. The new data addressed side effect profiles at longer treatment durations, a gap that earlier phase 2 results had left open.

STAT News
Claude Sonnet 4.6

CDC: Ebola outbreak could reach 20,000 cases without strong countermeasures

The CDC modeled the trajectory of the Ebola outbreak in Central Africa and estimated it could reach 20,000 cases or more if infected people are not isolated quickly. STAT reported that the modeling study was prepared by US analysts and circulated to public health officials. The projection is conditional on intervention speed; the study identifies isolation rate as the single variable with the most leverage over outbreak size.

STAT News
Claude Sonnet 4.6

Combination of pancreatic cancer drugs from Tango and Revolution shows high response rate

A combination of two experimental pancreatic cancer drugs from Tango Therapeutics and Revolution Medicines produced a high response rate in an early-stage trial. STAT reported the combination of vopimetostat and daraxonrasib, which target distinct cancer vulnerabilities, generated responses in patients with pancreatic cancer, a disease where response rates for most therapies remain low. The trial was early-stage; larger studies would be needed to establish durability.

STAT News
Claude Sonnet 4.6

American horses are obese too

Joshua Moen notes that American horses are now obese at rates paralleling human metabolic syndrome, reflecting shared food system and lifestyle pressures. The veterinary pattern mirrors emerging public health dynamics.

STAT News
Claude Haiku 4.5

healthtech ADA conference confrontation, RFK autism data grab, and DOJ at Cleveland Clinic

Police remove physicians distributing ADA journal editorial from ADA annual conference

Police escorted five physicians out of the American Diabetes Association's Scientific Sessions in New Orleans after they distributed an editorial published in the ADA's own journal. The editorial criticized the NIH's handling of a research matter that the physicians characterized as partisan. The ADA said it called police to maintain compliance with IRS regulations governing 501(c)(3) organizations, and that the incident involved unauthorized distribution at a venue the association controlled. The confrontation was recorded on video.

r/medicine
Claude Sonnet 4.6

RFK researchers using health information exchange systems to access medical records for vaccine-autism study

RFK Jr. and other MAHA-affiliated researchers used health information exchange systems to pull patient medical records for a study on vaccines and autism. Nebraska is the first state to participate and the largest recipient of CDC grant funding in the current cycle. The program is a follow-on to an earlier autism registry proposal. Critics cited in the r/medicine thread described the data access as lacking the patient consent protections typical of federally funded research.

r/medicine
Claude Sonnet 4.6

Cleveland Clinic agrees with DOJ to cease gender-affirming care for minors

Cleveland Clinic reached an agreement with the Department of Justice and the Ohio Attorney General to stop providing gender-affirming care to minors. The agreement resolves a federal investigation and bars the health system from offering the care under any framing for patients under 18. Cleveland Clinic is one of the largest academic medical centers in the country; the settlement sets a precedent for what the DOJ can extract from major hospital systems through negotiated resolution rather than litigation.

r/medicine
Claude Sonnet 4.6

$2 million gene therapies need a new financing model to reach patients

William Padula argued in STAT that the financing model for gene therapies, not the science, is the primary barrier to patient access. With some treatments priced at $2 million or more, standard insurance reimbursement structures fail because the cost is front-loaded while the benefit accrues over a lifetime. Padula proposed outcomes-based contracts and multi-year payment schedules as mechanisms that could make curative therapies economically viable without requiring manufacturers to lower list prices.
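
The arithmetic behind an outcomes-based, multi-year schedule is simple to work through. In the sketch below only the $2 million price comes from the item above; the five equal annual installments and the rule that payments stop once the therapy stops working are assumptions chosen for illustration, not terms Padula proposed.

```python
def total_paid(price: int, installments: int, years_effective: int) -> int:
    """Sum of equal annual installments, paid only for years the therapy keeps working."""
    annual = price // installments
    return annual * min(installments, max(years_effective, 0))


PRICE = 2_000_000  # list price cited in the item above

# Hypothetical five-year schedule: 400,000 per year while the therapy works.
durable = total_paid(PRICE, installments=5, years_effective=5)
relapse = total_paid(PRICE, installments=5, years_effective=2)
print(durable, relapse)
```

Under these assumptions the manufacturer still collects the full list price when the therapy holds up, while the payer's exposure drops if it fails early.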

STAT News
Claude Sonnet 4.6

ADA statement on calling police to remove physicians distributing its own journal editorial

The American Diabetes Association issued a statement explaining why it called police to remove five physicians from its annual conference in New Orleans. The statement said the ADA has safeguards to ensure IRS compliance as a 501(c)(3) organization and that the incident involved unauthorized distribution inside the convention center. The physicians had been handing out an editorial published in an ADA journal that criticized the NIH's handling of a matter they characterized as politically influenced.

r/medicine
Claude Sonnet 4.6

Health Tech Nerds weekly reads

Health Tech Nerds published a weekly digest covering fraud, autism market growth, legal debates over out-of-network pricing, and emerging healthtech trends. The newsletter aggregates sector developments.

Health Tech Nerds
Claude Haiku 4.5

economy Housing busts, IPO timing, AI and corporate margins

When a housing boom turns to bust

Patrick Boyle examined what happens when a housing boom turns to bust, covering the mechanics of price corrections, developer insolvency, and the lag between rising inventory and falling prices. Boyle's analysis covers historical cases including Ireland, Spain, and parts of Canada, tracing how the sequence of events in a housing downturn follows a predictable pattern even when the trigger and geography differ.

Patrick Boyle
Claude Sonnet 4.6

When the Ducks are Quacking: SpaceX, Anthropic, OpenAI, and the business of IPOs

Marc Rubinstein at Net Interest examined the IPO readiness of SpaceX, Anthropic, and OpenAI, framing the analysis around the Wall Street adage about ducks quacking. Rubinstein argues the current environment is favorable for these listings on paper but that each company faces a specific structural reason to delay: SpaceX's government contract exposure, Anthropic's nonprofit origins and equity structure, and OpenAI's ongoing conversion from capped-profit to for-profit status. The piece is skeptical that any of the three will actually list in 2026.

Net Interest (Marc Rubinstein)
Claude Sonnet 4.6

Might AI hurt corporate profits by eliminating customer inertia?

Tyler Cowen shared a reader argument that AI may compress corporate profit margins by making customers better informed and less inert. The mechanism: many companies earn above-normal returns because customers cannot be bothered to monitor prices, switch providers, or negotiate. If AI agents do that work automatically, the rents that depend on customer passivity will erode. Cowen does not fully endorse the argument but calls it plausible and underexplored in the economics of AI literature.

Marginal Revolution (Tyler Cowen)
Claude Sonnet 4.6

This Is Probably Fine: concurrent stress signals in credit, equities, and debt

Patrick Boyle covered a set of economic signals under the working title "This Is Probably Fine," reviewing concurrent stress indicators in credit markets, equities, and government debt that individually appear manageable but are historically unusual in combination. Boyle is measured rather than alarmist, but the analysis identifies tail risks that standard macro commentary has not emphasized.

Patrick Boyle
Claude Sonnet 4.6

Why Europe should put up trade barriers against Chinese goods

Noah Smith argued that Europe should impose trade barriers against Chinese goods for reasons beyond protecting domestic industries. His case rests on security dependency, industrial resilience, and the political economy of deindustrialization rather than straightforward protectionism. Smith contends that the standard free-trade frame underweights the strategic value of maintaining manufacturing capacity, particularly in sectors relevant to defense and critical infrastructure.

Noahpinion (Noah Smith)
Claude Sonnet 4.6

High-skill immigration restrictions eroded regional productivity after 2017 BAHA order

A new paper estimates the regional economic impact of the 2017 Buy American Hire American executive order, which tightened H-1B visa adjudication. The study treats the policy as a quasi-experimental shock and finds that counties with high pre-policy shares of H-1B workers saw measurable declines in regional productivity and patenting after the restrictions took effect. Tyler Cowen flagged the paper as consistent with prior work showing high-skill immigration produces local spillovers.

Marginal Revolution (Tyler Cowen)
Claude Sonnet 4.6

New paper on iPhone diffusion and the fertility rate decline since 2007

A new economics paper examined whether the iPhone's introduction explains part of the 22 percent decline in the US general fertility rate since 2007. Standard explanations including economic conditions, contraceptive use, housing costs, and childcare costs do not fully account for the sustained drop. The paper tests whether smartphone diffusion, by reshaping how young people spend discretionary time and form romantic relationships, contributed to the fertility trend.

Marginal Revolution (Tyler Cowen)
Claude Sonnet 4.6

Hayekian literary criticism

Tyler Cowen notes that while Marx is relegated to economic history, Marx-influenced literary criticism dominates English departments. The pattern illustrates how intellectual frameworks migrate across domains and persist asymmetrically.

Marginal Revolution
Claude Haiku 4.5

Tyler Cowen's Sunday links covered Hyman Rickover's corpus, poverty documentaries, AI copyright, lawyer supply/demand, Somalia's current state, and Alan Riding's obituary. The collection spans defense policy to literary criticism.

Marginal Revolution
Claude Haiku 4.5

Let me disinherit my children, s'il vous plaît

A French billionaire told senators he cannot disinherit his children under French law. Cowen highlights the case as evidence that inheritance laws force founders to pass wealth to their heirs regardless of their stated preferences.

Marginal Revolution
Claude Haiku 4.5

Why drugs are here to stay

An anonymous correspondent argues drugs persist because they provide psychic value unavailable through legal alternatives. The submission examines why drug prohibition fails despite enforcement and questions addiction models.

Marginal Revolution
Claude Haiku 4.5

Cowen's Saturday links spanned papal encyclicals on innovation, careers in information work, public management credibility, Scott Sumner on epidemiology, and more. The collection ranged across policy and economics.

Marginal Revolution
Claude Haiku 4.5

Barter markets in everything

Researchers offer free home cleaning in exchange for first-person footage to train household robots. The barter arrangement exchanges data value for labor, illustrating emerging data-for-services markets.

Marginal Revolution
Claude Haiku 4.5

Should you move to Argentina?

An anonymous correspondent pitches an article on why Peter Thiel's move to Argentina makes economic sense. The submission argues for Argentina as a tax and regulatory haven for entrepreneurs.

Marginal Revolution
Claude Haiku 4.5

Cowen's Friday links covered Rohin Shah on AI alignment, Arnold Kling's rereads, sports and pop culture, elasticity of supply, SSRN degradation, and New York magazine layoffs. Topics spanned AI safety to institutional decline.

Marginal Revolution
Claude Haiku 4.5

Iran's crumbling economy

Money & Macro examined Iran's crumbling economy through energy markets and inflation dynamics. The video covers macroeconomic collapse in a resource-dependent state.

Money & Macro
Claude Haiku 4.5

Why Japan isn't broke yet

Money & Macro explores why Japan has stayed solvent despite high debt and persistent budget deficits. The analysis covers interest rates, demographic support ratios, and sovereign currency advantages.

Money & Macro
Claude Haiku 4.5

Is GDP failing to capture AI?

Timothy Taylor examines whether GDP adequately captures AI's economic contribution, drawing on an analysis by Korinek and McKelvey that questions how productivity is measured in an AI-intensive economy.

Conversable Economist
Claude Haiku 4.5