ai Open-weight model surge and attention efficiency
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
Sebastian Raschka surveys the architectural changes shipping inside Gemma 4, DeepSeek V4, and several other recent open-weight releases. He focuses on KV cache sharing and compressed attention, which cut the memory cost of long contexts, and on mHC (manifold-constrained hyper-connections), which changes how information flows through the residual stream. The piece is technical but the payoff is concrete: the KV and attention changes are what allow frontier-class models to run longer contexts at lower cost, and they are now appearing in openly released weights. Raschka's coverage is among the clearest explanations of why per-token inference costs have been falling faster than raw compute improvements alone would predict.
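The memory saving from KV cache sharing is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical dense-attention configuration (32 layers, 8 KV heads, head dimension 128, 16-bit values, a 128K-token context), not the actual configuration of Gemma 4 or DeepSeek V4, and models cross-layer sharing as groups of consecutive layers reading one cache.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2, layers_per_shared_cache=1):
    """Approximate KV cache size, in bytes, for a single sequence.

    Every layer that owns a cache stores one key and one value vector
    per KV head per token. With cross-layer sharing, each group of
    `layers_per_shared_cache` consecutive layers reads a single cache.
    """
    if n_layers % layers_per_shared_cache != 0:
        raise ValueError("n_layers must be divisible by the sharing group size")
    cache_layers = n_layers // layers_per_shared_cache
    # Factor of 2: one tensor for keys, one for values.
    return 2 * cache_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical configuration, not taken from any specific model card.
config = dict(seq_len=128 * 1024, n_layers=32, n_kv_heads=8, head_dim=128)
GIB = 2**30
print(kv_cache_bytes(**config) / GIB)                             # 16.0
print(kv_cache_bytes(**config, layers_per_shared_cache=2) / GIB)  # 8.0
```

Roughly speaking, compressed attention schemes target the other factors in the same product, storing fewer or smaller entries per token rather than fewer caches.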
Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.
Nathan Lambert rounds up the latest open-weight releases in one place: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, and GLM-5.1 all dropped within a short window. The volume is the story. Lambert calls it an eventful month, with one flagship release after another, and that pace is starting to look like the new baseline rather than an exception. He also includes his assessment of CAISI's V4 evaluation methodology, which is worth reading alongside the headline numbers to understand how these models are actually being compared.
How open model ecosystems compound
Lambert's companion piece to the model roundup argues that open ecosystems have a compounding property that closed ones struggle to replicate. When fine-tunes, evals, and tooling built on open weights accumulate publicly, the marginal cost of the next improvement falls for every participant. The piece uses China's high-participation release culture as the worked example, and it raises a structural question that closed-model labs have not fully answered: how do you compete with an ecosystem that gets smarter faster precisely because it is open?
What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang
Eric Jang rebuilt AlphaGo from scratch and sat down with Dwarkesh Patel to explain what that exercise taught him about self-play, reinforcement learning, and where large language models might go next. AlphaGo remains the cleanest worked example of the core primitives of machine intelligence: search, learning from experience, and self-play. Jang's argument is that those same primitives are underutilized in current LLM training, and that the path to more capable systems runs through better integration of search and self-generated feedback rather than simply more pretraining data.
RLVR might be disproportionately bad at science
Dwarkesh Patel makes a pointed argument that RLVR, the reinforcement learning from verifiable rewards paradigm that has driven recent reasoning improvements, may be poorly suited to scientific research. The core issue is verification lag: in science, the feedback loop on whether a theory is correct can take decades, and even then the better theory often makes worse short-term predictions. If RLVR requires fast, reliable ground truth to function, then automating scientific discovery with the same techniques that improved math and coding performance may not transfer cleanly.
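To make the verification requirement concrete: in RLVR the reward is a program that can grade an output immediately. A minimal sketch, assuming a math task with a single exact numeric answer (the function names and matching rule are illustrative, not taken from Patel's post), contrasted with a hypothetical scientific claim whose ground truth does not exist yet:

```python
from fractions import Fraction
from typing import Optional


def math_answer_reward(model_answer: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 if the final answer is exactly right.

    Both strings are parsed as exact rationals, so "0.5" and "1/2" match.
    Anything unparseable earns 0.0 instead of raising.
    """
    try:
        correct = Fraction(model_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if correct else 0.0


def theory_reward(prediction: str, observation: Optional[str]) -> Optional[float]:
    """Hypothetical scientific analogue: no reward until evidence exists.

    Returning None models verification lag: the RL update for this
    rollout has nothing to learn from until an observation arrives.
    """
    if observation is None:
        return None
    return 1.0 if prediction == observation else 0.0


print(math_answer_reward("0.5", "1/2"))     # 1.0
print(theory_reward("hypothesis A", None))  # None
```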
The Sigmoids Won't Save You
Scott Alexander's piece on AI scaling and sigmoid curves argues that historical S-curves in technology provide false comfort to those hoping AI progress will naturally plateau before causing disruption. The piece does not claim AGI is imminent; it claims the shape of past curves tells us almost nothing useful about where the current one bends. This is a careful statistical argument dressed as a cultural essay, and it is more useful than most takes that invoke sigmoid curves either to dismiss AI risk or to amplify it.
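A small numerical illustration of why the early part of a curve says little about where it bends. The two logistic curves below use hypothetical parameters chosen for illustration, not anything from Alexander's essay: one saturates at 100, the other at 1,000, and the second midpoint is shifted so both share the same early exponential.

```python
import math


def logistic(t, ceiling, k, midpoint):
    """S-curve that rises exponentially at first and saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-k * (t - midpoint)))


k = 1.0
low = dict(ceiling=100.0, k=k, midpoint=10.0)
# Shifting the midpoint by ln(10)/k keeps ceiling * exp(-k * midpoint) equal,
# so the two curves are nearly indistinguishable well before either midpoint.
high = dict(ceiling=1000.0, k=k, midpoint=10.0 + math.log(10.0) / k)


def rel_diff(t):
    """Relative gap between the two curves at time t."""
    a, b = logistic(t, **low), logistic(t, **high)
    return abs(a - b) / b


for t in (0, 2, 4, 6, 10, 15):
    print(f"t={t:>2}  low={logistic(t, **low):9.4f}  "
          f"high={logistic(t, **high):9.4f}  diff={rel_diff(t):.2%}")
```

Up to t = 4 the curves differ by well under 1%, so fitting an S-curve to that stretch cannot tell whether the ceiling is 100 or 1,000; by t = 10 they differ by roughly 45%.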
AI is a technology not a product
John Gruber at Daring Fireball argues that AI is a technology layer, not a product category, and that treating it as a product is why so many AI startups are struggling to differentiate. The argument maps onto how electricity or the internet were eventually absorbed into existing products rather than becoming standalone consumer goods. The piece gained significant traction on Hacker News, partly because it names something practitioners feel but rarely articulate cleanly: the current wave of AI-first products often has a distribution problem, not a technology problem.
I don't think AI will make your processes go faster
Frederik Van Brabant's post, which hit the Hacker News front page with nearly 300 comments, pushes back on a widely held assumption: that AI will necessarily accelerate existing business processes. His argument is that most process bottlenecks are not speed bottlenecks; they are coordination, approval, and ambiguity bottlenecks. Making individual steps faster does not move the constraint. The piece is written from an operations perspective rather than an AI research one, and the critique lands differently because of it.
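The constraint argument can be made concrete with a toy lead-time model. The numbers are hypothetical, not from Van Brabant's post: each step has some hands-on work and some waiting (approvals, hand-offs, clarification), and AI assistance is assumed to speed up only the work.

```python
def lead_time(steps, work_speedup=1.0):
    """Total elapsed hours for a sequential process.

    Each step is (work_hours, wait_hours). AI assistance is modelled as
    dividing work time by `work_speedup`; waiting time is unchanged.
    """
    return sum(work / work_speedup + wait for work, wait in steps)


# Hypothetical process: (hours of hands-on work, hours spent waiting)
steps = [
    (2.0, 24.0),  # draft a request, then wait for manager approval
    (4.0, 48.0),  # build, then wait for review
    (1.0, 72.0),  # ship, then wait for sign-off from another team
]
before = lead_time(steps)
after = lead_time(steps, work_speedup=2.0)
print(f"before: {before} h, after: {after} h ({1 - after / before:.1%} shorter)")
```

Doubling the speed of every piece of hands-on work cuts lead time from 151 to 147.5 hours, about 2%, because the waiting dominates.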
AI subscriptions are a ticking time bomb for enterprise
A piece on enterprise AI subscription economics argued that the current pricing structure is unstable: companies are paying per-seat and per-token fees simultaneously, with no clear model for how those costs scale as usage grows. The Hacker News thread it generated was one of the more substantive discussions of the week, with practitioners sharing real numbers on what they are actually spending versus what they budgeted. The argument is not that AI is overpriced but that the pricing architecture does not map well onto how enterprises actually consume software.
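Why a hybrid per-seat plus per-token plan is hard to budget: the seat component is fixed, while the metered component scales with usage per seat. A minimal sketch with hypothetical prices, not drawn from the article or any vendor's price list:

```python
def monthly_cost(seats, seat_price, tokens_per_seat, price_per_million_tokens):
    """Monthly spend under a hybrid plan: flat seat fees plus metered tokens.

    All inputs are hypothetical; real contracts add tiers, caps and discounts.
    """
    seat_fees = seats * seat_price
    token_fees = seats * tokens_per_seat / 1_000_000 * price_per_million_tokens
    return seat_fees + token_fees


# 500 seats at $30 per seat, tokens metered at $10 per million.
for tokens_per_seat in (1_000_000, 5_000_000, 20_000_000):
    cost = monthly_cost(500, 30, tokens_per_seat, 10)
    print(f"{tokens_per_seat:>11,} tokens/seat -> ${cost:,.0f}/month")
```

In this example the bill grows from $20,000 to $115,000 a month as per-seat usage rises twentyfold, while the seat count never changes.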
Databricks brings GPT-5.5 to enterprise agent workflows
Databricks integrated GPT-5.5 into enterprise agent workflows after the model posted a new top score on the OfficeQA Pro benchmark. The announcement is notable less for the benchmark result and more for the workflow context: OfficeQA Pro tests performance on realistic office tasks with multi-step dependencies, not isolated question-answering. Databricks using this as the selection criterion for a production deployment is a small signal that enterprise buyers are starting to weight task-oriented evals over general capability scores.
A new personal finance experience in ChatGPT
OpenAI launched a personal finance experience inside ChatGPT for Pro users in the US, letting them connect financial accounts and receive AI-generated analysis grounded in their actual account data. This is a meaningful surface expansion: connecting live financial data to an LLM that can reason about it is different from asking a chatbot general budgeting questions. The privacy and data-handling questions are real, and the product is currently US-only and Pro-only, which limits the immediate user base while OpenAI presumably watches for failure modes before a broader rollout.
OpenAI and Malta partner to bring ChatGPT Plus to all citizens
OpenAI and the government of Malta agreed to provide ChatGPT Plus access to all Maltese citizens, along with AI literacy training. The deal is structurally interesting as a government-distribution model: rather than selling through consumer channels, OpenAI is treating a national government as a wholesale buyer. Malta's small population makes this a low-risk test of that model, but the template is what matters. If the economics work and the literacy outcomes are measurable, similar deals with larger governments become easier to pitch.
Apple Silicon costs more than OpenRouter
A practical cost analysis compared running LLMs locally on Apple Silicon against routing equivalent workloads through OpenRouter. The conclusion was counterintuitive: for many use cases, local Apple Silicon inference costs more in electricity than the equivalent API spend, once idle power draw is accounted for correctly. The piece does not argue against local inference; it argues that the economics depend heavily on utilization rate and that the privacy or latency case for local models is stronger than the cost case for most users.
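The utilization effect can be sketched as a cost-per-token calculation. Every number below (throughput, power draw, electricity price) is a hypothetical placeholder rather than a figure from the analysis; the point is the shape: idle power is charged to fewer tokens as utilization falls.

```python
def local_cost_per_million_tokens(tokens_per_second, active_watts, idle_watts,
                                  utilization, price_per_kwh,
                                  hardware_cost=0.0, hardware_life_hours=1.0):
    """Electricity (plus optional hardware amortization) per million tokens.

    `utilization` is the fraction of powered-on time spent generating.
    Power drawn while idle is still paid for, so it is spread over the
    tokens actually produced.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    avg_watts = utilization * active_watts + (1.0 - utilization) * idle_watts
    energy_cost_per_hour = avg_watts / 1000.0 * price_per_kwh
    hardware_cost_per_hour = hardware_cost / hardware_life_hours
    tokens_per_hour = tokens_per_second * 3600.0 * utilization
    return (energy_cost_per_hour + hardware_cost_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical machine: 30 tok/s, 60 W while generating, 10 W idle, $0.30/kWh.
machine = dict(tokens_per_second=30, active_watts=60, idle_watts=10, price_per_kwh=0.30)
for u in (1.0, 0.25, 0.05, 0.01):
    cost = local_cost_per_million_tokens(utilization=u, **machine)
    print(f"utilization {u:>4.0%}: ${cost:.3f} per million tokens")
```

Electricity per token rises roughly 17-fold between full and 1% utilization here; passing `hardware_cost` and `hardware_life_hours` adds amortized purchase cost, spread across the same shrinking token count.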
Notes on pretraining parallelisms and failed training runs.
Dwarkesh Patel published detailed notes on pretraining parallelism strategies and the failure modes that emerge in large training runs. The content covers tensor parallelism, pipeline stages, and how silent failures during training can corrupt model weights without triggering obvious error signals. This is practitioner-level content that does not show up often in public writing; most labs keep training infrastructure knowledge internal. The notes read like a field manual for teams scaling their first serious pretraining run.
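Tensor parallelism splits one layer's weight matrix across devices. The sketch below simulates the column-parallel split in a single process, with plain Python lists standing in for devices; it illustrates the idea rather than reproducing anything from Patel's notes, and it leaves out communication, pipeline stages, and failure detection.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, n_shards):
    """Split W into `n_shards` equal-width column blocks, one per device."""
    n_cols = len(w[0])
    if n_cols % n_shards:
        raise ValueError("columns must divide evenly across shards")
    width = n_cols // n_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(n_shards)]


def column_parallel_matmul(x, w, n_shards):
    """Each device multiplies the full input X by its own column block W_i.

    Concatenating the partial outputs along the column axis (an all-gather
    in a real multi-GPU setup) reproduces X @ W exactly.
    """
    partials = [matmul(x, shard) for shard in split_columns(w, n_shards)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1],
     [0, 1, 1, 3]]
print(column_parallel_matmul(x, w, n_shards=2))  # [[1, 2, 4, 7], [3, 4, 10, 15]]
```

Because each device holds only its slice of W, per-device weight memory falls with the shard count; the cost is an all-gather (or, in the row-parallel variant, an all-reduce) on every forward pass.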
NVIDIA New AI Is An Efficiency Monster
NVIDIA released a new efficiency-focused AI model that reduces computational requirements while maintaining performance, signaling a shift in the industry toward optimization over raw scale.
GPT 5.5 Arrives, DeepSeek V4 Drops, and the Compute War Intensifies
GPT 5.5 launched alongside DeepSeek V4, intensifying the competition for frontier model capability and reshaping compute infrastructure priorities across labs.
Introducing Claude Design by Anthropic Labs
Anthropic launched Claude Design, a visual design collaboration tool built with Claude, expanding the model's footprint beyond text and code into creative workflows.
AI layoffs are here. This is how you keep your job.
Mo Bitar analyzed how AI layoffs are already underway in the industry, laying out the skill shifts needed to stay competitive as automation displaces certain engineering roles.
they're all out of data.
Data scarcity is becoming a hard constraint on LLM training; frontier labs are running out of publicly available text to scale pretraining, forcing a shift toward synthetic and higher-quality curated datasets.
OpenAI's ChatGPT 5.5 Instant: The Good, The Bad And The Insane
OpenAI released ChatGPT 5.5 Instant with faster inference and cost reductions, positioning it as a drop-in replacement for latency-sensitive production workloads.
Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)
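Energy-based transformers learn an energy function that scores how well a candidate prediction fits the input, and they make predictions by iteratively minimizing that energy. Because the number of minimization steps can vary per input, the model can allocate more computation to harder problems, which the paper presents as a route to both better efficiency and deeper reasoning.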
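Energy-based prediction works by descending an energy function rather than emitting an answer in a single pass. A toy sketch, assuming a hand-written quadratic energy whose minimum is y = 3x and a fixed starting guess of 0; in the paper the energy is learned by a transformer, and this only shows how a stopping rule on the energy turns into input-dependent compute.

```python
def energy(x, y):
    """Hypothetical stand-in for a learned energy: lowest when y == 3 * x."""
    return (y - 3.0 * x) ** 2


def energy_grad(x, y):
    """Analytic dE/dy for the toy energy above."""
    return 2.0 * (y - 3.0 * x)


def predict(x, lr=0.1, tol=1e-6, max_steps=1000):
    """Predict by gradient descent on the energy from a fixed initial guess.

    Stops as soon as the energy drops below `tol`, so the number of
    steps (the compute spent) differs from input to input.
    """
    y = 0.0
    for step in range(max_steps):
        if energy(x, y) < tol:
            return y, step
        y -= lr * energy_grad(x, y)
    return y, max_steps


for x in (0.1, 5.0):
    y, steps = predict(x)
    print(f"x={x}: y={y:.4f} after {steps} steps")
```

The input whose answer lies farther from the starting guess needs more descent steps, which is the sense in which the compute spent adapts to the input.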
Anthropic just admitted AI is bullsh*t
Anthropic acknowledged in public that current LLMs have fundamental limitations around reasoning and cannot be trusted for high-stakes decisions without human oversight, countering inflated capability narratives.
datasette-llm-limits 0.1a0
The datasette-llm-limits plugin, released as an alpha (0.1a0), enables per-user spending caps on LLM usage within Datasette, addressing cost control for teams running shared AI-powered data exploration.
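The check-then-record pattern a per-user cap needs can be sketched briefly. This is a hypothetical design, not the datasette-llm-limits API (the class and method names are invented for illustration). Amounts are kept in integer cents to avoid floating-point drift, and a real deployment would also need the check and the update to be atomic across concurrent requests.

```python
from collections import defaultdict


class SpendLimiter:
    """Per-user spending cap for LLM calls (hypothetical design)."""

    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = defaultdict(int)  # user -> cents spent this period

    def allow(self, user: str, estimated_cents: int) -> bool:
        """Return True if the estimated cost fits in the user's remaining budget."""
        return self.spent_cents[user] + estimated_cents <= self.cap_cents

    def record(self, user: str, actual_cents: int) -> None:
        """Charge the actual cost once the model call has completed."""
        self.spent_cents[user] += actual_cents

    def remaining(self, user: str) -> int:
        """Budget left for this user, never negative."""
        return max(self.cap_cents - self.spent_cents[user], 0)


limiter = SpendLimiter(cap_cents=100)  # $1.00 per user
for _ in range(3):
    if limiter.allow("alice", 40):
        limiter.record("alice", 40)
print(limiter.remaining("alice"))  # 20
```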
software BitLocker backdoor, NPM hijack, and CSS reckoning
Security researcher says Microsoft built a Bitlocker backdoor, releases exploit
A security researcher published a full exploit for a BitLocker backdoor they claim Microsoft built deliberately, and the Hacker News discussion that followed was heated. The technical claim is specific: a recovery mechanism in BitLocker can be exploited to decrypt drives without the user's key, and the researcher released working code. Microsoft has not confirmed the characterization of the mechanism as intentional. The distinction between a backdoor and a poorly secured recovery channel matters legally and politically, but the practical security implication is the same either way.
A single PR just hijacked the NPM registry...
A single pull request managed to poison the NPM registry, affecting a wide range of downstream packages before it was caught. Fireship covered the mechanics: the attack exploited a gap in how NPM handles maintainer permissions after account transfers, allowing an attacker to publish malicious versions of widely used packages. Supply chain attacks on package registries have been a known vector for years, and this incident reinforces that the trust model for public package registries has not been solved. OpenAI separately published its own post-mortem on the TanStack supply chain attack this week.
Our response to the TanStack npm supply chain attack
OpenAI published a detailed account of its response to the TanStack npm supply chain attack, in which signing certificates for OpenAI macOS applications were compromised. The post describes what was taken, what protections were in place, and why macOS users need to update their OpenAI apps by June 12, 2026. The transparency is notable: where most software companies describe security incidents in minimal terms, OpenAI's disclosure includes the specific certificate chain that was affected and the timeline of detection.
Moving away from Tailwind, and learning to structure my CSS
Julia Evans published a detailed account of migrating away from Tailwind CSS and building her own structured CSS system from scratch, with 644 upvotes and 361 comments on Hacker News. The piece is useful on two levels: the technical argument about Tailwind's tradeoffs, and the meta-point that getting better at CSS is a more durable investment than reaching for utility frameworks. Simon Willison quoted Evans's framing directly: the right response to 'CSS is hard' is to get better at CSS, not to abstract it away.
Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse
Cloudflare's engineering team traced a sudden billing pipeline slowdown to hidden lock contention in ClickHouse's query planner, triggered by a partitioning change to a petabyte-scale cluster. The post is a good case study in how large-scale infrastructure problems hide: standard metrics showed no obvious errors, and the root cause required tracing query planner internals rather than application-level instrumentation. The fix involved rewriting how partition metadata is locked during query planning, a change that had no obvious user-visible surface but cut job completion times substantially.
The Pulse: Forward deployed engineering heats up again
The Pragmatic Engineer flags a renewed push toward forward-deployed engineering, where technical staff embed with customers rather than sitting behind a product organization. The piece connects this to AI tool adoption and to job market data showing productivity gains and headcount reductions occurring in the same quarter. The forward-deployed model is framed as a way for engineers to stay close to real problems as the gap between senior and junior productivity widens with AI assistance.
GDS weighs in on the NHS's decision to retreat from Open Source
The UK's Government Digital Service weighed in on the NHS's decision to close its open-source repositories after security vulnerabilities were disclosed through responsible channels. GDS's position, summarized by Simon Willison, is that closing repositories in response to reported vulnerabilities inverts the logic of responsible disclosure: it punishes the researchers who found problems and removes the transparency that makes open-source security auditing possible. The NHS decision has drawn sustained criticism from multiple directions, and GDS adding its voice gives the critique institutional weight.
pharma FDA loses two leaders; Ebola declared a PHEIC
Makary resigns; Biogen reports Alzheimer's data; and more
The week's most significant pharma story is the double leadership exit at the FDA: Commissioner Marty Makary resigned, and days later Tracy Beth Høeg, the acting director of the Center for Drug Evaluation and Research, was also out. CDER is the office that reviews and approves most new drugs in the US. Losing both the commissioner and the drug center head within days of each other creates genuine uncertainty for companies with near-term FDA decisions on the calendar. Endpoints News covered both exits together with what comes next.
CDER chief Høeg fired from FDA as other top roles turn over
Tracy Beth Høeg's departure from the FDA drug center followed Makary's resignation by days. Høeg had been acting director of CDER, the division responsible for reviewing and approving most pharmaceuticals. The timing and manner of her exit, described by sources as a firing rather than a voluntary departure, suggests the leadership transition is not orderly. The article from Endpoints News is the most detailed account of what happened and what roles are now vacant.
Opinion: Marty Makary misunderstood something fundamental about the FDA
Joshua Sharfstein, a former FDA deputy commissioner, published a critical assessment of Makary's tenure in STAT News, arguing that Makary misunderstood something fundamental about how administrative power produces lasting change at a regulatory agency. The piece does not focus on individual decisions but on Makary's theory of how the FDA works, which Sharfstein argues was shaped more by outside commentary than by operational reality. It is worth reading as a diagnostic piece on what kinds of external perspectives translate well into regulatory leadership.
WHO declares Ebola outbreak an international public health emergency
The WHO declared the Bundibugyo Ebola outbreak in Congo's Ituri province a public health emergency of international concern (PHEIC), the organization's highest alert level. The outbreak had recorded 246 suspected cases and 65 deaths before the declaration. Bundibugyo virus is a different Ebola species from the Zaire ebolavirus that caused the 2013 to 2016 West Africa epidemic; existing vaccines have lower efficacy against it, which is part of what elevated the WHO's concern. The declaration triggers international resource mobilization and travel guidance review.
Biopharma M&A maintains strength even as large deals wane
Biopharma M&A activity has stayed elevated even as the frequency of very large deals has declined. Endpoints News reports that mid-cap American drugmakers, family-owned European groups, and foundation-governed players are increasingly active buyers, filling the gap left by fewer megadeals from the largest companies. The structural driver is clear: a wave of major patent expiries is forcing companies at every scale to replace revenue through acquisitions rather than internal pipeline alone. The piece is useful for understanding why deal flow looks healthy in the aggregate even as headline-grabbing transactions are rarer.
#ASGCT26: A Zillow-like marketplace for abandoned gene therapies goes live
Two nonprofits launched a searchable marketplace for abandoned cell and gene therapies at the American Society of Gene and Cell Therapy conference. The platform is described as a Zillow-like listing service where developers who have shelved programs can connect with organizations willing to continue them. The orphan drug space has a persistent problem with promising early programs dying for commercial rather than scientific reasons; a structured clearinghouse for those assets addresses a real gap, though the matching problem between sellers and capable developers remains hard.
STAT+: Sen. Bill Cassidy loses primary as race heads to run-off in win for Trump
Senator Bill Cassidy, who chaired the Senate HELP Committee and was the most prominent Republican voice on health policy, lost his Louisiana primary after sustained opposition from Trump and MAHA-aligned groups. Cassidy had broken with Trump on the ACA and other health priorities. The loss removes a knowledgeable institutionalist from the committee and shifts the balance of health-related Senate oversight in ways that will affect FDA reauthorization, Medicaid policy, and drug pricing legislation over the next two years.
STAT+: The hantavirus outbreak is prompting Covid flashbacks; including the conspiracies
A rare hantavirus outbreak aboard a cruise ship has sparked conspiracy theories and pandemic-era anxiety among the public, complicating public health communication as CDC works to contain spread.
WHO declares Ebola outbreak an international public health emergency
The WHO declared an Ebola outbreak in the Democratic Republic of Congo a public health emergency of international concern after cases spread to Uganda, triggering coordinated response protocols.
STAT+: Takeda will pay $13.6 million to settle allegations it paid kickbacks to doctors
Takeda Pharmaceuticals paid $13.6 million to settle allegations it paid kickbacks to doctors to prescribe an antidepressant, adding to mounting compliance penalties across the industry.
Amgen expands crackdown on what it says is misuse of 340B program
Amgen expanded its data requirements for pharmacies dispensing its drugs under the federal 340B discount program. The company says the move addresses fraud, but it has raised concerns about restricted access to the program.
Candel reports prostate cancer drug's long-term data ahead of FDA filing
Candel Therapeutics released 20-month follow-up data for its prostate cancer gene therapy ahead of FDA filing, showing durable responses that support its 2026 submission timeline.
Andrea Pfeifer ends 23-year reign at AC Immune; Acadia's top R&D exec announces retirement
AC Immune announced CEO Andrea Pfeifer will retire after 23 years, and Acadia's top R&D executive also plans to step down, signaling leadership transitions in the biotech space.
healthtech Hantavirus, Ebola, and mammogram uncertainty
STAT+: The hantavirus outbreak is prompting Covid flashbacks; including the conspiracies
The hantavirus outbreak on a cruise ship has generated a secondary information problem: conspiracy theories following a pattern established during Covid-19. STAT News reports that the outbreak is drawing social media claims that mirror early pandemic misinformation, including speculation about origins and doubts about case counts. This is worth tracking separately from the clinical story because the information environment around an outbreak affects public cooperation with containment measures, and the pattern of rapid conspiracy theory propagation has become a predictable feature of novel infectious disease events.
Assessment of the Hantavirus with Prof Donald Milton
Eric Topol sat down with Professor Donald Milton to assess the hantavirus outbreak, covering transmission dynamics, severity profile, and what is and is not known about the specific strain involved. Milton's virology expertise makes this a more technically grounded conversation than most media coverage. The key point from the discussion is that hantavirus pulmonary syndrome has a high case fatality rate in historical outbreaks but that the current cluster's epidemiology is still being characterized, and drawing conclusions from early case counts carries significant uncertainty.
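The uncertainty in early case counts can be quantified. A minimal sketch using the Wilson score interval for a proportion, with hypothetical counts rather than figures from the current cluster; it captures sampling noise only, not the bias from cases whose outcomes are still pending.

```python
import math


def wilson_interval(deaths: int, cases: int, z: float = 1.96):
    """Wilson score interval (95% by default) for a case fatality proportion."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = deaths / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return centre - half, centre + half


# Hypothetical counts with the same point estimate but different sample sizes.
for deaths, cases in ((3, 8), (30, 80)):
    low, high = wilson_interval(deaths, cases)
    print(f"{deaths}/{cases}: point estimate {deaths / cases:.1%}, "
          f"95% CI {low:.0%} to {high:.0%}")
```

Both samples give the same 37.5% point estimate, but the interval for 3 of 8 spans roughly 14% to 69%, while 30 of 80 narrows it to roughly 28% to 48%.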
When should you get a mammogram? Conflicting advice makes it hard to know
STAT News published a patient-oriented explainer on mammography screening, addressing the conflict between guidelines from the US Preventive Services Task Force, the American Cancer Society, and individual oncologists. The conflict is real and substantive: the disagreements are not just about communication preferences but about how to weight overdiagnosis risk against early detection benefit at different ages and risk profiles. The piece is useful precisely because it does not resolve the conflict but explains what drives it.
Supreme Court preserves mail access for abortion pill
The Supreme Court ruled to maintain mifepristone's current distribution protocols, including mail access, while underlying litigation proceeds. This is a continuation ruling rather than a final disposition: the legal challenge to mifepristone's approval pathway is still active. Mifepristone now accounts for the majority of abortions in the US, and any restriction on mail access would have immediate effects on access in states where clinic availability is limited. The ruling preserves the status quo but does not foreclose further legal action.
Pancreatic cancer just met its match
Works in Progress published a substantive piece on recent advances in pancreatic cancer treatment, a disease that has historically had a five-year survival rate below 15%. The piece covers early detection biomarkers, combination chemotherapy regimens, and targeted therapy development. The framing is cautiously optimistic: survival rates are improving, though from a very low baseline, and the improvements are concentrated in patients diagnosed early when surgical resection is still possible. The gap between early-stage and late-stage outcomes remains wide.
economy Trump-China thaw, US-Europe wealth gap, prediction markets
What Trump's China Visit Actually Achieved.
Patrick Boyle analyzes what Trump's China visit actually produced: a temporary tariff pause and a framework for further negotiations, but no structural resolution to the trade and technology disputes that have been building for years. Boyle's read is that the visit was meaningful as a signal that both sides want to avoid further escalation but that the underlying decoupling pressures in semiconductors, data, and strategic goods remain intact. The pause gives companies some planning room but does not restore the pre-2018 trade relationship.
Trump actually started to decouple America from China
Noah Smith argues that despite a significant tariff pause, the structural decoupling between the US and China is continuing at the level of supply chains, technology access, and financial flows. The piece notes that companies have been quietly diversifying manufacturing away from China regardless of headline trade policy, and that the semiconductor export restrictions have created a durable technology bifurcation that a tariff deal does not reverse. The decoupling is slow, Smith writes, but it is proceeding.
Yes, Europeans are poorer than Americans
Noah Smith takes on the question of whether Europeans are poorer than Americans in material terms, and his answer is a qualified yes. The piece uses purchasing power-adjusted consumption data rather than GDP per capita, and it addresses the standard counterarguments about inequality, healthcare costs, and leisure time. Smith's argument is not that European life is worse but that comfortable, leisurely stagnation is not viable as a long-term model when the world is changing quickly enough that the cost of not adapting compounds over time.
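The compounding point is simple arithmetic. Assuming two economies start at the same level and grow at constant hypothetical rates of 1% and 2% a year (illustrative numbers, not figures from Smith's piece), the slower one's relative size shrinks geometrically:

```python
def relative_level(growth_slow, growth_fast, years):
    """Size of the slower economy relative to the faster one after `years`,
    assuming equal starting levels and constant annual growth rates."""
    return ((1 + growth_slow) / (1 + growth_fast)) ** years


for years in (10, 25, 50):
    print(f"after {years} years: {relative_level(0.01, 0.02, years):.1%}")
```

A one-point growth gap leaves the slower economy at roughly 91% of the faster one's size after 10 years and about 61% after 50.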
Prediction Markets Are a Scam With a Chart
Patrick Boyle examines prediction markets and argues that many of them function more as sentiment gauges than as genuine probability-weighted forecasting mechanisms. His case rests on several episodes where prediction market prices diverged sharply from subsequent outcomes in ways that better-calibrated forecasters would have avoided. The piece is not a blanket dismissal; it distinguishes between thin markets on novel events and thicker markets on recurring political outcomes where liquidity and arbitrage improve calibration. The title is deliberately provocative but the argument is more nuanced.
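Calibration, the property Boyle's comparison with better-calibrated forecasters turns on, is usually checked with a Brier score and a binned calibration table. A minimal sketch with hypothetical prices and outcomes, not data from the video:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes.

    Lower is better; forecasting 0.5 for everything scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=5):
    """Bin forecasts by probability; compare mean forecast with observed frequency.

    For a well-calibrated forecaster the two columns roughly match in every bin.
    """
    bins = {}
    for p, o in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins.setdefault(index, []).append((p, o))
    rows = []
    for index in sorted(bins):
        probs, results = zip(*bins[index])
        rows.append((sum(probs) / len(probs), sum(results) / len(results), len(probs)))
    return rows


# Hypothetical market prices (read as probabilities) and what actually happened.
prices = [0.9, 0.85, 0.8, 0.7, 0.3, 0.2, 0.15, 0.1]
happened = [1, 1, 0, 1, 0, 1, 0, 0]
print(f"Brier score: {brier_score(prices, happened):.3f}")
for mean_p, freq, n in calibration_table(prices, happened):
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

Thin markets show up in such a table as bins with few events and large gaps between the two columns, which is one way to make the thin-versus-thick distinction measurable.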
The Future of IR
Marc Rubinstein's Net Interest piece examines how investor relations is changing as public market participation shifts and private capital markets grow relative to public ones. The argument is that IR professionals built their practices around a world of institutional analyst coverage and quarterly earnings calls, and that world is shrinking. Retail investor communities, private market intermediaries, and AI-generated financial summaries are reshaping who the audience for IR actually is and what information they want.