Sylvester's Frontier

An Architect’s Response to Catastrophic AI Risk

The default trajectory of AI leads to catastrophe. Here is an engineering blueprint for containment, verifiable safety, and avoiding the Great Filter.

Feb 01, 2026

The silence of the universe is the most profound data point we possess. For decades, astronomers have scanned the cosmos for signs of intelligent life, yet we still have no confirmed technosignatures. The tension between the apparent likelihood of extraterrestrial civilizations and the absence of any evidence for them is known as the Fermi Paradox. One proposed solution is the Great Filter: the hypothesis that at some point in its development, any advanced civilization encounters a barrier that is extremely hard to cross. In its grimmest form, the hypothesis holds that civilizations destroy themselves before they can expand into the stars.

We are currently approaching a Great Filter-class challenge. The rapid acceleration of Artificial Intelligence, specifically the trajectory toward systems that surpass human intelligence in all strategically relevant domains, presents a unique class of existential risk. Recent scenario-based governance work argues that the default trajectory of advanced AI development could plausibly produce catastrophic outcomes, ranging from long-term authoritarian lock-in to human extinction.

These warnings are often met with polarized responses. One is fatalism, a belief that the technology is unstoppable and the outcome inevitable. The other is denial, a dismissal of these risks as science fiction. Neither response helps with engineering and governance decisions. As an architect of autonomous systems for the unforgiving environment of space, I view this challenge through a different lens. Catastrophic AI failure is best treated as a systems engineering problem, with philosophical questions informing the goals, constraints, and what we choose to protect.

The systems we build for space exploration are designed to operate in environments where failure means the total loss of the mission. We rely on rigorous architecture, verification where feasible, and layered fail-safe controls. We must apply this same engineering discipline to the development of advanced AI. A core concern about Artificial Superintelligence (ASI) is competence paired with misalignment: a system that is powerful, effective, and optimising toward objectives that fail to respect the constraints human survival requires. This article outlines an architectural response to this risk, proposing a framework for containment and control that moves beyond policy debates and into the physics of software and hardware assurance.
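The layered fail-safe pattern described above can be sketched in miniature. This is an illustrative toy, not flight software: the action fields, thresholds, and check functions are invented for the example. The key property is that the checks are independent and the gate fails closed, so any single veto blocks execution.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str
    power_draw_watts: float
    irreversible: bool


# Each interlock is an independent predicate; none relies on another's output.
def power_budget_check(action: Action) -> bool:
    return action.power_draw_watts <= 50.0


def reversibility_check(action: Action) -> bool:
    return not action.irreversible


def permitted(action: Action, checks: List[Callable[[Action], bool]]) -> bool:
    # Fail-closed: the action proceeds only if every interlock approves.
    return all(check(action) for check in checks)


checks = [power_budget_check, reversibility_check]
print(permitted(Action("thruster_burn", 30.0, False), checks))   # True
print(permitted(Action("vent_propellant", 10.0, True), checks))  # False
```

The design choice worth noting is the default: when any check fails or errors, the system does nothing, because in safety-critical domains inaction is usually the recoverable state.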


1. The Engineering Void in the Default Trajectory

The current paradigm of AI development is driven by scaling laws. We have found that adding more compute and more data to large neural networks consistently yields higher performance. This empirical success has created a race to build larger, more capable models. Companies are explicitly aiming to build systems that exceed human performance at most cognitive tasks. Some anticipate ASI within years, others within decades; uncertainty is large, and governance and assurance take time to build.
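The scaling-law relationship can be illustrated with a toy power-law fit: loss falls smoothly as compute grows, with diminishing returns. The constants `a` and `b` below are invented for illustration and are not fitted to any real model family.

```python
# Hypothetical scaling law: loss L(C) = a * C^(-b) for training compute C.
# a and b are illustrative placeholders, not empirical values.
def predicted_loss(compute_flops: float, a: float = 2.5, b: float = 0.05) -> float:
    return a * compute_flops ** (-b)


for compute in (1e21, 1e23, 1e25):
    print(f"{compute:.0e} FLOPs -> predicted loss {predicted_loss(compute):.3f}")
```

The curve never predicts its own breakdown: nothing in the fit says where qualitatively new capabilities appear, which is part of why scaling-driven development outpaces our ability to anticipate behaviour.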

The danger lies in the methodology. We are building these systems using techniques that are fundamentally opaque. We train them through trial and error, using Reinforcement Learning from Human Feedback (RLHF) to shape their behaviour. We are optimising behaviour through empirical training rather than specifying behaviour in a way we can fully audit and prove. We do not understand the internal representations these models form. We cannot formally prove their properties. We cannot reliably predict their behaviour in novel, adversarial, or high-stakes situations.
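The "optimise rather than specify" point can be made concrete with the pairwise preference loss commonly used to train reward models in RLHF-style pipelines (a Bradley-Terry formulation). The sketch below is simplified to scalar rewards; note that nothing in the objective records *why* one output was preferred, only *that* it was.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): small when the model already
    # ranks the human-preferred output higher, large when it disagrees.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


print(preference_loss(2.0, 0.5))  # low loss: model agrees with the label
print(preference_loss(0.5, 2.0))  # high loss: model disagrees
```

The specification lives entirely in the statistics of human comparisons. There is no auditable artefact, no requirement document or formal property, that one could verify the trained model against.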

In the context of safety-critical engineering, this is an unacceptable state of affairs. If we built a nuclear reactor this way, it would struggle to pass a modern licensing and assurance process. We are building engines of immense cognitive power without a corresponding theory of control. This creates an engineering void.

The Risk of Loss of Control
The primary catastrophic risk is the loss of control. An ASI may optimise hard for its objectives, including by seeking resources and influence in ways that are difficult for humans to anticipate or constrain. If those goals are even slightly misaligned with human values, the consequences could be terminal. This is known as the alignment problem. A super-capable system optimised for a narrow metric could propose extreme interventions that satisfy the metric while violating human constraints, especially if the objective is underspecified. Without a verifiable safety architecture, we have limited ability to prevent instrumental strategies such as resource acquisition or constraint evasion, which can emerge even when the stated objective looks benign.

The Risk of Misuse and Proliferation
The second major risk is misuse. Powerful AI systems lower the barrier to entry for creating weapons of mass destruction, including biological agents and cyberweapons. If the weights of a frontier model are stolen or leaked, they can be deployed by malicious actors without the safety guardrails intended by the developers. The architecture of our current systems is brittle. Many safety measures operate at the interface layer (for example, prompt and policy enforcement) and can be bypassed under determined adversarial pressure. We need an architecture that secures the system at a fundamental level, preventing misuse even if the system falls into the wrong hands.

© 2026 Sylvester Kaczmarek