Radically Accelerate Frontier Model Inference

At Fractile, we are revolutionising compute to build the engine that can power the next generation of AI.

Maximum throughput per Watt (and per dollar)
>10x throughput efficiency (tokens/s/MW)*
The world will soon be bottlenecked on inference throughput. Available data centre power cannot scale as fast as our aspirations. Fractile’s chips are optimised for performance-per-Watt, meaning every kWh of energy generates 10x the tokens of a frontier GPU system, transforming the economics of AI inference.
* Projected performance per megawatt on reference models versus GB200 NVL72.
World-leading inference speed without sacrifices
25x faster*
Every other hardware platform serving AI models trades off throughput against latency. Fractile doesn’t. A Fractile system can serve hundreds or thousands of concurrent user queries while delivering generation latencies (time to first token, TTFT, and time between output tokens, TBOT) up to two orders of magnitude lower. This holds across frontier models from 7B to 7T+ parameters, and from dense models to highly sparse MoEs.
* Projected performance on reference models versus GB200 NVL72.
The advantages
In-Memory Compute Technology
Turning GPUs inside-out: the world’s fastest memory system
Today’s frontier AI models have been created through the relentless pursuit of ‘scaling laws’. Demand for inference has scaled just as relentlessly, outstripping the hardware capacity to serve it. Today’s AI chips face a deep mismatch between compute and memory bandwidth, one that hamstrings their performance and efficiency.
Over the past 25 years, peak compute (measured in FLOPS) has scaled 1,000,000x while bandwidth to DRAM (including HBM) has increased just 40x, making memory bandwidth the key bottleneck for GPUs and accelerators. The mitigation has been to construct workloads that re-use each memory access across hundreds or thousands of computations. AI training looks like this: every time we load a model’s weights from DRAM, we run thousands of tokens through them, balancing compute and memory access. Inference presents a very different challenge: suddenly, we care as much about latency (the time to return an answer to a single user, for instance) as throughput (the aggregate tokens crunched by our processors).
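To make the compute-memory mismatch concrete, here is a minimal back-of-envelope sketch of arithmetic intensity, the FLOPs performed per byte fetched from DRAM. All figures are illustrative assumptions, not Fractile or GPU specifications:

```python
# Illustrative sketch: how batch size drives the compute/memory balance.
# All constants are assumptions for illustration only.

WEIGHT_BYTES = 2      # bytes per parameter (fp16)
PARAMS = 70e9         # a hypothetical 70B-parameter dense model
PEAK_FLOPS = 1e15     # assumed ~1 PFLOP/s of dense compute
DRAM_BW = 8e12        # assumed ~8 TB/s of DRAM/HBM bandwidth

def arithmetic_intensity(batch_tokens: int) -> float:
    """FLOPs per byte of weights read from DRAM.

    Each token costs ~2 FLOPs per parameter (multiply + add), while the
    weights are read once per batch, so intensity grows with batch size.
    """
    flops = 2 * PARAMS * batch_tokens
    bytes_read = WEIGHT_BYTES * PARAMS
    return flops / bytes_read

# The 'ridge point': the intensity needed to keep compute fully busy.
ridge = PEAK_FLOPS / DRAM_BW  # = 125 FLOP/B under these assumptions

for batch in (1, 32, 1024):
    ai = arithmetic_intensity(batch)
    bound = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"batch={batch:5d}  intensity={ai:7.1f} FLOP/B  ({bound})")
```

Under these assumptions, a large training-style batch sails past the ridge point, while a latency-sensitive batch of one is starved by memory bandwidth.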
What was a growing computer architecture issue has now become a crisis.
Fractile is building the first of a new generation of processors, where memory and compute are physically interleaved and innately balanced, driving faster, more efficient compute without trade-offs.
Frontier AI inference: up and to the right
The number of tokens we are processing with frontier AI models is growing by more than 10x every year. The compounding effects of increasingly widespread applications, the advent of reasoning models that generate many more tokens in pursuit of the best-quality answers, and AI agents that autonomously crunch through thousands of tokens while carrying out tasks mean that this exponential is set to continue.
Frontier model inference has two critical requirements that existing hardware cannot satisfy simultaneously: low latency and high throughput. To meet even modest latency objectives (like generating enough tokens per second per user to provide search results in a couple of seconds), today’s hardware drops to below 1% of its peak efficiency. Again, the compute-memory mismatch is largely responsible for this.
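As a hedged illustration of that sub-1% figure, consider a single user decoding on a conventional weight-streaming architecture, where every generated token requires reading the full set of model weights from DRAM. All figures below are assumptions for illustration:

```python
# Back-of-envelope sketch of the latency/throughput trade-off on a
# conventional weight-streaming architecture. Figures are assumed.

PARAMS = 70e9        # hypothetical 70B-parameter dense model
WEIGHT_BYTES = 2     # fp16
DRAM_BW = 8e12       # assumed ~8 TB/s of memory bandwidth
PEAK_FLOPS = 1e15    # assumed ~1 PFLOP/s of dense compute

bytes_per_token = PARAMS * WEIGHT_BYTES  # weights streamed per decode step
flops_per_token = 2 * PARAMS             # multiply + add per parameter

# With a single user, every token waits on a full weight read from DRAM.
tokens_per_s_single_user = DRAM_BW / bytes_per_token
utilisation = tokens_per_s_single_user * flops_per_token / PEAK_FLOPS

print(f"single-user decode: {tokens_per_s_single_user:.0f} tok/s")
print(f"compute utilisation: {utilisation:.1%}")  # ~0.8% of peak here
```

Under these assumed figures the chip sustains well under 1% of its peak compute; batching many users together restores throughput but pushes per-user latency out, which is exactly the trade-off described above.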
Fractile’s hardware delivers both, simultaneously — serving thousands of tokens per second to thousands of concurrent users, for many tens of millions of tokens per second per rack of compute (on a frontier MoE reasoning model).
Scalable, composable, compatible
We’re proud to be building a new paradigm for AI hardware that will transform the future of AI scaling. However, ‘better’ AI hardware is not better if it cannot adapt and grow as new models and architectures are developed and introduced. Algorithmic advances in AI are — for now — outstripping the pace of hardware improvement, delivering more than 10x performance-per-dollar improvements every year.
For Fractile’s advantages to be truly multiplicative with those of our customers, we have invested heavily in a hardware architecture and software stack that let us compose hardware from a single chip up to data-centre-scale orchestration, and that provide a turnkey drop-in for existing model inference stacks on competitor hardware.
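As an illustration of what a turnkey drop-in can look like in practice, a client already written against an OpenAI-compatible inference API would only need its base URL repointed. The endpoint URL and model name below are hypothetical placeholders, not a published Fractile API:

```python
# Hypothetical sketch: repointing an OpenAI-compatible client at a
# different inference backend. URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="example-frontier-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```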
Team & Jobs
Join us and build the future of AI
Fractile’s hardware performance is only possible because of the full-stack approach we take to building the next class of processors for AI acceleration. Our team spans transistor-level circuit design up to cloud inference server logic, and everything in between.
Fractile is home to some of the world’s most talented, driven and energetic technologists and thinkers, who are inspired to take on some of the world’s most impactful technical challenges in a deeply collaborative environment.
If you are interested in being a part of the Fractile mission, then we would love to hear from you.
News
How ‘inference’ is driving competition to Nvidia’s AI chip dominance
Rivals focus their efforts on how AI is deployed in a bid to disrupt the world’s most valuable semiconductor company
Startup with ‘radical’ concept for AI chips emerges from stealth with $15 million to try to challenge Nvidia
A British startup hoping to challenge Nvidia’s dominance in chips for AI applications with a radical new hardware design has just emerged from operating in stealth with $15 million in seed funding to pursue its idea.