Radically Accelerate Frontier Model Inference
At Fractile, we are revolutionising compute to build the engine that can power the next generation of AI.
Maximum throughput per Watt (and per dollar)
>10x throughput efficiency (tokens/s/MW)*
The world will soon be bottlenecked on inference throughput: available data centre power cannot scale as fast as demand for tokens. Fractile’s chips are optimised for performance-per-Watt, so every kWh of energy generates 10x the tokens of a frontier GPU system, transforming the economics of AI inference.
* Projected performance per megawatt on reference models versus Vera Rubin NVL144.
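The headline metric above, tokens/s/MW, can be made concrete with a small sketch. All figures below are illustrative placeholders of my own, not Fractile's measured numbers:

```python
# Throughput efficiency: tokens generated per second, per megawatt of power.
# All numbers here are made-up placeholders for illustration only.

def tokens_per_second_per_mw(tokens_per_second: float, power_watts: float) -> float:
    """Normalise raw throughput by power draw in megawatts."""
    return tokens_per_second / (power_watts / 1e6)

# A 10x efficiency advantage means 10x the tokens from the same power budget.
baseline = tokens_per_second_per_mw(tokens_per_second=1_000_000, power_watts=1_000_000)
improved = tokens_per_second_per_mw(tokens_per_second=10_000_000, power_watts=1_000_000)
print(improved / baseline)  # 10.0
```

Because the metric is normalised per megawatt, a fixed data centre power envelope translates directly into a fixed token budget, which is why efficiency rather than raw speed is the binding constraint.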
World-leading inference speed without sacrifices
>10x faster*
Every other hardware platform serving AI models trades off throughput against latency. Fractile doesn’t. A Fractile system can serve hundreds or thousands of concurrent user queries while delivering generation latencies (time to first token, TTFT; time between output tokens, TBOT) up to two orders of magnitude lower. This holds across frontier models from 7B to 7T+ parameters, and from dense models to highly sparse mixture-of-experts (MoE) architectures.
* Projected performance on reference models versus Vera Rubin NVL144.
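The two latency metrics named above can be computed from per-token timestamps. This is a minimal sketch with illustrative timings, not measurements from any system:

```python
# TTFT and TBOT computed from per-token arrival times (illustrative values).

def ttft(request_time: float, first_token_time: float) -> float:
    """Time To First Token: delay before the first output token appears."""
    return first_token_time - request_time

def tbot(token_times: list[float]) -> float:
    """Time Between Output Tokens: mean gap between consecutive tokens."""
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return sum(gaps) / len(gaps)

times = [0.30, 0.35, 0.40, 0.45]  # seconds at which each token arrived
print(ttft(0.0, times[0]))   # time until generation visibly starts
print(round(tbot(times), 3)) # steady-state gap between tokens
```

TTFT governs how responsive a system feels on the first token; TBOT governs how fast the rest of the response streams, so both must be low for interactive use.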
The advantages
In-Memory Compute Technology
Exponential demand
The number of tokens we are processing with frontier AI models is growing by more than 10x every year.
This exponential is set to continue, driven above all by the trend that models which reason over more tokens produce far smarter outputs. Players that fail to scale their token processing will be out of the race.
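The scale implied by that growth rate compounds quickly. A piece of illustrative arithmetic:

```python
# Compounding a 10x-per-year growth rate (illustrative arithmetic only):
# sustained for three years, token volumes grow a thousandfold.
growth_per_year = 10
years = 3
print(growth_per_year ** years)  # 1000
```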
Cheaper and faster
Frontier model inference has two critical requirements that existing hardware cannot satisfy simultaneously: low latency and high throughput.
Fractile is building the first of a new generation of processors, in which memory and compute are physically interleaved to deliver both simultaneously: serving thousands of tokens per second to thousands of concurrent users, at a power budget and scale that no other system can match.
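One way to see why conventional hardware must trade latency against throughput is a deliberately simplified model of my own (not Fractile's architecture, and the numbers are assumptions): during decode on memory-bandwidth-bound hardware, each step streams the model weights from memory, so per-token latency is pinned by bandwidth while batching raises only throughput.

```python
# Toy model of bandwidth-bound decode (my simplification; figures assumed).
# Each decode step reads all weights once, and the whole batch shares that
# read, so batching multiplies throughput but cannot reduce per-token latency.

WEIGHTS_BYTES = 140e9  # e.g. a 70B-parameter model at 2 bytes per parameter
MEM_BW = 3.35e12       # assumed memory bandwidth in bytes/s

step_time = WEIGHTS_BYTES / MEM_BW  # seconds per decode step = TBOT floor
for batch in (1, 64):
    throughput = batch / step_time  # tokens/s summed across all users
    print(f"batch={batch}: TBOT={step_time * 1e3:.1f} ms, {throughput:.0f} tok/s")
```

In this model the only way to improve latency and throughput together is to shrink `step_time`, i.e. to raise effective memory bandwidth, which is the lever that interleaving compute with memory targets.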
The fast frontier
Serving tokens this much cheaper and faster will not only optimise existing deployments, it will create entirely new possibilities. Massively longer context windows will enable new workloads, with models capable of complex autonomous tasks such as research and software development, compressing days of human work into minutes.
Team & Jobs
Join us and build the future of AI
Fractile’s hardware performance is only possible because of the full-stack approach we take to building the next class of processors for AI acceleration. Our team spans transistor-level circuit design up to cloud inference server logic, and everything in between.
Fractile is home to some of the world’s most talented, driven and energetic technologists and thinkers, inspired to take on the most impactful technical challenges of our time in a deeply collaborative environment.
If you are interested in being a part of the Fractile mission, then we would love to hear from you.
News
How ‘inference’ is driving competition to Nvidia’s AI chip dominance (Financial Times)
Rivals focus efforts on how AI is deployed, in their efforts to disrupt the world’s most valuable semiconductor company
Startup with ‘radical’ concept for AI chips emerges from stealth with $15 million to try to challenge Nvidia (Fortune)
Can this tiny U.K. AI chip company best Nvidia?

