A Finnish startup called Flow Computing is making a bold claim rarely heard in silicon engineering circles: by adding its proprietary companion chip, any CPU can instantly double its performance, and, with software tweaks, boost it by as much as 100 times.
If this works, it could help the industry keep up with the insatiable computing demands of AI makers.
Flow is a spin-out of VTT, Finland's state-run research organisation, which acts as something of a national laboratory: the chip technology Flow is commercialising, branded the Parallel Processing Unit, grew out of research carried out there (VTT is an investor, but the IP is owned by Flow).
As Flow is quick to admit, this claim is laughable on its face: you can't magically squeeze extra performance out of a CPU, regardless of architecture or code base — if that were the case, Intel, AMD, etc. would have done it years ago.
But Flow has been working on something that is theoretically possible; it's just that no one has managed to pull it off.
Central processing units have come a long way since the early days of vacuum tubes and punch cards, but some fundamental aspects remain the same. Their main limitation is that they are serial, not parallel, processors: they can only perform one operation at a time. Of course, they switch that operation billions of times a second across multiple cores and pathways, but these are all ways of coping with the single-lane nature of the CPU. (A GPU, by contrast, runs many related calculations at once, but it is specialized for certain kinds of operations.)
“The CPU is the weakest link in computing,” said Flow co-founder and CEO Timo Valtonen. “It's not doing its job. We need to change that.”
CPUs have become extremely fast, but even with nanosecond-level responsiveness there is a fundamental limitation: one task must be completed before the next can begin, which means an enormous amount of waste in how instructions are executed (I'm simplifying here, as I'm not a chip engineer).
What Flow claims to have done is remove this limitation, turning the CPU from a one-lane road into a multi-lane highway. The CPU is still limited to running one task at a time, but Flow's PPU, as the company calls it, essentially performs nanosecond-scale traffic management on-die, moving tasks into and out of the processor faster than has previously been possible.
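To make the serial-versus-overlapped distinction concrete, here is a minimal, purely illustrative sketch in ordinary multithreaded C++. It is not Flow's mechanism, which works in hardware on the die; plain threads stand in for any means of running independent work at the same time, and the task and its timings are invented for the example.

```cpp
// Illustrative only: a software-level analogy, not Flow's on-die mechanism.
// Two independent tasks run back-to-back on one thread, then overlapped.
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

long busy_work(int millis) {
    // Stand-in for an independent unit of work (hypothetical).
    std::this_thread::sleep_for(std::chrono::milliseconds(millis));
    return millis;
}

int main() {
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;

    // Serial: the second task cannot start until the first has finished.
    auto t0 = clock::now();
    busy_work(100);
    busy_work(100);
    auto serial = duration_cast<milliseconds>(clock::now() - t0).count();

    // Overlapped: the two independent tasks are dispatched so neither waits.
    auto t1 = clock::now();
    auto a = std::async(std::launch::async, busy_work, 100);
    auto b = std::async(std::launch::async, busy_work, 100);
    a.get();
    b.get();
    auto overlapped = duration_cast<milliseconds>(clock::now() - t1).count();

    std::cout << "serial: " << serial << " ms, overlapped: "
              << overlapped << " ms\n";
    return 0;
}
```

The only point of the contrast is that independent work doesn't have to wait in line; Flow's claim is that the PPU arranges something analogous at the hardware level, without the software having to spell it out.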
Think of the CPU as a chef working in a kitchen. The chef can only work so fast, but what if the chef had a superhuman assistant who swapped knives and utensils in and out of the chef's hands, cleared away the cooked food, added new ingredients, and handled all the non-cooking work? The chef still has only two hands, but could now work 10 times faster.
Chart: performance of an FPGA-based, PPU-enhanced chip vs. an unmodified Intel chip; adding more PPU cores continues to improve performance. Image credit: Flow Computing
It's not a perfect analogy, but at least according to Flow's internal testing and demos with industry partners (the company is in talks with all kinds of people), it gives you an idea of what's going on here: the PPU doesn't increase clock frequencies or push the system in any other way that would generate extra heat or draw extra power. In other words, the chef isn't being asked to chop twice as fast; the system is simply making more efficient use of the CPU cycles it is already spending.
This sort of thing is not entirely new, Valtonen says: “It has been studied and discussed at a high level in academia. Parallelization is already possible, but it would break legacy code and make it useless.”
So yes, it can be done; it just hasn't been feasible, because it would require rewriting every piece of code in the world. A similar problem was solved by another Nordic computing company, ZeroPoint, which achieved high levels of memory compression while remaining transparent to the rest of the system.
In other words, Flow's big achievement isn't high-speed traffic management as such, but achieving it without modifying any code on any CPU or architecture it has tested. It sounds a bit insane to say you can run any code twice as fast on any chip, with no changes other than integrating the PPU on the die.
And herein lies Flow's main challenge to becoming a successful business: unlike a software product, its technology has to be built in at the chip-design level, so it won't work retroactively, and the first chips with PPUs are inevitably quite some way off. Flow has shown that the technology works in an FPGA-based test setup, but chipmakers would need to commit significant resources to realise the gains in question.
Flow's founding team, from left: Jussi Roivainen, Martti Forsell, and Timo Valtonen. Image courtesy of Flow Computing
But given the scale of the claimed gains, and the fact that CPU improvements over the past few years have been iterative and piecemeal, chipmakers may well be hungry for what Flow is offering: a single change to the layout that doubles performance in a single generation is an easy case to make.
Refactoring and recompiling software to work better with the PPU-CPU combination can improve performance even further. Flow claims that modifying code (not necessarily completely rewriting it) to take advantage of its technology has resulted in up to 100x performance improvements. For software makers who want to optimize for Flow-enabled chips, the company is working on providing recompilation tools to make the job easier.
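Flow hasn't published those tools, so purely as a hypothetical illustration of the kind of refactoring involved, here is a sketch in ordinary multithreaded C++: a loop over independent elements is rewritten as chunks of work that can run at the same time. Plain threads stand in for whatever parallel hardware is available, and the function names are invented.

```cpp
// Hypothetical illustration using ordinary C++ threads, not Flow's toolchain.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Serial version: each iteration runs after the previous one, even though
// the elements are completely independent.
void scale_serial(std::vector<float>& data, float factor) {
    for (auto& x : data) x *= factor;
}

// Refactored version: the same work expressed as independent chunks, which
// a parallel runtime (plain threads here) can execute at the same time.
void scale_parallel(std::vector<float>& data, float factor, unsigned workers = 4) {
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    for (auto& t : pool) t.join();
}
```

The hard part in real code is establishing that the iterations really are independent, which is why the promise of gains without touching legacy code matters so much.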
Kevin Krewell, an analyst at Tirias Research who was briefed on Flow's technology and asked to give an outside perspective on these issues, is more concerned about industry adoption than about the fundamentals.
He quite rightly pointed out that AI acceleration is currently the biggest market, and one that can be targeted with specialized silicon like Nvidia's popular H100. A PPU-accelerated CPU would bring gains across the board, but chipmakers may not want to rock the boat. And there's the question of whether these companies are willing to invest significant resources in a largely unproven technology when their roadmaps are planned years in advance and would be upended by that choice.
Will Flow's technology become a must-have component for any chipmaker, propelling the company to fortune and fame? Or will stingy chipmakers decide to stick to the status quo and squeeze a cut of the profits out of a steadily growing computing market? Probably somewhere in between. But even if Flow achieves some big technological feat here, it's clear that, like every startup, its future depends on its customers.
Flow has only just emerged from stealth after raising €4 million (approximately $4.3 million) in pre-seed funding led by Butterfly Ventures, with participation from FOV Ventures, Sarsia, Stephen Industries, Superhero Capital, and Business Finland.