AI is just the newest and hungriest market for high-performance computing, and system architects are working around the clock to wring every drop of performance out of every watt. Swedish startup ZeroPoint, which just raised 5 million euros ($5.5 million) in new funding, wants to help them with a new memory compression technique that works at the nanosecond scale. And yes, it's exactly as complicated as it sounds.
Here's the concept: by losslessly compressing data just before it enters RAM, then decompressing it afterwards, you can effectively widen the memory channel by 50% or more while adding just one small block to the chip.
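If you want to picture it in code, here's a toy sketch of a transparent compress-on-write, decompress-on-read memory path. It uses Python's zlib as a stand-in for the compressor; ZeroPoint's real design lives in silicon and runs in nanoseconds, so everything here (the class, the names, the storage layout) is illustrative only.

```python
import zlib  # stand-in for a hardware compressor; purely illustrative

class CompressedRAM:
    """Toy model of a transparent memory path: software issues ordinary
    reads and writes, and lossless compression happens invisibly between."""

    def __init__(self):
        self._store = {}  # address -> compressed bytes

    def write(self, addr, data):
        # Compress just before the data "enters RAM".
        self._store[addr] = zlib.compress(data)

    def read(self, addr):
        # Decompress on the way back out; software never sees a difference.
        return zlib.decompress(self._store[addr])

ram = CompressedRAM()
line = b"\x00" * 48 + b"payload\x00" * 2  # a sparse, repetitive 64-byte line
ram.write(0x1000, line)
assert ram.read(0x1000) == line  # lossless round trip
print(len(line), "bytes in,", len(ram._store[0x1000]), "bytes actually stored")
```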
Of course, compression is a fundamental technology in computing. As ZeroPoint CEO Klas Moreau (pictured above, left, with co-founders Per Stenström and Angelos Arelakis) pointed out: “According to research, 70% of the data in memory is unnecessary. So why not compress it in memory?”
The answer is that there's no time. Compressing (or encoding, in the case of video and audio) large files for storage can take seconds, minutes, or hours, depending on your needs. But data moves through memory in a tiny fraction of a second, shifting in and out as fast as the CPU can manage. A delay of even one microsecond to squeeze the “unnecessary” bits out of a piece of data entering the memory system would be devastating to performance.
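Some rough arithmetic shows why. A DRAM access takes on the order of 100 nanoseconds (a ballpark figure assumed here purely for illustration), so any compression step sitting on that path has almost no latency budget:

```python
# Rough arithmetic: why microsecond-scale compression can't sit on the
# memory path. The 100 ns DRAM access time is a ballpark assumption.
DRAM_ACCESS_NS = 100

# 1 us; "a couple hundred" (CXL-class); "three or four" (ZeroPoint's claim)
for added_ns in (1_000, 200, 4):
    slowdown = (DRAM_ACCESS_NS + added_ns) / DRAM_ACCESS_NS
    print(f"+{added_ns} ns per access -> {slowdown:.2f}x access time")
# Adding 1,000 ns makes every access ~11x slower; 4 ns is barely measurable.
```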
Memory hardware hasn't necessarily advanced at the same rate as CPU speeds, but the two (along with many other chip components) are closely intertwined: if the processor is too slow, data backs up in memory, and if the memory is too slow, the processor wastes cycles waiting for the next pile of bits. It all works in lockstep, as you might expect.
Lightning-fast memory compression has been demonstrated before, but it introduces a second problem: the data has to be decompressed and returned to its original state just as quickly as it was compressed, or the system won't know how to handle it at all. So unless you convert your entire architecture to this new compressed-memory mode, it's pointless.
ZeroPoint claims to have solved both of these problems with ultra-fast, low-level memory compression that requires no real changes to the rest of the computing system. When you add their technology to a chip, it's as if it has twice the memory.
While the finer details would likely only be understood by those in the field, the basics are easy enough for a layperson to grasp, as Moreau proved when he explained them to me.
“What we're doing is taking a very small amount of data, a cache line, maybe 512 bits, and identifying patterns within it,” he said. “It's the nature of data that it contains inefficient information, information that is sparsely located. It depends on the data: the more random it is, the less compressible it is. But when you look at most data loads, you see that they're in the range of 2-4x [more data throughput than before].”
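ZeroPoint hasn't published the exact patterns its silicon exploits, but a simplified base-plus-delta scheme of the sort found in the academic cache-compression literature gives the flavor. The function names and the one-byte delta rule below are my own illustrative choices, not the company's design:

```python
import struct

def compress_line(line):
    """Toy base-plus-delta compression of a 64-byte (512-bit) cache line,
    treated as eight 64-bit words. If every word is close to the first,
    store one 8-byte base plus eight 1-byte deltas: 16 bytes instead of 64."""
    words = struct.unpack("<8Q", line)
    base = words[0]
    deltas = [w - base for w in words]
    if all(0 <= d < 256 for d in deltas):
        return struct.pack("<Q8B", base, *deltas)  # 4x smaller
    return line  # too random to compress: store verbatim

def decompress_line(blob):
    if len(blob) == 64:
        return blob  # was stored verbatim
    base, *deltas = struct.unpack("<Q8B", blob)
    return struct.pack("<8Q", *[base + d for d in deltas])

# Eight nearby pointers: the kind of "sparse" data Moreau describes.
line = struct.pack("<8Q", *[0x7F00_0000_0000 + 8 * i for i in range(8)])
packed = compress_line(line)
assert decompress_line(packed) == line  # lossless
print(f"64 bytes -> {len(packed)} bytes ({64 // len(packed)}x)")
```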
Not how actual memory looks, but you get the idea. Image credit: ZeroPoint
We all know that memory can be compressed. Moreau said that while everyone in large-scale computing has known it's possible (he pointed to a paper that demonstrated it back in 2012), it was considered academic, impossible to implement effectively at scale. But because ZeroPoint has solved both compaction (reorganizing compressed data so it packs more efficiently) and transparency, Moreau said, the technology not only works but works quite seamlessly in existing systems. And it all happens within a handful of nanoseconds.
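Compaction is the bookkeeping half of the problem: compressed cache lines come out at different sizes, so the freed space is only useful if the lines are packed tightly and can still be located instantly. The toy store below is a hypothetical sketch of that idea; it deliberately ignores the hard parts a real hardware design must handle, like rewrites that change a line's size:

```python
class CompactedStore:
    """Hypothetical sketch of compaction: pack variable-size compressed
    lines back to back and keep a tiny index so each is found instantly."""

    def __init__(self):
        self.heap = bytearray()  # compressed lines, tightly packed
        self.index = {}          # line address -> (offset, length)

    def put(self, addr, compressed):
        self.index[addr] = (len(self.heap), len(compressed))
        self.heap += compressed

    def get(self, addr):
        offset, length = self.index[addr]
        return bytes(self.heap[offset:offset + length])

store = CompactedStore()
store.put(0x00, b"\x01\x02")      # a 2-byte compressed line
store.put(0x40, b"\x03\x04\x05")  # a 3-byte one, packed right after it
assert store.get(0x40) == b"\x03\x04\x05"
print(len(store.heap), "bytes hold what would occupy 128 bytes uncompressed")
```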
“Most compression technologies, both software and hardware, are on the order of thousands of nanoseconds. CXL [Compute Express Link, a high-speed interconnect standard] can take that down to a couple hundred,” Moreau said. “We can take it down to three or four.”
CTO Angelos Arelakis explains it in an embedded video.
ZeroPoint's debut is timely, as companies around the world hunt for faster and cheaper compute to train new generations of AI models. Most hyperscalers (if we must call them that) are keen on any technology that gives them more computing power per watt or shaves a little off the power bill.
The main caveat to all of this, as noted above, is that the technology has to be built into the chip and integrated from the start; you can't just stick a ZeroPoint dongle in the rack. To that end, the company is working with chipmakers and system integrators to license the technique and hardware design into standard chips for high-performance computing.
Of course, that's not just the Nvidias and Intels of the world, but a growing number of companies like Meta, Google, and Apple that have designed custom hardware to run AI and other high-cost tasks in-house. ZeroPoint positions its technology as a cost savings, though, not a premium: conceivably, by effectively doubling the memory, the technology pays for itself before long.
The recently closed €5 million A round was led by Matterwave Ventures, with Industrifonden acting as the local Nordic lead and existing investors Climentum Capital and Chalmers Ventures also participating.
Moreau said the funding should allow the company to expand into U.S. markets, as well as double down on the Swedish ones it is already pursuing.