Aleph Alpha and Graphcore think sparsity is the way to go


European companies show the first result of their cooperation: an 80% lighter Aleph Alpha language model.

Large language models like OpenAI’s GPT-3 or Google’s PaLM have over a hundred billion parameters. Even with the new insights into the role of training data from Deepmind’s Chinchilla, even larger models are to be expected.

In fact, language models such as Google’s Switch Transformer already exist with 1.6 trillion parameters, but they rely on sparse modeling, in Google’s case specifically on a Mixture-of-Experts Transformer architecture.

Whereas with GPT-3, for example, all parts of the neural network are involved in each processing step, sparse models such as Switch Transformer use processes in which only the parts of the network relevant to the task become active. This greatly reduces the computing power required for network queries.
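The routing idea can be illustrated with a minimal sketch in the style of Switch Transformer’s top-1 expert selection. All names and sizes below are illustrative toy values, not the actual Switch Transformer configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4 "expert" feed-forward layers, each a simple weight matrix.
n_experts, d_model = 4, 8
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))  # learned routing weights

def switch_layer(x):
    """Route each token to a single expert (top-1 routing).

    A dense model would run every token through all the weights; here each
    token activates exactly one expert, so only 1/n_experts of the expert
    parameters participate in any given token's computation.
    """
    logits = x @ router                      # (tokens, n_experts)
    choice = logits.argmax(axis=-1)          # top-1 expert index per token
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            out[mask] = x[mask] @ experts[e]
    return out, choice

tokens = rng.standard_normal((6, d_model))
out, choice = switch_layer(tokens)
print(choice)  # which expert each of the 6 tokens was routed to
```

Per token, three of the four expert matrices are never touched, which is where the compute savings come from; production systems add a learned load-balancing loss on top of this so that tokens spread evenly across experts.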


Classical neural networks are trained to be “dense”. By using sparse modeling, networks can be reduced in complexity while maintaining approximately the same performance. | Image: Graphcore/Aleph Alpha

European AI collaboration shows first results

Google uses sparse modeling in the case of Switch Transformer to scale language models. But conversely, it can also be used to train smaller networks with performance similar to larger models.

That’s exactly what AI chipmaker Graphcore and AI startup Aleph Alpha have now done. The two European AI companies announced a collaboration in June 2022 which aims, among other things, to develop large European AI models. Germany’s Aleph Alpha recently launched Europe’s fastest commercial AI data center.

Aleph Alpha CEO Jonas Andrulis highlighted the advantages of Graphcore hardware for sparse modeling last summer: “The Graphcore IPU provides a new opportunity to evaluate advanced technological approaches such as conditional sparsity. These architectures will undoubtedly play a role in future Aleph Alpha research.”

Graphcore and Aleph Alpha present lightweight Luminous language model

The two companies managed to slim Aleph Alpha’s “Luminous Base” language model down from 13 billion to 2.6 billion parameters. They also showed off the lite variant running Lumi, a “conversational add-on” for Luminous.

At the Super Computing Conference 2022 (SC22) in Texas, Aleph Alpha and Graphcore demonstrated how the sparse variant of Luminous drives the Lumi module. Lumi is a kind of “chatbot mode” of the language model. | Image: Aleph Alpha

Sparse modeling eliminated nearly 80 percent of the model’s weights while preserving most of its capabilities, according to the press release.



The new model uses sparse matrix multiplications supported by Graphcore’s Intelligence Processing Unit (IPU) and requires only 20 percent of the computing power and 44 percent of the memory of the original model, the companies said.

The small size allows the 2.6 billion parameter model to be kept entirely on the ultra-fast on-chip memory of a Graphcore IPU-POD16 Classic – for maximum performance. The model also requires 38% less energy.
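The memory saving behind such figures can be sketched with a sparse weight matrix in which roughly 80 percent of the entries are zero, mirroring the reported weight reduction. The matrix size and sparsity level here are illustrative only, not the actual Luminous configuration:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Toy weight matrix with ~80% of entries zeroed out.
dense_w = rng.standard_normal((512, 512))
dense_w[rng.random(dense_w.shape) < 0.8] = 0.0

sparse_w = sparse.csr_matrix(dense_w)  # CSR format stores only the non-zeros
x = rng.standard_normal(512)

dense_out = dense_w @ x   # multiplies every entry, including the zeros
sparse_out = sparse_w @ x # multiplies only the stored non-zero weights

# Rough storage comparison: dense keeps every float, CSR keeps values
# plus column indices plus row pointers for the ~20% that survive.
dense_bytes = dense_w.nbytes
sparse_bytes = (sparse_w.data.nbytes + sparse_w.indices.nbytes
                + sparse_w.indptr.nbytes)
print(f"non-zeros kept: {sparse_w.nnz / dense_w.size:.0%}")
print(f"CSR storage: {sparse_bytes / dense_bytes:.0%} of dense")
```

Both products agree to floating-point precision, but the sparse version only has to store and touch the surviving weights; on hardware with native support for sparse operations, such as the IPU, that translates into the kind of compute and memory savings the companies report.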

Central “sparsification” for the next generation of AI models

For the next generation of models, “sparsification” will be key, the companies said. It would allow specialized sub-models to master selected knowledge more effectively.

“This breakthrough in sparsification is impacting the commercial potential of AI companies like Aleph Alpha, enabling them to deliver high-performance AI models to customers with minimal compute requirements,” the press release adds.

Google is also following this path. In October 2021, AI chief Jeff Dean spoke for the first time about the search giant’s AI future: Pathways is intended to one day become a kind of general-purpose AI system, and it relies on sparse modeling as a central element.
