GATO - A general-purpose AI from DeepMind capable of performing 450+ tasks

Following large models such as OpenAI's DALL-E 2, Google's PaLM and LaMDA 2, and DeepMind's own Chinchilla and Flamingo, the London-based AI firm is now demonstrating another large AI model that beats existing systems.

To many in the AI field, the ultimate goal is developing a system with artificial general intelligence (AGI), or the capacity to understand and learn any task that a person can. Long confined to science fiction, AGI is expected by its proponents to produce systems capable of reasoning, planning, learning, representing knowledge, and communicating in natural language.

DeepMind's latest model is called Gato, introduced in the paper "A Generalist Agent". This post covers what it is, how it works, how many parameters it uses, and more.

What is this Generalist Agent (Gato) by DeepMind?

According to DeepMind, Gato is a "general-purpose" system: one that can be taught to perform many different kinds of tasks.

DeepMind researchers trained Gato on 604 tasks, including captioning images, engaging in dialogue, stacking blocks with a real robot arm, and playing Atari games.

GATO from Deepmind is a general purpose AI agent from Deepmind Source

According to DeepMind, Gato performs more than 450 of the 604 tasks at over 50% of expert score.
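To make the "50% expert score threshold" concrete, here is a minimal sketch of how such a count could be computed. It assumes scores are normalized so that a random policy maps to 0% and the per-task expert to 100%; that normalization, the function names, and the example numbers are all illustrative assumptions, not DeepMind's evaluation code.

```python
def normalized_score(agent, random_baseline, expert):
    """Map a raw task score to a percentage of expert performance.

    Assumption: 0% corresponds to a random policy, 100% to the expert.
    """
    return 100.0 * (agent - random_baseline) / (expert - random_baseline)


def tasks_above_threshold(results, threshold=50.0):
    """Return the names of tasks whose normalized score meets the threshold."""
    return [
        name
        for name, (agent, random_baseline, expert) in results.items()
        if normalized_score(agent, random_baseline, expert) >= threshold
    ]


# Hypothetical raw scores per task: (agent, random baseline, expert).
example_results = {
    "atari_pong": (15.0, -20.0, 20.0),
    "block_stacking": (3.0, 0.0, 10.0),
    "captioning": (7.0, 1.0, 11.0),
}

passing = tasks_above_threshold(example_results)
print(f"{len(passing)} of {len(example_results)} tasks pass: {passing}")
```

Normalizing per task is what lets very different score scales (game points, success rates, caption metrics) be counted against one shared threshold.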

How does it work?

Gato's main design principle is to train on as wide a variety of data as possible, including images, text, proprioception, joint torques, button presses, and other discrete and continuous observations and actions.


To process this multimodal input, the researchers serialize all data into a flat sequence of tokens. The token sequence length is 1024 and the transformer input embedding size is 2048. The model is a decoder-only transformer with 24 layers and a post-attention feedforward hidden size of 8196.
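As a rough illustration of what "serializing into a flat series of tokens" can look like, the sketch below maps continuous readings to integer bins and concatenates them with text and action tokens. The constants (a 32000-entry text vocabulary, 1024 bins, mu-law companding with mu = 100 and M = 256) and all function names are assumptions chosen for illustration and should be checked against the paper before being relied on.

```python
import math

TEXT_VOCAB_SIZE = 32000  # assumed size of the text vocabulary
NUM_BINS = 1024          # assumed number of bins for continuous values
MU, M = 100.0, 256.0     # assumed mu-law companding constants


def mu_law(x):
    """Compress a real value toward the range [-1, 1]."""
    scaled = math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0)
    return math.copysign(scaled, x)


def tokenize_continuous(x):
    """Map one real value to an integer token placed after the text vocabulary."""
    squashed = max(-1.0, min(1.0, mu_law(x)))
    bin_index = min(NUM_BINS - 1, int((squashed + 1.0) / 2.0 * NUM_BINS))
    return TEXT_VOCAB_SIZE + bin_index


def serialize_timestep(text_tokens, proprioception, action_tokens):
    """Flatten one timestep into a single list of integer tokens."""
    tokens = list(text_tokens)
    tokens += [tokenize_continuous(v) for v in proprioception]
    tokens += list(action_tokens)
    return tokens


# Hypothetical timestep: two text tokens, three joint readings, one button press.
sequence = serialize_timestep([17, 204], [0.0, 256.0, -256.0], [3])
print(sequence)
```

Whatever the exact scheme, the point is that every modality ends up as integers in one shared sequence, so a single sequence model can consume all of them.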

Given this representation, Gato can be trained and sampled from in the same way as a standard large-scale language model. Gato's network has two main components: a parameterized embedding function, which converts tokens to token embeddings, and a sequence model, which outputs a distribution over the next discrete token. While any general sequence model could be used for next-token prediction, the researchers chose a transformer for its simplicity and scalability.
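The two components described above can be sketched in a few lines of plain Python. This is a toy: the "sequence model" here simply averages the embeddings and applies a linear projection, standing in for the transformer, and the weights are random rather than learned. All names and sizes are hypothetical.

```python
import math
import random

VOCAB_SIZE = 8  # toy vocabulary size (assumption, far smaller than a real one)
EMBED_DIM = 4   # toy embedding size (assumption)

rng = random.Random(0)
# In a trained model these tables are learned; here they are random numbers.
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]
output_weights = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                  for _ in range(VOCAB_SIZE)]


def embed(tokens):
    """Embedding function: look up one vector per token."""
    return [embedding_table[t] for t in tokens]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(tokens):
    """Stand-in sequence model: mean-pool the embeddings, project to logits."""
    vectors = embed(tokens)
    pooled = [sum(column) / len(vectors) for column in zip(*vectors)]
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in output_weights]
    return softmax(logits)


probs = next_token_distribution([1, 5, 2])
print([round(p, 3) for p in probs])
```

Swapping the stand-in for a real transformer changes how the logits are computed, but not the interface: tokens in, a distribution over the next token out.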

Gato employs a 1.2B parameter decoder-only transformer with 24 layers, a 2048 embedding size, and an 8196 post-attention feedforward hidden size.
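As a sanity check, the stated layer sizes are roughly consistent with the 1.2B figure. The back-of-the-envelope sketch below assumes a standard transformer block (four d x d attention projections plus a two-matrix feedforward layer) and ignores biases, layer norms, and embedding tables; those simplifications are assumptions, not details taken from DeepMind.

```python
def transformer_block_params(d_model, d_ff):
    """Approximate weight count of one transformer block."""
    attention = 4 * d_model * d_model  # query, key, value, output projections
    feedforward = 2 * d_model * d_ff   # up-projection and down-projection
    return attention + feedforward


NUM_LAYERS = 24
D_MODEL = 2048  # embedding size
D_FF = 8196     # post-attention feedforward hidden size

total = NUM_LAYERS * transformer_block_params(D_MODEL, D_FF)
print(f"{total:,} weights, about {total / 1e9:.2f}B")
```

The estimate lands at about 1.21 billion weights, in line with the quoted 1.2B.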

At deployment, a prompt is tokenized to form the initial sequence. The environment then provides the first observation, which is tokenized and appended to the sequence. The model then samples an action vector autoregressively, one token at a time.
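The deployment procedure described above can be sketched as a simple control loop. The environment and the model below are toy stand-ins (the "model" returns random tokens), and names such as `ToyEnvironment`, `toy_model`, and `ACTION_LENGTH` are hypothetical; only the 1024-token sequence length comes from the text.

```python
import random

CONTEXT_LENGTH = 1024  # token sequence length quoted in the text
ACTION_LENGTH = 3      # assumed number of tokens per action vector


class ToyEnvironment:
    """Hypothetical environment that emits already-tokenized observations."""

    def __init__(self):
        self.step_count = 0

    def observe(self):
        return [100 + self.step_count]

    def step(self, action_tokens):
        self.step_count += 1


def toy_model(context, rng):
    """Stand-in for the trained network: returns a random token in [0, 10)."""
    return rng.randrange(10)


def run_episode(prompt_tokens, env, num_steps, seed=0):
    rng = random.Random(seed)
    sequence = list(prompt_tokens)  # the tokenized prompt starts the sequence
    actions = []
    for _ in range(num_steps):
        sequence += env.observe()  # append the tokenized observation
        action = []
        for _ in range(ACTION_LENGTH):
            # Sample one action token at a time, conditioning on recent context.
            token = toy_model(sequence[-CONTEXT_LENGTH:], rng)
            sequence.append(token)
            action.append(token)
        env.step(action)  # send the completed action vector to the environment
        actions.append(action)
    return sequence, actions


sequence, actions = run_episode([1, 2, 3], ToyEnvironment(), num_steps=2)
print(len(sequence), actions)
```

Each sampled token is appended to the sequence before the next one is drawn, which is what makes the action sampling autoregressive.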


The study showed that transformer sequence models are effective as multi-task policies in real-world settings, including vision and robotics tasks. Gato also demonstrates that, rather than training a model from scratch, prompting can serve as a starting point for learning new tasks.

GATO Parameters used

In terms of parameter count, Gato is about two orders of magnitude smaller than single-task systems like GPT-3. Parameters are the parts of a system learned from training data, and they essentially define the system's skill on a problem such as text generation. GPT-3 has 175 billion parameters, whereas Gato has just 1.2 billion. DeepMind researchers intentionally kept Gato small so that the system could control a robot arm in real time. They believe that, scaled up sufficiently, Gato could take on any task, behavior, or embodiment of interest.

If you believe we need general-purpose systems, as many people in the AI and machine learning fields do, Gato is a big deal. It is remarkable that one AI can perform all of these seemingly disparate tasks, because to humans, generating text looks quite different from controlling a robot.

Building these large-scale models requires costly infrastructure such as GPU clusters, and research is often limited by project cost. Q Blocks provides access to such GPU infrastructure at a fraction of the cost of cloud platforms by using decentralization.

If you are building large NLP and CV models, sign up today to get started with computing on Q Blocks.
