In the last post, we explored HuggingGPT, which proposed using ChatGPT to take autonomous programming a step further. In this post, we look at how LLMs can be used as a robotic brain. The paper ‘LLM as A Robotic Brain: Unifying Egocentric Memory and Control’ by Mai et al. proposes building a robotic brain from several language models that work zero-shot and communicate with each other in natural language.
Design
The setup consists of 3 parts: The Eye, The Nerve and The Brain.
The Eye
The Eye is a Visual Language Model (VLM). Its goal is to capture the environment's visual information and answer questions about it. The Nerve asks the questions, and The Eye uses the visual information to answer them in natural language.
The Nerve
The Nerve performs 2 actions:
Asking detailed questions regarding the environment to The Eye.
Summarising the answers and generating the corresponding possible robot actions.
The Brain
The Brain is responsible for the following tasks:
Memorising the 3D environment based on the description given by The Nerve.
Planning future actions of the robot.
Controlling the actions of the robot based on the current scene and memory.
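To make the division of labour concrete, here is a minimal sketch of the three roles in Python. It is not the paper's implementation: the class names mirror the roles above, but every method name (`answer`, `describe`, `memorise`, `act`) and both stub models are hypothetical, and canned replies stand in for real VLM and LLM calls so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in: each "model" is just a function from text to text.
TextModel = Callable[[str], str]


@dataclass
class Eye:
    """Answers questions about the current view (a VLM in the paper)."""
    vlm: TextModel

    def answer(self, question: str) -> str:
        return self.vlm(question)


@dataclass
class Nerve:
    """Questions The Eye and summarises what it learns."""
    questions: List[str]

    def describe(self, eye: Eye) -> str:
        answers = [eye.answer(q) for q in self.questions]
        return " ".join(answers)


@dataclass
class Brain:
    """Keeps a memory of scene descriptions and picks the next action."""
    memory: List[str] = field(default_factory=list)

    def memorise(self, description: str) -> None:
        self.memory.append(description)

    def act(self, policy: TextModel) -> str:
        return policy("\n".join(self.memory))


# Canned replies replace real models (assumption, for illustration only).
def fake_vlm(question: str) -> str:
    return "A door is ahead." if "ahead" in question else "A wall is on the left."


def fake_policy(context: str) -> str:
    return "move_forward" if "door" in context else "turn_right"


eye = Eye(vlm=fake_vlm)
nerve = Nerve(questions=["What is ahead?", "What is on the left?"])
brain = Brain()
brain.memorise(nerve.describe(eye))
action = brain.act(fake_policy)
```

With these stubs, The Nerve's summary mentions a door, so the stub policy returns `move_forward`. In a real system each stub would be replaced by a call to an actual model.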
Execution
The pipeline has 5 stages:
Role Initialization
This is a detailed prompt to set up the 3 agents.
Eye-Nerve Perception
This is an iterative process where The Nerve repeatedly asks The Eye questions to get as detailed a view of the environment as possible.
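The questioning loop can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: `perceive`, `scripted_nerve`, `lookup_eye`, the round limit, and the convention that an empty question means "stop" are all made up for the example, and a lookup table stands in for the VLM.

```python
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]


def perceive(
    ask_eye: Callable[[str], str],
    next_question: Callable[[Transcript], str],
    max_rounds: int = 5,
) -> Transcript:
    """Collect (question, answer) pairs until The Nerve has nothing left to ask."""
    transcript: Transcript = []
    for _ in range(max_rounds):
        question = next_question(transcript)
        if not question:  # assumed convention: empty string means "enough detail"
            break
        transcript.append((question, ask_eye(question)))
    return transcript


# Hypothetical stubs: a scripted Nerve and an Eye backed by a lookup table.
SCRIPT = ["What objects are visible?", "Where is the chair?"]
SCENE = {
    "What objects are visible?": "A chair and a table.",
    "Where is the chair?": "To the right of the table.",
}


def scripted_nerve(transcript: Transcript) -> str:
    # Ask the next scripted question, or stop once the script is exhausted.
    return SCRIPT[len(transcript)] if len(transcript) < len(SCRIPT) else ""


def lookup_eye(question: str) -> str:
    return SCENE.get(question, "I cannot tell.")


transcript = perceive(lookup_eye, scripted_nerve)
```

Passing the transcript back to `next_question` is what makes the loop iterative: a real Nerve could use the earlier answers to decide what to ask next, rather than following a fixed script as the stub does.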
Brain-Nerve Collaboration
The Brain acts as a higher-level agent and tells The Nerve what to do next.
Brain Reasoning and Control
Using the power of LLMs, The Brain reasons about the best action to take based on the current inputs. (Recall chain-of-thought prompting of LLMs.)
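One way to picture this step is as a prompt that combines memory, the current scene and the candidate actions, followed by a check that the model's reply names a valid action. The sketch below assumes all of its specifics: the prompt wording, the `Action:` reply format, the fallback to the first action, and the names `build_prompt`, `choose_action` and `fake_llm` are illustrative and do not come from the paper.

```python
from typing import Callable, List


def build_prompt(memory: List[str], scene: str, actions: List[str]) -> str:
    """Assemble a step-by-step reasoning prompt (wording is illustrative)."""
    return (
        "Memory of earlier scenes:\n"
        + "\n".join(f"- {m}" for m in memory)
        + f"\nCurrent scene: {scene}\n"
        + "Possible actions: " + ", ".join(actions) + "\n"
        + "Think step by step, then finish with 'Action: <name>'."
    )


def choose_action(
    llm: Callable[[str], str],
    memory: List[str],
    scene: str,
    actions: List[str],
) -> str:
    """Ask the model, then accept its answer only if it names a listed action."""
    reply = llm(build_prompt(memory, scene, actions))
    for line in reversed(reply.splitlines()):
        if line.startswith("Action:"):
            choice = line[len("Action:"):].strip()
            if choice in actions:
                return choice
    return actions[0]  # assumed fallback when the reply cannot be parsed


def fake_llm(prompt: str) -> str:
    # Canned reply standing in for a real LLM call.
    return "The kitchen was to the left earlier.\nAction: turn_left"


actions = ["move_forward", "turn_left", "turn_right"]
chosen = choose_action(fake_llm, ["Kitchen seen on the left."], "A hallway.", actions)
```

Validating the reply against the list of candidate actions keeps a free-form model answer from turning into an unsupported robot command.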
Brain-Human Interaction
The Brain of the robot is also responsible for interacting with the user.
Example
Initialization
Working
Conclusion
LLMs are becoming more deeply integrated into various aspects of programming. As LLMs get better at reasoning, we might get truly, fully autonomous agents which, given a task, choose the models, methods and information required to achieve it. This is an exciting time in the field of Computer Science and probably also in human history!
📖Resources
That’s it for this issue. I hope you found this article interesting. Until next time!