Embodied Intelligence

Enable a large language model to control your robot.

By the end of this tutorial you will be able to:

  1. Connect your robots to the latest artificial intelligence models
  2. Specify the desired behaviour for your robot at the highest level of abstraction
  3. Define a simple robot assistant using a large language model

These tutorials have been designed as a practical guide to using the BOW SDK, and to help develop an understanding of the key challenges in robotics and of how the BOW platform (including the robot drivers, the inverse kinematics solver, BOW Insight, etc.) helps us to overcome them. It should be clear at this point that the key design principle behind the BOW platform has been ‘to make development more intuitive through abstraction’.

We’ve seen how communications with a robot’s sensors and motors are abstracted away from the hardware details, so that data can be exchanged intuitively using the <channel>.get and <channel>.set commands. The data exchanged in the process are abstracted away from low-level types to the level of universal messages, organised intuitively in terms of communication channels. The representation of a robot’s body structure is abstracted to the level of a generic kinematic tree, allowing references to a robot’s effectors to translate intuitively between robot models and across form factors. The inverse kinematics solver allows the stack of coordinate systems for each joint to be abstracted away, leaving a single coordinate system in which we can more intuitively specify movement commands. And we’ve seen how the correlations between image pixels in a robot’s camera image can be exploited by a neural network to abstract (identify and localise) objects in the visual scene, so that intuitive labels can be used when defining goal-directed behaviours.
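To make that channel pattern concrete, here is a minimal sketch of what reading from and writing to channels can look like in Python. The module name, connection helper, channel names and message structure below are illustrative assumptions, not the exact BOW SDK identifiers; consult the SDK reference for the real signatures.

# A minimal sketch of the <channel>.get / <channel>.set pattern.
# NOTE: the module, connection helper and channel names are illustrative
# assumptions, not the exact BOW SDK API.
import bow_sdk  # hypothetical import name

robot = bow_sdk.quick_connect()  # hypothetical connection helper
images = robot.vision.get()  # read the latest sample from the vision channel
command = {"objective": "wave_right_hand"}  # placeholder message structure
robot.motor.set(command)  # send a high-level command on the motor channel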

The net effect of these layers of abstraction is that our most intuitive symbol system, natural language, can now be brought to bear on robotics.

It is particularly timely to be working in this space. We started developing the BOW SDK not long before large language models (LLMs) like ChatGPT from OpenAI first hit the market and re-ignited the world’s fascination with Artificial Intelligence. These A.I. systems do not display genuine intelligence, but their ability to model and recreate sequences of symbols with realistic temporal structure is a powerful reminder of the promise of combining neural networks with big data. The foundations have been in place since Nobel laureate Geoff Hinton introduced error correction methods for training neural nets in the 1980s, and another psychologist (Jeffrey Elman) showed how nets with internal processing loops could capture language structure in the early 1990s. Now that methods for harnessing massive datasets from the internet have caught up, the possibility of using natural language to control robots is becoming a reality.

Hooking an LLM up to a robot is great fun, and we are certainly not the first to have done it. But connecting LLMs to robots was never our goal. For us, the ability to control a robot using an LLM is instead a critical test of how successful we have been in pursuit of an appropriately intuitive abstraction of robotics communications and control. If we’ve done a good job, and the BOW SDK really is the tool to accelerate the progress of robotics, then hooking up an LLM should be straightforward. Spoiler alert… it is!

Take a look at the following to see what we're aiming for:

About the Application

The purpose of the application you will develop here is to define a simple robot assistant using a large language model. The structure of the application is as follows:

Before we get stuck in:

If you are just browsing to get a sense of what's possible, take a look at the code online.

Running the Application

Navigate to the Applications/EmbodiedIntelligence/Python folder in the SDK Tutorials repository:

cd SDK-Tutorials/Applications/EmbodiedIntelligence/Python

Execute the example program:

python main.py

Investigation

When you run the application, a GUI will appear. On the left, it shows images from the robot's camera with object detections overlaid.

To communicate with your robot assistant, type messages into the white box on the right side of the GUI and press Enter or click the Send button. You will see a response from the ChatGPT assistant in the black output box; the assistant will not only answer your questions but also command your robot to perform its available actions and tell you what it is doing.

If your robot has speech capabilities, these responses will also be vocalised by your robot.

OpenAI >< BOW GUI Window

Code Structure

The code has three key Python files:

The file called main.py is the execution point for this project and contains:

  • The GUI description
  • A small class for storing detected objects and their details
  • The main function, which launches the GUI, connects to the robot, begins sampling images from the robot and passes them into the local YOLO model. The output of this model is parsed to store details of the detected objects, which are then drawn onto the image and displayed in the GUI (a sketch of this loop follows the list).
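The following sketch shows the general shape of that loop. It is not the tutorial's exact implementation: the robot connection and its get_image() helper are hypothetical placeholders, the YOLO and OpenCV calls follow their standard Python APIs, and the real application displays frames in its own GUI rather than an OpenCV window.

# A minimal sketch of the main image loop, under the assumptions above.
import cv2
from ultralytics import YOLO

def run_loop(robot):
    model = YOLO("yolov8n.pt")  # local YOLO model (weights name is an assumption)
    while True:
        frame = robot.get_image()  # hypothetical helper returning a BGR image
        if frame is None:
            continue
        results = model(frame, verbose=False)
        for box in results[0].boxes:  # draw each detection onto the frame
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("OpenAI >< BOW", frame)  # shown in an OpenCV window for brevity
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break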

The file called robot_controller.py contains the process for connecting to a robot and the local functions for controlling it, which can be called by the assistant. These are very basic implementations, intended only as a starting point; a sketch of one such function is shown below.
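As a rough illustration, a controller function of this kind might look like the following. The class layout, function name, channel name and message fields are illustrative assumptions rather than the tutorial's actual code or the exact BOW SDK message types.

# A minimal sketch of a controller function the assistant can call.
class RobotController:
    def __init__(self, robot):
        self.robot = robot  # an open robot connection (hypothetical object)

    def look_left(self):
        """Turn the robot's head to the left and report back to the assistant."""
        command = {"head_yaw": 0.5}  # placeholder message structure
        self.robot.motor.set(command)  # assumed motor-channel set call
        return "Looking left."  # text returned to the assistant as the function result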

The file called openai_brain.py contains the implementation used to communicate with the assistant and to handle its function calls. The sketch below shows the general shape of that exchange.
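For orientation, here is a minimal sketch of function calling with the OpenAI Python client. The model name, tool schema and dispatch logic are illustrative assumptions and are simpler than the tutorial's implementation.

# A minimal sketch of OpenAI function calling, under the assumptions above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "look_left",
        "description": "Turn the robot's head to the left.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def ask_assistant(user_text, controller):
    messages = [
        {"role": "system", "content": "You are a helpful robot assistant."},
        {"role": "user", "content": user_text},
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    if message.tool_calls:  # the model asked to call a robot function
        for call in message.tool_calls:
            if call.function.name == "look_left":
                controller.look_left()
    return message.content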
