OpenAI Integration
Welcome to this tutorial series on robotics powered by the BOW SDK. This example was originally showcased in the BOW webinar "Bring your robot dog to life with the ChatGPT API and BOW SDK", which you can watch below.
We highly recommend a quadruped robot for this tutorial; we suggest one of:
- InMotion Robotics - Lite3
- InMotion Robotics - X30
- Unitree - Go2
Prerequisites
Before trying these tutorials, make sure you have followed the instructions from the dependencies step to set up the development environment for your chosen programming language.
These tutorials also assume you have installed the BOW Hub, available for download from https://bow.software, and that you have either subscribed with a Standard Subscription (or above) or are using the 30-day free trial, which is required to simulate robots.
This tutorial requires a paid OpenAI account and an API key set up and exported as an environment variable, e.g. "OPENAI_API_KEY=sk-xxxxx". See the OpenAI documentation for more details on setting up the OpenAI API.
What is the aim of this tutorial?
Since OpenAI rocketed into relevance and the mainstream, people have been asking how to embody this powerful technology so that it is not limited to responding with text and images and can instead interact with the real world. This tutorial aims to give a very brief example of how combining the BOW SDK and the OpenAI API can make this a reality with very little complexity.
For this tutorial we are going to focus on the tasks of object detection and searching for objects. Although OpenAI provides methods of analysing images, this process is (currently) not fast enough to be used for real-world control, so object detection is instead performed locally by a computer vision model, YOLOv8. This model comes pre-trained on the COCO (Common Objects in Context) dataset and is therefore capable of detecting any object among the labels of that dataset. The results of this object detection can then be communicated to the OpenAI API.
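As a rough illustration of the local detection step, here is a minimal sketch using the ultralytics package and an arbitrary test image (the file name is a placeholder); the tutorial's own code performs this step inside gui.py, as described later:

```python
from ultralytics import YOLO

# Load a YOLOv8 model pre-trained on the COCO dataset
model = YOLO("yolov8n.pt")

# Run detection on a single image (a numpy/OpenCV frame also works)
results = model("test_image.jpg")

# Print the label and confidence of each detection
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]
    print(f"{label}: {float(box.conf):.2f}")
```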
Furthermore, we have to provide functions which the API can call to control the robot. In this case it is a very basic search function which allows the robot to rotate on the spot, changing its view of the world, until the target object becomes visible.
These functions are all contained within a simple-to-use GUI which not only displays the robot's viewpoint, but also allows you to communicate with the OpenAI-powered robot and read its responses.
Sense
- Connect to a robot by calling QuickConnect
- Open a stream of communication with the OpenAI Assistant
- Get the images sampled by the robot by calling GetModality("vision")
- Perform object detection locally using YOLOv8, a cutting-edge computer vision model (see the sketch after this list)
- Overlay object detections onto the images and display them within the GUI
- Communicate the state of the robot to the OpenAI Assistant
- Provide information to the assistant in the form of user messages
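Put together, the sense loop looks roughly like the sketch below. The connect_robot and get_camera_frame helpers are hypothetical stand-ins for the BOW SDK's QuickConnect and GetModality("vision") calls; their exact signatures are in the BOW SDK documentation and the tutorial's own files.

```python
from ultralytics import YOLO

# Hypothetical stand-ins for the BOW SDK calls named above. QuickConnect and
# GetModality("vision") are the real entry points, but their Python signatures
# are not reproduced here - see the BOW SDK documentation.
def connect_robot():
    raise NotImplementedError("wrap the BOW SDK QuickConnect call here")

def get_camera_frame(robot):
    raise NotImplementedError('wrap the BOW SDK GetModality("vision") call here')

model = YOLO("yolov8n.pt")           # local YOLOv8 model, pre-trained on COCO
robot = connect_robot()

while True:
    frame = get_camera_frame(robot)  # latest camera image as a numpy array
    if frame is None:
        continue
    result = model(frame)[0]         # run local object detection
    visible = [result.names[int(box.cls)] for box in result.boxes]
    # 'visible' is what gets overlaid on the GUI image and reported to the assistant
```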
Decide
In this case, decisions are all made by the OpenAI Assistant. Based on the information we provide, the assistant can choose to (see the sketch of callable functions after this list):
- Search for an object in the COCO dataset
- Stop
- Request a list of currently visible objects
- Request a list of currently running functions
- Speak to the user
- Ask the user for clarification/further information
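These choices are exposed to the assistant as callable functions using OpenAI's function-calling (tools) schema. The function names and parameters below are illustrative placeholders rather than the tutorial's exact definitions, which are registered by the assistant-creation script described later.

```python
# Illustrative tool definitions in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_for_object",
            "description": "Rotate the robot on the spot until the target COCO object is visible.",
            "parameters": {
                "type": "object",
                "properties": {
                    "target": {"type": "string", "description": "COCO class name, e.g. 'cat'"}
                },
                "required": ["target"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "stop",
            "description": "Stop any currently running action.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_visible_objects",
            "description": "Return the objects currently detected by the robot's camera.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
```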
Act
- Assistant triggers one of the above actions
- Assistant communicates its chosen action to the user in the form of a message
- Message is passed to the robot to be spoken
- If a search is triggered, the robot will begin to rotate on the spot using its method of locomotion (see the sketch below).
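In code, the act step amounts to running the assistant on a conversation thread, watching for a requires_action status, dispatching each requested tool call to the matching local robot function, and submitting the result back. Below is a minimal sketch using the official openai Python package; the local_functions dispatch table and the function names it contains are illustrative, not the tutorial's exact implementation (which lives in openai_brain.py).

```python
import json
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_assistant(assistant_id: str, thread_id: str, local_functions: dict) -> str:
    """Run the assistant on a thread and dispatch any requested function calls."""
    run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
        if run.status == "requires_action":
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                func = local_functions[call.function.name]           # e.g. search_for_object
                result = func(**json.loads(call.function.arguments))
                outputs.append({"tool_call_id": call.id, "output": str(result)})
            client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run.id, tool_outputs=outputs
            )
        elif run.status in ("completed", "failed", "cancelled", "expired"):
            break
        time.sleep(0.5)

    # The assistant's latest reply is then shown in the GUI and spoken by the robot
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value
```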
Preparation
Creating an OpenAI Assistant
Before running the demo it is first necessary to create an OpenAI Assistant. This is an instance of an OpenAI model which not only responds to queries but also has the ability to call functions from within your application. To understand more about OpenAI Assistants, read their overview.
To create an assistant, navigate to the "Assistant Functions" folder within the "OpenAI Integration" directory, which is part of the SDK-Tutorials repository.
Execute the "openai_create_assistant.py" script from that folder.
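Under the hood, an assistant-creation script of this kind typically boils down to a single API call. The following is a minimal sketch assuming the official openai Python package; the name, instructions, model and tool definitions used by the real script will differ.

```python
from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable

# Create an assistant; the name, instructions, model and tools here are placeholders
assistant = client.beta.assistants.create(
    name="BOW robot assistant",
    instructions="You control a quadruped robot. Use the provided functions to act.",
    model="gpt-4o",
    tools=[],  # the real script registers its robot-control functions here
)

print(assistant.id)  # e.g. "asst_abc123" - take note of this ID for the next step
```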
The output of this script contains your assistant ID, a string of the form "asst_abc123". Take note of this string, as it is the reference to your created assistant. It is also possible to view, create and delete assistants in your browser by logging into your OpenAI account.
Replace the assistant ID currently declared on line 9 of openai_brain.py with your new assistant ID.
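For example, if the ID is stored in a module-level variable (the variable name below is illustrative; check line 9 of openai_brain.py for the actual declaration), the change is simply:

```python
# Before (the placeholder ID shipped with the tutorial)
assistant_id = "asst_abc123"

# After (the ID printed by openai_create_assistant.py)
assistant_id = "asst_your_new_id"
```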
Preparing your simulated world
If you are running this demonstration in the simulator, then we recommend you add some objects from the COCO dataset to the world for your robot to detect. Some which are readily available in Webots are:
- humans/pedestrian/Pedestrian
- objects/animals/Cat
- objects/animals/Sheep
- objects/traffic/StopSign
Running and interacting with the tutorial
Simply run the gui.py script in the "OpenAI_Integration" directory to begin the tutorial.
A GUI will appear which, on the left, shows images from the robot's camera with object detections overlaid.
To communicate with your robot assistant, type messages in the white box on the right side of the GUI and hit enter or press the send button. You will see a response from the ChatGPT assistant in the black output box; it will not only answer your questions but also command your robot to perform its available actions and communicate these actions to you.
If your robot has speech capabilities, these responses will also be vocalised by your robot.
Code Structure
The code has three key Python files:
gui.py
This file is the execution point for this project and contains:
- The gui description
- A small class for storing detected objects and their details
- The main function, which launches the GUI, connects to the robot, begins sampling images from the robot and passes them into the local YOLO model. The output of this model is then parsed to store the details of the detected objects, which are drawn onto the image and displayed in the GUI (see the sketch below).
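The overlay step can be pictured along these lines (a sketch using OpenCV; the detection tuple format and drawing style are illustrative, not the tutorial's exact code):

```python
import cv2


def draw_detections(frame, detections):
    """Draw bounding boxes and labels onto a BGR image.

    'detections' is assumed to be a list of (label, confidence, (x1, y1, x2, y2))
    tuples, mirroring the small storage class described above.
    """
    for label, conf, (x1, y1, x2, y2) in detections:
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"{label} {conf:.2f}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame
```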
robot_controller.py
This file contains the process for connecting to a robot and the local functions for controlling the robot which can be called by the assistant. These are very basic implementations meant only as a starting point.
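As a rough idea of the shape such a controller can take, here is a hypothetical sketch: the class and method names and the threading approach are illustrative, and the actual rotation command depends on the BOW SDK's motor modality, so it is left as a placeholder comment.

```python
import threading
import time


class SimpleSearchController:
    """Hypothetical controller: rotate on the spot until told to stop."""

    def __init__(self, robot):
        self.robot = robot
        self._searching = threading.Event()

    def start_search(self, target: str) -> str:
        """Begin rotating so the robot can scan its surroundings for 'target'."""
        self._searching.set()
        threading.Thread(target=self._rotate_loop, daemon=True).start()
        return f"Searching for {target}"

    def stop(self) -> str:
        """Stop any ongoing search."""
        self._searching.clear()
        return "Stopped"

    def _rotate_loop(self):
        while self._searching.is_set():
            # Placeholder: send a small rotational velocity to the robot via the
            # BOW SDK's motor modality here (see the BOW SDK documentation).
            time.sleep(0.1)
```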
openai_brain.py
This file contains the implementation used to communicate with the assistant and handle its function calls.
Code Breakdown
Coming Soon!