YOLO Object Detection
This tutorial will guide you through the process of performing object detection using the Ultralytics YOLOv8 model on images obtained from your robot's cameras.
This tutorial can be run with any robot that has the vision modality; some examples are:
- InMotion Robotics - Lite3
- Unitree - Go2
- Softbank Robotics - Nao
- PAL Robotics - Tiago
Prerequisites
Before trying these tutorials, make sure you have followed the instructions from the dependencies step to set up the development environment for your chosen programming language.
These tutorials also assume you have installed the BOW Hub, available for download from https://bow.software, and that you have subscribed with a Standard Subscription (or above) or are using the 30-day free trial, which is required to simulate robots.
What is the aim of this tutorial?
The purpose of this tutorial is to demonstrate how to use the BOW vision modality to perform object detection on the images from a robot's cameras. It therefore gives a demonstration of how to obtain images, pass them into an external tool, and display and annotate the images using OpenCV.
In this case the object detection is performed by the Ultralytics implementation of the YOLOv8 model. This model comes pre-trained on the COCO (Common Objects in Context) dataset and as such is capable of detecting any object in the labels associated with this dataset. To expand on this you could try using a different detection algorithm, such as those built into OpenCV, or another external tool.
Sense
- Connect to a robot by calling QuickConnect
- Get the images sampled by the robot by calling GetModality("vision")
Decide
- Using the YOLO model, decide which objects are visible in the image
Act
- Draw a box around the detected object and label it
- Display the image
Preparation
If you are planning to run this tutorial in the simulator, then we recommend you add some objects from the COCO dataset to the world for your robot to detect. Some that are readily available in Webots are:
- humans/pedestrian/Pedestrian
- objects/animals/Cat
- objects/animals/Sheep
- objects/traffic/StopSign
Similarly, if you are running this tutorial on a physical robot then make sure you have some objects to detect for testing. Potted plants, cups, bottles, mobile phones and of course yourself are all great for testing.
Running the tutorial
Simply run the main.py script in the "Python/Vision_Object_Detection" directory to begin the tutorial.
An OpenCV image window will appear containing images from the first camera supplied by the robot. When an object is detected by the YOLO model, a green box will appear around it alongside a label of the object's classification. Bring new objects into view and see how well the off-the-shelf YOLO detection works.
Code Breakdown
Imports
We start by importing all of our required Python modules. The BOW client modules are required for all BOW Python programs.
cv2 is the OpenCV Python module, which we use for the computer vision processes, in this case displaying the images and annotating them with detections.
The ultralytics module provides the Ultralytics YOLO implementation which we are using for object detection.
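As a rough sketch, the imports might look like the following. The BOW client module name used here is an assumption based on the description above; check the accompanying main.py for the exact import.

```python
# Sketch of the imports described above; the BOW module name is an assumption.
import sys
import logging

import cv2                    # OpenCV, used to display and annotate the images
from ultralytics import YOLO  # Ultralytics YOLOv8 implementation

import bow_client as bow      # assumed name of the BOW client module
```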
Connecting to our robot
We start by creating a logger for our robot connection and then print the BOW client version information.
We use the quick_connect function to connect to the robot. This function will automatically connect to the robot selected in your BOW Hub at runtime. We pass the logger we just made and the list of modalities we wish to use, in this case only "vision" as we are only interested in receiving camera images.
We then check that the connection was successful and exit the program if it was not.
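A sketch of this connection step is shown below. The quick_connect signature, the version call and the error fields used here are assumptions based on the description above rather than a definitive reproduction of the BOW API.

```python
# Illustrative connection step; function and field names are assumptions.
logger = logging.getLogger("yolo_object_detection")
logger.setLevel(logging.INFO)

logger.info(bow.version())  # assumed call to print the BOW client version information

# Connect to the robot currently selected in your BOW Hub, requesting only "vision"
robot, error = bow.quick_connect(logger, ["vision"])  # assumed signature
if not error.Success:
    logger.error("Could not connect to robot: %s", error.Description)
    sys.exit(1)
```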
Setup
We create a flag which is used to exit the main loop in the case of user interruption.
We load a YOLO model from the imported module. In this case we use 'yolov8n.pt', a model pretrained on the COCO dataset. On first run this may take a short while to download, but it will be stored locally for subsequent use.
We then define the colour used for our image annotations in blue, green, red order. In this case it's the trademark BOW green.
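These setup steps might look roughly like this; the exact BGR value used for the BOW green is illustrative.

```python
stop_flag = False  # set to True to leave the main loop on user interruption

# Load the pretrained YOLOv8 nano model; downloaded on first run, cached afterwards
model = YOLO("yolov8n.pt")

# Annotation colour in (blue, green, red) order; the exact green value is illustrative
annotation_colour = (0, 255, 0)
```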
The main loop
The whole main loop is wrapped in a try/except block, which destroys the image window and exits the loop by setting the flag to True in the case of a keyboard interrupt (Ctrl-C) or if the program is closed.
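Structurally the loop looks something like the sketch below, with the sense, decide and act steps described in the rest of this section filling the body of the try block.

```python
while not stop_flag:
    try:
        # ... get images, run detection, draw and display (see the steps below) ...
        pass
    except (KeyboardInterrupt, SystemExit):
        # Clean up the display and signal the loop to stop
        cv2.destroyAllWindows()
        stop_flag = True
```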
We call the get_modality function on the vision channel in order to obtain the list of image samples from the robot. We then test for success and that the returned list is not empty; if this test fails, we begin the loop again.
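A sketch of this step, assuming get_modality returns the list of image samples together with an error object (the exact signature and field names are assumptions):

```python
# Inside the main loop; the signature and field names are assumptions
image_samples, err = robot.get_modality("vision")
if not err.Success or len(image_samples) == 0:
    continue  # nothing to process this iteration, so begin the loop again
```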
We obtain the first image sample from the list and test that it contains valid data. In this simple example we always use the first camera in the list; however, you could query the list of image samples in order to choose whichever camera you wanted, for example using the "source" field for the name of the camera, or using the "Transform.Position" field to select an image based on its location on the robot.
We are only interested in the image data from this sample. This data is already in the form of an OpenCV mat and so does not need any conversion before being worked with.
We then pass this image into the YOLO model, which returns a list of predictions about the objects in the image in the form of Tensors.
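Putting these two steps together might look like the following. The image sample field name used here is an assumption, while the model call itself matches the current Ultralytics API.

```python
# Take the first camera's sample; the field name holding the image is an assumption
image_sample = image_samples[0]
if image_sample.Image is None:
    continue

frame = image_sample.Image  # already an OpenCV-compatible image, no conversion needed
results = model(frame)      # run YOLOv8 inference; returns a list of Results objects
```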
First we check that there is at least one detected object; if so, we begin iterating through these objects in order to draw them onto the image.
The two pieces of information we require in this case are the location of the detection in the frame and the predicted classification of this detection. The xyxy.numpy() call returns the coordinates of the top left and bottom right corners of the object's bounding box in a NumPy array, a format that we can easily work with. The classification is obtained by looking up the value in the first element of the data tensor in the model.names list, which contains all of the classifications associated with this model.
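With the current Ultralytics API, one way to pull out those two pieces of information for each detection is:

```python
boxes = results[0].boxes
if len(boxes) > 0:
    for box in boxes:
        # Top left and bottom right corners of the bounding box as integers
        x1, y1, x2, y2 = [int(v) for v in box.xyxy[0].numpy()]
        # Class index looked up in model.names to get a human-readable label
        label = model.names[int(box.cls[0])]
```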
We draw the bounding box of the object onto the image in our chosen colour using the rectangle function from OpenCV.
Next we specify the location to write the classification label on the image. As standard we define this as 10 pixels above the top left of the box; however, we then check if we are within 10 pixels of the top of the image, and if so we instead set the location to 20 pixels below the bottom left of the box. We then use this location alongside our chosen colour in the OpenCV putText function, which draws the text onto the image at this location.
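Drawing the annotation then follows the logic just described:

```python
# Draw the bounding box in the chosen colour
cv2.rectangle(frame, (x1, y1), (x2, y2), annotation_colour, 2)

# Label 10 pixels above the top left of the box, unless the box is within
# 10 pixels of the top of the image, in which case place it 20 pixels below
# the bottom left of the box
if y1 > 10:
    text_origin = (x1, y1 - 10)
else:
    text_origin = (x1, y2 + 20)
cv2.putText(frame, label, text_origin, cv2.FONT_HERSHEY_SIMPLEX, 0.75, annotation_colour, 2)
```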
We then use the imshow function to display the resulting image in a window. The camera name obtained from the Source field of the image sample is used as the window title.
Finally, OpenCV requires a small delay following an imshow call in order to render the image. In this case we use the waitKey function to delay for 1 millisecond and obtain any key press. We also use this as a method of testing for the escape key (27) and break from the main loop if it has been pressed.
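The display step might look like this, assuming the camera name is available as the sample's Source field:

```python
# Display the annotated image, titled with the camera name (field name is an assumption)
cv2.imshow(image_sample.Source, frame)

# waitKey gives OpenCV time to render and returns any key press;
# 27 is the escape key, which we use to leave the main loop
if cv2.waitKey(1) == 27:
    break
```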
Exit
The last required actions are to clean everything up following the exit of the program. We ensure the image windows have been removed, disconnect from the robot and then close our client.
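A sketch of this cleanup, where the disconnect and client-close calls are assumptions based on the description above:

```python
# Final cleanup; the BOW method names are assumptions
cv2.destroyAllWindows()        # make sure all image windows have been removed
robot.disconnect()             # assumed: disconnect from the robot
bow.close_client_interface()   # assumed: close the BOW client
```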