YOLO Object Detection
This tutorial will guide you through the process of performing object detection using the Ultralytics YOLOv8 model on images obtained from your robot's cameras.
This tutorial can be run with any robot that has the vision modality; some examples are:
- InMotion Robotics - Lite3
- Unitree - Go2
- Softbank Robotics - Nao
- PAL Robotics - Tiago
Prerequisites
Before trying these tutorials, make sure you have followed the instructions from the dependencies step to set up the development environment for your chosen programming language.
These tutorials also assume you have installed the BOW Hub, available for download from https://bow.software, and that you have subscribed with a Standard Subscription (or above) or are using the 30-day free trial, which is required to simulate robots.
What is the aim of this tutorial?
The purpose of this tutorial is to demonstrate how to use the BOW vision modality to perform object detection on the images from a robot's cameras. It therefore gives a demonstration of how to obtain images, pass them into an external tool, and display and annotate the images using OpenCV.
In this case the object detection is performed by the Ultralytics implementation of the YOLOv8 model. This model comes pre-trained on the COCO (Common Objects in Context) dataset and as such is capable of detecting any object in the labels associated with this dataset. To expand on this you could try using a different detection algorithm, such as those built into OpenCV, or another external tool.
Sense
- Connect to a robot by calling QuickConnect
- Get the images sampled by the robot by calling GetModality("vision")
Decide
- Using the YOLO model, decide which objects are visible in the image
Act
- Draw a box around the detected object and label it
- Display the image
Preparation
If you are planning to run this tutorial in the simulator, then we recommend you add some objects from the COCO dataset to the world for your robot to detect. Some that are readily available in Webots are:
- humans/pedestrian/Pedestrian
- objects/animals/Cat
- objects/animals/Sheep
- objects/traffic/StopSign
Similarly, if you are running this tutorial on a physical robot then make sure you have some objects to detect for testing. Potted plants, cups, bottles, mobile phones and of course yourself are all great for testing.
Running the tutorial
Simply run the main.py script in the "Python/Vision_Object_Detection" directory to begin the tutorial.
An OpenCV image window will appear containing images from the first camera supplied by the robot. When an object is detected by the YOLO model, a green box will appear around it alongside a label of the object's classification. Bring new objects into view and see how well the off-the-shelf YOLO detection works.
Code Breakdown
Imports
We start by importing all of our required Python modules. The BOW client modules are required for all BOW Python programs.
cv2 is the OpenCV Python module, which we use for the computer vision processes, in this case displaying the images and annotating them with detections.
The ultralytics module provides the Ultralytics YOLO implementation which we are using for object detection.
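As a rough sketch, the imports might look like the following. The BOW client module name used here is an assumption based on the description above; check the accompanying main.py for the exact import.

```python
# Sketch of the imports described above; the BOW module name is an assumption.
import sys
import logging

import cv2                    # OpenCV, used to display and annotate the images
from ultralytics import YOLO  # Ultralytics YOLOv8 implementation

import bow_client as bow      # assumed name of the BOW client module
```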
Connecting to our robot
We start by creating a logger for our robot connection and then print the BOW client version information.
We use the quick_connect function to connect to the robot. This function will automatically connect to the robot selected in your BOW Hub at runtime. We pass the logger we just made and the list of modalities we wish to use, in this case only "vision" as we are only interested in receiving camera images.
We then check that the connection was successful and exit the program if it was not.
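A sketch of this connection step is shown below. The quick_connect signature, the version call and the error fields used here are assumptions based on the description above rather than a definitive reproduction of the BOW API.

```python
# Illustrative connection step; function and field names are assumptions.
logger = logging.getLogger("yolo_object_detection")
logger.setLevel(logging.INFO)

logger.info(bow.version())  # assumed call to print the BOW client version information

# Connect to the robot currently selected in your BOW Hub, requesting only "vision"
robot, error = bow.quick_connect(logger, ["vision"])  # assumed signature
if not error.Success:
    logger.error("Could not connect to robot: %s", error.Description)
    sys.exit(1)
```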
Setup
We create a flag which is used to exit the main loop in the case of user interruption.
We load a YOLO model from the imported module. In this case we use 'yolov8n.pt', a model pretrained on the COCO dataset. On first run this may take a short while to download, but it will be stored locally for subsequent use.
We then define the colour used for our image annotations in blue, green, red order. In this case it's the trademark BOW green.
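These setup steps might look roughly like this; the exact BGR value used for the BOW green is illustrative.

```python
stop_flag = False  # set to True to leave the main loop on user interruption

# Load the pretrained YOLOv8 nano model; downloaded on first run, cached afterwards
model = YOLO("yolov8n.pt")

# Annotation colour in (blue, green, red) order; the exact green value is illustrative
annotation_colour = (0, 255, 0)
```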
The main loop
The whole main loop is wrapped in a try/except block, which destroys the image window and exits the loop by setting the flag to True in the case of a keyboard interrupt (Ctrl-C) or if the program is closed.
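Structurally the loop looks something like the sketch below, with the sense, decide and act steps described in the rest of this section filling the body of the try block.

```python
while not stop_flag:
    try:
        # ... get images, run detection, draw and display (see the steps below) ...
        pass
    except (KeyboardInterrupt, SystemExit):
        # Clean up the display and signal the loop to stop
        cv2.destroyAllWindows()
        stop_flag = True
```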
We call the get_modality function on the vision channel in order to obtain the list of image samples from the robot. We then test for success and that the returned list is not empty; if this test fails, we begin the loop again.
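A sketch of this step, assuming get_modality returns the list of image samples together with an error object (the exact signature and field names are assumptions):

```python
# Inside the main loop; the signature and field names are assumptions
image_samples, err = robot.get_modality("vision")
if not err.Success or len(image_samples) == 0:
    continue  # nothing to process this iteration, so begin the loop again
```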
We obtain the first image sample from the list and test that it contains valid data. In this simple example we always use the first camera in the list; however, you could query the list of image samples in order to choose whichever camera you wanted, for example using the "source" field for the name of the camera, or using the "Transform.Position" field to select an image based on its location on the robot.
We are only interested in the image data from this sample. This data is already in the form of an OpenCV mat and so does not need any conversion before being worked with.
We then pass this image into the YOLO model, which returns a list of predictions about the objects in the image in the form of Tensors.
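Putting these two steps together might look like the following. The image sample field name used here is an assumption, while the model call itself matches the current Ultralytics API.

```python
# Take the first camera's sample; the field name holding the image is an assumption
image_sample = image_samples[0]
if image_sample.Image is None:
    continue

frame = image_sample.Image  # already an OpenCV-compatible image, no conversion needed
results = model(frame)      # run YOLOv8 inference; returns a list of Results objects
```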
First we check that there is at least one detected object; if so, we begin iterating through these objects in order to draw them onto the image.
The two pieces of information we require in this case are the location of the detection in the frame and the predicted classification of this detection. The xyxy.numpy() call returns the coordinates of the top left and bottom right corners of the object's bounding box in a NumPy array, a format that we can easily work with. The classification is obtained by looking up the value in the first element of the data tensor in the model.names list, which contains all of the classifications associated with this model.
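With the current Ultralytics API, one way to pull out those two pieces of information for each detection is:

```python
boxes = results[0].boxes
if len(boxes) > 0:
    for box in boxes:
        # Top left and bottom right corners of the bounding box as integers
        x1, y1, x2, y2 = [int(v) for v in box.xyxy[0].numpy()]
        # Class index looked up in model.names to get a human-readable label
        label = model.names[int(box.cls[0])]
```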
We draw the bounding box of the object onto the image in our chosen colour using the rectangle function from OpenCV.
Next we specify the location to write the classification label on the image. As standard we define this as 10 pixels above the top left of the box; however, we then check if we are within 10 pixels of the top of the image, and if so we instead set the location to 20 pixels below the bottom left of the box. We then use this location alongside our chosen colour in the OpenCV putText function, which draws the text onto the image at this location.
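Drawing the annotation then follows the logic just described:

```python
# Draw the bounding box in the chosen colour
cv2.rectangle(frame, (x1, y1), (x2, y2), annotation_colour, 2)

# Label 10 pixels above the top left of the box, unless the box is within
# 10 pixels of the top of the image, in which case place it 20 pixels below
# the bottom left of the box
if y1 > 10:
    text_origin = (x1, y1 - 10)
else:
    text_origin = (x1, y2 + 20)
cv2.putText(frame, label, text_origin, cv2.FONT_HERSHEY_SIMPLEX, 0.75, annotation_colour, 2)
```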
We then use the imshow function to display the resulting image in a window. The camera name obtained from the Source field of the image sample is used as the window title.
Finally, OpenCV requires a small delay following an imshow call in order to render the image. In this case we use the waitKey function to delay for 1 millisecond and obtain any key press. We also use this as a method of testing for the escape key (27) and break from the main loop if it has been pressed.
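The display step might look like this, assuming the camera name is available as the sample's Source field:

```python
# Display the annotated image, titled with the camera name (field name is an assumption)
cv2.imshow(image_sample.Source, frame)

# waitKey gives OpenCV time to render and returns any key press;
# 27 is the escape key, which we use to leave the main loop
if cv2.waitKey(1) == 27:
    break
```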
Exit
The last required actions are to clean everything up following the exit of the program. We ensure the image windows have been removed, disconnect from the robot and then close our client.
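A sketch of this cleanup, where the disconnect and client-close calls are assumptions based on the description above:

```python
# Final cleanup; the BOW method names are assumptions
cv2.destroyAllWindows()        # make sure all image windows have been removed
robot.disconnect()             # assumed: disconnect from the robot
bow.close_client_interface()   # assumed: close the BOW client
```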