ROS 2 node for open-vocabulary object detection using NanoOWL.
NanoOWL optimizes OWL-ViT to run in real time on NVIDIA Jetson Orin with TensorRT. This project provides a ROS 2 package for object detection using NanoOWL.
- Set up your Isaac ROS development environment following instructions here.
- Clone required projects under ${ISAAC_ROS_WS}/src:
cd ${ISAAC_ROS_WS}/src
git clone https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_common.git
git clone https://github.com/NVIDIA-AI-IOT/ROS2-NanoOWL.git
git clone https://github.com/NVIDIA-AI-IOT/nanoowl
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
git clone --branch humble https://github.com/ros2/demos.git
- Launch the docker container using the run_dev.sh script:
cd ${ISAAC_ROS_WS}/src/isaac_ros_common
./scripts/run_dev.sh
- Install dependencies:
- PyTorch: The Isaac ROS development environment that we set up in step 1 comes with PyTorch preinstalled. To check your PyTorch version, run python from a terminal and enter these commands in the interactive Python interpreter:
import torch
torch.__version__
- NVIDIA TensorRT: If you’re developing on an NVIDIA Jetson, TensorRT is preinstalled as part of JetPack. Verify the installation by running python from terminal, and then this command in the interactive Python interpreter:
import tensorrt
If it says ‘ModuleNotFound’, try the following command and check again following the steps above:
sudo apt-get install python3-libnvinfer-dev
If this fails, run the following command and try again:
sudo apt-get install apt-utils
In case the ‘ModuleNotFound’ error still shows up: the Python bindings to tensorrt are available in dist-packages, which may not be visible to your environment. We add dist-packages to PYTHONPATH to make this work:
export PYTHONPATH=/usr/lib/python3.8/dist-packages:$PYTHONPATH
If tensorrt is still not installed, try the following command:
pip install pycuda
- Torchvision: Identify which version of torchvision is compatible with your PyTorch version from here. Clone and install that specific version from source in your workspace's src folder:
git clone --branch <version> https://github.com/pytorch/vision.git
For example:
cd ${ISAAC_ROS_WS}/src
git clone --branch v0.13.0 https://github.com/pytorch/vision.git
cd vision
pip install .
Verify that torchvision has been installed correctly using the interactive Python interpreter by running python from terminal, and these commands:
cd ../
import torchvision
torchvision.__version__
If it says ‘ModuleNotFound’, try each of the following and check again following the steps above:
sudo apt install nvidia-cuda-dev
pip install ninja
sudo apt-get install ninja-build
- Transformers library:
pip install transformers
- Matplotlib:
pip install matplotlib
- torch2trt:
Enter the torch2trt repository cloned in step 2 and install the package:
cd ${ISAAC_ROS_WS}/src/torch2trt
pip install .
- NanoOWL:
Enter the NanoOWL repository cloned in step 2 and install the package:
cd ${ISAAC_ROS_WS}/src/nanoowl
pip install .
- cam2image:
We want to use the image_tools package from the demos repository that we cloned to take input from an attached USB camera. Build and source this package from your workspace:
cd ${ISAAC_ROS_WS}
colcon build --symlink-install --packages-select image_tools
source install/setup.bash
Verify that the cam2image node works by running the following command in a terminal and viewing topic /image in RViz/Foxglove from another terminal:
ros2 run image_tools cam2image
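The individual import checks above can also be run in one pass. The following is a minimal sketch, not part of this repository: the helper name find_missing is hypothetical, and the module list assumes that the Python import names match the packages installed in this step.

```python
import importlib.util

# Import names of the Python dependencies installed in this step.
# Assumption: each package is importable under the name listed here.
REQUIRED_MODULES = [
    "torch",
    "tensorrt",
    "torchvision",
    "transformers",
    "matplotlib",
    "torch2trt",
    "nanoowl",
]


def find_missing(modules):
    """Return the module names that cannot be located on the current sys.path."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If tensorrt is reported missing even though it is installed, the PYTHONPATH change described in the TensorRT item is the first thing to check, since this lookup only sees directories on sys.path.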
- Build ros2_nanoowl:
cd ${ISAAC_ROS_WS}
colcon build --symlink-install --packages-select ros2_nanoowl
source install/setup.bash
- Build the TensorRT engine for the OWL-ViT vision encoder - this step may take a few minutes:
cd ${ISAAC_ROS_WS}/src/nanoowl
mkdir -p data
python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine
Copy this data folder with the generated engine file to the ROS2-NanoOWL folder:
cp -r data/ ${ISAAC_ROS_WS}/src/ROS2-NanoOWL
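Before launching, it can help to confirm that the engine file was actually generated and copied. The following is a small sketch under the assumption that an existing, non-empty file is a good enough indicator; engine_ready is a hypothetical helper, and it does not validate that TensorRT can load the engine.

```python
from pathlib import Path


def engine_ready(engine_path):
    """Return True if engine_path points to an existing, non-empty file."""
    path = Path(engine_path).expanduser()
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    # Path relative to ${ISAAC_ROS_WS}, matching the copy command above.
    engine = "src/ROS2-NanoOWL/data/owl_image_encoder_patch32.engine"
    print(engine, "ready" if engine_ready(engine) else "missing or empty")
```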
- Run the image publisher node to publish input images for inference. We can use the sample image in ${ISAAC_ROS_WS}/src/nanoowl/assets/:
cd ${ISAAC_ROS_WS}
ros2 run image_publisher image_publisher_node src/nanoowl/assets/owl_glove_small.jpg --ros-args --remap /image_raw:=/input_image
- You can also play a rosbag for inference. Make sure to remap the image topic to input_image. For example:
ros2 bag play <path-to-rosbag> --remap /front/stereo_camera/left/rgb:=/input_image
- From another terminal, publish your input query as a list of objects on the input_query topic using the command below. This query can be changed anytime while the ros2_nanoowl node is running to detect different objects. Another way to publish your query is through the publish panel in Foxglove (instructions given below in this repository).
ros2 topic pub /input_query std_msgs/String 'data: a person, a box, a forklift'
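The query is a single string holding a comma-separated list of objects. As an illustration of that format only, the sketch below splits such a string into individual prompts. The function parse_query is hypothetical, and how the ros2_nanoowl node tokenizes the string internally is an assumption here, not something taken from its source.

```python
def parse_query(data):
    """Split a comma-separated query string into a list of object prompts."""
    # Assumption: prompts are separated by commas and surrounding spaces are ignored.
    return [item.strip() for item in data.split(",") if item.strip()]


if __name__ == "__main__":
    print(parse_query("a person, a box, a forklift"))
```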
- Run the launch file to start detecting objects. Find more information on usage and arguments below:
ros2 launch ros2_nanoowl nano_owl_example.launch.py thresholds:=0.1 image_encoder_engine:='src/ROS2-NanoOWL/data/owl_image_encoder_patch32.engine'
- The ros2_nanoowl node prints the current query to terminal, so you can check that your most recent query is being used:
If an older query is being published, please update it:
- If using Foxglove: Check that the query on the panel is correct and click the Publish button again. Remember to click the Publish button every time you update your query!
- If using command line: Rerun the ros2 topic pub command (given in step 9) with the updated query.
- Visualize output on topic /output_image using RViz or Foxglove. Output bounding boxes are published on topic /output_detections.
- To perform inference on a live camera stream, run the following launch file. Publish a query as given in step 9:
ros2 launch ros2_nanoowl camera_input_example.launch.py thresholds:=0.1 image_encoder_engine:='src/ROS2-NanoOWL/data/owl_image_encoder_patch32.engine'
ros2 launch ros2_nanoowl nano_owl_example.launch.py thresholds:=<threshold-value> image_encoder_engine:=<path-to-encoder-engine>
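If you start the node from a script, the same command can be assembled programmatically. This is a sketch only: build_launch_command is a hypothetical helper, and it just builds the argument list shown above without running it.

```python
import shlex


def build_launch_command(thresholds, image_encoder_engine,
                         launch_file="nano_owl_example.launch.py"):
    """Build the ros2 launch argument list for the ros2_nanoowl package."""
    return [
        "ros2", "launch", "ros2_nanoowl", launch_file,
        f"thresholds:={thresholds}",
        f"image_encoder_engine:={image_encoder_engine}",
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        0.1, "src/ROS2-NanoOWL/data/owl_image_encoder_patch32.engine"
    )
    # Print the command; pass cmd to subprocess.run(cmd) to actually launch it.
    print(shlex.join(cmd))
```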
ROS Parameter | Type | Default | Description |
---|---|---|---|
thresholds | float | 0.1 | Threshold for filtering detections |
image_encoder_engine | string | "src/ROS2-NanoOWL/data/owl_image_encoder_patch32.engine" | Path to the TensorRT engine for the OWL-ViT vision encoder |
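To illustrate what the thresholds parameter does conceptually, here is a minimal sketch of score-based filtering. The detection dictionaries and the filter_detections helper are hypothetical, and whether NanoOWL keeps detections exactly at the threshold is an assumption.

```python
def filter_detections(detections, threshold=0.1):
    """Keep only detections whose confidence score reaches the threshold."""
    # Assumption: a detection exactly at the threshold is kept.
    return [d for d in detections if d["score"] >= threshold]


if __name__ == "__main__":
    # Hypothetical detections, for illustration only.
    sample = [
        {"label": "a person", "score": 0.42},
        {"label": "a box", "score": 0.07},
        {"label": "a forklift", "score": 0.15},
    ]
    for det in filter_detections(sample, threshold=0.1):
        print(det["label"], det["score"])
```

A lower threshold keeps more candidate boxes, including weaker matches, while a higher one keeps only the most confident detections.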
ROS Topic | Interface | Description |
---|---|---|
input_image | sensor_msgs/Image | The image on which detection is to be performed |
input_query | std_msgs/String | List of objects to be detected in the image |
ROS Topic | Interface | Description |
---|---|---|
output_image | sensor_msgs/Image | The output image with bounding boxes and labels around detected objects |
output_detections | vision_msgs/Detection2DArray | Output detections including bounding box coordinates and label information for each detected object in the image |
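The bounding boxes in vision_msgs/Detection2DArray messages are stored as a center point plus a width and height (size_x, size_y). If you need corner coordinates, for example to draw the boxes yourself, a conversion along these lines may help. This is a sketch with a hypothetical helper, bbox_to_corners; how you read the center fields from the message depends on your vision_msgs version, so the function takes plain numbers.

```python
def bbox_to_corners(center_x, center_y, size_x, size_y):
    """Convert a center/size box to (x_min, y_min, x_max, y_max) corners."""
    half_w = size_x / 2.0
    half_h = size_y / 2.0
    return (
        center_x - half_w,
        center_y - half_h,
        center_x + half_w,
        center_y + half_h,
    )


if __name__ == "__main__":
    # A 20x10 box centered at (50, 40), in pixels.
    print(bbox_to_corners(50, 40, 20, 10))
```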
- Download and install Foxglove on your Jetson.
- Open Foxglove and click on Open connection.
- Click on the Foxglove WebSocket option. It tells you to connect to your system using the Foxglove WebSocket protocol. This option requires running an extra ROS node called the foxglove_bridge.
- Follow instructions on installing and launching the Foxglove bridge.
- Once you’ve successfully launched foxglove_bridge in a terminal, Foxglove should connect to your system and show the default layout.
- Use the Import from file option to import the NanoOWL_Layout.json file included in this repository.
- From the panel at the bottom, you can publish and update queries to the ros2_nanoowl node. Type in the objects you want to detect and click on the red Publish button to start inference!