Traffic Segmentation Using Ultra96-V2 and Vitis-AI

About the project

This is a real-time demo running on the Ultra96-V2 that segments a live YouTube video stream from Shibuya Crossing, Tokyo.

Project info

Difficulty: Difficult

Platforms: Avnet, Xilinx

Estimated time: 4 days

License: Apache License 2.0 (Apache-2.0)

Items used in this project

Hardware components

Avnet Ultra96-V2 x 1

Software apps and online services

Xilinx Vitis AI 1.2.1
Xilinx Petalinux 2020.1
Xilinx Vitis 2020.2



Traffic congestion has become a major problem for almost every large metropolitan area, and an intelligent traffic control system is needed to keep transportation reliable. In this project, I demonstrate a method for collecting data to determine road traffic congestion using image processing and a pre-trained FPN model.

Ultra96-V2 Board Setup

A pre-built SD card image is available for download; it includes models compiled for the B2304 (1 x DPU) low-RAM-usage configuration. The image is flashed to the SD card using balenaEtcher. After powering on and booting, the Ultra96-V2 board can be reached over Wi-Fi (look for an SSID similar to Ultra96V2). From the board's default web interface, Wi-Fi can be configured to join the local network (home/office router) so the board has internet access.


I used the Xilinx Vitis-AI development stack for inferencing, with the Vitis AI Runtime (VART) API driving a Python inferencing script. The code can be found in the GitHub repository mentioned in the code section at the end.

The workflow reads a YouTube live stream from a video camera mounted at Shibuya Crossing, Tokyo, and performs segmentation using a Caffe FPN model from the Vitis AI Model Zoo. Although the C++ API seems faster, I chose the Python API for the demonstration because the code is easier for beginners to follow. Since the Ultra96-V2 has Wi-Fi, I use it as an edge device with no camera or monitor attached. The Ultra96-V2 has no hardware video decoder, so OpenCV processes the incoming video stream, and the output is streamed over the local network using GStreamer. The script uses four threads: two for DPU inferencing tasks, one for capturing the live stream, and one for streaming the output.
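The four-thread layout described above can be sketched with Python's threading and queue modules. The capture source and DPU workers below are stand-ins (the real script uses OpenCV, VART, and GStreamer); only the queue wiring is illustrated:

```python
import queue
import threading

capture_q = queue.Queue(maxsize=8)  # frames waiting for the DPU
output_q = queue.Queue()            # segmented frames waiting to be streamed

def capture(num_frames):
    # Stand-in for the OpenCV capture thread reading the live stream.
    for i in range(num_frames):
        capture_q.put(i)
    for _ in range(2):              # one sentinel per DPU worker
        capture_q.put(None)

def dpu_worker():
    # Stand-in for a VART DPU inferencing thread.
    while True:
        frame = capture_q.get()
        if frame is None:
            break
        output_q.put(("segmented", frame))

def run_pipeline(num_frames=16):
    threads = [threading.Thread(target=capture, args=(num_frames,))]
    threads += [threading.Thread(target=dpu_worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A fourth thread would drain output_q and push frames to GStreamer.
    return [output_q.get() for _ in range(output_q.qsize())]

results = run_pipeline()
```

Bounding the capture queue keeps memory in check when the DPU falls behind the live stream: the capture thread blocks instead of buffering unboundedly.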

The Ultra96-V2 comes with a heatsink that works well, but multithreaded inferencing makes the chip run hot and the CPU throttles, so I placed the board on a table-fan cover for better cooling.

An FPN (Feature Pyramid Network) model pre-trained on the Cityscapes dataset is used for the segmentation task.
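For visualization, each class id in the model's output map is typically mapped to the conventional Cityscapes color palette. A minimal pure-Python sketch (the palette values are the standard Cityscapes train-id colors, trimmed to a few classes here; this is illustrative, not the project's actual post-processing code):

```python
# Conventional Cityscapes train-id colors (RGB); a subset for illustration.
CITYSCAPES_PALETTE = {
    0: (128, 64, 128),   # road
    1: (244, 35, 232),   # sidewalk
    2: (70, 70, 70),     # building
    8: (107, 142, 35),   # vegetation
    10: (70, 130, 180),  # sky
    11: (220, 20, 60),   # person
    13: (0, 0, 142),     # car
}

def colorize(mask):
    """Map a 2-D grid of class ids to RGB rows (unknown ids -> black)."""
    return [[CITYSCAPES_PALETTE.get(c, (0, 0, 0)) for c in row] for row in mask]

demo_mask = [[0, 13], [10, 11]]   # road, car / sky, person
demo_rgb = colorize(demo_mask)
```

In the real script this lookup would run over the DPU's per-pixel argmax output (e.g. with a NumPy fancy-index for speed) before the frame is streamed out.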

Real-time Inferencing

Please follow the steps below to run the inferencing demo.

Connect to the Ultra96-V2 board using ssh (the default user and password are both root):

  1. $ ssh root@ultra96v2-ip-address

Clone the repository and install prerequisites:

  1. $ git clone
  2. $ cd AdaptiveComputing
  3. $ pip3 install pafy

Please change the following variable in the script to match your target machine's IP address:

ip      = ''

Run the inferencing script:

  1. $ python3

Install GStreamer on the target machine and run the following command:

  1. $ gst-launch-1.0 udpsrc port=1234 ! application/x-rtp,encoding-name=JPEG,payload=26 ! rtpjpegdepay ! jpegdec ! autovideosink
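On the board side, a matching sender pipeline must wrap JPEG frames in RTP and push them to the target over UDP. The sketch below is an assumption inferred from the receiver command above (rtpjpegdepay ! jpegdec), not the original script; the host, port, and frame size are placeholder values:

```python
def build_sender_pipeline(host, port=1234):
    """GStreamer pipeline string that sends BGR frames as RTP/JPEG over UDP.

    Mirrors the receiver command (udpsrc ! rtpjpegdepay ! jpegdec); the
    host/port here are illustrative, not taken from the original script.
    """
    return (
        "appsrc ! videoconvert ! jpegenc ! rtpjpegpay "
        f"! udpsink host={host} port={port}"
    )

pipeline = build_sender_pipeline("192.168.1.10")

# Typical use (requires OpenCV built with GStreamer support; not run here):
# import cv2
# out = cv2.VideoWriter(pipeline, cv2.CAP_GSTREAMER, 0, 15.0, (512, 256), True)
# out.write(frame)
```

Sending MJPEG over RTP keeps CPU load modest on the Ultra96-V2, which matters since the board has no hardware video encoder either.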

I would like to thank Avnet for providing the Ultra96-V2 development board.




About the author

knaveen: Bioinformatician, researcher, programmer, maker, and community contributor at Machine Learning Tokyo

