Real-time motion detection in video on Zynq FPGA

UPDATE 23.2.2017: Since I no longer work with Xilinx products (more than 3 years!) I cannot provide any updates or support regarding the information given below. You may continue reading at your own risk!


Since I am leaving my current employer UTIA by the end of the year, and thus will likely not be working with FPGAs any more (at least not for a living), I wanted to implement one larger design in Zynq to enjoy the technology for the last time 😉

The selected application is real-time motion detection in a video stream. We had previously implemented it in the SMECY project on a Spartan-6 FPGA using the master/worker abstraction. That original Spartan-6 design achieved 5 FPS at best, and 3 FPS when other features (morphology, labeling) were included.

Here I used the Zynq XC7Z020 FPGA on the ZC702 board with the IMAGEON daughter card. No code is reused from the SMECY solution. The video pipeline is realized using AXI Streams, HDMI is used for both input and output, and the accelerator was implemented using Vivado HLS (high-level synthesis from C). The synthesis tool is Vivado 2013.3 with IP Integrator (the replacement for XPS).

One possible practical application of the motion detection is in smart cameras for surveillance (security, safety) use — see the second youtube video below. The HDMI input would be replaced with a camera interface and the FPGA system could be integrated in the camera module.

Below are demonstration videos. The application runs at 8.2 FPS with one accelerator, and 14 FPS with two accelerators (the two-accelerator configuration is not shown in the videos).

Video input and output are via HDMI on the IMAGEON extension card. The input 1080p video is fed over HDMI from a PC running Ubuntu. The output is 1280×720 to a monitor. The top-right corner of the output image carries the 640×480 region that is also the input to the motion detection; the resulting 640×480 black-and-white mask is displayed next to it.

Motion detection algorithm

The system implements real-time video motion detection, sometimes also called foreground/background pixel segmentation. The algorithm is derived from a paper by KaewTraKulPong and Bowden; the implementation omits the shadow detection and has several modifications intended to lower the computational complexity.

The goal of the image segmentation is to mark each pixel in an image frame as a part of either the static background or the moving foreground. The decision depends on statistical models and their mixtures. All pixels in the image are treated independently. Each pixel is modelled by a mixture of the K strongest Gaussian background models, K=4 in the implementation. Each Gaussian model k is defined by a set of three mean values \mu_{R,k}, \mu_{G,k}, \mu_{B,k}, corresponding to the three primary colours red, green and blue; by a variance \sigma_k; and by a weight w_k. The models represent RGB colours that are considered 'stationary background' colours of the pixel. As K=4 independent Gaussian models are kept for each pixel, the algorithm handles situations where the pixel periodically changes between two colours, such as moving escalators or trees in the wind; these scenes are classified as stationary. The weight parameter w_k of each model indicates how often that particular model successfully described the background in the pixel.


The picture above shows how the algorithm updates the models; for simplicity the presentation ignores RGB colours and shows only three models. The first picture at the top shows the initial situation with three models M1=(\mu_1,\sigma_1), M2=(\mu_2,\sigma_2), and M3=(\mu_3,\sigma_3). The mean values \mu_i position the bell-shaped models on the horizontal colour (greyscale) axis; the variances \sigma_i define the widths of the bells; and the model weights w_i are represented by the heights of the bells. When a new pixel colour hits one of the models, that model is strengthened by slightly increasing its weight, and the pixel colour is classified as stationary background. This situation is shown in the middle picture: the colour hits model M3, so the weight w_3 is increased. If the hit is not precise, the model is also slightly shifted towards the new colour.

However, when a new pixel colour does not hit any existing Gaussian model, the colour is classified as moving foreground. The weakest model is erased and replaced by a new model representing the new colour, albeit with a small initial weight. This is illustrated in the last subpicture above: the weakest model M3 has been replaced by a new model.
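As an illustration, the hit-or-replace rule above can be sketched in plain C for a single greyscale pixel. This is a minimal model of the algorithm, not the accelerator code; the learning rate, the 2.5-sigma match threshold and the initial deviation of a new model are illustrative values of my own choosing.

```c
#include <math.h>

#define K 4               /* number of Gaussian models per pixel */
#define ALPHA 0.05f       /* learning rate (illustrative value) */
#define MATCH_SIGMAS 2.5f /* a colour "hits" a model within 2.5 sigma */

typedef struct {
    float mu;    /* mean (greyscale value) */
    float sigma; /* standard deviation */
    float w;     /* weight: how often the model explained the pixel */
} Model;

/* Returns 1 if the pixel is background (it hit an existing model),
 * 0 if it is foreground (no model matched). */
int update_pixel(Model m[K], float colour)
{
    int weakest = 0;
    for (int k = 0; k < K; k++) {
        if (fabsf(colour - m[k].mu) < MATCH_SIGMAS * m[k].sigma) {
            /* Hit: strengthen the model and shift it towards the colour. */
            m[k].w += ALPHA * (1.0f - m[k].w);
            m[k].mu += ALPHA * (colour - m[k].mu);
            return 1; /* background */
        }
        if (m[k].w < m[weakest].w)
            weakest = k;
    }
    /* Miss: replace the weakest model with a new one, small initial weight. */
    m[weakest].mu = colour;
    m[weakest].sigma = 15.0f;
    m[weakest].w = ALPHA;
    return 0; /* foreground */
}
```

A full implementation would also renormalize the weights and update \sigma on a hit; both are omitted here for brevity.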

This algorithm was selected and implemented in a “high-level” C code (intended for CPU execution) by Roman Bartosinsky, a colleague in the SMECY project.

Implementation details

The picture below shows an annotated block diagram from IP Integrator. Click to see a larger version.

Zynq design of the real-time motion detection in video (640×480)

The system consists of three main parts: video input pipeline, video output pipeline, and accelerator subsystem.

The video input path is highlighted using the yellow colour in the system image above. The pipeline consists of the following processing cores:

  1. HDMI input and decoder
  2. video to AXI-Stream converter
  3. YUV 4:2:2 to 4:4:4 expander (16 to 24 bits) (custom core in VHDL)
  4. YUV 4:4:4 to RGB colour space converter
  5. 24-to-32 bits pixel padding
  6. Video DMA storing the input video stream into the main memory via
  7. AXI memory interconnect and
  8. Zynq HP0 port (150 MHz, 64 bits).
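The custom expander in step 3 has a simple behavioral meaning: in 4:2:2 each pixel carries a full luma sample but shares one chroma pair with its neighbour, and the expander duplicates that chroma for both pixels. A C model of this, assuming a Y0-Cb-Y1-Cr sample ordering (the actual core's ordering may differ):

```c
#include <stdint.h>

/* Expand one 4:2:2 pixel pair (Y0 Cb Y1 Cr, 2 x 16 bits) into two
 * 4:4:4 pixels (2 x 24 bits): both pixels reuse the shared chroma.
 * The packing order here is an assumption, not taken from the core. */
void yuv422_to_yuv444(uint8_t y0, uint8_t cb, uint8_t y1, uint8_t cr,
                      uint32_t out[2])
{
    out[0] = ((uint32_t)y0 << 16) | ((uint32_t)cb << 8) | cr;
    out[1] = ((uint32_t)y1 << 16) | ((uint32_t)cb << 8) | cr;
}
```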

The video output path is highlighted using the blue colour. It basically mirrors the input path in the reverse order:

  1. Zynq HP1 port (150 MHz, 64 bits),
  2. AXI memory interconnect,
  3. Video DMA reading via the above ports and producing pixel stream on its AXI-Stream output,
  4. 32-to-24 bits trim
  5. RGB to YUV 4:4:4 colour space converter
  6. YUV 4:4:4 to YUV 4:2:2 conversion (24 to 16 bits)
  7. AXI-Stream to video stream
  8. HDMI output coder.
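The colour-space converter cores in both pipelines implement the standard BT.601 equations. A fixed-point C sketch of the RGB-to-YCbCr direction, using full-range coefficients scaled by 256 (the actual Xilinx core may use studio-range scaling or a different precision):

```c
#include <stdint.h>

/* BT.601 full-range RGB -> YCbCr, coefficients scaled by 256.
 * Y  =  0.299 R + 0.587 G + 0.114 B
 * Cb = -0.169 R - 0.331 G + 0.500 B + 128
 * Cr =  0.500 R - 0.419 G - 0.081 B + 128 */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    int yv  = ( 77 * r + 150 * g +  29 * b) / 256;
    int cbv = (-43 * r -  85 * g + 128 * b) / 256 + 128;
    int crv = (128 * r - 107 * g -  21 * b) / 256 + 128;
    *y  = (uint8_t)yv;
    *cb = (uint8_t)(cbv < 0 ? 0 : cbv > 255 ? 255 : cbv);
    *cr = (uint8_t)(crv < 0 ? 0 : crv > 255 ? 255 : crv);
}
```

The input path's YUV-to-RGB converter applies the inverse of these equations.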

The accelerator subsystem uses the red path.

  1. Pixel and context data is accessed using the Zynq HP2 port (100 MHz, 64 bits),
  2. by the Central DMA engine (in scatter/gather mode that automatically fetches new block descriptors via the ACP port)
  3. and transferred via AXI interconnects to
  4. AXI BRAM controllers
  5. that connect to the actual BRAMs.
  6. The BRAMs keep the working data for the accelerator – pixels in, context, pixels out.
  7. The accelerator HW block, implemented with Vivado HLS.

The accelerator HW block requires 17 DSP48E blocks, 5967 FFs, and 7229 LUTs. It runs at a 100 MHz clock. It can process 1024 pixels in one activation; this requires 4 kB for pixel input and output data, and 5×16 kB = 80 kB for context data. The accelerator is internally pipelined: processing of a single pixel takes 96 clock cycles, and a new pixel can be accepted into the pipeline every 9 clock cycles. Using a single accelerator instance delivers about 8.2 FPS.
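A back-of-the-envelope check of these figures (my own arithmetic, not taken from the original measurements): a 640×480 frame has 307200 pixels, i.e. 300 activations of 1024 pixels each, and at one new pixel per 9 cycles at 100 MHz the pipeline alone needs about 27.6 ms per frame.

```c
/* Back-of-the-envelope throughput model for the figures above
 * (an estimate, not a measurement from the design). */

/* Number of fixed-size accelerator activations needed per frame. */
long frame_activations(long pixels, int block)
{
    return (pixels + block - 1) / block;  /* ceiling division */
}

/* Pipeline-only processing time per frame, ignoring DMA transfers:
 * one new pixel enters the pipeline every `ii` clock cycles. */
double compute_seconds_per_frame(long pixels, int ii, double f_clk_hz)
{
    return (double)pixels * ii / f_clk_hz;
}
```

For this design, `frame_activations(307200, 1024)` gives 300 and `compute_seconds_per_frame(307200, 9, 100e6)` gives about 27.6 ms, an upper bound of roughly 36 FPS; the gap down to the observed 8.2 FPS suggests that the DMA transfers of pixel and context data dominate the frame time.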

UPDATE: A configuration with two accelerator instances achieves 14 FPS.

UPDATE 2: The source code can be downloaded from here.


19 thoughts on “Real-time motion detection in video on Zynq FPGA”

  1. hello

    I'm really interested, that is very good work.

    I want to ask you: did you use C code and implement it with Vivado HLS, or did you use only the IPs that already exist in Vivado?


    1. I started with a generic C code implementing the core function (background detection). This code was heavily massaged in Vivado HLS until it synthesized into performing hardware. The HLS has good analysis views that give you accurate what-if results without doing all the synthesis and measurement in hardware.
      Data transfers and video interfaces were realized using existing cores in Vivado. The IP core generated by HLS runs over a fragment (a small area) of the image in its local buffer.

      1. Hi, I want to rebuild the project. I use the same development environment, Vivado 2013.3, as you did, and then ran your code there. However, I found that it stops because of the memory address setting. It seems that something is wrong with the CDMA or BRAM.
        So anyway, it seems that the code you provided cannot run directly. Is there anything else we should change?

  2. Hello
    thank you very much for your answer.

    I’m a master’s student and the first step in my project is to do the background subtraction. I found the OpenCV library and implemented this function, but the problem is that I can’t implement it in Vivado HLS because it is non-synthesizable. Did you use the same library, or something else?
    Thank you

  3. Ok, thank you, I read this answer now:
    “I used Vivado HLS – High Level Synthesis from C. But a C code must be heavily massaged before the HLS will give you any usable results. So it’s not as simple as it may seem.
    Regarding OpenCV: the obvious solution is re-implementing the algorithm manually in VHDL or Verilog. A good starting point could be the GPU parallel implementations, such as OpenCL or CUDA. There also exist CUDA/OpenCL-to-FPGA compilers, so this could be another avenue. But I have never tried those compilers.”

  4. I am trying MATLAB-to-HDL generation for an image segmentation algorithm. I have written the code in MATLAB (a .m file) and converted it to VHDL using HDL Coder. Now, how can I use this code to implement it on a Zynq FPGA board? Can you provide some basic code to read and write images from the PC to the FPGA and vice versa? Thank you.

    1. Hello hiren, sorry, it has been many years since I worked with Matlab/HDL, so I cannot provide you with an up-to-date answer. The way we did it in the past was to have BlockRAMs as an interface between the ARM and the Matlab HDL code. You can then run Linux on the ARM in Zynq, with full ethernet and TCP/IP support, and have a custom application in it that passes data from TCP/IP to the BlockRAMs (they must be memory-mapped to the ARM) and orchestrates the execution.

  5. hi sir,
    I'm doing my project on shape detection. I have an image in DDR3 that I read using VDMA, so the output is now an AXI4-Stream. What I need help with is:

    how can I read the stream data from the VDMA, store it in a buffer, apply Sobel edge detection, and then, based on shape, count how many grains are present in one single frame?

  6. Hi sir,
    Thank you for your hard work.
    I’ve downloaded your source code and tried to implement it on my kit, but the result is totally white. The input is from my laptop running Win10 and the output is connected to another monitor. How can I fix this?
    Thank you in advance.

    1. Hi sir,
      there is also this message on the console screen:

      Waiting for ADV7611 to locked on incoming video …
      ERROR : ADV7611 has NOT locked on incoming video, aborting !

      Thank you

  7. Hi,
    I want to ask you about my project with the Xilinx ZC702. I want to encode my video file (on the SD card) from YUV format to H.265 format with Kvazaar, and the resulting H.265 file should then be placed on the SD card too. How can I do this with my board? (My board uses PetaLinux v2013.10 and my PC uses Ubuntu 14.04.)
    Thank you for your attention.

  8. Anyone should forget about this project. The hardware does not match with the software. If you open it with vivado and check the hardware design you will find the difference between the design you have and that on this page. One simple example, interrupt output of the CDMA is not linked to the PS_7. Then how could this work. And the bugs in real are more than you expected. Be smart forget about it. It is almost impossible to change the hardware design.

    1. Hi! I am sorry the information provided free-of-charge on my page did not help you. I have not worked with Xilinx products since 2013, so I cannot give any updates. You’re probably right that the design files are not complete. It was never meant as a finished IP core that someone could just download and plug into their product. Rather, I wanted to evaluate Vivado HLS (using their 30-day demo license) and see for myself how it stands. Jara.

      1. Hi! I am sorry for the rude words. This is a good project, and thank you for your information and your source code. You are right that you are not responsible for the software and need not guarantee that all the code works.
        I really apologize for my words, and thank you for your work.

  9. Hello! I hope you are doing well. I am working on a similar project using Vivado HLS and would need some help with it. When I run a similar code in SDK it works fine, but the same code in HLS gives noisy output. I was hoping you could provide some valuable help. My email ID is Thanks in advance.
