Zynq FPGA: OpenCV + OpenMP + Petalinux

Computer vision applications have to meet two requirements: fast development (short time-to-market) and well-performing algorithms. These requirements pull against each other: it takes a lot of time not only to release fast, bug-free algorithms, but also to understand the mathematics behind them. Fortunately, there are ways to meet both requirements in computer vision: use ready-made libraries such as OpenCV, and parallelize algorithms with OpenMP.
In this post I show how to use the OpenCV library with the OpenMP API on a Zynq FPGA. I'll create a Linux image using Petalinux and Vivado 2019.1 for the Arty Z7-20 board, which has a dual-core ARM Cortex-A9.


Prerequisites
PC with Linux installed (Ubuntu 16.04 on a virtual machine)
SD card (8 GB or bigger)
SD card reader
Digilent Arty Z7-20 board (Xilinx Zynq)
PC with Vivado 2019.1 and Petalinux installed

Hardware design
Video: creating the Vivado project



After synthesis and implementation, export the hardware and launch the project in Xilinx SDK. This produces the opencv_openmp_lnx.sdk folder, which Petalinux will use.

Create and configure the Petalinux project

Video describing this part

Create a new Petalinux project named 'opencv_openmp' for the hardware design described in the opencv_openmp_lnx.sdk directory:

petalinux-create --template zynq --type project --name opencv_openmp
cd opencv_openmp
petalinux-config --get-hw-description=/home/alexey/Documents/opencv_openmp_lnx.sdk

Our embedded Linux OS needs the OpenCV and OpenMP packages. To add them, edit the petalinux-image.bbappend file located in the project-spec/meta-user/recipes-core/images/ folder and add the lines listed below:

IMAGE_INSTALL_append = " opencv-dev"
IMAGE_INSTALL_append = " packagegroup-core-buildessential"
IMAGE_INSTALL_append = " libgomp"
IMAGE_INSTALL_append = " libgomp-dev"
IMAGE_INSTALL_append = " libgomp-staticdev"

The "libgomp" packages provide the OpenMP libraries and headers, "opencv-dev" provides the libraries and headers for OpenCV, and "packagegroup-core-buildessential" provides the GCC compiler and all necessary dependencies.
Include the packages in the rootfs of our Linux image by running the command
petalinux-config -c rootfs
Then go to the user packages menu and select all the packages.


Change the IP configuration from DHCP to static. To set the IP address, run
petalinux-config
Go to Subsystem AUTO Hardware Settings -> Ethernet Settings, unmark Obtain IP address automatically and set the desired IP address.


Then go to Image Packaging Configuration:
Root filesystem type -> SD card
Copy final images to tftpboot -> unmark
Exit and save the settings, then build the project by executing the command
petalinux-build
After the build has finished, we have to create the BOOT.BIN file by running the command

petalinux-package --boot --format BIN --fsbl ./images/linux/zynq_fsbl.elf --fpga ./images/linux/design_1_wrapper.bit --u-boot

The design_1_wrapper.bit file can be found in the .sdk directory; simply copy it to the images/linux/ folder before running the command above.
After these steps we are ready to create a bootable SD card.

Prepare SD card for booting Linux 

Video on YouTube



I use the gparted utility to create two partitions on the bootable SD card, which shows up as /dev/sda on my machine.


The BOOT partition (FAT32) needs at least 500 MiB, and the rootfs partition (ext4) needs more than 1.5 GiB. Copy the BOOT.BIN and image.ub files from the images/linux/ directory to the BOOT partition of the SD card. Then write the root filesystem to the second partition by running the commands below (assuming you are inside the <root_petalinux_project>/images/linux directory):

sudo umount /dev/sda2
sudo dd if=rootfs.ext4 of=/dev/sda2

Once rootfs.ext4 has been written, our SD card is ready!

Create a user app for testing the OpenCV and OpenMP API

I created a simple user application to run on the Arty Z7-20 board. It multiplies four pairs of float32 matrices, each of size 2048x2048. Note that the OpenCV multiply function used here computes the per-element product of two matrices, not the algebraic matrix product. A for loop processes one pair of matrices per iteration. The first for loop runs sequentially. For the second one I use #pragma omp parallel for num_threads(2) to create two threads, so that each thread multiplies two pairs of matrices. The multiplication should therefore be faster with the OpenMP API. Let's check it!





Code we are going to test in main.cpp:

#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>
#include <math.h>
#include <time.h>
#include <omp.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>

#define CHANNELS 4
using namespace std;
using namespace cv;

static double dtime(){
    double t;
    struct timeval tv;
    gettimeofday(&tv, NULL);
    t = tv.tv_sec + ((double)tv.tv_usec)/1e6;
    return t;
}

int main()
{
    cout << "Start programm" << endl;

    vector<Mat> matrix_1(CHANNELS);
    vector<Mat> matrix_2(CHANNELS);
    vector<Mat> result(CHANNELS);
    Mat rnd_image = Mat(2048, 2048, CV_32FC1, Scalar(0));

    // Matrix initialization: fill every pair with normally distributed values
    for (int i = 0; i < CHANNELS; i++) {
        randn(rnd_image, Scalar(0), Scalar(16000));
        rnd_image.copyTo(matrix_1[i]);
        randn(rnd_image, Scalar(0), Scalar(16000));
        rnd_image.copyTo(matrix_2[i]);
    }

    // Sequential version: per-element product of each pair
    // (result[i] is allocated by multiply() on this first use)
    double start = dtime();
    for (int i = 0; i < CHANNELS; i++) {
        multiply(matrix_1[i], matrix_2[i], result[i]);
    }
    cout << "Time taken by for_loop: " << dtime() - start << " seconds " << endl;

    // OpenMP version: the iterations are shared between two threads
    start = dtime();
#pragma omp parallel for num_threads(2)
    for (int i = 0; i < CHANNELS; i++) {
        multiply(matrix_1[i], matrix_2[i], result[i]);
    }
    cout << "Time taken by for_loop with openmp: " << dtime() - start << " seconds " << endl;

    return 0;
}


After logging in to the board, copy the main.cpp file into the /home/root/ directory and compile it with the command:

g++ main.cpp -fopenmp -I /usr/include/opencv2 -L /usr/lib -lopencv_core

I pass the -fopenmp flag to the g++ compiler and point it to where my OpenCV headers and libraries are located. This produces the a.out executable. I ran it and got the following in the console:

Start programm
Time taken by for_loop: 0.187379 seconds
Time taken by for_loop with openmp: 0.105422 seconds



So the for loop with OpenMP runs faster, about 1.8 times in this run, because the work is parallelized across two threads.
