Quick Start
Summary
1. Check the Requirements
To get started with the framework, you need:
- One (or more) servers with:
  - Singularity CE - Singularity Download
  - A compatible MPICH installation (MPICH v4.0.2) - MPICH Download
  - AMD Vitis Tools 2022.1 (or later) - Vitis Page
  - AMD XRT version 2.13.466 (or later) - XRT Page
- One (or more) AMD Alveo boards
2. Download the Container
For this tutorial, we will be using the OMPC FPGA container. It includes all the tools needed to run applications using the framework. We suggest using Singularity.
Download the container using Singularity:
singularity pull docker://pedroohr/runtime-fpga:latest
3. Get the Application
As an example, let's start with a basic vector addition. The following figure shows the application:
A basic CPU kernel for the application can be implemented as follows:
void vadd_cpu(int *A, int *B, int *C, int size) {
  for (int i = 0; i < size; i++)
    C[i] = A[i] + B[i];
}
It is called somewhere in the application as:
vadd_cpu(A, B, C, N);
Note
This application is already available in the container at /examples/vadd/vadd_cpu.cpp.
A fully functional implementation is also available here: vadd_cpu
4. Find an FPGA Kernel
The framework can use any FPGA kernel that serves as a drop-in alternative to an existing CPU function, i.e., the two share equivalent prototypes.
Tip
Application developers can always change the prototype of a CPU function to match the desired FPGA kernel, even if some of the arguments are not used in the CPU implementation.
So, let’s say we found an FPGA implementation for the vadd kernel (in this case, an HLS version of the kernel):
void vadd_fpga(int *A, int *B, int *C, int size) {
#pragma HLS INTERFACE m_axi port = A
#pragma HLS INTERFACE m_axi port = B
#pragma HLS INTERFACE m_axi port = C
#pragma HLS INTERFACE s_axilite port = return
  for (int i = 0; i < size; i++)
    C[i] = A[i] + B[i];
}
That implementation can be compiled using the AMD Vitis™ compiler. The command below compiles it for the AMD Alveo U55C board.
v++
Note
This kernel implementation is already available in the container on the path /examples/vadd/fpga_kernel.cpp.
The kernel implementation is also available here: vadd_cpu.cpp
5. Integrate Application and FPGA Kernel
The integration of the FPGA kernel can be done with just a few lines of code.
To tell the program that the FPGA kernel should be used as an alternative to the CPU kernel, we need two lines of code placed right before the CPU function: a declaration of the FPGA kernel, followed by a declare variant directive:
void vadd_fpga(int *A, int *B, int *C, int size);
#pragma omp declare variant(vadd_fpga) match(device={arch(alveo)})
void vadd_cpu(int *A, int *B, int *C, int size) {
  for (int i = 0; i < size; i++)
    C[i] = A[i] + B[i];
}
Finally, at the point where the function is called, we create an OpenMP target task (the target directive with nowait, right before the call) and establish a synchronization point (the taskwait directive after it), so the program knows when to execute the kernels.
#pragma omp target map(to: A[:N], B[:N]) map(tofrom: C[:N]) nowait
vadd_cpu(A, B, C, N);

#pragma omp taskwait
Important
Observe how the original call to vadd_cpu does not change, even when using FPGAs!
Note
This application is already available in the container at /examples/vadd/vadd_fpga.cpp.
A fully functional implementation is also available here: vadd_fpga.cpp
6. Run on FPGAs
To run the application using the FPGA kernel, first compile it and then run it, both using the provided container:
Compile it using Singularity:
singularity exec runtime-fpga_latest.sif clang++ -fopenmp -fopenmp-targets=alveo -fno-openmp-new-driver vadd_fpga.cpp -o vadd_fpga
Run it using Singularity:
# Runs using 1 worker node containing FPGAs
mpirun -np 2 singularity exec runtime-fpga_latest.sif ./vadd_fpga
Important
Currently, applications are launched with mpirun. The number of nodes (the value passed to -np) is always 1 + the number of workers.
That's it! Happy coding with FPGAs!