OpenCL on Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA Quick Start User Guide
The Acceleration Stack for Intel® Xeon® CPU with FPGAs 1.0 Alpha Release includes OpenCL™ support, located in the dcp_1_0_alpha/opencl folder and consisting of the following files:
Acceleration Stack for Intel® Xeon® CPU with FPGAs 1.0 OpenCL™ BSP
OpenCL™ example designs tested with Acceleration Stack for Intel® Xeon® CPU with FPGAs:
Pre-compiled kernels <aocx>:
About this Document
Table 1. Document Conventions
#: Precedes a command and indicates that the command is to be entered as root.
$: Precedes a command and indicates that the command is to be entered as a user.
This font: Filenames, commands, and keywords are printed in this font. Long command lines are also printed in this font. Although a long command line may wrap to the next line, the line break is not part of the command; do not press Enter.
<variable_name>: Indicates that the placeholder text between the angle brackets must be replaced with an appropriate value. Do not enter the angle brackets.
Table 2. Acceleration Stack for Intel® Xeon® CPU with FPGAs Glossary
Acceleration Stack for Intel® Xeon® CPU with FPGAs: A collection of software, firmware, and tools that provides performance-optimized connectivity between an Intel® FPGA and an Intel® Xeon® processor.
Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (abbreviated Intel® PAC with Arria® 10): PCIe® accelerator card with an Intel® Arria® 10 FPGA. Programmable Acceleration Card is abbreviated PAC. Contains an FPGA Interface Manager (FIM) that pairs with an Intel® Xeon® processor over the PCIe® bus.
Intel® Xeon® Processor with Integrated FPGA (abbreviated Integrated FPGA Platform): Intel® Xeon® plus FPGA platform with the Intel® Xeon® processor and an FPGA in a single package, sharing a coherent view of memory via Intel® QuickPath Interconnect (QPI).
This user guide describes how to get started with OpenCL™ on the Intel® PAC with Arria® 10, 1.0 Beta Release. The instructions use the precompiled OpenCL™ kernels included in this 1.0 Beta Release. This user guide also includes a brief introduction to compiling OpenCL™ kernels.
OpenCL™ designs comprise two components, the kernel and the host. The kernel includes the accelerator code. The host program runs on the host machine. The accelerator card plugs into the host machine.
Note: You must have root permission on the host machine to set up OpenCL™.
Select Intel FPGA Runtime Environment for OpenCL™ Linux x86-64 RPM and click Download.
Accept the License Agreement.
Log in when requested. (Create an account if you do not already have one.)
Save the file, aocl-rte-17.0.0-1.x86_64.rpm.
Install the runtime by running the command:
# sudo yum install aocl-rte-17.0.0-1.x86_64.rpm
Installing the Release
The OpenCL™ BSP provided with this release builds on top of the Acceleration Stack for Intel® Xeon® CPU with FPGAs. Follow the instructions in the Intel Acceleration Stack Quick Start Guide for Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA to set up the Intel® PAC with Arria® 10.
Install the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA board.
Flash the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.
Install the FPGA driver.
Build the Open Programmable Acceleration Engine (OPAE) software.
Run the following command:
# sudo ldconfig
You can now run the OPAE software in a non-virtualized environment, or in a virtualized environment with Single Root I/O Virtualization (SR-IOV) disabled. Complete the following additional steps for a virtualized environment that includes virtual functions (VFs) and SR-IOV:
For OpenCL™ functionality, load the OpenCL™ configuration before enabling SR-IOV mode.
Set the CL_CONTEXT_COMPILER_MODE_ALTERA environment variable to disable FPGA configuration or reconfiguration during OpenCL™ host runtime:
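A minimal sketch of this last step follows. The value 3 is an assumption based on how the Intel FPGA SDK for OpenCL documentation describes this variable (the mode in which the host runtime skips device programming); confirm it against the documentation for your SDK version before relying on it.

```shell
# Assumption: 3 is the mode value that disables FPGA (re)configuration
# during OpenCL host runtime. Verify against your SDK documentation.
export CL_CONTEXT_COMPILER_MODE_ALTERA=3

# Confirm that the variable is exported to child processes,
# such as the OpenCL host program.
printenv CL_CONTEXT_COMPILER_MODE_ALTERA
```

Because the variable is exported, any host program started from the same shell inherits it.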
After completing these steps, the following environment variable is available:
DCP_LOC – points to the location of the extracted release archive.
Note: To avoid having to reset DCP_LOC after a reboot, save this variable to your shell initialization script.
Installing the OpenCL BSP
The OpenCL™ BSP is an archive file. To extract this file, create a working directory for the BSP, change to that directory, and run the following commands:
$ tar xf $DCP_LOC/opencl/opencl_bsp_*.tar.gz
$ cd opencl_bsp
$ export AOCL_BOARD_PACKAGE_ROOT=`pwd`
To avoid having to reset the AOCL_BOARD_PACKAGE_ROOT environment variable after a reboot, save it to your shell initialization script.
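One way to save such variables is sketched below. The helper name persist_var is hypothetical, and the default of ~/.bashrc assumes a bash login; substitute the initialization script your shell actually reads. The helper appends an export line only if the file does not already contain one for that variable, so running it twice does not create duplicates.

```shell
# persist_var NAME VALUE [RC_FILE]  (hypothetical helper, not part of the release)
# Appends 'export NAME="VALUE"' to RC_FILE unless NAME is already exported there.
persist_var() {
    name=$1
    value=$2
    rc_file=${3:-"$HOME/.bashrc"}   # assumption: bash reads ~/.bashrc
    touch "$rc_file"
    if ! grep -q "^export $name=" "$rc_file"; then
        printf 'export %s="%s"\n' "$name" "$value" >> "$rc_file"
    fi
}
```

For example, running persist_var AOCL_BOARD_PACKAGE_ROOT "$(pwd)" from inside the opencl_bsp directory records the current location, and the same helper works for DCP_LOC.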
Setting Up Permissions
Running OpenCL™ requires you to set various permissions and system parameters. The setup_permissions.sh script completes this task. The script uses sudo internally; consequently, it requires root privileges.
Run the script once, when you enable OpenCL™ for the first time on this host:
aocl diagnose: Running diagnose from /mnt/Tools/<user_name>_tools/dcp_1.0_rush_creek_b11/opencl/opencl_bsp/linux64/libexec
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_f400000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8588886016
Memory consumed for internal use = 1048576
Actual maximum buffer size 8588886016 bytes
Writing 8191 MB to global memory...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6576.87 MB/s [6567.03 -> 6580.40]
Reading and verifying 8191 MB from global memory ...
Read speed: 6915.93 MB/s [6885.51 -> 6926.66]
Successfully wrote and readback 8191 MB buffer
Transferring 262144 KBs in 512 512 KB blocks ... 3748.69 MB/s
Transferring 262144 KBs in 256 1024 KB blocks ... 3870.31 MB/s
Transferring 262144 KBs in 128 2048 KB blocks ... 4528.58 MB/s
Transferring 262144 KBs in 64 4096 KB blocks ... 5405.89 MB/s
Transferring 262144 KBs in 32 8192 KB blocks ... 5923.38 MB/s
Transferring 262144 KBs in 16 16384 KB blocks ... 6246.18 MB/s
Transferring 262144 KBs in 8 32768 KB blocks ... 6427.20 MB/s
Transferring 262144 KBs in 4 65536 KB blocks ... 6624.94 MB/s
Transferring 262144 KBs in 2 131072 KB blocks ... 6774.19 MB/s
Transferring 262144 KBs in 1 262144 KB blocks ... 6855.82 MB/s
As a reference:
PCIe Gen1 peak speed: 250MB/s/lane
PCIe Gen2 peak speed: 500MB/s/lane
PCIe Gen3 peak speed: 985MB/s/lane
Writing 262144 KBs with block size (in bytes) below:
Write top speed = 6603.61 MB/s
Read top speed = 6855.82 MB/s
Throughput = 6729.71 MB/s
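The figures in this sample output can be cross-checked with simple shell arithmetic, which helps when judging whether the output from your own card looks plausible. The sketch below uses only values copied from the sample output; the variable names are illustrative, and the reading of the Throughput line as the mean of the two top speeds is inferred from the numbers rather than stated in the output.

```shell
# Values copied from the sample aocl diagnose output.
global_mem=8589934592    # CL_DEVICE_GLOBAL_MEM_SIZE, in bytes
internal_use=1048576     # memory consumed for internal use, in bytes

# Maximum buffer size = global memory minus the internally reserved region.
max_alloc=$((global_mem - internal_use))
echo "max alloc: $max_alloc bytes"

# In units of 1048576 bytes, this is the size of the write/read test.
max_alloc_mb=$((max_alloc / 1048576))
echo "max alloc: $max_alloc_mb MB"

# The Throughput line appears to be the mean of the write and read top speeds.
throughput=$(awk 'BEGIN { printf "%.1f", (6603.61 + 6855.82) / 2 }')
echo "throughput: $throughput MB/s"
```

The results are 8588886016 bytes, 8191 MB, and about 6729.7 MB/s, in agreement with the CL_DEVICE_MAX_MEM_ALLOC_SIZE, "Writing 8191 MB", and Throughput lines.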
OpenCL Support for Multi-Card Systems
Before running an OpenCL™ application, program the PAC card with an Accelerator Function (AF) that includes the BSP logic. Use the aocl program command to load an aocx file to the PAC card. You need to program the AF only once per PAC card. After the initial programming, you can load different applications to the PAC card using the OpenCL™ API or the aocl program command.
Run the aocl diagnose -probe command to determine how many FPGAs the system includes. For example, running the aocl diagnose -probe command on a system with three PAC cards might show output similar to the following:
Kernel initialization is complete.
Launching the kernel...
Thread #2: Hello from Altera’s OpenCL Compiler!
Kernel execution is complete.
Running Vector Add
$ cd $DCP_LOC/opencl
$ mkdir exm_opencl_vector_add_x64_linux
$ cd exm_opencl_vector_add_x64_linux
$ tar xf ../exm_opencl_vector_add_x64_linux.tgz
$ cd vector_add
Copy the precompiled OpenCL™ kernel to the bin folder:
$ cp $DCP_LOC/opencl/vector_add.aocx ./bin
Sample output:
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
pac_a10 : PAC Arria 10 Platform (pac_a10_f400000)
Using AOCX: vector_add.aocx
Reprogramming device  with handle 1
Launching for device 0 (1000000 elements)
Time: 7.282 ms
Kernel time (device 0): 3.451 ms
Compiling OpenCL Kernels
Refer to the Intel FPGA SDK for OpenCL™ Getting Started Guide for more details, including how to compile OpenCL™ kernels.
After completing the instructions in this document and in the Intel FPGA SDK for OpenCL™ Getting Started Guide, you can compile an OpenCL™ kernel to an aocx file using a command similar to the following:
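The sketch below only assembles and prints a candidate compile command so that it can be reviewed first. Several parts are assumptions: the kernel source and output paths are hypothetical, the --board option spelling follows the 17.0 release of the SDK (later releases spell it differently), and the board name pac_a10 is taken from the aocl diagnose output shown earlier. Check the aoc help output for your installed version before running the printed command.

```shell
# Hypothetical paths; adjust to where your kernel source and output should live.
KERNEL_SRC=device/vector_add.cl
AOCX_OUT=bin/vector_add.aocx

# Board name as reported by aocl diagnose in this guide.
BOARD=pac_a10

# Assumption: 17.0-style option spelling (--board).
compile_cmd="aoc $KERNEL_SRC -o $AOCX_OUT --board $BOARD"

# Print the command for review rather than running it:
# a full FPGA compile can take hours.
echo "$compile_cmd"
```

Printing the command instead of executing it keeps the example safe to run on a machine without the SDK installed.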