OpenCL on Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA Quick Start User Guide
Published: 2018/3/17
About this Document
Conventions
Convention | Description |
---|---|
# | Precedes a command that indicates the command is to be entered as root. |
$ | Indicates a command is to be entered as a user. |
This font | Filenames, commands, and keywords are printed in this font. Long command lines are printed in this font. Although long command lines may wrap to the next line, the return is not part of the command; do not press enter. |
<variable_name> | Indicates the placeholder text that appears between the angle brackets must be replaced with an appropriate value. Do not enter the angle brackets. |
Acceleration Glossary
Term | Abbreviation | Description |
---|---|---|
Acceleration Stack for Intel® Xeon® CPU with FPGAs | Acceleration Stack | A collection of software, firmware, and tools that provides performance-optimized connectivity between an Intel® FPGA and an Intel® Xeon® processor. |
Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | Intel® PAC with Arria® 10 | PCIe accelerator card with an Intel® Arria® 10 FPGA. Programmable Acceleration Card is abbreviated PAC. Contains an FPGA Interface Manager (FIM) that pairs with an Intel® Xeon® processor over the PCIe bus. |
Intel® Xeon® Processor with Integrated FPGA | Integrated FPGA Platform | Intel® Xeon® plus FPGA platform with the Intel® Xeon® processor and an FPGA in a single package, sharing a coherent view of memory via Intel® QuickPath Interconnect (QPI). |
Introduction
This user guide describes how to get started with OpenCL™ on the Intel® PAC with Arria® 10 1.0 Beta Release. The instructions use the precompiled OpenCL™ kernels included in this 1.0 Beta Release. This user guide also includes a brief introduction to compiling OpenCL™ kernels.
Release Content
Setting Up the Host Machine
Installing the OpenCL Runtime
- Navigate to the Intel FPGA SDK for OpenCL™ download page.
- Click on the RTE (runtime environment) tab.
- Select Intel FPGA Runtime Environment for OpenCL™ Linux x86-64 RPM and click Download.
- Accept the License Agreement.
- Log in when requested. (Create an account if you do not already have one.)
- Save the file, aocl-rte-17.0.0-1.x86_64.rpm.
- Install the runtime by running the command:
$ sudo yum install aocl-rte-17.0.0-1.x86_64.rpm
Installing the Release
- Install the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA board.
- Flash the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.
- Install the FPGA driver.
- Build the Open Programmable Acceleration Engine (OPAE) software.
- Run the following command:
$ sudo ldconfig
- You can now run the OPAE software in a non-virtualized environment or in a virtualized environment with Single Root I/O Virtualization (SR-IOV) disabled. Complete the following additional steps for a virtualized environment that includes virtual functions (VFs) and SR-IOV:
- For OpenCL™ functionality, load the OpenCL™ configuration before enabling SR-IOV mode.
- Set the CL_CONTEXT_COMPILER_MODE_ALTERA environment variable to disable FPGA configuration or reconfiguration during OpenCL™ host runtime:
$ export CL_CONTEXT_COMPILER_MODE_ALTERA=3
- DCP_LOC – points to the location of the extracted release archive.
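An environment variable such as CL_CONTEXT_COMPILER_MODE_ALTERA affects the OpenCL™ host program only if it is part of that program's environment. The following sketch illustrates the difference between a plain shell assignment, an exported variable, and a per-command prefix. It assumes a POSIX shell; `sh -c` stands in for a host executable, and DEMO_MODE is a hypothetical variable name chosen so that the example does not touch the real runtime setting.

```shell
# Start from a clean state so the demonstration is repeatable.
unset DEMO_MODE

# A plain assignment creates a shell variable that child processes do not see.
DEMO_MODE=3
plain=$(sh -c 'echo "${DEMO_MODE:-unset}"')

# An exported variable is inherited by every child process started afterwards.
export DEMO_MODE
exported=$(sh -c 'echo "${DEMO_MODE:-unset}"')

# A command prefix sets the variable for that single command only.
unset DEMO_MODE
prefixed=$(DEMO_MODE=3 sh -c 'echo "${DEMO_MODE:-unset}"')

echo "plain=$plain exported=$exported prefixed=$prefixed"
# prints: plain=unset exported=3 prefixed=3
```

Exporting the variable suits a whole session. The prefix form limits the setting to one run of the host program.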
Installing the OpenCL BSP
$ tar xf $DCP_LOC/opencl/opencl_bsp_*.tar.gz
$ cd opencl_bsp
$ export AOCL_BOARD_PACKAGE_ROOT=`pwd`
To avoid having to reset the AOCL_BOARD_PACKAGE_ROOT environment variable after a reboot, save it to your shell initialization script.
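One way to persist the variable is to append an export line to a shell initialization file. The sketch below assumes a Bash-style initialization file such as ~/.bashrc. The helper name persist_export is hypothetical and is not part of the release. The demonstration writes to a temporary file and uses a placeholder path, so it has no effect on a real login environment.

```shell
# persist_export FILE NAME VALUE
# Appends 'export NAME="VALUE"' to FILE unless that exact line is already present.
# (Hypothetical helper for illustration only.)
persist_export() {
    file=$1 name=$2 value=$3
    line="export $name=\"$value\""
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; /path/to/opencl_bsp is a placeholder value.
rc_file=$(mktemp)
persist_export "$rc_file" AOCL_BOARD_PACKAGE_ROOT "/path/to/opencl_bsp"
persist_export "$rc_file" AOCL_BOARD_PACKAGE_ROOT "/path/to/opencl_bsp"  # second call adds nothing
cat "$rc_file"
```

In real use, the call would look like `persist_export ~/.bashrc AOCL_BOARD_PACKAGE_ROOT "$AOCL_BOARD_PACKAGE_ROOT"`, run from a shell in which the variable is already set.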
Setting Up Permissions
Running OpenCL™ requires you to set various permissions and system parameters. The setup_permissions.sh script completes this task. The script uses sudo internally, so it requires root privileges.
Procedure:
- Run the script once, when you enable OpenCL™ for the first time on this host:
$ $AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh
- Reboot the computer because some permanent settings only take effect after a reboot.
- Some of the settings are not permanent. Consequently, you must rerun the setup_permissions.sh command after rebooting.
$ $AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh
Initializing the Run Time Environment (RTE)
Before running OpenCL™ examples, you must initialize the RTE.
$ export ALTERAOCLSDKROOT=/opt/altera/aocl-rte
$ source /opt/altera/aocl-rte/init_opencl.sh
Setup Summary
- Set the following environment variables:
- DCP_LOC
- AOCL_BOARD_PACKAGE_ROOT
- Run the permissions script:
$ $AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh
- Initialize RTE:
$ source /opt/altera/aocl-rte/init_opencl.sh
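A short pre-flight check can catch a missing variable before any of the later commands fail. The following is a minimal sketch, assuming a POSIX shell. The helper name require_vars is hypothetical, and the demonstration runs in a subshell with placeholder values so that it does not change the current environment.

```shell
# require_vars NAME...
# Prints each unset or empty variable name and returns non-zero if any is missing.
# (Hypothetical helper for illustration only.)
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration in a subshell: one variable set to a placeholder, one unset.
(
    DCP_LOC=/path/to/release
    unset AOCL_BOARD_PACKAGE_ROOT
    require_vars DCP_LOC AOCL_BOARD_PACKAGE_ROOT || echo "setup incomplete"
)
# prints:
# missing: AOCL_BOARD_PACKAGE_ROOT
# setup incomplete
```

After completing the setup steps, running `require_vars DCP_LOC AOCL_BOARD_PACKAGE_ROOT` in the working shell should print nothing.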
Running Diagnostics
Before running diagnostics, load an OpenCL™ kernel to the board. The following instructions use the hello_world kernel; you may also use your own.
- Load the hello_world OpenCL™ kernel:
$ aocl program acl0 $DCP_LOC/opencl/hello_world.aocx
- Run the simple diagnostic utility:
$ aocl diagnose
Sample diagnostic output:
aocl diagnose: Running diagnose from /mnt/Tools/<user_name>_tools/dcp_1.0_rush_creek_b11/opencl/opencl_bsp/linux64/libexec
------------------------- acl0 -------------------------
Vendor: Intel Corp
Phys Dev Name Status Information
pac_a10_f400000 Passed PAC Arria 10 Platform (pac_a10_f400000)
PCIe 04:00.0
FPGA temperature = 47 degrees C.
DIAGNOSTIC_PASSED
---------------------------------------------------------
- Run the advanced diagnostic:
$ aocl diagnose acl0
Sample advanced diagnostic output:
aocl diagnose: Running diagnose from /mnt/Tools/<user_name>_tools/dcp_1.0_rush_creek_b11/opencl/opencl_bsp/linux64/libexec
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_f400000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8588886016
Memory consumed for internal use = 1048576
Actual maximum buffer size 8588886016 bytes
Writing 8191 MB to global memory...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6576.87 MB/s [6567.03 -> 6580.40]
Reading and verifying 8191 MB from global memory ...
Read speed: 6915.93 MB/s [6885.51 -> 6926.66]
Successfully wrote and readback 8191 MB buffer
Transferring 262144 KBs in 512 512 KB blocks ... 3748.69 MB/s
Transferring 262144 KBs in 256 1024 KB blocks ... 3870.31 MB/s
Transferring 262144 KBs in 128 2048 KB blocks ... 4528.58 MB/s
Transferring 262144 KBs in 64 4096 KB blocks ... 5405.89 MB/s
Transferring 262144 KBs in 32 8192 KB blocks ... 5923.38 MB/s
Transferring 262144 KBs in 16 16384 KB blocks ... 6246.18 MB/s
Transferring 262144 KBs in 8 32768 KB blocks ... 6427.20 MB/s
Transferring 262144 KBs in 4 65536 KB blocks ... 6624.94 MB/s
Transferring 262144 KBs in 2 131072 KB blocks ... 6774.19 MB/s
Transferring 262144 KBs in 1 262144 KB blocks ... 6855.82 MB/s
As a reference:
PCIe Gen1 peak speed: 250MB/s/lane
PCIe Gen2 peak speed: 500MB/s/lane
PCIe Gen3 peak speed: 985MB/s/lane
Writing 262144 KBs with block size (in bytes) below:
Block_Size Avg Max Min End-End (MB/s)
524288 3489.94 3670.94 914.56 2766.34
1048576 3396.38 3473.50 3004.03 3040.98
2097152 4391.87 4528.58 3931.17 4104.98
4194304 5365.16 5405.89 4999.12 5157.90
8388608 5896.15 5923.38 5699.69 5816.82
16777216 6215.94 6246.18 6135.78 6150.34
33554432 6398.59 6427.20 6376.68 6358.53
67108864 6532.26 6542.09 6518.01 6516.37
134217728 6582.69 6590.64 6574.77 6580.38
268435456 6603.61 6603.61 6603.61 6603.61
Reading 262144 KBs with block size (in bytes) below:
Block_Size Avg Max Min End-End (MB/s)
524288 3396.10 3748.69 3144.31 2673.85
1048576 3686.15 3870.31 3250.39 3407.62
2097152 3862.68 4044.24 3623.88 3641.46
4194304 3981.96 4082.71 3878.14 3870.63
8388608 5006.64 5050.74 4849.07 4904.61
16777216 5813.24 5835.33 5783.57 5761.21
33554432 6319.69 6331.87 6305.91 6283.10
67108864 6621.20 6624.94 6618.82 6604.06
134217728 6772.23 6774.19 6770.28 6768.46
268435456 6855.82 6855.82 6855.82 6855.82
Write top speed = 6603.61 MB/s
Read top speed = 6855.82 MB/s
Throughput = 6729.71 MB/s
DIAGNOSTIC_PASSED
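The summary lines at the end of the advanced diagnostic can be checked with a short script. The sketch below works on a copy of the last four lines of the sample output above rather than on a live run. It recomputes the mean of the two top speeds, which appears to be how the reported throughput is derived, and compares the throughput with the Gen3 reference figure. The x8 lane count is an assumption, because the diagnostic output does not state the link width. The helper name value_of is hypothetical.

```shell
# Summary lines copied from the sample advanced diagnostic output.
summary='Write top speed = 6603.61 MB/s
Read top speed = 6855.82 MB/s
Throughput = 6729.71 MB/s
DIAGNOSTIC_PASSED'

# value_of KEY: print the number that follows "KEY = " in the summary.
value_of() {
    printf '%s\n' "$summary" |
        awk -F' = ' -v key="$1" '$1 == key { split($2, parts, " "); print parts[1] }'
}

write=$(value_of "Write top speed")
read_speed=$(value_of "Read top speed")
throughput=$(value_of "Throughput")

# Mean of the write and read top speeds.
awk -v w="$write" -v r="$read_speed" \
    'BEGIN { printf "mean of top speeds: %.1f MB/s\n", (w + r) / 2 }'

# Assumption: a PCIe Gen3 x8 link, so the peak is 8 lanes * 985 MB/s per lane.
awk -v t="$throughput" \
    'BEGIN { printf "share of assumed Gen3 x8 peak: %.1f%%\n", 100 * t / (8 * 985) }'

# The final status line is what a script would normally gate on.
printf '%s\n' "$summary" | grep -qx 'DIAGNOSTIC_PASSED' && echo "diagnostic passed"
# prints:
# mean of top speeds: 6729.7 MB/s
# share of assumed Gen3 x8 peak: 85.4%
# diagnostic passed
```

On a real system, the summary variable could instead be filled with the output of `aocl diagnose acl0`.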
OpenCL Support for Multi-Card Systems
Before running an OpenCL™ application, program the PAC card with an Accelerator Function (AF) that includes the BSP logic. Use the aocl program command to load an aocx file to the PAC card. It is only necessary to program the AF one time per PAC card. After the initial programming, you can use the OpenCL™ API to load different applications (aocx files) to the PAC card.
Run the aocl diagnose -probe command to determine how many FPGAs the system includes. For example, running the aocl diagnose -probe command on a system with three PAC cards might show output similar to the following:
- $ aocl diagnose -probe
aocl diagnose: Running diagnose from /storage/shared/home_directories/gsouther/regtest/2017-12-15/1101.53/adapt_remote_tests/dcp_1_0_skx/opencl/boardtest/opencl_bsp_build/linux64/libexec
pac_a10_f200001
pac_a10_f200000
pac_a10_f200002
- The following command programs the first card listed in Step 1:
$ aocl program pac_a10_f200001 hello_world.aocx
aocl program: Running program from /storage/shared/home_directories/gsouther/regtest/2017-12-15/1101.53/adapt_remote_tests/dcp_1_0_skx/opencl/boardtest/opencl_bsp_build/linux64/libexec
Program succeed.
- The following command programs the second card listed in Step 1:
$ aocl program pac_a10_f200000 hello_world.aocx
aocl program: Running program from /storage/shared/home_directories/gsouther/regtest/2017-12-15/1101.53/adapt_remote_tests/dcp_1_0_skx/opencl/boardtest/opencl_bsp_build/linux64/libexec
Program succeed.
- After programming the FPGAs, the aocl diagnose command provides information about them:
$ aocl diagnose
aocl diagnose: Running diagnose from /storage/shared/home_directories/gsouther/regtest/2017-12-15/1101.53/adapt_remote_tests/dcp_1_0_skx/opencl/boardtest/opencl_bsp_build/linux64/libexec
------------------------- acl0 -------------------------
Vendor: Intel Corp
Phys Dev Name Status Information
pac_a10_f200001 Passed PAC Arria 10 Platform (pac_a10_f200001)
PCIe 05:00.0
FPGA temperature = 79 degrees C.
DIAGNOSTIC_PASSED
---------------------------------------------------------
------------------------- acl1 -------------------------
Vendor: Intel Corp
Phys Dev Name Status Information
pac_a10_f200000 Passed PAC Arria 10 Platform (pac_a10_f200000)
PCIe 03:00.0
FPGA temperature = 79 degrees C.
DIAGNOSTIC_PASSED
---------------------------------------------------------
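When a system holds several cards, the per-card programming step can be driven from the probe output. The following sketch is a dry run: it parses a copy of the sample probe output and prints the aocl program command for each card instead of executing it. It assumes that every device name starts with pac_a10_, as in the sample. The helper name list_devices is hypothetical, and the informational path line is abbreviated with a placeholder.

```shell
# list_devices: read `aocl diagnose -probe` output on stdin and print device names.
# Assumes every device name starts with "pac_a10_", as in the sample output.
list_devices() {
    grep '^pac_a10_'
}

# Probe output copied from the example (the path line is shortened to a placeholder).
probe_output='aocl diagnose: Running diagnose from <path>/linux64/libexec
pac_a10_f200001
pac_a10_f200000
pac_a10_f200002'

# Dry run: print the programming command for each device.
printf '%s\n' "$probe_output" | list_devices | while read -r dev; do
    echo "aocl program $dev hello_world.aocx"
done
# prints:
# aocl program pac_a10_f200001 hello_world.aocx
# aocl program pac_a10_f200000 hello_world.aocx
# aocl program pac_a10_f200002 hello_world.aocx
```

On a real system, piping `aocl diagnose -probe` into list_devices and removing the echo would program each card in turn.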
Running Samples
Running Hello World
- Extract hello_world example:
$ cd $DCP_LOC/opencl
$ mkdir exm_opencl_hello_world_x64_linux
$ cd exm_opencl_hello_world_x64_linux
$ tar xf ../exm_opencl_hello_world_x64_linux.tgz
- Build example:
$ cd hello_world
$ make
- Copy aocx to example bin folder:
$ cp $DCP_LOC/opencl/hello_world.aocx ./bin/
- Run example:
$ ./bin/host
Sample output:
Querying platform for info:
==========================
CL_PLATFORM_NAME = Intel(R) FPGA SDK for OpenCL(TM)
CL_PLATFORM_VENDOR = Intel(R) Corporation
CL_PLATFORM_VERSION = OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.0
Querying device for info:
========================
CL_DEVICE_NAME = pac_a10 : PAC Arria 10 Platform (pac_a10_f400000)
CL_DEVICE_VENDOR = Intel Corp
CL_DEVICE_VENDOR_ID = 4466
CL_DEVICE_VERSION = OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.0
CL_DRIVER_VERSION = 17.0
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
CL_DEVICE_IMAGE_SUPPORT = true
CL_DEVICE_LOCAL_MEM_SIZE = 16384
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000
CL_DEVICE_MAX_COMPUTE_UNITS = 1
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 2147483648
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MEM_BASE_ADDR_ALIGN = 8192
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
Command queue out of order? = false
Command queue profiling enabled? = true
Using AOCX: hello_world.aocx
Reprogramming device [0] with handle 1
Kernel initialization is complete.
Launching the kernel...
Thread #2: Hello from Altera’s OpenCL Compiler!
Kernel execution is complete.
Running Vector Add
- Extract example:
$ cd $DCP_LOC/opencl
$ mkdir exm_opencl_vector_add_x64_linux
$ cd exm_opencl_vector_add_x64_linux
$ tar xf ../exm_opencl_vector_add_x64_linux.tgz
- Build example:
$ cd vector_add
$ make
- Copy the precompiled OpenCL™ kernel to the bin folder:
$ cp $DCP_LOC/opencl/vector_add.aocx ./bin
- Run example:
$ ./bin/host
Sample output:
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
pac_a10 : PAC Arria 10 Platform (pac_a10_f400000)
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)
Time: 7.282 ms
Kernel time (device 0): 3.451 ms
Verification: PASS
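The Hello World and Vector Add samples follow the same extract, build, copy, and run sequence, differing only in the sample name. The sketch below captures that pattern as a dry run that prints the commands without executing them. The helper name sample_steps is hypothetical, and it assumes that the archive, directory, and aocx names follow the pattern shown for these two samples.

```shell
# sample_steps NAME: print the extract, build, copy, and run commands for sample NAME.
# Dry run only; nothing is executed. (Hypothetical helper for illustration.)
sample_steps() {
    name=$1
    dir="exm_opencl_${name}_x64_linux"
    cat <<EOF
cd \$DCP_LOC/opencl
mkdir $dir
cd $dir
tar xf ../$dir.tgz
cd $name
make
cp \$DCP_LOC/opencl/$name.aocx ./bin/
./bin/host
EOF
}

sample_steps vector_add
```

Printing the commands first makes it easy to review them before pasting them into a shell where DCP_LOC is set.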
Compiling OpenCL Kernels
Refer to the Intel FPGA SDK for OpenCL™ Getting Started Guide for more details, including how to compile OpenCL™ kernels.
$ aoc $DCP_LOC/opencl/exm_opencl_vector_add_x64_linux/vector_add/device/vector_add.cl
$ aoc --list-boards
Output:
Board list:
pac_a10
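Before starting a long kernel compilation, a script can confirm that the expected board appears in the board list. The following sketch parses a copy of the output shown above rather than invoking aoc. The helper name board_listed is hypothetical, and it assumes that board names appear one per line after the "Board list:" line.

```shell
# board_listed NAME: succeed if NAME appears as an entry after the "Board list:"
# line of `aoc --list-boards` output supplied on stdin.
# (Hypothetical helper for illustration only.)
board_listed() {
    awk -v want="$1" '
        /^Board list:/ { in_list = 1; next }
        in_list && $1 == want { found = 1 }
        END { exit !found }
    '
}

# Output copied from the example above.
list_output='Board list:
pac_a10'

if printf '%s\n' "$list_output" | board_listed pac_a10; then
    echo "pac_a10 is available"
fi
# prints: pac_a10 is available
```

On a real system, the check would read `aoc --list-boards | board_listed pac_a10`.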
Document Revision History
Date | Version | Changes |
---|---|---|
December 2017 | 2017.12.22 | Initial release. |