Intel High Level Synthesis Compiler User Guide
Published: 2018/3/17
Compared to traditional RTL development, the Intel® HLS Compiler offers the following advantages:
Fast and easy verification
Algorithmic development in C++
Automatic integration of RTL verification with a C++ testbench
Powerful microarchitecture optimizations
Overview of the Intel High Level Synthesis (HLS) Compiler
The Intel® HLS Compiler is command-line compatible with g++, and supports most of the g++ compiler flags. The Intel® HLS Compiler recognizes the same file name extensions as g++, namely .c, .C, .cc, .cpp, .CPP, .c++, .cp, and .cxx. The compiler treats all of these file types as C++. The compiler does not explicitly support C, other than as a subset of C++.
When targeting an FPGA, the Intel® HLS Compiler outputs an executable and a project directory. The default executable is a.out on Linux and a.exe on Windows. The default project directory is a.prj, and it contains HLS results, including the generated IP. It also contains reports and auxiliary information for verification purposes.
To specify the name of the compiler output, include the -o <result> option in your i++ command, where <result> is the name of the executable. This command creates a project directory called <result>.prj.
Running the executable file runs your testbench. When you compile your design to an FPGA architecture, the output executable runs a simulation. When you compile your design to the x86-64 architecture, the output executable runs your design on the CPU.
High Level Synthesis Design Flow
- Create your component and testbench.
You can write a complete C++ application that contains both your component code and your testbench code.
For details, see Creating a High-Level Synthesis Component and Testbench.
- Verify the functionality of your component algorithm and testbench by compiling your design to an x86-64 executable.
For details, see Verifying the Functionality of Your IP Design.
- Optimize and refine the FPGA performance of your component.
For details, see Optimizing and Refining Your Component.
After initial optimizations, you can see where to further refine your component by compiling it for simulation. For details, see Verifying Your IP with Simulation.
- Synthesize your component with Intel® Quartus® Prime.
For details, see Synthesize your Component IP with Intel Quartus Prime.
Synthesizing your component also generates accurate quality-of-results (QoR) metrics like area and performance estimates.
- Integrate your IP into a system with Intel® Quartus® Prime or Platform Designer (formerly Qsys).
For details, see Integrating your IP into a System.
The Project Directory
Directory | Description |
---|---|
components | Contains a folder for each component, and all HDL and IP files that are needed to use that component in a design. |
verification | Contains all the files for the verification testbench. |
reports | Contains reports with information that is useful for analyzing the hardware implementation of the synthesized components. |
quartus | Contains an Intel® Quartus® Prime project that instantiates the components. You can compile this Intel® Quartus® Prime project to generate more detailed timing and area reports. |
Creating a High-Level Synthesis Component and Testbench
Write the functions for your components in the OpenCL™-supported subset of C99 whenever possible. The compiler can also synthesize some C++ constructs, which might help you write cleaner code.
For more information on the supported subset of C99 and its restrictions, see "Supported Subset for Component Synthesis" in Intel® High Level Synthesis Compiler Reference Manual.
You can identify a function as a component either by labeling it with the component keyword or by listing it in the --component <component_list> option of the i++ command. Avoid combining these methods because you might unexpectedly synthesize unwanted components for some of your functions. If you combine these methods, components are synthesized for all functions labeled with the component keyword as well as all functions listed in the --component <component_list> option of the i++ command.
If you do not want a component synthesized for a function, ensure that the function is not labeled with the component keyword and is not specified in the --component <component_list> option of the i++ command.
Review the "Area Analysis by Source" section of the high-level design report (<name>.prj/reports/report.html) to see a breakdown of functions per component.
The HLS compiler creates an executable to run on the CPU. The compiler then sends any calls to functions that you declared as components to simulation of the synthesized IP core, and the simulation results are returned.
Compiler-Defined Preprocessor Macros
The compiler-defined preprocessor macros are __INTELFPGA_COMPILER__ and __INTELFPGA_TYPE__.
Tool Invocation | __INTELFPGA_COMPILER__ | __INTELFPGA_TYPE__ |
---|---|---|
g++ | Undefined | Undefined |
i++ -march=x86-64 | "17.1" | "NONE" |
i++ -march="<FPGA_family_or_part_number>" | "17.1" | "VERILOG" |
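The macros in the table above can be tested in source code to keep one file buildable with both g++ and i++. The following is a minimal sketch, assuming that a plain g++ build should treat component as an empty macro. The names scale_and_add and built_with_hls_compiler are hypothetical and used only for illustration.

```cpp
// Sketch: one source file that builds with both g++ and i++.
#ifdef __INTELFPGA_COMPILER__
#include "HLS/hls.h"   // HLS header used by the examples in this guide
#else
#define component      // assumption: under plain g++, the keyword expands to nothing
#endif

// Hypothetical component used only for illustration.
component int scale_and_add(int a, int b) {
  return 2 * a + b;
}

// Reports whether this translation unit was compiled by i++.
bool built_with_hls_compiler() {
#ifdef __INTELFPGA_COMPILER__
  return true;
#else
  return false;
#endif
}
```

A testbench can call built_with_hls_compiler() to decide, for example, whether to print a note that the results came from an i++ build.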
Verifying the Functionality of Your IP Design
Verify the functionality of your design by compiling your component and testbench to an x86-64 executable that you can debug with your preferred C++ debugger.
Compiling your design to an x86-64 executable is faster than having to compile your component to hardware or a hardware simulation. This faster compilation time lets you debug and refine your component algorithms quickly before you move on to see how your component is implemented in hardware.
Ensure that you set your compiler command to include debug information. The i++ command generates debug information by default. You can use GDB (on Linux operating systems) or Microsoft Visual Studio (on Windows operating systems) to debug your component and testbench, even if you used the i++ command to compile your code for functional verification.
Optimizing and Refining Your Component
i++ -march="<FPGA_family_or_part_number>" --simulator none
You can also compile your component with a ModelSim® simulation flow by omitting the --simulator none option, but a simulation flow compile might take slightly longer. However, compiling your component with a simulation flow gives you additional information in the high-level design report.
The Intel? HLS Compiler High Level Design Report (report.html)
The high-level design report is an HTML file called report.html that you can open in a web browser to review. You can find the high-level design report in the <name>.prj/reports folder created when you compile your component to RTL.
After you run a simulation flow, the report also shows you verification statistics such as component reset latency.
For more information about the high-level design report and how to use it to optimize and refine your component, see Reviewing the High Level Design Report (report.html).
Verifying Your IP with Simulation
i++ -march="Arria 10" --component <component_list> […] design.cpp
Generation of the Verification Testbench Executable
- Parses your design, and extracts the functions and symbols necessary for component synthesis to the FPGA. The HLS compiler also extracts the functions and symbols necessary for compiling the C++ testbench.
- Compiles the testbench code to generate an x86-64 executable that also runs the simulator.
- Compiles the code for component synthesis to the FPGA. This compilation generates Verilog for the component and a Verilog-DPI testbench.
Debugging during Verification
By default, the HLS compiler instructs the simulator not to log any signals because logging the signals slows the simulation and the waveforms files can be very large. However, you can configure the compile to save these waveforms for debugging purposes.
To enable signal logging in the simulator, include the -ghdl option in your i++ command:
i++ -march="<FPGA_family_or_part_number>" -ghdl <input files>
When you view the waveform in ModelSim®, you can view the component top-level signals (start, busy, stall, done, parameters, and outputs) by right-clicking on the <component_name>_inst block and selecting Add Wave.
High-Throughput Simulation (Asynchronous Component Calls) Using Enqueue Function Calls
Function | Description |
---|---|
ihc_hls_enqueue(void* retptr, void* funcptr, …) | This function enqueues one invocation of an HLS component. The return value is stored in the first argument, which should be a pointer to the return type. The component does not execute until the ihc_hls_component_run_all() function is invoked. |
ihc_hls_enqueue_noret(void* funcptr, …) | This function is similar to ihc_hls_enqueue(void* retptr, void* funcptr, …), except that it does not have an output stream to capture return values. |
ihc_hls_component_run_all (void* funcptr) | This function executes all enqueued calls to the specified component in a pipelined fashion. |
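The ordering described in the table (enqueue records a call, and nothing executes until the run-all call) can be modeled on the host with a queue of deferred calls. The following is a minimal sketch of that ordering only. The DeferredCalls class is hypothetical, is not part of the ihc_hls_* API, and does not model pipelined execution in the simulator.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-only model of deferred component calls.
// It mirrors the ordering in the table above, not the real ihc_hls_* API.
class DeferredCalls {
 public:
  // Records one invocation; the function does not execute yet.
  void enqueue(int* retptr, int (*func)(int, int), int a, int b) {
    pending_.push_back([=] { *retptr = func(a, b); });
  }

  // Executes every recorded invocation in the order it was enqueued.
  void run_all() {
    for (auto& call : pending_) call();
    pending_.clear();
  }

  // Number of invocations recorded but not yet executed.
  std::size_t pending() const { return pending_.size(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Plain-function stand-in for a component such as dut.
int dut(int a, int b) { return a * b; }
```

In this model, a variable passed as retptr keeps its old value after enqueue and receives the result only after run_all, which is the behavior the table describes for return values.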
Execution Model
Comparison of Explicit and Enqueued Function Calls
Figure 1 illustrates the waveform of the signals for the component dut. As shown in the following code snippet, the testbench does not include any enqueue function calls.
#include "HLS/hls.h"
#include <stdio.h>
component int dut(int a, int b) {
return a*b;
}
int main (void) {
int x1, x2, x3;
x1 = dut(1, 2);
x2 = dut(3, 4);
x3 = dut(5, 6);
printf("x1 = %d, x2 = %d, x3 = %d\n", x1, x2, x3);
return 0;
}

#include "HLS/hls.h"
#include <stdio.h>
component int dut(int a, int b) {
return a*b;
}
int main (void) {
int x1, x2, x3;
ihc_hls_enqueue(&x1, &dut, 1, 2);
ihc_hls_enqueue(&x2, &dut, 3, 4);
ihc_hls_enqueue(&x3, &dut, 5, 6);
ihc_hls_component_run_all(&dut);
printf("x1 = %d, x2 = %d, x3 = %d\n", x1, x2, x3);
return 0;
}

Synthesize your Component IP with Intel Quartus Prime
When you are satisfied with the predicted performance of your component, you can then perform the longer hardware synthesis compilation with Intel® Quartus® Prime. This compilation also generates accurate area and performance estimates for your design.
After the Intel® Quartus® Prime compilation completes, the high-level design report file shows the area and performance data for your components. These estimates are more accurate than estimates generated when you compile your component with the Intel® HLS Compiler.
To synthesize your component IP and generate quality of results (QoR) data, do one of the following actions:
- Instruct the HLS compiler to run the Intel® Quartus® Prime compilation flow automatically after synthesizing the components. Include the --quartus-compile option in your i++ command.
i++ -march="<FPGA_family_or_part_number>" --quartus-compile --component ...
- If you already have the RTL for your component synthesized, you can navigate to the quartus directory and compile the Intel® Quartus® Prime project by invoking the following command:
quartus_sh --flow compile quartus_compile
Integrating your IP into a System
To integrate your HLS compiler-generated IP into a system with Intel® Quartus® Prime, you must be familiar with Intel® Quartus® Prime Standard Edition or Intel® Quartus® Prime Pro Edition as well as the Platform Designer (formerly Qsys/Qsys Pro) system integration tool included with Intel® Quartus® Prime.
Adding the HLS Compiler-Generated IP into an Intel Quartus Prime Project
- For Intel® Quartus® Prime Standard Edition, add the .qsys file to the project.
- For Intel® Quartus® Prime Pro Edition, add the .ip file to the project.
- Create an Intel® Quartus® Prime project.
- Click Project > Add/Remove Files in Project.
- Perform one of the following tasks:
- For the Intel® Quartus® Prime Standard Edition software, in the Settings dialog box, browse to and select the component’s .qsys file.
For example, <result>.prj/components/<component_name>/<component_name>.qsys
- For the Intel® Quartus® Prime Pro Edition software, in the Settings dialog box, browse to and select the component’s .ip file.
For example, <result>.prj/components/<component_name>/<component_name>.ip
- Instantiate the component top-level module in the Intel® Quartus® Prime project. For an example of how to instantiate the component’s top-level module, refer to the <result>.prj/components/<component_name>/<component_name>_inst.v file.
Adding the HLS Compiler-Generated IP into a Platform Designer System
In Platform Designer, if your HLS compiler-generated IP does not appear in the IP Catalog, perform the following tasks:
- In Intel® Quartus® Prime, click Tools > Options.
- In the Options dialog box, under Category, expand IP Settings and click IP Catalog Search Locations.
- Perform one of the following tasks:
- For Intel® Quartus® Prime Standard Edition, in the IP Catalog Search Locations dialog box, add the path to the directory that contains the .qsys file to IP Search Paths. To find all the components, specify the path as <result>.prj/components/**/*.
- For Intel® Quartus® Prime Pro Edition, in the IP Catalog Search Locations dialog box, add the path to the directory that contains the .ip file to IP Search Paths as <result>.prj/components/<component_name>/<component_name>.ip.
- In IP Catalog, add your IP to the Platform Designer system by selecting it from the HLS project directory.
- "Creating a System with Platform Designer (Standard)" in Intel® Quartus® Prime Standard Edition Handbook Volume 1: Design and Compilation
- "Creating a System with Platform Designer" in Intel® Quartus® Prime Pro Edition Handbook Volume 1: Design and Compilation
Limitations of the Intel HLS Compiler
Compiler support
- Linux compiler support
- The HLS compiler does not support GCC 4.7.0 or newer. The compiler requires the GCC compiler and C++ libraries version 4.4.7.
- Windows compiler support
- The HLS compiler for Windows is compatible with Microsoft Visual Studio 2010 only.
C++ Language Restrictions
Program your source code in C++. Where possible, adhere to the OpenCL-supported subset of C99 for code that is intended for synthesis (that is, code inside component functions).
- A component cannot include virtual functions, function pointers, or bit fields.
- Function-scoped static variables that are a part of the component cannot use function arguments for initialization.
- C++11 restrictions
- The HLS compiler does not support certain C++11 features such as initializer lists and lambda functions.
- Class membership
- HLS components cannot be a C++ class member or part of a declared namespace. If you must use a component this way, create a component that is not part of a class or namespace that actually calls the implementation, then call that component.
- Exception handling
- A component cannot contain exception handling.
- Library calls
- The HLS compiler does not currently support calls to C++ runtime libraries on Windows, including calls from the testbench code.
- Library functions
- A component cannot contain standard C or C++ library functions, unless they are explicitly supported by header files provided with the Intel® HLS Compiler.
A component that contains printf() or cout calls works in its x86 implementation. However, the generated RTL does not include the printf() or cout function calls if you include the HLS/stdio.h or HLS/iostream headers provided with the Intel® HLS Compiler. If you try to generate RTL with the regular stdio.h or iostream headers, you will likely experience compiler errors.
- Multiple inheritance
- The HLS compiler does not support classes with multiple inheritance used as parameters. You may use classes as parameters as long as each class inherits from, at most, one class directly.
- Namespaces
- HLS components cannot be a C++ class member or part of a declared namespace. If you must use a component this way, create a component that is not part of a class or namespace that actually calls the implementation, then call that component.
- Overloading/Templates
- Components cannot be templated functions or overloaded functions. If you must use a component this way, create a component that is not part of a templated function or overloaded function, then call that component.
- Parameters
- The HLS compiler does not support classes with multiple inheritance used as parameters. You may use classes as parameters as long as each class inherits from, at most, one class directly.
- Recursion
- The HLS compiler does not support the synthesis of components that use recursion; however, tail recursion is supported.
If a component has an algorithm that uses recursion, and it is identified for FPGA acceleration, modify the algorithm to use tail recursion, if possible.
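The recursion restriction above suggests rewriting a recursive algorithm in tail-recursive form. The following is a minimal sketch of that rewrite on plain C++ functions. The function names are hypothetical, and whether the HLS compiler accepts a particular tail-recursive function should be confirmed against the Reference Manual.

```cpp
// Plain recursion: the multiply happens after the recursive call returns,
// so the call is not in tail position.
unsigned factorial_recursive(unsigned n) {
  return (n <= 1) ? 1u : n * factorial_recursive(n - 1);
}

// Tail-recursive form: the recursive call is the last operation, and the
// running product travels in the accumulator argument.
// Call with acc = 1 to compute n!.
unsigned factorial_tail(unsigned n, unsigned acc) {
  return (n <= 1) ? acc : factorial_tail(n - 1, acc * n);
}
```

Both functions compute the same values. The second form carries all intermediate state in its arguments, which is what makes the call a tail call.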
Reviewing the High Level Design Report (report.html)
High Level Design Report Layout
Report Menu
From the View reports pull-down menu, you can select a report to see an analysis of different parts of your component design.
Analysis Pane
The analysis pane displays detailed information of the report that you selected from the View reports pull-down menu.
Source Code Pane
The source code pane displays the code for all the source files in your component.
To select between different source files in your component, click the pull-down menu at the top of the source code pane. To collapse the source code pane, do one of the following actions:
- Click the X icon beside the source code pane pull-down menu.
- Click the vertical ellipsis icon on the right-hand side of the report menu and then select Show/Hide source code.
If you previously collapsed the source code pane and want to expand it, click the vertical ellipsis icon on the right-hand side of the report menu and then select Show/Hide source code.
Details Pane
For each line that appears in a loop analysis or area report, the details pane shows additional information, if available, that elaborates on the comment in the Details column report. To collapse the details pane, do one of the following actions:
- Click the X icon on the right-hand side of the details pane.
- Click the vertical ellipsis icon on the right-hand side of the report menu and then select Show/Hide details.
Reviewing the Report Summary
The report summary gives you a quick overview of the results of compiling your design including a summary of each component in your design and a summary of the estimated resources that each component in your design uses.
The report summary is divided into four sections: Info, Quartus Fit Summary, Estimated Resource Usage, and Compile Warnings.
Info
- Name of the project
- Target FPGA family and device
- Intel® Quartus® Prime version
- HLS compiler version
- The command that was used to compile the design
- The date and time at which the reports were generated
Quartus Fit Summary
- Quartus Fit Clock Summary
- Quartus Fit Resource Utilization Summary
The Quartus Fit Clock Summary section shows the maximum clock frequencies that can be achieved for the design.
The Quartus Fit Resource Utilization Summary section shows the total area utilization both for the entire design, and for each component individually. There is no breakdown of area information by source line.
Estimated Resource Usage
The Estimated Resource Usage section shows a summary of the estimated resources used by each component in your design, as well as the total resources used for all components.
Compile Warnings
The Compile Warnings section shows some of the compiler warnings generated during the compilation.
Reviewing Loop Information
The High Level Design Report (<result>.prj/reports/report.html) file contains information about all the loops in your design and their unroll statuses. This loop analysis report helps you examine whether the Intel® HLS Compiler is able to maximize the throughput of your component.
- #pragma unroll
For details about #pragma unroll, see "Loop Unrolling (unroll Pragma)" in Intel® High Level Synthesis Compiler Reference Manual.
- #pragma loop_coalesce
For details about #pragma loop_coalesce, see "Loop Coalescing (loop_coalesce Pragma)" in Intel® High Level Synthesis Compiler Reference Manual.
- #pragma ii
For details about #pragma ii, see "Loop Initiation Interval (ii Pragma)" in Intel® High Level Synthesis Compiler Reference Manual.
- Click View reports > Loop Analysis.
- In the analysis pane, select Show fully unrolled loops to obtain information about the loops in your design.
- Consult the flowchart below to identify actions you can take to improve the throughput of your design.
Remember: II refers to the initiation interval of a loop, which is the number of clock cycles between the launch of successive loop iterations. An II value of 1 is ideal; it indicates that the pipeline is functioning at maximum efficiency because the pipeline can process a new loop iteration every clock cycle.
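To see why the II value matters for throughput, it can help to estimate the cycle count of a pipelined loop. The following is a minimal sketch under a simplified textbook pipeline model, which is an assumption and not the compiler's own estimator: the first iteration completes after a fixed latency, and each later iteration launches II cycles after the previous one. The function name and the latency parameter are hypothetical.

```cpp
#include <cstdint>

// Simplified pipeline model (an assumption, not the compiler's estimator):
// the first iteration takes `latency` cycles to complete, and each later
// iteration launches `ii` cycles after the previous one.
std::uint64_t estimated_loop_cycles(std::uint64_t iterations,
                                    std::uint64_t ii,
                                    std::uint64_t latency) {
  if (iterations == 0) return 0;  // an empty loop contributes no cycles
  return latency + ii * (iterations - 1);
}
```

Under this model, a 1024-iteration loop with a hypothetical latency of 10 cycles takes 1033 cycles at II = 1 and 2056 cycles at II = 2, so for long loops the cycle count scales almost directly with the II value.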
Loop Analysis Example
Consider the following example code snippet for transpose_and_fold.cpp:
01: #include "HLS/hls.h"
02: #include <stdio.h>
03: #include <stdlib.h>
04:
05: #define SIZE 32
06:
07: typedef ihc::stream_in<int> my_operand;
08: typedef ihc::stream_out<int> my_result;
09:
10: component void transpose_and_fold(my_operand &data_in, my_result &res)
11: {
12: int i;
13: int j;
14: int in_buf[SIZE][SIZE];
15: int tmp_buf[SIZE][SIZE];
16: for (i = 0; i < SIZE * SIZE; i++) {
17: in_buf[i / SIZE][i % SIZE] = data_in.read();
18: tmp_buf[i / SIZE][i % SIZE] = 0;
19: }
20:
21: #ifdef USE_IVDEP
22: #pragma ivdep safelen(SIZE)
23: #endif
24: for (j = 0; j < SIZE * SIZE * SIZE; j++) {
25: #pragma unroll
26: for (i = 0; i < SIZE; i++) {
27: tmp_buf[j % SIZE][i] += in_buf[i][j % SIZE];
28: }
29: }
30: for (i = 0; i < SIZE * SIZE; i++) {
31: res.write(tmp_buf[i / SIZE][i % SIZE]);
32: }
33: }

The transpose_and_fold component has four loops. The loop analysis report shows that the compiler performed different kinds of loop optimizations:
- The loop on line 26 is fully unrolled, as defined by #pragma unroll on line 25.
- The loops on lines 16 and 30 are pipelined with an II value of 1.
The Block1.start loop in the loop analysis report is not present in the code. It is an implicit infinite loop that the compiler adds to allow the component to run continuously, instead of only once.
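When you verify a component like this one in a testbench, a host-only reference model gives you expected values to compare against. The following is a minimal sketch that reproduces the arithmetic of the loops on lines 24 to 29 with plain vectors instead of streams. The function name and the runtime size parameter n are hypothetical (the example above fixes SIZE at 32).

```cpp
#include <cstddef>
#include <vector>

// Host-only reference model (hypothetical helper, not part of the tutorial).
// `in` holds n*n values in the order they would arrive on data_in.
// Returns n*n values in the order the component would write them to res,
// or an empty vector if `in` does not hold exactly n*n values.
std::vector<int> transpose_and_fold_reference(const std::vector<int>& in,
                                              std::size_t n) {
  if (in.size() != n * n) return std::vector<int>();
  std::vector<int> tmp(n * n, 0);
  for (std::size_t j = 0; j < n * n * n; ++j) {
    const std::size_t row = j % n;
    for (std::size_t i = 0; i < n; ++i) {
      tmp[row * n + i] += in[i * n + row];  // same update as line 27
    }
  }
  return tmp;
}
```

Because each row index occurs n*n times across the outer loop, every output element equals n*n times the corresponding element of the transposed input, which gives a quick sanity check on the results.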
Reviewing Your Component Area Usage
The High Level Design Report (report.html) provides a detailed breakdown of the estimated FPGA area usage. It also provides feedback on key hardware features such as private memory configuration.
The estimated area usage information correlates with, but does not necessarily match, the resource usage results from the Intel® Quartus® Prime software. Use the estimated area usage to identify parts of the design with large area overhead. You can also use the estimates to compare area usage between different designs. Do not use the estimated area usage information for final resource utilization planning.
Before compiling your design with Intel® Quartus® Prime software, the High Level Design Report looks like the following example:

After compiling your design with Intel® Quartus® Prime software, the High Level Design Report looks like the following example:

Area Analysis Example
Area Analysis by Source
Area analysis by source shows an approximation of how each line of the source code affects area. In the area analysis by source view, the report shows the area hierarchically.

The System entry in the area report refers to all the components in the design. Expanding the System entry allows you to view all the components in the design. In this example, there is only one component (that is, transpose_and_fold).
Each line in the report contains state and corresponding information. In the figure below, the example area report shows that on line 17, where a stream of data is stored to in_buf, the consumed area is used for computing the pointer value and then storing it. On line 14, area consumption is a result of in_buf using 16 RAM blocks and some logic.

Area Analysis by System
Area analysis by system shows an area breakdown that is closest to the actual hardware implemented in the FPGA.

Viewing Your Component Design
Reviewing Your Component Interfaces
Some interface arguments in your component can be marked as being stable. A stable interface argument is an argument that does not change while your component executes, but the argument might change between component executions.
In the Component Viewer report, a stable node does not have any edge connection.
Default Interface Arguments
#include "HLS/hls.h"
#include "stdio.h"
struct coordinate_t {
int x;
int y;
};
component int default_comp(int b, coordinate_t p) {
return b + p.x;
}

For each default interface argument node, you can view details about the node when you hover over the node:
Pointer, Pass-By-Reference, and Avalon® MM Master Interface Arguments
#include "HLS/hls.h"
#include "stdio.h"
component int master_comp(
int *pointer_d,
ihc::mm_master<int, ihc::aspace<3>, ihc::awidth<4>, ihc::dwidth<32>,ihc::latency<1>, ihc::align<4> > &master_i,
int &result
)
{
result = *pointer_d + *master_i;
return result;
}
- Stable
- Describes whether the interface argument is stable.
- Data width
- The width of the memory-mapped data bus in bits.
- Address width
- The width of the memory-mapped address bus in bits.
- Latency
- The guaranteed latency from when the read command exits the component to when the external memory returns valid read data.
- Maximum burst
- The maximum number of data transfers that can associate with a read or write transaction. For fixed latency interfaces, this value is set to 1.
- Alignment
- The byte alignment of the base pointer address. The Intel® HLS Compiler uses this information to determine the amount of coalescing that is possible for loads and stores to this pointer.
Avalon® MM Slave Register Interface Arguments
#include "HLS/hls.h"
#include "stdio.h"
component int slavereg_comp(
int hls_avalon_slave_register_argument slave_scalar_f,
int* hls_avalon_slave_register_argument slave_pointer_g
) {
return slave_scalar_f + *slave_pointer_g;
}
The resulting memory map is described in the automatically generated header file <component_name>_csr.h. This header file is available in the menu in the source editor. Clicking on the CSR container node in the Component Viewer report also opens up the header file:
#include "HLS/hls.h"
#include "stdio.h"
hls_avalon_slave_component
component int slavereg_comp(
int hls_avalon_slave_register_argument slave_scalar_f,
int* hls_avalon_slave_register_argument slave_pointer_g
) {
return slave_scalar_f + *slave_pointer_g;
}
Avalon® MM Slave Memory Interface Arguments
#include "HLS/hls.h"
#include "stdio.h"
hls_avalon_slave_component
component int slavemem_comp(
hls_avalon_slave_memory_argument(4096) int* slave_mem_h,
int index,
int hls_avalon_slave_register_argument slave_scalar_f
) {
return slave_mem_h[index] * slave_scalar_f;
}
If you look at the same Avalon® MM slave memory interface in the Component Memory Viewer report, the same <slave memory name> LD/ST node is shown to be connected to an external RW port.
Avalon® Streaming Interface Arguments
#include "HLS/hls.h"
#include "stdio.h"
component int stream_comp(
ihc::stream_in<int> &stream_in_c,
ihc::stream_out<int> &stream_out_e,
int scalar_b
) {
stream_out_e.write(scalar_b + 1);
return stream_in_c.read() + scalar_b * 2;
}
- Width
- The width of the data bus in bits.
- Depth
- The depth of the stream.
- Bits per symbol
- Describes how the data is broken into symbols on the data bus.
- Uses Packets
- Indicates whether the interface exposes the startofpacket and endofpacket sideband signals on the stream interfaces. The signals can be accessed by the packet-based reads and writes.
- Uses Valid
- (stream_in) Indicates whether a valid signal is present on the stream interface. When No, the upstream source must provide valid data on every cycle that ready is asserted.
- Uses Ready
- (stream_out) Indicates whether a ready signal is present on the stream interface. When No, the downstream sink must be able to accept data on every cycle that valid is asserted.
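The Width and Bits per symbol attributes above together determine how many symbols each transfer on the data bus carries. The following is a minimal sketch of that arithmetic, assuming the bus width must be a whole multiple of the symbol size. The helper name is hypothetical.

```cpp
// Returns how many symbols one transfer carries, or 0 when the inputs do not
// describe a whole number of symbols (hypothetical helper for illustration).
unsigned symbols_per_transfer(unsigned width_bits, unsigned bits_per_symbol) {
  if (bits_per_symbol == 0 || width_bits % bits_per_symbol != 0) return 0;
  return width_bits / bits_per_symbol;
}
```

For a 32-bit stream such as ihc::stream_in<int>, this gives four symbols per transfer at 8 bits per symbol and one symbol per transfer at 32 bits per symbol.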
Reviewing Memory Replication and Stallable LSU Information
Consider the following code excerpt from the transpose_and_fold component (part of the tutorial files provided in <QPDS_installdir>/hls/examples/tutorials/loop_memory_dependency):
01 #include "HLS/hls.h"
02 #include "stdio.h"
03 #include "stdlib.h"
04
05 #define SIZE 32
06
07 typedef altera::stream_in<int> my_operand;
08 typedef altera::stream_out<int> my_result;
09
10 void transpose_and_fold(my_operand &a, my_operand &b, my_result &c)
11 {
12 int i;
13 int j;
14 int a_buf[SIZE][SIZE];
15 int b_buf[SIZE][SIZE];
16 for (i = 0; i < SIZE * SIZE; i++) {
17 a_buf[i / SIZE][i % SIZE] = a.read();
18 b_buf[i / SIZE][i % SIZE] = b.read();
19 }
20 #ifdef USE_IVDEP
21 #pragma ivdep
22 #endif
23 for (j = 0; j < SIZE * SIZE * SIZE; j++) {
24 #pragma unroll
25 for (i = 0; i < SIZE; i++) {
26 b_buf[j % SIZE][i] += a_buf[i][j % SIZE];
27 }
28 }
29 for (i = 0; i < SIZE * SIZE; i++) {
30 c.write(b_buf[i / SIZE][i % SIZE]);
31 }
32 }
The figure below shows that Block3 on line 23 is highlighted in red to prompt you to review the loop. Because loop analysis of Block3 shows that it is a pipelined loop with an II value of 2, the loop pipeline might affect the throughput of your design. The Component Viewer shows that the II value is caused by a memory dependency on loads to the b_buf variable.

By hovering your mouse over a node, you can view the tooltip and details that provide more information on the LSU. In the figure below, the tooltip shows information like the latency of the load is 6, and the LSU is stall-free.

The Component Viewer allows you to select the type of connections you want to view. Selecting Control instructs the Component Viewer to display the connections between blocks and loops. Selecting Memory instructs the Component Viewer to display the connections to and from global and local memories. Selecting Streams instructs the Component Viewer to display the connections reading from and writing to streams.

Viewing Your Component Memory System
Data movement is often a bottleneck in many algorithms. The Component Memory Viewer in the High Level Design Report (report.html) shows you how the Intel® High Level Synthesis (HLS) Compiler interprets the data connections across the memory system of your component. Use the Component Memory Viewer to help you identify data movement bottlenecks in your component design.
Also, some patterns in memory accesses can cause undesired arbitration in the load-store units (LSUs), which can affect the throughput performance of your component. Use the Component Memory Viewer to find where you might have unwanted arbitration in the LSUs.
- Memory List
- The Memory List pane shows you a hierarchy of components, memories in that component, and the corresponding memory banks.
Clicking a memory name in the list displays a graphical representation of the memory in the Component Memory Viewer pane. Also, the line in your code where you declared the memory is highlighted in the Source Code pane.
Clearing a check box for a memory bank collapses that bank in the Component Memory Viewer pane, which can help you to focus on specific memory banks when you view a complex memory design. By default, all banks in component memory are selected and shown in the Component Memory Viewer pane.
- Component Memory Viewer
- The Component Memory Viewer pane shows you connections between loads and stores to specific logical ports on the banks in a memory system. The following types of nodes might be shown in the Component Memory Viewer pane, depending on the component memory system:
- Memory node: The component memory.
- Bank node: A bank in the memory. Only banks selected in the Memory List pane are shown. Select banks in the Memory List pane to help you focus on specific memory banks when you view a complex memory design.
- Port node: The logical port for a bank. There are three types of port:
- R: A read-only port
- W: A write-only port
- RW: A read and write port
- LSU node: A store (ST) or load (LD) node connected to the memory.
- Arbitration node: An arbitration (ARB) node shows that LSUs compete for access to a shared port node, which can lead to stalls.
- Port-sharing node: A port-sharing node (SHARE) shows that LSUs have mutually exclusive access to a shared port node, so the load-store units are free from stalls.
Hover over any node to view the attributes of that node.
Hover over an LSU node to highlight the path from the LSU node to all of the ports that the LSU connects to.
Hover over a port node to highlight the path from the port node to all of the LSUs that store to the port node.
Click on a node to select it and have the node attributes displayed in the Details pane.
- Details
- The Details pane shows the attributes of the node selected in the Component Memory Viewer pane. For example, when you select a memory in a component, the Details pane shows information such as the width and depths of the memory banks, as well as any user-defined HLS attributes that you specified in your source code.
The content of the Details pane persists until you select a different node in the Component Memory Viewer pane.
Reviewing Your Component Verification Results
The verification statistics report becomes available after you simulate your component.
- The data presented in the verification statistics report might depend on the input values that the testbench passes to the component.
- The verification statistics report only reports the component loop initiation interval (II) values and throughput for enqueued invocations.
The following example verification statistics report is for a component dut that has been run once as a simple function call and 100 times as an enqueued invocation:
For components that use explicit streams, such as ihc::stream_in<> or ihc::stream_out<>, the verification statistics report also provides the throughput for each individual stream, as shown in the details pane:
Document Revision History
Date | Version | Changes |
---|---|---|
December 2017 | 2017.12.22 | |
November 2017 | 2017.11.06 | |
June 2017 | 2017.06.23 | |
June 2017 | 2017.06.09 | |
February 2017 | 2017.02.03 | |
November 2016 | 2016.11.30 | |
September 2016 | 2016.09.12 | Initial release. |