Example 1: Matrix multiplication (Single function)
==============================
In this section, we'll explore the end-to-end workflow of RealProbe with a Vitis HLS project that implements a basic matrix multiplication kernel. This example is from `Kastner et al.'s excellent Parallel Programming for FPGAs github repository `_. The tutorial example setup includes HLS source files (matrixmultiplication.cpp), testbench files (matrixmultiplication-top.cpp), and data files (matrixmultiplication.gold.dat), along with a pre-configured hls.tcl script.
Preparing Vitis HLS tcl Script
--------------------
Vitis HLS can be operated through the GUI or via a batch-mode Tcl script. Below is a breakdown of the hls.tcl script structure, crucial for RealProbe as it reads ``solution_name``, ``project_name``, and ``target_device`` for future bitstream generation:
1. Define Variables
.. code-block::
set solution_name solution1
set project_name project
set target_device {xc7z020clg400-1}
2. Setup Project
.. code-block::
open_project -reset $project_name
add_files matrixmultiplication.cpp
add_files -tb matrixmultiplication-top.cpp
add_files -tb matrixmultiplication.gold.dat
set_top matrixmul
open_solution -reset $solution_name
set_part $target_device
create_clock -period 10 -name default
3. Run Simulations and Synthesis
.. code-block::
csim_design -code_analyzer -clean
csynth_design
export_design -format ip_catalog
cosim_design -trace_level all
close_project
.. note::
Ensure that the variable names for ``solution_name``, ``project_name``, and ``target_device`` remain unchanged. The Tcl script name must also be kept as ``hls.tcl``
Run RealProbe
--------------------
To integrate RealProbe and override the default Vitis HLS functions, execute the following command:
.. code-block::
make realprobe
Deploy on FPGA
--------------------
Post RealProbe execution, a directory named ``FPGA`` will be generated within your project directory containing all necessary files for on-board execution. This includes an automatically generated Jupyter Notebook file (excluding software implementation for functional verification but incorporating RealProbe results).
On the Synestia Pynq-Z2 FPGA Jupyter server, navigate to the ``FPGA`` directory in the project folder (accessible via both Synestia desktop and Pynq-Z2 board) and execute the commands in the notebook using ``Shift + Enter``.
RealProbe Output Results
--------------------
When running the RealProbe output section in the notebook, you'll observe the results as shown below:
.. image:: ../img/ex1_realprobe_output.png
:alt:
Compare with Co-sim results
--------------------
RealProbe recorded a total of 103,830 cycles for the operation. To contrast, let's review the Co-simulation results, which do not provide internal cycle counts but do report total latency for the top module. Refer to the report found at ``$project_name/$solution_name/sim/report/$topmodule_name_cosim.rpt``, showing 50,842 clock cycles.
.. image:: ../img/cosim_rpt.png
:alt:
This discrepancy highlights a -104.2% difference between the Co-simulation and actual on-FPGA results, emphasizing the importance of RealProbe in understanding true FPGA performance.
.. note::
Even though Co-simulation does not provide cycle counts per module, its waveform can be examined for detailed timing analysis. Below is a waveform snapshot from this matrix multiplication example, marked with the start and end of the top function. Using the set 10ns clock cycle, the timing is calculated, resulting in a close approximation to the reported cycle count.
.. image:: ../img/ex1_waveform.png
:alt: