New improvements for the ADI DMAC IP:
1)The capability to manually overwrite the DMA_AXI_ADDR_WIDTH(from GUI or from tcl)
2)DMA_AXI_ADDR_WIDTH attribute is now visible in the Vivado GUI:
-"Auto mode": Automatically calculated by the core tcl files based on the existing attached address segments.
-"Manual mode": Specify the desired dma_width between 32-64 bits.
3)Added two new debug registers that return higher part of the current source/destination address.
Signed-off-by: Filip Gherman <Filip.Gherman@analog.com>
FPGAs support different widths for the read and write port of the block
SRAM cells. The DMAC can make use of this feature when the source and
destination interface have a different width to up-size/down-size the data
bus.
Using memory cells with asymmetric port width consumes the same amount of
SRAM cells, but allows to bypass the re-size blocks inside the DMAC that
are otherwise used for up- and down-sizing. This reduces overall resource
usage and can improve timing.
If the ratio between the destination and source port is too larger to be
handled by SRAM alone the SRAM block will be configured to do partial up-
or down-sizing and a resize block will be inserted to take care of the
remaining up-/down-sizing. E.g. if a 256-bit interface is connected to a
32-bit interface the SRAM will be used to do an initial resizing of 256 bit
to 64 bit and a resize block will be used to do the remaining resizing from
64 bit to 32 bit.
Currently this feature is disabled for Intel FPGAs since Quartus does not
properly infer a block RAM with different read and write port widths from
the current ad_asym_mem module. Once that has been resolved support for
asymmetric memories can also be enabled in the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Data mover/ src axis changes
Request rewind ID if TLAST received during non-last burst
Consume (ignore) descriptors until last segment received
Block descriptors towards destination until last segment received
Request generator changes
Rewind the burst ID if rewind request received
Consume (ignore) descriptors until last segment received
If TLAST happened on last segment replay next transfer (in progress or
completed) with the adjusted ID
Create completion requests for ignored segments
Response generator changes
Track requests
Complete segments which got ignored
Length of partial transfers are stored in a queue for SW reads.
The presence of partial transfer is indicated by a status bit.
The reporting can be enabled by a control bit.
The progress of any transfer can be followed by a debug register.
Reduce the width of ID signals to avoid size mismatches in Arria 10 SoC
projects where the ID width of the hard IP is 4.
The width of ID that reaches the slave can be increased by the interconnect if
multiple masters access the slave so we end up with mismatches.
Since these signals are unused it is safe to reduce them to minimum width and
let the interconnect zero-extend them as required.
The buffers inside the interconnect are sized based on maximum burst sizes
the masters can produce.
For AXI4 the max burst size is 128 but for these projects for the
default burst size of 128 bytes the DMACs are creating only burst of 8 or
16 beats depending on the bus width (128bits and 64 bits respectively).
These burst sizes can fit in the AXI3 protocol where the max burst
length is 16. Therefore the interconnect will be reduced.
The observed reduction is around 4 Mb of block RAM per project.
Another benefit is a better timing closure,
since these buffers reside in the DDR3 clock domain.
Vivado recognises .h files as C header files,
the expected extension for Verilog Header is .vh
This causes issues in simulating block designs since these files
won't be exported for the simulation even if they are
part of the simulation fileset.
This change adds a diagnostic interface to the DMAC core.
The interface exposes internal information about the core,
information which can't be exposed through AXI registers
due the latency and update rate.
Such information is the fullness of the internal buffer.
For this is exposed in bursts and is driven from the destination
clock domain, as this is reflected in its name.
The signal has a fixed size and is dimensioned by
taking in account the supported maximum number of bursts of 128.
This change adds the TLAST signal to the AXI streaming interface
of the source side for Intel targets.
Xilinx based designs already have this since the tlast is part of the
interface definition.
In order to make the signal optional and let the tool connect a
default value to the it, the USE_TLAST_SRC/DEST parameter is
added to the configuration UI. This conditions the tlast port on
the interface of the DMAC.
Xilinx handles the optional signals much better so the parameter
is not required there.
Currently the DMAC uses a simple FIFO as the store-and-forward buffer. The
FIFO handshaking is beat based whereas the remainder of the DMAC is burst
based. This means that additional control signals have to be combined with
the FIFO handshaking signal to generate the external handshaking signals.
Re-work the store-and-forward buffer to utilize a BRAM that is subdivided
into N segments. Where N is the maximum number of bursts that can be stored
in the buffer and each segment has the size of the maximum burst length.
Each segment stores the data associated with one burst and even when the
burst is shorter than the maximum burst length the next burst will be
stored in the next segment.
The new store-and-forward buffer takes care of generating all the
handshaking signals. This means handshaking is generated in a central place
and does not have to be combined from multiple data-paths simplifying the
overall logic.
The new store-and-forward buffer also takes care of data width up- and
down-sizing in case that the source and sink modules have a different data
width. This tighter integration will allow future enhancements like using
asymmetric memory.
This re-work lays the foundation of future enhancements to the DMA like
support for un-aligned transfers and early transfer abort which would have
been much more difficult to implement with the previous architecture.
In addition it significantly reduces the resource utilization of the
store-and-forward buffer and allows for better timing due to reduced
combinatorial path lengths.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For the memory-mapped AXI read interface the slave asserts rlast for the
last beat in a burst.
This means we don't have to count the number of beats to know when the
burst is completed but instead can use rlast. This slightly reduces the
amount of resources needed for the MM-AXI source module and given that the
beat_counter is often the bottleneck timing wise this should also improve
the timing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMAC allows a transfer to be aborted. When a transfer is aborted the
DMAC shuts down as fast as possible while still completing any pending
transactions as required by the protocol specifications of the port. E.g.
for AXI-MM this means to complete all outstanding bursts.
Once the DMAC has entered an idle state a special synchronization signal is
send to all modules. This synchronization signal instructs them to flush
the pipeline and remove any stale data and metadata associated with the
aborted transfer. Once all data has been flushed the DMAC enters the
shutdown state and is ready for the next transfer.
In addition each module has a reset that resets the modules state and is
used at system startup to bring them into a consistent state.
Re-work the shutdown process to instead of flushing the pipeline re-use the
startup reset signal also for shutdown.
To manage the reset signal generation introduce the reset manager module.
It contains a state machine that will assert the reset signals in the
correct order and for the appropriate duration in case of a transfer
shutdown.
The reset signal is asserted in all domains until it has been asserted for
at least 4 clock cycles in the slowest domain. This ensures that the reset
signal is not de-asserted in the faster domains before the slower domains
have had a chance to process the reset signal.
In addition the reset signal is de-asserted in the opposite direction of
the data flow. This ensures that the data sink is ready to receive data
before the data source can start sending data. This simplifies the internal
handshaking.
This approach has multiple advantages.
* Issuing a reset and removing all state takes less time than
explicitly flushing one sample per clock cycle at a time.
* It simplifies the logic in the faster clock domains at the expense of
more complicated logic in the slower control clock domain. This allows
for higher fMax on the data paths.
* Less signals to synchronize from the control domain to the data domains
The implementation of the pause mode has also slightly changed. Pause is
now a simple disable of the data domains. When the transfer is resumed
after a pause the data domains are re-enabled and continue at their
previous state.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the transfer logic, including the 2d module, into its own sub-module.
This allows testing of the full transfer logic independently of the
register map logic.
The top-level module now only instantiates the register map and transfer
module, but does not have any logic on its own.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
A larger store-and-forward memory provides better protection against worst
case memory interface latencies by being able to store more data before
over-/underflowing.
Based on empirical testing it was found that using a size of 4 bursts can
still result in underflows/overflows under certain conditions. These do not
happen when using a size of 8 bursts.
This change does not significantly increase resource consumption. Both on
Intel and Xilinx the block RAM has a minimum depth of 512 entries. With a
default burst length of 16 beats that allows for up to 32 bursts without
requiring additional block RAM.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The label for the store-and-forward memory size configuration option at the
moment is just "FIFO Size" and while the store-and-forward memory uses a
FIFO that is just a implementation detail.
Change the label to "Store-and-Forward Memory Size". This is more
descriptive as it references the function not the implementation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For correct operation the store-and-forward memory size must be a
power-of-two in the range of 2 to 32.
This is simple enough so we can list all values and let the IP Integrator
and QSYS perform proper validation of the parameter.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Split the register map code into a separate sub-module instead of having it
as part of the top-level axi_dmac.v file.
This makes it easier to component test the register map behavior
independently from the DMA transfer logic.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The MAX_BYTES_PER_BURST option allows to configure the maximum bytes that
are part of a burst. This can be an arbitrary value.
At the same time there is a limit of how many bytes can be supported by the
memory buses. A AXI3 interface supports a maximum of 16 beats per burst
and a AXI4 interface supports a maximum of 256 beats per burst.
At the moment the it is possible to specify a MAX_BYTES_PER_BURST value
that exceeds what can be supported by the AXI memory-mapped bus. If that is
the case undefined behavior will occur and the DMAC will function
incorrectly.
To avoid this make sure that the MAX_BYTES_PER_BURST value does not exceed
the maximum that can be supported by the interfaces.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the DMAC is used in async clock domains the data FIFO instantiate
an ad_mem component to handle properly the clock crossing.
For Intel, this mode is used only in FMCJESDADC1 designs but without this
an error could appear in other projects too if the user reconfigures the core.
Exposed AXI3 interface on the Intel version of the IP for UI and feature consistency.
Some of the signals that are defined as optional in the AMBA standard
are marked as mandatory in Qsys in case of AXI3. Because of this such signals
were added to the interface of the DMAC and driven with default values.
For Xilinx in order to keep existing behavior the newly added signals
are hidden from the interface.
New parameters are added to define the width of the AXI transaction IDs;
these are hidden from the UI; We can add them to the UI if the fixed size
of the IDs will cause port incompatibility issues.
The primary use-case of the DMA controller is in non-2D mode. Make this the
default, since allows projects to instantiate the controller with the
default configuration without having to explicitly disable 2D support.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the axi_dmac_hw.tcl script does not create interfaces if they
are not used in the current configuration. This has the disadvantage that
the ports belonging to these interfaces are not included in the generated
HDL wrapper. Which will generate a fair bunch of warnings when synthesizing
the HDL.
Instead always generate all interfaces, but disable those that are not used
in the current configuration. This will make sure that the ports belonging
to these interfaces are properly tied-off in the generate wrapper HDL.
This reduces the amount of false positive warnings generated and makes it
easier to spot actual issues.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The external s_axi_{awaddr,araddr} signals that are connect to the core
have their width set according to the specified size of the register map.
If the s_axi_{awaddr,araddr} signal of the core is wider (as it currently
is for many cores) the MSBs of those signals are left unconnected, which
generates a warning.
To avoid this make sure that the signal width matches the declared register
map size.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_dmac can issue up to FIFO_SIZE read and write requests in parallel.
This is done in order to maximize throughput and compensate for for
latency.
Set the {read,write}IssuingCapability properties accordingly on the AXI
master interfaces. Otherwise qsys might decide to insert bridges that
artificially limit the number of requests, which in turn might affect
performance.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Qsys allows to query to query the clock domain that is associated with a
clock input of a peripheral. This allows to automatically detect whether
the different clocks of the DMAC are asynchronous and CDC logic needs to be
inserted or not.
Auto-detection has the advantages that the configuration parameters don't
need to be set manually and the optional configuration will be choose
automatically. There is also less chance of error of leaving the settings
in a wrong configuration when e.g. the clock domains change.
In case the auto-detection should ever fail configuration options that
provide a manual overwrite are added as well.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Group configuration parameters by function, provide human readable labels
as well as specify the allowed ranges for each parameter.
This prevents accidental misconfiguration and also makes it easier to
inspect (or change) the configuration in the Qsys GUI.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Use the ad_ip_intf_s_axi helper function to create the axi4lite slave
interface for memory mapped peripherals. This slightly reduces the amount
of boilerplate code in the peripheral's *hw.tcl
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_dmac can issue up to FIFO_SIZE read and write requests in parallel.
This is done in order to maximize throughput and compensate for for
latency.
Set the {read,write}IssuingCapability properties accordingly on the AXI
master interfaces. Otherwise qsys might decide to insert bridges that
artificially limit the number of requests, which in turn might affect
performance.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the CDC helper modules to a dedicated helper modules. This makes it
possible to reference them without having to use file paths that go outside
of the referencing project's directory.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>