This commit introduces a different interface to submit transfers, using
DMA descriptors.
The structure of the DMA descriptor is as follows:
struct dma_desc {
u32 flags,
u32 id,
u64 dest_addr,
u64 src_addr,
u64 next_sg_addr,
u32 y_len,
u32 x_len,
u32 src_stride,
u32 dst_stride,
};
The 'flags' field currently offers two control bits:
- bit 0: if set, the transfer will complete after this last descriptor
is processed, and the DMA core will go back to idle state; if cleared,
the next DMA descriptor pointed to by 'next_sg_addr' will be loaded.
- bit 1: if set, an end-of-transfer interrupt will be raised after the
memory segment pointed to by this descriptor has been transferred.
The 'id' field corresponds to an identifier of the descriptor.
The 'dest_addr' and 'src_addr' contain the destination and source
addresses to use for the transfer, respectively.
The 'x_len' field contains the number of bytes to transfer,
minus one.
The 'y_len', 'src_stride' and 'dst_stride' fields are only useful for
2D transfers, and should be set to zero if 2D transfers are not
required.
To start a transfer, the address of the first DMA descriptor must be
written to register 0x47c and the HWDESC bit of CONTROL register must
be set. The Scatter-Gather transfer is queued similarly to the simple
transfers, by writing 1 in TRANSFER_SUBMIT.
The Scatter-Gather interface has a dedicated AXI-MM bus configured for
read transfers, with its own dedicated clock, which can be asynchronous.
The Scatter-Gather reset is generated by the reset manager to reset the
logic after completing any pending transactions on the bus.
When the Scatter-Gather is enabled during runtime, the legacy cyclic
functionality of the DMA is disabled.
Signed-off-by: Ionut Podgoreanu <ionut.podgoreanu@analog.com>
Due to nets being optimized at IP-level during the no-OOC synthesis flow,
constraints related to req_clk (request clock) were not being applied,
causing the design to not meet timing.
The fix considers the synchronous modes, appending the possible resulting
req_clk's names after the synthesis flow.
Due to grounded signals in the DMA_TYPE_SRC != DMA_TYPE_STREAM_AXI config.,
sync_rewind is removed during synthesis, even so, constraints were
trying to be applied to those nets.
To resolve this, sync_rewind block was moved to inside the generate.
Vivado seems to properly suppress "Empty list" warnings when the circuit does not exist because of a generate rule.
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Signed-off-by: Ionut Podgoreanu <ionut.podgoreanu@analog.com>
Data mover/ src axis changes
Request rewind ID if TLAST received during non-last burst
Consume (ignore) descriptors until last segment received
Block descriptors towards destination until last segment received
Request generator changes
Rewind the burst ID if rewind request received
Consume (ignore) descriptors until last segment received
If TLAST happened on last segment replay next transfer (in progress or
completed) with the adjusted ID
Create completion requests for ignored segments
Response generator changes
Track requests
Complete segments which got ignored
Drive the descriptor from the source side to destination
so we can abort consecutive transfers in case TLAST asserts.
For AXIS count the length of the burst and pass that value to the
destination instead the programmed one. This is useful when the
streams aborts early by asserting the TLAST. We want to notify the
destination with the right number of beats received.
For FIFO source interface reuse the same logic due the small footprint
even if the stream does not got interrupted in that case.
For MM source interface wire the burst length from the request side to
destination.
Currently the destination side request ID is synchronized response ID from
the source side. This signal is effectively the same as the synchronized
src ID inside the burst memory. The only difference is that they might not
increment in the exact same clock cycle.
Exporting the request ID from the burst memory means we can remove the extra
synchronizer block.
This has the added bonus that the request ID will increment in the same
clock cycle as when the data becomes available from the memory.
This means we can assume that when there is a outstanding burst request
indicated via the ID that data is available from the memory and vice versa
when data is available from the memory that there is a outstanding burst
request.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the DMAC uses a simple FIFO as the store-and-forward buffer. The
FIFO handshaking is beat based whereas the remainder of the DMAC is burst
based. This means that additional control signals have to be combined with
the FIFO handshaking signal to generate the external handshaking signals.
Re-work the store-and-forward buffer to utilize a BRAM that is subdivided
into N segments. Where N is the maximum number of bursts that can be stored
in the buffer and each segment has the size of the maximum burst length.
Each segment stores the data associated with one burst and even when the
burst is shorter than the maximum burst length the next burst will be
stored in the next segment.
The new store-and-forward buffer takes care of generating all the
handshaking signals. This means handshaking is generated in a central place
and does not have to be combined from multiple data-paths simplifying the
overall logic.
The new store-and-forward buffer also takes care of data width up- and
down-sizing in case that the source and sink modules have a different data
width. This tighter integration will allow future enhancements like using
asymmetric memory.
This re-work lays the foundation of future enhancements to the DMA like
support for un-aligned transfers and early transfer abort which would have
been much more difficult to implement with the previous architecture.
In addition it significantly reduces the resource utilization of the
store-and-forward buffer and allows for better timing due to reduced
combinatorial path lengths.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMAC allows a transfer to be aborted. When a transfer is aborted the
DMAC shuts down as fast as possible while still completing any pending
transactions as required by the protocol specifications of the port. E.g.
for AXI-MM this means to complete all outstanding bursts.
Once the DMAC has entered an idle state a special synchronization signal is
send to all modules. This synchronization signal instructs them to flush
the pipeline and remove any stale data and metadata associated with the
aborted transfer. Once all data has been flushed the DMAC enters the
shutdown state and is ready for the next transfer.
In addition each module has a reset that resets the modules state and is
used at system startup to bring them into a consistent state.
Re-work the shutdown process to instead of flushing the pipeline re-use the
startup reset signal also for shutdown.
To manage the reset signal generation introduce the reset manager module.
It contains a state machine that will assert the reset signals in the
correct order and for the appropriate duration in case of a transfer
shutdown.
The reset signal is asserted in all domains until it has been asserted for
at least 4 clock cycles in the slowest domain. This ensures that the reset
signal is not de-asserted in the faster domains before the slower domains
have had a chance to process the reset signal.
In addition the reset signal is de-asserted in the opposite direction of
the data flow. This ensures that the data sink is ready to receive data
before the data source can start sending data. This simplifies the internal
handshaking.
This approach has multiple advantages.
* Issuing a reset and removing all state takes less time than
explicitly flushing one sample per clock cycle at a time.
* It simplifies the logic in the faster clock domains at the expense of
more complicated logic in the slower control clock domain. This allows
for higher fMax on the data paths.
* Less signals to synchronize from the control domain to the data domains
The implementation of the pause mode has also slightly changed. Pause is
now a simple disable of the data domains. When the transfer is resumed
after a pause the data domains are re-enabled and continue at their
previous state.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This reverts commit 4b1d9fc86b "axi_dmac: Modified in order to avoid
vivado crash".
Vivado no longer crashes and this structure is much more efficient when it
comes to resource usage and timing. The intention here is to create a 1-bit
memory that is N entries deep and not a N bit signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The debug registers are useful during development but are rarely used in a
production design. Add a option that allows to disable them, this reduces
the resource utilization of the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace "PRIMITIVE_SUBGROUP == flop" with "IS_SEQUENTIAL" as the former is
series7 specific while the later works on all platforms. This fixes the
axi_dmac timing constraints for ultrascale based platforms.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When all clocks are synchronous there are no synchronizers and the
constraint for the CDC registers can't find any cells which generates a
warning. To avoid this don't add CDC constraints when all the clocks are
synchronous.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
According to the documentation when using a BRAM block in SDP mode the
REGCEB pin is not used and should be connected to GND. The tools though
when inferring a BRAM connect REGCEB to the same signal REGCEA. This causes
issues with timing verification since the REGCEB pin is associated with the
write clock whereas the REGCEA pin is associated with the read clock.
Until this is fixed in the tools mark all paths to the REGCEB pin as false
paths.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When having multiple DMA cores sharing the same constraint file Vivado
seems to apply the constraints from the first core to all the other cores
when re-running synthesis and implementation from within the Vivado GUI.
This causes wrong timing constraints if the DMA cores have different
configurations. To avoid this issue use a TTCL template that generates a
custom constraint file for each DMA core instance.
This also allows us to drop the asynchronous clock detection hack from the
constraint file and move it to the template and only generate the CDC
constraints if the clock domains are asynchronous.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>