This simplifies the burst length in the response manager significantly
while not costing much extra resources in the burst memory.
This change will also enable other future improvements.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
One of the major features of the DMAC is being able to handle non-matching
interface widths for the destination and source side.
Currently the test benches only support the case where the width for the
source and the destination side are the same. Extend them so that it is
possible to also test and verify setups where the width is not the same.
To accomplish this each byte memory location is treated as if it contained
the lower 8 bits of its address. The written/read data is then compared to
the expected data based on that.
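A minimal Python sketch of this address-derived data pattern, added for
illustration only. The function names and the little-endian byte lane
ordering are assumptions and are not taken from the test benches:

```python
def expected_byte(addr: int) -> int:
    """Expected content of the byte at `addr`: the lower 8 bits of the address."""
    return addr & 0xFF


def expected_beat(addr: int, width_bytes: int) -> int:
    """Expected data word for one beat starting at `addr`.

    Assumes byte lane 0 holds the lowest address (little-endian lanes).
    """
    word = 0
    for lane in range(width_bytes):
        word |= expected_byte(addr + lane) << (8 * lane)
    return word


# One 64-bit source beat covers the same bytes as two 32-bit destination beats,
# so both sides can be checked against the same address-based reference.
src = expected_beat(0x100, 8)
dst_lo = expected_beat(0x100, 4)
dst_hi = expected_beat(0x104, 4)
assert src == dst_lo | (dst_hi << 32)
```

Because the expected value depends only on the address, the check does not
depend on the interface width on either side.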
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
On Arria10 there are 6 transceivers in a single bank. If more than 6
transceivers are used these will end up in multiple banks.
The ATX PLL can directly connect to the transceivers in the same bank
through the 1x clock network. To connect to transceivers in another bank it
has to go through a master clock generation block (MCGB) and the xN clock
network.
Add support for instantiating the MCGB if more than 6 lanes are used. In
this case the first 6 transceivers will still have a direct connection to
the PLL while all other transceivers will be clocked by the MCGB.
Note that this requires that the first 6 transceivers are all in the same
bank.
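The lane-to-clock mapping described above can be sketched as follows. This
is an illustrative Python model only; the function names and the returned
labels are hypothetical and do not correspond to identifiers in the HDL:

```python
LANES_PER_BANK = 6  # transceivers per Arria10 bank, as stated above


def needs_mcgb(num_lanes: int) -> bool:
    """The MCGB is only instantiated if the lanes span more than one bank."""
    return num_lanes > LANES_PER_BANK


def clock_source(lane: int) -> str:
    """Clock path for a lane, assuming lanes 0..5 share the bank of the PLL."""
    if lane < LANES_PER_BANK:
        return "direct"  # 1x clock network, straight from the ATX PLL
    return "mcgb"        # xN clock network, through the MCGB


# Example: an 8 lane link
assert needs_mcgb(8)
assert [clock_source(lane) for lane in range(8)] == ["direct"] * 6 + ["mcgb"] * 2
```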
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All projects have been updated to use the new pack/unpack infrastructure.
The old util_cpack and util_upack cores are now unused and can be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_cpack2 core is similar to the util_cpack core. It packs, or
interleaves, data from multiple ports into a single data stream. Ports can
optionally be enabled or disabled.
On the input side the cpack2 core uses a multi-port FIFO interface. There
is a single data write signal (fifo_wr_en) for all ports. But each port can
be individually enabled or disabled using the enable signals.
On the output side the cpack2 core uses a single port FIFO interface. When
data is available on the output interface the data write signal
(packed_fifo_wr_en) is asserted. Data on the packed_fifo_wr_data signal is
only valid when packed_fifo_wr_en is asserted. At other times the content
is undefined. The cpack2 core offers no back-pressure. If data is not
consumed when it is made available it will be lost.
Data from the input ports is accumulated inside the cpack2 core and if
enough data is available to produce a full output vector the data is
forwarded.
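A rough behavioral model of this accumulate-and-forward behavior, written in
Python for illustration. The class name, the one-sample-per-port
simplification and the ordering (lowest enabled port first) are assumptions,
not a description of the actual RTL:

```python
class Cpack2Model:
    """Behavioral sketch: pack samples of enabled ports into full output vectors."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.pending = []  # samples accumulated but not yet forwarded

    def write(self, samples, enable_mask: int):
        """Model one cycle with fifo_wr_en asserted.

        Returns a full output vector (packed_fifo_wr_en asserted) or None.
        """
        for port in range(self.num_ports):
            if enable_mask & (1 << port):
                self.pending.append(samples[port])
        if len(self.pending) >= self.num_ports:
            out = self.pending[:self.num_ports]
            self.pending = self.pending[self.num_ports:]
            return out
        return None


# With 2 of 4 ports enabled a full output vector is produced every second write.
model = Cpack2Model(4)
assert model.write([0, 1, 2, 3], 0b0101) is None
assert model.write([4, 5, 6, 7], 0b0101) == [0, 2, 4, 6]
```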
This core is built using the common pack infrastructure. The code that is
specific to the cpack2 core is mainly responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_upack2 core is similar to the util_upack core. It unpacks, or
deinterleaves, a data stream onto multiple ports.
The upack2 core uses a streaming AXI interface for its data source instead
of a FIFO interface like the upack core uses.
On the output side the upack2 core uses a multi-port FIFO interface. There
is a single data request signal (fifo_rd_en) for all ports. But each port
can be individually enabled or disabled using the enable signals.
This modified architecture allows the upack2 core to better generate the
valid and underflow control signals to indicate whether data is available
in a response to a data request.
If fifo_rd_en is asserted and data is available the fifo_rd_valid signal
is asserted in the following clock cycle. The enabled fifo_rd_data ports
will contain valid data during the same clock cycle in which fifo_rd_valid is
asserted. During other clock cycles the output data is undefined. On
disabled ports the data is always undefined.
If no data is available instead the fifo_rd_underflow signal is asserted in
the following clock cycle and the output of all fifo_rd_data ports is
undefined.
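The valid/underflow handshake can be sketched with a small Python model. It
is a simplification for illustration: the class name is hypothetical, the
stream is consumed one sample at a time rather than per AXI beat, `None`
stands in for undefined data, and at least one port is assumed to be
enabled:

```python
class Upack2Model:
    """Behavioral sketch: distribute a sample stream onto the enabled ports."""

    def __init__(self, num_ports: int, stream):
        self.num_ports = num_ports
        self.stream = list(stream)  # samples waiting on the streaming input

    def read(self, enable_mask: int):
        """Model one fifo_rd_en request.

        Returns (fifo_rd_valid, fifo_rd_underflow, fifo_rd_data) as they
        would appear in the following clock cycle.
        """
        enabled = [p for p in range(self.num_ports) if enable_mask & (1 << p)]
        data = [None] * self.num_ports  # None marks undefined output data
        if len(self.stream) < len(enabled):
            return False, True, data    # no data available: underflow
        for port in enabled:
            data[port] = self.stream.pop(0)
        return True, False, data


model = Upack2Model(4, [10, 11, 12])
assert model.read(0b1010) == (True, False, [None, 10, None, 11])
assert model.read(0b1010) == (False, True, [None, None, None, None])
```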
This core is built using the common pack infrastructure. The code that is
specific to the upack2 core is mainly responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Pack and unpack operations are very similar in structure. As such it makes
sense for the pack and unpack cores to share a common infrastructure.
The infrastructure introduced in this patch is based on a routing network
which can implement the pack and unpack operations and grows with a
complexity of N * log(N) where N is the number of channels times the number
of samples per channel that are processed in parallel.
The network is constructed from a set of similar stages composed of either
2x2 or 4x4 switches. Control signals for the switches are fully registered
and are generated one cycle in advance.
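To illustrate the N * log(N) growth, the Python sketch below counts the
switches of a generic multi-stage network with log(N) stages of N / radix
switches each. This layout is an assumption made for the estimate; the exact
stage arrangement of the pack infrastructure may differ:

```python
def switch_count(n: int, radix: int = 2) -> int:
    """Switches in a network with log_radix(n) stages of n / radix switches.

    Assumes n is a power of radix (generic estimate, not the actual RTL).
    """
    stages, size = 0, 1
    while size < n:
        size *= radix
        stages += 1
    if size != n:
        raise ValueError("n must be a power of radix")
    return stages * (n // radix)


# 8 inputs with 2x2 switches: 3 stages of 4 switches
assert switch_count(8) == 12
# 16 inputs with 4x4 switches: 2 stages of 4 switches
assert switch_count(16, radix=4) == 8
```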
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add support for Vivado's simulator. By default the run script uses the
Icarus simulator.
To switch to another simulator, explicitly specify the required simulator
tool in the SIMULATOR variable.
Currently, besides Icarus, Modelsim (SIMULATOR="modelsim") and Vivado's
xsim (SIMULATOR="xsim") are supported.
For consistent simulation behavior it is recommended to annotate all source
files with a timescale. Add it to those where it is currently missing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
By default inferred output reset signals have an active low polarity. The
axi_ad9361 rst output signal is active high though. Currently, connecting
it to an input reset with active high polarity generates an error in IPI.
Fix this by explicitly marking the polarity of the rst signal as active
high.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the open-coded instances of a perfect shuffle in the DAC framer with
the new helper module.
Using the helper module gives well defined semantics and hopefully makes
the code easier to understand.
There are no changes in behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The perfect shuffle is a common operation in data processing. Add a shared
module that implements this operation.
Having this in a shared module rather than open-coding every instance makes
sure that there are clear and well defined semantics associated with the
operation that are the same each time. This should ease review, maintenance and
understanding of the code.
The perfect shuffle splits the input vector into NUM_GROUPS groups and then
each group into WORDS_PER_GROUP words. The output vector consists of
WORDS_PER_GROUP groups and each group has NUM_GROUPS words. The data is
remapped, so that the i-th word of the j-th group in the output vector is
the j-th word of the i-th group of the input vector.
The inverse operation of the perfect shuffle is the perfect shuffle with
both parameters swapped.
I.e. [perfect_shuffle B A [perfect_shuffle A B data]] == data
Examples:
NUM_GROUPS = 2, WORDS_PER_GROUP = 4
[A B C D a b c d] => [A a B b C c D d]
NUM_GROUPS = 4, WORDS_PER_GROUP = 2
[A a B b C c D d] => [A B C D a b c d]
NUM_GROUPS = 3, WORDS_PER_GROUP = 2
[A B a b 1 2] => [A a 1 B b 2]
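For reference, a small Python model of the remapping that reproduces the
examples above. It is an illustrative sketch of the operation, not the HDL
module itself, and the function signature is an assumption:

```python
def perfect_shuffle(num_groups: int, words_per_group: int, data):
    """Word i of output group j is word j of input group i."""
    assert len(data) == num_groups * words_per_group
    out = []
    for j in range(words_per_group):      # output group index
        for i in range(num_groups):       # word index within the output group
            out.append(data[i * words_per_group + j])
    return out


data = list("ABCDabcd")
shuffled = perfect_shuffle(2, 4, data)
assert shuffled == list("AaBbCcDd")
# Swapping the parameters gives the inverse operation.
assert perfect_shuffle(4, 2, shuffled) == data
assert perfect_shuffle(3, 2, list("ABab12")) == list("Aa1Bb2")
```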
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The write logic (DMA side) has to be independent from the read logic (DAC side).
In general the FIFO is always ready for the DMA, and every DMA transaction will
interrupt the read-back process, and the module will stop sending data,
until the initialization is finished.
Bringing back the write address to the DMA clock domain is
redundant, so delete it.
Expose the TX configurable driver ports, more specifically
TX_DIFFCTRL, TX_POSTCURSOR and TX_PRECURSOR, to software. This
allows software tuning of the transmit side of the
transceivers, in cases where the insertion loss of the channel is too
high or too low compared to the value supported by the default
configuration of the GTs.
You can find information about these configuration ports under the
section called 'TX Configurable Driver' in the GT transceivers user
guide. (UG476, UG576)
This commit does not contain any functional modification.
Because the wizard generates the attributes in binary, we should use
binary mode too, so we can compare different configurations more easily.
If req_valid asserts before the ID has been synchronized over, we
assert the xfer request without being ready to accept data.
This can lead to overflow assertion when using a FIFO like interface.
Data mover / src axis changes:
  - Request rewind ID if TLAST received during non-last burst
  - Consume (ignore) descriptors until last segment received
  - Block descriptors towards destination until last segment received
Request generator changes:
  - Rewind the burst ID if rewind request received
  - Consume (ignore) descriptors until last segment received
  - If TLAST happened on last segment replay next transfer (in progress or
    completed) with the adjusted ID
  - Create completion requests for ignored segments
Response generator changes:
  - Track requests
  - Complete segments which got ignored
The lengths of partial transfers are stored in a queue for SW reads.
The presence of a partial transfer is indicated by a status bit.
The reporting can be enabled by a control bit.
The progress of any transfer can be followed by a debug register.
Drive the descriptor from the source side to destination
so we can abort consecutive transfers in case TLAST asserts.
For AXIS count the length of the burst and pass that value to the
destination instead of the programmed one. This is useful when the
stream aborts early by asserting TLAST. We want to notify the
destination with the right number of beats received.
For the FIFO source interface reuse the same logic due to its small
footprint, even though the stream does not get interrupted in that case.
For MM source interface wire the burst length from the request side to
destination.
The constraint for the synchronizer that synchronizes the sync_status
signal of the link only works correctly for the first link. For other links
no timing exception is applied, which leads to timing failures.
Fix this by using a wildcard constraint for the synchronizer reg number.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>