Current implementation does not supports updated versions of Vivado
e.g. 2017.4.1 or 2018.2.1
This fix ignores the update number from the version checking.
Registers from this component can fit in the 2k address range.
Since Vivado's minimal address range is 4k, use that instead.
This will allow placing the independent TPLs to base addresses
that mach the addresses from the monolithic blocks ensuring no software
intervention.
The DMAC has the requirement that the length of the transfer is aligned to
the widest interface width. E.g. if the widest interface is 256 bit or 32
bytes the length of the transfer needs to be a multiple of 32.
This restriction can be relaxed for the memory mapped interfaces. This is
done by partially ignoring data of a beat from/to the MM interface.
For write access the stb bits are used to mask out bytes that do not
contain valid data.
For read access a full beat is read but part of the data is discarded. This
works fine as long as the read access is side effect free. I.e. this method
should not be used to access data from memory mapped peripherals like a
FIFO.
This means that for example the length alignment requirement of a DMA
configured for a 64-bit memory and a 16-bit streaming interface is now only
2 bytes instead of 8 bytes as before.
Note that the address alignment requirement is not affected by this. The
address still needs to be aligned to the width of the MM interface that it
belongs to.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
FPGAs support different widths for the read and write port of the block
SRAM cells. The DMAC can make use of this feature when the source and
destination interface have a different width to up-size/down-size the data
bus.
Using memory cells with asymmetric port width consumes the same amount of
SRAM cells, but allows to bypass the re-size blocks inside the DMAC that
are otherwise used for up- and down-sizing. This reduces overall resource
usage and can improve timing.
If the ratio between the destination and source port is too larger to be
handled by SRAM alone the SRAM block will be configured to do partial up-
or down-sizing and a resize block will be inserted to take care of the
remaining up-/down-sizing. E.g. if a 256-bit interface is connected to a
32-bit interface the SRAM will be used to do an initial resizing of 256 bit
to 64 bit and a resize block will be used to do the remaining resizing from
64 bit to 32 bit.
Currently this feature is disabled for Intel FPGAs since Quartus does not
properly infer a block RAM with different read and write port widths from
the current ad_asym_mem module. Once that has been resolved support for
asymmetric memories can also be enabled in the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The handling of the src_data_valid_bytes signal and its related signal is
tightly coupled to the behavior of the resize_src module. The code that
handles it makes assumptions about the internal behavior of the resize_src
module.
Move the handling of the src_data_valid_bytes signal when upsizing the data
bus into the resize_src module so that all the code that is related is in
the same place and the code outside of the module does not have to care
about the internals.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMA_LENGTH_ALIGN LSBs of all length For the most part the tools are
able to deduce this using constant propagation.
But this propagation does not work across the asynchronous meta data FIFO
in the burst memory module.
Add a DMA_LENGTH_ALIGN parameter to the burst_memory module which is used
to explicitly keep the LSBs of length registers on the destination side
fixed at 1'b1. This reduces resource use and improves timing by allowing
better constant propagation and unused logic elimination.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This simplifies the burst length in the response manager significantly
while not costing much extra resources in the burst memory.
This change will also enable other future improvements.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
One of the major features of the DMAC is being able to handle non matching
interface widths for the destination and source side.
Currently the test benches only support the case where the width for the
source and the destination side are the same. Extend them so that it is
possible to also test and verify setups where the width is not the same.
To accomplish this each byte memory location is treated as if it contained
the lower 8 bytes of its address. And then the written/read data is
compared to the expected data based on that.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
On Arria10 there are 6 transceivers in a single bank. If more than 6
transceivers are used these will end up in multiple banks.
The ATX PLL can directly connect to the transceivers in the same bank
through the 1x clock network. To connect to transceivers in another bank it
has to go through a master clock generation block (MCGB) and the xN clock
network.
Add support for instantiating the MCGB if more than 6 lanes are used. In
this case the first 6 transceivers will still have a direct connection to
the PLL while all other transceivers will be clocked by the MCGB.
Note that this requires that the first 6 transceivers are all in the same
bank.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All projects have been updated to use the new pack/unpack infrastructure.
The old util_cpack and util_upack cores are now unused an can be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_cpack2 core is similar to the util_upack core. It packs, or
interleaves, a data from multiple ports into a single data. Ports can
optionally be enabled or disabled.
On the input side the cpack2 core uses a multi-port FIFO interface. There
is a single data write signal (fifo_wr_en) for all ports. But each port can
be individually enabled or disabled using the enable signals.
On the output side the cpack2 core uses a single port FIFO interface. When
data is available on the output interface the data write signal
(packed_fifo_wr_en). Data on the packed_fifo_wr_data signal is only valid
when packed_fifo_wr_en is asserted. At other times the content is
undefined. The cpack2 core offers no back-pressure. If data is not consumed
when it is made available it will be lost.
Data from the input ports is accumulated inside the cpack2 core and if
enough data is available to produce a full output vector the data is
forwarded.
This core is build using the common pack infrastructure. The core that is
specific to the cpack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_upack2 core is similar to the util_upack core. It unpacks, or
deinterleaves, a data stream onto multiple ports.
The upack2 core uses a streaming AXI interface for its data source instead
of a FIFO interface like the upack core uses.
On the output side the upack2 core uses a multi-port FIFO interface. There
is a single data request signal (fifo_rd_en) for all ports. But each port
can be individually enabled or disabled using the enable signals.
This modified architecture allows the upack2 core to better generate the
valid and underflow control signals to indicate whether data is available
in a response to a data request.
If fifo_rd_en is asserted and data is available the fifo_rd_valid signal
are asserted in the following clock cycle. The enabled fifo_rd_data ports
will be contain valid data during the same clock cycle as fifo_rd_valid is
asserted. During other clock cycles the output data is undefined. On
disabled ports the data is always undefined.
If no data is available instead the fifo_rd_underflow signal is asserted in
the following clock cycle and the output of all fifo_rd_data ports is
undefined.
This core is build using the common pack infrastructure. The core that is
specific to the upack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Pack and unpack operations are very similar in structure as such it makes
sense for pack and unpack core to share a common infrastructure.
The infrastructure introduced in this patch is based on a routing network
which can implement the pack and unpack operations and grows with a
complexity of N * log(N) where N is the number of channels times the number
of samples per channel that are process in parallel.
The network is constructed from a set of similar stages composed of either
2x2 or 4x4 switches. Control signals for the switches are fully registered
and are generated one cycle in advance.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add support for Vivado's simulator. By default the run script is using
the Icarus simulator.
If the user want to switch to another simulator, it can be explicitly
specify the required simulator tool in the SIMULATOR variable.
Currently, beside Icarus, Modelsim (SIMULATOR="modelsim") and Vivado's
xsim (SIMULATOR="xsim") is supported.
For consistent simulation behavior it is recommended to annotate all source
files with a timescale. Add it to those where it is currently missing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
By default inferred output reset signals have an active low polarity. The
axi_ad9361 rst output signal is active high though. Currently when
connecting it to a input reset with active high polarity will generate an
error in IPI.
Fix this by explicitly marking the polarity of the rst signal as active
high.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the open-coded instances of a perfect shuffle in the DAC framer with
the new helper module.
Using the helper module gives well defined semantics and hopefully makes
the code easier to understand.
There are no changes in behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The perfect shuffle is a common operation in data processing. Add a shared
module that implements this operation.
Having this in a shared module rather than open-coding every instance makes
sure that there are clear and well defined semantics associated with the
operation that are the same each time. This should ease review, maintenance and
understanding of the code.
The perfect shuffle splits the input vector into NUM_GROUPS groups and then
each group in WORDS_PER_GROUP. The output vector consists of
WORDS_PER_GROUP groups and each group has NUM_GROUPS words. The data is
remapped, so that the i-th word of the j-th word in the output vector is
the j-th word of the i-th group of the input vector.
The inverse operation of the perfect shuffle is the perfect shuffle with
both parameters swapped.
I.e. [perfect_suffle B A [perfect_shuffle A B data]] == data
Examples:
NUM_GROUPS = 2, WORDS_PER_GROUP = 4
[A B C D a b c d] => [A a B b C c D d]
NUM_GROUPS = 4, WORDS_PER_GROUP = 2
[A a B b C c D d] => [A B C D a b c d]
NUM_GROUPS = 3, WORDS_PER_GROUP = 2
[A B a b 1 2] => [A a 1 B b 2]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The write logic (DMA side) has to be independent from the read logic (DAC side).
In general the FIFO is always ready for the DMA, and every DMA transaction will
interrupt the read-back process, and the module will stop sending data,
until the initialization is finished.
Bringing back the write address tot he DMA clock domain is totally
redundant, so delete it.
Expose the TX configurable driver ports, more specifically the
TX_DIFFCTRL, TX_POSTCURSORE and TX_PRECURSORE for software. This
provides a soft tunning capability of the transmit side of the
transceivers, in cases where the insertion loss of the channel is too
high or low, comparing to the default value supported by the default
configuration of the GTs.
You can find information about these configuration ports under the
section called 'TX Configurable Driver' in the GT transceivers user
guide. (UG476, UG576)