Current implementation does not supports updated versions of Vivado
e.g. 2017.4.1 or 2018.2.1
This fix ignores the update number from the version checking.
Registers from this component can fit in the 2k address range.
Since Vivado's minimal address range is 4k, use that instead.
This will allow placing the independent TPLs to base addresses
that mach the addresses from the monolithic blocks ensuring no software
intervention.
The DMAC has the requirement that the length of the transfer is aligned to
the widest interface width. E.g. if the widest interface is 256 bit or 32
bytes the length of the transfer needs to be a multiple of 32.
This restriction can be relaxed for the memory mapped interfaces. This is
done by partially ignoring data of a beat from/to the MM interface.
For write access the stb bits are used to mask out bytes that do not
contain valid data.
For read access a full beat is read but part of the data is discarded. This
works fine as long as the read access is side effect free. I.e. this method
should not be used to access data from memory mapped peripherals like a
FIFO.
This means that for example the length alignment requirement of a DMA
configured for a 64-bit memory and a 16-bit streaming interface is now only
2 bytes instead of 8 bytes as before.
Note that the address alignment requirement is not affected by this. The
address still needs to be aligned to the width of the MM interface that it
belongs to.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
FPGAs support different widths for the read and write port of the block
SRAM cells. The DMAC can make use of this feature when the source and
destination interface have a different width to up-size/down-size the data
bus.
Using memory cells with asymmetric port width consumes the same amount of
SRAM cells, but allows to bypass the re-size blocks inside the DMAC that
are otherwise used for up- and down-sizing. This reduces overall resource
usage and can improve timing.
If the ratio between the destination and source port is too larger to be
handled by SRAM alone the SRAM block will be configured to do partial up-
or down-sizing and a resize block will be inserted to take care of the
remaining up-/down-sizing. E.g. if a 256-bit interface is connected to a
32-bit interface the SRAM will be used to do an initial resizing of 256 bit
to 64 bit and a resize block will be used to do the remaining resizing from
64 bit to 32 bit.
Currently this feature is disabled for Intel FPGAs since Quartus does not
properly infer a block RAM with different read and write port widths from
the current ad_asym_mem module. Once that has been resolved support for
asymmetric memories can also be enabled in the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The handling of the src_data_valid_bytes signal and its related signal is
tightly coupled to the behavior of the resize_src module. The code that
handles it makes assumptions about the internal behavior of the resize_src
module.
Move the handling of the src_data_valid_bytes signal when upsizing the data
bus into the resize_src module so that all the code that is related is in
the same place and the code outside of the module does not have to care
about the internals.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMA_LENGTH_ALIGN LSBs of all length For the most part the tools are
able to deduce this using constant propagation.
But this propagation does not work across the asynchronous meta data FIFO
in the burst memory module.
Add a DMA_LENGTH_ALIGN parameter to the burst_memory module which is used
to explicitly keep the LSBs of length registers on the destination side
fixed at 1'b1. This reduces resource use and improves timing by allowing
better constant propagation and unused logic elimination.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This simplifies the burst length in the response manager significantly
while not costing much extra resources in the burst memory.
This change will also enable other future improvements.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
One of the major features of the DMAC is being able to handle non matching
interface widths for the destination and source side.
Currently the test benches only support the case where the width for the
source and the destination side are the same. Extend them so that it is
possible to also test and verify setups where the width is not the same.
To accomplish this each byte memory location is treated as if it contained
the lower 8 bytes of its address. And then the written/read data is
compared to the expected data based on that.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
On Arria10 there are 6 transceivers in a single bank. If more than 6
transceivers are used these will end up in multiple banks.
The ATX PLL can directly connect to the transceivers in the same bank
through the 1x clock network. To connect to transceivers in another bank it
has to go through a master clock generation block (MCGB) and the xN clock
network.
Add support for instantiating the MCGB if more than 6 lanes are used. In
this case the first 6 transceivers will still have a direct connection to
the PLL while all other transceivers will be clocked by the MCGB.
Note that this requires that the first 6 transceivers are all in the same
bank.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All projects have been updated to use the new pack/unpack infrastructure.
The old util_cpack and util_upack cores are now unused an can be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_cpack2 core is similar to the util_upack core. It packs, or
interleaves, a data from multiple ports into a single data. Ports can
optionally be enabled or disabled.
On the input side the cpack2 core uses a multi-port FIFO interface. There
is a single data write signal (fifo_wr_en) for all ports. But each port can
be individually enabled or disabled using the enable signals.
On the output side the cpack2 core uses a single port FIFO interface. When
data is available on the output interface the data write signal
(packed_fifo_wr_en). Data on the packed_fifo_wr_data signal is only valid
when packed_fifo_wr_en is asserted. At other times the content is
undefined. The cpack2 core offers no back-pressure. If data is not consumed
when it is made available it will be lost.
Data from the input ports is accumulated inside the cpack2 core and if
enough data is available to produce a full output vector the data is
forwarded.
This core is build using the common pack infrastructure. The core that is
specific to the cpack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_upack2 core is similar to the util_upack core. It unpacks, or
deinterleaves, a data stream onto multiple ports.
The upack2 core uses a streaming AXI interface for its data source instead
of a FIFO interface like the upack core uses.
On the output side the upack2 core uses a multi-port FIFO interface. There
is a single data request signal (fifo_rd_en) for all ports. But each port
can be individually enabled or disabled using the enable signals.
This modified architecture allows the upack2 core to better generate the
valid and underflow control signals to indicate whether data is available
in a response to a data request.
If fifo_rd_en is asserted and data is available the fifo_rd_valid signal
are asserted in the following clock cycle. The enabled fifo_rd_data ports
will be contain valid data during the same clock cycle as fifo_rd_valid is
asserted. During other clock cycles the output data is undefined. On
disabled ports the data is always undefined.
If no data is available instead the fifo_rd_underflow signal is asserted in
the following clock cycle and the output of all fifo_rd_data ports is
undefined.
This core is build using the common pack infrastructure. The core that is
specific to the upack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Pack and unpack operations are very similar in structure as such it makes
sense for pack and unpack core to share a common infrastructure.
The infrastructure introduced in this patch is based on a routing network
which can implement the pack and unpack operations and grows with a
complexity of N * log(N) where N is the number of channels times the number
of samples per channel that are process in parallel.
The network is constructed from a set of similar stages composed of either
2x2 or 4x4 switches. Control signals for the switches are fully registered
and are generated one cycle in advance.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add support for Vivado's simulator. By default the run script is using
the Icarus simulator.
If the user want to switch to another simulator, it can be explicitly
specify the required simulator tool in the SIMULATOR variable.
Currently, beside Icarus, Modelsim (SIMULATOR="modelsim") and Vivado's
xsim (SIMULATOR="xsim") is supported.
For consistent simulation behavior it is recommended to annotate all source
files with a timescale. Add it to those where it is currently missing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
By default inferred output reset signals have an active low polarity. The
axi_ad9361 rst output signal is active high though. Currently when
connecting it to a input reset with active high polarity will generate an
error in IPI.
Fix this by explicitly marking the polarity of the rst signal as active
high.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the open-coded instances of a perfect shuffle in the DAC framer with
the new helper module.
Using the helper module gives well defined semantics and hopefully makes
the code easier to understand.
There are no changes in behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The perfect shuffle is a common operation in data processing. Add a shared
module that implements this operation.
Having this in a shared module rather than open-coding every instance makes
sure that there are clear and well defined semantics associated with the
operation that are the same each time. This should ease review, maintenance and
understanding of the code.
The perfect shuffle splits the input vector into NUM_GROUPS groups and then
each group in WORDS_PER_GROUP. The output vector consists of
WORDS_PER_GROUP groups and each group has NUM_GROUPS words. The data is
remapped, so that the i-th word of the j-th word in the output vector is
the j-th word of the i-th group of the input vector.
The inverse operation of the perfect shuffle is the perfect shuffle with
both parameters swapped.
I.e. [perfect_suffle B A [perfect_shuffle A B data]] == data
Examples:
NUM_GROUPS = 2, WORDS_PER_GROUP = 4
[A B C D a b c d] => [A a B b C c D d]
NUM_GROUPS = 4, WORDS_PER_GROUP = 2
[A a B b C c D d] => [A B C D a b c d]
NUM_GROUPS = 3, WORDS_PER_GROUP = 2
[A B a b 1 2] => [A a 1 B b 2]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The write logic (DMA side) has to be independent from the read logic (DAC side).
In general the FIFO is always ready for the DMA, and every DMA transaction will
interrupt the read-back process, and the module will stop sending data,
until the initialization is finished.
Bringing back the write address tot he DMA clock domain is totally
redundant, so delete it.
Expose the TX configurable driver ports, more specifically the
TX_DIFFCTRL, TX_POSTCURSORE and TX_PRECURSORE for software. This
provides a soft tunning capability of the transmit side of the
transceivers, in cases where the insertion loss of the channel is too
high or low, comparing to the default value supported by the default
configuration of the GTs.
You can find information about these configuration ports under the
section called 'TX Configurable Driver' in the GT transceivers user
guide. (UG476, UG576)
This commit does not contain any functional modification.
Because the wizard generates the attributes in binary, we should use
binary mode too, so we can compare different configurations more easily.
If the req_valid asserts faster than the ID gets synchronized over we
assert the xfer request without being ready to accept data.
This can lead to overflow assertion when using a FIFO like interface.
Data mover/ src axis changes
Request rewind ID if TLAST received during non-last burst
Consume (ignore) descriptors until last segment received
Block descriptors towards destination until last segment received
Request generator changes
Rewind the burst ID if rewind request received
Consume (ignore) descriptors until last segment received
If TLAST happened on last segment replay next transfer (in progress or
completed) with the adjusted ID
Create completion requests for ignored segments
Response generator changes
Track requests
Complete segments which got ignored
Length of partial transfers are stored in a queue for SW reads.
The presence of partial transfer is indicated by a status bit.
The reporting can be enabled by a control bit.
The progress of any transfer can be followed by a debug register.
Drive the descriptor from the source side to destination
so we can abort consecutive transfers in case TLAST asserts.
For AXIS count the length of the burst and pass that value to the
destination instead the programmed one. This is useful when the
streams aborts early by asserting the TLAST. We want to notify the
destination with the right number of beats received.
For FIFO source interface reuse the same logic due the small footprint
even if the stream does not got interrupted in that case.
For MM source interface wire the burst length from the request side to
destination.
The constraint for the synchronizer that synchronizes the sync_status
signal of the link only works correctly for the first link. For other links
no timing exception is applied, which leads to timing failures.
Fix this by using a wildcard constraint for the synchronizer reg number.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
If DDS_DW is equal to DDS_D_DW there is no signal truncation and
consequentially no rounding should be performed. But the check whether
rounding should be performed currently is for if DDS_DW is less or equal to
DDS_D_DW.
When both are equal C_T_WIDTH is 0. This results in the expression
'{(C_T_WIDTH){dds_data_int[DDS_D_DW-1]}};' being a 0 width signal. This is
not legal Verilog, but both the Intel and Xilinx tools seem to accept it
nevertheless.
But the iverilog simulation tools generates the following error:
ad_dds_2.v:102: error: Concatenation repeat may not be zero in this context.
Xilinx Vivado also generates the following warning:
WARNING: [Synth 8-693] zero replication count - replication ignored [ad_dds_2.v:102]
Change the condition so that truncation is only performed when DDS_DW is
less than DDS_D_DW. This fixes both the error and the warning.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DISPLAY_NAME of a module is supposed to be a short human readable
description of the IP core.
Currently this is set to the name of the IP, which already has its own
property called NAME.
This causes Platform Designer to display the descriptive labels if the IP
core basically as "$ip_core_name ($ip_core_name)".
The value that all current user of ad_ip_create pass for the description
parameter matches this criteria (And not so much the requirements for the
actual DESCRIPTION property).
Change things, so that the DISPLAY_NAME property is set to what is
currently passed as the description parameter.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The Xilinx's reset interface expect that every reset have an associated
interface and clock signal. The tool will try to find its clock and interface,
and automatically associated clock signal to it.
The PLL resets are individual asynchronous resets. To simplify the design
and avoid invalid critical warnings all the reset interface inference
for the PLL resets were removed.
Most converters refer to their different operating modes as a "Mode X"
(where X is a number) in their datasheet. Each mode has a specific framer
configuration associated with it.
Provide a set of Platform Designer (previously known as Qsys) preset files
for each mode. This allows to quickly select a specific operating mode
without having to lookup the corresponding framer configuration from the
datasheet.
A preset can be selected either in the Platform Designer GUI or from a tcl
script using the apply_preset command. E.g.
add_instance ad9172_transport ad_ip_jesd204_tpl_dac
apply_preset ad9172_transport "AD9172 Mode 10"
The preset files are generated using the scripts/generate_presets.py
script and the scripts/modes.txt file.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
A converter typically only supports a specific subset of framer
configurations.
Add a configuration parameter to select a specific converter part number.
Based on the selected part a mode validation will be performed and if the
selected framer configuration is not supported by the part an error will be
generated.
This helps to catch invalid configurations early on rather than having to
first build the bitstream and then notice that it does not work.
When using "Generic" for the part configuration parameter no validation
will be done and any framer configuration can be selected.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The exact layout of the input data into the DAC transport layer core
depends on the framer configuration. The number of input channels is
always equal to the NUM_CHANNELS parameter, but the number of samples per
channel per beat depends on the ratio of number of lanes, number of
channels and bits per sample.
It is possible to compute this manually, but this might require in-depth
knowledge about how the JESD204 framer works. Add read-only parameters that
display the number of samples per channel per beat as well as the total
width of the channel data signal.
This information can also be queried in QSys scripts and used to
automatically configure the input pipeline. E.g. like the upack core:
set NUM_OF_CHANNELS [get_instance_parameter_value jesd204_transport NUM_CHANNELS]
set CHANNEL_DATA_WIDTH [get_instance_parameter_value jesd204_transport CHANNEL_DATA_WIDTH]
add_instance util_dac_upack util_upack
set_instance_parameter_values util_dac_upack [list \
CHANNEL_DATA_WIDTH $CHANNEL_DATA_WIDTH \
NUM_OF_CHANNELS $NUM_OF_CHANNELS \
]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For a specific set of L, M and NP framer configuration parameters there is
an infinite set of possible values for the S and F configuration parameters
as long as S and F are integer and the following relationship is met
S / F = (L * 8) / (M * NP)
Typically the preferred framer configuration is the one with the lowest
latency. The lowest latency is achieved when S is minimal.
Automatically compute and select this value for S instead of having the
user to manually provide a value.
Since some converters allow modes where S is not minimal provide a manual
overwrite to specify S manually in case somebody wants to use such a mode.
For completeness also add a read-only OCTETS_PER_FRAME (F) parameter that
can be used to verify and check which value for F was chosen.
There is no manual overwrite for F since if L, M, NP and S are set to a
fixed value there is only a single possible value for F, which is computed
automatically.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently only supports JESD204 modes that have
both N and N' set to 16.
Newer DACs like the AD9172 support modes where N and N' are not equal to
16. Add support for these modes.
The width of the internal channel data path is set to N, only processing as
many bits as necessary. At the framer the data is up-sized to N' bits with
tail bits inserted as necessary. This data is then passed to the link
layer.
The width at the DMA interface is kept at 16 bits per sample regardless of
the configuration of either N or N'. This is done to keep the interface
consistent with the existing infrastructure it will connect to like upack
and DMA. The data is expected to the LSB aligned, the unused MSBs will be
ignored.
Same is true for the test-pattern data registers. These register keep their
existing 16-bit layout, but unused MSBs will be ignored by the core.
The PN generators are modified to create only N bits of data per sample.
Note that while the core can now support modes with N' = 12 there is still
the restriction that requires the number of frames per beat to be an even
number. Which means that not all modes with N' = 12 can be supported yet.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The current framer implementation is limited in that it only supports N'=16
and either S=1 or F=1.
Rework the framer implementation to be more flexible and support more
framer setting combinations.
The new framer implementation performs the mapping in two steps. First it
groups samples into frames, as there might be more than one frame per beat.
In the second step the frames are distributed onto the lanes.
Note that this still results in a single input bit being mapped onto a
single output bit and no combinatorial logic is involved. The two step
implementation just makes it (hopefully) easier to follow.
The only restriction that remains is that number of frames per beat must be
integer. This means that F must be either 1, 2 or 4. Supporting partial
frames would result in partial sample sets being consumed at the input,
which is not supported by input pipeline.
The new framer has provisions for handling values for the number of octets
per beat other than 4, but this is not exposed as a configuration option
yet since the link layer can only handle 4 octets per beat. Making the
octets per beat configurable is something for future iterations of the
core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently is only instantiated as a submodule by
other cores like the axi_ad9144 or axi_ad9152. These cores typically only
support one specific framer configuration.
In an effort to allow more framer configurations to be used the core is
re-worked, so it can be instantiated standalone.
As part of this effort provide GUI integration for Xilinx IP Integrator
where users can instantiate and configure the core.
For this group the configuration parameters by function, provide
descriptive label and a list of allowed values for parameter validation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently is only instantiated as a submodule by
other cores like the axi_ad9144 or axi_ad9152. These cores typically only
support one specific framer configuration.
In an effort to allow more framer configurations to be used the core is
re-worked, so it can be instantiated standalone.
As part of this effort provide GUI integration for Intel Platform Designer
(previously known as Qsys) where users can instantiate and configure the
core.
For this group the configuration parameters by function, provide
descriptive label and a list of allowed values for parameter validation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The framer module is purely combinational at this point and the clk signal
is unused.
This is a leftover of commit commit 5af80e79b3 ("ad_ip_jesd204_tpl_dac:
Drop extra pipeline stage from the framer").
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Commit 5d044b9fd3 ("ad_ip_jesd204_tpl_dac: Share PN sequence generator
between all channels") add a new file to the ad_ip_jesd204_tpl_dac, but
neglected to update the hw.tcl for the axi_ad9144 and axi_ad9152 which use
this file.
The result is that Intel project using these cores currently do not build.
Fix it by adding the missing file to the file list.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Reduce the width of ID signals to avoid size mismatches in Arria 10 SoC
projects where the ID width of the hard IP is 4.
The width of ID that reaches the slave can be increased by the interconnect if
multiple masters access the slave so we end up with mismatches.
Since these signals are unused it is safe to reduce them to minimum width and
let the interconnect zero-extend them as required.
The buffers inside the interconnect are sized based on maximum burst sizes
the masters can produce.
For AXI4 the max burst size is 128 but for these projects for the
default burst size of 128 bytes the DMACs are creating only burst of 8 or
16 beats depending on the bus width (128bits and 64 bits respectively).
These burst sizes can fit in the AXI3 protocol where the max burst
length is 16. Therefore the interconnect will be reduced.
The observed reduction is around 4 Mb of block RAM per project.
Another benefit is a better timing closure,
since these buffers reside in the DDR3 clock domain.
This improvement will solve a couple of [DRC REQP-1839] warning:
"The RAMB36E1 has an input control pin * which is driven by a register * that has
an active asynchronous set or reset. This may cause corruption of the memory
contents and/or read values when the set/reset is asserted and is not analyzed
by the default static timing analysis. It is suggested to eliminate the use of
a set/reset to registers driving this RAMB pin or else use a synchronous reset
in which the assertion of the reset is timed by default."
The frame synchronization between axi_hdmi_tx and axi_dmac is based
on the DMA(2D streaming) last signal. The last signal will be used as
an end of frame signal marking the beginning of the future frame to be
transferred by the DMA.
Only after both HDMI and DMA are ready for a "new frame" data will be
requested from the DMA.
The datarate and CDC between the axi_dmac and axi_hdmi_tx cores
will be handled by axi_hdmi_tx's DMA interface based on a backpressure
mechanism.
Add a interface definition for the link interface that combines the valid,
ready and data signals into a AXI streaming interface.
This allows to connect the interface to the JESD204 link layer peripheral
in one go without having to manually connect each signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some modes produce only one sample per channel per beat, e.g. when M=2*L.
In this case the pattern output needs to alternate between the two patterns
from beat to beat.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All channels have a copy of the same logic to generate the PN sequences.
Sharing the PN sequence generator among all channels slightly reduces the
resource utilization of the core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Only the N (where N is the size of the PN sequence) MSB bits of the reset
state of the PN generator should be set to 1. All other bits should be
initialized following the PN generator sequence.
Otherwise the first set of samples contain an incorrect PN sequence.
This does not increase the complexity of the PN generator, all reset values
are still constant.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All the inputs to the framer are registered. And the framer itself does not
have any combinatorial logic, it just re-orders the wire numbering of the
individual bits.
Currently the framer module adds a output register stage, but since there
is no logic in the framer this just means that these registers are directly
connected to the output of the previous register stage.
Remove the extra pipeline register. This slightly reduces utilization and
pipeline delay of the core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Remove unused register from the ad_ip_jesd204_tpl_dac_channel module.
Commit commit 92f0e809b5 ("jesd204/ad_ip_jesd204_tpl_dac: Updates for
ad_dds phase acc wrapper") removed all users of those registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All parameters are DAC related since this is a peripheral that handles
DACs. Having DAC as a prefix on some of the parameter names is a bit
redundant, so remove them.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Use a relative path for all IP local files. This is the common style
throughout the HDL repository and also makes it easier to move the
directory around.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
- connect unused GPIO inputs to loopback
- connect unconnected inputs to zero
- complete interface for system_wrapper instantiated in all system_top
fixes incomplet portlist WARNING [Synth 8-350]
fixes undriven inputs WARNING [Synth 8-3295]
The change excludes the generated system.v and Xilinx files.