All projects have been updated to use the new pack/unpack infrastructure.
The old util_cpack and util_upack cores are now unused an can be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_cpack2 core is similar to the util_upack core. It packs, or
interleaves, a data from multiple ports into a single data. Ports can
optionally be enabled or disabled.
On the input side the cpack2 core uses a multi-port FIFO interface. There
is a single data write signal (fifo_wr_en) for all ports. But each port can
be individually enabled or disabled using the enable signals.
On the output side the cpack2 core uses a single port FIFO interface. When
data is available on the output interface the data write signal
(packed_fifo_wr_en). Data on the packed_fifo_wr_data signal is only valid
when packed_fifo_wr_en is asserted. At other times the content is
undefined. The cpack2 core offers no back-pressure. If data is not consumed
when it is made available it will be lost.
Data from the input ports is accumulated inside the cpack2 core and if
enough data is available to produce a full output vector the data is
forwarded.
This core is build using the common pack infrastructure. The core that is
specific to the cpack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The util_upack2 core is similar to the util_upack core. It unpacks, or
deinterleaves, a data stream onto multiple ports.
The upack2 core uses a streaming AXI interface for its data source instead
of a FIFO interface like the upack core uses.
On the output side the upack2 core uses a multi-port FIFO interface. There
is a single data request signal (fifo_rd_en) for all ports. But each port
can be individually enabled or disabled using the enable signals.
This modified architecture allows the upack2 core to better generate the
valid and underflow control signals to indicate whether data is available
in a response to a data request.
If fifo_rd_en is asserted and data is available the fifo_rd_valid signal
are asserted in the following clock cycle. The enabled fifo_rd_data ports
will be contain valid data during the same clock cycle as fifo_rd_valid is
asserted. During other clock cycles the output data is undefined. On
disabled ports the data is always undefined.
If no data is available instead the fifo_rd_underflow signal is asserted in
the following clock cycle and the output of all fifo_rd_data ports is
undefined.
This core is build using the common pack infrastructure. The core that is
specific to the upack2 core is mainly only responsible for generating the
control signals for the external interfaces.
The core is accompanied by a test bench that verifies correct behavior for
all possible combinations of enable masks.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Pack and unpack operations are very similar in structure as such it makes
sense for pack and unpack core to share a common infrastructure.
The infrastructure introduced in this patch is based on a routing network
which can implement the pack and unpack operations and grows with a
complexity of N * log(N) where N is the number of channels times the number
of samples per channel that are process in parallel.
The network is constructed from a set of similar stages composed of either
2x2 or 4x4 switches. Control signals for the switches are fully registered
and are generated one cycle in advance.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add support for Vivado's simulator. By default the run script is using
the Icarus simulator.
If the user want to switch to another simulator, it can be explicitly
specify the required simulator tool in the SIMULATOR variable.
Currently, beside Icarus, Modelsim (SIMULATOR="modelsim") and Vivado's
xsim (SIMULATOR="xsim") is supported.
For consistent simulation behavior it is recommended to annotate all source
files with a timescale. Add it to those where it is currently missing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
By default inferred output reset signals have an active low polarity. The
axi_ad9361 rst output signal is active high though. Currently when
connecting it to a input reset with active high polarity will generate an
error in IPI.
Fix this by explicitly marking the polarity of the rst signal as active
high.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the open-coded instances of a perfect shuffle in the DAC framer with
the new helper module.
Using the helper module gives well defined semantics and hopefully makes
the code easier to understand.
There are no changes in behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The perfect shuffle is a common operation in data processing. Add a shared
module that implements this operation.
Having this in a shared module rather than open-coding every instance makes
sure that there are clear and well defined semantics associated with the
operation that are the same each time. This should ease review, maintenance and
understanding of the code.
The perfect shuffle splits the input vector into NUM_GROUPS groups and then
each group in WORDS_PER_GROUP. The output vector consists of
WORDS_PER_GROUP groups and each group has NUM_GROUPS words. The data is
remapped, so that the i-th word of the j-th word in the output vector is
the j-th word of the i-th group of the input vector.
The inverse operation of the perfect shuffle is the perfect shuffle with
both parameters swapped.
I.e. [perfect_suffle B A [perfect_shuffle A B data]] == data
Examples:
NUM_GROUPS = 2, WORDS_PER_GROUP = 4
[A B C D a b c d] => [A a B b C c D d]
NUM_GROUPS = 4, WORDS_PER_GROUP = 2
[A a B b C c D d] => [A B C D a b c d]
NUM_GROUPS = 3, WORDS_PER_GROUP = 2
[A B a b 1 2] => [A a 1 B b 2]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The write logic (DMA side) has to be independent from the read logic (DAC side).
In general the FIFO is always ready for the DMA, and every DMA transaction will
interrupt the read-back process, and the module will stop sending data,
until the initialization is finished.
Bringing back the write address tot he DMA clock domain is totally
redundant, so delete it.
Expose the TX configurable driver ports, more specifically the
TX_DIFFCTRL, TX_POSTCURSORE and TX_PRECURSORE for software. This
provides a soft tunning capability of the transmit side of the
transceivers, in cases where the insertion loss of the channel is too
high or low, comparing to the default value supported by the default
configuration of the GTs.
You can find information about these configuration ports under the
section called 'TX Configurable Driver' in the GT transceivers user
guide. (UG476, UG576)
This commit does not contain any functional modification.
Because the wizard generates the attributes in binary, we should use
binary mode too, so we can compare different configurations more easily.
If the req_valid asserts faster than the ID gets synchronized over we
assert the xfer request without being ready to accept data.
This can lead to overflow assertion when using a FIFO like interface.
Data mover/ src axis changes
Request rewind ID if TLAST received during non-last burst
Consume (ignore) descriptors until last segment received
Block descriptors towards destination until last segment received
Request generator changes
Rewind the burst ID if rewind request received
Consume (ignore) descriptors until last segment received
If TLAST happened on last segment replay next transfer (in progress or
completed) with the adjusted ID
Create completion requests for ignored segments
Response generator changes
Track requests
Complete segments which got ignored
Length of partial transfers are stored in a queue for SW reads.
The presence of partial transfer is indicated by a status bit.
The reporting can be enabled by a control bit.
The progress of any transfer can be followed by a debug register.
Drive the descriptor from the source side to destination
so we can abort consecutive transfers in case TLAST asserts.
For AXIS count the length of the burst and pass that value to the
destination instead the programmed one. This is useful when the
streams aborts early by asserting the TLAST. We want to notify the
destination with the right number of beats received.
For FIFO source interface reuse the same logic due the small footprint
even if the stream does not got interrupted in that case.
For MM source interface wire the burst length from the request side to
destination.
The constraint for the synchronizer that synchronizes the sync_status
signal of the link only works correctly for the first link. For other links
no timing exception is applied, which leads to timing failures.
Fix this by using a wildcard constraint for the synchronizer reg number.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
If DDS_DW is equal to DDS_D_DW there is no signal truncation and
consequentially no rounding should be performed. But the check whether
rounding should be performed currently is for if DDS_DW is less or equal to
DDS_D_DW.
When both are equal C_T_WIDTH is 0. This results in the expression
'{(C_T_WIDTH){dds_data_int[DDS_D_DW-1]}};' being a 0 width signal. This is
not legal Verilog, but both the Intel and Xilinx tools seem to accept it
nevertheless.
But the iverilog simulation tools generates the following error:
ad_dds_2.v:102: error: Concatenation repeat may not be zero in this context.
Xilinx Vivado also generates the following warning:
WARNING: [Synth 8-693] zero replication count - replication ignored [ad_dds_2.v:102]
Change the condition so that truncation is only performed when DDS_DW is
less than DDS_D_DW. This fixes both the error and the warning.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DISPLAY_NAME of a module is supposed to be a short human readable
description of the IP core.
Currently this is set to the name of the IP, which already has its own
property called NAME.
This causes Platform Designer to display the descriptive labels if the IP
core basically as "$ip_core_name ($ip_core_name)".
The value that all current user of ad_ip_create pass for the description
parameter matches this criteria (And not so much the requirements for the
actual DESCRIPTION property).
Change things, so that the DISPLAY_NAME property is set to what is
currently passed as the description parameter.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The Xilinx's reset interface expect that every reset have an associated
interface and clock signal. The tool will try to find its clock and interface,
and automatically associated clock signal to it.
The PLL resets are individual asynchronous resets. To simplify the design
and avoid invalid critical warnings all the reset interface inference
for the PLL resets were removed.
Most converters refer to their different operating modes as a "Mode X"
(where X is a number) in their datasheet. Each mode has a specific framer
configuration associated with it.
Provide a set of Platform Designer (previously known as Qsys) preset files
for each mode. This allows to quickly select a specific operating mode
without having to lookup the corresponding framer configuration from the
datasheet.
A preset can be selected either in the Platform Designer GUI or from a tcl
script using the apply_preset command. E.g.
add_instance ad9172_transport ad_ip_jesd204_tpl_dac
apply_preset ad9172_transport "AD9172 Mode 10"
The preset files are generated using the scripts/generate_presets.py
script and the scripts/modes.txt file.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
A converter typically only supports a specific subset of framer
configurations.
Add a configuration parameter to select a specific converter part number.
Based on the selected part a mode validation will be performed and if the
selected framer configuration is not supported by the part an error will be
generated.
This helps to catch invalid configurations early on rather than having to
first build the bitstream and then notice that it does not work.
When using "Generic" for the part configuration parameter no validation
will be done and any framer configuration can be selected.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The exact layout of the input data into the DAC transport layer core
depends on the framer configuration. The number of input channels is
always equal to the NUM_CHANNELS parameter, but the number of samples per
channel per beat depends on the ratio of number of lanes, number of
channels and bits per sample.
It is possible to compute this manually, but this might require in-depth
knowledge about how the JESD204 framer works. Add read-only parameters that
display the number of samples per channel per beat as well as the total
width of the channel data signal.
This information can also be queried in QSys scripts and used to
automatically configure the input pipeline. E.g. like the upack core:
set NUM_OF_CHANNELS [get_instance_parameter_value jesd204_transport NUM_CHANNELS]
set CHANNEL_DATA_WIDTH [get_instance_parameter_value jesd204_transport CHANNEL_DATA_WIDTH]
add_instance util_dac_upack util_upack
set_instance_parameter_values util_dac_upack [list \
CHANNEL_DATA_WIDTH $CHANNEL_DATA_WIDTH \
NUM_OF_CHANNELS $NUM_OF_CHANNELS \
]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For a specific set of L, M and NP framer configuration parameters there is
an infinite set of possible values for the S and F configuration parameters
as long as S and F are integer and the following relationship is met
S / F = (L * 8) / (M * NP)
Typically the preferred framer configuration is the one with the lowest
latency. The lowest latency is achieved when S is minimal.
Automatically compute and select this value for S instead of having the
user to manually provide a value.
Since some converters allow modes where S is not minimal provide a manual
overwrite to specify S manually in case somebody wants to use such a mode.
For completeness also add a read-only OCTETS_PER_FRAME (F) parameter that
can be used to verify and check which value for F was chosen.
There is no manual overwrite for F since if L, M, NP and S are set to a
fixed value there is only a single possible value for F, which is computed
automatically.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently only supports JESD204 modes that have
both N and N' set to 16.
Newer DACs like the AD9172 support modes where N and N' are not equal to
16. Add support for these modes.
The width of the internal channel data path is set to N, only processing as
many bits as necessary. At the framer the data is up-sized to N' bits with
tail bits inserted as necessary. This data is then passed to the link
layer.
The width at the DMA interface is kept at 16 bits per sample regardless of
the configuration of either N or N'. This is done to keep the interface
consistent with the existing infrastructure it will connect to like upack
and DMA. The data is expected to the LSB aligned, the unused MSBs will be
ignored.
Same is true for the test-pattern data registers. These register keep their
existing 16-bit layout, but unused MSBs will be ignored by the core.
The PN generators are modified to create only N bits of data per sample.
Note that while the core can now support modes with N' = 12 there is still
the restriction that requires the number of frames per beat to be an even
number. Which means that not all modes with N' = 12 can be supported yet.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The current framer implementation is limited in that it only supports N'=16
and either S=1 or F=1.
Rework the framer implementation to be more flexible and support more
framer setting combinations.
The new framer implementation performs the mapping in two steps. First it
groups samples into frames, as there might be more than one frame per beat.
In the second step the frames are distributed onto the lanes.
Note that this still results in a single input bit being mapped onto a
single output bit and no combinatorial logic is involved. The two step
implementation just makes it (hopefully) easier to follow.
The only restriction that remains is that number of frames per beat must be
integer. This means that F must be either 1, 2 or 4. Supporting partial
frames would result in partial sample sets being consumed at the input,
which is not supported by input pipeline.
The new framer has provisions for handling values for the number of octets
per beat other than 4, but this is not exposed as a configuration option
yet since the link layer can only handle 4 octets per beat. Making the
octets per beat configurable is something for future iterations of the
core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently is only instantiated as a submodule by
other cores like the axi_ad9144 or axi_ad9152. These cores typically only
support one specific framer configuration.
In an effort to allow more framer configurations to be used the core is
re-worked, so it can be instantiated standalone.
As part of this effort provide GUI integration for Xilinx IP Integrator
where users can instantiate and configure the core.
For this group the configuration parameters by function, provide
descriptive label and a list of allowed values for parameter validation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ad_ip_jesd204_tpl_dac currently is only instantiated as a submodule by
other cores like the axi_ad9144 or axi_ad9152. These cores typically only
support one specific framer configuration.
In an effort to allow more framer configurations to be used the core is
re-worked, so it can be instantiated standalone.
As part of this effort provide GUI integration for Intel Platform Designer
(previously known as Qsys) where users can instantiate and configure the
core.
For this group the configuration parameters by function, provide
descriptive label and a list of allowed values for parameter validation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The framer module is purely combinational at this point and the clk signal
is unused.
This is a leftover of commit commit 5af80e79b3 ("ad_ip_jesd204_tpl_dac:
Drop extra pipeline stage from the framer").
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Commit 5d044b9fd3 ("ad_ip_jesd204_tpl_dac: Share PN sequence generator
between all channels") add a new file to the ad_ip_jesd204_tpl_dac, but
neglected to update the hw.tcl for the axi_ad9144 and axi_ad9152 which use
this file.
The result is that Intel project using these cores currently do not build.
Fix it by adding the missing file to the file list.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Reduce the width of ID signals to avoid size mismatches in Arria 10 SoC
projects where the ID width of the hard IP is 4.
The width of ID that reaches the slave can be increased by the interconnect if
multiple masters access the slave so we end up with mismatches.
Since these signals are unused it is safe to reduce them to minimum width and
let the interconnect zero-extend them as required.
The buffers inside the interconnect are sized based on maximum burst sizes
the masters can produce.
For AXI4 the max burst size is 128 but for these projects for the
default burst size of 128 bytes the DMACs are creating only burst of 8 or
16 beats depending on the bus width (128bits and 64 bits respectively).
These burst sizes can fit in the AXI3 protocol where the max burst
length is 16. Therefore the interconnect will be reduced.
The observed reduction is around 4 Mb of block RAM per project.
Another benefit is a better timing closure,
since these buffers reside in the DDR3 clock domain.
This improvement will solve a couple of [DRC REQP-1839] warning:
"The RAMB36E1 has an input control pin * which is driven by a register * that has
an active asynchronous set or reset. This may cause corruption of the memory
contents and/or read values when the set/reset is asserted and is not analyzed
by the default static timing analysis. It is suggested to eliminate the use of
a set/reset to registers driving this RAMB pin or else use a synchronous reset
in which the assertion of the reset is timed by default."
The frame synchronization between axi_hdmi_tx and axi_dmac is based
on the DMA(2D streaming) last signal. The last signal will be used as
an end of frame signal marking the beginning of the future frame to be
transferred by the DMA.
Only after both HDMI and DMA are ready for a "new frame" data will be
requested from the DMA.
The datarate and CDC between the axi_dmac and axi_hdmi_tx cores
will be handled by axi_hdmi_tx's DMA interface based on a backpressure
mechanism.
Add a interface definition for the link interface that combines the valid,
ready and data signals into a AXI streaming interface.
This allows to connect the interface to the JESD204 link layer peripheral
in one go without having to manually connect each signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some modes produce only one sample per channel per beat, e.g. when M=2*L.
In this case the pattern output needs to alternate between the two patterns
from beat to beat.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All channels have a copy of the same logic to generate the PN sequences.
Sharing the PN sequence generator among all channels slightly reduces the
resource utilization of the core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Only the N (where N is the size of the PN sequence) MSB bits of the reset
state of the PN generator should be set to 1. All other bits should be
initialized following the PN generator sequence.
Otherwise the first set of samples contain an incorrect PN sequence.
This does not increase the complexity of the PN generator, all reset values
are still constant.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All the inputs to the framer are registered. And the framer itself does not
have any combinatorial logic, it just re-orders the wire numbering of the
individual bits.
Currently the framer module adds a output register stage, but since there
is no logic in the framer this just means that these registers are directly
connected to the output of the previous register stage.
Remove the extra pipeline register. This slightly reduces utilization and
pipeline delay of the core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Remove unused register from the ad_ip_jesd204_tpl_dac_channel module.
Commit commit 92f0e809b5 ("jesd204/ad_ip_jesd204_tpl_dac: Updates for
ad_dds phase acc wrapper") removed all users of those registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All parameters are DAC related since this is a peripheral that handles
DACs. Having DAC as a prefix on some of the parameter names is a bit
redundant, so remove them.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Use a relative path for all IP local files. This is the common style
throughout the HDL repository and also makes it easier to move the
directory around.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
- connect unused GPIO inputs to loopback
- connect unconnected inputs to zero
- complete interface for system_wrapper instantiated in all system_top
fixes incomplet portlist WARNING [Synth 8-350]
fixes undriven inputs WARNING [Synth 8-3295]
The change excludes the generated system.v and Xilinx files.
This patch will fix the following critical warning, generated by Quartus:
"Critical Warning (18061): Ignored Power-Up Level option on the following
registers
Critical Warning (18010): Register ad_rst:i_core_rst_reg|rst_sync will power
up to High File: ad_rst.v Line: 50"
For a proper reset synchronization, the asynchronous reset signal should
be connected to the reset pins of the two synchronizer flop, and the
data input of the first flop should be connected to VCC.
In the first stage we're synchronizing just the reset de-assertion, avoiding
the scenario when different parts of the design are reseting at different time,
causing unwanted behaviours.
In the second stage we're synchronizing the reset assertion.
The module expects an ACTIVE_HIGH input reset signal, and provides an ACTIVE_LOW
(rstn) and an ACTIVE_HIGH (rst) synchronized reset output signal.
Assign a unique value to each lane's error count register and verify that
the correct value is returned for the right lane.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The RX register map testbench currently fails because the expected value
for the version register was not updated, when the version was incremented.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The loopback testbench currently fails, because the cfg_links_disable signal is not connected to the RX side of the link.
Fix this.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
In case when the SYSREF is connected to an FPGA IO which has a limitation
on the IOB register IN_FF clock line and the required ref clock is high
we can't use the IOB registers.
e.g. the max clock rate on zcu102 HP IO FF is 312MHz but ref clock is 375MHz;
If IOB is used in this case a pulse width violation is reported.
This change makes the IOB placement selectable in such case or
for targets which don't require class 1 operation.
The round function from tcl is a rounding to nearest. Using it in address
width calculation produces incorrect values.
e.g.
round(log(0xAF000000)/log(2)) will produce 31 instead of 32
The fix is to replace the rounding function with ceiling that guarantees
rounding up.
- remove reset logic
- add wait for dac valid logic
- rewrite sine concatenation on wires for different path width to
suppress warnings
- use computed atan LUT tables
The CORDIC has a selectable width range for phase and data of 8-24.
Regarding the width of phase and data, the wider they are the smaller
the precision loss when shifting but with the cost of more FPGA
utilization. The user must decide between precision and utilization.
The DDS_WD parameter is independent of CORDIC(CORDIC_DW) or
Polynomial(16bit), letting the user chose the output width.
Here we encounter two scenarios:
* DDS_DW < DDS data width - in this case, a fair rounding will be
implemented corresponding to the truncated bits
* DDS_DW > DDS data width - DDS out data left shift to get the
corresponding concatenation bits.
Update for the parametrized ad_mul module. This will scale
a selectable sine width in a multiplication module.
Rename the data and phase width parameters for legibility.
When the tool calculates the X value for different phase widths, we
get rounding errors for every width in the interval [8;24].
Depending on the width thess errors cause overflows or smaller amplitudes
of the sine waves.
The error is not linear nor proportional with the phase. To fix the issue
a simple aproximation was chosen.
Perform the shifting operation before addition/subtraction in a
rotation stage. In the previous method, the result of the arithmetic
operation was shifted and the outcome was presented to the next stage.
In this way, data connections will be reduced between pipeline stages
Add parameters:
- to select the sine generator (polynomial/CORDIC)
- to select the CORDIC data width(default 16)
Suppress the warnings generated when the DDS is disabled.
https://en.wikipedia.org/wiki/CORDIC
Configurable in/out data width (14,16,18,20);
The HDL implementation requires pipelines, resulting in a
data_width + 2 clock cycles delay between the phase input data and the
sine data. For this reason, a ddata (delay data) was propagated through
the pipeline stages to help in future use scenarios
Typically only one of the character error conditions is true at a time. And
even if multiple errors were present at the same time we'd only want to
count one error per character.
For each character track whether at least one of the monitored error
conditions is true. Then count the number of characters for which at least
one error condition occurred. And finally add that sum to the total numbers
of errors.
This results in a slightly better utilization.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the link is explicitly disabled through the control interface reset
the error statistics counter.
There is usually little benefit to preserving until after the link has been
disabled. If software is interested in the values it can read them before
disabling the link. Having them reset makes the behavior consistent with
all other internal state of the jesd204 RX peripheral, which is reset when
the link is disabled.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
In MM2S applications like video DMA it is useful to mark the end of the stream
with the TLAST.
The change enables the generation of the TLAST on the last beat of the
last row of the 2d transfer.
The index on MSB of addresses was set to 31,
but the width of address in the axi_dmac depends on a parameter.
The mismatch causes issues in the Xilinx simulator which does not extends the
narrower width signal with zeros, instead the wider signal gets 'Z' on its MSBs.
When the address was incremented with the stride it became 'X' due the uninitialized
MSBs.
Vivado recognises .h files as C header files,
the expected extension for Verilog Header is .vh
This causes issues in simulating block designs since these files
won't be exported for the simulation even if they are
part of the simulation fileset.
When creating a block design targeted for simulation, in the testbench
it is useful to know the parameters of the sub components (e.g DMAC)
Xilinx's way to pass the parameters to the testbench in case of it's AXI
verification IP is through package files. We will do the same for the DMAC.
The package file can be generated from template files (ttcl).
These will be added only to the simulation file set of the project and
won't affect synthesis.
This change adds a diagnostic interface to the DMAC core.
The interface exposes internal information about the core,
information which can't be exposed through AXI registers
due the latency and update rate.
Such information is the fullness of the internal buffer.
For this is exposed in bursts and is driven from the destination
clock domain, as this is reflected in its name.
The signal has a fixed size and is dimensioned by
taking in account the supported maximum number of bursts of 128.
This change adds the TLAST signal to the AXI streaming interface
of the source side for Intel targets.
Xilinx based designs already have this since the tlast is part of the
interface definition.
In order to make the signal optional and let the tool connect a
default value to the it, the USE_TLAST_SRC/DEST parameter is
added to the configuration UI. This conditions the tlast port on
the interface of the DMAC.
Xilinx handles the optional signals much better so the parameter
is not required there.
In its current implementation the DMAC requires that the length of a
transfer is aligned to the widest interface. E.g. if the widest interface
is 128 bits wide the length of the transfer needs to be a multiple of 16
bytes.
If the requested length is not aligned to the interface width it will be
rounded up.
This works fine as long as both interfaces have the same width. If they
have different widths it is possible that the length is rounded up to
different values on the source and destination side. In that case the DMA
will deadlock because the transfer lengths don't match and either not enough
of too much data is delivered from the source to the destination side.
Currently it is up to software to make sure that such an invalid
configuration is not possible.
Also enforce this requirement in the DMAC itself by setting the LSBs of the
transfer length to a fixed 1 so that the length is always aligned to the
widest interface.
Software can also use this to discover the length alignment requirement, by
first writing a zero to the length register and then reading the register
back. The LSBs of the read back value will be non-zero indicating the
alignment requirement.
In a similar way the stride needs to be aligned to the width of its
respective interface, so the generated addresses stay aligned. Enforce this
in the same way by keeping the LSBs cleared.
Increment the minor version number to reflect these changes.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The transfer abort logic in the src_axi_stream module is making some
assumptions about the internal timings of the data mover module.
Move this logic inside the data mover module. This will make it easier to
update the internal logic without having to update other modules.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The only two users of the data mover module both implement the same
sync-transfer-start logic. Move this into the data mover module to avoid
the duplicated code.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
With the recent rework there is now a fair amount of dead code in the
datamover module that is no longer used. Remove it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Data is gated on the source side interface and not let into the pipeline if
there is no space available inside the store and forward memory.
This means whenever data is let into the pipeline space is available and
backpressure wont be asserted. Remove the backpressure signals altogether
to simplify the design.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the source side of the DMAC can issue requests for up to
2*FIFO_SIZE-1 bursts even though there is only room for FIFO_SIZE bursts in
the store and forward memory.
This can problematic for memory mapped buses. If the data is not read fast
enough from the DMAC back-pressure will propagate through the whole system
memory subsystem and can cause significant performance penalty or even a
deadlock halting the whole system.
To avoid this make sure that not more that than what fits into the
store-and-forward memory is requested by throttling the request ID based
on how much room is available in the memory.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The second destination side register slice was put in place to provide
additional slack on some of the datapath control signals. It looks as if
this is no longer required for the latest version of the DMA controller.
All timing paths have sufficient margin.
So remove this extra slice register which just takes up resources and adds
pipeline latency.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently both the source side and the destination side interfaces employ a
beat counter to identify the last beat in a burst.
The burst memory already has an internal last signal on the destination
side. Exporting it allows the destination side interfaces to use it instead
of having to generate their own signal. This allows to eliminate the beat
counters on the destination side and simplify the data path logic.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the destination side request ID is synchronized response ID from
the source side. This signal is effectively the same as the synchronized
src ID inside the burst memory. The only difference is that they might not
increment in the exact same clock cycle.
Exporting the request ID from the burst memory means we can remove the extra
synchronizer block.
This has the added bonus that the request ID will increment in the same
clock cycle as when the data becomes available from the memory.
This means we can assume that when there is a outstanding burst request
indicated via the ID that data is available from the memory and vice versa
when data is available from the memory that there is a outstanding burst
request.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the DMAC uses a simple FIFO as the store-and-forward buffer. The
FIFO handshaking is beat based whereas the remainder of the DMAC is burst
based. This means that additional control signals have to be combined with
the FIFO handshaking signal to generate the external handshaking signals.
Re-work the store-and-forward buffer to utilize a BRAM that is subdivided
into N segments. Where N is the maximum number of bursts that can be stored
in the buffer and each segment has the size of the maximum burst length.
Each segment stores the data associated with one burst and even when the
burst is shorter than the maximum burst length the next burst will be
stored in the next segment.
The new store-and-forward buffer takes care of generating all the
handshaking signals. This means handshaking is generated in a central place
and does not have to be combined from multiple data-paths simplifying the
overall logic.
The new store-and-forward buffer also takes care of data width up- and
down-sizing in case that the source and sink modules have a different data
width. This tighter integration will allow future enhancements like using
asymmetric memory.
This re-work lays the foundation of future enhancements to the DMA like
support for un-aligned transfers and early transfer abort which would have
been much more difficult to implement with the previous architecture.
In addition it significantly reduces the resource utilization of the
store-and-forward buffer and allows for better timing due to reduced
combinatorial path lengths.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
There is an implicit dependency between the outgoing data stream and the
incoming response stream. The AXI specification requires that the
corresponding response is not sent before the last beat of data has been
received.
We can take advantage of this and remove the currently explicit dependency
between the data and response paths. This slightly simplifies the design.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For the AXI streaming interfaces we need to make sure that the handshaking
rules for the external interface are met. Hence we can't just disable the
DMA and have to wait for any pending beats to complete.
For the FIFO interfaces on the other hand no such requirements exist. All
handshaking is for the internal pipeline which will be reset as a whole so
it is OK to violate the handshaking without causing any undefined behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For the memory-mapped AXI read interface the slave asserts rlast for the
last beat in a burst.
This means we don't have to count the number of beats to know when the
burst is completed but instead can use rlast. This slightly reduces the
amount of resources needed for the MM-AXI source module and given that the
beat_counter is often the bottleneck timing wise this should also improve
the timing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the DMA is disabled it should gracefully shutdown and eventually end
up in an idle state. All outstanding AXI MM requests need to complete
before the DMA is fully disabled.
Add testbenches that test this for both AXI MM read and write behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMAC allows a transfer to be aborted. When a transfer is aborted the
DMAC shuts down as fast as possible while still completing any pending
transactions as required by the protocol specifications of the port. E.g.
for AXI-MM this means to complete all outstanding bursts.
Once the DMAC has entered an idle state a special synchronization signal is
send to all modules. This synchronization signal instructs them to flush
the pipeline and remove any stale data and metadata associated with the
aborted transfer. Once all data has been flushed the DMAC enters the
shutdown state and is ready for the next transfer.
In addition each module has a reset that resets the modules state and is
used at system startup to bring them into a consistent state.
Re-work the shutdown process to instead of flushing the pipeline re-use the
startup reset signal also for shutdown.
To manage the reset signal generation introduce the reset manager module.
It contains a state machine that will assert the reset signals in the
correct order and for the appropriate duration in case of a transfer
shutdown.
The reset signal is asserted in all domains until it has been asserted for
at least 4 clock cycles in the slowest domain. This ensures that the reset
signal is not de-asserted in the faster domains before the slower domains
have had a chance to process the reset signal.
In addition the reset signal is de-asserted in the opposite direction of
the data flow. This ensures that the data sink is ready to receive data
before the data source can start sending data. This simplifies the internal
handshaking.
This approach has multiple advantages.
* Issuing a reset and removing all state takes less time than
explicitly flushing one sample per clock cycle at a time.
* It simplifies the logic in the faster clock domains at the expense of
more complicated logic in the slower control clock domain. This allows
for higher fMax on the data paths.
* Less signals to synchronize from the control domain to the data domains
The implementation of the pause mode has also slightly changed. Pause is
now a simple disable of the data domains. When the transfer is resumed
after a pause the data domains are re-enabled and continue at their
previous state.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the transfer logic, including the 2d module, into its own sub-module.
This allows testing of the full transfer logic independently of the
register map logic.
The top-level module now only instantiates the register map and transfer
module, but does not have any logic on its own.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The timing exceptions for the debug paths are currently a bit to broad and
can include paths that should not have an exception.
All the debug signals are coming from the i_request_arb instance, so
include that in the match to avoid false positives.
For most projects this wont have been a problem since there is usually a
fair amount of slack on the paths that were affected by this. But in
projects with high utilization this might result in undefined behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Fix the read side of the CDC data FIFO. The read address generation did not
function correctly.
Redesign the read side of the FIFO, and make sure that it becomes empty after
the DMA transfer ends; and never get stock in a cyclic mode.
The dac_last signal is not used anywhere in the module. Remove it and its
synchronization registers.
Fixes the following warnings:
[Synth 8-6014] Unused sequential element dac_dlast_reg was removed. ["axi_dacfifo_rd.v":372]
[Synth 8-6014] Unused sequential element dac_dlast_m1_reg was removed. ["axi_dacfifo_rd.v":373]
[Synth 8-6014] Unused sequential element dac_dlast_m2_reg was removed. ["axi_dacfifo_rd.v":374]
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Commit bfc8ec28c3 ("util_axis_fifo: instantiate block ram in async mode")
added the read-enable (reb) signal to the ad_mem block.
It didn't update the ad_mem instance in axi_dacfifo_address_buffer.v. This
results in the read-enable of the address_buffer being tied to 0.
Fix this by connecting the same signal that increments the read address.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMAC implementation guarantees that the expression `dma_valid &
dma_xfer_req` is always identical to just dma_valid.
When generating the util_dacfifo dma_wren_s signal the optimizer doesn't know
of this though and hence will route both signals into the LUT that drives
the write enable for the BRAMs.
Simplify the expression by removing dma_xfer_req from it. Considering this
can be a fairly high fan-out net and is typically the bottleneck for the
util_dacfifo timing this helps to improve the timing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some parts of the util_cdc library rely on dead logic elimination to remove
unused logic. Unfortunately with newer Vivado versions this results in
warnings about unused sequential elements being removed. Like:
WARNING: [Synth 8-6014] Unused sequential element cdc_sync_stage1_reg was removed.
To avoid this encase the logic in generate blocks that makes sure they are
not generated when not needed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This reverts commit 4b1d9fc86b "axi_dmac: Modified in order to avoid
vivado crash".
Vivado no longer crashes and this structure is much more efficient when it
comes to resource usage and timing. The intention here is to create a 1-bit
memory that is N entries deep and not a N bit signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The burst_count signal and its derived signals are not used until the
burst_count has been explicitly initialized by loading a transfer. There is
no need to have a reset.
This reduces the fan-out of the reset signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The data path register of the 2d_transfer module are qualified by the
corresponding valid signal. Their content is not used until they have been
explicitly initialized. There is no need to reset them explicitly.
This reduces the fan-out of the reset signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
There is no need to reset the data path in the address generator. The
values of the register on the data path are not used until they have been
explicitly initialized. Removing the reset simplifies the structure and
reduces the fan-out of the reset signal.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Xilinx tools don't allow to use $clog2() when computing the value of a
localparam, even though it is valid Verilog.
For this reason a parameter was used for BYTES_PER_BURST_WIDTH so far. But
that generates warnings from both Quartus and Vivado since the parameter is
not part of the parameter list.
Fix this by changing it to a localparam and computing the log2() manually.
The upper limit for the burst length is known to be 4k, so values larger
than that don't have to be supported.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
A larger store-and-forward memory provides better protection against worst
case memory interface latencies by being able to store more data before
over-/underflowing.
Based on empirical testing it was found that using a size of 4 bursts can
still result in underflows/overflows under certain conditions. These do not
happen when using a size of 8 bursts.
This change does not significantly increase resource consumption. Both on
Intel and Xilinx the block RAM has a minimum depth of 512 entries. With a
default burst length of 16 beats that allows for up to 32 bursts without
requiring additional block RAM.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The label for the store-and-forward memory size configuration option at the
moment is just "FIFO Size" and while the store-and-forward memory uses a
FIFO that is just a implementation detail.
Change the label to "Store-and-Forward Memory Size". This is more
descriptive as it references the function not the implementation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For correct operation the store-and-forward memory size must be a
power-of-two in the range of 2 to 32.
This is simple enough so we can list all values and let the IP Integrator
and QSYS perform proper validation of the parameter.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This comment hasn't been true in a long long time. It does not have any
relation to the code around it anymore.
So just remove it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
* jesd204: Add RX error statistics
Added 32 bit error counter per lane, register 0x308 + lane*0x20
On the control part added register 0x244 for performing counter reset and counter mask
Bit 0 resets the counter when set to 1
Bit 8 masks the disparity errors, when set to 1
Bit 9 masks the not in table errors when set to 1
Bit 10 masks the unexpected k errors, when set to 1
Unexpected K errors are counted when a character other than k28 is detected. The counter doesn't add errors when in CGS phase
Incremented version number
Commit e6aacd2f56 ("axi_dmac: Better support debug IDs when ID_WIDTH !=
3") managed to get the order of the IDs in the debug register wrong.
Restore the original order.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The cfg_links_disable register will mask the SYNC lines, disabled links
will always have a de-asserted SYNC (logic state HIGH).
The FSM will stay in CGS as long as there is one active link with an
asserted SYNC (logic state LOW).
Update the test bench to generate the SYNC signals in different clock
edges, so it can test all the possible scenarios.
A multi-link is a link where multiple converter devices are connected to a
single logic device (FPGA). All links involved in a multi-link are synchronous
and established at the same time. For a TX link this means that the FPGA receives
multiple SYNC signals, one for each link. The state machine of the TX link
peripheral must combine those SYNC signals into a single SYNC signal that is
asserted when either of the external SYNC signals is asserted.
Dynamic multi-link support must allow to select to which converter devices on
the multi-link the SYNC signal is propagated too. This is useful when depending
on the use case profile some converter devices are supposed to be disabled.
Add the cfg_links_disable[0x081] register for multi-link control and
propagate its value to the TX FSM.
A multi-link is a link where multiple converter devices are connected to a
single logic device (FPGA). All links involved in a multi-link are synchronous
and established at the same time. For a RX link this means that the SYNC signal
needs to be propagated from the FPGA to each converter.
Dynamic multi-link support must allow to select to which converter devices on
the multi-link the SYNC signal is propagated too. This is useful when depending
on the usecase profile some converter devices are supposed to be disabled.
Add the cfg_links_disable[0x081] register for multi-link control and
propagate its value to the RX FSM.
Split the register map code into a separate sub-module instead of having it
as part of the top-level axi_dmac.v file.
This makes it easier to component test the register map behavior
independently from the DMA transfer logic.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
In DUAL mode half of the data ports are unused and the unused inputs need
to be connected to dummy signals.
Completely hide the unused ports in DUAL mode to remove that requirement.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the axi_ad9144 core is configured for DUAL mode two of the four
channels are unused. But there is still some residual logic left for those
unused channels that can't be removed by the optimizer.
Completely disable the unused channels by reducing the channel and lane
count. This slightly reduces utilization.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the axi_ad9152 implementation with the new generic JESD204
interface DAC core. The replacement is functionally equivalent.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the axi_ad9144 implementation with the new generic JESD204
interface DAC core. The replacement is functionally equivalent.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For most of the DACs that use JESD204 as the data transport the digital
interface is very similar. They are mainly differentiated by number of
JESD204 lanes, number of converter channels and number of bits per sample.
Currently for each supported converter there exists a converter specific
core which has the converter specific requirements hard-coded.
Introduce a new generic core that has the number of lanes, number of
channels and bits per sample as synthesis-time configurable parameters. It
can be used as a drop-in replacement for the existing converter specific
cores.
This has the advantage of a shared and reduced code base. Code improvements
will automatically be available for all converters and don't have to be
manually ported to each core individually.
It also makes it very easy to introduce support for new converters that
follow the existing schema.
Since the JESD204 framer is now procedurally generated it is also very
easy to support board or application specific requirements where the lane
to converter ratio differs from the default (E.g. use 2 lanes/2 converters
instead of 4 lanes/2 converters).
This new core is primarily based on the existing axi_ad9144.
For the time being the core is not user instantiatable and will only be
used as a based to re-implement the converter specific cores. It will be
extended in the future to allow user instantiation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the axi_ad9680 implementation with the new generic JESD204
interface ADC core. The replacement is functionally equivalent.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the axi_ad9250 implementation with the new generic JESD204
interface ADC core. The replacement is functionally equivalent, except that
the converter clock ratio is now correctly reported as 2 rather than 1 as
before.
Also the adc_rst output port is removed. It is not used in any design. The
current guidelines for the reset for the JESD204 subsystem is to use an
external reset generator.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Replace the axi_ad6676 implementation with the new generic JESD204
interface ADC core. The replacement is functionally equivalent, except that
the converter clock ratio is now correctly reported as 2 rather than 1 as
before.
Also the adc_rst output port is removed. It is not used in any design. The
current guidelines for the reset for the JESD204 subsystem is to use an
external reset generator.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For most of the ADCs that use JESD204 as the data transport the digitial
interface is very similar. They are mainly differentiated by number of
JESD204 lanes, number of converter channels and number of bits per sample.
Currently for each supported converter there exists a converter specific
core which has the converter specific requirements hard-coded.
Introduce a new generic core that has the number of lanes, number of
channels and bits per sample as synthesis-time configurable parameters. It
can be used as a drop-in replacement for the existing converter specific
cores.
This has the advantage of a shared and reduced code base. Code improvements
will automatically be available for all converters and don't have to be
manually ported to each core individually.
It also makes it very easy to introduce support for new converters that
follow the existing schema.
Since the JESD204 deframer is now procedurally generated it is also very
easy to support board or application specific requirements where the lane
to converter ratio differs from the default (E.g. use 2 lanes/2 converters
instead of 4 lanes/2 converters).
This new core is primarily based on the existing axi_ad9680.
For the time being the core is not user instantiatable and will only be
used as a based to re-implement the converter specific cores. It will be
extended in the future to allow user instantiation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The ADC DMA will never underflow and unsurprisingly the adc_dunf signal is
never used anywhere. It is very unlikely it will ever be used, so remove
it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DAC DMA will never overflow and unsurprisingly the dac_dovf signal is
never used anywhere. It is very unlikely it will ever be used, so remove
it.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some designs choose to swap the positive and negative side of the of the
JESD204 lanes. One reason for this would be because it can simplify the
PCB layout. The polarity is in most cases also only applied to a subset of
the used lanes.
Add support for this to the util_adxcvr module. This done by adding new
parameter to the modules that allows to specify a per lane polarity
inversion. Each bit in the parameter corresponds to one lane. If the bit is
set the polarity is inverted for his lane. E.g. setting the parameter to
0xc will invert the 3rd and 4th lane.
The setting is forwarded to the Xilinx transceiver for the corresponding
lane.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some designs choose to swap the positive and negative side of the of the
JESD204 lanes. One reason for this would be because it can simplify the
PCB layout. The polarity is in most cases also only applied to a subset of
the used lanes.
Add support for this to the adi_jesd204 and jesd204_phy for Altera modules.
This done by adding new parameter to the modules that allows to specify a
per lane polarity inversion. Each bit in the parameter corresponds to one
lane. If the bit is set the polarity is inverted for his lane. E.g. setting
the parameter to 0xc will invert the 3rd and 4th lane.
The setting is forwarded depending on whether soft or hard PCS is used to
either the soft PCS module or the transceiver block itself.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add a parameter to the soft_pcs_loopback_tb that allows to test whether the
soft PCS modules work correctly when the lane polarity is inverted.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some designs choose to swap the positive and negative side of the of the
JESD204 lanes. One reason for this would be because it can simplify the
PCB layout.
To support this add a parameter to the jesd204_soft_pcs_tx module that
allows to specify whether the lane polarity is inverted or not.
The way the polarity inversion is implemented is for free since it just
inverts the output mapping of the 8b10b encoder LUT tables.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Some designs choose to swap the positive and negative side of the of the
JESD204 lanes. One reason for this would be because it can simplify the
PCB layout.
To support this add a parameter to the jesd204_soft_pcs_rx module that
allows to specify whether the lane polarity is inverted or not.
The way the polarity inversion is implemented it is for free since it will
only invert the input mapping of the 8b10b decoder LUT tables.
The pattern align module does not care whether the polarity is inverted or
not since the pattern align symbols look the same in both cases.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the source and destination bus widths don't match a resize block is
inserted on the side of the narrower bus. This resize block can contain
partial data.
To ensure that there is no residual partial data is left in the resize
block after a transfer shutdown the resize block is reset when the DMA is
disabled.
Currently this is implemented by tying the reset signal of the resize block
to the enable signal of the DMA. This enable signal is only a indicator
though that the DMA should shutdown. For a proper shutdown outstanding
transactions still need to be completed.
The data that is in the resize block might be required to complete those
transactions. So performing the reset when the enable signal goes low can
lead to a situation where the DMA tries to complete a transaction but can't
do it because the data required to do so has been erased by resetting the
resize block. This leads to a dead lock and the system has to be rebooted
to recover from it.
To solve this use the sync_id signal to reset the resize block. The sync_id
signal will only be asserted when both the destination and source side
module have indicated that they are ready to be reset and there are no more
pending transactions.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The MAX_BYTES_PER_BURST option allows to configure the maximum bytes that
are part of a burst. This can be an arbitrary value.
At the same time there is a limit of how many bytes can be supported by the
memory buses. A AXI3 interface supports a maximum of 16 beats per burst
and a AXI4 interface supports a maximum of 256 beats per burst.
At the moment the it is possible to specify a MAX_BYTES_PER_BURST value
that exceeds what can be supported by the AXI memory-mapped bus. If that is
the case undefined behavior will occur and the DMAC will function
incorrectly.
To avoid this make sure that the MAX_BYTES_PER_BURST value does not exceed
the maximum that can be supported by the interfaces.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The width of the AXI burst length field depends on the AXI standard
version. For AXI3 the width is 4 bits allowing a maximum burst length of 16
beats, for AXI4 it is 8 bits wide allowing a maximum burst length of 256
beats.
At the moment the width of the length signals are determined by type of the
source AXI interface, even if the source interface type is not AXI. This
means if the source interface is set to AXI3 and the destination interface
is set to AXI4 the internal width of the signal for all interfaces will be
4 bits. This leads to a truncation of the destination bus length field,
which is supposed to be 8 bits.
If burst are generated that are longer than 16 beats the upper bits of the
length signal will be truncated. The result of this will be that the
external AXI slave interface (e.g. the DDR memory) and the internal logic
in the DMA disagree about burst length. The DMA will eventually lock up
when its internal buffers are full.
To avoid this issue have different configuration parameters for the source
and destination interface that configure the AXI bus length field width.
This way one of the interfaces can be configured for AXI3 and the other for
AXI4 without interfering with each other.
Fixes: commit 495d2f3056 ("axi_dmac: Propagate awlen/arlen width through the core")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This commit fixes the following warning from the IP packaging flow:
"[IP_Flow 19-801] The last file in file group "Synthesis" should be an HDL file:
"axi_dmac_constr.ttcl". During generation the IP Flow uses the last file to
determine library and other information when generating the top wrapper file.
If possible, please make sure that non-HDL files are located earlier in the list
of files for this file group."
Having the ttcl or other non HDL file at the end of the file group causes issues
when the project preferred language is set to VHDL. Since the synthesis file group
is set to "xilinx_anylanguagesynthesis" the tool tries to guess the type of wrapper
to be generated for that IP based on the last file from the file group.
If the file is non HDL then he defaults to the preferred language (this case VHDL)
Due some issue when the tool tries to create a VHDL wrapper for an IP that has
a Verilog top file with boolean parameters set from the IP packager he fails.
After we reorder the files after each non HDL file addition
he will create a correct Verilog wrapper for it with all parameters
which can be integrated in a VHDL system top file without issues.
Fixes the following warning:
[BD 41-1731] Type mismatch between connected pins: /util_fmcomms11_xcvr/tx_out_clk_0(clk) and /axi_ad9162_core/tx_clk(undef)
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
When the DMAC is used in async clock domains the data FIFO instantiate
an ad_mem component to handle properly the clock crossing.
For Intel, this mode is used only in FMCJESDADC1 designs but without this
an error could appear in other projects too if the user reconfigures the core.
The set_false_path constraint targeted to the *ram* cells of the dmac
matched several intra clock domain paths where the timing analysis got
ignored resulting in intermitent data integrity issues.
Exposed AXI3 interface on the Intel version of the IP for UI and feature consistency.
Some of the signals that are defined as optional in the AMBA standard
are marked as mandatory in Qsys in case of AXI3. Because of this such signals
were added to the interface of the DMAC and driven with default values.
For Xilinx in order to keep existing behavior the newly added signals
are hidden from the interface.
New parameters are added to define the width of the AXI transaction IDs;
these are hidden from the UI; We can add them to the UI if the fixed size
of the IDs will cause port incompatibility issues.
Fix the following warnings that are generated by Quartus:
Warning (10230): Verilog HDL assignment warning at ad_sysref_gen.v(68): truncated value with size 32 to match size of target (8)
No functional changes.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>