up_rdata is qualified by the up_rack signal. There is no need to reset it
since by the time the signal is read the reset value has already been
overwritten anyway.
Also gate the up_rdata registers if no read operation is in progress. In
this case any changes would be ignored anyway.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_adc_trigger does not use the full width of the AXI interface
address. It only responds to register access in the first 32 registers.
Reduce the size of the AXI address to 7 bit accordingly. This allows the
scripts to correctly infer the internal register map size which will cause
the interconnect to filter out access to these unused register.
This slightly reduces utilization by getting rid of some pipeline
registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_adc_decimate does not use the full width of the AXI interface
address. It only responds to register access in the first 32 registers.
Reduce the size of the AXI address to 7 bit accordingly. This allows the
scripts to correctly infer the internal register map size which will cause
the interconnect to filter out access to these unused register.
This slightly reduces utilization by getting rid of some pipeline
registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_dac_interpolate does not use the full width of the AXI interface
address. It only responds to register access in the first 32 registers.
Reduce the size of the AXI address to 7 bit accordingly. This allows the
scripts to correctly infer the internal register map size which will cause
the interconnect to filter out access to these unused register.
This slightly reduces utilization by getting rid of some pipeline
registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The axi_logic_analyzer does not use the full width of the AXI interface
address. It only responds to register access in the first 32 registers.
Reduce the size of the AXI address to 7 bit accordingly. This allows the
scripts to correctly infer the internal register map size which will cause
the interconnect to filter out access to these unused register.
This slightly reduces utilization by getting rid of some pipeline
registers.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The AXI DMAC peripheral only uses 11-bit of the register map interface
address. Reducing the signal width to this value allows the scripts to
correctly infer the size of the register map.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Not all peripherals need the full address space. To be able to infer the
size of the address space of a peripheral allow the size of the AXI address
signals to be configurable rather than hardcoding its width to 32 bit.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the register map range of a peripheral is hardcoded to 64k. Not
all peripherals need that much space though and reducing the size of the
address can reduce the amount of logic required, both in the interconnect
as well as in the peripheral.
Let adi_ip_properties() infer the size of the register map from the number
of bits of the address when creating the register map.
For backwards compatibility limit the register map size to 64k since
currently peripherals have a address width of 32 bits, event if they use
less.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the AXI address width of the DMA is always 32-bit. But not all
address spaces are so large that they require 32-bit to address all memory.
Extract the size of the address space that the DMA is connected too and
configure reduce the address size to the minimum required to address the
full address space.
This slightly reduces utilization.
If no mapped address space can be found the default of 32 bits is used for
the address.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The delay_clk is only used internally when the IODELAYs are enabled. This
means the port has no function when the IODELAYs are disabled so hide the
port in that case.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Typically when a port has a enablement dependency it also should have a
tie-off value to the port is connected to when disabled.
Make it possible to specify this tie-off value when calling
adi_set_ports_dependency().
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
At the moment the PS7 is using three PLLs to generate its clocking tree.
One for the DDR, one for the ARM and one for the IO. This allows to run all
components at their respective maximum clock and extract maximum
performance from all components.
With some slight modifications it is possible to trade maximum performance
for a reduction in power consumption by using the same PLL for all three
sets of components and disabling the other two PLLs.
The CPU is now running at 500MHz rather than 666MHz and the DDR memory at
500MHz rather than 533MHz. This reduces power consumption by ~125mW.
This is OK since neither of them is a bottleneck for overall system
performance.
In addition software will downclock the CPU to 250MHz when full performance
is not required.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The 200 MHz clock was only used as the IODELAY controller clock. Since the
design does not use any IODELAYs anymore this clock can be removed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The output data mux is used to bypass the filter when it is not used. Which
setting is used for the mux depends on the 3-bit filter_mask signal.
Registering the control logic into a single bit signal reduces the amount
of routing resources required. Since changing the filter_mask settings is
asynchronous to the processing anyway the extra clock cycle delay
introduced by this change does not affect behaviour.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the processing pipeline of the axi_adc_decimate core to its own
sub-module. This makes it easier to simulate the processing independent of
the register map.
Also since the filter is two instances of the same logic, one for each
channel, let the new sub-module model one channel and instantiate it twice.
This allows to change the implementation without having to change the same
code twice.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The output data of the decimation block is 16-bit signed. Properly sign
extend the 12-bit input signal when the filter is bypassed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum number of bits required for the adders in a CIC filter depends
on the decimation rate. Higher decimation factors require more bits. This
means for a multirate filter the size of the logic structures is determined
by the highest supported rate.
The current implementation of the filter always uses all bits of the
structure to compute the results, that means even when running with the
lowest decimation factor all the bits that are required for the highest
decimation factor are used. This will work fine as additional bits do not
affect the output of the filter.
This patch implements dynamic partial gating of the filter structure based
on the selected decimation factor. Bits that are not required for a certain
rates are gated and the carry bits are masked from propagating through the
adder chain. This results in significant power savings at smaller
decimation factors.
This means that the filter itself is now using more power the higher the
decimation rate. But this is offset by the reduced data output rate running
subsequent processing stages at a lower rate and reducing power consumption
there. This results in a more or less flat power profile regardless of
decimation factor.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Allow to split a CIC int or comb block into multiple stages and be able to
dynamically gate some of the stages. Also prevent carry propagation in
gated stages to keep the adder output constant.
This is useful for multi-rate filter where not all bits are needed all the
time.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum decimation rate of the CIC block is five, this means data
arrives at the FIR filter at most every five clock cycles. The decimation
rate of the filter is two so the filter produces an output at most every
ten clock cycles. This allows for ten clock cycles to compute the result.
The current implementation of the filter uses a fully pipelined
architecture with one multiplier for each coefficient. Which then do work
for one clock cycle and sit idle for the next nine clock cycles.
Rework the filter to be sequential reducing the number of required
multipliers to one. In addition exploit the symmetric structure of the
filter to make use of the preadder reducing the required multiply
operations by two.
This significantly reduces the logic utilization of the filter as well as
moderately reduces power consumption.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum decimation of the CIC block is 5. This means new data arrives
at the comb stages at most every 5 clock cycles. Rather than letting the
logic sit idle during those 4 extra cycles use it to sequentially process
the comb stages of the filter. This reduces the logic utilization of the
filter by quite a bit.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The output data mux is used to bypass the filter when it is not used. Which
setting is used for the mux depends on the 3-bit filter_mask signal.
Registering the control logic into a single bit signal reduces the amount
of routing resources required. Since changing the filter_mask settings is
asynchronous to the processing anyway the extra clock cycle delay
introduced by this change does not affect behaviour.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Re-implement the CIC using the basic building blocks from the util_cic
library.
This new implementation is structurally equivalent to the previous version,
but will be used as a platform for implementing changes that will improve
area and power consumtion of the filter
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the processing pipeline of the axi_adc_decimate core to its own
sub-module. This makes it easier to simulate the processing independent of
the register map.
The debug register logic for the DMA take up a fair amount of resources.
Disabling them frees up space in the FPGA and also helps a bit with power.
Since those registers are mainly useful in development and not so much in
production the change shouldn't have any visible external effects.
It is possible to re-enable the debug registers by setting DEBUG_BUILD=1.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The debug registers are useful during development but are rarely used in a
production design. Add a option that allows to disable them, this reduces
the resource utilization of the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The global reset signals are only asserted for a short moment during system
startup and deasserted during normal operation, which is the case we care
about for power analysis. Giving them a static switching probability
indicating that they are always de-asserted will yield better results for
power analysis.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The RX datapath has a lot of things (IQ correction, DC filter, ...) that
take up a lot of space which are all not really needed in this project. So
disable the RX datapath.
It was previously enabled because the ad9963 core did not perform
sign-extension on the ADC data signal when the datapath was disabled. But
this has now been addressed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the BRAM and data registers in the util_axis_data are ungated
when the FIFO is ready to receive data. This good for high-performance
since it reduces the number of control signals. But it is bad from a power
point of view since it causes additional reads and writes.
Change the core gate the BRAM and data register if either the consumer is
not ready to accept data or the producer has no data to offer.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>