* ng-ultra: new architecture
* Implementation as in D2 deliverable
* Support for nxdesignsuite-24.0.0.0-20240429T102300
* Save memory by directly outputing json
* Add support for bidirectional IOs
* cleanup
* Create BFRs properly
* Add IOM insertion
* Cleanup
* Block certain pips depending of DDFR mode
* Add LUT bypass to improve routability
* Add bypass for CSC mode of GCK
* Fix IOM case
* Initial memory support
* Better RF/XRF handling
* fix
* RF placement and legalization
* Disconnect non available ports for NX_RAM
* cleanup
* Add RFB/RAM context support for latest release
* Remove ports that must not be used
* Proper port used only on RFB
* Add structure for clock sinks
* Use cell type where applicable
* Add clock sinks for other cell types
* Validation check fixes
* Commented too restrictive placement
* Added more crossbar wire type
* Hande IO termination input
* Fail early due to NX tools limitation for now
* Validations and fixes for RAM I/Os
* Fix for latest version of tools
* Use ctx->idf where applicable
* warn if RAM ports are not actually used
* Fix IOM packing
* Fix CY packing
* Change how constants are handled on CY
* Post placement optimization for CY
* Address comments for PR
* pack and export GCK, WFG and PLL
* Cover more global routing cases
* Constraing to location if provided
* Place at LOC
* Pack and export DSP
* wip
* wip
* notes
* wip
* wip
* Validate DSPs
* DSP cascading
* Check mandatory parameters for DSP
* existing gck
* wip
* export all the rest for bitstream
* CDC packing
* add more sinks
* place FIFO
* map rest of FIFO ports
* enable pll by default
* cleanup
* Initial XLUT support
* Fix statistics
* Properly duplicate GCKs
* RRSTO and WRSTO are not used on XFIFO
* Fix for latest version of JSON format
* Implement GCK limitations
* cleanup
* cleanup
* Add more signals and use lowskew name
* cleanup code a bit
* Fix wfb
* detect cascaded GCKs
* Handle DFR
* Route dfr clock properly
* Cleanup
* Cleanup bitstream code
* Review issues addressed
* Move helper routines
* Expose private members for unit tests
* cleanup
* remove scale factor
* make all location helper arrays static
* Addressed review comments
* Support post-routing CSC and SCC
* Support NX_BFF
* Place CSS and SCC only on allowed locations
* Support latest Impulse
* ng_ultra: Expand bounding box further for left-edge IO
Signed-off-by: gatecat <gatecat@ds0.me>
* Export all IO parameters in bitstream
* Handle new CSV order or parameters and additional validation
* Add some more undocumented values for CSV
* Support for old and new CSV formats
* Initial DDFR support
* Display warning message once per file
* Address review issues
* Fix crash on memory access
* Make boundbox fit NG-Ultra internal design
* Update attributes after dff rewrite
* Implement basic NG-Ultra LUT-DFF unit tests
* Always use first seen xbar input
Signed-off-by: gatecat <gatecat@ds0.me>
* Simplified crossbar pip detection
* Change order to prevent issues with some unconnected constants
* Pack LUT and multiple DFF in stripe
* Place DFF chains
* Improve large DFF chains
* Rename to pack_dff_chains
* Better use XLUTs when possible
* pack output DFF together with XLUT
* option to disable XLUT optimiziations
* Make more optimizations optional
* fix to use pre-increment
* GCK for lowskew signals
* Bugfix for nets that are not part of lowskew network
* Fix bitstream export for PLL cell
* Remove separate route lowskew
* Allow WFG mode 2
* Merge inverter into GCK
* Add CSC per TILE when needed
* Improve reusage of existing cell for CSC
* Take preferred CSC
* Cleanup
* When in place CSC size not important
* Cleanup
* Reset and Load restriction
* make csc optimisation optional
* Proper count for IO resources
* Detect when there is no next cell for DSP chain
* Do not incorporate loops in XLUT
* Check if output exists
* Update copyright for delivery
* Make building NG-Ultra chip database optional, follow filename convention
* Ported drawing code to new API
* Update expandBoundingBox for NG-Ultra
* Copyright and license update
* Add README information
* cleanup and constids
* Using ctx->idf where applicable
* remove if_using_basecluster
* refactor extra data usage
* refactor to use create_cell_ptr only
* optimized getCSC
* optimize critical path a bit
* clangformat
* disable clangformat where applicable
---------
Signed-off-by: gatecat <gatecat@ds0.me>
Co-authored-by: Lofty <dan.ravensloft@gmail.com>
Co-authored-by: gatecat <gatecat@ds0.me>
* Gowin. Add IODELAY.
Input/Output delay (IODELAY) is programmable delay uint in IO block.
This delay line is enabled before/after the IO pad and allows the signal
to be delayed statically or dynamically during 0-127 stages each lasting
from 18 to 30 picoseconds depending on the chip family.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Replacing assertions with log_error.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Add sampling part to IO blocks (input only). This edge detector will
allow to dynamically adjust DDR decoding window in the future.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Extend Himbaechel API with gfx drawing methods
* Add bel drawing in example uarch
* changed API and added tile wire id in db
* extend API so we can distinguish CLK wires
* added bit more wires
* less horrid way of handling gfx ids
* loop wire range
* removed not needed brackets
* bump database version to 5
* Removed not used GfxFlags
* Gowin. FFs placement.
* Allow clusters to be created from FFs and LUTs;
* Immediately create pass-through LUTs from free LUTs adjacent to FF - at the same time ensure alternating use of LUT inputs;
* In case of constant networks, such pass-through LUTs are disconnected from networks altogether;
* Allow FF to be placed directly into SSRAM slides - this is useful when using synchronous reading.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix aux name creation
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Use I3 for pass-trough LUTs
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix the port check for connectivity.
What happens is that it's not enough to check for a network, we also
need to make sure that the network is functional: has src and sinks.
And the style edits - they get automatically when I make sure to run
clang-format10.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix the port check for connectivity.
What happens is that it's not enough to check for a network, we also
need to make sure that the network is functional: has src and sinks
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement the EMCU primitive.
Add support for the GW1NSR-4C's embedded Cortex-M3 processor. Since it
uses flash in its own way, we disable additional flash processing for
this case.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix merge.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Add DHCEN primitive.
This primitive allows you to dynamically turn off and turn on the
networks of high-speed clocks.
This is done tracking the routes to the sinks and if the route passes
through a special HCLK MUX (this may be the input MUX or the output MUX,
as well as the interbank MUX), then the control signal of this MUX is
used.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Change the DHCEN binding
Use the entire PIP instead of a wire - avoids normalisation and may also
be useful in the future when calculating clock stuff.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
Some Clocks PIPS were not created due to a check for the presence of a
delay class, now all wires are attributed to the class so that there is
no longer any need for this check.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Implement the UserFlash primitive
Some Gowin chips have embedded flash memory accessible from the fabric.
Here we add primitives that allow access to this memory.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Gowin. Fix cell creation
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
---------
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
* Himbaechel Gowin: Add support for CLKDIV and CLKDIV2
* Himbaechel Gowin: Add support for CLKDIV and CLKDIV2
* Gowin Himbaechel: HCLK Bug fixes and corrections
DQCE and DCS primitives are added.
DQCE allows the internal logic to enable or disable the clock network in
the quadrant. When clock network is disabled, all logic drivern by this
clock is no longer toggled, thus reducing the total power consumtion of
the device.
DCS allows you to select one of four sources for two clock wires (6 and 7).
Wires 6 and 7 have not been used up to this point.
Since "hardware" primitives operate strictly in their own quadrants,
user-specified primitives are converted into one or more "hardware"
primitives as needed.
Also:
- minor edits to make the most of helper functions like connectPorts()
- when creating bases, the corresponding constants are assigned to the
VCC and GND wires, but for now huge nodes are used because, for an
unknown reason, the constants mechanism makes large examples
inoperable. So for now we remain on the nodes.
Compatible with older Apicula databases.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
It was not taken into account that there are only 6 ALUs per cell. As a
result, on complex designs where ALUs and LUT-based memory are involved
and there are many LUTs (like in the RISCV emulator), there were
sometimes false positives about placement conflicts.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
The statement in the Gowin documentation that in the reading mode
"READ_MODE=0" the output register is not used and the OCE signal is
ignored is not confirmed by practice - if the OCE was left unconnected
or connected to the constant network, then a change in output data was
observed even with CE=0, as well as the absence of such at CE=1.
Synchronizing CE and OCE helps and the memory works properly in complex
systems such as RISC-V emulation and i8080 emulation (with 32K RAM and
16K BSRAM based ROM), but there is no theoretical basis for this fix, so
it is a hack.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
For pROM(X9) primitives in images generated by Gowin IDE, there is an
interesting recommunication of inputs depending on the data bit depth.
For example, in some cases, a high logical level may be applied to the
Write Enable input, which, let’s say, is not entirely usual for Read
Only memory.
Here we will do similar manipulations.
In addition, several minor bug fixes are included:
- Fixed bit numbering for non-X9 series primitives.
- Fixed decoder generation for BLKSEL - do not assume unused inputs are
connected to GND.
- Use default values for BSRAM parameters - don't assume their
mandatory presence.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
As the board on the GW1N-1 chip becomes a rarity, its replacement is the
Tangnano1k board with the GW1NZ-1 chip. This chip has a unique mechanism
for turning off power to important things such as OSC, PLL, etc.
Here we introduce a primitive that allows energy saving to be controlled
dynamically.
We also bring the names of some functions to uniformity.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
In the images generated by Gowin IDE, the signals for dynamic BSRAM
block selection (BLKSEL[2:0]) are not always connected directly to the
ports - some chips add LUT2, LUT3 or LUT4 to turn these signals into
Clock Enable. Apparently there are chips with an error in the operation
of these ports.
Here we make such a decoder instead of using ports directly.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>
It seems that the internal registers on the BSRAM output pins in
READ_MODE=1'b1 (pipeline) mode do not function properly because in the
images generated by Gowin IDE an external register is added to each pin,
and the BSRAM itself switches to READ_MODE=1'b0 (bypass) mode .
This is observed on Tangnano9k and Tangnano20k boards.
Here we repeat this fix.
Signed-off-by: YRabbit <rabbit@yrabbit.cyou>