diff --git a/docs/library/spi_engine/index.rst b/docs/library/spi_engine/index.rst
index 88eba154c..b07136eb0 100644
--- a/docs/library/spi_engine/index.rst
+++ b/docs/library/spi_engine/index.rst
@@ -14,7 +14,9 @@ SPI Engine
    Offload Control Interface<offload-control-interface>
    SPI Bus Interface<spi-bus-interface>
    Instruction Set Specification<instruction-format>
+   Pipeline Delays<pipeline-delays>
    Tutorial - PulSAR ADC<tutorial>
+   
 
 SPI Engine is a highly flexible and powerful SPI controller framework.
 It consist out of multiple sub-modules which communicate over well defined
@@ -65,7 +67,7 @@ Related IP Cores
 --------------------------------------------------------------------------------
 
 This list contains cores that are not part of the core SPI engine framework but
-make use of its interfaces and are intend to be used together with the SPI engine
+make use of its interfaces and are intended to be used together with the SPI engine
 framework.
 
 * :dokuwiki:`util-sigma-delta-spi <resources/fpga/peripherals/util_sigma_delta_spi>`:
@@ -93,4 +95,5 @@ Additional Resources
 --------------------------------------------------------------------------------
 
 * :download:`Presentation: SPI Engine Design Philosophy <https://wiki.analog.com/_media/resources/fpga/peripherals/spi-engine3.pdf>`.
+* :ref:`spi_engine pipeline-delays`.
 * :ref:`spi_engine tutorial`.
diff --git a/docs/library/spi_engine/instruction-format.rst b/docs/library/spi_engine/instruction-format.rst
index fdf05340c..8681fe242 100644
--- a/docs/library/spi_engine/instruction-format.rst
+++ b/docs/library/spi_engine/instruction-format.rst
@@ -62,15 +62,17 @@ SPI Engine execution module.
 
 Before and after the update is performed the execution module is paused for the
 specified delay. The length of the delay depends on the module clock frequency,
-the setting of the prescaler register and the t parameter of the instruction.
-This delay is inserted before and after the update of the chip-select signal,
-so the total execution time of the chip-select
-instruction is twice the delay, plus a fixed 2 clock cycles (fast clock, not prescaled)
-for the internal logic.
+the setting of the prescaler register and the parameter :math:`t` of the
+instruction. This delay is inserted before and after the update of the
+chip-select signal, so the total execution time of the chip-select instruction
+is twice the delay, plus a fixed 2 clock cycles (fast clock, not prescaled)
+that the internal logic spends before the update.
 
 .. math::
 
-   delay = t * \frac{(div + 1)*2}{f_{clk}}
+   delay_{before} = \frac{2 + t * (div + 1)*2}{f_{clk}}
+
+   delay_{after}  = t * \frac{(div + 1)*2}{f_{clk}}
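+
+As a worked example under assumed values (not taken from any particular
+design), take :math:`f_{clk} = 100\,\mathrm{MHz}`, :math:`div = 0` and
+:math:`t = 1`, and count the fixed 2 cycles in fast clock cycles:
+
+.. math::
+
+   delay_{after} = 1 * \frac{(0 + 1)*2}{100\,\mathrm{MHz}} = 20\,\mathrm{ns}
+
+   delay_{before} = \frac{2}{100\,\mathrm{MHz}} + delay_{after} = 40\,\mathrm{ns}
+
+Under these assumptions the whole chip-select instruction takes 60 ns, that is,
+6 fast clock cycles.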
 
 .. list-table::
    :widths: 10 15 75
@@ -128,7 +130,8 @@ Synchronize Instruction
 The synchronize instruction generates a synchronization event on the SYNC output
 stream. This can be used to monitor the progress of the command stream. The
 synchronize instruction is also used by the :ref:`spi_engine interconnect`
-module to identify the end of a transaction and re-start the arbitration process.
+module to identify the end of a transaction and re-start the arbitration
+process.
 
 .. list-table::
    :widths: 10 15 75
diff --git a/docs/library/spi_engine/pipeline-delays.rst b/docs/library/spi_engine/pipeline-delays.rst
new file mode 100644
index 000000000..8f4649b98
--- /dev/null
+++ b/docs/library/spi_engine/pipeline-delays.rst
@@ -0,0 +1,140 @@
+.. _spi_engine pipeline-delays:
+
+SPI Engine Pipeline Delays
+================================================================================
+
+The SPI Engine implementation imposes certain constraints on the timing of
+different commands. Each instruction requires some number of cycles to execute,
+which may depend on the instruction parameters. Additionally, there are delays
+associated with the internal architecture of the SPI Engine, which become
+relevant when the Offload functionality is not used.
+
+.. _instruction_execution_times:
+
+Instruction Execution
+--------------------------------------------------------------------------------
+
+Every instruction requires a minimum of 1 cycle for communication between the
+offload module and the execution module. Additionally, the Chip Select, Sleep,
+Transfer and Sync instructions require another cycle for checking the idle
+condition, for a total fixed delay of 2 cycles.
+
+The exact values, counted from the execution module and expressed in SPI Engine
+clock cycles, are:
+
+.. list-table::
+   :widths: 10 80
+   :header-rows: 1
+
+   * - Instruction
+     - Cycles
+   * - Configuration Write
+     - 1 cycle.
+   * - Sync
+     - 2 cycles.
+   * - Chip-select
+     - :math:`2 + 2*t*((div+1)*2)` cycles, where :math:`t` is the chip-select
+       delay parameter of the instruction and :math:`div` is the prescaler
+       register value. The CS value changes after the first
+       :math:`2+t*((div+1)*2)` cycles.
+   * - Sleep
+     - :math:`2 + t*((div+1)*2)` cycles, where :math:`t` is the sleep delay
+       parameter of the instruction and :math:`div` is the prescaler register
+       value.
+   * - Transfer
+     - 2 cycles, plus the transfer time.
+
+Because they are counted from the execution module, these values are sufficient
+for calculating delays in the offload case: simply add up the execution time of
+each instruction. For other cases, the detailed delays of the architecture are
+needed.
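+
+As an illustrative sketch of adding up execution times in the offload case,
+assume :math:`div = 0`, a chip-select delay parameter :math:`t = 1`, and a
+transfer time of :math:`N_{xfer}` cycles (a placeholder symbol for the transfer
+time, which is not broken down here). A sequence of Chip-select, Transfer,
+Chip-select and Sync then takes:
+
+.. math::
+
+   N_{cs} = 2 + 2*1*((0+1)*2) = 6
+
+   N_{total} = N_{cs} + (2 + N_{xfer}) + N_{cs} + 2 = 16 + N_{xfer}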
+
+.. _detailed_delays:
+
+Detailed Delays
+--------------------------------------------------------------------------------
+
+This section lists the delays inside the SPI Engine architecture. Making use of
+this information requires some familiarity with the HDL implementation
+(knowledge of the sub-modules and the way they communicate).
+
+See also: :ref:`spi_engine control-interface`, 
+:ref:`spi_engine offload-control-interface`.
+
+Offload Module
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+All the delays measured for this module are in terms of SPI Engine clock cycles.
+
+* Trigger input to command valid output: 1 cycle, plus 1 to 2 cycles from a 2FF
+  CDC (0 if not asynchronous).
+* Trigger input to sdo_data_valid: 1 cycle, plus 1 to 2 cycles from a 2FF CDC
+  (0 if not asynchronous).
+* Maximum command throughput: 1 command per cycle.
+* sdi_data_valid to offload_sdi_valid: 0 cycles.
+  
+Interconnect Module
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+All the delays measured for this module are in terms of SPI Engine clock cycles.
+
+The interconnect will only accept one master at a time, and will wait until a
+sync handshake back to the master is completed to free the channel.
+
+* Command valid input to command valid output (s0/s1 to m): 1 cycle if idle, 0
+  if already "owned" by the source (s0 or s1).
+* Sync valid from m side to s0/s1 sync valid (back to originating master): 0
+  cycles.
+* Sync ready to idle (delay after finishing transaction response): 1 cycle.
+* Thus, a minimum of 2 cycles per command if changing masters, or 3 if
+  accounting for sync (this is the worst case).
+* 1 cycle per command (can accept back to back) if from the same master.
+* Thus, a minimum of :math:`2+N_{cmd}` cycles per "burst" of :math:`N_{cmd}`
+  commands from the same source.
+* s0/s1_sdo_valid to m_sdo_valid:  0 if already "owned" by the source (s0 or
+  s1). Otherwise has to wait until s0/s1 owns the channel.
+* m_sdi_valid to s0/s1_sdi_valid:  0 if already "owned" by the sink (s0 or s1).
+  Otherwise has to wait until s0/s1 owns the channel.
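+
+For illustration, assuming a burst of :math:`N_{cmd} = 4` commands, the figures
+above give the following minimum cycle counts for a single master, and for the
+worst case where the master changes on every command and sync is accounted for:
+
+.. math::
+
+   N_{same} = 2 + N_{cmd} = 2 + 4 = 6
+
+   N_{worst} = 3 * N_{cmd} = 3 * 4 = 12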
+
+Execution Module 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+All the delays measured for this module are in terms of SPI Engine clock cycles.
+See above: :ref:`instruction_execution_times`.
+
+* Every instruction requires a minimum of 1 cycle for communication between the
+  Offload Module and the Execution Module. Additionally, the Chip Select, Sleep,
+  Transfer and Sync instructions require another cycle for checking the idle
+  condition, for a total fixed delay of 2 cycles.
+
+  * Chip Select, Sleep and Transfer have additional cycle requirements due to
+    intentional delays in execution. These are detailed in
+    :ref:`instruction_execution_times`.
+
+* SDI data delay: 0 cycles (sdi_data_valid arrives in the same cycle in which
+  the Transfer instruction finishes and the next command is accepted).
+
+AXI Module
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* AXI transaction to take effect internally: 1 cycle (AXI clock).
+
+  * Meaning: if counting delay to other parts of the design (e.g. command FIFO),
+    this is the AXI delay. Other AXI delays affect only AXI throughput, creating
+    backpressure for the AXI master.
+
+* Throughput: 4 cycles (AXI clock) per transaction.
+* Command FIFO delay: depends on parametrization:
+  
+  * Synchronous, 1 deep: 1 cycle (AXI clock = SPI Engine clock).
+  * Asynchronous, 1 deep: 1 cycle (AXI clock) + 1-2 cycles (SPI Engine clock,
+    2FF CDC) input to output; + 1-2 cycles (AXI clock, 2FF CDC) until ready to
+    accept next.
+  * Asynchronous, true FIFO: 2 cycles (AXI clock, mem write + bin2gray addr) +
+    1-2 cycles (SPI Engine clock, 2FF CDC) + 2 cycles (SPI Engine clock,
+    gray2bin + valid).
+  
+* AXI transaction start to command valid (total for async FIFO case): 3 AXI
+  clock cycles + 3-4 SPI Engine clock cycles.
+* SDO Data FIFO delay: same as Command FIFO.
+* SDI Data FIFO delay: depends on parametrization:
+  
+  * Synchronous, 1 deep: 1 cycle (AXI clock = SPI Engine clock).
+  * Asynchronous, 1 deep: 1 cycle (SPI Engine clock) + 1-2 cycles (AXI clock,
+    2FF CDC) input to output; + 1-2 cycles (SPI Engine clock, 2FF CDC) until
+    ready to accept next.
+  * Asynchronous, true FIFO: 2 cycles (SPI Engine clock, mem write + bin2gray
+    addr) + 1-2 cycles (AXI clock, 2FF CDC) + 2 cycles (AXI clock, gray2bin +
+    valid).
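+
+As a worked example of the total from AXI transaction start to command valid in
+the asynchronous FIFO case (3 AXI clock cycles plus 3 to 4 SPI Engine clock
+cycles), assume a 100 MHz AXI clock (10 ns period) and a 160 MHz SPI Engine
+clock (6.25 ns period). Both frequencies are illustrative assumptions:
+
+.. math::
+
+   t_{min} = 3 * 10\,\mathrm{ns} + 3 * 6.25\,\mathrm{ns} = 48.75\,\mathrm{ns}
+
+   t_{max} = 3 * 10\,\mathrm{ns} + 4 * 6.25\,\mathrm{ns} = 55\,\mathrm{ns}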
\ No newline at end of file