Rev. F
|
Page 5 of 64
|
October 2013
SIMD Computational Engine
The processors contain two computational processing elements
that operate as a single-instruction, multiple-data (SIMD)
engine. The processing elements are referred to as PEX and PEY
and each contains an ALU, multiplier, shifter, and register file.
PEX is always active, and PEY may be enabled by setting the
PEYEN mode bit in the MODE1 register. When this mode is
enabled, the same instruction is executed in both processing ele-
ments, but each processing element operates on different data.
This architecture is efficient at executing math intensive DSP
algorithms.
Entering SIMD mode also has an effect on the way data is trans-
ferred between memory and the processing elements. When in
SIMD mode, twice the data bandwidth is required to sustain
computational operation in the processing elements. Because of
this requirement, entering SIMD mode also doubles the band-
width between memory and the processing elements. When
using the DAGs to transfer data in SIMD mode, two data values
are transferred with each access of memory or the register file.
Independent, Parallel Computation Units
Within each processing element is a set of computational units.
The computational units consist of an arithmetic/logic unit
(ALU), multiplier, and shifter. These units perform all opera-
tions in a single cycle. The three units within each processing
element are arranged in parallel, maximizing computational
throughput. Single multifunction instructions execute parallel
ALU and multiplier operations. In SIMD mode, the parallel
ALU and multiplier operations occur in both processing
elements. These computation units support IEEE 32-bit single-
precision floating-point, 40-bit extended precision floating-
point, and 32-bit fixed-point data formats.
Data Register File
A general-purpose data register file is contained in each pro-
cessing element. The register files transfer data between the
computation units and the data buses, and store intermediate
results. These 10-port, 32-register (16 primary, 16 secondary)
register files, combined with the ADSP-2136x enhanced Har-
vard architecture, allow unconstrained data flow between
computation units and internal memory. The registers in PEX
are referred to as R0–R15 and in PEY as S0–S15.
Context Switch
Many of the processor’s registers have secondary registers that
can be activated during interrupt servicing for a fast context
switch. The data registers in the register file, the DAG registers,
and the multiplier result registers all have secondary registers.
The primary registers are active at reset, while the secondary
registers are activated by control bits in a mode control register.
Universal Registers
These registers can be used for general-purpose tasks. The
USTAT (4) registers allow easy bit manipulations (Set, Clear,
Toggle, Test, XOR) for all system registers (control/status) of
the core.
The data bus exchange register (PX) permits data to be passed
between the 64-bit PM data bus and the 64-bit DM data bus, or
between the 40-bit register file and the PM data bus. These reg-
isters contain hardware to handle the data width difference.
Timer
A core timer that can generate periodic software Interrupts. The
core timer can be configured to use FLAG3 as a timer expired
signal.
Single-Cycle Fetch of Instruction and Four Operands
The ADSP-21367/ADSP-21368/ADSP-21369 feature an
enhanced Harvard architecture in which the data memory
(DM) bus transfers data and the program memory (PM) bus
With separate program and data memory buses and on-chip
instruction cache, the processors can simultaneously fetch four
operands (two over each data bus) and one instruction (from
the cache), all in a single cycle.
Instruction Cache
The processors include an on-chip instruction cache that
enables three-bus operation for fetching an instruction and four
data values. The cache is selective—only the instructions whose
fetches conflict with PM bus data accesses are cached. This
cache allows full-speed execution of core, looped operations
such as digital filter multiply-accumulates, and FFT butterfly
processing.
Data Address Generators with Zero-Overhead Hardware
Circular Buffer Support
The ADSP-21367/ADSP-21368/ADSP-21369 have two data
address generators (DAGs). The DAGs are used for indirect
addressing and implementing circular data buffers in hardware.
Circular buffers allow efficient programming of delay lines and
other data structures required in digital signal processing, and
are commonly used in digital filters and Fourier transforms.
The two DAGs contain sufficient registers to allow the creation
of up to 32 circular buffers (16 primary register sets, 16 second-
ary). The DAGs automatically handle address pointer
wraparound, reduce overhead, increase performance, and sim-
plify implementation. Circular buffers can start and end at any
memory location.
Flexible Instruction Set
The 48-bit instruction word accommodates a variety of parallel
operations for concise programming. For example, the
ADSP-21367/ADSP-21368/ADSP-21369 can conditionally exe-
cute a multiply, an add, and a subtract in both processing
elements while branching and fetching up to four 32-bit values
from memory—all in a single instruction.
On-Chip Memory
The processors contain two megabits of internal RAM and six
megabits of internal mask-programmable ROM. Each block can
be configured for different combinations of code and data stor-
single-cycle, independent accesses by the core processor and I/O