
Data Sheet
July 2000
DSP16210 Digital Signal Processor
14
DRAFT COPY
Lucent Technologies Inc.
Hardware Architecture
(continued)
DSP16000 Core Architectural Overview
See the DSP16000 Digital Signal Processor Core nfor-
mation Manual for a complete description of the
DSP16000 core.
Figure 2 on page 16
shows a block
diagram of the core that consists of four major blocks:
System Control and Cache (SYS), Data Arithmetic Unit
(DAU), Y-Memory Space Address Arithmetic Unit
(YAAU), and X-Memory Space Address Arithmetic Unit
(XAAU). Bits within the
auc0
and
auc1
registers con-
figure the DAU mode-controlled operations.
System Control and Cache (SYS)
This section consists of the control block and the
cache.
The control block provides overall system coordination
that is mostly invisible to the user. The control block
includes an instruction decoder and sequencer, a
pseudorandom sequence generator (PSG), an inter-
rupt and trap handler, a wait-state generator, and low-
power standby mode control logic. An interrupt and trap
handler provides a user-locatable vector table and
three levels of user-assigned interrupt priority.
SYS contains the
alf
register, which is a 16-bit register
that contains AWAIT, a power-saving standby mode
bit, and peripheral flags. The
inc0
and
inc1
registers
are 20-bit interrupt control registers, and
ins
is a 20-bit
interrupt status register.
Programs use the instruction cache to store and exe-
cute repetitive operations such as those found in an
FIR or IIR filter section. The cache can contain up to 31
16-bit and 32-bit instructions. The code in the cache
can repeat up to 2
16
– 1 times without looping over-
head. Operations in the cache that require a coefficient
access execute at twice the normal rate because the
XAAU and its associated bus are not needed for fetch-
ing instructions. The cache greatly reduces the need
for writing in-line repetitive code and, therefore,
reduces instruction/coefficient memory size require-
ments. In addition, the use of cache reduces power
consumption because it eliminates memory accesses
for instruction fetches.
The
cloop
register controls the cache loop count. The
cstate
register contains the current state of the cache.
The 32-bit
csave
register holds the opcode of the
instruction following the loop instruction in X-
memory. The cache provides a convenient, low-over-
head looping structure that is interruptible, savable, and
restorable. The cache is addressable in both the X and
Y memory spaces. An interrupt or trap handling routine
can save and restore
cloop
,
cstate
,
csave
, and the
contents of the cache.
Data Arithmetic Unit (DAU)
The DAU is a power-efficient, dual-MAC (multiply/accu-
mulate) parallel-pipelined structure that is tailored to
communications applications. It can perform two dou-
ble-word (32-bit) fetches, two multiplications, and two
accumulations in a single instruction cycle. The dual-
MAC parallel pipeline begins with two 32-bit registers,
x
and
y
. The pipeline treats the 32-bit registers as four
16-bit signed registers if used as input to two signed
16-bit x 16-bit multipliers. Each multiplier produces a
full 32-bit result stored into registers
p0
and
p1
. The
DAU can direct the output of each multiplier to a 40-bit
ALU or a 40-bit 3-input ADDER. The ALU and ADDER
results are each stored in one of eight 40-bit accumula-
tors,
a0
through
a7
. The ALU includes an ACS
(add/compare/select) function for Viterbi decoding. The
DAU can direct the output of each accumulator to the
ALU/ACS, the ADDER, or a 40-bit BMU (bit manipula-
tion unit).
The ALU implements addition, subtraction, and various
logical operations. To support Viterbi decoding, the
ALU has a split mode in which it computes two simulta-
neous 16-bit additions or subtractions. This mode,
available in a specialized dual-MAC instruction, is used
to compute the distance between a received symbol
and its estimate.
The ACS
provides the add/compare/select function
required for Viterbi decoding. This unit provides flags to
the traceback encoder for implementing mode-con-
trolled side-effects for ACS operations. The source
operands for the ACS are any two accumulators, and
results are written back to one of the source accumula-
tors.
The BMU implements barrel-shift, bit-field insertion, bit-
field extraction, exponent extraction, normalization, and
accumulator shuffling operations.
ar0
through
ar3
are
auxiliary registers whose main function is to control
BMU operations.
The user can enable overflow saturation to affect the
multiplier output and the results of the three arithmetic
units. Overflow saturation can also affect an accumula-
tor value as it is transferred to memory or other register.
These features accommodate various speech coding
standards such as GSM-FR, GSM-HR, and GSM-EFR.
Shifting in the arithmetic pipeline occurs at several
stages to accommodate various standards for mixed-
and double-precision multiplications.