TMS320C6414T, TMS320C6415T, TMS320C6416T
FIXEDPOINT DIGITAL SIGNAL PROCESSORS
SPRS226M NOVEMBER 2003 REVISED APRIL 2009
10
POST OFFICE BOX 1443
HOUSTON, TEXAS 772511443
CPU (DSP core) description (continued)
The two .M functional units perform all multiplication operations. Each of the C64x .M units can perform two
16
× 16-bit multiplies or four 8 × 8-bit multiplies per clock cycle. The .M unit can also perform 16 × 32-bit multiply
operations, dual 16
× 16-bit multiplies with add/subtract operations, and quad 8 × 8-bit multiplies with add
operations. In addition to standard multiplies, the C64x .M units include bit-count, rotate, Galois field multiplies,
and bidirectional variable shift hardware.
The two .S and .L functional units perform a general set of arithmetic, logical, and branch functions with results
available every clock cycle. The arithmetic and logical functions on the C64x CPU include single 32-bit, dual
16-bit, and quad 8-bit operations.
The processing flow begins when a 256-bit-wide instruction fetch packet is fetched from a program memory.
The 32-bit instructions destined for the individual functional units are “l(fā)inked” together by “1” bits in the least
significant bit (LSB) position of the instructions. The instructions that are “chained” together for simultaneous
execution (up to eight in total) compose an execute packet. A “0” in the LSB of an instruction breaks the chain,
effectively placing the instructions that follow it in the next execute packet. A C64x
DSP device enhancement
now allows execute packets to cross fetch-packet boundaries. In the TMS320C62x
/TMS320C67x DSP
devices, if an execute packet crosses the fetch-packet boundary (256 bits wide), the assembler places it in the
next fetch packet, while the remainder of the current fetch packet is padded with NOP instructions. In the C64x
DSP device, the execute boundary restrictions have been removed, thereby, eliminating all of the NOPs added
to pad the fetch packet, and thus, decreasing the overall code size. The number of execute packets within a
fetch packet can vary from one to eight. Execute packets are dispatched to their respective functional units at
the rate of one per clock cycle and the next 256-bit fetch packet is not fetched until all the execute packets from
the current fetch packet have been dispatched. After decoding, the instructions simultaneously drive all active
functional units for a maximum execution rate of eight instructions every clock cycle. While most results are
stored in 32-bit registers, they can be subsequently moved to memory as bytes, half-words, words, or
doublewords. All load and store instructions are byte-, half-word-, word-, or doubleword-addressable.
For more details on the C64x CPU functional units enhancements, see the following documents:
The TMS320C6000 CPU and Instruction Set Reference Guide (literature number SPRU189)
TMS320C64x Technical Overview (literature number SPRU395)
For more detailed information on the device compatibility, similarities/differences, and migration from the
TMS320C6414/15/16 devices to the TMS320C6414T/15T/16T devices, see the following document:
Migrating From TMS320C6416/15/14 to TMS320C6416T/15T/14T application report (literature number
SPRA981).
TMS320C67x is a trademark of Texas Instruments.