MOTOROLA
INTEGER CPU
MMC2001
REFERENCE MANUAL
2.2 Features
The main features of the MCORE are as follows:
• 32-bit load/store RISC architecture
• Fixed 16-bit instruction length
• 16-entry, 32-bit general-purpose register file
• Efficient 4-stage execution pipeline, hidden from application software
• Single-cycle instruction execution for many instructions
• Two cycles for taken branches and memory access instructions
• Support for byte, halfword, and word memory accesses
• Fast interrupt support with 16-entry dedicated alternate register file
• Vectored and autovectored interrupt support
2.3 Microarchitecture Summary
The MCORE instruction execution pipeline consists of the following stages:
• Instruction fetch
• Instruction decode/register file read
• Execute
• Register writeback
These stages operate in an overlapped fashion, allowing single-clock instruction
execution for most instructions.
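The effect of overlapping the four stages can be sketched with an idealized timing model. The model below assumes every instruction spends exactly one clock in each stage and that no stalls occur; the function names are illustrative and do not come from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch, decode/register file read, execute, register writeback. */
enum { STAGE_COUNT = 4 };

/* Total clocks to complete n back-to-back single-clock instructions in an
 * ideal pipeline: STAGE_COUNT clocks for the first instruction, then one
 * more clock for each instruction that follows it. */
static uint32_t ideal_pipeline_clocks(uint32_t n)
{
    if (n == 0)
        return 0;
    return STAGE_COUNT + (n - 1);
}

/* Index of the instruction occupying `stage` (0 = fetch .. 3 = writeback)
 * at 0-based clock `clk`, or -1 if that stage is empty. */
static int32_t instruction_in_stage(uint32_t clk, uint32_t stage, uint32_t n)
{
    assert(stage < STAGE_COUNT);
    if (clk < stage)
        return -1;               /* pipeline has not filled this far yet */
    uint32_t idx = clk - stage;
    return idx < n ? (int32_t)idx : -1;
}
```

Under these assumptions, ten instructions complete in 13 clocks rather than 40, which is why the steady-state rate approaches one instruction per clock.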
Sixteen general-purpose registers are provided for source operands and instruction
results. Register R15 is used as the link register to hold the return address for
subroutine calls, and register R0 is associated with the current stack pointer value
by convention.
The execution unit consists of a 32-bit arithmetic/logic unit (ALU), a 32-bit barrel
shifter, a find-first-one unit (FFO), result feed-forward hardware, and miscellaneous
support hardware for multiplication and multiple register loads and stores. Arithmetic
and logical operations are executed in a single cycle with the exception of the
multiply, signed divide, and unsigned divide instructions. The multiply instruction is
implemented with a 2-bit-per-clock, overlapped-scan, modified Booth algorithm with
early-out capability to reduce execution time for operations with small multiplier
values. The signed divide and unsigned divide instructions also have data-dependent
timing. The find-first-one unit operates in a single clock cycle.
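The data-dependent multiply timing can be illustrated with a software sketch of radix-4 modified Booth recoding, which consumes two multiplier bits per step. This is not the hardware implementation: the function name, the step counter, and the exact early-out condition (stop once the remaining multiplier bits and the overlap bit are all zero) are assumptions made for illustration.

```c
#include <stdint.h>

/* Multiply x by y using radix-4 modified Booth recoding.
 * Returns the low 32 bits of the product. If `steps` is non-NULL it
 * receives the number of 2-bit steps performed (one step stands in for
 * one clock in this sketch). */
static uint32_t booth_radix4_mul(uint32_t x, uint32_t y, unsigned *steps)
{
    uint32_t product = 0;
    uint32_t prev = 0;   /* overlap bit to the right of the current pair */
    unsigned n = 0;

    for (unsigned i = 0; i < 16; i++) {
        /* Early-out (assumed condition): only zeros remain, so every
         * remaining Booth digit would be 0. */
        if (y == 0 && prev == 0)
            break;

        uint32_t pair = y & 3u;
        /* Booth digit = -2*b1 + b0 + prev, always in the range -2..+2. */
        int digit = (int)(pair & 1u) + (int)prev - 2 * (int)(pair >> 1);
        uint32_t shifted = x << (2 * i);   /* x * 4^i modulo 2^32 */

        switch (digit) {
        case  1: product += shifted;      break;
        case  2: product += shifted << 1; break;
        case -1: product -= shifted;      break;
        case -2: product -= shifted << 1; break;
        default: break;                   /* digit 0: nothing to add */
        }

        prev = pair >> 1;   /* top bit of this pair overlaps the next scan */
        y >>= 2;
        n++;
    }

    if (steps)
        *steps = n;
    return product;
}
```

With this sketch, a multiplier of 3 finishes in two steps, while a multiplier with its upper bits set runs all sixteen, which mirrors the behavior described for small multiplier values.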
The program counter unit has a PC incrementer and a dedicated branch address
adder to minimize delays during change-of-flow operations. Branch target addresses
are calculated in parallel with branch instruction decode, with a single pipeline bubble
for taken branches and jumps. Taken branches and jumps therefore execute in two
clocks. Conditional branches that are not taken execute in a single clock.
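As a worked example of these branch timings, the sketch below counts the clocks spent on a loop-closing conditional branch. It assumes the branch is taken on every iteration except the last; the helper names are hypothetical and only the branch itself is counted, not the loop body.

```c
#include <assert.h>
#include <stdint.h>

/* Clocks for one branch: two if taken (one pipeline bubble), one if not. */
static uint32_t branch_clocks(int taken)
{
    return taken ? 2u : 1u;
}

/* Clocks spent on a loop-closing conditional branch over the whole loop,
 * assuming it is taken (iterations - 1) times and falls through once. */
static uint32_t loop_branch_clocks(uint32_t iterations)
{
    assert(iterations > 0);
    return (iterations - 1) * branch_clocks(1) + branch_clocks(0);
}
```

For a ten-iteration loop this gives 9 × 2 + 1 = 19 clocks of branch overhead.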
Memory load and store operations are provided for byte, halfword, and word (32-bit)
data with automatic zero extension of byte and halfword load data. These instructions
can execute in two clock cycles. Load and store multiple register instructions allow
low-overhead context save and restore operations. These instructions can execute in
(N+1) clock cycles, where N is the number of registers to transfer.
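The sketch below models two points from this paragraph: what zero extension does to a loaded byte or halfword, and how the (N+1)-clock multiple-register transfer compares with N separate two-clock accesses. All function names are hypothetical, the timings are the best-case figures quoted above, and the explicit sign-extension helper is included only for contrast with the automatic zero extension.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the loaded value fills the low bits, upper bits are 0. */
static uint32_t zero_extend_byte(uint8_t value)  { return (uint32_t)value; }
static uint32_t zero_extend_half(uint16_t value) { return (uint32_t)value; }

/* Sign extension for contrast: this must be done as a separate step,
 * because loads zero-extend. */
static int32_t sign_extend_byte(uint8_t value)
{
    return (int32_t)(value ^ 0x80u) - 0x80;
}

/* Best-case clocks for one load/store multiple moving n registers. */
static uint32_t multi_reg_transfer_clocks(uint32_t n)
{
    assert(n >= 1 && n <= 16);   /* the register file has 16 entries */
    return n + 1;
}

/* Best-case clocks for n individual two-clock loads or stores. */
static uint32_t individual_transfer_clocks(uint32_t n)
{
    return 2 * n;
}
```

Under these figures, saving four registers takes 5 clocks with a multiple-register store versus 8 clocks with four separate stores, and the gap widens as N grows.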