PIC32MX5XX/6XX/7XX
DS61156G-page 50
2009-2011 Microchip Technology Inc.
3.2 Architecture Overview
The MIPS M4K processor core contains several
logic blocks working together in parallel, providing an
efficient high-performance computing engine. The
following blocks are included with the core:
• Execution Unit
• Multiply/Divide Unit (MDU)
• System Control Coprocessor (CP0)
• Fixed Mapping Translation (FMT)
• Dual Internal Bus interfaces
• Power Management
• MIPS16e Support
• Enhanced JTAG (EJTAG) Controller
3.2.1 EXECUTION UNIT
The MIPS M4K processor core execution unit implements
a load/store architecture with single-cycle ALU
operations (logical, shift, add, subtract) and an
autonomous multiply/divide unit. The core contains thirty-two
32-bit General Purpose Registers (GPRs) used for
integer operations and address calculation. One additional
register file shadow set (containing thirty-two registers)
is added to minimize context switching overhead
during interrupt/exception processing. The register file
consists of two read ports and one write port and is fully
bypassed to minimize operation latency in the pipeline.
The execution unit includes:
• 32-bit adder used for calculating the data address
• Address unit for calculating the next instruction
  address
• Logic for branch determination and branch target
  address calculation
• Load aligner
• Bypass multiplexers used to avoid stalls when
  executing instruction streams where data
  producing instructions are followed closely by
  consumers of their results
• Leading Zero/One detect unit for implementing
  the CLZ and CLO instructions
• Arithmetic Logic Unit (ALU) for performing bitwise
  logical operations
• Shifter and store aligner
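
To illustrate what the Leading Zero/One detect unit computes, the following is a minimal software sketch of the CLZ and CLO results in C. The function names model_clz and model_clo are hypothetical and are not part of any Microchip or MIPS library; the hardware produces these results directly, so the loop below is only a behavioral model.

```c
#include <stdint.h>

/* Behavioral model of CLZ: counts the leading zero bits of a 32-bit word.
 * Returns 32 when x is 0, as the MIPS32 CLZ instruction does.
 * (Hypothetical helper name, for illustration only.) */
static unsigned model_clz(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && (x & 0x80000000u) == 0) {
        x <<= 1;   /* move the next bit into the MSB position */
        n++;
    }
    return n;
}

/* Behavioral model of CLO: counts the leading one bits of a 32-bit word.
 * Leading ones of x are the leading zeros of its bitwise complement. */
static unsigned model_clo(uint32_t x)
{
    return model_clz(~x);
}
```

For example, model_clz(1) yields 31 and model_clo(0xF0000000) yields 4.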
3.2.2 MULTIPLY/DIVIDE UNIT (MDU)
The MIPS M4K processor core includes a Multiply/Divide
Unit (MDU) that contains a separate pipeline for multiply
and divide operations. This pipeline operates in parallel
with the Integer Unit (IU) pipeline and does not stall
when the IU pipeline stalls. This allows MDU operations
to be partially masked by system stalls and/or
other integer unit instructions.
The high-performance MDU consists of a 32x16 Booth-recoded
multiplier, result/accumulation registers (HI
and LO), a divide state machine, and the necessary
multiplexers and control logic. The first number shown
(‘32’ of 32x16) represents the rs operand. The second
number (‘16’ of 32x16) represents the rt operand. The
PIC32 core checks only the value of the latter (rt)
operand to determine how many times the operation
must pass through the multiplier. The 16x16 and 32x16
operations pass through the multiplier once. A 32x32
operation passes through the multiplier twice.
The MDU supports execution of one 16x16 or 32x16
multiply operation every clock cycle; 32x32 multiply
operations can be issued every other clock cycle.
Appropriate interlocks are implemented to stall the
issuance of back-to-back 32x32 multiply operations.
The multiply operand size is automatically determined
by logic built into the MDU.
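
The multiplier pass count and issue rate described above can be sketched in C as follows. This is a simplified model, not the hardware logic: it assumes that an rt operand counts as 16 bits wide when it is a sign-extended 16-bit value (the text does not say whether the check is signed or unsigned), and the function names are hypothetical. The issue-cycle estimate counts only multiplier issue slots and ignores result latency and any other pipeline stalls.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: rt is treated as a 16-bit operand when it is a
 * sign-extended 16-bit value. */
static int rt_fits_16(int32_t rt)
{
    return rt >= -32768 && rt <= 32767;
}

/* Passes through the 32x16 multiplier: one for 16x16 and 32x16
 * operations, two for 32x32 operations. Only rt is examined. */
static unsigned mdu_multiplier_passes(int32_t rt)
{
    return rt_fits_16(rt) ? 1u : 2u;
}

/* Rough issue-slot count for a run of back-to-back multiplies:
 * one cycle per 16x16 or 32x16 multiply, two per 32x32 multiply. */
static unsigned mdu_issue_cycles(const int32_t *rt_values, size_t count)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < count; i++) {
        cycles += mdu_multiplier_passes(rt_values[i]);
    }
    return cycles;
}
```

Under these assumptions, three back-to-back multiplies with rt values 3, 70000, and -5 occupy 1 + 2 + 1 = 4 issue cycles.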
Divide operations are implemented with a simple
1-bit-per-clock iterative algorithm. Early-in detection
checks the sign extension of the dividend (rs) operand.
If rs is 8 bits wide, 23 iterations are skipped. For a
16-bit-wide rs, 15 iterations are skipped, and for a
24-bit-wide rs, 7 iterations are skipped. Any attempt to issue a
subsequent MDU instruction while a divide is still active
causes an IU pipeline stall until the divide operation is
completed.
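
The early-in detection can be modeled with a short C sketch. It assumes that "N bits wide" means rs is a sign-extended N-bit value, which is an interpretation of the text rather than a statement from it; the function name div_iterations_skipped is hypothetical.

```c
#include <stdint.h>

/* Number of divide iterations skipped by early-in detection, based on
 * the width of the dividend (rs).
 * Assumption: widths are judged on the sign-extended value of rs. */
static unsigned div_iterations_skipped(int32_t rs)
{
    if (rs >= -128 && rs <= 127) {
        return 23;              /* rs fits in 8 bits */
    }
    if (rs >= -32768 && rs <= 32767) {
        return 15;              /* rs fits in 16 bits */
    }
    if (rs >= -8388608 && rs <= 8388607) {
        return 7;               /* rs fits in 24 bits */
    }
    return 0;                   /* full-width dividend, nothing skipped */
}
```

For example, a dividend of 1000 falls in the 16-bit range, so this model reports 15 skipped iterations.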
Table 3-1 lists the repeat rate (peak issue rate, in cycles
until the operation can be reissued) and latency (number
of cycles until a result is available) for the PIC32
core multiply and divide instructions. The approximate
latency and repeat rates are listed in terms of pipeline
clocks.