
Instruction Cycle Times
8-16
Copyright 2000 ARM Limited. All rights reserved.
ARM DDI 0165B
8.10 Multiply and multiply accumulate
The multiply instructions use dedicated hardware that implements integer
multiplication. All cycles except the last are internal.
During the first (Execute) stage of a multiply instruction, the multiplier and
multiplicand operands are read onto the A and B buses, to which the multiplier unit is
connected. The first stage of the multiplier performs Booth recoding and partial product
summation, consuming 16 bits of the multiplier operand each cycle.
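The following Python sketch illustrates Booth recoding and partial product summation on 16 multiplier bits per step. It is a software model only, not a description of the ARM9E-S circuit: it assumes radix-4 (modified) Booth recoding, which this section does not specify, and the functions booth_digits, partial_product and booth_multiply are hypothetical names. Each 16-bit chunk is recoded into eight digits in the range -2 to +2, so each step sums eight shifted multiples of the multiplicand.

```python
from __future__ import annotations

# Illustrative software model only. It ASSUMES radix-4 (modified) Booth
# recoding; the manual does not say which recoding the ARM9E-S uses.


def booth_digits(chunk: int, carry_in: int) -> list[int]:
    """Recode 16 multiplier bits into eight radix-4 Booth digits in -2..+2.

    carry_in is the multiplier bit just below this chunk (0 for the lowest).
    """
    digits = []
    prev = carry_in
    for j in range(8):                       # 8 digits x 2 bits = 16 bits
        lo = (chunk >> (2 * j)) & 1
        hi = (chunk >> (2 * j + 1)) & 1
        digits.append(-2 * hi + lo + prev)   # digit value from the bit triplet
        prev = hi
    return digits


def partial_product(multiplicand: int, chunk: int, carry_in: int) -> int:
    """Sum the eight Booth-selected multiples of the multiplicand for one chunk."""
    return sum(d * multiplicand * 4**j
               for j, d in enumerate(booth_digits(chunk, carry_in)))


def booth_multiply(multiplicand: int, multiplier: int) -> int:
    """Signed product of multiplicand and a 32-bit multiplier, 16 bits per step."""
    m = multiplier & 0xFFFFFFFF              # two's-complement bit pattern
    total = 0
    carry_in = 0
    for step in range(2):                    # low chunk, then high chunk
        chunk = (m >> (16 * step)) & 0xFFFF
        total += partial_product(multiplicand, chunk, carry_in) << (16 * step)
        carry_in = chunk >> 15               # chunk's top bit feeds the next recoding
    return total
```

For example, booth_digits(0xFFFF, 0) returns [-1, 0, 0, 0, 0, 0, 0, 0]: the all-ones chunk is its two's-complement value -1. In this model, radix-4 recoding produces eight partial products per 16-bit chunk rather than sixteen.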
During the second (Memory) stage of a multiply instruction, the partial product result
from the Execute stage is added to an optional accumulate term (read onto the C bus)
and, for multiplications that require additional cycles, to a feedback term from the
previous multiply step.
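A second sketch models how the Memory stage combines terms across steps for a multiply accumulate (Rd = Rm * Rs + Rn). It assumes a 32-bit result, a fixed two steps of 16 multiplier bits each, and ordinary multiplication in place of Booth recoding for the partial product. The function multiply_accumulate and its parameters are hypothetical, and the sketch does not model cycle counts.

```python
MASK32 = 0xFFFFFFFF


def multiply_accumulate(rm: int, rs: int, rn: int = 0, accumulate: bool = False) -> int:
    """Return the low 32 bits of rm * rs (+ rn when accumulate is True).

    Hypothetical two-stage model, one 16-bit multiplier chunk per step:
      - "Execute": form the partial product for the current chunk.
      - "Memory": add it to the running total, which starts as the optional
        accumulate term and afterwards carries the feedback from the
        previous step.
    """
    rm &= MASK32
    rs &= MASK32
    running = (rn & MASK32) if accumulate else 0   # optional accumulate term (C bus)
    for step in range(2):                          # two 16-bit chunks of rs
        chunk = (rs >> (16 * step)) & 0xFFFF
        partial = (rm * chunk) << (16 * step)      # Execute-stage partial product
        running = (running + partial) & MASK32     # Memory-stage add with feedback
    return running
```

Because only the low 32 bits are kept, treating each chunk as unsigned gives the same result as a signed multiply, so multiply_accumulate(-1, -1) is 1.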
Note
In Thumb state, only the MULS and MLAS operations are possible.
8.10.1 Interlocks
The multiply unit in ARM9E-S operates in both the Execute and Memory stages of the
pipeline. Because of this, the multiplier result is not available until the end of the
Memory stage. If the following instruction requires the multiplier result, it must be
interlocked until the correct value is available. This applies to every instruction that
requires the multiply result in its first Execute cycle or first Memory cycle, except for
multiply accumulate instructions that use the previous multiply result as the
accumulator operand.
As an example, the following sequence incurs a single-cycle interlock:
    MUL     r0, r1, r2
    SUB     r4, r0, r3
The following sequence also incurs a single-cycle interlock:
    MLA     r0, r1, r2, r3
    STR     r0, [r8]
The following example does not incur an interlock:
    MLA     r0, r1, r2, r0
    MLA     r0, r3, r4, r0
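The interlock rule illustrated by these examples can be captured by a small checker. This is a hypothetical Python model, not an ARM tool: it considers only a multiply (MUL, MULS, MLA, MLAS) followed immediately by a dependent instruction, counts one cycle per interlock as in the examples above, and parses only the plain register forms used here (no shifts, immediates or write-back). The names parse and interlock_cycles are illustrative.

```python
from __future__ import annotations

import re

MULTIPLY_OPS = {"MUL", "MULS", "MLA", "MLAS"}
ACCUMULATE_OPS = {"MLA", "MLAS"}


def parse(line: str) -> tuple[str, str | None, list[str]]:
    """Split 'OP rd, rm, ...' into (opcode, destination, source registers)."""
    parts = line.split(None, 1)
    op = parts[0].upper()
    operands = parts[1] if len(parts) > 1 else ""
    regs = [r.lower() for r in re.findall(r"\b[rR]\d+\b", operands)]
    if op.startswith("STR"):          # stores read every register operand
        return op, None, regs
    return op, (regs[0] if regs else None), regs[1:]


def interlock_cycles(program: list[str]) -> int:
    """Count single-cycle multiply interlocks between adjacent instructions."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        p_op, p_dest, _ = parse(prev)
        if p_op not in MULTIPLY_OPS or p_dest is None:
            continue
        c_op, _, c_srcs = parse(curr)
        # An MLA may take the previous multiply result as its accumulator (Rn,
        # the last operand) without an interlock; Rm and Rs still need it early.
        needed = c_srcs[:-1] if c_op in ACCUMULATE_OPS else c_srcs
        if p_dest in needed:
            stalls += 1
    return stalls
```

Running the three sequences above through interlock_cycles gives 1, 1 and 0 respectively. Because the model checks only adjacent pairs, placing one independent instruction between a multiply and its consumer removes the counted interlock in this model.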