
TM1100 Preliminary Data Book
Philips Semiconductors
2-4
PRELIMINARY INFORMATION
File: intro.fm5, modified 7/23/99
The core uses a fully general-purpose VLIW instruction-set architecture. Each TM1100 instruction is long enough to encode five operations, which are issued simultaneously every clock cycle. These operations can target any five of the 27 functional units in the DSPCPU, including integer and floating-point arithmetic units and data-parallel multimedia operation units.
Although the processor core runs a real-time operating system to coordinate all activities in the TM1100 system, it is not intended for true general-purpose computer use. For example, the TM1100 processor core does not implement demand-paged virtual memory, memory address translation, or 64-bit floating point, all of which are essential features in a general-purpose computer system.
TM1100 uses a VLIW architecture to maximize processor throughput at the lowest possible cost. VLIW architectures can deliver performance exceeding that of superscalar general-purpose CPUs without the cost and complexity of a superscalar CPU implementation, because the compiler, rather than the hardware, finds and schedules the parallel operations. The hardware saved by eliminating superscalar scheduling logic reduces cost and allows the integration of multimedia-specific features that enhance the power of the processor core.
The TM1100 operation set includes all traditional microprocessor operations. In addition, it includes multimedia operations that dramatically accelerate standard video and audio compression and decompression algorithms. A single “custom” or “media” operation, which is just one of the five operations issued in a TM1100 instruction, can implement up to 11 traditional microprocessor operations. Combined with the VLIW architecture, these multimedia operations yield very high throughput for multimedia applications.
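As an illustration of the “up to 11 operations” figure, consider the sum of absolute differences (SAD) over four 8-bit pixels, a common inner step of motion estimation in video compression. With traditional operations it takes four subtractions, four absolute values, and three additions, which is 11 operations. The C sketch below contrasts a scalar version with a model of a single packed operation on two 32-bit words that each hold four bytes. Both function names are hypothetical, and the assumption that a TM1100 media operation computes exactly this result in one operation is ours; consult the operation reference for the actual operation set.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: 4 subtractions, 4 absolute values, 3 additions
   (11 traditional operations, not counting the loads). */
static unsigned sad4_scalar(const uint8_t a[4], const uint8_t b[4])
{
    int d0 = a[0] - b[0], d1 = a[1] - b[1];
    int d2 = a[2] - b[2], d3 = a[3] - b[3];
    d0 = d0 < 0 ? -d0 : d0;
    d1 = d1 < 0 ? -d1 : d1;
    d2 = d2 < 0 ? -d2 : d2;
    d3 = d3 < 0 ? -d3 : d3;
    return (unsigned)(d0 + d1 + d2 + d3);
}

/* Portable model of one packed "media" operation (hypothetical name):
   each operand holds four unsigned bytes, byte 0 in bits 7..0. */
static uint32_t media_sad4(uint32_t x, uint32_t y)
{
    uint32_t sum = 0;
    for (int i = 0; i < 32; i += 8) {
        uint32_t xa = (x >> i) & 0xFFu;
        uint32_t ya = (y >> i) & 0xFFu;
        sum += xa > ya ? xa - ya : ya - xa;
    }
    return sum;
}
```

On hardware with such an operation, the packed form replaces the whole scalar sequence with one operation slot, leaving the other four slots of the instruction free.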
The DSPCPU core is supported by separate 16-KB data and 32-KB instruction caches. The data cache is dual-ported to allow two simultaneous accesses, and both caches are eight-way set-associative with a 64-byte block size.
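These parameters fix the cache geometry: sets = capacity / (ways × block size), so the 16-KB data cache has 16384 / (8 × 64) = 32 sets and the 32-KB instruction cache has 32768 / (8 × 64) = 64 sets. The sketch below assumes the conventional split of an address into a 6-bit block offset, a set index (5 bits for the data cache, 6 for the instruction cache), and the remaining tag bits. This section does not state the exact bit mapping, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Cache parameters from the text: 64-byte blocks, eight ways. */
enum { BLOCK_BYTES = 64, WAYS = 8 };

/* Number of sets = capacity / (ways * block size). */
static unsigned cache_sets(unsigned capacity_bytes)
{
    return capacity_bytes / (WAYS * BLOCK_BYTES);
}

/* log2 of an exact power of two. */
static unsigned log2u(unsigned v)
{
    unsigned n = 0;
    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Split an address into block offset, set index, and tag
   (conventional mapping, assumed rather than documented here). */
static void split_address(uint32_t addr, unsigned capacity_bytes,
                          uint32_t *offset, uint32_t *set, uint32_t *tag)
{
    unsigned off_bits = log2u(BLOCK_BYTES);
    unsigned set_bits = log2u(cache_sets(capacity_bytes));
    *offset = addr & (BLOCK_BYTES - 1u);
    *set = (addr >> off_bits) & ((1u << set_bits) - 1u);
    *tag = addr >> (off_bits + set_bits);
}
```

For example, data-cache address 0x12345 splits into offset 0x05, set 13, and tag 0x24 under this mapping.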
2.5.3 Video-In Unit
The video-in unit interfaces directly to any CCIR 601/656-compliant device that outputs eight-bit parallel, 4:2:2 YUV time-multiplexed data. Such devices include direct digital camera systems, which can connect to TM1100 either gluelessly or through the standard CCIR 656 connector with only the addition of ECL level converters. A single-chip external device can be used to convert to/from serial D1 professional video. Non-CCIR-compliant devices can use a digital video decoder chip, such as the Philips SAA7113, to interface to TM1100.
The video-in unit demultiplexes the captured YUV data before writing it into local TM1100 SDRAM. Separate planar data structures are maintained for Y, U, and V.
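The demultiplexing step can be modeled as a byte rearrangement. CCIR 656 carries 4:2:2 samples in the order Cb, Y, Cr, Y, so every four bytes hold two luma samples and one sample each of U (Cb) and V (Cr). The sketch assumes one line of active video with the timing reference codes already removed; it shows the resulting planar layout only, not how the video-in unit is programmed, and `demux_422_line` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demultiplex one line of 8-bit 4:2:2 data in CCIR 656 order
   (Cb Y0 Cr Y1 ...) into separate Y, U (Cb), and V (Cr) planes.
   `pixels` must be even; U and V each receive pixels/2 samples. */
static void demux_422_line(const uint8_t *in, size_t pixels,
                           uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(pixels % 2 == 0);
    for (size_t i = 0; i < pixels / 2; i++) {
        u[i]         = in[4 * i + 0];
        y[2 * i]     = in[4 * i + 1];
        v[i]         = in[4 * i + 2];
        y[2 * i + 1] = in[4 * i + 3];
    }
}
```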
The video-in unit can also be programmed to perform on-the-fly horizontal subsampling by a factor of two when reduced resolution is needed. Many camera systems capture a 640-pixel/line or 720-pixel/line image; with subsampling, direct conversion to a 320-pixel/line or 360-pixel/line image can be performed with no DSPCPU intervention. Performing this function during video input reduces initial storage and bus bandwidth requirements for applications that need only reduced resolution.
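Subsampling by two halves each plane's line length: a 720-sample Y line becomes 360 samples, and the 360-sample U and V lines become 180 samples each, which halves the storage and bus bandwidth for the image. The data book does not say whether the hardware averages or drops samples; the sketch below assumes averaging of adjacent pairs with rounding, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve the horizontal resolution of one plane line by averaging
   each pair of adjacent samples with rounding (assumed method).
   Returns the number of output samples. */
static size_t subsample2_line(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1] + 1) / 2);
    return n / 2;
}
```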
2.5.4 Video-Out Unit
The video-out unit essentially performs the inverse function of the video-in unit. Video-out generates an eight-bit CCIR 656 digital video data stream that contains a composited video and graphics overlay image. The video image is taken from separate Y, U, and V planar data structures in SDRAM; the graphics overlay is taken from a pixel-packed YUV data structure in SDRAM. Compositing supports both alpha blending and chroma keying.
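Per pixel, compositing can be described as either a choice between the video and overlay samples (chroma keying) or a weighted mix of the two (alpha blending). The sketch below assumes an 8-bit alpha in which 255 shows only the overlay, and keys on an exact match against a single YUV key color. The actual alpha precision, key-matching rules, and register controls of the video-out unit are not described in this section, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t y, u, v; } yuv_t;

/* Blend one component: alpha = 255 shows the overlay, 0 the video. */
static uint8_t blend8(uint8_t overlay, uint8_t video, uint8_t alpha)
{
    unsigned s = (unsigned)alpha * overlay + (255u - alpha) * video;
    return (uint8_t)((s + 127u) / 255u);   /* rounded divide by 255 */
}

/* Composite one pixel: an overlay pixel equal to the key color is
   transparent; otherwise the overlay is alpha-blended over video. */
static yuv_t composite(yuv_t video, yuv_t overlay, uint8_t alpha,
                       bool keying, yuv_t key)
{
    if (keying && overlay.y == key.y && overlay.u == key.u &&
        overlay.v == key.v)
        return video;
    yuv_t out = { blend8(overlay.y, video.y, alpha),
                  blend8(overlay.u, video.u, alpha),
                  blend8(overlay.v, video.v, alpha) };
    return out;
}
```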
The video-out unit can also upscale the video image horizontally by a factor of two to convert from CIF/SIF to CCIR 601 resolution. The overlay image, if enabled, is always at full-pixel resolution.
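Horizontal 2× upscaling doubles the line length, for example from a 360-sample SIF line to a 720-sample CCIR 601 line. The interpolation method is not specified here; the sketch below assumes simple linear interpolation, copying each input sample and inserting the rounded average of neighboring samples between them. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Double the horizontal resolution of one line: even outputs copy the
   input, odd outputs average neighbours (the last odd output repeats
   the final sample). Returns the number of output samples. */
static size_t upscale2_line(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint8_t next = (i + 1 < n) ? in[i + 1] : in[i];
        out[2 * i] = in[i];
        out[2 * i + 1] = (uint8_t)((in[i] + next + 1) / 2);
    }
    return 2 * n;
}
```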
Video-out supports pixel emission rates of up to 40 Mpix/s and allows full programming of the horizontal and vertical frame/field structure. It can therefore refresh both interlaced and non-interlaced (“two fh”) video displays with 4:3, 16:9, or other aspect ratios.
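A quick calculation shows how standard timings fit within the 40-Mpix/s limit. The pixel rate is total samples per line (including blanking) × total lines per frame × frame rate. The standard CCIR 601 totals of 858 samples × 525 lines at 30000/1001 frames/s, and of 864 samples × 625 lines at 25 frames/s, both give 13.5 Mpix/s. A non-interlaced “two fh” display doubles the line rate; assuming the same number of samples per line, it needs 27 Mpix/s, still below the limit. The sketch only does the arithmetic and does not model the unit's timing registers.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum video-out pixel emission rate stated in the text. */
static const uint64_t MAX_VO_RATE = 40000000u;

/* Pixel rate = samples per line (incl. blanking) x lines per frame
   x frame rate, with the frame rate given as the ratio num/den. */
static uint64_t pixel_rate(uint64_t samples_per_line, uint64_t lines,
                           uint64_t fps_num, uint64_t fps_den)
{
    assert(fps_den != 0);
    return samples_per_line * lines * fps_num / fps_den;
}
```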
The sample rate for video-out pixels (and audio samples) is independently and dynamically programmable. The high-quality on-chip sample clock generator circuit gives the programmer fine control over the sampling frequency, so that audio and video synchronization can be achieved in any system configuration. When the sample frequency is changed, the instantaneous phase does not change, which allows the sample frequency to be adjusted without introducing audio or video distortion.
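This section does not describe the internals of the sample clock generator. One common way to obtain fine frequency control with phase-continuous retuning is a numerically controlled oscillator (NCO): a phase accumulator advanced by a programmable increment on every reference-clock tick, giving f_out = inc × f_ref / 2^32 for a 32-bit accumulator. Because retuning writes only the increment and leaves the accumulator untouched, the phase never jumps. The sketch below is a generic NCO under that assumption, not a description of the TM1100 circuit, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Numerically controlled oscillator: a 32-bit phase accumulator whose
   increment sets the output frequency (f_out = inc * f_ref / 2^32). */
typedef struct {
    uint32_t phase; /* current phase; 2^32 corresponds to one cycle */
    uint32_t inc;   /* phase step per reference-clock tick */
} nco_t;

/* Advance one reference-clock tick; returns 1 when the accumulator
   wraps, i.e. when one output clock cycle completes. */
static int nco_tick(nco_t *n)
{
    uint32_t prev = n->phase;
    n->phase += n->inc;   /* unsigned overflow wraps modulo 2^32 */
    return n->phase < prev;
}

/* Retune by changing only the increment, so the phase is continuous. */
static void nco_set_inc(nco_t *n, uint32_t inc)
{
    n->inc = inc;
}
```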
2.5.5 Image Coprocessor (ICP)
The image coprocessor (ICP) off-loads common image scaling and filtering tasks from the DSPCPU. Although the DSPCPU can easily perform these tasks, they are a poor use of the relatively expensive CPU resource. When the ICP performs them in parallel, they are handled efficiently by simple hardware, and the DSPCPU can continue with more complex tasks.
The ICP can operate as either a memory-to-memory or a memory-to-PCI coprocessor device.
In memory-to-memory mode, the ICP can perform either horizontal or vertical image filtering and resizing using a high-quality algorithm (a 5-tap polyphase filter in each direction). Each pass filters or scales in only one direction, horizontal or vertical, so two invocations of the ICP are required to filter or resize in both directions.
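A polyphase filter computes each output sample from the input samples nearest its source position, using a coefficient set (phase) selected by the fractional part of that position. The sketch below models one 5-tap pass; by taking a stride, it can filter either a row (stride 1) or a column (stride equal to the image width), so horizontal and vertical resizing are two separate calls, matching the two ICP invocations described above. The number of phases (4), the 1/256 fixed-point coefficients, the left-aligned sample grid, edge replication, and all names are hypothetical; the actual ICP coefficients and phase resolution are not given in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 5
#define PHASES 4 /* hypothetical phase resolution for this sketch */

/* coef[p][t] weights input sample (i - 2 + t) for phase p, in 1/256
   units; each row should sum to 256 so flat areas are preserved. */
typedef int16_t coef_table[PHASES][TAPS];

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One filtering/resizing pass: n_in samples spaced in_stride apart are
   resampled to n_out samples spaced out_stride apart. Stride 1 filters
   a row; stride = image width filters a column. Edges are replicated. */
static void polyphase_pass(const uint8_t *in, size_t n_in, size_t in_stride,
                           uint8_t *out, size_t n_out, size_t out_stride,
                           const coef_table coef)
{
    assert(n_in > 0 && n_out > 0);
    for (size_t x = 0; x < n_out; x++) {
        /* Source position of output x, in units of 1/PHASES sample. */
        size_t pos = x * n_in * PHASES / n_out;
        long i = (long)(pos / PHASES);   /* integer source sample */
        size_t p = pos % PHASES;         /* selected phase */
        int acc = 0;
        for (int t = 0; t < TAPS; t++) {
            long j = i - 2 + t;
            if (j < 0) j = 0;
            if (j > (long)n_in - 1) j = (long)n_in - 1;
            acc += coef[p][t] * in[(size_t)j * in_stride];
        }
        out[x * out_stride] = clamp_u8(acc < 0 ? 0 : (acc + 128) >> 8);
    }
}
```

With a table that simply interpolates between the center tap and its right neighbor, resizing the row {0, 100, 200, 200} from 4 to 8 samples produces 0, 50, 100, 150 as its first four outputs; a real design would use all five taps to shape the filter response.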
In memory-to-PCI mode, the ICP can perform horizontal resizing followed by color-space conversion. For example, assume an n × m pixel array is to be displayed in a window on the PC video screen while the PC is running a graphical user interface. The first step (if necessary) would use the ICP in memory-to-memory mode to perform a vertical resizing. The second step would use the