
session, are often noticeable, and sometimes annoying.
Therefore, once the session starts, it is desirable to
maintain a continuous stream of sound. In fact, as shown
in the table below, several studies have indicated that
maximum tolerable latencies for speech are of the order of
600 ms. Our experience with satellite communications
has demonstrated that even 250 ms (one-way) delays are
annoying, though coherence is not impaired. For music,
latencies may be more noticeable, and hence delays
may need to be even shorter in order to remain
imperceptible. Several studies have been conducted in
order to determine effects of network delays on voice
transmissions [3], [4], [5], [21], [31], [32].
Table 3: Effects of latency on human ear perception

One-way delay   Effect of delay
>600 ms         Speech becomes incoherent and unintelligible.
600 ms          Speech is barely coherent.
250 ms          Annoying. Conversation style has to be changed.
100 ms          Imperceptible if the listener hears from the network only and not off the air.
50 ms           Imperceptible even if the listener is in the same room and can hear naturally off the air and from the network.

Using interactive speech as a model, we decided that
the maximum tolerable end-to-end latency was 100 ms.
A latency of this size would be acceptable for a large
spectrum of multimedia applications.
The only effect of token jitter is on the need to buffer.
In a truly isochronous network (providing constant
latency), there would be zero buffering requirement in the
network. In an asynchronous network, there is a need to
buffer. So long as the buffering is not excessive, the jitter
has minimal impact on system design. For example, if the
maximum packet latency and hence the maximum jitter
between packets were to be restricted to less than 15 ms
(as we targeted for the simulation), then
the maximum buffering required for a 64 kbps digital
voice stream would be 120 bytes, which is very small.
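
The 120-byte figure follows from multiplying the stream rate by the worst-case jitter interval. The following is a minimal sketch of that arithmetic; the function name and parameters are illustrative assumptions and are not taken from the simulation described here.

```python
def jitter_buffer_bytes(bit_rate_bps: int, max_jitter_ms: int) -> float:
    """Bytes a receiver must hold to ride out the worst-case jitter.

    Hypothetical helper: it assumes the buffer only needs to cover the
    data that would have been played out during max_jitter_ms.
    """
    # bits = rate * (ms / 1000); bytes = bits / 8  ->  rate * ms / 8000
    return bit_rate_bps * max_jitter_ms / 8000


if __name__ == "__main__":
    # 64 kbps voice stream with jitter bounded at 15 ms
    print(jitter_buffer_bytes(64_000, 15))  # 120.0
```

The same function shows how the requirement scales: doubling either the stream rate or the jitter bound doubles the buffer.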
5: Characteristics of video
Video is moving pictures. It is different from imaging
and graphics mainly in the motion component. Video
represents motion scenes as a rapid sequence of still
frames. An NTSC-compatible video is 640 x 480 at 30
frames per second. A PAL-compatible video is 768 x 576
at 25 frames per second. The smaller the window size
(the fewer lines scanned), the lower the bandwidth
requirement.
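
To illustrate how window size drives bandwidth, the sketch below computes the raw (uncompressed) bit rate of a video window. The color depth of 24 bits per pixel is an assumption made for illustration only, since the text does not state one, and the function name is hypothetical.

```python
def raw_video_mbps(width: int, height: int, fps: int,
                   bits_per_pixel: int = 24) -> float:
    """Uncompressed video bit rate in Mbps (10**6 bits per second).

    bits_per_pixel=24 is an assumed color depth, not a value from the text.
    """
    return width * height * fps * bits_per_pixel / 1_000_000


if __name__ == "__main__":
    full = raw_video_mbps(640, 480, 30)      # full NTSC-compatible window
    quarter = raw_video_mbps(320, 240, 30)   # half the width and height
    print(f"full window:    {full:.3f} Mbps")
    print(f"quarter window: {quarter:.3f} Mbps")
```

Under these assumptions, halving both dimensions of the window cuts the raw bit rate to one quarter, since the rate is proportional to the number of pixels per frame.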
Table 4: Effect of frame rate on human eye perception

Frames per second (fps)   Effect on human eye
<10 fps                   Eye cannot discern motion. Each frame appears disjointed.
12-15 fps                 Eye can discern motion but is jerky.
30 fps                    Television quality. Cannot discern high motion component (blurred); e.g. baseball
60-75 fps                 HDTV quality. Can discern motion in high-motion games; e.g. ice hockey
90 fps                    Limit of human eye perception
1000 fps                  Scientific video quality; e.g. shuttle blast-off recording