Emerging Technology: Reducing Voice over IP Latency
In the past few years, Voice over IP (VoIP) has risen from obscurity
to one of the more popular topics at computer shows, communications
conferences, and in many networking publications. In addition, if you read
popular financial publications, you are probably aware of several
communications carriers that have created an IP infrastructure that
transports both voice and data. These communications carriers are issuing
stocks and bonds to fund this new infrastructure.
The increasing popularity of VoIP does not mean it's easy to
implement. In fact, it may be just the opposite, especially if
implementers don't do their homework ahead of time. Some recently
introduced VoIP products may work very well when paired with
equipment from the same vendor, yet these same products may work
neither well nor at all when used with a product from another vendor.
Even if you purchase products from the same vendor, there is no
guarantee that VoIP will work well or correctly. While compatibility
problems may be resolved as product manufacturers comply with new and
evolving standards, the fact that VoIP is latency driven means you must
carefully examine your network infrastructure to determine if VoIP can
work satisfactorily.
This article examines how data flows through a network, and looks at
each location where the data flow can be delayed. It also offers
different options that can reduce network delay so that you can
maximize a VoIP implementation.
THE CONVERSATION KILLER
You can view a VoIP application in terms of network delay because
real-time voice conversations are delay sensitive. Once the one-way
delay exceeds a quarter of a second, or 250 milliseconds (ms), it becomes
relatively difficult for the parties in a conversation to tell when
one person is finished speaking. This increases the probability that
the parties will talk at the same time. One way to alleviate this
situation is to revert to a Citizen's Band (CB) mode of conversation,
using the term “over” to inform the other party that it is his or her turn
to speak. While the use of CB was quite popular during the 1970s, it's
doubtful that an enterprise would be willing to invest in a VoIP
application to obtain CB-style conversation today.
The ITU-T's G.114 recommendation specifies a round-trip delay time of
300ms for telephone traffic, which results in a one-way acceptable delay
of 150ms. While a maximum one-way delay of 150ms may be somewhat
restrictive, a delay of over 250ms will more than likely be unacceptable.
Thus, think of a one-way delay of 150ms as equivalent to a yellow caution
line; a delay of 250ms would represent the red alarm indicator for a VoIP
application.
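The caution and alarm lines can be expressed as a small check. This is a minimal sketch: the 150ms and 250ms limits come from the text, while the function name `classify_one_way_delay` and the color labels are illustrative assumptions, not part of any standard.

```python
# Limits taken from the text: 150 ms is the caution line, 250 ms the alarm line.
CAUTION_MS = 150.0
ALARM_MS = 250.0


def classify_one_way_delay(delay_ms: float) -> str:
    """Return 'green', 'yellow', or 'red' for a one-way delay in milliseconds.

    The labels are hypothetical names chosen for this sketch.
    """
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms > ALARM_MS:
        return "red"      # beyond the alarm line: likely unacceptable
    if delay_ms > CAUTION_MS:
        return "yellow"   # between the caution and alarm lines
    return "green"        # at or below the caution line


if __name__ == "__main__":
    for sample_ms in (100.0, 200.0, 300.0):
        print(sample_ms, classify_one_way_delay(sample_ms))
```

Treating exactly 150ms as acceptable and exactly 250ms as still in the caution band is a choice made for this sketch; the text does not say which side of each line the boundary value falls on.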
Now that you have an appreciation for the one-way-delay range that a
VoIP application can tolerate, I'll turn to the data flow associated
with implementing the technology.
DATA FLOW
A common VoIP application runs over an IP network, whether a
corporate intranet or the Internet. The figure illustrates a simple VoIP implementation.
In the example, a voice call is routed from the PBX at location X (via the
gateway, LAN, and router at that location) through the IP network to a
telephone connected to the PBX at location Y. There are several areas
where datagrams transporting voice could be delayed.
As an analog voice conversation is routed through the PBX to the
voice gateway, the voice-coding algorithm used by the gateway adds a
degree of latency. The actual amount of delay is based on the type of
voice coder used. Once a small sample of voice is coded, it must be
encapsulated within a datagram for transmission to a distant gateway. The
encapsulation process includes adding applicable UDP and IP headers to
form the datagram as well as the flow of the datagram from the gateway to
the router via the LAN.
The total delay from those activities represents the interprocess delay at
the origin. Note in the figure that when a datagram arrives at the
destination gateway, a reverse process to the one previously described
occurs. Therefore, a datagram will also encounter an interprocess delay at
the destination.
Once the datagram reaches the router at location X in the figure, the flow
of the datagram into the cloud (representing an IP network) will not occur
instantaneously. Instead, a delay occurs based on the length of the
datagram and the operating rate of the access line.
Once the datagram reaches the IP network, it will be routed through
one or more routers to a network egress point. This routing also adds
variable delay. The causes for the variable delay include the number of
routers in the path from the point of entry to the point of exit, the
processing power of each router, and the traffic load offered to each
router. These delays accumulate as the voice-transporting datagram flows
through the wide-area IP network, and they add to the delay the datagram
has already encountered on the local network.
The accompanying table summarizes the previously described delays and
includes a general indication of the range of delays that can be
attributed to each of the factors described. Note that delays can
typically vary between a low of 80.5ms and a high of 314ms.
While a delay of 80.5ms is acceptable, any delay exceeding 150ms may
not be conducive to an intelligible conversation, especially if
the one-way delay time expands toward the upper range of 314ms.
In examining the entries in the table, note that while the network
access at the origin and egress are shown with the same range of
values, this is not the case for the interprocess delays at the
origin and destination. Similarly, compression and decompression
delays are not symmetrical. Rather than discuss the reasons for this
here, I'll examine each delay category and discuss potential
adjustments that can reduce delay. By saving several milliseconds at
one juncture and then several more at another, it may become possible to
enable a VoIP application to operate acceptably.
VOICE CODING
In a VoIP environment, most gateways are configured to digitize voice
using a hybrid coding technique. A hybrid coder combines waveform coding
and voice coding.
Under a hybrid coding technique, the voice waveform is sampled and
speech parameters are extracted. However, instead of directly
encoding pitch, inflection, and other speech parameters, those
parameters are used to synthesize the segment of the voice sample
from which they were extracted. The synthesized period of voice,
typically 20ms of speech, is then compared to the original sample. If the
two are within a predefined interval, no adjustments are made to the
speech parameters. If the actual and synthesized samples differ by more
than a predefined amount, the speech parameters are adjusted to obtain a
“better fit.”
The end result of this feedback mechanism is an analysis-by-synthesis
technique, which adjusts the extracted speech parameters until the
synthesized speech closely resembles the original waveform.
Once the extracted speech parameters' values are finalized, the coder will
attempt to match parameters against previously “learned” parameters placed
in a codebook. If a match occurs, the position of the parameter in the
codebook is used instead of its value, further reducing the quantity of
data that requires transmission.
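The codebook step can be illustrated with a toy lookup. This is a heavily simplified sketch and not how any standardized CELP coder searches its codebook: the function name `codebook_lookup`, the squared-error measure, the tolerance, and the sample codebook values are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def codebook_lookup(
    params: Sequence[float],
    codebook: Sequence[Sequence[float]],
    tolerance: float,
) -> Optional[int]:
    """Return the index of the closest codebook entry, or None if no entry
    is within `tolerance` (measured as summed squared error)."""
    best_index: Optional[int] = None
    best_error: Optional[float] = None
    for index, entry in enumerate(codebook):
        error = sum((p - e) ** 2 for p, e in zip(params, entry))
        if best_error is None or error < best_error:
            best_index, best_error = index, error
    if best_error is not None and best_error <= tolerance:
        return best_index   # transmit this position instead of the values
    return None             # no match: the values themselves must be sent


if __name__ == "__main__":
    toy_codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]  # hypothetical entries
    print(codebook_lookup((0.9, 1.1), toy_codebook, tolerance=0.05))
    print(codebook_lookup((5.0, 5.0), toy_codebook, tolerance=0.05))
```

The point of the sketch is only the data reduction: when a match is found, a single small index stands in for the full set of parameter values.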
This hybrid coding technique is commonly used by a family of voice
coders referred to as Code Excited Linear Prediction (CELP) coders. In
general, the data rate of different members of the CELP family is
inversely related to their algorithmic delay. That is, the higher the
voice coding rate, the lower the delay associated with coding a sample of
speech.
Three popular voice coders used with many VoIP gateways include the
G.728, G.729, and G.723.1 coders. G.728 is a low-delay version of
CELP. The algorithm delay of a G.728 coder is approximately 2.5ms;
however, the resulting digitized voice data rate is 16Kbits/sec. The
G.729 voice coder operates at 8Kbits/sec and has a 10ms delay. The
G.723.1 voice coder represents a multirate coder, as it operates at
either 5.3Kbits/sec or 6.3Kbits/sec. For either data rate, the
algorithm introduces a coding delay of approximately 30ms. Because
each voice coder operates on a 20ms segment of speech, the total
delays are approximately 22.5ms, 30ms, and 50ms, respectively.
A technique worth considering to reduce the one-way latency is to
change the voice coding method. For example, changing the voice coder from
G.723.1 to G.728 will reduce one-way delay by 27.5ms. Most gateway
products support between six and ten types of voice coders. Thus, by
carefully considering the voice coder to use, you can significantly reduce
one-way latency.
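The coder comparison above reduces to simple arithmetic, sketched below. The rates and algorithmic delays are the figures quoted in the text; the dictionary layout and the function names `total_coding_delay_ms` and `coder_switch_savings_ms` are assumptions made for this example.

```python
FRAME_MS = 20.0  # length of the speech segment each coder operates on (from the text)

# Bit rates (Kbits/sec) and algorithmic delays (ms) as quoted in the text.
CODERS = {
    "G.728": {"rates_kbps": (16.0,), "algorithm_ms": 2.5},
    "G.729": {"rates_kbps": (8.0,), "algorithm_ms": 10.0},
    "G.723.1": {"rates_kbps": (5.3, 6.3), "algorithm_ms": 30.0},
}


def total_coding_delay_ms(coder: str) -> float:
    """Frame time plus algorithmic delay for the named coder."""
    return FRAME_MS + CODERS[coder]["algorithm_ms"]


def coder_switch_savings_ms(old: str, new: str) -> float:
    """One-way delay saved by switching from `old` to `new` (negative if slower)."""
    return total_coding_delay_ms(old) - total_coding_delay_ms(new)


if __name__ == "__main__":
    for name in CODERS:
        print(name, total_coding_delay_ms(name), "ms")
    print("G.723.1 -> G.728 saves", coder_switch_savings_ms("G.723.1", "G.728"), "ms")
```

Keep in mind the trade-off the text describes: the lower-delay coder produces a higher bit rate, which in turn increases the size of each datagram placed on the access line.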
While it is relatively easy to obtain information about voice coding
latency for standardized coders, the same may not be true for
proprietary coders. My own attempts to determine the latency of some
proprietary “enhanced” CELP coders proved something of a scavenger
hunt, requiring a series of telephone calls and a dose of
perseverance, since such information is typically not included in
vendor specification sheets.
INTERPROCESS AT ORIGIN DELAY
As previously (and briefly) discussed, the interprocess delay at the
origin has several components. Those components include the creation
of a datagram containing a period of digitized voice, the placement
of the datagram onto the LAN, and its extraction from the network by
the router. Although the interprocess delay time at the origin is not
extremely variable, several techniques can shave off a few milliseconds of
delay.
First, if your LAN utilization level is high, your LAN is probably
experiencing a high level of collisions that delay the flow of frames
across that network. In this situation, you should consider either
upgrading or segmenting the LAN; also, you might consider bypassing that
network. Concerning the latter, instead of using a gateway that requires a
separate LAN connection, you could consider adding voice modules to the
router. This connects PBX lines directly to the router so that datagrams
don't have to traverse the LAN. While this change in the local
infrastructure may only save a few milliseconds, shaving off a few
milliseconds here and a few more through other techniques may be what it
takes for VoIP to work at your level of expectation.
NETWORK ACCESS AND EGRESS
The delay associated with transmitting a datagram into the IP network and
receiving it from that network is highly dependent on the operating rate
of the access line at each location. However, the delays are also
dependent on the voice coding method used. For example, assume the voice
coding method selected produces an 8Kbit/sec digital data stream. Then,
each 20ms sample results in 8000bits/sec × 20ms, or 160 bits, that must be
encapsulated into an IP datagram.
As a refresher for those who may be a bit rusty on the relationship
of headers within a VoIP environment, the Real-Time Transport
Protocol (RTP) header is commonly used to prefix each digitized voice
sample. RTP contains timing information that makes it possible to place
received voice samples into a jitter buffer at the destination and extract
each sample to remove timing variations that occur as datagrams flow
through a network.
The RTP header adds 16 bytes, to which the UDP header prefixes an
additional 8 bytes. Finally, the IP header is prefixed to form
the datagram, adding another 20 bytes of header information. Thus,
in this example, the 160 bits, or 20 bytes, of digitized voice are
transported as a 64-byte datagram.
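The datagram size follows directly from the coder rate and the header sizes. The sketch below uses the header sizes given in the text; the function names `payload_bytes` and `datagram_bytes` are assumptions, and rounding a partial byte up is a choice made for this example.

```python
# Header sizes in bytes, as given in the text.
RTP_HEADER_BYTES = 16
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20
HEADER_BYTES = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IP_HEADER_BYTES


def payload_bytes(rate_bps: int, frame_ms: int = 20) -> int:
    """Bytes of digitized voice produced for one frame of speech."""
    bits = rate_bps * frame_ms // 1000
    return -(-bits // 8)  # round any partial byte up


def datagram_bytes(rate_bps: int, frame_ms: int = 20) -> int:
    """Total IP datagram length for one frame: voice payload plus headers."""
    return payload_bytes(rate_bps, frame_ms) + HEADER_BYTES


if __name__ == "__main__":
    for rate in (8000, 16000):
        size = datagram_bytes(rate)
        print(rate, "bits/sec ->", size, "bytes,",
              f"{HEADER_BYTES / size:.0%} of it headers")
```

For the 8Kbit/sec case, the headers account for 44 of the 64 bytes, so more than two-thirds of each datagram is overhead rather than voice.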
If the access line to the IP network operates at 64Kbits/sec, then
the delay associated with placing a datagram containing 20ms of
speech encoded at 8000bits/sec is (64 bytes × 8 bits/byte) /
64Kbits/sec, or 8ms. At a T1 operating rate, the delay associated
with the access line is (64 bytes × 8 bits/byte) / 1.536Mbits/sec, or
0.333ms. (I listed a T1 rate of 1.536Mbits/sec instead of the line
operating rate of 1.544Mbits/sec because 8Kbits/sec is used for framing
and is not available for the actual transfer of data.)
The prior example indicates that the access delay can differ by
approximately 7.67ms, depending on whether a 64Kbit/sec or a T1 access
line is used. Since egress from the network also occurs via an access
line, you can eliminate approximately 15ms of delay by using the
higher-speed access line at each location in the figure.
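The access-line arithmetic can be checked with a few lines of code. This sketch assumes the 64-byte datagram and the two line rates used in the text; the function name `serialization_delay_ms` and the constant names are illustrative.

```python
DATAGRAM_BYTES = 64          # datagram size from the worked example in the text
RATE_64K_BPS = 64_000        # 64Kbits/sec access line
RATE_T1_BPS = 1_536_000      # T1 payload rate (1.544Mbits/sec less 8Kbits/sec framing)


def serialization_delay_ms(datagram_bytes: int, line_rate_bps: int) -> float:
    """Time in milliseconds to clock one datagram onto the access line."""
    return datagram_bytes * 8 * 1000 / line_rate_bps


if __name__ == "__main__":
    slow = serialization_delay_ms(DATAGRAM_BYTES, RATE_64K_BPS)
    fast = serialization_delay_ms(DATAGRAM_BYTES, RATE_T1_BPS)
    per_line = slow - fast
    print(f"64Kbits/sec: {slow:.3f} ms, T1: {fast:.3f} ms")
    print(f"saved per access line: {per_line:.2f} ms, both ends: {2 * per_line:.2f} ms")
```

The same function also shows the cost of choosing a higher-rate coder: a larger datagram takes proportionally longer to place on a slow access line.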
INTERPROCESS AT DESTINATION DELAY
As datagrams flow from the IP network toward the private network
(location Y in the figure), the router at that location will more
than likely be configured with an access list. An access list
represents a series of permit-and-deny statements to which various
fields in each arriving datagram are compared. Access lists primarily
secure access to network facilities; however, they can also expedite the
flow of data based on the type of datagram being transported.
In a Cisco Systems router environment, there are two types of access
lists, referred to as “standard” and “extended.” A standard access
list checks the source address in a datagram. In comparison, an
extended access list can check the source and/or destination address,
protocol, upper-layer port number, and other information within each
datagram. Many security-conscious organizations program sophisticated
extended access lists, with anti-spoofing statements commonly placed at
the top.
Anti-spoofing statements check the source address of each datagram
against RFC 1918 addresses, as well as the loopback address and the
target network address. Because these addresses should not appear in
a datagram arriving at a network, datagrams with such addresses in
the source address field get tossed into the great bit bucket in the
sky.
While these anti-spoofing statements are a necessity in today's
operating environment, they are not without cost. That cost is in the time
delay required to buffer an arriving datagram and check its field values
against the statements in the access list until a match occurs. When that
happens, the access list either tosses the datagram or permits it to flow
through the router. Because datagrams are compared sequentially against
each statement in an access list (until either a matching condition occurs
or the end of the list is reached), a comprehensive list can introduce
another delay, especially if the router is a few years old.
While you could replace an old router with a newer model based on a
faster processor, there is a far more attractive solution for
minimizing latency. That solution is to move applicable permit
statements, which permit datagrams transporting digitized voice to
the gateway, to the top of the access list.
At the very worst, a datagram with a spoofed address and a viral
payload will only have its contents treated as a piece of digitized
voice, and the party to a conversation may hear an unexpected “burp,” or
some other awkward sound. Thus, moving permissions to the gateway to the
top of an access list can probably shave a few milliseconds off the
interprocess delay at the destination without adversely affecting security.
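The effect of statement order can be seen by simulating first-match evaluation. This is a simplified model, not router code: the `Statement` type, the `evaluate` function, and the gateway address (taken from a documentation address range) are all hypothetical, and the anti-spoofing list here covers only the RFC 1918 and loopback ranges.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple, Optional, Tuple


class Statement(NamedTuple):
    action: str                     # "permit" or "deny"
    source: str                     # source network in CIDR notation
    destination: str = "0.0.0.0/0"  # destination network; default matches any
    protocol: Optional[str] = None  # None matches any protocol


def evaluate(acl: List[Statement], src: str, dst: str, protocol: str) -> Tuple[str, int]:
    """Return (action, statements_checked) using sequential first-match rules."""
    for checked, stmt in enumerate(acl, start=1):
        if (
            ip_address(src) in ip_network(stmt.source)
            and ip_address(dst) in ip_network(stmt.destination)
            and stmt.protocol in (None, protocol)
        ):
            return stmt.action, checked
    # Nothing matched: the implicit deny at the end of the list applies.
    return "deny", len(acl)


GATEWAY = "198.51.100.10/32"  # hypothetical voice gateway address

ANTI_SPOOFING = [
    Statement("deny", "10.0.0.0/8"),
    Statement("deny", "172.16.0.0/12"),
    Statement("deny", "192.168.0.0/16"),
    Statement("deny", "127.0.0.0/8"),
]
VOICE_PERMIT = Statement("permit", "0.0.0.0/0", GATEWAY, "udp")

voice_last = ANTI_SPOOFING + [VOICE_PERMIT]
voice_first = [VOICE_PERMIT] + ANTI_SPOOFING

if __name__ == "__main__":
    voice = ("203.0.113.5", "198.51.100.10", "udp")
    print("voice permit last: ", evaluate(voice_last, *voice))
    print("voice permit first:", evaluate(voice_first, *voice))
```

With the permit statement first, a voice datagram is matched after one comparison instead of five. The model also makes the trade-off visible: a datagram with a spoofed source address that is bound for the gateway is now permitted on the first statement rather than dropped.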
JITTER BUFFER
The jitter buffer is a temporary storage area built into the receiver of
each gateway. It provides a mechanism to remove the random delays between
datagrams, which occur as they are routed through a network. Most gateways
provide a configuration option, which permits the administrator to set the
size of the jitter buffer to store between 0ms (disabled) and 255ms of
voice-transporting datagrams.
In actuality, the IP and UDP headers are stripped from each datagram
prior to their storage in the jitter buffer. However, the RTP header
is removed from the remainder of the datagram only as the actual data is
extracted. This is because the RTP header contains timing information for
each voice sample. This enables the sample to be extracted from the jitter
buffer at the appropriate time to reconstruct the timing relationship
between voice samples.
Although the permissible jitter buffer range of settings is between
0ms and 255ms, it is typically set between 10ms and 20ms. While a
higher setting normally improves the quality of reconstructed voice,
a jitter buffer set too high may push the total one-way delay past
150ms.
DECOMPRESSION
Although the delay associated with different voice compression
algorithms can differ considerably, the time required for
decompression is relatively uniform, regardless of the compression
algorithm used. Thus, changing a voice-coding method usually has a
minimal effect on the decompression delay.
NETWORK TRANSMISSION DELAY
I purposely deferred a discussion of network transmission delay until this
point. While most of the delay components listed in the table are directly
controllable by the user, network transmission delay may not be
controllable.
Network transmission delay represents the one-way delay through the
IP network, as shown in the figure. If the IP network is the
Internet, a large number of variables can affect the flow of
datagrams and may not be controllable. Those variables include
traffic arriving at each router in the path the voice-transporting
datagrams must traverse, the processing power of each router, the
bandwidth of the circuits connecting routers, and the number of
routers between the ingress and egress points on the network.
Depending on your ISP, it might be possible to obtain a Service Level
Agreement (SLA) that will guarantee end-to-end latency through the
network. Whether or not an SLA is offered, you should consider using both
the Ping and Traceroute utility programs prior to implementing a VoIP
application on the Internet.
Ping will provide the round-trip delay time that, when divided by
two, gives an approximation of the one-way delay if you ping the
router at the destination LAN. If the one-way delay appears excessive with
respect to the total permissible delay, consider using Traceroute.
In addition to tracing the route to the destination, this TCP/IP
application will indicate the delay at each hop on the path to the
destination. By careful examination of the route to the destination
LAN, you may be able to note one or more potentially overloaded
routers that are contributing more than their share of delay. With
one or more calls to the ISP, it might be possible to obtain an
alternate route through its network. At the very least, enough
complaints might get your ISP to replace an aged router or add
bandwidth to its network.
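Once you have a Traceroute listing, picking out the hop that adds the most delay is a matter of subtraction. The sketch below assumes you have already copied the per-hop round-trip times into a list; the function names `hop_increments` and `worst_hop` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def hop_increments(cumulative_rtt_ms: Sequence[float]) -> List[float]:
    """Delay added at each hop, given the round-trip time reported per hop."""
    increments = []
    previous = 0.0
    for rtt in cumulative_rtt_ms:
        increments.append(rtt - previous)
        previous = rtt
    return increments


def worst_hop(cumulative_rtt_ms: Sequence[float]) -> Tuple[int, float]:
    """Return (hop_number, added_ms) for the hop contributing the most delay."""
    increments = hop_increments(cumulative_rtt_ms)
    index = max(range(len(increments)), key=increments.__getitem__)
    return index + 1, increments[index]


if __name__ == "__main__":
    sample_rtts = [2, 5, 48, 52, 55]  # hypothetical per-hop round-trip times in ms
    print(hop_increments(sample_rtts))
    print(worst_hop(sample_rtts))
```

Treat a single reading with caution. Some routers answer probe packets at low priority, so a hop can appear slow in the listing without actually delaying the traffic that passes through it; repeated measurements give a better picture.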
When using Ping and Traceroute, it is important to run each
periodically through the day, over a sufficient number of days, to
ensure that the readings reflect operational reality. It is not
advisable to take measurements during Christmas week or other holiday
weeks, when network activity is not at a normal level. In
addition, the initial time the application is used may produce a
distorted delay value. This is because routing on the Internet occurs via
destination IP addressing. If you enter a host name that was not
previously resolved into an IP address, it must be resolved by the DNS,
adding some time to the first round-trip delay computation.
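A small helper can turn a set of Ping readings into a one-way estimate. This is a sketch under assumptions: the readings are entered by hand as round-trip times in milliseconds, the first reading is discarded because of the name-resolution effect described above, and using the median of the rest is my choice for this example rather than anything Ping itself reports. The function name `estimate_one_way_ms` is hypothetical.

```python
from statistics import median
from typing import Sequence


def estimate_one_way_ms(rtt_samples_ms: Sequence[float], skip_first: bool = True) -> float:
    """Approximate one-way delay as half the median round-trip time.

    When skip_first is True, the first sample is dropped because it may
    include the time needed to resolve a host name.
    """
    samples = list(rtt_samples_ms)
    if skip_first and len(samples) > 1:
        samples = samples[1:]
    if not samples:
        raise ValueError("at least one round-trip sample is required")
    return median(samples) / 2


if __name__ == "__main__":
    readings = [180, 96, 100, 104]  # hypothetical round-trip times in ms
    print(estimate_one_way_ms(readings))
    print(estimate_one_way_ms(readings, skip_first=False))
```

Halving the round-trip time assumes the path is roughly symmetric, which is only an approximation on the Internet.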
DO THE MATH
While the previously described delay components are the primary
contributors to source-to-destination datagram latency, experience
teaches other tricks and techniques that can shave additional
milliseconds off the total delay time.
For example, if the routers are edge devices connected to the IP
network, you may wish to consider employing static routing to avoid
unnecessary table updates. If you were previously using RIP, this
action would eliminate RIP table updates that normally occur every 30
seconds, and which suspend data transfer for the duration of the table
update.
By carefully examining the various contributors to latency, you can
determine ahead of time if VoIP will work at an acceptable level.
Similar to the Boy Scouts' motto, it's most important to be prepared.
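One way to do the math is to keep a delay budget and total it before committing to a design. The sketch below is an illustration only: the entries marked "text" reuse figures from the worked examples in this article, while the entries marked "placeholder" are invented values that you would replace with your own measurements. The names `budget_ms`, `total_delay_ms`, and `headroom_ms` are assumptions.

```python
from typing import Dict

CAUTION_MS = 150.0  # one-way caution line discussed earlier

budget_ms: Dict[str, float] = {
    "voice coding": 30.0,                 # text: G.729 with a 20 ms frame
    "interprocess at origin": 10.0,       # placeholder
    "network access": 8.0,                # text: 64-byte datagram at 64Kbits/sec
    "network transmission": 60.0,         # placeholder, e.g. half a measured ping time
    "network egress": 8.0,                # text: 64-byte datagram at 64Kbits/sec
    "interprocess at destination": 10.0,  # placeholder
    "jitter buffer": 20.0,                # text: upper end of the typical setting
    "decompression": 10.0,                # placeholder
}


def total_delay_ms(budget: Dict[str, float]) -> float:
    """Sum of all one-way delay components."""
    return sum(budget.values())


def headroom_ms(budget: Dict[str, float], limit_ms: float = CAUTION_MS) -> float:
    """Milliseconds left before the limit is reached (negative if exceeded)."""
    return limit_ms - total_delay_ms(budget)


if __name__ == "__main__":
    print("total:", total_delay_ms(budget_ms), "headroom:", headroom_ms(budget_ms))
    # What-if: switch the coder to G.728 (22.5 ms total coding delay in the text).
    # A fuller model would also grow the access and egress entries, since
    # G.728 produces larger datagrams.
    with_g728 = {**budget_ms, "voice coding": 22.5}
    print("total:", total_delay_ms(with_g728), "headroom:", headroom_ms(with_g728))
```

With these sample figures the budget misses the caution line by a few milliseconds, and a single change of coder brings it back under, which is the kind of result the shave-a-few-milliseconds approach is meant to produce.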
Gilbert Held is an award-winning author and lecturer. He has written
over 300 technical articles and 40 books, including Cisco Access List
Field Guide and Cisco Router Performance Field Guide, both published by
McGraw Hill. He can be reached at gil_held@yahoo.com.
