Emerging Technology: Reducing Voice over IP Latency

In the past few years, Voice over IP (VoIP) has risen from obscurity to one of the more popular topics at computer shows, communications conferences, and in many networking publications. In addition, if you read popular financial publications, you are probably aware of several communications carriers that have created an IP infrastructure that transports both voice and data. These communications carriers are issuing stocks and bonds to fund this new infrastructure.

The increasing popularity of VoIP does not mean it's easy to implement. In fact, it may be just the opposite, especially if implementers don't do their homework ahead of time. Some recently introduced VoIP products may work very well when paired with equipment from the same vendor, yet these same products may work neither well nor at all when used with a product from another vendor.

Even if you purchase products from the same vendor, there is no guarantee that VoIP will work well or correctly. While compatibility problems may be resolved as product manufacturers comply with new and evolving standards, the fact that VoIP is latency driven means you must carefully examine your network infrastructure to determine if VoIP can work satisfactorily.

This article examines how data flows through a network, and looks at each location where the data flow can be delayed. It also offers different options that can reduce network delay so that you can maximize a VoIP implementation.

THE CONVERSATION KILLER

You can view a VoIP application in terms of network delay because real-time voice conversations are delay sensitive. Once the one-way delay exceeds a quarter of a second, or 250 milliseconds (ms), it becomes relatively difficult for the parties in a conversation to tell when one person is finished speaking. This increases the probability that the parties will talk at the same time. One way to alleviate this situation is to revert to a Citizens Band (CB) mode of conversation, using the term “over” to inform the other party that it is his or her turn to speak. While the use of CB was quite popular during the 1970s, it's doubtful that an enterprise would be willing to invest in a VoIP application to obtain CB-style conversation today.

The ITU-T's G.114 recommendation treats a one-way delay of up to 150ms as acceptable for telephone traffic, which corresponds to a round-trip delay of 300ms. While a maximum one-way delay of 150ms may be somewhat restrictive, a delay of over 250ms will more than likely be unacceptable. Thus, think of a one-way delay of 150ms as equivalent to a yellow caution line; a delay of 250ms would represent the red alarm indicator for a VoIP application.
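The caution and alarm lines can be expressed as a small check. This is only a sketch: the function name and the "green" label for delays at or below 150ms are assumptions added for illustration, while the two thresholds come from the discussion above.

```python
# Thresholds from the text: 150 ms is the caution line, 250 ms the alarm line.
CAUTION_MS = 150.0
ALARM_MS = 250.0


def classify_one_way_delay(delay_ms: float) -> str:
    """Return 'green', 'yellow', or 'red' for a one-way delay in milliseconds.

    The 'green' label and the treatment of the exact boundary values are
    illustrative assumptions, not part of any standard.
    """
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms <= CAUTION_MS:
        return "green"
    if delay_ms <= ALARM_MS:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for delay in (80.5, 200.0, 314.0):
        print(delay, classify_one_way_delay(delay))
```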

Now that you have an appreciation for the one-way-delay range that a VoIP application can tolerate, I'll turn to the data flow associated with implementing the technology.

DATA FLOW

A common VoIP application runs over an IP network, whether a corporate intranet or the Internet. The figure illustrates a simple VoIP implementation.


In the example, a voice call is routed from the PBX at location X (via the gateway, LAN, and router at that location) through the IP network to a telephone connected to the PBX at location Y. There are several areas where datagrams transporting voice could be delayed.

As an analog voice conversation is routed through the PBX to the voice gateway, the voice-coding algorithm used by the gateway adds a degree of latency. The actual amount of delay is based on the type of voice coder used. Once a small sample of voice is coded, it must be encapsulated within a datagram for transmission to a distant gateway. The encapsulation process includes adding applicable UDP and IP headers to form the datagram as well as the flow of the datagram from the gateway to the router via the LAN.

The total delay from those activities represents an interprocess time at the origin. Note in the figure that when a datagram arrives at the destination gateway, a reverse process to the one previously described occurs. Therefore, a datagram will also encounter an interprocess delay at the destination.

Once the datagram reaches the router at location X in the figure, the flow of the datagram into the cloud (representing an IP network) will not occur instantaneously. Instead, a delay occurs based on the length of the datagram and the operating rate of the access line.

Once the datagram reaches the IP network, it will be routed through one or more routers to a network egress point. This routing also adds variable delay. The causes for the variable delay include the number of routers in the path from the point of entry to the point of exit, the processing power of each router, and the traffic load offered to each router. These delays occur as the voice-transporting datagram flows through the local network and contribute to the delay encountered by the datagram as it flows through the wide-area IP network.

The table summarizes the previously described delays and includes a general indication of the range of delays that can be attributed to each of the factors described. Note that delays can typically vary between a low of 80.5ms and a high of 314ms.

While a delay of 80.5ms is acceptable, any delay exceeding 150ms may not be conducive to the intelligibility of a conversation, especially if the one-way delay time expands toward the upper range of 314ms.

In examining the entries in the table, note that while the network access at the origin and egress are shown with the same range of values, this is not the case for the interprocess delays at the origin and destination. Similarly, compression and decompression delays are not symmetrical. Rather than discuss the reasons for this here, I'll examine each delay category and discuss potential adjustments that can reduce delay. By saving several milliseconds at one juncture and then several more at another, it may become possible to enable a VoIP application to operate acceptably.

VOICE CODING

In a VoIP environment, most gateways are configured to digitize voice using a hybrid coding technique. A hybrid coder combines waveform coding and voice coding.

Under a hybrid coding technique, the voice waveform is sampled and speech parameters are extracted. However, instead of directly encoding pitch, inflection, and other speech parameters, those parameters are used to synthesize the segment of the voice sample from which they were extracted. The synthesized period of voice, typically 20ms of speech, is then compared to the original sample. If the two are within a predefined interval, no adjustments are made to the speech parameters. If the actual and synthesized samples differ by more than a predefined amount, the speech parameters are adjusted to obtain a “better fit.”

The end result of this feedback mechanism is an analysis by synthesis technique: This attempts to adjust extracted speech parameters to provide a synthesis capability that will closely resemble the original waveform. Once the extracted speech parameters' values are finalized, the coder will attempt to match parameters against previously “learned” parameters placed in a codebook. If a match occurs, the position of the parameter in the codebook is used instead of its value, further reducing the quantity of data that requires transmission.
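The codebook step can be illustrated with a toy lookup. This is a deliberately simplified sketch, not how any standardized CELP coder is implemented: the two-element parameter vectors, the codebook contents, and the squared-error measure are all assumptions chosen for illustration. The point is only that the sender transmits a small index rather than the parameter values themselves.

```python
from typing import Sequence


def nearest_codebook_index(
    codebook: Sequence[Sequence[float]], params: Sequence[float]
) -> int:
    """Return the position of the codebook entry closest to params.

    Closeness is measured by squared error, an assumption for this sketch.
    """
    def squared_error(entry: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry, params))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Hypothetical codebook of previously "learned" parameter vectors.
CODEBOOK = [(0.0, 0.0), (1.0, 0.5), (0.2, 0.9), (0.8, 0.1)]

if __name__ == "__main__":
    extracted = (0.75, 0.2)  # hypothetical extracted speech parameters
    index = nearest_codebook_index(CODEBOOK, extracted)
    # With four entries, the index fits in 2 bits instead of two full values.
    print("transmit index", index)
```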

This hybrid coding technique is commonly used by a family of voice coders referred to as Code Excited Linear Predictor coders (CELP). In general, the data rate of different members of the CELP family is inversely proportional to their algorithmic delay. That is, the higher the voice coding rate, the lower the delay associated with coding a sample of speech will be.

Three popular voice coders used with many VoIP gateways include the G.728, G.729, and G.723.1 coders. G.728 is a low-delay version of CELP. The algorithm delay of a G.728 coder is approximately 2.5ms; however, the resulting digitized voice data rate is 16Kbits/sec. The G.729 voice coder operates at 8Kbits/sec and has a 10ms delay. The G.723.1 voice coder represents a multirate coder, as it operates at either 5.3Kbits/sec or 6.3Kbits/sec. For either data rate, the algorithm introduces a coding delay of approximately 30ms. Because each voice coder operates on a 20ms segment of speech, the total delays are approximately 22.5ms, 30ms, and 50ms, respectively.

A technique worth considering to reduce the one-way latency is to change the voice coding method. For example, changing the voice coder from G.723.1 to G.728 will reduce one-way delay by 27.5ms. Most gateway products support between six and ten types of voice coders. Thus, by carefully considering the voice coder to use, you can significantly reduce one-way latency.
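The coder arithmetic above is easy to reproduce. The sketch below uses only the figures quoted in this section (a 20ms speech segment plus each coder's algorithmic delay); the function names are illustrative.

```python
FRAME_MS = 20.0  # speech segment each coder operates on, per the text

# Algorithmic delay in milliseconds, as quoted in this section.
ALGORITHMIC_DELAY_MS = {
    "G.728": 2.5,
    "G.729": 10.0,
    "G.723.1": 30.0,
}


def total_coding_delay_ms(coder: str) -> float:
    """Segment time plus algorithmic delay for the named coder."""
    return FRAME_MS + ALGORITHMIC_DELAY_MS[coder]


def savings_ms(old_coder: str, new_coder: str) -> float:
    """One-way delay saved by switching from old_coder to new_coder."""
    return total_coding_delay_ms(old_coder) - total_coding_delay_ms(new_coder)


if __name__ == "__main__":
    for name in ALGORITHMIC_DELAY_MS:
        print(name, total_coding_delay_ms(name), "ms")
    print("G.723.1 -> G.728 saves", savings_ms("G.723.1", "G.728"), "ms")
```

Keep in mind that the lower-delay coders in this list run at higher bit rates, so the saving in coding delay is paid for in bandwidth.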

While it is relatively easy to obtain information about voice coding latency for standardized coders, the same may not be true for proprietary coders. My own attempts to determine the latency of some proprietary “enhanced” CELP coders proved something of a scavenger hunt, requiring a series of telephone calls and a dose of perseverance, since such information is typically not included in vendor specification sheets.

INTERPROCESS AT ORIGIN DELAY

As previously (and briefly) discussed, the interprocess delay at the origin has several components. Those components include the creation of a datagram containing a period of digitized voice, the placement of the datagram onto the LAN, and its extraction from the network by the router. Although the interprocess delay time at the origin is not extremely variable, several techniques can shave off a few milliseconds of delay.

First, if your LAN utilization level is high, your LAN is probably experiencing a high level of collisions that delay the flow of frames across that network. In this situation, you should consider either upgrading or segmenting the LAN; also, you might consider bypassing that network. Concerning the latter, instead of using a gateway that requires a separate LAN connection, you could consider adding voice modules to the router. This connects PBX lines directly to the router so that datagrams don't have to traverse the LAN. While this change in the local infrastructure may only save a few milliseconds, collectively shaving off a few here and a few via other techniques may be necessary for VoIP to work at your level of expectation.

NETWORK ACCESS AND EGRESS

The delay associated with transmitting a datagram into the IP network and receiving it from that network is highly dependent on the operating rate of the access line at each location. However, the delays are also dependent on the voice coding method used. For example, assume the voice coding method selected produces an 8Kbit/sec digital data stream. Then, each 20ms sample results in 8000bits/sec × 20ms, or 160 bits, that must be encapsulated into an IP datagram.

As a refresher for those who may be a bit rusty on the relationship of headers within a VoIP environment, the Real-Time Transport Protocol (RTP) header is commonly used to prefix each digitized voice sample. RTP contains timing information that makes it possible to place received voice samples into a jitter buffer at the destination and extract each sample to remove timing variations that occur as datagrams flow through a network.

The RTP header adds 16 bytes, which is prefixed by UDP's additional 8 bytes of header information. Finally, the IP header is prefixed to form the datagram, adding another 20 bytes of header information. Thus, in this example, the 160 bits, or 20 bytes, of digitized voice is transported as a 64-byte datagram.

If the access line to the IP network operates at 64Kbits/sec, then the delay associated with placing a datagram containing 20ms of speech encoded at 8000bits/sec onto the line is (64 bytes × 8 bits/byte) / 64Kbits/sec, or 8ms. At a T1 operating rate, the delay associated with the access line is (64 bytes × 8 bits/byte) / 1.536Mbits/sec, or approximately 0.33ms. (I listed a T1 rate of 1.536Mbits/sec instead of the line operating rate of 1.544Mbits/sec because 8Kbits/sec is used for framing and is not available for the actual transfer of data.)

From a latency perspective, the prior example indicates that the delay can vary by approximately 7.67ms, depending on whether a 64Kbit/sec or T1 access line is used. Since egress from the network also occurs via an access line, you can eliminate approximately 15ms of delay by using the higher-speed access line at each location in the figure.
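The access-line arithmetic can be checked with a few lines of code. The sketch below uses the header sizes given in this section; note that the 16-byte RTP figure is the article's, whereas the fixed RTP header defined in RFC 3550 is 12 bytes, so the constant is worth adjusting for your own gateway. Function names are illustrative.

```python
RTP_HEADER_BYTES = 16  # figure used in this article; RFC 3550's fixed header is 12
UDP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20


def payload_bytes(coder_bps: int, frame_ms: int) -> int:
    """Bytes of digitized voice produced in one frame."""
    return coder_bps * frame_ms // 8000  # bits/sec * ms / 1000 / 8


def datagram_bytes(coder_bps: int, frame_ms: int) -> int:
    """Voice payload plus RTP, UDP, and IP headers."""
    headers = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IP_HEADER_BYTES
    return payload_bytes(coder_bps, frame_ms) + headers


def serialization_delay_ms(size_bytes: int, line_bps: int) -> float:
    """Time to clock size_bytes onto a line running at line_bps."""
    return size_bytes * 8 * 1000 / line_bps


if __name__ == "__main__":
    size = datagram_bytes(coder_bps=8000, frame_ms=20)
    slow = serialization_delay_ms(size, 64_000)
    fast = serialization_delay_ms(size, 1_536_000)
    print(f"{size} bytes: {slow:.2f} ms at 64Kbits/sec, {fast:.2f} ms at T1")
    print(f"saving per access line: {slow - fast:.2f} ms")
```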

INTERPROCESS AT DESTINATION DELAY

As datagrams flow from the IP network toward the private network (location Y in the figure), the router at that location will more than likely be configured with an access list. An access list represents a series of permit-and-deny statements to which various fields in each arriving datagram are compared. Access lists primarily secure access to network facilities; however, they can also expedite the flow of data based on the type of datagram being transported.

In a Cisco Systems router environment, there are two types of access lists, referred to as “standard” and “extended.” A standard access list checks the source address in a datagram. In comparison, an extended access list can check the source and/or destination address, protocol, upper-layer port number, and other information within each datagram. Many security-conscious organizations program sophisticated extended access lists, with anti-spoofing statements commonly placed at the top.

Anti-spoofing statements check the source address of each datagram against RFC 1918 addresses, as well as the loopback address and the target network address. Because these addresses should not appear in a datagram arriving at a network, datagrams with such addresses in the source address field get tossed into the great bit bucket in the sky.

While these anti-spoofing statements are a necessity in today's operating environment, they are not without cost. That cost is in the time delay required to buffer an arriving datagram and check its field values against the statements in the access list until a match occurs. When that happens, the access list either tosses the datagram or permits it to flow through the router. Because datagrams are compared sequentially against each statement in an access list, until either a matching condition occurs or the end of the list is reached, a comprehensive list can introduce another delay, especially if the router is a few years old.

While you could replace an old router with a newer model based on a faster processor, there is a far more attractive solution for minimizing latency. That solution is to move applicable permit statements, which permit datagrams transporting digitized voice to the gateway, to the top of the access list.

At the very worst, a datagram with a spoofed address and a viral payload will only have its contents treated as a piece of digitized voice, and the party to a conversation may hear an unexpected “burp,” or some other awkward sound. Thus, moving permissions to the gateway to the top of an access list can probably shave a few milliseconds off the interprocess delay at the destination without adversely affecting security.
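The effect of statement order can be modeled with a small first-match simulation. This is a sketch in Python rather than router configuration: the `Rule` and `Datagram` types, the gateway address (taken from the 192.0.2.0/24 documentation range), and the count of comparisons as a stand-in for processing time are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    action: str                     # "permit" or "deny"
    src: str = "0.0.0.0/0"          # source network to match
    dst: str = "0.0.0.0/0"          # destination network to match
    protocol: Optional[str] = None  # None matches any protocol


@dataclass(frozen=True)
class Datagram:
    src: str
    dst: str
    protocol: str


def matches(rule: Rule, d: Datagram) -> bool:
    return (
        ip_address(d.src) in ip_network(rule.src)
        and ip_address(d.dst) in ip_network(rule.dst)
        and rule.protocol in (None, d.protocol)
    )


def evaluate(acl: List[Rule], d: Datagram) -> Tuple[str, int]:
    """Return (action, number of statements compared) using first-match."""
    for count, rule in enumerate(acl, start=1):
        if matches(rule, d):
            return rule.action, count
    return "deny", len(acl)  # nothing matched: implicit deny at end of list


GATEWAY = "192.0.2.10/32"  # hypothetical voice gateway address
ANTI_SPOOFING = [
    Rule("deny", src="10.0.0.0/8"),
    Rule("deny", src="172.16.0.0/12"),
    Rule("deny", src="192.168.0.0/16"),
    Rule("deny", src="127.0.0.0/8"),
]
VOICE_PERMIT = Rule("permit", dst=GATEWAY, protocol="udp")

if __name__ == "__main__":
    voice = Datagram(src="198.51.100.7", dst="192.0.2.10", protocol="udp")
    print("permit last: ", evaluate(ANTI_SPOOFING + [VOICE_PERMIT], voice))
    print("permit first:", evaluate([VOICE_PERMIT] + ANTI_SPOOFING, voice))
```

In this model the voice datagram is permitted after five comparisons when the permit statement sits at the bottom of the list, and after one when it sits at the top. A production list is usually far longer, so the gap widens accordingly.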

JITTER BUFFER

The jitter buffer is a temporary storage area built into the receiver of each gateway. It provides a mechanism to remove the random delays between datagrams, which occur as they are routed through a network. Most gateways provide a configuration option, which permits the administrator to set the size of the jitter buffer to store between 0ms (disabled) and 255ms of voice-transporting datagrams.

In actuality, the IP and UDP headers are stripped from each datagram prior to their storage in the jitter buffer. However, the RTP header is removed from the remainder of the datagram only as the actual data is extracted. This is because the RTP header contains timing information for each voice sample. This enables the sample to be extracted from the jitter buffer at the appropriate time to reconstruct the timing relationship between voice samples.

Although the permissible jitter buffer range of settings is between 0ms and 255ms, it is typically set between 10ms and 20ms. While a higher setting normally improves the quality of reconstructed voice, a jitter buffer set too high may cause datagrams to exceed 150ms of delay.
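The trade-off between buffer size and late datagrams can be seen in a simple playout model. This is a sketch under stated assumptions: playout of the first sample is scheduled one buffer interval after it arrives, later samples are played at their RTP-timestamp spacing, and any datagram arriving after its playout time counts as late. The arrival times are hypothetical, and real gateways may use adaptive schemes instead.

```python
from typing import List, Sequence, Tuple


def late_samples(
    arrivals: Sequence[Tuple[int, int]], buffer_ms: int
) -> List[int]:
    """Return RTP times (ms) of samples that arrive after their playout time.

    arrivals holds (rtp_time_ms, arrival_time_ms) pairs in sending order.
    """
    first_rtp, first_arrival = arrivals[0]
    late = []
    for rtp_time, arrival in arrivals:
        # Playout keeps the original spacing between samples.
        playout = first_arrival + buffer_ms + (rtp_time - first_rtp)
        if arrival > playout:
            late.append(rtp_time)
    return late


# Hypothetical trace: 20 ms samples with network delays of 40, 45, 62, 41, 55 ms.
TRACE = [(0, 40), (20, 65), (40, 102), (60, 101), (80, 135)]

if __name__ == "__main__":
    for size in (10, 20, 25):
        print(f"{size} ms buffer, late samples at:", late_samples(TRACE, size))
```

With this trace, a 10ms buffer leaves two samples late, a 20ms buffer leaves one, and a 25ms buffer leaves none. Every millisecond of buffer, however, is added directly to the one-way delay.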

DECOMPRESSION

Although the delay associated with different voice compression algorithms can differ considerably, the time required for decompression is relatively uniform, regardless of the compression algorithm used. Thus, changing a voice-coding method usually has a minimal effect on the decompression delay.

NETWORK TRANSMISSION DELAY

I purposely deferred a discussion of network transmission delay until this point. While most of the delay components listed in the table are directly controllable by the user, network transmission delay may not be controllable.

Network transmission delay represents the one-way delay through the IP network, as shown in the figure. If the IP network is the Internet, a large number of variables can affect the flow of datagrams and may not be controllable. Those variables include traffic arriving at each router in the path the voice-transporting datagrams must traverse, the processing power of each router, the bandwidth of the circuits connecting routers, and the number of routers between the ingress and egress points on the network. Depending on your ISP, it might be possible to obtain a Service Level Agreement (SLA) that will guarantee end-to-end latency through the network. Whether or not an SLA is offered, you should consider using both the Ping and Traceroute utility programs prior to implementing a VoIP application on the Internet.

Ping will provide the round-trip delay time that, when divided by two, gives an approximation of the one-way delay if you ping the router at the destination LAN. If the one-way delay appears excessive with respect to the total permissible delay, consider using Traceroute.

In addition to tracing the route to the destination, this TCP/IP application will indicate the delay at each hop on the path to the destination. By careful examination of the route to the destination LAN, you may be able to note one or more potentially overloaded routers that are contributing more than their share of delay. With one or more calls to the ISP, it might be possible to obtain an alternate route through its network. At the very least, enough complaints might get your ISP to replace an aged router or add bandwidth to its network.
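Spotting the hop that contributes more than its share can be automated once you have copied the per-hop times out of Traceroute. The sketch below assumes you supply the round-trip time to each hop in path order; the sample values are hypothetical, and no attempt is made to run or parse the utility itself.

```python
from typing import List, Sequence, Tuple


def hop_increments(hop_rtts_ms: Sequence[float]) -> List[float]:
    """Increase in round-trip time attributable to each hop, in path order."""
    increments = []
    previous = 0.0
    for rtt in hop_rtts_ms:
        # Readings can dip below the previous hop due to measurement noise,
        # so negative differences are clamped to zero.
        increments.append(max(rtt - previous, 0.0))
        previous = rtt
    return increments


def worst_hop(hop_rtts_ms: Sequence[float]) -> Tuple[int, float]:
    """Return (1-based hop number, added delay in ms) for the largest jump."""
    increments = hop_increments(hop_rtts_ms)
    index = max(range(len(increments)), key=increments.__getitem__)
    return index + 1, increments[index]


if __name__ == "__main__":
    readings = [2.0, 9.0, 14.0, 71.0, 78.0]  # hypothetical per-hop RTTs in ms
    print("increments:", hop_increments(readings))
    print("largest jump at hop", worst_hop(readings))
```

A single large jump is a hint rather than proof, since some routers answer probes slowly while forwarding traffic at full speed. Repeated readings that show the same jump make a stronger case to take to the ISP.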

When using Ping and Traceroute, it is important to try each periodically through the day, over a sufficient number of days, to ensure that the readings reflect operational reality. It is not advisable to use Ping during Christmas week, nor over other holiday weeks, when network activity does not operate at a normal level. In addition, the initial time the application is used may produce a distorted delay value. This is because routing on the Internet occurs via destination IP addressing. If you enter a host name that was not previously resolved into an IP address, it must be resolved by the DNS, adding some time to the first round-trip delay computation.
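One way to turn a series of Ping readings into a one-way estimate is sketched below. Discarding the first reading follows the caution above; using the median of the remaining readings (rather than the mean) is an assumption chosen to limit the influence of an occasional outlier. The sample values are hypothetical.

```python
from statistics import median
from typing import Sequence


def estimate_one_way_ms(
    rtt_samples_ms: Sequence[float], skip_first: bool = True
) -> float:
    """Approximate one-way delay as half the median round-trip time."""
    samples = list(rtt_samples_ms[1:] if skip_first else rtt_samples_ms)
    if not samples:
        raise ValueError("need at least one usable round-trip sample")
    return median(samples) / 2


if __name__ == "__main__":
    readings = [180.0, 92.0, 96.0, 90.0, 94.0]  # hypothetical RTTs in ms
    print("one-way estimate:", estimate_one_way_ms(readings), "ms")
```

Halving the round-trip time assumes the forward and return paths are symmetrical, which is not guaranteed on the Internet, so treat the result as an approximation.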

DO THE MATH

While the previously described delay components are the primary ones that govern source-to-destination datagram latency, experience teaches other tricks and techniques that can shave additional milliseconds off the total delay time.

For example, if the routers are edge devices connected to the IP network, you may wish to consider employing static routing to avoid unnecessary table updates. If you were previously using RIP, this action would eliminate RIP table updates that normally occur every 30 seconds, and which suspend data transfer for the duration of the table update.

By carefully examining the various contributors to latency, you can determine ahead of time if VoIP will work at an acceptable level. Similar to the Boy Scouts' motto, it's most important to be prepared.

Gilbert Held is an award-winning author and lecturer. He has written over 300 technical articles and 40 books, including Cisco Access List Field Guide and Cisco Router Performance Field Guide, both published by McGraw Hill. He can be reached at gil_held@yahoo.com.
