HP ProCurve Networking

Troubleshooting LAN performance and intermittent connectivity problems

HP Networking



Introduction
Performance and intermittent connectivity problems are among the most difficult network issues to troubleshoot. Exhaustive coverage of these topics would fill a large book. This technical note takes a troubleshooting-oriented approach to isolating such problems involving Ethernet (10Mbps, 100Mbps, and 1000Mbps) hubs (repeaters), switches, and routers.

This article does not tell you in detail how to narrow the cause of the problem from the entire network down to the LAN devices. Rather, it assumes that you have some reason to suspect the LAN devices, or that you hope the LAN devices can help you isolate the problem. See "Narrowing the geographical scope of the problem" below for some general guidance in this area.

Focus on dropped packets
Network devices rarely forward packets so slowly as to cause severe performance problems. Rather, severe performance problems in LANs usually involve dropped packets, which cause end nodes to time out and retransmit those packets. Each retransmission usually results in a delay on the order of a second or more. Tens or hundreds of such delays result in a network slowdown that end users will notice and complain about.

Similarly, connections will be lost if too many keep-alive packets are dropped.

Note that while the network may be perceived as running slowly, the LAN devices are usually running at full speed. That is, the LAN devices are forwarding packets at a rapid rate. The slowdown comes either from packets being dropped because of excessive collisions, or simply from the delay caused by the high collision rates that occur naturally on a shared-media network.

One key cause of dropped packets is the design of the network topology itself. For example, if twenty 10Mbps clients all try to send data to a 10Mbps server, all connected via a switch, packets can be dropped through no fault of the network device.
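
The arithmetic behind this example can be sketched as follows. This is only an illustration, assuming the worst case in which every client transmits at full line rate at the same moment; the function name `oversubscription_ratio` is hypothetical and not part of any HP tool.

```python
def oversubscription_ratio(client_count, client_mbps, server_mbps):
    """Ratio of worst-case offered load to the capacity of the server port."""
    offered_mbps = client_count * client_mbps
    return offered_mbps / server_mbps


# Twenty 10Mbps clients all sending to one 10Mbps server port.
ratio = oversubscription_ratio(client_count=20, client_mbps=10, server_mbps=10)
print(f"Worst-case offered load is {ratio:.0f}x the server port capacity")
```

Any ratio above 1 means the switch must buffer the excess and, once its buffers fill, drop packets.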

Narrowing the geographical scope of the problem
If the performance problem includes one or more WAN links or a firewall, you should first investigate those parts of the network. WANs and firewalls are much more likely to be the source of the problem than your LANs.

When isolating the performance problem within a particular LAN, you should first try to determine whether the problem is limited to a certain portion of the LAN or a certain path through the LAN. You have probably already done this if you suspect a particular network as the cause. If you have not narrowed the range of the problem, you may be able to do so by timing data operations (for example, file transfers) across different portions of the network.

Another important tool is ping. You should be able to execute several thousand successful (that is, no timeouts) ping commands on a healthy network.
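
A long ping run like the one described above can be scripted. The sketch below is one possible approach, not an HP utility: it assumes the operating system's `ping` command is on the path, that the count flag is `-c` (or `-n` on Windows), and that an exit status of 0 means a reply was received. The helper names `ping_once` and `summarize` are hypothetical.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command; True on a reply."""
    # Assumption: the count flag is -n on Windows and -c elsewhere.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    # Assumption: exit status 0 means the echo reply arrived.
    return result.returncode == 0


def summarize(replies):
    """Return (sent, lost, loss_fraction) for a list of True/False results."""
    sent = len(replies)
    lost = sum(1 for ok in replies if not ok)
    return sent, lost, (lost / sent if sent else 0.0)
```

For example, `summarize([ping_once(host) for _ in range(5000)])`, with `host` set to a server on the suspect path, reports how many of 5,000 pings timed out. On a healthy network the lost count should be zero.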

Finding the drops
Once you suspect a particular network device or a small number of network devices, use network management or the device's Web or console interface to get its error counters (statistics). Then, look for drops.

Drops may be very clearly indicated by counters with names such as Drops Tx, Drop Rx, and Frames Dropped.

Drops may also be indicated more indirectly as a media-specific fault such as the following Ethernet errors: FCS error, CRC error, Alignment Rx, Runt Rx, Short Event, Giant Rx, Too Long Rx, Late Collision Tx, Excessive Collision Tx, Late Events, Excessive Deferrals Tx, Babble error, and Loss of Carrier.

When one of these errors occurs, a hub, switch, or routing switch will drop the packet involved. It is the responsibility of the source end node's transport layer (for example, TCP) to re-send the packet.

How many errors are too many?
Data link errors such as CRC errors, alignment errors, and runts will occur on healthy networks. How do you distinguish between a reasonable number of these errors and too many? A rule of thumb is one error in 5,000. For example, on average, for every 5,000 packets received you should have no more than one receive error (CRC, alignment, runt, short, giant, or too long). And on average, for every 5,000 packets transmitted, you should have no more than one transmit error (late collision, excessive collision, late event, excessive deferral, or loss of carrier). At higher rates of errors, users will probably perceive the network's performance as being poor.

One data link error in 5,000 does not necessarily indicate a perfectly-performing network. Rather, it indicates a network where the errors are probably not causing serious performance problems that are apparent to the users.
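
The one-in-5,000 rule of thumb is easy to apply to counters read from a device. The following is a minimal sketch; the function names and the sample counter values are made up for illustration and do not come from any real device.

```python
PACKETS_PER_ERROR = 5000  # rule of thumb from the text: one error in 5,000


def error_rate(errors, packets):
    """Fraction of packets in error; 0.0 when no packets were counted."""
    return errors / packets if packets else 0.0


def exceeds_rule_of_thumb(errors, packets, packets_per_error=PACKETS_PER_ERROR):
    """True when there is more than one error per `packets_per_error` packets."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return errors * packets_per_error > packets


# Hypothetical receive counters for one port.
rx_packets = 1_250_000
rx_errors = 400  # CRC + alignment + runt + giant, added together

print(f"receive error rate: {error_rate(rx_errors, rx_packets):.6f}")
print("too many errors" if exceeds_rule_of_thumb(rx_errors, rx_packets) else "within rule of thumb")
```

Apply the same check separately to the transmit side, using transmitted packets and the sum of the transmit errors.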

Other link-level indications of bad performance
Ethernet also has some conditions that are normal unless they happen too often. Collisions, jabbers, and fragments are good examples. It is normal to have collisions, but they should not occur in large numbers relative to the total number of transmitted packets. Large numbers of collisions, jabbers, or fragments will result in network slowdowns. Unfortunately, it is difficult to define "too often" or "large numbers."

The device's LEDs or event log may indicate link-level problems such as auto-partition or lost link. Link loss is normal during device configuration changes, so a few losses of link are acceptable. Many losses of link may indicate faulty wiring, bad NICs, bad transceivers, or an end node which has been powered off.

Non-Ethernet links will have their own types of errors. The Fault Finder capability in Hewlett-Packard ProCurve devices may already be reporting one of these errors through the devices' Web interface or event log.

Network device buffer problems
Buffer problems are typically the result of a network topology which is not suited to the traffic patterns on the network. For example, using a 10Mbps backbone to interconnect switches will frequently cause congestion (and buffer problems) on all but the smallest networks. To resolve this problem, switch-to-switch and switch-to-server connections should be faster (for example, 100Mbps) than the connections to the clients.

A LAN device may indicate a drop through a report of a system-related problem, such as Packet Buffer Misses, Message Buffer Misses, Buffer error, or Lack Of Resource error.

These typically represent a dropped packet.

One or two occasional drops will not result in a noticeable performance problem or a failed connection. But if you find drops occurring more often than once per minute on a particular link or cable, you may have isolated the location of the problem.
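
To compare a port against the once-per-minute guideline, read the drop counter twice and divide the difference by the elapsed time. The sketch below assumes the counter only increases and does not wrap or reset between the two readings; the function name and sample values are hypothetical.

```python
def drops_per_minute(first_count, second_count, seconds_between):
    """Average drop rate per minute between two readings of a drop counter."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    # Assumption: the counter did not wrap or reset between the readings.
    return (second_count - first_count) * 60 / seconds_between


# Hypothetical readings of one port's drop counter, taken ten minutes apart.
rate = drops_per_minute(first_count=100, second_count=130, seconds_between=600)
print(f"{rate:.1f} drops per minute")
if rate > 1:
    print("More than one drop per minute: this link deserves a closer look")
```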

Eliminating the dropped packets
Once you have found the location of the dropped packets, you have isolated the problem and are halfway to resolving it.

This article does not cover finding the root cause and solution in detail. Generally speaking, your next step is to fix the cause of the dropped packets. This can involve the network design, faulty cables, faulty transceivers, faulty NICs (network interface cards), or configuration problems such as full/half duplex mismatches. For example, the following configuration will cause severe network problems:

10/100 hub, switch, router, etc., with port configured for auto-negotiation
End node configured for 100Mbps/Full-Duplex or 10Mbps/Full-Duplex

The hub, switch, or router will correctly sense (not auto-negotiate) the 10Mbps or 100Mbps speed. Since the end node was configured for a specific speed and duplex state, and therefore does not negotiate, the hub, switch, or router will choose the communication mode specified by the 802.3u standard, namely half-duplex.

With one device running at half-duplex and the device on the other end of the connection at full-duplex, the connection will work reasonably well at low levels of traffic. At high levels of traffic, the full-duplex device (the end node, in this case) will experience an abnormally high level of CRC or alignment errors. End users usually describe this situation as, "Performance seems to be approximately 1 Mbps!" Often, end nodes will drop connections to their servers.
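
The mismatch described above can be modeled in a few lines. This is a simplified sketch of the behavior as this article describes it, not an implementation of the 802.3u negotiation protocol; it assumes that two auto-negotiating ports both support and agree on full duplex, and the function and dictionary key names are hypothetical.

```python
def resolved_duplex(autonegotiates, configured_duplex, peer_autonegotiates):
    """Duplex mode a port ends up using, per the behavior described above."""
    if not autonegotiates:
        return configured_duplex  # a fixed setting is used as-is
    if peer_autonegotiates:
        return "full"  # assumption: both ends support and agree on full duplex
    return "half"  # peer does not negotiate, so the port falls back to half


def has_duplex_mismatch(port_a, port_b):
    """True when the two ends of a link settle on different duplex modes."""
    a = resolved_duplex(port_a["autoneg"], port_a.get("duplex"), port_b["autoneg"])
    b = resolved_duplex(port_b["autoneg"], port_b.get("duplex"), port_a["autoneg"])
    return a != b


switch_port = {"autoneg": True}
end_node = {"autoneg": False, "duplex": "full"}  # forced 100Mbps/Full-Duplex
print(has_duplex_mismatch(switch_port, end_node))
```

Under this model, the mismatch disappears when both ends auto-negotiate, or when the forced end is set to half-duplex.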

For errors reported by the ProCurve Fault Finder, consult the online help, which suggests likely (though not exhaustive) root causes. Here are some examples.
Counter: Possible root cause
Bad CRC or Alignment: Half/full duplex mismatch, or faulty driver, NIC, transceiver, or cable
Giant: Problem driver or NIC
Collision: Usually, too much traffic for Ethernet to handle. In rare cases, can be caused by bad cables, NICs, or transceivers
Giant or Runt: Faulty NIC, NIC driver, or transceiver
Auto-partition: Loop in network or jabber, faulty NIC, NIC driver, transceiver, or cable
Frame Dropped, Drop Tx, Drop Rx, Buffer Overflow: High traffic or network design problems
Jabber: Bad cable, NIC, or transceiver

Other information sources
Be sure to refer to the Troubleshooting section of your product manuals as a valuable source of information. Also, please look at the FAQs and white papers on the ProCurve Networking by HP Web site.

© 2009 Hewlett-Packard Development Company, L.P.