HP ATM 622 and 155 Mbps adapters for HP 9000 Enterprise Servers and Workstations


 

Performance Report: Product Numbers A5483A, A5513A, and A5515A

» Introduction
» Test highlights
» Single adapter configuration test results
» Multiple adapter configuration test results
» Tuning guidelines
» Appendix A: test methodology and test environment

Introduction

A series of tests was run to evaluate the performance of HP ATM 622 and 155 Mbps adapters on various HP 9000 server and workstation platforms. The tests also measured the effect of Classical IP and LAN Emulation interfaces, extended packet sizes, and TCP configuration parameters on adapter performance.

A second set of tests evaluated scalability with respect to the number of adapters and the number of CPUs on various HP 9000 server platforms.

This document presents the performance results and provides tuning guidelines to achieve maximum possible performance from the ATM adapters.

Appendix A provides the Test Methodology and Environment used for the performance tests.

Test highlights

  • The 622 Mbps adapter delivers bidirectional TCP throughput in excess of 1 Gbit/sec on the N4000 platform. This is 94% of the theoretical maximum TCP throughput for a 622 Mbps link.
  • The unidirectional TCP throughput varies linearly with the number of 622 Mbps adapters. A TCP throughput of 4.2 Gbits/sec is achieved with eight 622 Mbps adapters on the N4000 platform.
  • The unidirectional TCP throughput for the 155 Mbps adapter is 134.5 Mbps, which essentially equals the theoretical maximum TCP throughput of approximately 135 Mbps for a 155 Mbps link.
  • The TCP throughput varies linearly with the number of 155 Mbps adapters. An aggregate bidirectional throughput of 3 Gbits/sec is achieved with twelve 155 Mbps adapters on the V2500 and N4000 platforms.

All the results above use the Classical IP interface with an MTU size of 9180 bytes.

Single adapter configuration test results

The theoretical maximum TCP throughput is approximately 135 Mbps for the 155 adapter and 542 Mbps for the 622 adapter. See Appendix A for calculations of the theoretical maximum TCP throughput using protocol overhead at various layers.

Table 1 below shows the TCP throughput numbers for the N4000 platform. See Tables 4 and 5 for results on other platforms.

Table 1: maximum TCP throughput on N4000*

Test                               N4000 (155 Mbps adapter)    N4000 (622 Mbps adapter)
Outbound throughput (Mbps)         134.45                      538.14
  CPU utilization (%)              6.5                         26.85
Inbound throughput (Mbps)          132.03                      535.42
  CPU utilization (%)              5.65                        23.61
Bidirectional throughput (Mbps)    260.12                      1005
  CPU utilization (%)              11.17                       57.1

*One CPU, Classical IP interface, MTU 9180 bytes

The packet size (that is, the Maximum Transmission Unit [MTU] size) and the interface type influence the performance of the adapters. Table 2 below shows the maximum TCP throughput and CPU utilization figures for three different interface types for 155 Mbps adapters. Table 3 below shows the maximum throughput and CPU utilization figures for the same interface types for 622 Mbps adapters.

Table 2: TCP throughput on various interface types*

Test/interface            Classical IP, MTU 9180 bytes    LANE, MTU 9218 bytes    LANE, MTU 1516 bytes
Outbound (Mbps)           134.45                          134.13                  128.4
  CPU utilization (%)     6.5                             8.4                     10.4
Inbound (Mbps)            132.03                          132.38                  128.22
  CPU utilization (%)     5.65                            6.42                    8.49
Bidirectional (Mbps)      260.12                          258.14                  240.25
  CPU utilization (%)     11.17                           12.24                   17.8

*N4000 system, one CPU, 155 Mbps adapter

Table 3: TCP throughput on various interface types*

Test/interface            Classical IP, MTU 9180 bytes    LANE, MTU 9218 bytes    LANE, MTU 1516 bytes
Outbound (Mbps)           538.14                          533                     448
  CPU utilization (%)     26.85                           28.58                   88.79
Inbound (Mbps)            535.42                          537                     454.48
  CPU utilization (%)     23.61                           52.52                   20.05
Bidirectional (Mbps)      1005                            965                     453.15
  CPU utilization (%)     57.1                            77.14                   71.27

*N4000 system, one CPU, 622 Mbps adapter

The TCP window size is an important parameter that affects throughput. For 622 Mbps adapters, the TCP window size on the receiving end must be set to at least 144 Kbytes (instead of the default window size of 64 Kbytes) to achieve the maximum throughput. For both 155 and 622 Mbps adapters, a window size of 144 Kbytes also minimizes the sensitivity to application message size. See Figure 1 below for an example of outbound throughput.

Figure 1: TCP window size and outbound TCP throughput on N4000 (Classical IP, 622 adapter)
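One general reason the window size matters is that TCP can keep at most one window of unacknowledged data in flight per round trip, so throughput cannot exceed the window size divided by the round-trip time. The sketch below illustrates that bound only. The report does not state the round-trip time of the test network, so the 2 ms value is a hypothetical figure chosen for illustration, and the function names are not from the report.

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_needed_bytes(target_mbps, rtt_seconds):
    """Smallest window that can sustain target_mbps at the given round-trip time."""
    return target_mbps * 1e6 * rtt_seconds / 8


# Hypothetical round-trip time; the report does not give the real value.
RTT_SECONDS = 0.002

for window_kbytes in (64, 144):
    bound = max_throughput_mbps(window_kbytes * 1024, RTT_SECONDS)
    print(f"{window_kbytes} Kbyte window: at most {bound:.1f} Mbps")

# Window needed to reach the 542 Mbps theoretical maximum of the 622 adapter.
print(f"{window_needed_bytes(542, RTT_SECONDS) / 1024:.0f} Kbytes needed for 542 Mbps")
```

Under this assumed round-trip time, a 64 Kbyte window would cap throughput well below what the 622 Mbps link can carry, while a 144 Kbyte window would not. The actual threshold depends on the real round-trip time, including host processing delays.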

The second set of tests consists of the exchange of a single request packet and a single response packet of 1 byte each, measured at the TCP level. The performance metric is the aggregate number of request/response packet pairs (transactions) per second. This metric indicates how many user-level packets the adapter can process in a second.

Test results show that the CPU and the platform type influence the single byte request/response performance.

A transaction consisting of a 1024-byte request and a 2048-byte response over UDP models an environment of NFS services running over ATM.
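The single-byte request/response measurement described above can be sketched as a simple ping-pong loop. This is a minimal illustration in the spirit of that test, not netperf itself: it runs over the loopback interface rather than an ATM adapter, and the function names and the transaction count are assumptions made for the example.

```python
import socket
import threading
import time


def echo_one_byte(server_sock):
    """Answer each 1-byte request with a 1-byte response until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1)
            if not data:
                break
            conn.sendall(b"r")


def measure_rr(transactions=2000):
    """Return request/response transactions per second over loopback TCP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    worker = threading.Thread(target=echo_one_byte, args=(server,), daemon=True)
    worker.start()

    client = socket.create_connection(server.getsockname())
    # Disable Nagle's algorithm so each 1-byte request is sent immediately.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    start = time.perf_counter()
    for _ in range(transactions):
        client.sendall(b"q")   # 1-byte request
        client.recv(1)         # wait for the 1-byte response
    elapsed = time.perf_counter() - start

    client.close()
    worker.join()
    server.close()
    return transactions / elapsed


print(f"{measure_rr():.0f} transactions/sec")
```

Because only one transaction is outstanding at a time, the result is dominated by per-packet processing cost and latency rather than link bandwidth, which is why the report finds that the CPU and platform type drive this metric.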

Tables 4 and 5 below show the throughput and request/response results on different platforms. All the results are for an MTU size of 9180 bytes.

Table 4¹: results on different platforms with one 155 adapter*

Test/platform                                 V2250      V2500      N4000
Memory                                        1 GB       1 GB       1 GB
Outbound TCP throughput (Mbps)                134.73     134.85     134.45
  CPU utilization (%)                         8.3        5.99       6.5
Inbound TCP throughput (Mbps)                 132.78     132.36     132.03
  CPU utilization (%)                         13.41      4.81       5.65
Bidirectional TCP throughput (Mbps)           261.6      255.58     260.12
  CPU utilization (%)                         24.675     10.25      11.17
TCP single byte R/R (transactions/sec)        8364       14654      20421
  CPU utilization (%)                         100        89.5       92.37
UDP (1K/2K) R/R (transactions/sec)            7264       8694       8769
  CPU utilization (%)                         100.00     65.63      51.11

*Classical IP, one CPU
¹ The V2500 platform tests for 155 and 622 adapters were run with 2 CPUs because the minimum configuration for the V2500 is 2 CPUs. The CPU utilization reported for the V2500 is per CPU.

Table 5: results on different platforms with one 622 adapter*

Test/platform                                 V2250      V2500      N4000
Memory                                        1 GB       1 GB       1 GB
Outbound TCP throughput (Mbps)                529.67     537.75     538.14
  CPU utilization (%)                         65.78      15.52      26.85
Inbound TCP throughput (Mbps)                 512.41     538.29     535.42
  CPU utilization (%)                         81.15      18.61      23.61
Bidirectional TCP throughput (Mbps)           680.3      962.95     1005
  CPU utilization (%)                         91.81      50.2       57.1
TCP single byte R/R (transactions/sec)        8367       20944      28536
  CPU utilization (%)                         100        100        100
UDP (1K/2K) R/R (transactions/sec)            7413       14658      18901
  CPU utilization (%)                         100.00     100.00     100.00

*Classical IP, one CPU

The bidirectional TCP throughput on the V2250 is limited to 680 Mbps because the V2250 has a 1X PCI I/O backplane (32 bit, 33 MHz). The V2500 has a 2X PCI I/O backplane (64 bit, 33 MHz).

Multiple adapter configuration test results

For a Classical IP interface with an MTU size of 9180 bytes, the outbound TCP throughput varies almost linearly with the number of adapters, for up to eight 622 adapters and twelve 155 adapters on the V2500 and N4000 platforms. See Figures 2 and 3 below.

Figure 2: TCP outbound throughput scalability (N4000: 8 CPUs, 622 adapters, Classical IP)

Figure 3: TCP throughput scalability (V2500: 8 CPUs, 155 adapters, Classical IP)

Each netperf process was bound to the CPU that receives interrupts from its adapter. The Classical IP interface on each adapter was configured with a different IP subnet; that is, there were eight IP subnets configured in an eight-adapter test.

On V-series platforms, one 622 adapter was configured per PCI Controller, and a maximum of two 155 adapters were installed in a PCI Controller. On N-series platforms, the 622 Adapters were installed in the Twin Turbo slots.

Tables 6 and 7 below show scalability results on different server platforms for up to eight 155 and 622 adapters, respectively.

Table 6: results on different platforms with eight 155 adapters*

Test/platform                                 V2250      V2500      N4000
Memory                                        1 GB       1 GB       1 GB
Outbound TCP throughput (Mbps)                1034       1041       1076
  CPU utilization (%)                         23.96      20.94      6.04
Inbound TCP throughput (Mbps)                 1038       1054       1059
  CPU utilization (%)                         21.36      10.94      7.05
Bidirectional TCP throughput (Mbps)           1698       1902       2062
  CPU utilization (%)                         43.97      50.12      26.86
TCP single byte R/R (transactions/sec)        38097      56918      62466
  CPU utilization (%)                         95.16      100        100
UDP (1K/2K) R/R (transactions/sec)            34450      36752      40943
  CPU utilization (%)                         100.00     100.00     100.00

*8 CPUs, Classical IP

Table 7²: results on different platforms with eight 622 adapters*

Test/platform                                 V2250      V2500      N4000
Memory                                        1 GB       1 GB       1 GB
Outbound TCP throughput (Mbps)                3338       3799       4209
  CPU utilization (%)                         95.67      77.61      95.78
Inbound TCP throughput (Mbps)                 1965       3605       4000
  CPU utilization (%)                         70.76      54.47      83.76
Bidirectional TCP throughput (Mbps)           2360       3262       4020
  CPU utilization (%)                         96.42      85.24      90.97
TCP single byte R/R (transactions/sec)        44819      67935      75763
  CPU utilization (%)                         100        100        100
UDP (1K/2K) R/R (transactions/sec)            33669      52632      49787
  CPU utilization (%)                         100        100        100

*8 CPUs, Classical IP
² The inbound and bidirectional throughput data for the N4000 platform is estimated.
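To put a number on "almost linearly", the eight-adapter outbound results can be compared with eight times the single-adapter results. The sketch below does that arithmetic using only the outbound throughput figures from Tables 4 to 7; the function name and the notion of a "scaling efficiency" ratio are illustrative choices, not terms from the report. Note that the single-adapter tests ran with one CPU (two on the V2500) and the eight-adapter tests with eight, so the ratio is only a rough guide.

```python
def scaling_efficiency(multi_mbps, single_mbps, adapters):
    """Fraction of ideal linear scaling achieved with `adapters` adapters."""
    return multi_mbps / (single_mbps * adapters)


# Outbound TCP throughput in Mbps: (one adapter, eight adapters).
# 155 adapter values come from Tables 4 and 6, 622 values from Tables 5 and 7.
outbound = {
    ("155", "V2250"): (134.73, 1034),
    ("155", "V2500"): (134.85, 1041),
    ("155", "N4000"): (134.45, 1076),
    ("622", "V2250"): (529.67, 3338),
    ("622", "V2500"): (537.75, 3799),
    ("622", "N4000"): (538.14, 4209),
}

for (adapter, platform), (one, eight) in outbound.items():
    ratio = scaling_efficiency(eight, one, 8)
    print(f"{adapter} adapter on {platform}: {ratio:.1%} of linear")
```

A ratio close to 100% means each added adapter contributes nearly its full single-adapter throughput; ratios slightly above 100% are within the run-to-run variation of the measurements.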

Tests on the LAN Emulation interface show that the maximum throughput using multiple adapters is less than the corresponding results for the Classical IP interface. Tests with an MTU size of 9 Kbytes give better results than an MTU size of 1500 bytes. For example, see Tables 8 and 9 below for results on 155 and 622 adapters, respectively.

Table 8: TCP throughput results on LAN emulation interface*

Test/interface                            LANE, MTU 9218 bytes    LANE, MTU 1516 bytes
Outbound TCP throughput (Mbps)            538                     475
  CPU utilization (%)                     2.75                    6.3
Inbound TCP throughput (Mbps)             529                     268
  CPU utilization (%)                     4.04                    3.28
Bidirectional TCP throughput (Mbps)       772                     338
  CPU utilization (%)                     6.8                     4.90

*N4000: 8 CPUs, four 155 adapters

Table 9: TCP throughput results on LAN emulation interface*

Test/interface                            LANE, MTU 9218 bytes    LANE, MTU 1516 bytes
Outbound TCP throughput (Mbps)            1567                    598
  CPU utilization (%)                     14.36                   11.83
Inbound TCP throughput (Mbps)             1221                    444
  CPU utilization (%)                     11.91                   6.91
Bidirectional TCP throughput (Mbps)       1529                    680
  CPU utilization (%)                     23.54                   20.88

*N4000: 8 CPUs, four 622 adapters

Tuning guidelines

• System-related Guidelines

When installing multiple adapters, make sure the adapters are evenly distributed in the card cages on V-series platforms. In particular, avoid installing more than three 155 adapters or more than one 622 adapter per PCI Controller on the V-series platforms. On N-class servers, install the 622 adapters in the Twin Turbo slots. Also, evenly distribute the adapters in slots of both I/O controllers on N-class systems.

For best performance, we recommend that all four memory carriers be installed on the N-class systems. Also, distribute the memory modules evenly on the memory carriers. When using multiple adapters, 4 Gbytes of memory is recommended. For single adapter installations, 1 Gbyte of memory is recommended.

• Application-related Guidelines

For standard applications (for example, ftp and NFS), if the socket size is configurable, set the socket size on the receiving end and the message length to at least 144 Kbytes each. The socket size determines the TCP window size. Alternatively, set the TCP window size on the receiving end with the ndd command by setting the tcp_recv_hiwater_def parameter to 144 Kbytes. If the receiving system is a non-HP platform, use the equivalent of the ndd command on that platform.

For a customized application, call setsockopt() to define a socket buffer size of at least 128 Kbytes. When possible, design the application to use the largest message size possible.
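As a minimal sketch of the socket-size guideline for a customized application, the example below requests a 144 Kbyte receive buffer through the standard SO_RCVBUF socket option. The helper name make_tuned_socket is hypothetical, and the choice of 144 Kbytes simply follows the window size recommended elsewhere in this report; any value of at least 128 Kbytes would satisfy the guideline above.

```python
import socket

# 144 Kbytes, the receive window size this report recommends.
RECV_BUFFER_BYTES = 144 * 1024


def make_tuned_socket(recv_bytes=RECV_BUFFER_BYTES):
    """Create a TCP socket and request a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffer before connect() or listen(), since windows larger
    # than 64 Kbytes rely on options negotiated when the connection is set up.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_bytes)
    return sock


sock = make_tuned_socket()
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested", RECV_BUFFER_BYTES, "bytes; system reports", granted, "bytes")
sock.close()
```

The operating system may round, double, or cap the requested size, so reading the value back with getsockopt is a reasonable way to check what was actually granted.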

Throughput-intensive applications can see an increase of about 20% when the application is bound to a specific CPU. This is done as follows:

Find out which CPU the adapter is interrupting by using the sar command. With enough traffic on the adapter, the sar output should show interrupts on only one CPU. The application is bound to this CPU by using the mpctl system call.
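The report binds the application with the HP-UX mpctl system call, which Python does not expose. As a rough analogue only, the sketch below pins the current process to one CPU with os.sched_setaffinity, which is a Linux-specific call and not the mechanism used in these tests. The helper name bind_to_cpu is hypothetical, and in practice the CPU number would be the one identified with sar as described above.

```python
import os


def bind_to_cpu(cpu):
    """Pin the calling process to one CPU and return the previous CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity API not available on this OS")
    previous = os.sched_getaffinity(0)   # 0 means the calling process
    os.sched_setaffinity(0, {cpu})
    return previous


if hasattr(os, "sched_getaffinity"):
    # Stand-in for the CPU that the adapter interrupts.
    target = min(os.sched_getaffinity(0))
    original = bind_to_cpu(target)
    print("now restricted to CPUs:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, original)    # restore the original CPU set
```

Binding the application to the interrupted CPU keeps protocol processing and application code on the same processor, which is the effect the guideline above is aiming for.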

• Network design-related Guidelines

When using LAN Emulation, HP recommends that all of the ATM systems be put on a separate subnet and that the subnet use an MTU size of 9 Kbytes. Subnets that include legacy systems must use the MTU size used by the legacy systems; for example, Ethernet uses a maximum MTU size of 1500 bytes.

Appendix A: test methodology and test environment

Test methodology

HP's Netperf version 2.1 was used as the test tool in all the measurements. Netperf measures the throughput at the TCP level. This measurement models many real life applications since these applications use the TCP protocol.

The target systems were configured with a minimum of 2 Gbytes of memory. The ATM software version used was I.11.00.10 with Build Revision 7.39; this corresponds to the DART45 release of ATM software. HP-UX 11.0 was used as the underlying operating system. All configuration parameters were left at their defaults except for the LANE MTU size. See Figures 4 and 5 below for the test topology.

Figure 4: single adapter test configuration

The FORE Systems ASX-1000 ATM switch ran ForeThought version 4.0 switch software. The Cisco 1010 ATM switch ran version 11.1 of the Cisco IOS software. The N4000 platform had all four memory carriers installed for the multiple adapter tests. Table 10 below provides the server and workstation platform configurations on which the performance tests were run.

Figure 5: multiple adapter test configuration

Table 10: server and workstation platform characteristics

Platform    CPU               ICACHE/DCACHE    Memory    SPECint95    OS
N4000       PA8500 440 MHz    0.5 MB/1 MB      1 GB³     34           11.00 May 99
V2500       PA8500 440 MHz    0.5 MB/1 MB      1 GB      N/A          11.00 Dec 98
V2250       PA8200 240 MHz    2 MB/2 MB        1 GB      16.4         11.00 Dec 98

³ 8 GB for the multiple adapter tests.

Protocol overheads and theoretical maximum TCP throughput

The throughput as seen by a TCP application is less than the OC-3 physical layer throughput of 155 Mbps or OC-12 (622 Mbps). This is because of the various protocol headers that are inserted by the protocol stack. For example, the protocol headers for Classical IP interface are as follows:

  • TCP/IP Overhead

    This consists of 20 bytes of TCP header + 20 bytes of IP header + 8 bytes of LLC header, which amounts to 48 bytes.

  • ATM AAL5 Overhead

    The Convergence Sublayer appends an 8-byte trailer. It will also add a Pad. The Pad size varies from 0-47 bytes, depending on Application Message size.

  • ATM Layer Overhead

    The ATM Layer will append a five byte header to each Segmentation and Reassembly PDU (48 bytes) to form an ATM cell.

  • Physical Layer Overhead

    The Transmission Convergence sublayer creates SONET frames of 27 ATM cells by appending one overhead ATM cell to 26 data ATM cells.

    This leads to the following equation for the theoretical TCP throughput (T):

    T = (D / (D + 56 + p)) * (48/53) * (26/27) * 155.52 Mbps (or 622.08 Mbps for the 622 adapter)

    where D = application message size in bytes,
    p = 0-47 bytes of padding added at the AAL5 layer, and
    56 = 48 bytes of TCP/IP/LLC headers plus the 8-byte AAL5 trailer.

    For message sizes smaller than 4 Kbytes, the TCP/IP and AAL5 protocol overhead forms a major part of each PDU and the theoretical throughput drops sharply. For typical message sizes, the theoretical TCP throughput is as follows:

Messages

Message size    Theoretical TCP throughput    Theoretical TCP throughput
                (155 adapter)                 (622 adapter)
128 bytes       90 Mbps                       362 Mbps
256 bytes       103 Mbps                      413 Mbps
1024 bytes      126 Mbps                      503 Mbps
4096 bytes      133 Mbps                      535 Mbps
64 Kbytes       135 Mbps                      542 Mbps

Note that the above calculation assumes the Classical IP interface. LAN Emulation adds an additional 8 bytes of header, so the theoretical throughput decreases accordingly.
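The throughput equation above can be evaluated directly. The sketch below is one way to do so: it derives the pad p from the requirement that the AAL5 PDU fill a whole number of 48-byte cell payloads, and it accepts an optional extra header size for the LAN Emulation case. The function and parameter names are illustrative, and 622.08 Mbps is used as the OC-12 line rate.

```python
# OC-3 and OC-12 line rates in Mbps.
LINE_RATE_MBPS = {"155": 155.52, "622": 622.08}


def theoretical_tcp_throughput(message_bytes, line_rate_mbps, extra_header_bytes=0):
    """Theoretical TCP throughput in Mbps for one message of message_bytes."""
    # 48 bytes of TCP/IP/LLC headers plus the 8-byte AAL5 trailer.
    overhead = 56 + extra_header_bytes
    # AAL5 pads the PDU up to a whole number of 48-byte cell payloads,
    # so cells * 48 equals D + 56 + p in the equation.
    cells = (message_bytes + overhead + 47) // 48
    pdu_bytes = cells * 48
    return (message_bytes / pdu_bytes) * (48 / 53) * (26 / 27) * line_rate_mbps


for size in (128, 256, 1024, 4096, 65536):
    t155 = theoretical_tcp_throughput(size, LINE_RATE_MBPS["155"])
    t622 = theoretical_tcp_throughput(size, LINE_RATE_MBPS["622"])
    print(f"{size:>6} bytes: {t155:6.1f} Mbps (155)  {t622:6.1f} Mbps (622)")

# LAN Emulation case, using the additional 8 bytes of header stated above.
lane = theoretical_tcp_throughput(1024, LINE_RATE_MBPS["155"], extra_header_bytes=8)
print(f"1024 bytes with LANE header: {lane:.1f} Mbps (155)")
```

Rounded to the nearest Mbps, this reproduces the table above for every entry except the 4096-byte message on the 622 adapter, where the equation as written gives about 532 Mbps rather than the tabulated 535 Mbps.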

© 2005 Hewlett-Packard Development Company, L.P.