IBM Power Systems Network Performance, SEA Components
Steven Knudson, sjknuds@us.ibm.com
IBM POWER Advanced Technical Skills
© 2013 IBM Corporation

Agenda

• Physical Ethernet Adapters
• Link Aggregation Configuration
• Shared Ethernet Adapter SEA Configuration
• SEA VLAN Tagging
• VLAN awareness in SMS
• 10 Gb SEA, active - active
• ha_mode=sharing, active - active
• Dynamic VLANs on SEA
• Throughput
• Virtual Switch: VEB versus VEPA mode
• AIX Virtual Ethernet adapter
• AIX IP interface
• AIX TCP settings
• AIX NFS settings
• largesend, large_receive with binary ftp for network performance
• iperf tool for network performance

Most syntax in this presentation is VIO padmin, sometimes root smitty.

Physical Ethernet Adapters

• Let's use flow control.
• The 10 Gb PCIe Ethernet-SR adapter uses 802.3x or "Link" flow control.
• The FCoE adapter uses 802.1Qbb or Priority Flow Control (PFC). PFC requires VLAN tagging (802.1q) to be on.
• The PCIe adapter flow control attribute is on by default:

    $ lsdev -dev ent0 -attr | grep flow_ctrl
    flow_ctrl  yes  Enable Transmit and Receive Flow Control

• The attribute might still be disabled by the switch, so check the status. In this case, an SEA over a six-link aggregation:

    $ entstat -all ent14 | grep "Transmit and Receive Flow Control Status:"
    Transmit and Receive Flow Control Status: Disabled
    Transmit and Receive Flow Control Status: Disabled
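
The status check above can be scripted when there are many links to inspect. The following is a minimal sketch in Python that scans captured `entstat -all` text for the status lines; the function name is hypothetical, and the string "Enabled" for the healthy state is an assumption, since the slide only shows "Disabled".

```python
import re

def flow_control_states(entstat_text):
    """Return the status word from every flow-control status line."""
    pattern = re.compile(r"Transmit and Receive Flow Control Status:\s*(\S+)")
    return pattern.findall(entstat_text)

# Sample text copied from the slide (two of the aggregated links).
sample = """\
Transmit and Receive Flow Control Status: Disabled
Transmit and Receive Flow Control Status: Disabled
"""

states = flow_control_states(sample)
# "Enabled" is an assumed value; only "Disabled" appears in the slide.
all_enabled = bool(states) and all(s == "Enabled" for s in states)
```

If `all_enabled` is false while `flow_ctrl` is yes on the adapter, the switch side is the place to look.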

Physical Ethernet Adapters

• IVE physical port flow control (802.3x, or Link) is off by default. Set it via the HMC.

Physical Ethernet Adapters

• IVE: select the radio button, then Configure…

Physical Ethernet Adapters

• IVE: HEA flow control checkbox. Set Promiscuous LPAR when a VIO SEA will be built on this adapter.

Physical Ethernet Adapters

• What Ethernet adapters do we have?

    $ lsdev -type adapter | grep ent
    ent0  Available  Logical Host Ethernet Port (lp-hea)
    ent1  Available  Virtual I/O Ethernet Adapter (l-lan)
    ent2  Available  Virtual I/O Ethernet Adapter (l-lan)
    ent3  Available  Virtual I/O Ethernet Adapter (l-lan)
    ent4  Available  Shared Ethernet Adapter

• What are their physical location codes?

    $ lsdev -type adapter -field name physloc | grep ent
    ent0  U78C0.001.DBJ4725-P2-C8-T1
    ent1  U9179.MHB.1026D1P-V1-C2-T1
    ent2  U9179.MHB.1026D1P-V1-C3-T1
    ent3  U9179.MHB.1026D1P-V1-C4-T1
    ent4

Physical Ethernet Adapters

• Physical adapters should have large_send (and large_receive, on those that have it) already set to yes:

    $ lsdev -dev ent0 -attr | grep large
    large_receive  yes  Enable receive TCP segment aggregation       True
    large_send     yes  Enable hardware Transmit TCP segmentation

• There is no media_speed attribute on 10 Gb adapters. 1 Gb adapters are usually fine with Auto_Negotiation:

    $ lsdev -dev ent0 -attr | grep media_speed
    media_speed  Auto_Negotiation  Requested media speed

Physical Ethernet Adapters - dog threads

• If you are configuring IP directly on a physical adapter, you may be steered into enabling dog threads for extremely high packet rates (no effect on virtual adapters, no recommendation for SEA):

    # chdev -l en0 -a thread=on
    en0 changed

• It works in concert with the ndogthreads setting:

    # no -h ndogthreads
    Help for tunable ndogthreads:
    Purpose: Specifies the number of dog threads that are used during hashing.
    Values:  Default: 0
             Range: 0 - 1024
             Type: Dynamic
             Unit: numeric
    Tuning:  This option is valid only if dog threads are enabled for an interface. A value of 0 sets it to default ie dog threads equal to the number of CPUs. Max value is 1024. The minimum of tunable value and the number of cpus is taken as the number of dog threads during hashing.
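
The tuning text above boils down to a small rule. Here is a minimal sketch of it in Python; the function name is hypothetical, and the rule is taken only from the `no -h ndogthreads` help text shown on the slide.

```python
def effective_dog_threads(ndogthreads, ncpus):
    """Number of dog threads used during hashing, per the help text.

    0 means "default", which is one dog thread per CPU; otherwise the
    smaller of the tunable value and the CPU count is used.
    """
    if not 0 <= ndogthreads <= 1024:
        raise ValueError("ndogthreads must be in the range 0 - 1024")
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if ndogthreads == 0:
        return ncpus
    return min(ndogthreads, ncpus)
```

For example, on an 8-CPU partition a setting of 16 still yields 8 dog threads, so raising the tunable past the CPU count has no effect.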

Link Aggregation Configuration

• smitty etherchannel, then Add An EtherChannel / Link Aggregation

Link Aggregation Configuration

• Mode: standard if the network admin explicitly configures switch ports in a channel group for our server.
• Mode: 8023ad if the network admin configures LACP switch ports for our server. 8023ad refers to IEEE 802.3ad, under which the aggregation is negotiated automatically: if our server approaches the switch with one adapter, the switch sees one adapter; if our server approaches the switch with a link aggregation, the switch detects that. For 10 Gb, we should be LACP/8023ad.
• Hash Mode: the default is by IP address, which gives good fan out for one server to many clients, but transmits to a given IP peer on only one adapter.
• Hash Mode: src_dst_port uses source and destination port numbers in the hash. Multiple connections between two peers likely hash over different adapters. This is the best opportunity for multi-adapter bandwidth between two peers. Whichever mode is used, we prefer hash_mode=src_dst_port.
• Backup adapter: optional, standby, single adapter to the same network on a different switch. We would not use this for link aggregations underneath an SEA Failover configuration. We also would likely not use it on a large switch, where active adapters are connected to different, isolated "halves" of a large "logical" switch.
• Address to ping: not typically used. It aids detection for failover to the backup adapter. It needs to be a reliable address, but perhaps not the default gateway. Do not use this on the link aggregation if an SEA will be built on top of it. Instead use the netaddr attribute on the SEA, and put the VIO IP address on the SEA interface.
• Using mode and hash_mode, AIX readily transmits on all adapters. You may find the switch delivers receives on only one adapter; switches must enable a hash mode setting as well.
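
To see why src_dst_port spreads traffic between two peers while the default does not, consider the toy model below. It is a sketch only: the two hash functions are illustrative assumptions (last octet of the destination IP, and sum of the ports, each taken modulo the adapter count), not the actual AIX driver algorithm.

```python
def pick_adapter_by_ip(dst_ip, n_adapters):
    """Toy default hash: depends only on the destination IP address."""
    last_octet = int(dst_ip.split(".")[-1])
    return last_octet % n_adapters

def pick_adapter_by_ports(src_port, dst_port, n_adapters):
    """Toy src_dst_port hash: depends on both TCP/UDP port numbers."""
    return (src_port + dst_port) % n_adapters

# Four connections from one server to the same peer and service port.
# The peer address and port numbers are made-up example values.
peer = "9.19.51.76"
connections = [(50000, 2049), (50001, 2049), (50002, 2049), (50003, 2049)]

by_ip = [pick_adapter_by_ip(peer, 2) for _ in connections]
by_ports = [pick_adapter_by_ports(s, d, 2) for s, d in connections]
```

In this model every connection to the peer lands on the same adapter under the IP hash, while the port hash alternates between both adapters of a two-link aggregation.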

Link Aggregation Configuration

    $ mkvdev -lnagg ent0,ent1 -attr mode=8023ad hash_mode=src_dst_port
    ent8 Available
    en8
    et8

• There is no largesend or large_send attribute on a link aggregation.

Shared Ethernet Adapter SEA Configuration

• Create the SEA.
• If you are using netaddr ("address to ping"), you must have the VIO IP on the SEA interface.
• netaddr is not typically needed.
• With SEA, the VIO local IP config is often on a "side" virtual adapter.

    $ mkvdev -sea ent8 -vadapter entN -defaultid Y -attr ha_mode=auto ctl_chan=entK netaddr= largesend=1 large_receive=yes
    ent10 Available
    en10
    et10

• You want largesend on the SEA, and mtu_bypass (largesend) on AIX LPAR IP interfaces. largesend on AIX IP interfaces boosts throughput LPAR to LPAR within the machine, with no additional CPU utilization. Along with that, largesend on the SEA will LOWER sending AIX LPAR CPU, and sending VIO CPU, when transferring to a peer outside the machine.

Shared Ethernet Adapter SEA Configuration

• Some cautions with largesend:
• POWER Linux does not handle largesend on SEA. It has a negative performance impact on sftp and nfs in Red Hat RHEL.
• A few customers have had trouble with what has been referred to as a DUP-ACK storm, and they are considering VIO ifix IV12424: http://www-01.ibm.com/support/docview.wss?uid=isg1IV12424
• A potential "denial of service" attack can be waged against largesend, using a "specially-crafted sequence of packets." ifixes for various AIX levels are listed here: http://www14.software.ibm.com/webapp/set2/subscriptions/pqvcmjd?mode=18&ID=5706&myns=paix53&mync=E
• largesend is NOT a universal problem, and these ifixes are not believed to be widely needed.

Shared Ethernet Adapter SEA Failover switch port settings

• One vendor's suggestions on portfast and bpdu-guard: http://www.cisco.com/en/US/docs/switches/lan/catalyst4000/7.4/configuration/guide/stp_enha.html
• PortFast causes a switch or trunk port to enter the spanning tree forwarding state immediately, bypassing the listening and learning states. (Faster SEA Failover)
• Caution, repeated multiple times in the article: You can use PortFast to connect a single end station or a switch port to a switch port. If you enable PortFast on a port connected to another Layer 2 device, such as a switch, you might create network loops.
• Because PortFast can be enabled on nontrunking ports connecting two switches, spanning tree loops can occur because BPDUs are still being transmitted and received on those ports. (Remember, SEA is a virtual switch.)

    Console> (enable) set spantree portfast bpdu-guard 6/1 enable

• Bpdu-guard is not a panacea; it is disabled if you are VLAN tagging. When you are configuring SEA Failover, if you have any doubt about the configuration, review it with Support Line to avoid a BPDU storm.

Shared Ethernet Adapter SEA Configuration

• VIO local IP config, on the SEA IP interface:

    $ mktcpip        (no flags, gives a helpful usage message)
    $ mktcpip -hostname hostname -inetaddr ip_addr -interface en10 -netmask 255.255.255.0 -gateway gateway_ip -nsrvaddr dns_ip -nsrvdomain your.domain.com -start

    $ netstat -state -num
    Name  Mtu    Network  Address          Ipkts     Ierrs  Opkts     Oerrs  Coll
    en10  1500   link#2   42.d4.90.0.f0.4  52052352  0      12046192  0      0
    en10  1500   9.19.98  9.19.98.41       52052352  0      12046192  0      0
    lo0   16896  link#1                    6724868   0      6724868   0      0
    lo0   16896  127      127.0.0.1        6724868   0      6724868   0      0
    lo0   16896  ::1%1                     6724868   0      6724868   0      0

• If you have the mtu_bypass attribute on the SEA interface, you will want to set it on for bulky traffic to and from the VIO local IP.
• Most bulky traffic through the SEA is not destined for the VIO local IP. What traffic is? Live Partition Mobility: transferring the memory state of the moving LPAR is done VIO to VIO.

    $ lsdev -dev en10 -attr | grep mtu_bypass
    mtu_bypass  off  Enable/Disable largesend for virtual Ethernet

    $ chdev -dev en10 -attr mtu_bypass=on
    en10 changed

• mtu_bypass observed at ioslevel 2.2.1.1, and oslevel -s 6100-04-05-1015. Earlier than this, use the root command line:

    # ifconfig en10 largesend ; echo "ifconfig en10 largesend" >> /etc/rc.net

Shared Ethernet Adapter Failover

The most widely done, most well understood config.

[Diagram: two client LPARs, each with virtual adapter ent0 and an IP address on VLAN 1. VIO Server 1 and VIO Server 2 each have SEA ent4 built on physical adapter ent0 and trunked virtual adapter ent2 (VLAN 1), a control channel adapter ent3 on VLAN 99, and the VIO IP address on virtual adapter ent1 (VLAN 1). The physical adapters connect to an Ethernet switch on VLAN 1.]

• ent1 is a "side" virtual adapter for the VIO local IP config, isolated from the SEA config.

    mkvdev -sea ent0 -vadapter ent2 -defaultid 1 -attr ha_mode=auto ctl_chan=ent3

• Physical adapter ent0 may be an aggregation of adapters.
• SEA Failover supports VLAN tagging: multiple IP subnets, through a single SEA, to different client LPARs.

SEA Configuration, VLAN tagged configuration

• 10 Gb is a large pipe, and many start to consider VLAN tagging to consolidate networks onto one adapter.
• Let's stay with the original config, as shown in Section 3.6, Fig 3-8 in redp4194: http://www.redbooks.ibm.com/abstracts/redp4194.html
• The trunked virtual adapter, ent1 in the VIO, is on an unused PVID, 199 in the example.
• Communication VLANs are added as 802.1q "additional VLANs" 10, 20, 30.
• SEA Failover with dual VIOs is supported here, but not shown.
• VLAN devices on top of the SEA are not required unless the VIO needs a local IP on each subnet, which is not typical.

Tagged configuration - VLAN awareness in SMS

• Your network admin might notify you that your switch port is configured as follows. They seem to be moving away from "access" ports, to "trunk" ports.

    interface Ethernet1/18
      switchport mode trunk
      switchport trunk allowed vlan 10,20,30
      spanning-tree port type edge trunk

• The SEA will be configured with a physical adapter and a bridged virtual adapter, with 802.1q VLANs 10, 20, 30, just as seen on the previous slide.
• Since 2001, if you had AIX 5.1 running and you were putting IP directly on a physical adapter, we could add VLAN devices on top of the physical for 10, 20, 30 (smitty vlan), and configure IPs on those subnets. We have handled VLANs in the operating system for a long time.
• What do we lack? There has been no way to specify a VLAN tag on the physical adapter in SMS. I want to network boot a physical adapter, on VLAN 20, and install the first VIO server on the machine.
• Some workarounds:
  - Network boot the VIO on a different physical adapter, plugged to an access port.
  - Install VIO1 from DVD media, configure a tagged SEA, and network install VIO2 on a virtual adapter, through the VIO1 SEA.
  - You might have success adding a "native" VLAN specification on the switch port:

    interface Ethernet1/18
      switchport mode trunk
      switchport trunk native vlan 20
      switchport trunk allowed vlan 10,20,30
      spanning-tree port type edge trunk

• This might affect the use of the "unused" VLAN id on the bridged virtual adapter in the SEA; you'll have some experimentation here.
• POWER Firmware stream 760 adds VLAN awareness: the ability to specify a VLAN tag on an Ethernet adapter in SMS, for network boot. Observed on a 780 D model, firmware AM760_051.

Tagged configuration - VLAN awareness in SMS

    Version AM760_051
    SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    ----------------------------------------
    Network Parameters
    Port 1 - IBM 2 PORT PCIe 10/1000 Base-TX Adapter: U2C4E.001.DBJ8765-P2-C4-T1
     1. IP Parameters
     2. Adapter Configuration
     3. Ping Test
     4. Advanced Setup: BOOTP      <- new option on menu at firmware AM760_051
    ----------------------------------------
    Navigation keys:
    M = return to Main Menu
    ESC key = return to previous screen    X = eXit System Management Services
    ----------------------------------------
    Type menu item number and press Enter or select Navigation key:

Tagged configuration - VLAN awareness in SMS

    Version AM760_051
    SMS 1.7 (c) Copyright IBM Corp. 2000,2008 All rights reserved.
    ----------------------------------------
    Advanced Setup: BOOTP
    Port 1 - IBM 2 PORT PCIe 10/1000 Base-TX Adapter: U2C4E.001.DBJ8765-P2-C4-T1
     1. Bootp Retries    5
     2. Bootp Blocksize  512
     3. TFTP Retries     5
     4. VLAN Priority    0
     5. VLAN ID          0 (default - not configured)
    ----------------------------------------
    Navigation keys:
    M = return to Main Menu
    ESC key = return to previous screen    X = eXit System Management Services
    ----------------------------------------
    Type menu item number and press Enter or select Navigation key:

Specify your VLAN tag in item 5, then escape to perform 3. Ping Test.

Tagged configuration - VLAN awareness

• Suppose you are running AIX, and you want to kick off a network boot and reinstall from the command line. Yes, you can specify the VLAN tag on the bootlist command (AIX 6100-08 or 7100-02):

    # bootlist -rm normal ent0 client= bserver= gateway= vlan_tag= [vlan_pri= ] hdisk0 hdisk1

10 Gb SEA Configuration, both sides active

• Field developed solution for shops not satisfied with an idle SEA standby 10 Gb adapter and switch port.
• Independent SEAs configured in each VIO, on the same PVIDs, tagged.
• How do they avoid a BPDU loop storm? Different virtual switches, and NIB in the client LPAR.
• http://www-03.ibm.com/support/techdocs/atsmastr.nsf/fe582a1e48331b5585256de50062ae1c/81c729a840b213b98625779e000722f4/$FILE/PowerVM-VirtualSwitches-091010.pdf (google "vio sea 10 gb miller" and look for the article titled "Using Virtual Switches in PowerVM to Drive Maximum Value of 10 Gb")

SEA Configuration, ha_mode=sharing

[Diagram: post load sharing configuration. Partition 1 (AIX, VID 10), Partition 2 (Linux, VID 20) and Partition 3 (AIX, VID 30) attach to virtual Ethernet in the POWER Hypervisor. The primary VIOS has an SEA with two trunk adapters at priority 1 (VID 10, 20 and VID 30, 40); the backup VIOS has an SEA with two trunk adapters at priority 2 (VID 10, 20 and VID 30, 40). Each VIOS has a control channel adapter on VLAN 99 and a physical Ethernet adapter to the Ethernet network. The legend distinguishes active and inactive trunk adapters.]

VIO clients 1 and 2 are bridged by the primary VIOS; client 3 is bridged by the backup VIOS.

SEA Configuration ha_mode=sharing

• VIO 2.2.1.1 required.
• Still a single SEA Failover configuration, with a single ctl_chan.
• At least 2 (up to 16) trunked virtual adapters joined into each SEA.
• The previous slide shows a trunked virtual for VLANs 10, 20 and a trunked virtual for VLANs 30, 40, in each SEA.
• The previous slide is a tagged example. It may be untagged as well.
• Both trunked adapters in an SEA must have the external access checkbox and the same trunk priority (e.g. both are 1 in vio1, and both are 2 in vio2).
• Set ha_mode=sharing on the primary SEA first, then the secondary:

    $ chdev -dev entX -attr ha_mode=sharing

• The secondary offers sharing to the primary.
• Client LPARs do not require NIB configuration.
• The POWER admin balances placement of LPARs on VLANs.

SEA Configuration ha_mode=sharing Sample config

    VIO     Adapter             PVID  802.1q   Pri
    tbvio1  adapter 9  (ent10)  160   162 164  1
    tbvio1  adapter 10 (ent11)  170   172 174  1
    tbvio1  adapter 11 (ent12)  199
    tbvio2  adapter 10 (ent10)  160   162 164  2
    tbvio2  adapter 12 (ent11)  170   172 174  2
    tbvio2  adapter 13 (ent12)  199

• In both VIOs, physical ent6 is one port on FCoE adapter 5708.

    $ mkvdev -sea ent6 -vadapter ent10,ent11 -default ent10 -defaultid 160 -attr ha_mode=sharing largesend=1 large_receive=yes ctl_chan=ent12
    ent9 Available

SEA Configuration ha_mode=sharing Sample config

• The entstat command on the SEA shows a number of things. First, tbvio1:

    $ entstat -all ent9 | more
    ...
    VLAN Ids :
        ent11: 170 172 174
        ent10: 160 162 164
    ...
    VID shared: 160 162 164
    Number of Times Server became Backup: 0
    Number of Times Server became Primary: 1
    High Availability Mode: Sharing
    Priority: 1

• And now in tbvio2:

    ...
    VLAN Ids :
        ent11: 170 172 174
        ent10: 160 162 164
    ...
    VID shared: 170 172 174
    Number of Times Server became Backup: 1
    Number of Times Server became Primary: 0
    High Availability Mode: Sharing
    Priority: 2
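
The two entstat outputs above can be compared mechanically. The sketch below, in Python, pulls the relevant fields out of captured text and checks that the two VIOs share disjoint VLAN sets that together cover every VLAN on the SEA. The function names are hypothetical, and the parsing assumes the line layout shown on the slide.

```python
import re

def parse_sea_ha(text):
    """Extract VLAN and high-availability fields from entstat text."""
    vlan_ids = set()
    for ids in re.findall(r"^\s*ent\d+:\s*([\d ]+)$", text, re.MULTILINE):
        vlan_ids.update(int(v) for v in ids.split())
    shared = re.search(r"VID shared:\s*([\d ]+)", text)
    mode = re.search(r"High Availability Mode:\s*(\w+)", text)
    priority = re.search(r"^\s*Priority:\s*(\d+)", text, re.MULTILINE)
    return {
        "vlan_ids": vlan_ids,
        "shared": {int(v) for v in shared.group(1).split()} if shared else set(),
        "mode": mode.group(1) if mode else None,
        "priority": int(priority.group(1)) if priority else None,
    }

def sharing_covers_all_vlans(a, b):
    """True when both SEAs are in Sharing mode and split the VLANs cleanly."""
    return (
        a["mode"] == b["mode"] == "Sharing"
        and not (a["shared"] & b["shared"])
        and (a["shared"] | b["shared"]) == a["vlan_ids"] == b["vlan_ids"]
    )

COMMON = "VLAN Ids :\n    ent11: 170 172 174\n    ent10: 160 162 164\n"
tbvio1 = parse_sea_ha(COMMON + "VID shared: 160 162 164\n"
                      "High Availability Mode: Sharing\nPriority: 1\n")
tbvio2 = parse_sea_ha(COMMON + "VID shared: 170 172 174\n"
                      "High Availability Mode: Sharing\nPriority: 2\n")
```

With the slide's values, `sharing_covers_all_vlans(tbvio1, tbvio2)` holds, which matches the expectation that each trunk adapter is active in exactly one VIO.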

SEA Configuration ha_mode=sharing Sample config

• Just a quick check that I put all virtual adapters on the correct virtual switch:

    $ entstat -all ent9 | grep "^Switch ID:"
    Switch ID: vswitch1

• Above, how do you match adapter ID with ent name?

    $ lsdev -type adapter -field name physloc | grep ent
    ent0   U78C0.001.DBJ4725-P2-C8-T1
    ent1   U9179.MHB.1026D1P-V1-C2-T1
    ent2   U9179.MHB.1026D1P-V1-C3-T1
    ent3   U9179.MHB.1026D1P-V1-C4-T1
    ent4
    ent5   U9179.MHB.1026D1P-V1-C7-T1
    ent6   U78C0.001.DBJ4725-P2-C6-T1
    ent7   U78C0.001.DBJ4725-P2-C6-T2
    ent8   U9179.MHB.1026D1P-V1-C8-T1
    ent9
    ent10  U9179.MHB.1026D1P-V1-C9-T1
    ent11  U9179.MHB.1026D1P-V1-C10-T1
    ent12  U9179.MHB.1026D1P-V1-C11-T1
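
In the listing above, the virtual adapter ID is the number after `-C` in a location code that contains `-V`. A minimal Python sketch of that lookup follows; the function name is hypothetical, and it assumes the location-code layout shown on the slide.

```python
import re

def virtual_slot_to_ent(lsdev_text):
    """Map virtual adapter ID (the -C number) to the ent device name."""
    mapping = {}
    for line in lsdev_text.splitlines():
        # Only virtual adapters carry a -V<lpar>-C<slot>-T<port> suffix.
        m = re.match(r"\s*(ent\d+)\s+\S+-V\d+-C(\d+)-T\d+", line)
        if m:
            mapping[int(m.group(2))] = m.group(1)
    return mapping

sample = """\
ent6   U78C0.001.DBJ4725-P2-C6-T1
ent9
ent10  U9179.MHB.1026D1P-V1-C9-T1
ent11  U9179.MHB.1026D1P-V1-C10-T1
ent12  U9179.MHB.1026D1P-V1-C11-T1
"""
slots = virtual_slot_to_ent(sample)
```

Here adapter IDs 9, 10 and 11 resolve to ent10, ent11 and ent12, while the physical port ent6 and the SEA ent9 are skipped.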

Dynamic VLANs

• Perhaps you have a running configuration, and you need to add an additional VLAN.
• First, what is running in the VIO?

    $ entstat -all ent9 | more
    ...
    VLAN Ids :
        ent11: 170 172 174
        ent10: 160 162 164
    ...
    VID shared: 160 162 164

• DLPAR, and "edit" the adapter.

Dynamic VLANs

• Checkbox the adapter, then Actions -> Edit.
• Type in the new VLAN id, hit Add, hit OK.

Dynamic VLANs

• Note the warning to make the same change on the SEA in the other VIO, then hit OK.
• Check entstat again for the new VLAN id:

    $ entstat -all ent9 | more
    ...
    VLAN Ids :
        ent11: 170 172 174
        ent10: 160 162 164 182
    ...
    VID shared: 160 162 164 182

SEA Configuration ha_mode=sharing

• If you have updated an existing VIO to 2.2.1.1, sharing might be missing in ODM as a valid value for ha_mode.
• Retrieve the ODM stanza:

    # odmget -q attribute=ha_mode PdAt > thing
    # cat thing
    PdAt:
            uniquetype = "adapter/pseudo/sea"
            attribute = "ha_mode"
            deflt = "disabled"
            values = "disabled,auto,standby"
            width = ""
            type = "R"
            generic = "DU"
            rep = "n"
            nls_index = 88

    # odmdelete -o PdAt -q attribute=ha_mode
    0518-307 odmdelete: 1 objects deleted

SEA Configuration ha_mode=sharing

• Edit thing, and add sharing to values:

    # cat thing
    PdAt:
            uniquetype = "adapter/pseudo/sea"
            attribute = "ha_mode"
            deflt = "disabled"
            values = "disabled,auto,standby,sharing"
            width = ""
            type = "R"
            generic = "DU"
            rep = "n"
            nls_index = 88

    # odmadd thing
    # exit
    $ chdev -dev entX -attr ha_mode=sharing

• Development is working on a fix for this.
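
The manual edit of the stanza file can also be done programmatically, which avoids typos in the values list. This is a minimal Python sketch that only rewrites the text of the saved stanza; the function name is hypothetical, and the result still has to be loaded with odmadd as on the slide.

```python
import re

def add_ha_mode_value(stanza, new_value="sharing"):
    """Append new_value to the values list of an ODM stanza, if absent."""
    def _patch(match):
        values = [v.strip() for v in match.group(2).split(",") if v.strip()]
        if new_value not in values:
            values.append(new_value)
        return '{}"{}"'.format(match.group(1), ",".join(values))

    # Rewrite only the first 'values = "..."' line in the stanza.
    return re.sub(r'(\bvalues\s*=\s*)"([^"]*)"', _patch, stanza, count=1)

stanza = '''PdAt:
        uniquetype = "adapter/pseudo/sea"
        attribute = "ha_mode"
        deflt = "disabled"
        values = "disabled,auto,standby"
'''
patched = add_ha_mode_value(stanza)
```

Running the function twice leaves the stanza unchanged the second time, so it is safe to apply to a file that has already been fixed.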

SEA Throughput

• In the VIO, which LPARs are getting how much traffic through the SEA?

    $ seastat -d ent5
    ========================================
    Advanced Statistics for SEA
    Device Name: ent5
    ========================================
    MAC: 32:43:23:7A:A3:02
    ------------
    VLAN: None
    VLAN Priority: None
    Hostname: mob76.dfw.ibm.com
    IP: 9.19.51.76
    Transmit Statistics:        Receive Statistics:
    --------------------        -------------------
    Packets: 9253924            Packets: 11275899
    Bytes: 10899446310          Bytes: 6451956041
    ========================================
    MAC: 32:43:23:7A:A3:02
    ------------
    VLAN: None
    VLAN Priority: None
    Transmit Statistics:        Receive Statistics:
    --------------------        -------------------
    Packets: 36787              Packets: 3492188
    Bytes: 2175234              Bytes: 272207726
    ========================================
    MAC: 32:43:2B:33:8A:02
    ------------
    VLAN: None
    VLAN Priority: None
    Hostname: sharesvc1.dfw.ibm.com
    IP: 9.19.51.239
    Transmit Statistics:        Receive Statistics:
    --------------------        -------------------
    Packets: 10                 Packets: 644762
    Bytes: 420                  Bytes: 484764292
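
With many client LPARs, the seastat report gets long, and ranking the entries by byte count answers the "who is getting how much" question faster. The following Python sketch parses captured seastat text under the assumption that transmit and receive counters sit side by side on one line, as in the output above; the function name is hypothetical.

```python
import re

def seastat_top_talkers(text):
    """Return per-MAC records sorted by total bytes, largest first."""
    records = []
    for chunk in re.split(r"(?m)^MAC:\s*", text)[1:]:
        counters = re.search(r"Bytes:\s*(\d+)\s+Bytes:\s*(\d+)", chunk)
        if not counters:
            continue  # skip records without a byte line
        host = re.search(r"Hostname:\s*(\S+)", chunk)
        records.append({
            "mac": chunk.split()[0],
            "host": host.group(1) if host else None,
            "tx_bytes": int(counters.group(1)),
            "rx_bytes": int(counters.group(2)),
        })
    return sorted(records, key=lambda r: r["tx_bytes"] + r["rx_bytes"],
                  reverse=True)

sample = """\
MAC: 32:43:2B:33:8A:02
Hostname: sharesvc1.dfw.ibm.com
Packets: 10                 Packets: 644762
Bytes: 420                  Bytes: 484764292
MAC: 32:43:23:7A:A3:02
Hostname: mob76.dfw.ibm.com
Packets: 9253924            Packets: 11275899
Bytes: 10899446310          Bytes: 6451956041
"""
ranked = seastat_top_talkers(sample)
```

With the slide's numbers, mob76.dfw.ibm.com ranks first, well ahead of sharesvc1.dfw.ibm.com.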

SEA Throughput

• What is the total aggregate packet count on the SEA? In the VIO, as root, after $ oem_setup_env:

    # ./sk_sea
    Usage: sk_sea -i interval -a adapter
        -i interval (seconds)
        -a adapter
        -h or -?

    # ./sk_sea -i 10 -a ent5
    net to SEA-->   341656869    250416752  <--to net from SEA
    SEA to virt-->  341656842    250416752  <--to SEA from virt
    net to SEA-->   1089         535        <--to net from SEA
    SEA to virt-->  1089         535        <--to SEA from virt
    net to SEA-->   804          523        <--to net from SEA
    SEA to virt-->  804          523        <--to SEA from virt
    net to SEA-->   902          537        <--to net from SEA
    SEA to virt-->  902          537        <--to SEA from virt
    net to SEA-->   1125         620        <--to net from SEA
    SEA to virt-->  1125         620        <--to SEA from virt

SEA Throughput

    $ chdev -dev ent7 -attr accounting=enabled

• VIO topas, then uppercase E:

    Topas Monitor for host: mdvio1    Interval: 2    Wed Apr 3 12:15:55 2013
    ========================================
    Network                KBPS    I-Pack  O-Pack  KB-In   KB-Out
    ent7 (SEA PRIM)        4825.6  3100.1  3099.6  2412.8  2412.8
    |--ent5 (PHYS)         2412.9  1794.3  1306.8  2293.5  119.4
    |--ent2 (VETH)         2412.7  1305.8  1792.8  119.3   2293.4
     --ent4 (VETH CTRL)    1.9     0.0     5.5     0.0     1.9
    lo0                    0.0     0.0     0.0     0.0     0.0

• To see SEA traffic in VIO topas, you must have the IP address on the SEA interface (en7 here), and not on a "side" virtual adapter.

Virtual Switch - VEB versus VEPA mode

• Virtual Ethernet Bridging, VEB mode (what we've always done).
• Virtual Ethernet Port Aggregator, VEPA mode, part of IEEE 802.1Qbg. (This is not Link Aggregation.)
• At HMC 777 and POWER firmware stream 760, we now can specify that a virtual switch is VEB or VEPA.
• Attaching an LPAR to a VEPA mode switch requires Virtual Station Interface (VSI) configuration information for the LPAR, from the network administrator.
• You may also see the acronym VSN, Virtual Server Networking.
• VEPA gives us the ability to isolate LPARs that are on the same subnet. LPAR to LPAR traffic for these peers is forced out of the machine, to the customer enterprise network, subject to their firewall and filtering.

Virtual Switch in Virtual Ethernet Bridging (VEB) mode

• Virtual to physical bridging is allowed.
• We never bridge layer 2 physical to physical, nor do we IP route layer 3.
• Virtual to virtual traffic stays within the hypervisor virtual switch. Some shops want to restrict this.

Virtual Switch in Virtual Ethernet Port Aggregation (VEPA) mode

[Diagram: virtual switch in VEPA mode]

Virtual switch VEPA Mode

• LPAR to LPAR traffic is forced out to the enterprise switch for firewall and filtering.

Before VEPA, Isolation with VEB mode

[Diagram: client LPARs on PVID 1 through PVID 6. VIO Server 1 and VIO Server 2 each have SEA ent4 on physical adapter ent0, with control channel ent3 on VLAN 99, connected to an Ethernet switch.]

• Up to 16 LPARs, each on its own PVID.
• Up to 16 virtuals join into one SEA.
• Tagged or untagged, these will not reach each other within the hypervisor.
• ctl_chan, SEA failover, ha_mode=sharing might work here.

VSI discovery and configuration

• Do not try to configure VEPA or VSI before the network admin does.

VEPA - Server must be VSN Phase 2 Capable

    hmca62:~ # lssyscfg -r sys -m wiz -F name,state,ipaddr,type_model,serial_num,vsn_phase2_capable,vsi_on_veth_capable
    wiz,Operating,10.33.5.110,8231-E2B,108854P,1,1

• Check from the HMC command line or the HMC browser GUI.

VEPA - Virtual Switch: List Virtual Switch

• The switch mode is a new property.
• Switches are created in VEB mode. Modify the switch mode after SEAs are configured.

VEPA - Virtual Ethernet adapter VSI Profile data

• Can be configured at LPAR creation, or DLPAR modified.
• Virtual Station Interface is configured on the Advanced tab.

VEPA - No VSI Profile checkbox

• If you have Virtual Station Interface config info on the virtual Ethernet adapter in the profile, but it cannot configure, Activate will fail.
• Go back to Activate, and checkbox "No VSI Profile" to bypass your config info.

VEPA - Other configuration effects

• The network admin will also provide vsi_manager_id, vsi_type_id, and vsi_type_version attribute values that we use as advanced attributes on the bridged virtual Ethernet adapter that we join into the SEA. (VSI: Virtual Station Interface)
• lldpd was already running on the VIO server at 2.2:

    $ lssrc -s lldpd
    Subsystem  Group  PID      Status
    lldpd      tcpip  6750426  active

• As root on the VIO, you can check if any SEAs are already under lldpctl:

    # lldpctl show portlist
    lldpctl: 0812-001 lldpd is currently not managing any ports

• There is an lldpsvc attribute on the SEA that you create. You will chdev it:

    $ lsdev -dev ent7 -attr | grep lldpsvc
    lldpsvc  no  Enable IEEE 802.1qbg services
    $ chdev -dev ent7 -attr lldpsvc=yes

• If you ever need to remove this SEA, you must first set lldpsvc back to no.
• The control channel between two VIOs, two SEAs, must NOT attach to the VEPA switch; it must attach to a VEB switch.
• The physical adapter in a VEPA SEA may NOT be a link aggregation or EtherChannel. Single 10 Gb adapter, SEA Failover, ha_mode=sharing, potentially still 20 Gb bandwidth.
• http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/advanced/content.jsp?topic=/p7hb1/iphb1_config_vsn.htm

AIX Virtual Ethernet adapter

• Virtual adapters in AIX in high end (large fabric bus, 770-795) P7 machines:

    # chdev -l ent0 -a dcbflush_local=yes -P        (in nim script, before first boot)
    ent0 changed

• ifconfig largesend onto AIX interfaces:

    # ifconfig en0 largesend
    # echo "ifconfig en0 largesend" >> /etc/rc.net        (for reboot)

• At 7100-01-01-1141 (also 6100-04-05), we see the mtu_bypass ODM attribute, which sets largesend:

    # chdev -l en0 -a mtu_bypass=on

  This changes the configured interface dynamically and inserts the ODM value; -P is not required.

AIX Virtual Ethernet adapter
§ If you happen to observe hypervisor send or receive failures…
    # entstat -d ent0 | grep -i hypervisor
    Hypervisor Send Failures: 0
    Hypervisor Receive Failures: 4250
§ You could review buffer allocation history on the virtual adapter
    # entstat -d ent0
    …
    Receive Information
      Receive Buffers
        Buffer Type          Tiny    Small   Medium    Large     Huge
        Min Buffers           512      512      128       24       24
        Max Buffers          2048     2048      256       64       64
        Allocated             512      512      128       24       24
        Registered            512      511      128       24       24
        History
          Max Allocated       522     1349      133       29       47
          Lowest Registered   502      502      123       19       19
§ Consider increasing minimum tiny and minimum small to a level above Max Allocated
    # chdev -l ent0 -a min_buf_tiny=1024 -P
    # chdev -l ent0 -a min_buf_small=2048 -P
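
The sizing rule on this slide (raise a minimum to a level above Max Allocated) can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, rounding up to the next power of two is an assumption chosen because it happens to reproduce the 1024 and 2048 values used above, and the current minimums of 512 are assumed to be the usual AIX defaults.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two strictly greater than n."""
    p = 1
    while p <= n:
        p *= 2
    return p


def suggest_min_buffers(current_min: dict, max_allocated: dict) -> dict:
    """Suggest a new minimum only for pools whose peak exceeded the current minimum."""
    suggestions = {}
    for pool, peak in max_allocated.items():
        if peak > current_min[pool]:
            suggestions[pool] = next_power_of_two(peak)
    return suggestions


# Assumed current minimums (AIX defaults) and the Max Allocated history from entstat.
current_min = {"tiny": 512, "small": 512}
max_allocated = {"tiny": 522, "small": 1349}

# Print the chdev commands an admin might then review; nothing is executed here.
for pool, value in suggest_min_buffers(current_min, max_allocated).items():
    print(f"chdev -l ent0 -a min_buf_{pool}={value} -P")
```

Only the tiny and small pools are passed in, matching the two attributes the slide changes; any suggested value should still be checked against the corresponding max_buf_* setting before it is applied.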

Default TCP settings are usually sufficient
    # no -o use_isno
    use_isno = 1
§ Remember, interface specific network options (isno) are on by default. What you see with ifconfig is what is in force
    # ifconfig en0
    en0: flags=1e080863,4c0
    inet 9.19.51.148 netmask 0xffffff00 broadcast 9.19.51.255
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
§ For physical adapters in AIX, tcp_sendspace, tcp_recvspace, and rfc1323 may not be at the values shown in the ifconfig output above
    # chdev -l en0 -a tcp_sendspace=262144
    # chdev -l en0 -a tcp_recvspace=262144
    # chdev -l en0 -a rfc1323=1
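
As a rough illustration of why a 262144-byte window is usually enough on a low-latency network, the sketch below applies the standard window-per-round-trip bound for a single TCP connection. The round-trip times are hypothetical values chosen for illustration, not measurements from this deck, and the function name is made up for the example.

```python
def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: at most one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


WINDOW = 262144  # tcp_recvspace value shown by ifconfig on this slide

# Hypothetical round-trip times, for illustration only.
for label, rtt in [("0.1 ms", 0.0001), ("1 ms", 0.001), ("10 ms", 0.010)]:
    bound = max_throughput_gbps(WINDOW, rtt)
    print(f"RTT {label}: at most {bound:.2f} Gb/s per connection")

# A window above 65535 bytes relies on TCP window scaling, which rfc1323=1 enables.
needs_window_scaling = WINDOW > 65535
```

The bound shrinks in proportion to the round-trip time, which is why long-distance links are the usual case where these defaults deserve a second look.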

TCP small packet, chatty conversations
§ There are two ways that TCP slows down conversations that send small packets
  - The Nagle algorithm on the sender prevents more than one small packet from being outstanding: you must wait for a small segment to be acknowledged before you may transmit another
  - Delayed acknowledgement on the receiver says it may wait up to 200 ms before sending an acknowledgement, just in case data arrives on the socket to be transmitted
§ TCP does a good job of aggregating small writes to the socket into full size segments, and then transmitting. But if you KNOW you have a small packet, time sensitive application, you can…
    # ifconfig en0 tcp_nodelay 1         (a sender setting, turns off Nagle)
    # chdev -l en0 -a tcp_nodelay=1      (a sender setting, turns off Nagle across reboots)
    # no -p -o tcp_nodelayack=1          (a receiver setting, turns off delayed acknowledgement)
  Remember that both peers on a TCP connection act as sender and receiver
§ Optional: no -p -o tcp_nagle_limit=0 (or 1), no -p -o tcp_nagleoverride=1

TCP small packet, chatty conversations
§ What if you make the changes on the previous slide, and see no difference? Your sockets based application may ALREADY be setting these options on the socket. Unless you are editing and compiling the source code, you don’t control this
    int on=1;
    setsockopt(s, IPPROTO_TCP, TCP_NODELAYACK, &on, sizeof(on));
§ http://publib.boulder.ibm.com/infocenter/pseries/v5r3/topic/com.ibm.aix.commtechref/doc/commtrf2/setsockopt.htm
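
To show what "the application sets the option on its own socket" looks like in practice, here is a minimal Python sketch. It uses the portable sender-side option TCP_NODELAY (the per-socket way to turn off Nagle); the receiver-side TCP_NODELAYACK constant from the C line above is AIX-specific and is not used here. The two helper names are invented for the example.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Ask the stack to send small segments immediately (sets TCP_NODELAY)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on this socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# No connection is needed; the option can be set on a freshly created TCP socket.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("TCP_NODELAY before:", nagle_disabled(s))
    disable_nagle(s)
    print("TCP_NODELAY after: ", nagle_disabled(s))
```

An application that already does this per socket will behave the same whether or not the interface-level tcp_nodelay attribute is changed, which is one reason the system-wide change may show no difference.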

Default NFS Settings
§ Default NFS settings are usually sufficient
    # nfso -F -a | egrep "threads|socketsize"
    nfs_max_threads = 3891
    nfs_socketsize = 600000
    nfs_tcp_socketsize = 600000
    statd_max_threads = 50
§ AIX NFS client mount options
  - dio: direct I/O, bypasses AIX caching of file pages written to the NFS server (think Oracle rman backups to NAS). Reduces memory demand in AIX, reduces lrud running, reduces scans and frees. Be aware, this turns off readahead. If you ever have to restore from the same NAS, umount, and mount without dio
  - biods=n: AIX 5.3 defaulted to 4 biods per NFS mount, not sufficient. AIX 6.1 and 7.1 default to 32 biods per NFS mount, usually sufficient.
§ Do not expect NFS throughput to be close to what you measure at the TCP layer.

largesend, large_receive attributes for performance
§ ifconfig en0 largesend, LPAR to LPAR, virtual to virtual, in the same machine
  - single stream, binary FTP dd test
  - 1 Gb per second without largesend
  - 3.8 Gb per second with largesend
  - slightly higher CPU on sender, slightly lower CPU on receiver
§ largesend=1 on SEA, with largesend on client interfaces: much lower CPU in sender, and in sending VIO
§ All with MTU at 1500. No jumbo frames requirement

largesend on client IP interface, and largesend on SEA, LPARs on different servers
§ Sender fahr on P5, receiver mob29 on P7
§ From fahr to mob29 (P5 to P7), largesend off on LPAR interfaces, largesend 0 on SEAs
    8589934592 bytes sent in 82.17 seconds
    8589934592 bytes sent in 82.46 seconds
    8589934592 bytes sent in 82.17 seconds
    8589934592 bytes sent in 84.43 seconds
    cpu: .59-.64 on receiver, .95-1.02 on sender
§ From fahr to mob29 (P5 to P7), largesend ON on LPAR interfaces, largesend 0 on SEAs
    8589934592 bytes sent in 83.53 seconds
    8589934592 bytes sent in 82.69 seconds
    8589934592 bytes sent in 83.25 seconds
    8589934592 bytes sent in 82.85 seconds
    cpu: .95-1.05 on sender, .93-1.00 on receiving VIO, .90-.99 on sending VIO
§ From fahr to mob29 (P5 to P7), largesend ON on LPAR interfaces, largesend 1 on SEAs (slightly higher thruput, much lower sending CPU, did not reboot)
    8589934592 bytes sent in 75.15 seconds
    8589934592 bytes sent in 74.87 seconds
    8589934592 bytes sent in 75.12 seconds
    8589934592 bytes sent in 74.79 seconds
    cpu: .67-.69 on receiver, .40-.45 on sender (big drop), 1.02-1.04 on receiving VIO, .21-.22 on sending VIO (big drop)
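
The ftp summary lines on this slide give bytes and seconds rather than a rate, so a small script helps when comparing scenarios. The sketch below averages the four runs of the first and third scenarios in decimal gigabits per second; the function name and the choice of a simple mean are assumptions made for illustration. By this arithmetic the two averages come out at roughly 0.83 and 0.92 Gb/s.

```python
import re

# Matches the summary line that the AIX ftp client prints after a transfer.
SUMMARY = re.compile(r"(\d+) bytes sent in ([\d.]+) seconds")


def mean_gbps(ftp_lines):
    """Average rate, in decimal Gb/s, over a list of ftp summary lines."""
    rates = []
    for line in ftp_lines:
        match = SUMMARY.search(line)
        if match:
            size, secs = int(match.group(1)), float(match.group(2))
            rates.append(size * 8 / secs / 1e9)
    if not rates:
        raise ValueError("no ftp summary lines found")
    return sum(rates) / len(rates)


largesend_off = [
    "8589934592 bytes sent in 82.17 seconds",
    "8589934592 bytes sent in 82.46 seconds",
    "8589934592 bytes sent in 82.17 seconds",
    "8589934592 bytes sent in 84.43 seconds",
]
largesend_on_with_sea = [
    "8589934592 bytes sent in 75.15 seconds",
    "8589934592 bytes sent in 74.87 seconds",
    "8589934592 bytes sent in 75.12 seconds",
    "8589934592 bytes sent in 74.79 seconds",
]

off = mean_gbps(largesend_off)
on = mean_gbps(largesend_on_with_sea)
print(f"largesend off: {off:.3f} Gb/s, largesend on LPARs and SEAs: {on:.3f} Gb/s")
print(f"relative change: {100 * (on / off - 1):.1f}%")
```

The throughput gain is modest; the more striking change on the slide is the drop in sender and sending-VIO CPU, which this calculation does not capture.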

Binary ftp with dd input, for network bandwidth
§ The test is from AIX 5L Practical Performance Tools and Tuning Guide
  http://www.redbooks.ibm.com/abstracts/sg246478.html?Open
§ To test ftp bandwidth between two peers, start with a .netrc file in one user's home directory like this:
    # cat ./.netrc
    machine mob26.dfw.ibm.com login root password roots_password
    macdef init
    bin
    put "|dd if=/dev/zero bs=8k count=2097152" /dev/null
    quit

  (note the blank line in the file, after quit. chmod 700 .netrc)

Binary ftp with dd input for network bandwidth
§ Now, repeatedly send a 16 GB file to the peer machine
    # while true
    do
    ftp mob26.dfw.ibm.com
    done
§ Session output (ctl-c to quit the loop)
    Connected to mob26.dfw.ibm.com.
    220 mob26.dfw.ibm.com FTP server (Version 4.2 Wed Dec 23 11:06:15 CST 2009) ready.
    331 Password required for root.
    230-Last unsuccessful login: Tue May 3 08:49:32 2011 on /dev/pts/0 from sig-9-65-204-36.mts.ibm.co
    230-Last login: Thu May 26 17:15 2011 on ftp from ams28.dfw.ibm.com
    230 User root logged in.
    bin
    200 Type set to I.
    put "|dd if=/dev/zero bs=8k count=2097152" /dev/null
    200 PORT command successful.
    150 Opening data connection for /dev/null.
    2097152+0 records in.
    2097152+0 records out.
    226 Transfer complete.
    17179869184 bytes sent in 44.35 seconds (3.783e+05 Kbytes/s)
    local: |dd if=/dev/zero bs=8k count=2097152 remote: /dev/null
    quit
    221 Goodbye.

Binary ftp with dd input for network bandwidth
§ These results were virtual to virtual, inside the machine
§ The math on that: 16 GB, or 128 Gb, transferred in 44.35 sec is 2.88 Gb/sec on a single TCP connection (counting 1 Gb as 2^30 bits; about 3.1 Gb/sec in decimal units). I had THREE of these sessions running simultaneously between two LPARs. Sender at about 4.75 CPU, receiver about 1.25 CPU. Both LPARs were uncapped, POWER7, SMT-4, 3.1 GHz, six virtuals in each.
§ We are seeing nearly 9 Gb/sec between these two peers, virtual to virtual inside a POWER7. Default isno settings on interfaces: tcp_sendspace and tcp_recvspace both at 262144, rfc1323 on. MTU still 1500, but with ifconfig en0 largesend on both peers.
§ Another 10 Gb performance reference
  https://www.ibm.com/developerworks/wikis/download/attachments/153124943/7_PowerVM_10Gbit_Ethernet.pdf?version=1
  Gareth Coates, IBM UK Advanced Technical Support, suggests higher thruput may be obtained by more trunked virtual adapters in the SEA. ha_mode=sharing requires at least 2. In a tagged environment, perhaps you would use 4, for four different 802.1q “additional VLANs,” one per trunked virtual adapter.
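
The arithmetic on this slide can be checked with a short Python sketch. The inputs come from the dd and ftp lines on the previous slides; treating all three sessions as running at the same rate is an assumption made only to reproduce the "nearly 9 Gb/sec" figure, and the variable names are illustrative.

```python
BLOCK_SIZE = 8 * 1024      # dd bs=8k
BLOCK_COUNT = 2_097_152    # dd count=2097152
ELAPSED = 44.35            # seconds reported by ftp for one transfer
SESSIONS = 3               # simultaneous ftp sessions described on the slide

total_bytes = BLOCK_SIZE * BLOCK_COUNT
total_bits = total_bytes * 8

# The slide counts 1 Gb as 2**30 bits; network gear usually quotes 10**9 bits.
binary_gbps = total_bits / 2**30 / ELAPSED
decimal_gbps = total_bits / 1e9 / ELAPSED

print(f"bytes per transfer: {total_bytes}")
print(f"single session: {binary_gbps:.2f} Gb/s (binary), {decimal_gbps:.2f} Gb/s (decimal)")
print(f"{SESSIONS} sessions, assuming equal rates: {binary_gbps * SESSIONS:.2f} Gb/s (binary)")
```

Keeping the two unit conventions apart matters when comparing these ftp figures with iperf output, which reports decimal Gbits/sec.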

iperf as alternative to ftp with dd
§ Google “iperf aix”
§ http://www.perzl.org/aix/index.php?n=Main.Iperf
§ (http://rpmfind.net/linux/rpm2html/search.php?query=iperf for Linux)

iperf server side
§ Single thread, 2 threads, 3 threads. LPAR to LPAR, within machine
§ iperf reports a 16.0 KByte default window here; actually, ifconfig shows what is truly in force
    root@sq08.dfw.ibm.com / # iperf -s
    ------------------------------
    Server listening on TCP port 5001
    TCP window size: 16.0 KByte (default)
    ------------------------------
    [ 4] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46393
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.0 sec 8.36 GBytes 7.17 Gbits/sec
    [ 4] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46396
    [ 5] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46397
    [ 4] 0.0-10.0 sec 6.01 GBytes 5.16 Gbits/sec
    [ 5] 0.0-10.0 sec 6.02 GBytes 5.17 Gbits/sec
    [SUM] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
    [ 4] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46399
    [ 5] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46400
    [ 6] local 9.19.51.90 port 5001 connected with 9.19.51.115 port 46401
    [ 4] 0.0-10.1 sec 4.78 GBytes 4.05 Gbits/sec
    [ 5] 0.0-10.1 sec 4.66 GBytes 3.95 Gbits/sec
    [ 6] 0.0-10.1 sec 4.88 GBytes 4.14 Gbits/sec
    [SUM] 0.0-10.1 sec 14.3 GBytes 12.1 Gbits/sec
    ^C
    root@sq08.dfw.ibm.com / # ifconfig en0
    en0: flags=1e080863,4c0
    inet 9.19.51.90 netmask 0xffffff00 broadcast 9.19.51.255
    tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1

iperf client side
§ Single thread, 2 threads, 3 threads. LPAR to LPAR, within machine
§ Hmm. The 256 KByte window reported by the client is the correct tcp_recvspace in this case
    root@fahr / # iperf -c sq08
    ------------------------------
    Client connecting to sq08, TCP port 5001
    TCP window size: 256 KByte (default)
    ------------------------------
    [ 3] local 9.19.51.115 port 46393 connected with 9.19.51.90 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 8.36 GBytes 7.18 Gbits/sec
    root@fahr / # iperf -c sq08 -P 2
    ------------------------------
    Client connecting to sq08, TCP port 5001
    TCP window size: 256 KByte (default)
    ------------------------------
    [ 4] local 9.19.51.115 port 46397 connected with 9.19.51.90 port 5001
    [ 3] local 9.19.51.115 port 46396 connected with 9.19.51.90 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.0 sec 6.02 GBytes 5.17 Gbits/sec
    [ 3] 0.0-10.0 sec 6.01 GBytes 5.16 Gbits/sec
    [SUM] 0.0-10.0 sec 12.0 GBytes 10.3 Gbits/sec
    root@fahr / # iperf -c sq08 -P 3
    ------------------------------
    Client connecting to sq08, TCP port 5001
    TCP window size: 256 KByte (default)
    ------------------------------
    [ 3] local 9.19.51.115 port 46401 connected with 9.19.51.90 port 5001
    [ 4] local 9.19.51.115 port 46399 connected with 9.19.51.90 port 5001
    [ 5] local 9.19.51.115 port 46400 connected with 9.19.51.90 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 4.88 GBytes 4.19 Gbits/sec
    [ 4] 0.0-10.0 sec 4.78 GBytes 4.10 Gbits/sec
    [ 5] 0.0-10.0 sec 4.66 GBytes 4.01 Gbits/sec
    [SUM] 0.0-10.0 sec 14.3 GBytes 12.3 Gbits/sec
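
When several parallel streams are involved it can be handy to re-add the per-stream figures and compare them with iperf's own [SUM] line. The sketch below does that for text in the iperf 2 report layout shown on this slide; the regular expression and the function name are assumptions written for this layout and may need adjusting for other iperf versions.

```python
import re

# One per-stream result line, e.g. "[  4]  0.0-10.0 sec  6.02 GBytes  5.17 Gbits/sec".
# The [SUM] line is skipped because its bracket does not contain a stream number.
STREAM = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec"
)


def sum_streams(report: str) -> float:
    """Add up the Gbits/sec value of every per-stream line in an iperf report."""
    return sum(float(m.group(1)) for m in STREAM.finditer(report))


two_streams = """\
[  4]  0.0-10.0 sec  6.02 GBytes  5.17 Gbits/sec
[  3]  0.0-10.0 sec  6.01 GBytes  5.16 Gbits/sec
[SUM]  0.0-10.0 sec  12.0 GBytes  10.3 Gbits/sec
"""

print(f"per-stream total: {sum_streams(two_streams):.2f} Gbits/sec")
```

Connection lines such as "[ 4] local ... connected with ..." do not match the pattern, so a full captured report can be passed in unchanged.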

iperf client side continued
§ Turning off largesend
§ 3 threads. LPAR to LPAR, within machine, MUCH LOWER THRUPUT
    root@fahr /export/res # chdev -l en0 -a mtu_bypass=off
    en0 changed
    root@fahr /export/res # iperf -c sq08 -P 3
    ------------------------------
    Client connecting to sq08, TCP port 5001
    TCP window size: 256 KByte (default)
    ------------------------------
    [ 5] local 9.19.51.115 port 46634 connected with 9.19.51.90 port 5001
    [ 3] local 9.19.51.115 port 46632 connected with 9.19.51.90 port 5001
    [ 4] local 9.19.51.115 port 46633 connected with 9.19.51.90 port 5001
    [ ID] Interval Transfer Bandwidth
    [ 5] 0.0-10.0 sec 455 MBytes 381 Mbits/sec
    [ 3] 0.0-10.0 sec 452 MBytes 379 Mbits/sec
    [ 4] 0.0-10.0 sec 482 MBytes 404 Mbits/sec
    [SUM] 0.0-10.0 sec 1.36 GBytes 1.16 Gbits/sec

iperf thruput - FCoE adapter
[Diagram: VIO1 and VIO2 (both 2.2.1.4, 6100-06), each with an IP address on physical ent0; Client LPAR1 and Client LPAR2 (both 7100-01-04); Cisco Nexus 5010 switch]
§ iperf, 4 parallel, 120 sec: 4.60 Gb/sec VIO-VIO, IP on physical
§ FCoE 10 Gb physical adapters, feature 5708
§ Server 9179-MHB, 780B model, 4144 MHz
§ 5802 drawers, PCIe Gen1
§ 0.85 CPU on sender, 1.20 CPU on receiver

iperf thruput - FCoE adapter
[Diagram: VIO1 and VIO2 (both 2.2.1.4, 6100-06), each with an IP address on an SEA over physical ent0; Client LPAR1 and Client LPAR2 (both 7100-01-04); Cisco Nexus 5010 switch]
§ iperf, 4 parallel, 120 sec: 4.31 Gb/sec VIO-VIO, IP on SEA
§ FCoE 10 Gb physical adapters, feature 5708
§ Server 9179-MHB, 780B model, 4144 MHz
§ 5802 drawers, PCIe Gen1
§ 1.0 CPU consumed on sender, 1.10 consumed on receiver

iperf thruput - FCoE adapter, and SEA
[Diagram: VIO1 and VIO2 (both 2.2.1.4, 6100-06), each with an SEA over physical ent0 and virtual adapters ent1 and ent2, one SEA on VLAN 201 and the other on VLAN 202; Client LPAR1 and Client LPAR2 (both 7100-01-04), one with its IP address on VLAN 201 and the other on VLAN 202; Cisco Nexus 5010 switch]
§ iperf, 4 parallel, 120 sec: 4.16 Gb/sec Client-Client
§ Independent SEAs, different PVIDs 201, 202
§ FCoE 10 Gb physical adapters, feature 5708
§ Server 9179-MHB, 780B model, 4144 MHz
§ 5802 drawers, PCIe Gen1
§ CPU: sending AIX 1.0, receiving AIX 1.1
§ CPU: sending VIO 1.0, receiving VIO 1.3
§ LPAR2, receiving AIX, netstat -I en1 10: 45K packets/sec receive, 23K packets/sec transmit

iperf 10 Gb, SEA
§ If you are getting less than the values on the two previous slides…
§ It appears that LARGESEND is on physical 10 Gb adapter interfaces automatically, but you can set it explicitly
    $ chdev -dev en4 -attr mtu_bypass=on
§ Check that largesend, large_receive are on SEA at both ends
    $ chdev -dev ent4 -attr largesend=1 large_receive=yes
§ Check that mtu_bypass (largesend) is on AIX client LPAR interfaces
    # chdev -l en0 -a mtu_bypass=on
§ Watch CPU usage in both VIOs and both client LPARs during the iperf interval, and make sure no LPAR is pegged or starving
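
The checklist above lends itself to a simple comparison of observed attribute values against the values the slide suggests. The Python sketch below is illustrative only: the dictionary layout, the device labels, and the function name are hypothetical, and the observed values are assumed to have been copied by hand from lsdev or lsattr output rather than collected by the script.

```python
# Settings suggested on this slide, keyed by (where, attribute).
RECOMMENDED = {
    ("VIO interface", "mtu_bypass"): "on",
    ("SEA", "largesend"): "1",
    ("SEA", "large_receive"): "yes",
    ("client interface", "mtu_bypass"): "on",
}


def find_mismatches(observed: dict) -> list:
    """Return (where, attribute, observed, wanted) for each setting that differs."""
    problems = []
    for key, wanted in RECOMMENDED.items():
        actual = observed.get(key)
        if actual != wanted:
            problems.append((key[0], key[1], actual, wanted))
    return problems


# Hypothetical observations for one VIO/client pair.
observed = {
    ("VIO interface", "mtu_bypass"): "on",
    ("SEA", "largesend"): "1",
    ("SEA", "large_receive"): "no",
    ("client interface", "mtu_bypass"): "on",
}

for where, attr, actual, wanted in find_mismatches(observed):
    print(f"{where}: {attr} is {actual!r}, slide suggests {wanted!r}")
```

Running the same comparison for both VIOs and both client LPARs covers the "at both ends" part of the checklist; CPU headroom still has to be watched separately during the iperf run.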

Introduction - Power is Performance Redefined
§ For the past 10 years, through sustained investment in the Power Systems platform, IBM has gone head-to-head with our competitors in the UNIX market segment, and we won. Today, according to IDC, IBM commands a 47 percent share of the worldwide UNIX market segment [1]. The next 10 years, however, will be about helping our clients implement smarter computing. And for the Power Systems platform, that battle will center on our alternative value proposition to Linux and Microsoft Windows technology on x86 servers. To do that, we need to move beyond talking about pure system performance and industry benchmarks to placing a sharper focus on the performance of our clients’ businesses and the business benefits of the IBM Power Systems platform.
§ Industry benchmarks and our IBM POWER® processor technology are, and always will be, important. In the past they have enabled us to clearly and succinctly demonstrate our leadership position in terms of POWER processor performance versus our competitors. And we will continue to set those leadership benchmarks for the industry. But today the conversation must go beyond the performance of our systems and be framed in the broader context of smarter computing. Power is performance redefined sets out how we intend to shift the conversation with our clients. It defines how the Power Systems platform, and our associated software and services, can enable our clients to embrace smarter computing and derive business benefits from implementing big data, workload optimized infrastructure and cloud projects. In this messaging guide, you will learn that smarter computing isn’t a product we sell; it isn’t something clients can buy. Smarter computing is something our clients can implement through projects on the Power Systems platform to achieve better business outcomes. And it is smarter computing, enabled by IBM Power Systems servers, that will help our clients deliver services faster, with higher quality and with superior economics.
[1] IDC, “UNIX Server Rolling Four Quarter Average Revenue Share,” Worldwide Quarterly Server Tracker, 2Q 2011.

Smarter Computing - The Next Era of IT
§ But this radical change is placing enormous pressure on businesses of practically every size, in just about every industry. The barriers of entry for competitors are lower. Companies can be blindsided by competitors that appear seemingly out of nowhere and seize market share by the handful. The need to be proactive, which requires an agile, flexible human and IT infrastructure, is critical. New ways of working, such as social media and mobile technologies, must be embraced ahead of the curve. Even customers are changing. Newly empowered by information, their expectations and the number of influencers that must be marketed to are rising. External forces such as compliance, regulations, privacy and security threats have to be addressed to survive. Ubiquitous mobile devices and instrumented, intelligent objects are creating unimaginable amounts of data volumes every day: data that must be analyzed to reveal systemic patterns, trends and insights that in turn inform the decisions businesses must make to stay competitive. And to deal with these changes, IT architectures must move from heterogeneous silos to flexible, workload optimized infrastructures. All of these forces must be dealt with in an era of tighter budgets and the directive to do more with less.
§ But smarter companies are thinking differently about computing and how to deal with data that is growing exponentially and can become stagnant and unexploited simply because of its sheer volume. These smarter companies are breaking the vicious cycle of untrustworthy data, inflexibility and sprawl. They are reversing the always-guessing, reactive, costly IT conundrum by embracing what we call smarter computing. What smarter computing entails is the creation of an IT infrastructure that is designed for data and that harnesses enterprise information to unlock insights and make better, more informed choices. Organizations embracing smarter computing are creating IT infrastructures that are tuned to the task of the business, helping reduce costs by driving greater efficiency and performance for virtually every workload. And smarter computing is managed with cloud technologies, speeding delivery of services and creating an IT environment that has practically no boundaries, enabling the reinvention of processes and driving innovation.
§ But to be clear, smarter computing isn’t just a catch phrase or a lofty idea. It’s not a metaphor, intro paragraph or headline. It’s what the IBM Power Systems platform enables our clients to do. And this is the basis for our new brand identity, Power is performance redefined. It’s about how we believe clients measure IT performance, focusing less on processor performance and more on business performance. It’s about our clients’ ability to react more quickly to change, to innovate faster, and to seize new opportunities as they arise. It’s about their ability to handle rapid growth and combat emerging competitors while responding to demands to meet increasingly higher service levels. And it’s about doing more with less and delivering services within constrained IT budgets. We believe that with a new focus on business performance, we will enable our clients to deliver services faster, with higher quality and superior economics. Our message to clients is that, with Power Systems solutions, we can help them achieve these goals as they deploy smarter computing projects.

Power is Performance Redefined
§ In this new smarter computing era for business and IT, forward-thinking companies consider more than server performance, existing skills and ease of management when choosing a platform for new application workloads. They also evaluate how well the platform will help them achieve three core business objectives: delivering services faster, with higher quality and superior economics.
§ By implementing smarter computing projects on an IBM Power Systems platform, businesses can outpace their competitors by delivering services faster. They can differentiate their offerings from the competition by delivering higher quality services. And they can turn operational cost into investment opportunity by delivering services with superior economics.

Power is Performance Redefined
§ Deliver services faster
  - A key measure of performance for IT today is around agility and the ability of IT to help the business gain a competitive edge and capitalize on emerging opportunities. Businesses need to simplify and integrate their IT infrastructure to deliver services faster.
  - The IBM Power Systems platform features deep integration and optimization across operating systems, databases and middleware for simpler, and more flexible, service delivery. Optimized with PowerVM virtualization for rapid cloud provisioning, clients can speed the delivery and deployment of new applications and processes to support their strategic business initiatives.
§ Deliver services with higher quality
  - Today’s IT departments are also measured on their ability to provide an infrastructure that can address demands for increased application service levels while at the same time balancing rapid change with managing business risk. Businesses need an integrated approach to managing security, resiliency and business risk to deliver higher quality services.
  - The IBM Power Systems platform, storage and software provide a highly secure and resilient infrastructure foundation for smarter computing. In addition to the built-in reliability, availability and serviceability (RAS) characteristics of Power Systems servers and blades, our IBM System Storage® DS8000® and IBM Storwize® V7000 Unified storage systems, and IBM PowerHA SystemMirror clustering software is tightly integrated with our operating systems to provide a system-wide solution for business resilience.
§ Deliver services with superior economics
  - IT performance today is also measured on its ability to maintain existing services and deliver services within tight budget constraints. In order to do more with less, businesses need to deliver services with superior economics.
  - The Power Systems platform with PowerVM virtualization is central to our differentiation when compared to x86 servers. PowerVM technology is designed to offer more secure and scalable virtualization than VMware on x86, enabling cost-effective control of server and virtual image sprawl. PowerVM technology also is designed to help Power Systems servers deliver higher server utilization rates than VMware on x86. We believe that the superior economic model for workload consolidation on POWER7 servers with PowerVM software has been the key driver behind migrations from Oracle Sun and HP to Power Systems technology.

Special notices
This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area.
Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.
All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.
IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.
All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.
Revised September 26, 2006

Special notices (cont.)
IBM, the IBM logo, ibm.com, AIX, AIX (logo), AIX 5L, AIX 6 (logo), AS/400, BladeCenter, Blue Gene, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM Business Partner (logo), IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC, pSeries, Rational, RISC System/6000, RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, Active Memory, Balanced Warehouse, CacheFlow, Cool Blue, IBM Systems Director VMControl, pureScale, TurboCore, Chiphopper, Cloudscape, DB2 Universal Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General Parallel File System, GPFS, HACMP, HACMP/6000, HASM, IBM Systems Director Active Energy Manager, iSeries, Micro-Partitioning, POWER, PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere, Power Family, POWER Hypervisor, Power Systems, Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4, POWER4+, POWER5, POWER5+, POWER6, POWER6+, POWER7, System i, System p, System p5, System Storage, System z, TME 10, Workload Partitions Manager and X-Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.
A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
AltiVec is a trademark of Freescale Semiconductor, Inc.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both.
NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both.
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).
The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
TPC-C and TPC-H are trademarks of the Transaction Performance Processing Council (TPPC).
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Other company, product and service names may be trademarks or service marks of others.
Revised December 2, 2010

IBM Power Systems Notes on benchmarks and values
The IBM benchmark results shown herein were derived using particular, well-configured, development-level and generally-available computer systems. Buyers should consult other sources of information to evaluate the performance of systems they are considering buying and should consider conducting application-oriented testing. For additional information about the benchmarks, values and systems tested, contact your local IBM office or IBM authorized reseller or access the Web site of the benchmark consortium or benchmark vendor.
IBM benchmark results can be found in the IBM Power Systems Performance Report at http://www.ibm.com/systems/p/hardware/system_perf.html.
All performance measurements were made with AIX or AIX 5L operating systems unless otherwise indicated to have used Linux. For new and upgraded systems, the latest versions of AIX were used. All other systems used previous versions of AIX. The SPEC CPU2006, LINPACK, and Technical Computing benchmarks were compiled using IBM's high performance C, C++, and FORTRAN compilers for AIX 5L and Linux. For new and upgraded systems, the latest versions of these compilers were used: XL C for AIX v11.1, XL C/C++ for AIX v11.1, XL FORTRAN for AIX v13.1, XL C/C++ for Linux v11.1, and XL FORTRAN for Linux v13.1.
For a definition/explanation of each benchmark and the full list of detailed results, visit the Web site of the benchmark consortium or benchmark vendor.
TPC http://www.tpc.org
SPEC http://www.spec.org
LINPACK http://www.netlib.org/benchmark/performance.pdf
Pro/E http://www.proe.com
GPC http://www.spec.org/gpc
VolanoMark http://www.volano.com
STREAM http://www.cs.virginia.edu/stream/
SAP http://www.sap.com/benchmark/
Oracle, Siebel, PeopleSoft http://www.oracle.com/apps_benchmark/
Baan http://www.ssaglobal.com
Fluent http://www.fluent.com/software/fluent/index.htm
TOP500 Supercomputers http://www.top500.org/
Ideas International http://www.ideasinternational.com/benchmark/bench.html
Storage Performance Council http://www.storageperformance.org/results
Revised December 2, 2010 © 2013 IBM Corporation

IBM Power Systems Notes on HPC benchmarks and values
The IBM benchmark results shown herein were derived using particular, well-configured, development-level and generally-available computer systems. Buyers should consult other sources of information to evaluate the performance of systems they are considering buying and should consider conducting application-oriented testing. For additional information about the benchmarks, values and systems tested, contact your local IBM office or IBM authorized reseller or access the Web site of the benchmark consortium or benchmark vendor.
IBM benchmark results can be found in the IBM Power Systems Performance Report at http://www.ibm.com/systems/p/hardware/system_perf.html.
All performance measurements were made with AIX or AIX 5L operating systems unless otherwise indicated to have used Linux. For new and upgraded systems, the latest versions of AIX were used. All other systems used previous versions of AIX. The SPEC CPU2006, LINPACK, and Technical Computing benchmarks were compiled using IBM's high performance C, C++, and FORTRAN compilers for AIX 5L and Linux. For new and upgraded systems, the latest versions of these compilers were used: XL C for AIX v11.1, XL C/C++ for AIX v11.1, XL FORTRAN for AIX v13.1, XL C/C++ for Linux v11.1, and XL FORTRAN for Linux v13.1.
Linpack HPC (Highly Parallel Computing) used the current versions of the IBM Engineering and Scientific Subroutine Library (ESSL). For POWER7 systems, IBM Engineering and Scientific Subroutine Library (ESSL) for AIX Version 5.1 and IBM Engineering and Scientific Subroutine Library (ESSL) for Linux Version 5.1 were used.
For a definition/explanation of each benchmark and the full list of detailed results, visit the Web site of the benchmark consortium or benchmark vendor.
SPEC http://www.spec.org
LINPACK http://www.netlib.org/benchmark/performance.pdf
Pro/E http://www.proe.com
GPC http://www.spec.org/gpc
STREAM http://www.cs.virginia.edu/stream/
Fluent http://www.fluent.com/software/fluent/index.htm
TOP500 Supercomputers http://www.top500.org/
AMBER http://amber.scripps.edu/
FLUENT http://www.fluent.com/software/fluent/fl5bench/index.htm
GAMESS http://www.msg.chem.iastate.edu/gamess
GAUSSIAN http://www.gaussian.com
ANSYS http://www.ansys.com/services/hardware-support-db.htm (Click on the "Benchmarks" icon on the left hand side frame to expand. Click on "Benchmark Results in a Table" icon for benchmark results.)
ABAQUS http://www.simulia.com/support/v68_performance.php
ECLIPSE http://www.sis.slb.com/content/software/simulation/index.asp?seg=geoquest&
MM5 http://www.mmm.ucar.edu/mm5/
MSC.NASTRAN http://www.mscsoftware.com/support/prod%5Fsupport/nastran/performance/v04_sngl.cfm
STAR-CD www.cd-adapco.com/products/STAR-CD/performance/320/index/html
NAMD http://www.ks.uiuc.edu/Research/namd
HMMER http://hmmer.janelia.org/ http://powerdev.osuosl.org/project/hmmerAltivecGen2mod
Revised December 2, 2010 © 2013 IBM Corporation

IBM Power Systems Notes on performance estimates
rPerf for AIX
rPerf (Relative Performance) is an estimate of commercial processing performance relative to other IBM UNIX systems. It is derived from an IBM analytical model which uses characteristics from IBM internal workloads, TPC and SPEC benchmarks. The rPerf model is not intended to represent any specific public benchmark results and should not be reasonably used in that way. The model simulates some of the system operations such as CPU, cache and memory. However, the model does not simulate disk or network I/O operations.
rPerf estimates are calculated based on systems with the latest levels of AIX and other pertinent software at the time of system announcement. Actual performance will vary based on application and configuration specifics. The IBM eServer pSeries 640 is the baseline reference system and has a value of 1.0. Although rPerf may be used to approximate relative IBM UNIX commercial processing performance, actual system performance may vary and is dependent upon many factors including system hardware configuration and software design and configuration. Note that the rPerf methodology used for the POWER6 systems is identical to that used for the POWER5 systems. Variations in incremental system performance may be observed in commercial workloads due to changes in the underlying system architecture.
All performance estimates are provided "AS IS" and no warranties or guarantees are expressed or implied by IBM. Buyers should consult other sources of information, including system benchmarks, and application sizing guides to evaluate the performance of a system they are considering buying. For additional information about rPerf, contact your local IBM office or IBM authorized reseller.
====================================
CPW for IBM i
Commercial Processing Workload (CPW) is a relative measure of performance of processors running the IBM i operating system. Performance in customer environments may vary. The value is based on maximum configurations. More performance information is available in the Performance Capabilities Reference at: www.ibm.com/systems/i/solutions/perfmgmt/resource.html
Revised April 2, 2007 © 2013 IBM Corporation