Description of problem:
bnx2's bnx2_start_xmit() calls pci_map_page() twice on each fragment in skb_shinfo(skb)->frags[]. This is causing problems on IBM systems with an IOMMU.

Version-Release number of selected component (if applicable):
2.0.8

How reproducible:
Only on systems with IOMMU enabled. Not a problem on systems without IOMMU.

Additional info:
Reported by Breno Leitao at IBM <leitao.ibm.com>

This may be the cause of bug #641495 on the AMD Dinar system.
This is what I have found:

1. RHEL5.4 had the following code snippet added to bnx2_start_xmit():

         } else
                 mss = 0;
         mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE);
+        if (pci_dma_mapping_error(mapping))
+                goto error;
+
+        dma_maps[0] = mapping;
+
+        last_frag = skb_shinfo(skb)->nr_frags;
+
+        for (i = 0; i < last_frag; i++) {
+                skb_frag_t *fp = &skb_shinfo(skb)->frags[i];
+                mapping = pci_map_page(bp->pdev, fp->page, fp->page_offset,
+                                       fp->size, PCI_DMA_TODEVICE);
+                if (pci_dma_mapping_error(mapping))
+                        goto map_unwind;
+                dma_maps[i + 1] = mapping;
+        }
+
+        mapping = dma_maps[0];
         tx_buf = &txr->tx_buf_ring[ring_prod];
         tx_buf->skb = skb;
         pci_unmap_addr_set(tx_buf, mapping, mapping);

Note: This adds an additional last_frag for loop to bnx2_start_xmit().

The RHEL5.6 update had added:

                 txbd = &txr->tx_desc_ring[ring_prod];
                 len = frag->size;
-                mapping = dma_maps[i + 1];
+                mapping = pci_map_page(bp->pdev, frag->page, frag->page_offset,
+                                       len, PCI_DMA_TODEVICE);
+                if (pci_dma_mapping_error(mapping))
+                        goto dma_error;
                 pci_unmap_addr_set(&txr->tx_buf_ring[ring_prod], mapping, mapping);

Per Duyck's upstream patch "bnx2: remove skb_dma_map/unmap calls from driver", this mapping was added to the second last_frag for loop (the one not added by the 5.4 patch).
Upstream has only one last_frag for loop and it looks like this:

        mapping = pci_map_single(bp->pdev, skb->data, len, PCI_DMA_TODEVICE);
        if (pci_dma_mapping_error(bp->pdev, mapping)) {
                dev_kfree_skb(skb);
                return NETDEV_TX_OK;
        }

        tx_buf = &txr->tx_buf_ring[ring_prod];
        tx_buf->skb = skb;
        pci_unmap_addr_set(tx_buf, mapping, mapping);

        txbd = &txr->tx_desc_ring[ring_prod];

        txbd->tx_bd_haddr_hi = (u64) mapping >> 32;
        txbd->tx_bd_haddr_lo = (u64) mapping & 0xffffffff;
        txbd->tx_bd_mss_nbytes = len | (mss << 16);
        txbd->tx_bd_vlan_tag_flags = vlan_tag_flags | TX_BD_FLAGS_START;

        last_frag = skb_shinfo(skb)->nr_frags;
        tx_buf->nr_frags = last_frag;
        tx_buf->is_gso = skb_is_gso(skb);

        for (i = 0; i < last_frag; i++) {
                skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

                prod = NEXT_TX_BD(prod);
                ring_prod = TX_RING_IDX(prod);
                txbd = &txr->tx_desc_ring[ring_prod];

                len = frag->size;
                mapping = pci_map_page(bp->pdev, frag->page, frag->page_offset,
                                       len, PCI_DMA_TODEVICE);
                if (pci_dma_mapping_error(bp->pdev, mapping))
                        goto dma_error;
                pci_unmap_addr_set(&txr->tx_buf_ring[ring_prod], mapping,
                                   mapping);

                txbd->tx_bd_haddr_hi = (u64) mapping >> 32;
                txbd->tx_bd_haddr_lo = (u64) mapping & 0xffffffff;
                txbd->tx_bd_mss_nbytes = len | (mss << 16);
                txbd->tx_bd_vlan_tag_flags = vlan_tag_flags;

        }
        txbd->tx_bd_vlan_tag_flags |= TX_BD_FLAGS_END;

        prod = NEXT_TX_BD(prod);
        txr->tx_prod_bseq += skb->len;

My first reaction to this is to remove the additional for loop (added by 5.4) since the 5.6 addition mimics upstream.
(In reply to comment #2)
> This is what I have found:
> 1. RHEL5.4 had the following code snippet added to bnx2_start_xmit()
...

Yes, I believe this code was added to mimic skb_dma_map() used by the upstream driver at that time.

> Note: This adds an additional last_frag for loop to bnx2_start_xmit().

Probably because RHEL5 does not have skb_dma_map()/skb_dma_unmap().

>
> The RHEL5.6 update had added:
....
> Per Duyck "bnx2: remove skb_dma_map/unmap calls from driver", added to the
> second last_frag for loop (the one not added by the 5.4 patch).

Yes, since this upstream patch replaces the skb_dma_map() call, we need to remove the earlier code that mimics skb_dma_map().

>
> Upstream has only one last_frag for loop and it looks like this:
...
>
> My first reaction to this is to remove the additional for loop (added by 5.4)
> since the 5.6 addition mimics upstream.

Yes, agreed. Thanks.
Thanks for the confirmation, Michael. The patch to remove the first (5.4) loop is:

--- linux-2.6.18.noarch/drivers/net/bnx2.c.663509
+++ linux-2.6.18.noarch/drivers/net/bnx2.c
@@ -6469,18 +6469,6 @@ bnx2_start_xmit(struct sk_buff *skb, str
         dma_maps[0] = mapping;
 
         last_frag = skb_shinfo(skb)->nr_frags;
-
-        for (i = 0; i < last_frag; i++) {
-                skb_frag_t *fp = &skb_shinfo(skb)->frags[i];
-
-                mapping = pci_map_page(bp->pdev, fp->page, fp->page_offset,
-                                       fp->size, PCI_DMA_TODEVICE);
-                if (pci_dma_mapping_error(mapping))
-                        goto map_unwind;
-                dma_maps[i + 1] = mapping;
-        }
-
-        mapping = dma_maps[0];
         tx_buf = &txr->tx_buf_ring[ring_prod];
         tx_buf->skb = skb;
         tx_buf->nr_frags = last_frag;
@@ -6543,7 +6531,6 @@ bnx2_start_xmit(struct sk_buff *skb, str
 
         return NETDEV_TX_OK;
 
-map_unwind:
         while (--i >= 0) {
                 skb_frag_t *fp = &skb_shinfo(skb)->frags[i];
 
[root@dhcp-100-19-202 kernel-2.6.18]#

How does that look? Very much appreciated.
John
I think we should remove even more code. The dma_maps[] local array should be removed, and the while loop at map_unwind should also be removed. I think the dma_maps[] array, the first for loop, and the map_unwind while loop were all added to do what the upstream skb_dma_map() used to do. Since skb_dma_map() has now been replaced, we should remove all remnants of it. Thanks.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Yep, you are right, Michael.

--- linux-2.6.18.noarch/drivers/net/bnx2.c.663509
+++ linux-2.6.18.noarch/drivers/net/bnx2.c
@@ -6395,7 +6395,6 @@ static int
 bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
         struct bnx2 *bp = netdev_priv(dev);
-        dma_addr_t dma_maps[MAX_SKB_FRAGS + 1];
         dma_addr_t mapping;
         struct tx_bd *txbd;
         struct sw_bd *tx_buf;
@@ -6466,21 +6465,7 @@ bnx2_start_xmit(struct sk_buff *skb, str
         if (pci_dma_mapping_error(mapping))
                 goto error;
 
-        dma_maps[0] = mapping;
-
         last_frag = skb_shinfo(skb)->nr_frags;
-
-        for (i = 0; i < last_frag; i++) {
-                skb_frag_t *fp = &skb_shinfo(skb)->frags[i];
-
-                mapping = pci_map_page(bp->pdev, fp->page, fp->page_offset,
-                                       fp->size, PCI_DMA_TODEVICE);
-                if (pci_dma_mapping_error(mapping))
-                        goto map_unwind;
-                dma_maps[i + 1] = mapping;
-        }
-
-        mapping = dma_maps[0];
         tx_buf = &txr->tx_buf_ring[ring_prod];
         tx_buf->skb = skb;
         tx_buf->nr_frags = last_frag;
@@ -6543,15 +6528,6 @@ bnx2_start_xmit(struct sk_buff *skb, str
 
         return NETDEV_TX_OK;
 
-map_unwind:
-        while (--i >= 0) {
-                skb_frag_t *fp = &skb_shinfo(skb)->frags[i];
-
-                pci_unmap_page(bp->pdev, dma_maps[i + 1], fp->size,
-                               PCI_DMA_TODEVICE);
-        }
-        pci_unmap_single(bp->pdev, dma_maps[0], skb_headlen(skb),
-                         PCI_DMA_TODEVICE);
 error:
         dev_kfree_skb(skb);
         return NETDEV_TX_OK;
[jfeeney@dhcp-100-19-202 RHEL-5]$
This looks good. Thanks John.
------- Comment From tpnoonan.com 2010-12-16 15:45 EDT-------
IBM, please test on System x and POWER as soon as the test patch is available, and provide feedback via an external comment. Thanks.
John, could you point me to the brew build once you have one and I can provide it to IBM, so we can test it. -Steve
Steve,
I was planning on copying the RPMs to my people page and adding a pointer to them in this BZ when they are done, but the build is still chugging along.
------- Comment From tpnoonan.com 2010-12-16 16:24 EDT-------
Hi Red Hat, can you clarify why this should be tested on System x? Thanks.
(In reply to comment #11)
> Steve,
> I was planning on copying the rpms to my people page and adding a pointer to
> it, in this bz when they are done. But the build is still chugging along.

John, that will work .. thanks
-Steve
The x86_64 rpms are built and ready. See http://people.redhat.com/jfeeney/.bz663509 I am still waiting on ppc64 to finish. I'll provide a comment when that is complete. Please let me know if you require anything else. (Steve: The build number is 2982677, just so you know if needed.)
Tim, It's to verify that when we fix an issue on the POWER platform, we don't break something on the x86-64 platform. John
------- Comment From sbest.com 2010-12-17 06:45 EDT------- ppc64 kernel is on my gsa dir http://pokgsa.ibm.com/~sbest/public/68805/ kernel-2.6.18-233.el5.bz663509.2.ppc64.rpm
I copied the ppc rpms to my people page (took a while to build). Thanks Steve for procuring them, too.
------- Comment From lxie.com 2010-12-17 09:31 EDT-------
(In reply to comment #19)
> ppc64 kernel is on my gsa dir
> http://pokgsa.ibm.com/~sbest/public/68805/
> kernel-2.6.18-233.el5.bz663509.2.ppc64.rpm

Donna/Nadia,
Please test the patch and post the test results via external comments.

Thank you very much for your support.

Linda

------- Comment From lxie.com 2010-12-17 09:34 EDT-------
(In reply to comment #19)
> ppc64 kernel is on my gsa dir
> http://pokgsa.ibm.com/~sbest/public/68805/
> kernel-2.6.18-233.el5.bz663509.2.ppc64.rpm

Can you post a test kernel for the x team?

Thanks,
Linda
------- Comment From lxie.com 2010-12-17 10:28 EDT-------
(In reply to comment #16)
> The x86_64 rpms are built and ready. See
> http://people.redhat.com/jfeeney/.bz663509
> I am still waiting on ppc64 to finish. I'll provide a comment when that is
> complete. Please let me know if you require anything else.
> (Steve: The build number is 2982677, just so you know if needed.)

Hi Iranna,
Please have your team test this and post the test results via external comments asap.

Thank you very much for your support.

Linda
------- Comment From dbabka.com 2010-12-17 11:18 EDT-------
I have put this ppc64 patch kernel on uli08 and have started BASE and TCP focus tests.

[root@uli08 ~]# uname -a
Linux uli08.upt.austin.ibm.com 2.6.18-233.el5.bz663509.2 #1 SMP Thu Dec 16 15:16:39 EST 2010 ppc64 ppc64 ppc64 GNU/Linux
[root@uli08 ~]#

The TCP tests have been running well for 30 minutes, a period in which they would typically have started failing before the patch. I'm going to let these tests continue to run for 72 hours. I am also going to put this kernel on uli04 for extra testing.

Here are the test results at the 30-minute mark:

[root@uli08 ~]# show.report.py
HOSTNAME                 KERNEL VERSION            DISTRO INFO
------------------------ ------------------------- -----------
uli08.upt.austin.ibm.com 2.6.18-233.el5.bz663509.2 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Fri Dec 17 10:13:13 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 TCP 20101217-09:42:29 0.0 hr(s) 30.0 min(s) networkTest2poplp2
2 BASE 20101217-09:43:28 0.0 hr(s) 29.0 min(s) Test

FOCUS TCP BASE SUM
TOTAL 195 470 665
FAIL 0 2 2
PASS 195 468 663
(%) (100%) (99%) (99%)

DLPAR is not tested!
Here are the TCP tests that are currently being run:

[root@uli08 ~]# gss 1
Hostname : uli08
Kernel : 2.6.18-233.el5.bz663509.2
Kernel Build Date : Thu Dec 16 14:16:39 CST 2010
Distribution : Red Hat
--------
Job ID : 1
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli082poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101217
Start Time : 09:42:29
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Fri Dec 17 09:42:27 CST 2010
Snapshot Time: Fri Dec 17 10:13:01 CST 2010
--------
multicast/mc_cmds/mc_cmds01;35;0;1
multicast/mc_member/mc_member01;1;0;1
tcp_cmds/echo/echo01;5;0;1
tcp_cmds/finger/finger01;4;0;1
tcp_cmds/ftp/ftp01;19;0;1
tcp_cmds/netstat/netstat01;28;0;1
tcp_cmds/ntwk_files/ntwk_files01;29;0;1
tcp_cmds/perf_lan/perf_lan01;1;0;1
tcp_cmds/ping/ping01;1;0;1
tcp_cmds/rdist/rdist01;2;0;1
tcp_cmds/rlogin/rlogin01;10;0;1
tcp_cmds/rsh/rsh01;2;0;1
tcp_cmds/rsync/rsync01;11;0;1
tcp_cmds/sendfile/sendfile01;4;0;1
tcp_cmds/ssh/ssh01;24;0;1
tcp_cmds/telnet/telnet01;17;0;1

It looks like this patch was built against snap3; in our other testing we are currently on snap5. Would it be possible to get this patch rebuilt on a later snap?
It's good that things are going well on ppc64. Thanks for the update; please keep us posted. Is x86_64 being tested too?

As for snap3 vs. snap5, I used what I had ready (snap3) because of the critical time crunch. I checked and did not find anything new added to bnx2 between the two snapshots, but I am still building a new kernel with this patch now that we know it works. Given that the ppc64 build took several hours yesterday, I don't expect it to be ready until later tonight. I will add a comment when it is available.
------- Comment From dbabka.com 2010-12-17 17:06 EDT-------
uli08 has been running BASE and TCP tests for 5.66 hrs (99% success rate), and uli04 has been running BASE (99% success) and TCP (96% success) for 5 hrs.

===================================================
uli08:
[root@uli08 ~]# show.report.py
HOSTNAME                 KERNEL VERSION            DISTRO INFO
------------------------ ------------------------- -----------
uli08.upt.austin.ibm.com 2.6.18-233.el5.bz663509.2 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Fri Dec 17 15:24:07 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 TCP 20101217-09:42:29 5.0 hr(s) 41.0 min(s) networkTest2poplp2
2 BASE 20101217-09:43:28 5.0 hr(s) 40.0 min(s) Test

FOCUS TCP BASE SUM
TOTAL 2165 8554 10719
FAIL 13 42 55
PASS 2152 8512 10664
(%) (99%) (99%) (99%)
===================================================
[root@uli04 ~]# show.report.py
HOSTNAME                 KERNEL VERSION            DISTRO INFO
------------------------ ------------------------- -----------
uli04.upt.austin.ibm.com 2.6.18-233.el5.bz663509.2 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Fri Dec 17 15:43:35 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101217-10:45:20 4.0 hr(s) 58.0 min(s) Test
2 TCP 20101217-10:45:26 4.0 hr(s) 58.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 7538 1930 9468
FAIL 48 68 116
PASS 7490 1862 9352
(%) (99%) (96%) (98%)

DLPAR is not tested!
[root@uli04 ~]# gss 2
Hostname : uli04
Kernel : 2.6.18-233.el5.bz663509.2
Kernel Build Date : Thu Dec 16 14:16:39 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli042poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101217
Start Time : 10:45:26
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Fri Dec 17 10:45:24 CST 2010
Snapshot Time: Fri Dec 17 15:44:12 CST 2010
--------
multicast/mc_cmds/mc_cmds01;425;0;1
multicast/mc_member/mc_member01;14;0;1
tcp_cmds/echo/echo01;46;0;1
tcp_cmds/finger/finger01;53;0;1
tcp_cmds/ftp/ftp01;137;20;1
tcp_cmds/host/host01;145;4;1
tcp_cmds/netstat/netstat01;195;0;1
tcp_cmds/ntwk_files/ntwk_files01;207;20;1
tcp_cmds/perf_lan/perf_lan01;16;5;1
tcp_cmds/ping/ping01;14;4;1
tcp_cmds/rcp/rcp01;2;0;1
tcp_cmds/rdist/rdist01;27;0;1
tcp_cmds/rlogin/rlogin01;83;1;1
tcp_cmds/rsh/rsh01;18;6;1
tcp_cmds/rsync/rsync01;112;1;1
tcp_cmds/sendfile/sendfile01;41;2;1
tcp_cmds/ssh/ssh01;211;2;1
tcp_cmds/telnet/telnet01;119;3;1
Broadcom - this is slated to be included on or before Release Candidate 2. Thanks for your help on this.
As promised, I built kernel RPMs with the patch on a more recent kernel, i.e. 2.6.18.237. See http://people.redhat.com/jfeeney/.bz663509.
------- Comment From dbabka.com 2010-12-18 11:30 EDT-------
(In reply to comment #32)
> As promised, I built kernel rpms with the patch in more recent kernels, ie.
> 2.6.18.237. See http://people.redhat.com/jfeeney/.bz663509.

Thank you for your support and for building this new kernel. We will put it on a system as soon as we complete the 72-hr runs currently in test.

uli08 has continued to run for 24.5 hrs with a 99% success rate for BASE and TCP tests. uli04 has continued to run for 23.5 hrs with a 99% success rate for BASE and a 97% rate for TCP. We will continue to monitor these machines.
------- Comment From dbabka.com 2010-12-19 15:19 EDT------- uli08 has continued to run for 52.3 hrs with 99% success rate for BASE and TCP tests. uli04 has continued to run for 51.33 hrs with 99% success rate for BASE and 97% rate for TCP.
------- Comment From huachenl.com 2010-12-20 04:08 EDT------- uli08 has continued to run for 65.3 hrs with 99% success rate for BASE and TCP tests. uli04 has continued to run for 64.33 hrs with 99% success rate for BASE and 97% rate for TCP.
------- Comment From iranna.ankad.com 2010-12-20 07:50 EDT-------
(In reply to comment #27)
> Hi Iranna,
> Please have your team test this and post the test results via external comments
> asap.
>
>
> Thank you very much for your support.
>
>
> Linda

We verified this bug on x86_64 with the patched kernel (2.6.18-233.el5.bz663509.2) and everything looks fine. Here is our test environment and test coverage:

Hardware used: x3755 (with IOMMU and bnx2 enabled) as client and LS21 (with bnx2) as server.
Kernel: 2.6.18-233.el5.bz663509.2 (x86_64)

Test coverage:
1. Netperf tests (TCP/UDP stream and request/response) ran successfully for 7-8 hours.
2. We also ran LTP's memory-mapped I/O tests over NFS between client and server to see whether they detect any memory leaks. All 10 mmapstress tests PASSED.
3. Kdump over the network (through the bnx2 driver): the vmcore was successfully generated on the server.

I think the above tests are sufficient to confirm that the bnx2 fix is good and has not caused any regression on System x. Thanks!
in kernel-2.6.18-238.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
------- Comment From dbabka.com 2010-12-20 12:42 EDT-------
uli08 has continued to run for 74 hrs with a 99% success rate for BASE and TCP tests. uli04 has continued to run for 73 hrs with a 99% success rate for BASE and a 97% rate for TCP.

[root@uli08 ~]# gss 1
Hostname : uli08
Kernel : 2.6.18-233.el5.bz663509.2
Kernel Build Date : Thu Dec 16 14:16:39 CST 2010
Distribution : Red Hat
--------
Job ID : 1
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli082poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101217
Start Time : 09:42:29
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Fri Dec 17 09:42:27 CST 2010
Snapshot Time: Mon Dec 20 11:41:54 CST 2010
--------
multicast/mc_cmds/mc_cmds01;5952;3;1
multicast/mc_member/mc_member01;221;0;1
tcp_cmds/echo/echo01;696;1;1
tcp_cmds/finger/finger01;769;0;1
tcp_cmds/ftp/ftp01;2057;63;1
tcp_cmds/host/host01;10;19;1
tcp_cmds/netstat/netstat01;4980;0;1
tcp_cmds/ntwk_files/ntwk_files01;2990;15;1
tcp_cmds/perf_lan/perf_lan01;261;6;1
tcp_cmds/ping/ping01;249;8;1
tcp_cmds/rcp/rcp01;33;0;1
tcp_cmds/rdist/rdist01;418;0;1
tcp_cmds/rlogin/rlogin01;1199;1;1
tcp_cmds/rsh/rsh01;282;0;1
tcp_cmds/rsync/rsync01;1559;7;1
tcp_cmds/sendfile/sendfile01;601;35;1
tcp_cmds/ssh/ssh01;3049;27;1
tcp_cmds/telnet/telnet01;1759;0;1
[root@uli08 ~]#

We will conclude these tests, since they ran successfully for over 72 hrs, and will put a newer test kernel on these two systems.
IBM, Thank you.
------- Comment From huachenl.com 2010-12-21 03:04 EDT-------
Per comment #38, both uli04 and uli08 have run for over 72 hours with the 2.6.18-233.el5.bz663509.2 kernel, so I have now installed the newest 2.6.18-238.el5 ppc64 kernel and started BASE and TCP tests again on uli04 and uli08. Thanks!

[root@uli04 ~]# uname -a
Linux uli04.upt.austin.ibm.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:29:51 EST 2010 ppc64 ppc64 ppc64 GNU/Linux
[root@uli04 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli04.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Tue Dec 21 01:53:38 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:49:02 0.0 hr(s) 4.0 min(s) Test
2 TCP 20101221-01:49:06 0.0 hr(s) 4.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 7 21 28
FAIL 0 0 0
PASS 7 21 28
(%) (100%) (100%) 1.0

DLPAR is not tested!
[root@uli04 ~]#

[root@uli08 ~]# uname -a
Linux uli08.upt.austin.ibm.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:29:51 EST 2010 ppc64 ppc64 ppc64 GNU/Linux
[root@uli08 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli08.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Tue Dec 21 01:53:25 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:48:58 0.0 hr(s) 4.0 min(s) Test
2 TCP 20101221-01:49:02 0.0 hr(s) 4.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 7 19 26
FAIL 0 0 0
PASS 7 19 26
(%) (100%) (100%) 1.0

DLPAR is not tested!
[root@uli08 ~]#
------- Comment From iranna.ankad.com 2010-12-21 10:08 EDT-------
(In reply to comment #37)
> in kernel-2.6.18-238.el5
> You can download this test kernel (or newer) from
> http://people.redhat.com/jwilson/el5
>
> Detailed testing feedback is always welcomed.

I have started netperf on 2.6.18-238.el5 on System x. The tests have been running fine for the last 5 hours; I will let them continue overnight and come back with more updates tomorrow.

------- Comment From dbabka.com 2010-12-21 10:10 EDT-------
uli04 and uli08, with the 2.6.18-238.el5 ppc64 kernel, have been running BASE and TCP tests for 7+ hrs. uli08 has a 99% success rate for both BASE and TCP. uli04 has a 99% success rate for BASE and a 95% success rate for TCP.

Here are uli04's interim TCP results:

TCP Start Time: Tue Dec 21 01:49:02 CST 2010
Snapshot Time: Tue Dec 21 09:06:57 CST 2010
--------
multicast/mc_cmds/mc_cmds01;626;0;1
multicast/mc_member/mc_member01;21;0;1
tcp_cmds/echo/echo01;61;4;1
tcp_cmds/finger/finger01;78;0;1
tcp_cmds/ftp/ftp01;204;32;1
tcp_cmds/host/host01;203;6;1
tcp_cmds/netstat/netstat01;271;0;1
tcp_cmds/ntwk_files/ntwk_files01;287;24;1
tcp_cmds/perf_lan/perf_lan01;22;35;1
tcp_cmds/ping/ping01;21;31;1
tcp_cmds/rcp/rcp01;3;0;1
tcp_cmds/rdist/rdist01;41;0;1
tcp_cmds/rlogin/rlogin01;118;2;1
tcp_cmds/rsh/rsh01;24;17;1
tcp_cmds/rsync/rsync01;171;1;1
tcp_cmds/sendfile/sendfile01;58;7;1
tcp_cmds/ssh/ssh01;315;6;1
tcp_cmds/telnet/telnet01;168;5;1
[root@uli04 ~]#
------- Comment From huachenl.com 2010-12-22 03:37 EDT-------
uli04 and uli08, with the 2.6.18-238.el5 ppc64 kernel, have been running BASE and TCP tests for 24+ hrs. uli08 has a 99% success rate for both BASE and TCP. uli04 has a 99% success rate for BASE and a 96% success rate for TCP.

[root@uli04 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli04.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Wed Dec 22 02:35:26 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:49:02 24.0 hr(s) 46.0 min(s) Test
2 TCP 20101221-01:49:06 24.0 hr(s) 46.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 38581 10704 49285
FAIL 255 399 654
PASS 38326 10305 48631
(%) (99%) (96%) (98%)

DLPAR is not tested!
[root@uli04 ~]# gss 2
Hostname : uli04
Kernel : 2.6.18-238.el5
Kernel Build Date : Sun Dec 19 13:29:51 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli042poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101221
Start Time : 01:49:06
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Tue Dec 21 01:49:02 CST 2010
Snapshot Time: Wed Dec 22 02:36:24 CST 2010
--------
multicast/mc_cmds/mc_cmds01;2138;0;1
multicast/mc_member/mc_member01;74;0;1
tcp_cmds/echo/echo01;233;22;1
tcp_cmds/finger/finger01;272;0;1
tcp_cmds/ftp/ftp01;864;81;1
tcp_cmds/host/host01;844;12;1
tcp_cmds/netstat/netstat01;925;0;1
tcp_cmds/ntwk_files/ntwk_files01;1285;64;1
tcp_cmds/perf_lan/perf_lan01;83;44;1
tcp_cmds/ping/ping01;74;66;1
tcp_cmds/rcp/rcp01;17;0;1
tcp_cmds/rdist/rdist01;140;1;1
tcp_cmds/rlogin/rlogin01;501;16;1
tcp_cmds/rsh/rsh01;101;39;1
tcp_cmds/rsync/rsync01;610;3;1
tcp_cmds/sendfile/sendfile01;223;15;1
tcp_cmds/ssh/ssh01;1233;17;1
tcp_cmds/telnet/telnet01;694;19;1
[root@uli04 ~]#

[root@uli08 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli08.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Wed Dec 22 02:35:19 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:48:58 24.0 hr(s) 46.0 min(s) Test
2 TCP 20101221-01:49:02 24.0 hr(s) 46.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 37695 9950 47645
FAIL 195 77 272
PASS 37500 9873 47373
(%) (99%) (99%) (99%)

DLPAR is not tested!
[root@uli08 ~]# gss 2
Hostname : uli08
Kernel : 2.6.18-238.el5
Kernel Build Date : Sun Dec 19 13:29:51 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli082poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101221
Start Time : 01:49:02
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Tue Dec 21 01:48:58 CST 2010
Snapshot Time: Wed Dec 22 02:36:38 CST 2010
--------
multicast/mc_cmds/mc_cmds01;1986;2;1
multicast/mc_member/mc_member01;74;0;1
tcp_cmds/echo/echo01;236;2;1
tcp_cmds/finger/finger01;258;0;1
tcp_cmds/ftp/ftp01;847;15;1
tcp_cmds/host/host01;3;6;1
tcp_cmds/netstat/netstat01;1630;0;1
tcp_cmds/ntwk_files/ntwk_files01;1236;6;1
tcp_cmds/perf_lan/perf_lan01;85;10;1
tcp_cmds/ping/ping01;82;9;1
tcp_cmds/rcp/rcp01;15;0;1
tcp_cmds/rdist/rdist01;140;1;1
tcp_cmds/rlogin/rlogin01;503;5;1
tcp_cmds/rsh/rsh01;106;1;1
tcp_cmds/rsync/rsync01;583;3;1
tcp_cmds/sendfile/sendfile01;223;8;1
tcp_cmds/ssh/ssh01;1196;6;1
tcp_cmds/telnet/telnet01;679;3;1
[root@uli08 ~]#
------- Comment From iranna.ankad.com 2010-12-22 06:23 EDT-------
(In reply to comment #41)
> (In reply to comment #37)
> > in kernel-2.6.18-238.el5
> > You can download this test kernel (or newer) from
> > http://people.redhat.com/jwilson/el5
> >
> > Detailed testing feedback is always welcomed.
>
> Well..I have triggered netperf on 2.6.18-238.el5 on SystemX. Tests are running
> fine from last 5 hours ..I will let them continue overnight & will come back
> with more updates tomorrow.

My netperf tests have been running fine for more than 24 hours. I also confirmed that memory-mapped I/O and kdump over netdump are fine. With this, I would conclude that this bug is fixed in 2.6.18-238.el5. Thanks!

More details on the test environment:

Hardware used: x3755 (with IOMMU and bnx2 enabled) as client and LS21 (with bnx2) as server.
Kernel: 2.6.18-238.el5 (x86_64)

Test coverage:
1. Netperf tests (TCP/UDP stream and request/response) ran successfully for 24+ hours. I checked "netstat -i" and dmesg and could not find any errors.
2. All 10 mmapstress tests PASSED (LTP's memory-mapped I/O tests, run over NFS between client and server).
3. Kdump over the network (through the bnx2 driver, from the x3755 to the LS21): the vmcore was successfully generated on the server.
------- Comment From huachenl.com 2010-12-23 03:38 EDT-------
uli04 and uli08, with the 2.6.18-238.el5 ppc64 kernel, have been running BASE and TCP tests for 48+ hrs. uli08 has a 99% success rate for both BASE and TCP. uli04 has a 99% success rate for BASE and a 96% success rate for TCP.

[root@uli04 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli04.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Thu Dec 23 02:37:24 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:49:02 48.0 hr(s) 48.0 min(s) Test
2 TCP 20101221-01:49:06 48.0 hr(s) 48.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 75684 21423 97107
FAIL 489 711 1200
PASS 75195 20712 95907
(%) (99%) (96%) (98%)

DLPAR is not tested!
[root@uli04 ~]#
[root@uli04 ~]# gss 2
Hostname : uli04
Kernel : 2.6.18-238.el5
Kernel Build Date : Sun Dec 19 13:29:51 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli042poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101221
Start Time : 01:49:06
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Tue Dec 21 01:49:02 CST 2010
Snapshot Time: Thu Dec 23 02:37:54 CST 2010
--------
multicast/mc_cmds/mc_cmds01;4211;0;1
multicast/mc_member/mc_member01;146;0;1
tcp_cmds/echo/echo01;480;33;1
tcp_cmds/finger/finger01;536;0;1
tcp_cmds/ftp/ftp01;1791;152;1
tcp_cmds/host/host01;1566;29;1
tcp_cmds/netstat/netstat01;1819;0;1
tcp_cmds/ntwk_files/ntwk_files01;2687;127;1
tcp_cmds/perf_lan/perf_lan01;163;78;1
tcp_cmds/ping/ping01;148;105;1
tcp_cmds/rcp/rcp01;38;0;1
tcp_cmds/rdist/rdist01;277;3;1
tcp_cmds/rlogin/rlogin01;1058;28;1
tcp_cmds/rsh/rsh01;214;68;1
tcp_cmds/rsync/rsync01;1207;3;1
tcp_cmds/sendfile/sendfile01;453;19;1
tcp_cmds/ssh/ssh01;2486;35;1
tcp_cmds/telnet/telnet01;1434;31;1
[root@uli04 ~]#

[root@uli08 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli08.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Thu Dec 23 02:37:32 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:48:58 48.0 hr(s) 48.0 min(s) Test
2 TCP 20101221-01:49:02 48.0 hr(s) 48.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 75072 20114 95186
FAIL 377 99 476
PASS 74695 20015 94710
(%) (99%) (99%) (99%)

DLPAR is not tested!
[root@uli08 ~]#
[root@uli08 ~]# gss 2
Hostname : uli08
Kernel : 2.6.18-238.el5
Kernel Build Date : Sun Dec 19 13:29:51 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli082poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101221
Start Time : 01:49:02
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Tue Dec 21 01:48:58 CST 2010
Snapshot Time: Thu Dec 23 02:38:07 CST 2010
--------
multicast/mc_cmds/mc_cmds01;3916;5;1
multicast/mc_member/mc_member01;145;0;1
tcp_cmds/echo/echo01;485;2;1
tcp_cmds/finger/finger01;509;0;1
tcp_cmds/ftp/ftp01;1757;17;1
tcp_cmds/host/host01;4;20;1
tcp_cmds/netstat/netstat01;3211;0;1
tcp_cmds/ntwk_files/ntwk_files01;2614;6;1
tcp_cmds/perf_lan/perf_lan01;170;11;1
tcp_cmds/ping/ping01;163;11;1
tcp_cmds/rcp/rcp01;33;0;1
tcp_cmds/rdist/rdist01;278;1;1
tcp_cmds/rlogin/rlogin01;1050;5;1
tcp_cmds/rsh/rsh01;223;1;1
tcp_cmds/rsync/rsync01;1171;3;1
tcp_cmds/sendfile/sendfile01;449;8;1
tcp_cmds/ssh/ssh01;2424;6;1
tcp_cmds/telnet/telnet01;1413;3;1
[root@uli08 ~]#
------- Comment From dbabka.com 2010-12-23 12:51 EDT-------
uli04 and uli08, with the 2.6.18-238.el5 ppc64 kernel, have been running BASE and TCP tests for about 58 hrs. uli08 has a 99% success rate for both BASE and TCP. uli04 has a 99% success rate for BASE and a 96% success rate for TCP.
------- Comment From huachenl.com 2010-12-26 20:39 EDT-------
uli04 and uli08, with the 2.6.18-238.el5 ppc64 kernel, have been running BASE and TCP tests for 137+ hrs. uli04 has a 99% success rate for BASE and a 97% success rate for TCP. uli08 has a 99% success rate for BASE and an 88% success rate for TCP.

[root@uli04 ~]# show.report.py
HOSTNAME                 KERNEL VERSION DISTRO INFO
------------------------ -------------- -----------
uli04.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga)

######## Current Time: Sun Dec 26 19:35:14 2010 ########

Job-ID FOCUS Start-Time Duration Function
------ ----- ---------- -------- --------
1 BASE 20101221-01:49:02 137.0 hr(s) 46.0 min(s) Test
2 TCP 20101221-01:49:06 137.0 hr(s) 46.0 min(s) networkTest2poplp2

FOCUS BASE TCP SUM
TOTAL 214881 62966 277847
FAIL 1444 1842 3286
PASS 213437 61124 274561
(%) (99%) (97%) (98%)

DLPAR is not tested!
[root@uli04 ~]#
[root@uli04 ~]# gss 2
Hostname : uli04
Kernel : 2.6.18-238.el5
Kernel Build Date : Sun Dec 19 13:29:51 CST 2010
Distribution : Red Hat
--------
Job ID : 2
Focus Group : TCP
XML File Name : /usr/local/staf/xml/uli042poplp2.tcp.xml
Function : networkTest2poplp2
Arguments : null
Start Date : 20101221
Start Time : 01:49:06
Clear Logs : Disabled
Log TC Elapsed Time: Disabled
Log TC Num Starts : Disabled
Log TC Start/Stop : Disabled
TCP Start Time: Tue Dec 21 01:49:02 CST 2010
Snapshot Time: Sun Dec 26 19:36:16 CST 2010
--------
multicast/mc_cmds/mc_cmds01;11847;1;1
multicast/mc_member/mc_member01;412;0;1
tcp_cmds/echo/echo01;1460;70;1
tcp_cmds/finger/finger01;1525;0;1
tcp_cmds/ftp/ftp01;5337;393;1
tcp_cmds/host/host01;5105;55;1
tcp_cmds/netstat/netstat01;5123;0;1
tcp_cmds/ntwk_files/ntwk_files01;8569;372;1
tcp_cmds/perf_lan/perf_lan01;468;195;1
tcp_cmds/ping/ping01;426;278;1
tcp_cmds/rcp/rcp01;118;0;1
tcp_cmds/rdist/rdist01;786;6;1
tcp_cmds/rlogin/rlogin01;3520;88;1
tcp_cmds/rsh/rsh01;694;174;1
tcp_cmds/rsync/rsync01;3232;10;1
tcp_cmds/sendfile/sendfile01;1282;23;1 tcp_cmds/ssh/ssh01;7023;83;1 tcp_cmds/telnet/telnet01;4203;94;1 [root@uli04 ~]# [root@uli08 ~]# show.report.py HOSTNAME KERNEL VERSION DISTRO INFO ------------------------ -------------- ----------- uli08.upt.austin.ibm.com 2.6.18-238.el5 Red Hat Enterprise Linux Server release 5.6 Beta (Tikanga) ######## Current Time: Sun Dec 26 19:35:52 2010 ######## Job-ID FOCUS Start-Time Duration Function ------ ----- ---------- -------- -------- 1 BASE 20101221-01:48:58 137.0 hr(s) 46.0 min(s) Test 2 TCP 20101221-01:49:02 137.0 hr(s) 46.0 min(s) networkTest2poplp2 FOCUS BASE TCP SUM TOTAL 212461 64057 276518 FAIL 1062 7198 8260 PASS 211399 56859 268258 (%) (99%) (88%) (97%) DLPAR is not tested! [root@uli08 ~]# gss 2 Hostname : uli08 Kernel : 2.6.18-238.el5 Kernel Build Date : Sun Dec 19 13:29:51 CST 2010 Distribution : Red Hat -------- Job ID : 2 Focus Group : TCP XML File Name : /usr/local/staf/xml/uli082poplp2.tcp.xml Function : networkTest2poplp2 Arguments : null Start Date : 20101221 Start Time : 01:49:02 Clear Logs : Disabled Log TC Elapsed Time: Disabled Log TC Num Starts : Disabled Log TC Start/Stop : Disabled TCP Start Time: Tue Dec 21 01:48:58 CST 2010 Snapshot Time: Sun Dec 26 19:36:29 CST 2010 -------- multicast/mc_cmds/mc_cmds01;10697;505;1 multicast/mc_member/mc_member01;412;0;1 tcp_cmds/echo/echo01;1406;617;1 tcp_cmds/finger/finger01;1400;0;1 tcp_cmds/ftp/ftp01;4964;487;1 tcp_cmds/host/host01;11;336;1 tcp_cmds/netstat/netstat01;9083;0;1 tcp_cmds/ntwk_files/ntwk_files01;7909;539;1 tcp_cmds/perf_lan/perf_lan01;469;1924;1 tcp_cmds/ping/ping01;450;721;1 tcp_cmds/rcp/rcp01;109;0;1 tcp_cmds/rdist/rdist01;759;497;1 tcp_cmds/rlogin/rlogin01;3302;357;1 tcp_cmds/rsh/rsh01;683;584;1 tcp_cmds/rsync/rsync01;3491;6;1 tcp_cmds/sendfile/sendfile01;1212;134;1 tcp_cmds/ssh/ssh01;6543;136;1 tcp_cmds/telnet/telnet01;3964;355;1 [root@uli08 ~]#
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
*** Bug 641495 has been marked as a duplicate of this bug. ***