Bug 517329
Summary: | [RHEL4.8] igb driver doesn't allocate enough buffer for ethtool_get_strings() | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Flavio Leitner <fleitner> | ||||
Component: | kernel | Assignee: | Stefan Assmann <sassmann> | ||||
Status: | CLOSED ERRATA | QA Contact: | Evan McNabb <emcnabb> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 4.8 | CC: | bobby.suber, daniel_frazier, dannf, david.graham, dhoward, emcnabb, peterm, sandy.garza, tao | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
The data buffer ethtool_get_strings() allocated, for the igb driver, was smaller than the amount of data that was copied in igb_get_strings(), because of a miscalculation in IGB_QUEUE_STATS_LEN, resulting in memory corruption. This bug could have led to a kernel panic.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-02-16 15:46:23 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 522738 | ||||||
Attachments: |
|
Description
Flavio Leitner
2009-08-13 14:36:51 UTC
I've been looking into this with the Intel igb author, who points out a clear bug in the define of IGB_QUEUE_STATS_LEN, and that becuase TX-stats are not recorded in this driver, the TX queue -relevant stuff can be removed. The changes are required in drivers/net/igb/igb_ethtool.c The bug is in the check against the number of queues, which increments by 0! if there is exactly one RX or TX quueue. Because TXStats are not recoded, we should change this:. 098 #define IGB_QUEUE_STATS_LEN \ 099 ((((((struct igb_adapter *)netdev->priv)->num_rx_queues > 1) ? \ 100 ((struct igb_adapter *)netdev->priv)->num_rx_queues : 0) + \ 101 (((((struct igb_adapter *)netdev->priv)->num_tx_queues > 1) ? \ 102 ((struct igb_adapter *)netdev->priv)->num_tx_queues : 0))) * \ 103 (sizeof(struct igb_queue_stats) / sizeof(u64))) to this: 098 #define IGB_QUEUE_STATS_LEN 099 ((((struct igb_adapter *)netdev->priv)->num_rx_queues) * (sizeof(struct igb_queue_stats) / sizeof(u64))) And , again, because there are no TXQueue stats recorded, this should be removed: 1864 for (i = 0; i < adapter->num_tx_queues; i++) { 1865 sprintf(p, "tx_queue_%u_packets", i); 1866 p += ETH_GSTRING_LEN; 1867 sprintf(p, "tx_queue_%u_bytes", i); 1868 p += ETH_GSTRING_LEN; 1869 } I have not yet has time to verify that this resolves the issue, but hope it helps. Could we also add an assertion, e.g.: BUG_ON(p - data >= igb_get_stats_count(dev) * ETH_GSTRING_LEN) The changes I proposed above are based on the original driver in the RHEL 2.6.9-78 source, and the issue addressed is actually __fixed__ in the igb-1.3.8.6 module that is installed as part of the PSP that exposed the issue. It is unfortunate that the installation of the new fixed driver exposes an issue in the older one. I am not sure how (or if) it should be addressed. As long as the newer (igb-1.3.8.6) driver is installed 'enough' to be loadable on the next system boot, at least this one panic will be the last one cause by the old driver. Possibly there are actions that the PSP install could take to ensure that the old driver does not receive an etthool -S query before the reboot has completed ? The assertion is a good idea, but should actually use ">" rather than ">=" BUG_ON(p - data > igb_get_stats_count(dev) * ETH_GSTRING_LEN) I tried to reproduce the issue with an Intel 82576 NIC on RHEL4. I followed the description in the initial comment as close as possible, however in Step 4 I had to execute install825.sh instead of install826.sh and didn't get asked for Step 5. The other difference is that I tested on x86_64. I ran "for i in `seq 100`; do /etc/init.d/network restart; done" but the machine didn't crash it only displayed Shutting down interface eth1: [ OK ] Shutting down loopback interface: [ OK ] Setting network parameters: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth1: [ OK ] SIOCGIFXQLEN: No such device Shutting down interface eth1: [ OK ] Shutting down loopback interface: [ OK ] Setting network parameters: [ OK ] Bringing up loopback interface: [ OK ] Bringing up interface eth1: [ OK ] SIOCGIFXQLEN: No such device Shutting down interface eth1: [ OK ] Shutting down loopback interface: [ OK ] Setting network parameters: [ OK ] Will I need any special hardware? Additional info: dmesg |grep eth divert: not allocating divert_blk for non-ethernet device lo divert: allocating divert_blk for eth0 igb 0000:03:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:1b:21:38:2f:50 igb 0000:03:00.0: eth0: PBA No: e19418-006 divert: allocating divert_blk for eth1 igb 0000:03:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:1b:21:38:2f:51 igb 0000:03:00.1: eth1: PBA No: e19418-006 divert: allocating divert_blk for eth2 eth2: Tigon3 [partno(BCM95754) rev 5784100 PHY(5784)] (PCI Express) 10/100/1000Base-T Ethernet 00:1a:4b:0c:d8:7a eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1] eth2: dma_rwctrl[76180000] dma_mask[64-bit] divert: not allocating divert_blk for non-ethernet device sit0 eth1: no IPv6 routers present restarting the network is not sufficient. the corruption comes as a result of an ioctl that only gets called when hp-snmp-agents starts. To reproduce, you need to also restart that service. while :; do /etc/init.d/hp-snmp-agents restart # causes corruption /etc/init.d/network start # trips over corruption and crashes done Created attachment 358683 [details] bug517329-igb-buffer-fix.patch I was able to reproduce the problem and can confirm that with this patch no oops occurs. If you want to have anything else added to this patch let me know. when will this be fixed upstream? I believe it is already fixed upstream in: commit e21ed3538f1946ea623caf28f1c44ede50224275 Author: Alexander Duyck <alexander.h.duyck> igb: update ethtool stats to support multiqueue Which was in 2.6.27-rc1. Thanks Dann! All my questions have been addressed. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Committed in 89.12.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/ Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The data buffer ethtool_get_strings() allocated, for the igb driver, was smaller than the amount of data that was copied in igb_get_strings(), because of a miscalculation in IGB_QUEUE_STATS_LEN, resulting in memory corruption. This bug could have led to a kernel panic. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html |