Description of problem:
There is no synchronization to protect NIU's get_stats method from seeing a NULL pointer in either np->rx_rings or np->tx_rings. In fact, as far as ->ndo_get_stats is concerned, these values are set completely asynchronously. The lack of synchronization causes the panic below:

Unable to handle kernel NULL pointer dereference at 0000000000000058 RIP:
 [<ffffffff8827f10c>] :niu:niu_get_stats+0x2f/0xa4
PGD 3fbc60067 PUD 3fbc57067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /class/net/eth0/address
CPU 5
Modules linked in: 8021q iptable_mangle ip_tables x_tables bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport tpm_infineon joydev sr_mod cdrom i5000_edac tpm_tis i2c_i801 tpm edac_mc tpm_bios i2c_core niu e1000e pcspkr sg serio_raw dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 6532, comm: ifconfig Not tainted 2.6.18-238.el5.0.DEBUG00413622 #1
RIP: 0010:[<ffffffff8827f10c>] [<ffffffff8827f10c>] :niu:niu_get_stats+0x2f/0xa4
RSP: 0018:ffff8103fbcf3e90 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810418a84000 RCX: 0000000000000001
RDX: ffff810418a84500 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff810418851740 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000004 R11: 0000000000000000 R12: 000000000000068c
R13: 0000000000001000 R14: 0000000000000000 R15: 00002b7b99cc0000
FS: 00002b7b99cce5e0(0000) GS:ffff81041fc7b4c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000058 CR3: 00000003fbc66000 CR4: 00000000000006e0
Process ifconfig (pid: 6532, threadinfo ffff8103fbcf2000, task ffff8103fc873820)
Stack: ffffffff802335ed ffff810418851740 ffff810418851740 ffff810418a84000
 ffffffff8003fa30 ffff8103fbcf3f50 ffff8103fce17a80 ffff810418851770
 000000000000000d 000000000000000c ffff8103fce17a80 0000000000001000
Call Trace:
 [<ffffffff802335ed>] dev_seq_show+0x35/0xd0
 [<ffffffff8003fa30>] seq_read+0x1b8/0x28c
 [<ffffffff8000b787>] vfs_read+0xcb/0x171
 [<ffffffff80011c9a>] sys_read+0x45/0x6e
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

Code: 48 03 78 58 48 03 70 60 4c 03 48 68 4c 03 40 70 44 39 d1 7c
RIP [<ffffffff8827f10c>] :niu:niu_get_stats+0x2f/0xa4
 RSP <ffff8103fbcf3e90>

Version-Release number of selected component (if applicable):
kernel-2.6.18-238.el5

How reproducible:
Frequently

Steps to Reproduce:
1. run 'cat /proc/net/dev' in a busy loop to see the stats
2. then run ifup <niu_iface>

Actual results:
kernel panic

Additional info:
This has been worked on upstream in the following threads:
http://www.spinics.net/lists/netdev/msg154375.html
http://www.spinics.net/lists/netdev/msg154405.html

The resulting patch was merged upstream and fixes the problem for the customer.

commit 9690c636ac118b6662f28308bee817343d9932d8
Author: David S. Miller <davem>
Date: Thu Feb 3 16:12:50 2011 -0800

    niu: Fix races between up/down and get_stats.

    As reported by Flavio Leitner, there is no synchronization to protect NIU's get_stats method from seeing a NULL pointer in either np->rx_rings or np->tx_rings. In fact, as far as ->ndo_get_stats is concerned, these values are set completely asynchronously.

    Flavio attempted to fix this using a RW semaphore, which in fact works most of the time. However, dev_get_stats() can be invoked from non-sleepable contexts in some cases, so this fix doesn't work in all cases.

    So instead, control the visibility of the np->{rx,tx}_ring pointers when the device is being brought up, and use properties of the device down sequence to our advantage.

    In niu_get_stats(), return immediately if netif_running() is false. The device shutdown sequence first marks the device as not running (by clearing the __LINK_STATE_START bit), then it performs a synchronize_rcu() (in dev_deactivate_many()), and then finally it invokes the driver ->ndo_stop() method.

    This guarantees that all invocations of niu_get_stats() either see netif_running() as false, or they see the channel pointers before ->ndo_stop() clears them out.

    If netif_running() is true, protect against startup races by loading the np->{rx,tx}_rings pointer into a local variable, and punting if it is NULL. Use ACCESS_ONCE to prevent the compiler from reloading the pointer on us.

    Also, during open, control the order in which the pointers and the ring counts become visible globally using SMP write memory barriers. We make sure the np->num_{rx,tx}_rings value is stable and visible before np->{rx,tx}_rings is.

    Such visibility control is not necessary on the niu_free_channels() side because of the RCU sequencing that happens during device down as described above. We are always guaranteed that all niu_get_stats calls are finished, or will see netif_running() false, by the time ->ndo_stop is invoked.

    Reported-by: Flavio Leitner <fleitner>
    Signed-off-by: David S. Miller <davem>
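For readers who want to see the ordering described in the commit message in isolation, here is a minimal userspace sketch. It is not the niu driver code: struct fake_dev, struct ring_stats and the fake_* functions are hypothetical names made up for illustration, and C11 acquire/release atomics are swapped in for the kernel's smp_wmb() and ACCESS_ONCE(). It only models the three rules from the commit message: bail out when the device is not running, load the ring pointer once into a local and punt if it is NULL, and publish the ring count before the ring pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real niu structures. */
struct ring_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct fake_dev {
    atomic_bool running;                 /* models netif_running() */
    int num_rings;                       /* models np->num_rx_rings */
    _Atomic(struct ring_stats *) rings;  /* models np->rx_rings */
};

static void fake_dev_init(struct fake_dev *d)
{
    atomic_init(&d->running, false);
    d->num_rings = 0;
    atomic_init(&d->rings, (struct ring_stats *)NULL);
}

/* Up path: the device is marked running before the rings exist, so a
 * reader can observe running == true while rings is still NULL. */
static int fake_dev_up(struct fake_dev *d, int n)
{
    struct ring_stats *r;

    atomic_store_explicit(&d->running, true, memory_order_release);
    r = calloc((size_t)n, sizeof(*r));
    if (!r) {
        atomic_store_explicit(&d->running, false, memory_order_release);
        return -1;
    }
    d->num_rings = n;  /* the count is written first ... */
    /* ... and the release store makes it visible before the pointer. */
    atomic_store_explicit(&d->rings, r, memory_order_release);
    return 0;
}

/* Down path: clear the running flag, then tear the rings down. */
static void fake_dev_down(struct fake_dev *d)
{
    struct ring_stats *r;

    atomic_store_explicit(&d->running, false, memory_order_release);
    /* The kernel waits for in-flight readers at this point
     * (synchronize_rcu()). This sketch has no equivalent, so the
     * down path is only safe when used single-threaded. */
    r = atomic_exchange_explicit(&d->rings, (struct ring_stats *)NULL,
                                 memory_order_acq_rel);
    d->num_rings = 0;
    free(r);
}

/* Reader: returns zeroed totals unless the device is up and has rings. */
static struct ring_stats fake_get_stats(struct fake_dev *d)
{
    struct ring_stats sum = {0, 0};
    struct ring_stats *r;
    int i;

    if (!atomic_load_explicit(&d->running, memory_order_acquire))
        return sum;
    /* Load the pointer exactly once into a local variable. */
    r = atomic_load_explicit(&d->rings, memory_order_acquire);
    if (!r)
        return sum;  /* startup race window: running, but no rings yet */
    for (i = 0; i < d->num_rings; i++) {
        sum.packets += r[i].packets;
        sum.bytes += r[i].bytes;
    }
    return sum;
}
```

The interesting case is the window where running is already true but the rings have not been published: the unpatched reader dereferences NULL there, while this reader returns zeroed totals.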
Processes racing:

PID: 8348  TASK: ffff8103eaffe7e0  CPU: 3  COMMAND: "ifenslave"
 #0 [ffff81041fcbbf20] crash_nmi_callback at ffffffff8007bce5
 #1 [ffff81041fcbbf40] do_nmi at ffffffff800658c5
 #2 [ffff81041fcbbf50] nmi at ffffffff80064eaf
    [exception RIP: niu_init_hw+2199]
    RIP: ffffffff8825c700  RSP: ffff8103ea48bc88  RFLAGS: 00000086
    RAX: ffffc20013600160  RBX: 0000000000000000  RCX: 000000000000000c
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff81041df74500   R8: 0000000000000001   R9: 0000000000000002
    R10: 0000e00800000000  R11: 0000000000000006  R12: ffff81041df7565c
    R13: ffff81041df755d4  R14: 0000000000000160  R15: 000000000000002c
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff8103ea48bc88] niu_init_hw at ffffffff8825c700
 #4 [ffff8103ea48bcc0] niu_open at ffffffff88263704
 #5 [ffff8103ea48bcf0] dev_open at ffffffff80234b54
 #6 [ffff8103ea48bd00] bond_enslave at ffffffff885df6a5
 #7 [ffff8103ea48bd80] bond_do_ioctl at ffffffff885e165c
 #8 [ffff8103ea48bde0] dev_ioctl at ffffffff802347a5
 #9 [ffff8103ea48be90] sock_ioctl at ffffffff8022a58c
#10 [ffff8103ea48beb0] do_ioctl at ffffffff8004226a
#11 [ffff8103ea48bed0] vfs_ioctl at ffffffff8003026e
#12 [ffff8103ea48bf40] sys_ioctl at ffffffff8004c73b
#13 [ffff8103ea48bf80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 0000003c2e6cc647  RSP: 00007fffac465908  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fffac465b00  RSI: 0000000000008990  RDI: 000000000000000b
    RBP: 00007fffac4670c3   R8: 00007fffac465b15   R9: 0000000000000003
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 00007fffac4670c9  R15: 00007fffac465b00
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

PID: 8444  TASK: ffff8103eafa9100  CPU: 4  COMMAND: "ifconfig"
 #0 [ffff8103ea34dbf0] crash_kexec at ffffffff800af85a
 #1 [ffff8103ea34dcb0] __die at ffffffff80065117
 #2 [ffff8103ea34dcf0] do_page_fault at ffffffff8006748d
 #3 [ffff8103ea34dde0] error_exit at ffffffff8005dde9
    [exception RIP: niu_get_stats+121]
    RIP: ffffffff88262156  RSP: ffff8103ea34de90  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff81041df74000  RCX: 0000000000000001
    RDX: ffff81041df74500  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff81041d702e40   R8: 0000000000000000   R9: 0000000000000006
    R10: 0000000000000004  R11: 0000000000000000  R12: 00000000000002b4
    R13: 0000000000001000  R14: 0000000000000000  R15: 00002b3ff19f4000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff8103ea34de90] dev_seq_show at ffffffff8023256d
 #5 [ffff8103ea34deb0] seq_read at ffffffff8003f858
 #6 [ffff8103ea34df10] vfs_read at ffffffff8000b787
 #7 [ffff8103ea34df40] sys_read at ffffffff80011c5c
 #8 [ffff8103ea34df80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 0000003c2e6c5ff0  RSP: 00007fff078ab918  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000001000  RSI: 00002b3ff19f4000  RDI: 0000000000000005
    RBP: 000000001e6005a0   R8: 00000000ffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 000000001e6005a0  R15: 00000000078ab9a8
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Adding Stefan to the CC list as he is the niu maintainer and should have been assigned this bug.
Included in kernel-2.6.18-246.el5.

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.
Reproduced on kernel-2.6.18-238.el5: while running "while true; do cat /proc/net/dev; done", a kernel panic was hit after several ifup and ifdown cycles on the niu NIC.

On kernel-2.6.18-246.el5, running "while true; do cat /proc/net/dev; done" and "while true; do ifdown eth5; sleep 3; ifup eth5; sleep 3 ; done" in two ssh sessions, no panic was triggered during about an hour of testing.

Set Verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html