Bug 493517

| Summary: | get_partstats() returns NULL and causes panic | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Lachlan McIlroy <lmcilroy> |
| Component: | kernel | Assignee: | Jerome Marchand <jmarchan> |
| Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Priority: | high |
| Version: | 5.3 | Target Milestone: | rc |
| Target Release: | 5.4 | Hardware: | All |
| OS: | Linux | Doc Type: | Bug Fix |
| CC: | anton, cward, dejohnso, dzickus, jmarchan, jtluka, kzhang, tao, vgaikwad | Cloned to: | 517523 |
| Bug Blocks: | 517523, 533192 | Last Closed: | 2010-03-30 07:18:33 UTC |
Description
Lachlan McIlroy, 2009-04-02 03:56:27 UTC
Probably caused by the recently adopted statistics patchset (linux-2.6-block-enhanced-partition-statistics-*.patch, in the tree), which was committed in 2.6.18-104.el5. Jerome, please take a look.

---

I have not really investigated yet, but it doesn't seem right that get_partstats() may return NULL at these places. I suspect some race condition.

---

The problem is that we use get_partstats() without the RCU read lock, so we may end up trying to access an already freed structure. It happens in several places. Wait... that does not explain the bug in part_stat_reset(), where the RCU read lock is held.
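To make that pattern concrete, here is a minimal kernel-style sketch of the kind of unprotected access being described (the function, the field name, and the exact call site are illustrative assumptions, not the actual RHEL 5 source):

```c
#include <linux/genhd.h>
#include <linux/rcupdate.h>

/*
 * Sketch of the suspected buggy pattern.  get_partstats() looks up
 * per-partition statistics that a concurrent partition rescan can free
 * via RCU.  Without rcu_read_lock() held around both the lookup and the
 * use, the lookup can return NULL for a just-deleted partition, or the
 * structure can be freed underneath us.
 */
static void stats_update_unprotected(struct hd_struct *part)
{
        struct disk_stats *stats = get_partstats(part); /* no RCU read lock held */

        stats->ios++;   /* oopses if get_partstats() returned NULL */
}
```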
---

Hi Lachlan, I have two questions for you:

* What kind of block devices are there on this machine?
* Do you have any idea what kind of workload ran on it when the crash occurred?

Thanks,
Jérôme

---

Hi Jerome,

I don't know the answer to those questions; the issue is waiting on support, so hopefully we get a response. Maybe a disk was being partitioned at the time iostat was run. Is it possible we raced and found a new partition that wasn't completely set up?

Lachlan

---

This script was passed to me by mgoodwin. I tried it in a KVM guest and it didn't crash, but it might have more success on a bare-metal system.

```sh
#!/bin/sh
# Repeatedly create and delete a partition in the background while the
# foreground loop forces reads of /proc/partitions and /proc/diskstats.

[ $# -ne 1 ] && echo "Usage: $0 scratchdevice" && exit 1
scratch=$1

while true; do
        parted $scratch mkpart p ext2 1 10
        parted $scratch rm 1
done >/dev/null 2>&1 &

# 'crap' matches nothing; the grep only exists to read both proc files.
while true; do grep crap /proc/partitions /proc/diskstats; done
```

---

I've never been able to reproduce the delete_partition() bug and I don't understand how it could happen. Anyway, I will post a patch that corrects the faulty RCU-unprotected accesses. It will at least fix the other kind of panic.

---

This problem has been seen by another customer:

```
sdd:<1>Unable to handle kernel NULL pointer dereference at 0000000000000028
RIP: [<ffffffff80141291>] part_round_stats+0x13/0x8a
PGD c5091c067 PUD c65aef067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:03.0/0000:02:00.1/irq
CPU 10
Modules linked in: mptctl mptbase vfat fat usb_storage ipmi_devintf ipmi_si ipmi_msghandler dell_rbu autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs lockd sunrpc bonding cpufreq_ondemand acpi_cpufreq freq_table emcpdm(PU) emcpgpx(PU) emcpmpx(PU) emcp(PU) emcplib(U) dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom sg bnx2 serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata shpchp megaraid_sas ext3 jbd uhci_hcd ohci_hcd ehci_hcd qla2xxx(U) sd_mod scsi_mod qla2xxx_conf(U) intermodule(U)
Pid: 11401, comm: vol_id Tainted: P 2.6.18-164.el5 #1
RIP: 0010:[<ffffffff80141291>]  [<ffffffff80141291>] part_round_stats+0x13/0x8a
RSP: 0018:ffff810c680d99d8  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff810c7fc31680 RCX: b400000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff810c7fc31680
RBP: 00000001003c6b5a R08: 000000000000000f R09: ffff810116c78000
R10: ffff8106947689c8 R11: ffffffff801465ba R12: 0000000000000000
R13: 0000000000000002 R14: 0000000012bfef3f R15: 0000000000800011
FS:  000000001889a880(0063) GS:ffff8106951a7940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000c63323000 CR4: 00000000000006e0
Process vol_id (pid: 11401, threadinfo ffff810c680d8000, task ffff810c6717f040)
Stack:  ffff810c7fc31680 ffff810c6b283c58 0000000000000000 ffffffff80141459
 ffff810c6b283c58 ffff810c7e0900c8 0000000012bfef3f ffffffff8000c047
 0000000000000001 0000000000000400 ffff810c647054c0 0000000000000282
Call Trace:
 [<ffffffff80141459>] drive_stat_acct+0x151/0x175
 [<ffffffff8000c047>] __make_request+0x385/0x401
 [<ffffffff8001c049>] generic_make_request+0x211/0x228
 [<ffffffff80033472>] submit_bio+0xe4/0xeb
 [<ffffffff8001a793>] submit_bh+0xf1/0x111
 [<ffffffff800e0a81>] block_read_full_page+0x259/0x276
 [<ffffffff800e3d27>] blkdev_get_block+0x0/0x46
 [<ffffffff8000c6dd>] add_to_page_cache+0xaa/0xc1
 [<ffffffff80012ead>] __do_page_cache_readahead+0x140/0x179
 [<ffffffff8003245d>] blockable_page_cache_readahead+0x53/0xb2
 [<ffffffff80013f68>] page_cache_readahead+0x13d/0x1af
 [<ffffffff8000c189>] do_generic_mapping_read+0xc6/0x354
 [<ffffffff8000d0b6>] file_read_actor+0x0/0x159
 [<ffffffff8000c563>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff800c5145>] generic_file_read+0xac/0xc5
 [<ffffffff8009f6c1>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800b61b0>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8000b695>] vfs_read+0xcb/0x171
 [<ffffffff80011b72>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: 48 3b 68 28 49 89 c4 74 69 83 78 38 00 74 5e 48 89 df e8 22
RIP  [<ffffffff80141291>] part_round_stats+0x13/0x8a
 RSP <ffff810c680d99d8>
```

```
crash> dis part_round_stats
0xffffffff8014127e <part_round_stats>:      push   %r12
0xffffffff80141280 <part_round_stats+2>:    push   %rbp
0xffffffff80141281 <part_round_stats+3>:    mov    2809848(%rip),%rbp   # 0xffffffff803ef280
0xffffffff80141288 <part_round_stats+10>:   push   %rbx
0xffffffff80141289 <part_round_stats+11>:   mov    %rdi,%rbx
0xffffffff8014128c <part_round_stats+14>:   callq  0xffffffff801445ca <get_partstats>
0xffffffff80141291 <part_round_stats+19>:   cmp    0x28(%rax),%rbp    <== faulting instruction; panics because %rax = 0
0xffffffff80141295 <part_round_stats+23>:   mov    %rax,%r12
0xffffffff80141298 <part_round_stats+26>:   je     0xffffffff80141303 <part_round_stats+133>
0xffffffff8014129a <part_round_stats+28>:   cmpl   $0x0,0x38(%rax)
0xffffffff8014129e <part_round_stats+32>:   je     0xffffffff801412fe <part_round_stats+128>

crash> dis 0xffffffff801445ca
0xffffffff801445ca <get_partstats>: mov    $0x3ffffffffc0001,%rax
```

This shows get_partstats() is called right before the faulting instruction: it returned NULL (%rax = 0), and the cmp dereferences offset 0x28 from %rax, matching the faulting address 0x0000000000000028 in CR2. So the code-snippet analysis in the description is the same issue here. This customer is also using EMC storage. Will attach IT.

NOTE: Latest customer is at RHEL 5.4.

---

Elevating priority and severity to match the ITs. This is now being seen on RHEL 5.4 and by more than one customer. Both customers are using EMC storage.

---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

---

@GSS: We need to confirm that there is commitment to test for the resolution of this request during the RHEL 5.5 test phase, if it is accepted into the release. Please post a confirmation before Oct 16th, 2009, including the contact information for testing engineers.

---

As this bug targets two different issues, the delete_partition() panic is now tracked in bug 530066. This bug will track the part_round_stats() panic.

---

I don't see how the posted patch will fix this issue. What I think is happening here is a race between the various users of part_round_stats() and rescan_partitions(). For example, in diskstats_show() we have this sequence:

```c
struct hd_struct *hd = gp->part[n];

if (!hd || !hd->nr_sects)
        continue;

preempt_disable();
part_round_stats(hd);
preempt_enable();
```

Between assigning 'hd' and calling part_round_stats(), another thread may have called rescan_partitions() and reloaded the partition table, so our 'hd' is now stale, doesn't match the new one, and get_partstats() returns NULL. This instance has been fixed in BZ 495866. So what needs to be done to fix the same bug in drive_stat_acct(), blk_account_io_merge(), blk_account_io_done() and part_stats_read() is to call rcu_read_lock() before retrieving the partition, which is mostly only a slight change to the posted patch.
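Concretely, the change described above amounts to something like the following sketch (assumed names and a representative counter update only; this is not the posted patch itself):

```c
#include <linux/genhd.h>
#include <linux/rcupdate.h>

/*
 * Sketch of the proposed fix (illustrative, not the committed patch):
 * take the RCU read lock *before* fetching the hd_struct pointer, hold
 * it across the get_partstats() lookup, and tolerate a NULL result,
 * which means rescan_partitions() replaced the partition table and our
 * hd_struct is stale.
 */
static void stats_update_protected(struct gendisk *gp, int n)
{
        struct hd_struct *hd;
        struct disk_stats *stats;

        rcu_read_lock();
        hd = gp->part[n];
        if (hd && hd->nr_sects) {
                stats = get_partstats(hd);
                if (stats)
                        stats->ios++;   /* illustrative counter update */
        }
        rcu_read_unlock();
}
```

The NULL check matters as much as the lock: the RCU read lock only guarantees the structure is not freed while we look at it, not that the partition we fetched is still the current one.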
---

I guess you're right. I'll nack my patch and fix it.

---

*** Bug 530066 has been marked as a duplicate of this bug. ***

---

~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set this bug to NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug for each request and escalate through your support representative.

---

(In reply to comment #8)
> This script was passed to me from mgoodwin. [...]

Hi Jerome, on what machine did you reproduce it, and how long did the program run before the panic? I tried it on the Beaker machine ibm-himalaya.rhts.eng.bos.redhat.com, which has 3 disks, using /dev/sdb as the parameter; the program has run for more than 20 minutes and still does not trigger the bug. Thanks in advance.

---

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html