Bug 1208557
Summary: | crash-7.1.0-1.el6 spins at 'please wait... (gathering task table data)' when loading rhel6.4.z vmcore | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Wysochanski <dwysocha> |
Component: | crash | Assignee: | Dave Anderson <anderson> |
Status: | CLOSED ERRATA | QA Contact: | Qiao Zhao <qzhao> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 6.7 | CC: | anderson, atomlin, dwysocha, jherrman, kwalker, lilu, stalexan |
Target Milestone: | rc | Keywords: | Regression, TestCaseProvided |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | crash-7.1.0-3.el6 | Doc Type: | Bug Fix |
Doc Text: |
Attempting to run the crash utility on a vmcore in which a pid_hash[] chain had been corrupted previously caused crash to enter an infinite loop and become unresponsive. With this update, the handling of errors when gathering tasks from pid_hash[] chains during session initialization has been enhanced. Now, if a pid_hash[] chain has been corrupted, crash no longer enters an infinite loop during the initialization sequence. In addition, the error messages associated with corrupt or invalid pid_hash[] chains have been updated to report the pid_hash[] index number.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-07-22 06:27:20 UTC | Type: | Bug |
Embargoed: |
Description
Dave Wysochanski
2015-04-02 14:09:08 UTC
Thanks Dave, I'll take a look.

Note that "crash --log vmcore" works, and it shows a serious memory corruption:

$ crash --log vmcore
... [ cut ] ...
<1>BUG: unable to handle kernel paging request at 00000000ad6fdfe0
<1>IP: [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4>PGD 413df1067 PUD 0
<0>Thread overran stack, or stack corrupted
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/module/ipv6/initstate
<4>CPU 5
<4>Modules linked in: mvfs(U) cdr(P)(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ppdev parport_pc parport sg vmware_balloon microcode vmxnet3 i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmw_pvscsi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 30737, comm: export_mvfs Tainted: P --------------- 2.6.32-358.23.2.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
<4>RIP: 0010:[<ffffffff81056b14>] [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4>RSP: 0018:ffff880247403db8 EFLAGS: 00010082
<4>RAX: ffff880425fc2aa0 RBX: 0000000025760028 RCX: ffff88063d0de440
<4>RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff880425fc2ad8
<4>RBP: ffff880247403de8 R08: ffffffff8160bb65 R09: 0000000000000000
<4>R10: 0000000000000010 R11: 0000000000000000 R12: ffff880247416768
<4>R13: 00000000000f4435 R14: 000000d543a6c0cf R15: ffff880425fc2aa0
<4>FS: 0000000000000000(0000) GS:ffff880247400000(0063) knlGS:00000000f77a06c0
<4>CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
<4>CR2: 00000000ad6fdfe0 CR3: 0000000425fb2000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process export_mvfs (pid: 30737, threadinfo ffff880425760000, task ffff880425fc2aa0)
<4>Stack:
<4> ffff880247403dc8 ffffffff81013643 ffff880425fc2ad8 ffff880247416768
<4><d> 0000000000000000 0000000000000000 ffff880247403e18 ffffffff810570cb
<4><d> ffff880247416700 0000000000000005 0000000000016700 0000000000000005
<4>Call Trace:
<4> <IRQ>
<4> [<ffffffff81013643>] ? native_sched_clock+0x13/0x80
<4> [<ffffffff810570cb>] task_tick_fair+0xdb/0x160
<4> [<ffffffff8105af11>] scheduler_tick+0xc1/0x260
<4> [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
<4> [<ffffffff810812fe>] update_process_times+0x6e/0x90
<4> [<ffffffff810a80c6>] tick_sched_timer+0x66/0xc0
<4> [<ffffffff8109b4ae>] __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a219f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8107710f>] ? __do_softirq+0x11f/0x1e0
<4> [<ffffffff8109b816>] hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff8151785b>] smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] apic_timer_interrupt+0x13/0x20
<4> <EOI>
<4>Code: 00 8b 15 04 2b a4 00 85 d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85 c9 74 1d 48 63 db 66 90 48 8b 51 20 <48> 03 14 dd a0 de bf 81 4c 01 2a 48 8b 49 78 48 85 c9 75 e8 48
<1>RIP [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4> RSP <ffff880247403db8>
<4>CR2: 00000000ad6fdfe0
$

Anyway, crash ends up re-reading the same location endlessly, so I'm guessing that there's some kind of corruption in the pid_hash area.

> Note that "crash --log vmcore" works, and it shows a serious memory
> corruption:

and crash --minimal also works.
This behavior was introduced by a crash-7.0.9 patch, which fixed a problem where tasks in a chain could get skipped:

- Fix for the one-time (dumpfile), or as-required (live system), gathering of tasks from the kernel pid_hash[] in 2.6.24 and later kernels. Without the patch, if an entry in a pid_hash[] chain is not related to the "init_pid_ns" pid_namespace structure, any remaining entries in the hlist chain are skipped. (vvs)

Unfortunately, it has the side effect seen in this case when the pid_hash[] chains have been corrupted, presumably due to prior corruption of a kernel stack.

Running with an older rhel6 version (crash-6.1.0-5.el6) does initialize like so:

$ /tmp/crash vm*
crash 6.1.0-5.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

please wait... (gathering task table data)
crash: duplicate task in pid_hash: ffff88043cda23b0
crash: invalid task address: ffff88043cda23b0
please wait... (determining panic task)
WARNING: active task ffff880425fc2aa0 on cpu 5 not found in PID hash
WARNING: active task ffff880425fc2aa0 on cpu 5: corrupt cpu value: 628490280

KERNEL: vmlinux-2.6.32-358.23.2.el6
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Sun Mar 29 07:03:01 2015
UPTIME: 00:15:15
LOAD AVERAGE: 2.62, 1.95, 1.11
TASKS: 227
NODENAME: pv0il0244.cc0.mercadona.es
RELEASE: 2.6.32-358.23.2.el6.x86_64
VERSION: #1 SMP Sat Sep 14 05:32:37 EDT 2013
MACHINE: x86_64 (2900 Mhz)
MEMORY: 32 GB
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
PID: 30737
COMMAND: "export_mvfs"
TASK: ffff880425fc2aa0 [THREAD_INFO: ffff880425760000]
CPU: 5
STATE: TASK_RUNNING (PANIC)

crash>

I'll look into seeing how/if this can be recognized and handled, although I worry that a "fix" may only address this specific instance of corruption.

Thanks Dave A! If it's a damaged task_struct / vmcore which causes crash to go bonkers here it may be 'low' priority but leaving 'medium' for now and setting 'regression' though that may make it sound too important. We've only seen it one time but we've not been running crash-7.0.9 too long - installed Jan 20, 2015 so only a little over 2 months. If stack overflows trigger this bug then those do happen more on rhel6 from what I've seen but it's probably only a couple percent of vmcores.

There's a corruption associated with the pid_hash[62] chain that causes the crash utility to go into an infinite loop.
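As an aside on the failure mode just described, here is a minimal Python sketch, not crash's actual C code, of how a walk over a singly linked hash chain can be guarded against a corrupted link. The names `walk_chain` and `next_of`, the dictionary standing in for dumpfile memory, and the message wording are all hypothetical. The only ideas taken from this report are that a corrupted chain must not spin forever and that the message should name the pid_hash[] index.

```python
def walk_chain(first, next_of, index):
    """Collect node addresses on one chain, stopping on any sign of corruption.

    first   -- address held in the hash bucket head (None for an empty bucket)
    next_of -- hypothetical stand-in for dumpfile memory: address -> next address
    index   -- bucket number, used only in the error message
    """
    seen = set()
    nodes = []
    node = first
    while node is not None:
        if node in seen:
            # Same address reached twice: the chain loops back on itself.
            return nodes, f"pid_hash[{index}]: chain revisits {node:#x}"
        if node not in next_of:
            # Address that cannot be read at all (modelled as a missing key).
            return nodes, f"pid_hash[{index}]: unreadable node {node:#x}"
        seen.add(node)
        nodes.append(node)
        node = next_of[node]
    return nodes, None


healthy = {0x1000: 0x2000, 0x2000: None}
looped = {0x1000: 0x2000, 0x2000: 0x1000}  # corrupted: last node points back

print(walk_chain(0x1000, healthy, 7))
print(walk_chain(0x1000, looped, 62))
```

A visited set costs memory proportional to the chain length. A fixed hop limit would be a cheaper alternative, at the price of a less precise message.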
The pid_hash[62] hlist_head structure points to an embedded hlist_node in the first upid structure in the chain:

crash> p pid_hash[62]
$2 = {
  first = 0xffff8804255f5c80
}
crash> struct upid -l upid.pid_chain 0xffff8804255f5c80
struct upid {
  nr = -1607242614,
  ns = 0xffffffff81aa31a0 <init_pid_ns>,
  pid_chain = {
    next = 0xffff880425761e48,
    pprev = 0xffff88043cda28c0
  }
}

The PID "nr" of -1607242614 is obviously not correct:

crash> eval -1607242614
hexadecimal: ffffffffa0336c8a
    decimal: 18446744072102309002 (-1607242614)
      octal: 1777777777764014666212
     binary: 1111111111111111111111111111111110100000001100110110110010001010
crash> sym ffffffffa0336c8a
ffffffffa0336c8a (t) fsf_ops_lookup+58 [cdr]
crash> mod -t
NAME TAINTS
cdr P(U)
mvfs (U)
crash>

So it appears that the proprietary "cdr" module and the unsigned "mvfs" module are involved in the stack overrun of the "export_mvfs" task:

crash> set
PID: 30737
COMMAND: "export_mvfs"
TASK: ffff880425fc2aa0 [THREAD_INFO: ffff880425760000]
CPU: 5
STATE: TASK_RUNNING (PANIC)
crash>

Dump the overrun stack contents from the bottom:

crash> bt -T
PID: 30737 TASK: ffff880425fc2aa0 CPU: 5 COMMAND: "export_mvfs"
[ffff880425760070] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257600a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257600b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257600e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257600f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760120] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760130] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760160] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760170] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257601a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257601b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257601e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257601f0] fsf_ops_lookup at ffffffffa0336c8a 
[cdr] [ffff880425760220] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760230] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760260] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760270] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257602a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257602b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257602e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257602f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760320] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760330] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760360] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760370] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257603a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257603b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257603e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257603f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760420] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760430] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760460] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760470] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257604a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257604b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257604e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257604f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760520] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760530] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760560] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760570] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257605a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257605b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257605e0] vnlayer_hijacked_lookup 
at ffffffffa039749d [mvfs] [ffff8804257605f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760620] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760630] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760660] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760670] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257606a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257606b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257606e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257606f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760720] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760730] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760760] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760770] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257607a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257607b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257607e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257607f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760820] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760830] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760860] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760870] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257608a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257608b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257608e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257608f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760920] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760930] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760960] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760970] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257609a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257609b0] 
fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257609e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257609f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760a20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760a30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760a60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760a70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760aa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760ab0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ae0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760af0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760b20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760b30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760b60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760b70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ba0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760bb0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760be0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760bf0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760c20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760c30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760c60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760c70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ca0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760cb0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ce0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760cf0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760d20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760d30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760d60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760d70] fsf_ops_lookup at ffffffffa0336c8a [cdr] 
[ffff880425760da0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760db0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760de0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760df0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760e20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760e30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760e60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760e70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ea0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760eb0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760ee0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760ef0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760f20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760f30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760f60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760f70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760fa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760fb0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425760fe0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425760ff0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761020] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761030] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761060] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761070] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257610a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257610b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257610e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257610f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761120] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761130] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761160] vnlayer_hijacked_lookup at 
ffffffffa039749d [mvfs] [ffff880425761170] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257611a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257611b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257611e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257611f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761220] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761230] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761260] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761270] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257612a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257612b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257612e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257612f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761320] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761330] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761360] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761370] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257613a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257613b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257613e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257613f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761420] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761430] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761460] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761470] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257614a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257614b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257614e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257614f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761520] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761530] 
fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761560] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761570] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257615a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257615b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257615e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257615f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761620] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761630] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761660] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761670] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257616a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257616b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257616e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257616f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761720] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761730] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761760] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761770] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257617a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257617b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257617e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257617f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761820] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761830] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761860] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761870] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257618a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257618b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257618e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257618f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] 
[ffff880425761920] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761930] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761938] zone_statistics at ffffffff8113b579 [ffff880425761960] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761970] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257619a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257619b0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff8804257619e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff8804257619f0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761a20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761a30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761a60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761a70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761aa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761ab0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761ae0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761af0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761b20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761b30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761b60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761b70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761ba0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761bb0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761be0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761bf0] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761c20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761c30] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761c60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761c70] fsf_ops_lookup at ffffffffa0336c8a [cdr] [ffff880425761ca0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs] [ffff880425761cb0] do_lookup at ffffffff81190865 
[ffff880425761d10] __link_path_walk at ffffffff81191024
[ffff880425761d20] page_remove_rmap at ffffffff8114cf34
[ffff880425761d90] handle_mm_fault at ffffffff8114452a
[ffff880425761dd0] path_walk at ffffffff81191baa
[ffff880425761e10] do_path_lookup at ffffffff81191d7b
[ffff880425761e40] user_path_at at ffffffff81192a07
[ffff880425761ee0] security_prepare_creds at ffffffff8121c0e6
[ffff880425761f10] sys_faccessat at ffffffff8117f130
[ffff880425761f70] sys_access at ffffffff8117f248
[ffff880425761f80] sysenter_dispatch at ffffffff8104d830
RIP: 0000000000734430 RSP: 00000000fffaee0c RFLAGS: 00000296
RAX: 0000000000000021 RBX: ffffffff8104d830 RCX: 0000000000000000
RDX: 00000000008d3490 RSI: 00000000fffafe4c RDI: 0000000000000001
RBP: 00000000fffb03c8 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8117f248
R13: ffff880425761f78 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000021 CS: 0023 SS: 002b
crash>

Anyway, following the corrupt upid chain leads to the crash utility spin. I have a couple of things I can add to prevent that from happening, although in all probability, it's unlikely it will ever be seen again. I guess you could call it a regression, but again, if a vmcore is corrupted to the point where some of the basic requirements for the crash session to come up are compromised, well, then shit like this can happen.

By "if a vmcore is corrupted", I mean "if the crashed system's memory was corrupted". The vmcore is fine.

By the search command, you can see that the stack overrun continued downwards for 529 pages, or over 2MB worth of memory corruption. That's pretty impressive corruption right there... ;-)

And the corruption overwrote the memory containing that first upid structure in the pid_hash[62] chain:

crash> struct upid -l upid.pid_chain 0xffff8804255f5c80
struct upid {
  nr = -1607242614,
  ns = 0xffffffff81aa31a0 <init_pid_ns>,
  pid_chain = {
    next = 0xffff880425761e48,
    pprev = 0xffff88043cda28c0
  }
}
crash> rd -S 0xffff8804255f5c80 100
ffff8804255f5c80: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5c90: [dentry] [pid]
ffff8804255f5ca0: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5cb0: fsf_ops_lookup+58 0000000000000000
ffff8804255f5cc0: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5cd0: [dentry] [pid]
ffff8804255f5ce0: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5cf0: fsf_ops_lookup+58 init_pid_ns
ffff8804255f5d00: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5d10: [dentry] [pid]
ffff8804255f5d20: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5d30: fsf_ops_lookup+58 0000000000000000
... [ repeat ] ...

A fix for handling this type of kernel memory corruption has been applied upstream:

https://github.com/crash-utility/crash/commit/39fffdc78c13c8a3464b373beac99a89c25456bc

Fortified the error handling of task gathering from the pid_hash[] chains during session initialization. If a chain has been corrupted, the patch prevents the sequence from entering an infinite loop, and the error messages associated with corrupt/invalid chains have been updated to report the pid_hash[] index number. (anderson)

Information for build crash-7.1.0-3.el6: https://brewweb.devel.redhat.com/buildinfo?buildID=430557

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1309.html |
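The figures quoted in the analysis above can be cross-checked with a few lines of Python. This is only an arithmetic sketch. The 4 KiB page size is an assumption about this x86_64 kernel, and comparing the upid address against the thread_info address assumes the 529-page overrun is measured downward from the bottom of the export_mvfs stack, which the report does not state explicitly.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# upid.nr as printed by "struct upid"; nr is a signed 32-bit field.
nr = -1607242614
nr_bits32 = nr & MASK32   # raw 32-bit pattern stored in memory
nr_as_u64 = nr & MASK64   # sign-extended view, as "eval" displays it
print(hex(nr_bits32))     # 0xa0336c8a
print(hex(nr_as_u64))     # 0xffffffffa0336c8a

# Address that "sym" resolved to fsf_ops_lookup+58 [cdr].
fsf_ops_lookup_ret = 0xFFFFFFFFA0336C8A
print(nr_bits32 == fsf_ops_lookup_ret & MASK32)

# Size of the overrun reported as 529 pages.
overrun_bytes = 529 * PAGE_SIZE
print(overrun_bytes, round(overrun_bytes / 2**20, 2))

# Distance from the overwritten hlist_node up to the thread_info base.
thread_info = 0xFFFF880425760000
upid_chain_addr = 0xFFFF8804255F5C80
distance = thread_info - upid_chain_addr
print(distance, distance / PAGE_SIZE)
print(distance < overrun_bytes)
```

Under those assumptions the overwritten upid lies about 362 pages below the thread_info base, inside the reported 529-page span, and its nr field holds the low 32 bits of the fsf_ops_lookup+58 return address that repeats throughout the overrun stack.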