Bug 1208557
| Summary: | crash-7.1.0-1.el6 spins at 'please wait... (gathering task table data)' when loading rhel6.4.z vmcore | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Wysochanski <dwysocha> |
| Component: | crash | Assignee: | Dave Anderson <anderson> |
| Status: | CLOSED ERRATA | QA Contact: | Qiao Zhao <qzhao> |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.7 | CC: | anderson, atomlin, dwysocha, jherrman, kwalker, lilu, stalexan |
| Target Milestone: | rc | Keywords: | Regression, TestCaseProvided |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | crash-7.1.0-3.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Attempting to run the crash utility with the vmcore and vmlinux files previously caused crash to enter an infinite loop and become unresponsive. With this update, the handling of errors when gathering tasks from pid_hash[] chains during session initialization has been enhanced. Now, if a pid_hash[] chain has been corrupted, the patch prevents the initialization sequence from entering an infinite loop. This prevents the described failure of the crash utility from occurring. In addition, the error messages associated with corrupt/invalid pid_hash[] chains have been updated to report the pid_hash[] index number.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-07-22 06:27:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Dave Wysochanski
2015-04-02 14:09:08 UTC
Thanks Dave, I'll take a look.
Note that "crash --log vmcore" works, and it shows a serious memory corruption:
$ crash --log vmcore
... [ cut ] ...
<1>BUG: unable to handle kernel paging request at 00000000ad6fdfe0
<1>IP: [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4>PGD 413df1067 PUD 0
<0>Thread overran stack, or stack corrupted
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/module/ipv6/initstate
<4>CPU 5
<4>Modules linked in: mvfs(U) cdr(P)(U) nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 ppdev parport_pc parport sg vmware_balloon microcode vmxnet3 i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmw_pvscsi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 30737, comm: export_mvfs Tainted: P --------------- 2.6.32-358.23.2.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
<4>RIP: 0010:[<ffffffff81056b14>] [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4>RSP: 0018:ffff880247403db8 EFLAGS: 00010082
<4>RAX: ffff880425fc2aa0 RBX: 0000000025760028 RCX: ffff88063d0de440
<4>RDX: 00000000000192d8 RSI: 0000000000000000 RDI: ffff880425fc2ad8
<4>RBP: ffff880247403de8 R08: ffffffff8160bb65 R09: 0000000000000000
<4>R10: 0000000000000010 R11: 0000000000000000 R12: ffff880247416768
<4>R13: 00000000000f4435 R14: 000000d543a6c0cf R15: ffff880425fc2aa0
<4>FS: 0000000000000000(0000) GS:ffff880247400000(0063) knlGS:00000000f77a06c0
<4>CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
<4>CR2: 00000000ad6fdfe0 CR3: 0000000425fb2000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process export_mvfs (pid: 30737, threadinfo ffff880425760000, task ffff880425fc2aa0)
<4>Stack:
<4> ffff880247403dc8 ffffffff81013643 ffff880425fc2ad8 ffff880247416768
<4><d> 0000000000000000 0000000000000000 ffff880247403e18 ffffffff810570cb
<4><d> ffff880247416700 0000000000000005 0000000000016700 0000000000000005
<4>Call Trace:
<4> <IRQ>
<4> [<ffffffff81013643>] ? native_sched_clock+0x13/0x80
<4> [<ffffffff810570cb>] task_tick_fair+0xdb/0x160
<4> [<ffffffff8105af11>] scheduler_tick+0xc1/0x260
<4> [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
<4> [<ffffffff810812fe>] update_process_times+0x6e/0x90
<4> [<ffffffff810a80c6>] tick_sched_timer+0x66/0xc0
<4> [<ffffffff8109b4ae>] __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a219f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8107710f>] ? __do_softirq+0x11f/0x1e0
<4> [<ffffffff8109b816>] hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff8151785b>] smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] apic_timer_interrupt+0x13/0x20
<4> <EOI>
<4>Code: 00 8b 15 04 2b a4 00 85 d2 74 34 48 8b 50 08 8b 5a 18 48 8b 90 10 09 00 00 48 8b 4a 50 48 85 c9 74 1d 48 63 db 66 90 48 8b 51 20 <48> 03 14 dd a0 de bf 81 4c 01 2a 48 8b 49 78 48 85 c9 75 e8 48
<1>RIP [<ffffffff81056b14>] update_curr+0x144/0x1f0
<4> RSP <ffff880247403db8>
<4>CR2: 00000000ad6fdfe0
$
Anyway, crash ends up re-reading the same location endlessly, so I'm guessing that there's some kind of corruption in the pid_hash area.
> Note that "crash --log vmcore" works, and it shows a serious memory
> corruption:
and crash --minimal also works.
This behavior was introduced by a crash-7.0.9 patch, which fixed a
problem where tasks in a chain could get skipped:
- Fix for the one-time (dumpfile), or as-required (live system),
gathering of tasks from the kernel pid_hash[] in 2.6.24 and later
kernels. Without the patch, if an entry in a pid_hash[] chain is
not related to the "init_pid_ns" pid_namespace structure, any
remaining entries in the hlist chain are skipped.
(vvs)
Unfortunately, it has the side effect seen in this case when a
pid_hash[] chain has been corrupted, presumably due to prior
corruption of a kernel stack.
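For illustration only, here is a minimal Python sketch of how a chain walk can be guarded against this kind of corruption. It is not the crash code (crash is written in C), and `walk_chain`, `read_next`, and the `max_entries` cap are hypothetical names chosen for the example. The idea is to stop as soon as a node address repeats or the chain grows implausibly long, instead of following "next" pointers forever.

```python
def walk_chain(first, read_next, max_entries=10000):
    """Follow an hlist-style chain, stopping on a repeat or an overlong chain.

    first      -- address of the first node (0 or None means an empty chain)
    read_next  -- hypothetical callback returning the 'next' pointer of a node
    Returns (nodes, error); error is None when the walk ended cleanly.
    """
    seen = set()
    nodes = []
    addr = first
    while addr:
        if addr in seen:
            # A node we already visited: the chain loops back on itself.
            return nodes, "duplicate node %#x" % addr
        if len(nodes) >= max_entries:
            # No repeat yet, but the chain is longer than we are willing to trust.
            return nodes, "chain exceeds %d entries" % max_entries
        seen.add(addr)
        nodes.append(addr)
        addr = read_next(addr)
    return nodes, None


# Toy memory image: 0x10 -> 0x20 -> 0x30 -> back to 0x20 (a corrupted loop).
memory = {0x10: 0x20, 0x20: 0x30, 0x30: 0x20}
nodes, error = walk_chain(0x10, memory.get)
print(nodes, error)
```

A real tool would also validate each address before reading it; the sketch only shows the loop-termination part.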
Running with an older rhel6 version (crash-6.1.0-5.el6) does initialize
like so:
$ /tmp/crash vm*
crash 6.1.0-5.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
please wait... (gathering task table data)
crash: duplicate task in pid_hash: ffff88043cda23b0
crash: invalid task address: ffff88043cda23b0
please wait... (determining panic task)
WARNING: active task ffff880425fc2aa0 on cpu 5 not found in PID hash
WARNING: active task ffff880425fc2aa0 on cpu 5: corrupt cpu value: 628490280
KERNEL: vmlinux-2.6.32-358.23.2.el6
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Sun Mar 29 07:03:01 2015
UPTIME: 00:15:15
LOAD AVERAGE: 2.62, 1.95, 1.11
TASKS: 227
NODENAME: pv0il0244.cc0.mercadona.es
RELEASE: 2.6.32-358.23.2.el6.x86_64
VERSION: #1 SMP Sat Sep 14 05:32:37 EDT 2013
MACHINE: x86_64 (2900 Mhz)
MEMORY: 32 GB
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
PID: 30737
COMMAND: "export_mvfs"
TASK: ffff880425fc2aa0 [THREAD_INFO: ffff880425760000]
CPU: 5
STATE: TASK_RUNNING (PANIC)
crash>
I'll look into seeing how/if this can be recognized and handled, although
I worry that a "fix" may only address this specific instance of corruption.
Thanks Dave A! If it's a damaged task_struct / vmcore which causes crash to go bonkers here, it may be 'low' priority, but I'm leaving it at 'medium' for now and setting 'regression', though that may make it sound too important. We've only seen it one time, but we've not been running crash-7.0.9 for long: it was installed Jan 20, 2015, so only a little over 2 months. If stack overflows trigger this bug, then those do happen more on rhel6 from what I've seen, but it's probably only a couple percent of vmcores.
There's a corruption associated with the pid_hash[62] chain that causes
the crash utility to go into an infinite loop.
The pid_hash[62] hlist_head structure points to an embedded hlist_node
in the first upid structure in the chain:
crash> p pid_hash[62]
$2 = {
first = 0xffff8804255f5c80
}
crash> struct upid -l upid.pid_chain 0xffff8804255f5c80
struct upid {
nr = -1607242614,
ns = 0xffffffff81aa31a0 <init_pid_ns>,
pid_chain = {
next = 0xffff880425761e48,
pprev = 0xffff88043cda28c0
}
}
The PID "nr" of -1607242614 is obviously not correct:
crash> eval -1607242614
hexadecimal: ffffffffa0336c8a
decimal: 18446744072102309002 (-1607242614)
octal: 1777777777764014666212
binary: 1111111111111111111111111111111110100000001100110110110010001010
crash> sym ffffffffa0336c8a
ffffffffa0336c8a (t) fsf_ops_lookup+58 [cdr]
crash> mod -t
NAME TAINTS
cdr P(U)
mvfs (U)
crash>
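The eval output above is a sign extension: "nr" in struct upid is a 32-bit signed int, so the low 32 bits of a kernel text address stored there read back as a negative number. A small Python sketch of that arithmetic, with helper names made up for the example:

```python
def as_unsigned_64(value):
    """Return the 64-bit two's-complement view of a (possibly negative) integer."""
    return value & 0xFFFFFFFFFFFFFFFF


def to_signed_32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


nr = -1607242614
print(hex(as_unsigned_64(nr)))           # same hex value that eval reports
print(to_signed_32(0xffffffffa0336c8a))  # back to the nr shown in struct upid
```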
So it appears that the proprietary "cdr" module, along with the
unsigned "mvfs" module, are involved in the stack overrun of the
"export_mvfs" task:
crash> set
PID: 30737
COMMAND: "export_mvfs"
TASK: ffff880425fc2aa0 [THREAD_INFO: ffff880425760000]
CPU: 5
STATE: TASK_RUNNING (PANIC)
crash>
Dump the overrun stack contents from the bottom:
crash> bt -T
PID: 30737 TASK: ffff880425fc2aa0 CPU: 5 COMMAND: "export_mvfs"
[ffff880425760070] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257600a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257600b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257600e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257600f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760120] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760130] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760160] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760170] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257601a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257601b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257601e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257601f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760220] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760230] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760260] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760270] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257602a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257602b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257602e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257602f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760320] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760330] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760360] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760370] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257603a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257603b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257603e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257603f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760420] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760430] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760460] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760470] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257604a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257604b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257604e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257604f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760520] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760530] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760560] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760570] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257605a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257605b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257605e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257605f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760620] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760630] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760660] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760670] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257606a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257606b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257606e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257606f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760720] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760730] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760760] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760770] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257607a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257607b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257607e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257607f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760820] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760830] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760860] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760870] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257608a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257608b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257608e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257608f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760920] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760930] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760960] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760970] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257609a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257609b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257609e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257609f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760a20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760a30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760a60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760a70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760aa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760ab0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ae0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760af0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760b20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760b30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760b60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760b70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ba0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760bb0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760be0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760bf0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760c20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760c30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760c60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760c70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ca0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760cb0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ce0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760cf0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760d20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760d30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760d60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760d70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760da0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760db0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760de0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760df0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760e20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760e30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760e60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760e70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ea0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760eb0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760ee0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760ef0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760f20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760f30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760f60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760f70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760fa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760fb0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425760fe0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425760ff0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761020] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761030] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761060] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761070] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257610a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257610b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257610e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257610f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761120] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761130] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761160] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761170] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257611a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257611b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257611e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257611f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761220] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761230] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761260] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761270] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257612a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257612b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257612e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257612f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761320] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761330] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761360] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761370] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257613a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257613b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257613e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257613f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761420] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761430] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761460] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761470] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257614a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257614b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257614e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257614f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761520] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761530] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761560] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761570] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257615a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257615b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257615e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257615f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761620] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761630] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761660] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761670] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257616a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257616b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257616e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257616f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761720] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761730] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761760] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761770] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257617a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257617b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257617e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257617f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761820] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761830] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761860] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761870] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257618a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257618b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257618e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257618f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761920] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761930] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761938] zone_statistics at ffffffff8113b579
[ffff880425761960] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761970] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257619a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257619b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257619e0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257619f0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761a20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761a30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761a60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761a70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761aa0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761ab0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761ae0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761af0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761b20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761b30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761b60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761b70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761ba0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761bb0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761be0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761bf0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761c20] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761c30] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761c60] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761c70] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761ca0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff880425761cb0] do_lookup at ffffffff81190865
[ffff880425761d10] __link_path_walk at ffffffff81191024
[ffff880425761d20] page_remove_rmap at ffffffff8114cf34
[ffff880425761d90] handle_mm_fault at ffffffff8114452a
[ffff880425761dd0] path_walk at ffffffff81191baa
[ffff880425761e10] do_path_lookup at ffffffff81191d7b
[ffff880425761e40] user_path_at at ffffffff81192a07
[ffff880425761ee0] security_prepare_creds at ffffffff8121c0e6
[ffff880425761f10] sys_faccessat at ffffffff8117f130
[ffff880425761f70] sys_access at ffffffff8117f248
[ffff880425761f80] sysenter_dispatch at ffffffff8104d830
RIP: 0000000000734430 RSP: 00000000fffaee0c RFLAGS: 00000296
RAX: 0000000000000021 RBX: ffffffff8104d830 RCX: 0000000000000000
RDX: 00000000008d3490 RSI: 00000000fffafe4c RDI: 0000000000000001
RBP: 00000000fffb03c8 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8117f248
R13: ffff880425761f78 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000021 CS: 0023 SS: 002b
crash>
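With output this repetitive, it can help to collapse the dump into a per-symbol count. The following Python sketch does that for "bt -T" style lines saved to text. The regular expression and the `tally_frames` name are assumptions made for the example, not part of crash, and the sample contains only a few lines copied from the output above.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ffff880425760070] fsf_ops_lookup at ffffffffa0336c8a [cdr]
FRAME_RE = re.compile(r"\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def tally_frames(bt_text):
    """Count how often each symbol appears in 'bt -T' style output."""
    counts = Counter()
    for line in bt_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


sample = """\
[ffff880425760070] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff8804257600a0] vnlayer_hijacked_lookup at ffffffffa039749d [mvfs]
[ffff8804257600b0] fsf_ops_lookup at ffffffffa0336c8a [cdr]
[ffff880425761cb0] do_lookup at ffffffff81190865
"""
counts = tally_frames(sample)
for symbol, count in counts.most_common():
    print(count, symbol)
```

Run against the full dump, a tally like this makes the fsf_ops_lookup / vnlayer_hijacked_lookup recursion stand out immediately.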
Anyway, following the corrupt upid chain leads to the crash utility spin.
I have a couple of things I can add to prevent that from happening,
although in all probability, it's unlikely it will ever be seen again.
I guess you could call it a regression, but again, if a vmcore is corrupted
to the point where some of the basic requirements for the crash session
to come up are compromised, well, then shit like this can happen.
By "if a vmcore is corrupted", I mean "if the crashed system's memory was corrupted". The vmcore is fine. By the search command, you can see that the stack overrun continued downwards for 529 pages, or over 2MB worth of memory corruption. That's pretty impressive corruption right there... ;-)
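The "over 2MB" figure follows directly from the page count, assuming 4 KiB pages (the usual x86_64 base page size; the page size is not stated in this report). As a quick check in Python:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the report


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a page count to bytes."""
    return pages * page_size


overrun_bytes = pages_to_bytes(529)
overrun_mib = overrun_bytes / (1024 * 1024)
print(overrun_bytes, round(overrun_mib, 2))
```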
And the corruption overwrote the memory containing that first upid
structure in the pid_hash[62] chain:
crash> struct upid -l upid.pid_chain 0xffff8804255f5c80
struct upid {
nr = -1607242614,
ns = 0xffffffff81aa31a0 <init_pid_ns>,
pid_chain = {
next = 0xffff880425761e48,
pprev = 0xffff88043cda28c0
}
}
crash> rd -S 0xffff8804255f5c80 100
ffff8804255f5c80: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5c90: [dentry] [pid]
ffff8804255f5ca0: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5cb0: fsf_ops_lookup+58 0000000000000000
ffff8804255f5cc0: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5cd0: [dentry] [pid]
ffff8804255f5ce0: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5cf0: fsf_ops_lookup+58 init_pid_ns
ffff8804255f5d00: ffff880425761e48 [ext4_inode_cache]
ffff8804255f5d10: [dentry] [pid]
ffff8804255f5d20: vnlayer_hijacked_lookup+44 [pid]
ffff8804255f5d30: fsf_ops_lookup+58 0000000000000000
... [ repeat ] ...
A fix for handling this type of kernel memory corruption has been applied upstream:
https://github.com/crash-utility/crash/commit/39fffdc78c13c8a3464b373beac99a89c25456bc
Fortified the error handling of task gathering from the pid_hash[] chains during session initialization. If a chain has been corrupted, the patch prevents the sequence from entering an infinite loop, and the error messages associated with corrupt/invalid chains have been updated to report the pid_hash[] index number.
(anderson)
Information for build crash-7.1.0-3.el6:
https://brewweb.devel.redhat.com/buildinfo?buildID=430557
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-1309.html