Description of problem:
I was trying to debug a crash of kernel 3.10.9-200.fc19.x86_64 by using the command:

  sudo crash /var/crash/127.0.0.1-2013.08.23-15\:17\:07/vmcore /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux

I used the command "bt" to get the stack trace, which made the debugger crash. The output of the session:

crash: cannot determine thread return address
      KERNEL: /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013.08.23-15:17:07/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Fri Aug 23 15:16:52 2013
      UPTIME: 00:02:30
LOAD AVERAGE: 4.78, 1.41, 0.51
       TASKS: 418
    NODENAME: shadow
     RELEASE: 3.10.9-200.fc19.x86_64
     VERSION: #1 SMP Wed Aug 21 19:27:58 UTC 2013
     MACHINE: x86_64  (1596 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper/1"
        TASK: ffff880135a9cc40  (1 of 2)  [THREAD_INFO: ffff880135b42000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: ffff880135a9cc40  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff8801223e02d0] __schedule at ffffffff8163d631

The program crashed after printing the line above.

Version-Release number of selected component:
crash-6.1.4-1.fc19

Additional info:
reporter:        libreport-2.1.6
backtrace_rating: 4
cmdline:         crash /var/crash/127.0.0.1-2013.08.23-15:17:07/vmcore /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux
crash_function:  __schedule_frame_adjust
executable:      /usr/bin/crash
kernel:          3.10.9-200.fc19.x86_64
runlevel:        N 5
uid:             0

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 __schedule_frame_adjust at x86_64.c:7446
 #1 x86_64_low_budget_back_trace_cmd at x86_64.c:3237
 #2 back_trace at kernel.c:2509
 #3 cmd_bt at kernel.c:2114
 #4 exec_command at main.c:771
 #5 main_loop at main.c:719
 #6 captured_command_loop at ./main.c:228
 #7 catch_errors at exceptions.c:531
 #8 captured_main at ./main.c:958
 #9 catch_errors at exceptions.c:531
Created attachment 789586 File: backtrace
Created attachment 789587 File: cgroup
Created attachment 789588 File: core_backtrace
Created attachment 789589 File: dso_list
Created attachment 789590 File: environ
Created attachment 789591 File: exploitable
Created attachment 789592 File: limits
Created attachment 789593 File: maps
Created attachment 789594 File: open_fds
Created attachment 789595 File: proc_pid_status
Created attachment 789596 File: var_log_messages
What I really would like is the vmcore. Do you still have it?
Sure, the vmcore file can be found here: http://ganda.lf.fi/~kaitanie/tmp/vmcore.gz (too big to upload as an attachment to bugzilla)
> Sure, the vmcore file can be found here:

OK, thanks, I've got it.

I haven't figured out exactly why, but it has something to do with the memory corruption caused by the stack overflow that you can see in the "log" command. Or you can do a "set 9821", and then a "bt".

The crash utility is not finding the real panic task because the kexec/kdump work in the kernel is being done from the page just underneath the overrun stack of pid 9821 (i.e., instead of on a legitimate stack page). And when the panic task is not found, crash just defaults to setting the initial task to pid 0 on cpu 0, which at least is guaranteed to exist.

But for some reason, a stack address from pid 9821 is mistakenly being used by pid 0, and when "bt" tries to unwind pid 0's stack, it computes a bogus offset by combining the "real" pid 0 base stack address with the unrelated stack address from pid 9821. (And I'm currently trying to figure out why that's happening...)

Anyway, it goes without saying that the crash utility shouldn't core dump, regardless of the contents of the vmcore.

You can ditch the vmcore -- thanks again.
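The bogus offset can be seen from the two addresses in the reported session: the "bt" frame #0 address (ffff8801223e02d0) does not lie inside the stack that starts at the swapper task's THREAD_INFO address (ffff880135b42000). The following is only a minimal sketch of the kind of range check that would reject such an address before any offset arithmetic. It is not the crash utility's code; the function name and the 8 KiB stack size are assumptions made for illustration.

```python
# Assumption: 8 KiB kernel stacks (two 4 KiB pages) for this x86_64 kernel.
THREAD_SIZE = 0x2000


def stack_offset(stack_base, addr, size=THREAD_SIZE):
    """Return addr's offset into the stack, or None if addr is outside it.

    Hypothetical helper: a stack address is validated against the task's
    own stack range before it is used to index into a stack buffer.
    """
    if not (stack_base <= addr < stack_base + size):
        return None
    return addr - stack_base


# Addresses taken from the session output in this report.
swapper_stack_base = 0xFFFF880135B42000  # THREAD_INFO of "swapper/1"
frame0_addr = 0xFFFF8801223E02D0         # address shown in bt frame #0

# The frame address lies below the swapper stack, so it is rejected.
rejected = stack_offset(swapper_stack_base, frame0_addr)
```

With a check like this, an out-of-range address results in an error path rather than an out-of-bounds access into the stack buffer.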
> And when the panic task is not found, crash just defaults to setting
> the initial task to pid 0 on cpu 0, which at least is guaranteed
> to exist.

Except that this one is defaulting to the idle/swapper task 0 on cpu 1 instead of cpu 0, which presumably is related to the fact that the real panic task 9821 was also running on cpu 1.

This is a bizarre (probably a one-time-only) dumpfile. The investigation continues...
The dump is related to bug 994824 which turned out to likely be caused by bug 917081. I haven't tried to produce the dump again (at least not yet) because I found a workaround that solves my original (apparently mei kernel module and suspend/resume related) problem.
I see that the system did a resume, but I'm wondering whether the crash happened immediately upon the resume?
The system didn't crash immediately. It sort of came back up at least partially. I could see the desktop but the system wouldn't react to keyboard and mouse input. After a couple of seconds or something the system started saving the crash dump.
OK thanks. The problem is that the runqueue for cpu 1 shows the swapper task as currently active, in conflict with the per-cpu "current_task" variable, which shows pid 9821 as the active task -- all complicated by the fact that there was no evidence of the crash occurring on pid 9821's stack because it overflowed and used the page below it.
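The conflict described here, between the runqueue's current task and the per-cpu "current_task" value, could be detected with a simple cross-check. The sketch below is an illustration only, not how the crash utility works: the function name and the dictionary representation are hypothetical, and only the cpu 1 values (pid 0 versus pid 9821) come from this report.

```python
def find_active_task_conflicts(rq_curr, percpu_current):
    """Return {cpu: (runqueue_pid, current_task_pid)} for cpus that disagree.

    Both arguments are hypothetical mappings of cpu number -> pid, standing
    in for values that would be read out of the dumpfile.
    """
    conflicts = {}
    for cpu, rq_pid in rq_curr.items():
        cur_pid = percpu_current.get(cpu)
        if cur_pid is not None and cur_pid != rq_pid:
            conflicts[cpu] = (rq_pid, cur_pid)
    return conflicts


# cpu 1 in this dump: the runqueue shows swapper (pid 0) as active,
# while per-cpu current_task shows pid 9821.
conflicts = find_active_task_conflicts({1: 0}, {1: 9821})
```

When the two sources disagree like this, either task might be the real panic task, so a tool could report both candidates instead of silently trusting one.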
Information for build crash-7.0.2-1.fc21: http://koji.fedoraproject.org/koji/buildinfo?buildID=461865