Bug 1000440 - [abrt] crash-6.1.4-1.fc19: __schedule_frame_adjust: Process /usr/bin/crash was killed by signal 11 (SIGSEGV)
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: crash
Version: 19
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Dave Anderson
QA Contact: Fedora Extras Quality Assurance
abrt_hash:fd6fee784cda609509289218d24...
Reported: 2013-08-23 08:54 EDT by Pekka Kaitaniemi
Modified: 2013-12-16 14:07 EST
Fixed In Version: crash-7.0.2-1.fc21
Doc Type: Bug Fix
Last Closed: 2013-12-16 14:07:54 EST

Attachments
File: backtrace (82.43 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: cgroup (140 bytes, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: core_backtrace (4.00 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: dso_list (962 bytes, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: environ (2.10 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: exploitable (82 bytes, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: limits (1.29 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: maps (4.60 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: open_fds (335 bytes, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: proc_pid_status (898 bytes, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi
File: var_log_messages (1.88 KB, text/plain), 2013-08-23 08:54 EDT, Pekka Kaitaniemi

Description Pekka Kaitaniemi 2013-08-23 08:54:18 EDT
Description of problem:
I was trying to debug a crash of kernel 3.10.9-200.fc19.x86_64 using the command:
sudo crash /var/crash/127.0.0.1-2013.08.23-15\:17\:07/vmcore /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux

I ran the "bt" command to get the stack trace, which made the debugger crash.

The output of the session:
crash: cannot determine thread return address
      KERNEL: /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013.08.23-15:17:07/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Fri Aug 23 15:16:52 2013
      UPTIME: 00:02:30
LOAD AVERAGE: 4.78, 1.41, 0.51
       TASKS: 418
    NODENAME: shadow
     RELEASE: 3.10.9-200.fc19.x86_64
     VERSION: #1 SMP Wed Aug 21 19:27:58 UTC 2013
     MACHINE: x86_64  (1596 Mhz)
      MEMORY: 3.9 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper/1"
        TASK: ffff880135a9cc40  (1 of 2)  [THREAD_INFO: ffff880135b42000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: ffff880135a9cc40  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff8801223e02d0] __schedule at ffffffff8163d631

The program crashed after printing the line above.

Version-Release number of selected component:
crash-6.1.4-1.fc19

Additional info:
reporter:       libreport-2.1.6
backtrace_rating: 4
cmdline:        crash /var/crash/127.0.0.1-2013.08.23-15:17:07/vmcore /usr/lib/debug/lib/modules/3.10.9-200.fc19.x86_64/vmlinux
crash_function: __schedule_frame_adjust
executable:     /usr/bin/crash
kernel:         3.10.9-200.fc19.x86_64
runlevel:       N 5
uid:            0

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 __schedule_frame_adjust at x86_64.c:7446
 #1 x86_64_low_budget_back_trace_cmd at x86_64.c:3237
 #2 back_trace at kernel.c:2509
 #3 cmd_bt at kernel.c:2114
 #4 exec_command at main.c:771
 #5 main_loop at main.c:719
 #6 captured_command_loop at ./main.c:228
 #7 catch_errors at exceptions.c:531
 #8 captured_main at ./main.c:958
 #9 catch_errors at exceptions.c:531
Comment 1 Pekka Kaitaniemi 2013-08-23 08:54:22 EDT
Created attachment 789586 [details]
File: backtrace
Comment 2 Pekka Kaitaniemi 2013-08-23 08:54:26 EDT
Created attachment 789587 [details]
File: cgroup
Comment 3 Pekka Kaitaniemi 2013-08-23 08:54:29 EDT
Created attachment 789588 [details]
File: core_backtrace
Comment 4 Pekka Kaitaniemi 2013-08-23 08:54:32 EDT
Created attachment 789589 [details]
File: dso_list
Comment 5 Pekka Kaitaniemi 2013-08-23 08:54:36 EDT
Created attachment 789590 [details]
File: environ
Comment 6 Pekka Kaitaniemi 2013-08-23 08:54:40 EDT
Created attachment 789591 [details]
File: exploitable
Comment 7 Pekka Kaitaniemi 2013-08-23 08:54:44 EDT
Created attachment 789592 [details]
File: limits
Comment 8 Pekka Kaitaniemi 2013-08-23 08:54:47 EDT
Created attachment 789593 [details]
File: maps
Comment 9 Pekka Kaitaniemi 2013-08-23 08:54:53 EDT
Created attachment 789594 [details]
File: open_fds
Comment 10 Pekka Kaitaniemi 2013-08-23 08:54:56 EDT
Created attachment 789595 [details]
File: proc_pid_status
Comment 11 Pekka Kaitaniemi 2013-08-23 08:54:59 EDT
Created attachment 789596 [details]
File: var_log_messages
Comment 12 Dave Anderson 2013-08-23 10:16:57 EDT
What I really would like is the vmcore.  Do you still have it?
Comment 13 Pekka Kaitaniemi 2013-08-23 10:42:44 EDT
Sure, the vmcore file can be found here:
http://ganda.lf.fi/~kaitanie/tmp/vmcore.gz
(too big to upload as an attachment to bugzilla)
Comment 14 Dave Anderson 2013-08-23 11:57:51 EDT
> Sure, the vmcore file can be found here:

OK, thanks I've got it.

I haven't figured out exactly why, but it has something to do with the
memory corruption caused by the stack overflow that you can see
in the "log" command.  Or you can do a "set 9821", and then a "bt".

The crash utility is not finding the real panic task because
the kexec/kdump work in the kernel is being done from the page
just underneath the overrun stack of pid 9821.  (i.e., instead
of on a legitimate stack page)

And when the panic task is not found, crash just defaults to setting
the initial task to pid 0 on cpu 0, which at least is guaranteed
to exist.

But for some reason, a stack address from pid 9821 is mistakenly
being used by pid 0, and when "bt" tries to unwind pid 0's stack,
it creates a bogus offset value when mathematically using the
"real" pid 0 base stack address in conjunction with the unrelated
stack address from pid 9821.  (and I'm currently trying to 
figure out why that's happening...)

Anyway, it goes without saying that the crash utility shouldn't core
dump, regardless of the contents of the vmcore.  You can ditch
the vmcore -- thanks again.
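
To illustrate the bogus-offset problem described above, here is a minimal Python sketch, not code from the crash utility. It assumes an 8 KB kernel stack (THREAD_SIZE) and that the THREAD_INFO address from the session output marks the base of pid 0's stack; the function name stack_offset is hypothetical. The frame address printed by "bt" appears to fall outside that range, so an unchecked subtraction gives a negative offset (in unsigned C arithmetic it would wrap to a huge value), while a bounds check rejects the address.

```python
# Illustrative only: not code from the crash utility.
THREAD_SIZE = 8192  # assumption: 8 KB kernel stack on this x86_64 kernel


def stack_offset(stack_base, addr, thread_size=THREAD_SIZE):
    """Return addr's byte offset into the stack, or None if addr is outside it."""
    if not (stack_base <= addr < stack_base + thread_size):
        return None
    return addr - stack_base


pid0_stack_base = 0xffff880135b42000  # THREAD_INFO value from the session output
frame_addr = 0xffff8801223e02d0       # frame #0 address printed by "bt"

# Unchecked arithmetic mixes an unrelated address with pid 0's stack base.
unchecked = frame_addr - pid0_stack_base
print(unchecked < 0)

# The checked version refuses to produce an offset for a foreign address.
print(stack_offset(pid0_stack_base, frame_addr))
```

Validating the address before doing the arithmetic is one way an unwinder could report an error instead of dereferencing a bad offset.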
Comment 15 Dave Anderson 2013-08-23 12:17:34 EDT
> And when the panic task is not found, crash just defaults to setting
> the initial task to pid 0 on cpu 0, which at least is guaranteed
> to exist.

Except that this one is defaulting to the idle/swapper task 0 on cpu 1
instead of cpu 0, which presumably is related to the fact that the
real panic task 9821 was also running on cpu 1.
  
This is a bizarre (probably a one-time-only) dumpfile.  The investigation
continues...
Comment 16 Pekka Kaitaniemi 2013-08-23 14:15:21 EDT
The dump is related to bug 994824 which turned out to likely be caused by bug 917081.

I haven't tried to produce the dump again (at least not yet) because I found a workaround that solves my original (apparently mei kernel module and suspend/resume related) problem.
Comment 17 Dave Anderson 2013-08-23 14:29:13 EDT
I see that the system did a resume, but I'm wondering whether the crash
happened immediately upon the resume?
Comment 18 Pekka Kaitaniemi 2013-08-23 18:08:58 EDT
The system didn't crash immediately. It sort of came back up at least partially. I could see the desktop but the system wouldn't react to keyboard and mouse input. After a couple of seconds or something the system started saving the crash dump.
Comment 19 Dave Anderson 2013-08-27 10:10:20 EDT
OK thanks.

The problem is that the runqueue for cpu 1 shows the swapper task as currently active, in conflict with the per-cpu "current_task" variable, which shows pid 9821 as the active task -- all complicated by the fact that there was no evidence of the crash occurring on pid 9821's stack because it overflowed and used the page below it.
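
The conflict described above can be pictured as a per-cpu consistency check. This is a hedged sketch with made-up data structures, not how crash reads these values: the dictionaries and the function name find_active_task_conflicts are hypothetical, the cpu 1 pids mirror this report, and the cpu 0 entries are invented for contrast.

```python
# Illustrative only: plain dicts stand in for values read from the dumpfile.
def find_active_task_conflicts(rq_curr, current_task):
    """Return CPUs whose runqueue current task differs from per-cpu current_task."""
    return sorted(cpu for cpu in rq_curr if current_task.get(cpu) != rq_curr[cpu])


# cpu -> pid. cpu 1 mirrors the report: the runqueue shows the swapper task
# (pid 0) while current_task shows pid 9821. The cpu 0 entries are hypothetical.
rq_curr = {0: 0, 1: 0}
current_task = {0: 0, 1: 9821}

conflicts = find_active_task_conflicts(rq_curr, current_task)
print(conflicts)  # [1]
```

A disagreement like the one on cpu 1 is a hint that neither source should be trusted blindly when picking the task to unwind.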
Comment 20 Dave Anderson 2013-09-04 15:32:28 EDT
 Information for build crash-7.0.2-1.fc21:
 http://koji.fedoraproject.org/koji/buildinfo?buildID=461865
