Bug 591285

Summary: System is crashing at [<ffffffff80046a1a>] try_to_wake_up+0x414/0x481
Product: Red Hat Enterprise Linux 5 Reporter: Flavio Leitner <fleitner>
Component: kernelAssignee: Dave Anderson <anderson>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: urgent    
Version: 5.4CC: esammons, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-06-16 14:25:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Flavio Leitner 2010-05-11 19:43:46 UTC
Description of problem:
The system crash twice in the same point with apparently the same symptoms.

---- oops ----
Unable to handle kernel paging request at ffff8108bfe98128 RIP:
[<ffffffff80046a1a>] try_to_wake_up+0x414/0x481
PGD 8063 PUD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:29/0000:29:00.0/0000:2a:00.1/irq
CPU 22
Modules linked in: lock_dlm gfs2 dlm configfs bonding mptctl dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport st shpchp qla2xxx ide_cd sg e1000e cdrom bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci aacraid ata_piix libata mptsas scsi_transport_sas mptspi scsi_transport_spi mptfc mptscsih scsi_transport_fc mptbase cciss megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7308, comm: sshg3 Not tainted 2.6.18-128.1.6.el5 #1
RIP: 0010:[<ffffffff80046a1a>]  [<ffffffff80046a1a>] try_to_wake_up+0x414/0x481
RSP: 0018:ffff8103b9dd1b78  EFLAGS: 00010046
RAX: 0000000000000080 RBX: 00016784bc1bbe3d RCX: 00000000000003e8
RDX: ffff810009082728 RSI: ffff810009082728 RDI: ffff8108bfe98100
RBP: ffff8103b9dd1c08 R08: 0000000000000080 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000016
R13: ffff810563cc80c0 R14: ffff810009082b00 R15: ffff810009081680
FS:  0000000040486940(0063) GS:ffff81083f891e40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8108bfe98128 CR3: 000000034350d000 CR4: 00000000000006e0
Process sshg3 (pid: 7308, threadinfo ffff8103b9dd0000, task ffff8102a058d080)
Stack:  0000000000000000 0000000000000001 0000000f00000004 0000000000000001
000000163f822240 00016784bc16ccec 000004d000000000 ffff81011ce5c440
0000000000000246 00000000000004d0 0000000000001000 0000000000000096
Call Trace:
[<ffffffff800888ec>] __wake_up_common+0x3e/0x68
[<ffffffff8002df7c>] __wake_up+0x38/0x4f
[<ffffffff80012bda>] sock_def_readable+0x34/0x5f
[<ffffffff8004a17e>] unix_stream_sendmsg+0x281/0x346
[<ffffffff800376a3>] do_sock_write+0xc4/0xce
[<ffffffff8004719e>] sock_aio_write+0x4f/0x5e
[<ffffffff80017d2d>] do_sync_write+0xc7/0x104
[<ffffffff8009dba8>] autoremove_wake_function+0x0/0x2e
[<ffffffff800165b1>] vfs_write+0xe1/0x174
[<ffffffff80016e6b>] sys_write+0x45/0x6e
[<ffffffff8005d116>] system_call+0x7e/0x83


Code: 8b 47 28 41 39 45 28 7d 0c e8 ab 34 04 00 eb 05 45 31 e4 eb
RIP  [<ffffffff80046a1a>] try_to_wake_up+0x414/0x481
RSP <ffff8103b9dd1b78>

Version-Release number of selected component (if applicable):
2.6.18-164.6.1.el5

How reproducible:
a couple times so far, unpredictable.

Steps to Reproduce:
Unknown.

Comment 1 Flavio Leitner 2010-05-11 19:51:48 UTC
First core is at megatron.gsslab.rdu.redhat.com:/cores/20100414134034

These are my findings in the first vmcore:
Crash point:
RIP  [<ffffffff80046a1a>] try_to_wake_up+0x414/0x481

Disass:
/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/kernel/sched.c: 1579
 0xffffffff80046a16 <try_to_wake_up+0x410>:      mov    0x50(%r15),%rdi
>0xffffffff80046a1a <try_to_wake_up+0x414>:      mov    0x28(%rdi),%eax
 0xffffffff80046a1d <try_to_wake_up+0x417>:      cmp    %eax,0x28(%r13)
 0xffffffff80046a21 <try_to_wake_up+0x41b>:      jge    0xffffffff80046a2f 
    <try_to_wake_up+0x429>
    /usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/kernel/sched.c: 1580

C code:
 1578         if (!sync || cpu != this_cpu) {
>1579                 if (TASK_PREEMPTS_CURR(p, rq))
 1580                         resched_task(rq->curr);
 1581         }

162 #define TASK_PREEMPTS_CURR(p, rq) \
163         ((p)->prio < (rq)->curr->prio)
                         ^^^^^^^^^^^^^^^^^
struct rq {
   [0x50] struct task_struct *curr;
...
struct task_struct {
  [0x28] int prio;
...

working back on line 'mov    0x28(%rdi),%eax'
RDI: ffff8108bfe98100
crash> task_struct ffff8108bfe98100
struct task_struct struct: seek error: kernel virtual address: ffff8108bfe98100  type: "gdb_readmem_callback"
Cannot access memory at address 0xffff8108bfe98100

it's invalid, so working back on line: 'mov    0x50(%r15),%rdi'
R15: ffff810009081680
crash> struct rq.curr ffff810009081680
 curr = 0xffff81083fe98100,

Well, curr = 0xffff81083fe98100 is different from RDI: ffff8108bfe98100
and it has just one bit wrong:
crash> eval 0xffff81083fe98100 ^ 0xffff8108bfe98100
hexadecimal: 80000000  (2GB)
   decimal: 2147483648  
     octal: 20000000000
    binary: 0000000000000000000000000000000010000000000000000000000000000000

Ok, so what would happen if we use the correct value based on ->curr:
crash> task_struct.prio 0xffff81083fe98100
 prio = 0x8c,

The other task (p) is on %r13: 0xffff810563cc80c0 and it is correct:
crash> task_struct.prio ffff810563cc80c0
 prio = 0x73,
crash> task_struct.comm ffff810563cc80c0
 comm = "ssh-broker-cli\000",
crash> task_struct.pid ffff810563cc80c0
 pid = 0x6641, <--- 26177d

What happened? The CPU was executing these two lines:
 0xffffffff80046a16 <try_to_wake_up+0x410>:      mov    0x50(%r15),%rdi
>0xffffffff80046a1a <try_to_wake_up+0x414>:      mov    0x28(%rdi),%eax
In the first line it was trying to access (struct rq *)(%r15)->curr which
is a (task_struct *) pointer. This pointer is stored in %rdi register.
Later, in the second line, the CPU tries to get the (task_struct *)->prio
value, but the task_struct pointer was invalid.

Interesting that the pointer at %r15 (struct rq) is correct and its has
->curr pointing to 0xffff81083fe98100 which didn't match with the one
used on the second line. They differ by one bit only.

The other tasks running seems unrelated to my eyes.

crash> ps | grep ffff8108bfe98100 <--- bad pointer
crash> ps | grep ffff81083fe98100
>     0      1  15  ffff81083fe98100  RU   0.0       0      0  [swapper]
crash>

so if the bad pointer was a real task_struct before, it has exited.

The struct rq lock is correctly holded.

Searching the entire image (partial dump) didn't find any other
occurrence of the bad pointer besides in the crashed task.

This is the slab cache for the task_struct
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff81011ce87280 task_struct             1888       1731      1840    920     4k
and it didn't include references to that pointer either.

So, either the partial dump is complicating things and I can't find any
record of that task at all or this could be a hardware problem as that
address in the memory is okay and in the register it has one bit flipped.

fbl

Comment 2 Flavio Leitner 2010-05-11 20:00:24 UTC
Second vmcore available at megatron.gsslab.rdu.redhat.com:/cores/20100510101329

Same crash point, same findings. However, with another CPU this time.
The same bit flipped.
crash> struct rq.curr ffff8100090a3fa0
 curr = 0xffff81083ffd8100,
crash> eval 0xffff81083ffd8100 ^ 0xffff8108bffd8100
hexadecimal: 80000000  (2GB)
   decimal: 2147483648  
     octal: 20000000000
    binary: 0000000000000000000000000000000010000000000000000000000000000000


crash> ps |grep ffff8108bffd8100
crash> ps |grep ffff81083ffd8100
>     0      1  19  ffff81083ffd8100  RU   0.0       0      0  [swapper]
crash> 

The system was fully verified and no errors were found.

Another thing called my attention is the fact that the bad pointer is on
the stack trace of the crashed task:
#3 [ffff81015061bbb0] error_exit at ffffffff8005dde9
   [exception RIP: try_to_wake_up+0x414]
   RIP: ffffffff80047005  RSP: ffff81015061bc68  RFLAGS: 00010046
   RAX: 0000000000000080  RBX: 0000bec5fcd7dc6c  RCX: 00000000000003e8
   RDX: ffff8100090a5058  RSI: ffff8100090a5058  RDI: ffff8108bffd8100
   RBP: ffff81015061bcf8   R8: 0000000000000080   R9: 0000000000000001
   R10: 0000000000000000  R11: 0000000000000080  R12: 000000000000000e
   R13: ffff810194214080  R14: ffff8100090a5420  R15: ffff8100090a3fa0
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   ffff81015061bbb8: ffff8100090a3fa0 ffff8100090a5420
   ffff81015061bbc8: ffff810194214080 000000000000000e
   ffff81015061bbd8: ffff81015061bcf8 0000bec5fcd7dc6c
   ffff81015061bbe8: 0000000000000080 0000000000000000
   ffff81015061bbf8: 0000000000000001 0000000000000080
   ffff81015061bc08: 0000000000000080 00000000000003e8
   ffff81015061bc18: ffff8100090a5058 ffff8100090a5058
   ffff81015061bc28: ffff8108bffd8100 ffffffffffffffff
                     ^^^^^^^^^^^^^^^^

That indicates the pointer was real another time and due to the
slabcache alignment, it could be possible to have address with a
difference of just a single bit, so I said that it didn't look
like a HW problem anymore. However, after that pointer in the stack
there is pointer to the crashed instruction and then all registers,
so I guess those were the crash context saved for kdump or something.

I need some help from engineering to find the root cause.

fbl