Bug 455179

Summary: SIGKILL may crash in flush_old_exec/release_task
Product: Red Hat Enterprise Linux 4
Reporter: Jan Kratochvil <jan.kratochvil>
Component: kernel
Assignee: Jerome Marchand <jmarchan>
Status: CLOSED DUPLICATE
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: high
Version: 4.7
CC: duck, dvlasenk, jmarchan, riek, roland
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-11-26 15:27:19 UTC Type: ---
Bug Depends On: 311931    
Bug Blocks: 461297    
Attachments:
Description: Testcase.  Flags: none

Description Jan Kratochvil 2008-07-13 14:37:45 UTC
Description of problem:
The attached testcase triggers a kernel BUG crash.
It sends SIGKILL to a process that is calling execve() in a loop.

Version-Release number of selected component (if applicable):
RHEL-4.7 kernel-smp-2.6.9-78.EL.x86_64
Heuristically tested as not crashing:
RHEL-5.2 kernel-2.6.18-92.el5.x86_64
F-9 kernel-2.6.25.9-76.fc9.x86_64
F-9 kernel-vanilla-2.6.25.6-55.vanilla.fc9.x86_64
(although it is not known whether the race is merely harder to reproduce there)

How reproducible:
The crash occurs within several seconds at most.

Steps to Reproduce:
1. gcc -o exitcrash exitcrash.c -Wall -ggdb2 -pthread -D_GNU_SOURCE 
2. ./exitcrash
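The attachment itself is not reproduced in this report. As a rough illustration of the pattern the description names (a multithreaded process calling execve() in a loop while its parent SIGKILLs it), a minimal C sketch might look like the following. It is not exitcrash.c: the EXECLOOP_CHILD environment variable, the helper names, and the trick of re-executing the same binary through /proc/self/exe from a GCC constructor are all assumptions made for this example, and whether it triggers the BUG on 2.6.9-78.EL is untested. On a fixed kernel the child should simply die from SIGKILL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical marker telling a freshly exec'd image to keep exec-looping. */
#define LOOP_ENV "EXECLOOP_CHILD"

/* Extra thread, so that every execve() happens in a multithreaded process. */
static void *idle_thread(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* Runs before main() in every re-exec'd image: start a thread, then exec again. */
__attribute__((constructor)) static void exec_loop_child(void)
{
    if (getenv(LOOP_ENV) == NULL)
        return;                      /* normal start-up, not the exec loop */
    pthread_t t;
    if (pthread_create(&t, NULL, idle_thread, NULL) != 0)
        _exit(126);
    char *argv[] = { "execloop", NULL };
    execv("/proc/self/exe", argv);   /* environment (LOOP_ENV) is inherited */
    _exit(127);                      /* only reached if execv() failed */
}

/* Fork an exec-looping child, SIGKILL it after delay_ns (< 1e9), and reap it.
 * Returns 1 if the child was terminated by SIGKILL, 0 otherwise, -1 on error. */
static int kill_exec_loop_once(long delay_ns)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setenv(LOOP_ENV, "1", 1);
        char *argv[] = { "execloop", NULL };
        execv("/proc/self/exe", argv);
        _exit(127);
    }
    struct timespec ts = { 0, delay_ns };
    nanosleep(&ts, NULL);
    kill(pid, SIGKILL);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

A driver would call kill_exec_loop_once() many times with a delay of a few milliseconds, built like the original with `-pthread -D_GNU_SOURCE`. Comment 6 notes that on some machines the original testcase needed a timeout of a few minutes, so a short run proves little either way.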

Actual results:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at signal:377
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core cpufreq_powersave loop button battery ac uhci_hcd ehci_hcd hw_random
snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore
snd_page_alloc tg3 floppy sr_mod dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
ahci libata sd_mod scsi_mod
Pid: 31269, comm: exe Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffff80141f0a>] <ffffffff80141f0a>{__exit_signal+29}
RSP: 0018:0000010023895c58  EFLAGS: 00010046
RAX: 000001003d2d20d0 RBX: 0000000000000000 RCX: 0000000000000054
RDX: 000001000000c000 RSI: ffffffff8050e600 RDI: 000001003d2d2030
RBP: 000001003d2d2030 R08: 0000000000000000 R09: 00000001801ae824
R10: 0000000000000000 R11: ffffffff801ae824 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 000001002383f700
FS:  0000000000000000(0000) GS:ffffffff8050d280(005b) knlGS:00000000f7fdeba0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7fdd388 CR3: 0000000000101000 CR4: 00000000000006e0
Process exe (pid: 31269, threadinfo 0000010023894000, task 00000100246457f0)
Stack: 000001003d2d2030 000001003d2d2030 000001003d2d2030 0000000000000000
       0000000000000000 ffffffff80139c21 000001000000c000 0000000000000010
       000001003d2d2030 000001003eb4dac0
Call Trace:<ffffffff80139c21>{release_task+126} <ffffffff80185c9f>{flush_old_exec+1696}
       <ffffffff8017bbf1>{vfs_read+248} <ffffffff80130807>{load_elf32_binary+1673}
       <ffffffff801a6c26>{load_elf_binary+5452} <ffffffff8015e3aa>{generic_file_aio_read+48}
       <ffffffff8017bacd>{do_sync_read+178} <ffffffff8013017e>{load_elf32_binary+0}
       <ffffffff80186789>{search_binary_handler+209} <ffffffff801a3487>{compat_do_execve+398}
       <ffffffff80128757>{sys32_execve+53} <ffffffff801269cd>{ia32_ptregs_common+37}


Code: 0f 0b 8a 25 33 80 ff ff ff ff 79 01 8b 03 85 c0 75 0c 0f 0b
RIP <ffffffff80141f0a>{__exit_signal+29} RSP <0000010023895c58>
 <0>Kernel panic - not syncing: Oops

Expected results:
No crash.

Additional info:
The extra thread in the testcase may be redundant; the testcase is derived from
the ptrace-testsuite testcase late-ptrace-may-attach-check.c.

Comment 1 Jan Kratochvil 2008-07-13 14:37:45 UTC
Created attachment 311664
Testcase.

Comment 2 Jan Kratochvil 2008-07-13 17:18:13 UTC
Threading appears to be required to trigger the crash; Bug 311931 may need more fixes.

Kernel 2.6.9-78.ELsmp on an x86_64

RHTS Job 25225 - intel-s5000phb-01.rhts.bos.redhat.com
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at signal:377
invalid operand: 0000 [1] SMP 
CPU 5 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core cpufreq_powersave loop button battery ac uhci_hcd ehci_hcd
i5000_edac edac_mc hw_random e1000 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 1, comm: init Not tainted 2.6.9-78.ELsmp
RIP: 0010:[<ffffffff80141f0a>] <ffffffff80141f0a>{__exit_signal+29}
RSP: 0018:000001003fb61e68  EFLAGS: 00010046
RAX: 000001003ba47890 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000007fbfffd501 RSI: 0000000000000000 RDI: 000001003ba477f0
RBP: 000001003ba477f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 000001003ba47918 R15: 0000007fbfffd584
FS:  0000002a95562360(0000) GS:ffffffff8050d500(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000409fe028 CR3: 0000000037e12000 CR4: 00000000000006e0
Process init (pid: 1, threadinfo 000001003fb60000, task 000001000153f7f0)
Stack: 000001003ba477f0 000001003ba477f0 00000000000064fa 0000000000000000 
       0000000000000000 ffffffff80139c21 0000007fbfffd501 000001003ba477f0 
       00000000000064fa 0000000000000000 
Call Trace:<ffffffff80139c21>{release_task+126} <ffffffff8013c3f2>{do_wait+2758}
       <ffffffff80134709>{default_wake_function+0} <ffffffff80134709>{default_wake_function+0}
       <ffffffff8011037f>{sysret_signal+28} <ffffffff801102f6>{system_call+126}

Code: 0f 0b 8a 25 33 80 ff ff ff ff 79 01 8b 03 85 c0 75 0c 0f 0b 
RIP <ffffffff80141f0a>{__exit_signal+29} RSP <000001003fb61e68>
 <0>Kernel panic - not syncing: Oops
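The trace in the description reaches release_task() from flush_old_exec(), i.e. from inside execve(). A plausible reading, not confirmed anywhere in this report, is that this is the de_thread() step of exec: when a non-leader thread calls execve(), the kernel destroys the other threads, the exec'ing thread continues under the thread-group leader's PID, and the old leader is released. That would be consistent with Comment 2's observation that threading is required. The sketch below only demonstrates the user-visible part of that behavior on a current Linux kernel: a child execs from a secondary thread and the new image checks that its PID did not change. The DETHREAD_EXPECT_PID variable and all helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker carrying the PID recorded before the exec. */
#define EXPECT_ENV "DETHREAD_EXPECT_PID"

/* Runs before main() in the re-exec'd image: exit 0 iff the PID is unchanged. */
__attribute__((constructor)) static void check_pid_after_exec(void)
{
    const char *want = getenv(EXPECT_ENV);
    if (want == NULL)
        return;                      /* normal start-up */
    _exit(getpid() == (pid_t)atol(want) ? 0 : 1);
}

/* Secondary (non-leader) thread: record the process PID, then execve(). */
static void *exec_from_thread(void *arg)
{
    (void)arg;
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", (long)getpid());  /* getpid() returns the TGID */
    setenv(EXPECT_ENV, buf, 1);
    char *argv[] = { "dethread", NULL };
    execv("/proc/self/exe", argv);   /* all other threads, including the leader, go away */
    _exit(127);                      /* only reached if execv() failed */
}

/* Returns the child's exit status: 0 if an exec from a non-leader thread kept
 * the PID, 1 if it changed, 126/127 on setup failure, -1 on error. */
static int exec_from_non_leader_keeps_pid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, exec_from_thread, NULL) != 0)
            _exit(126);
        for (;;)
            pause();                 /* leader idles until the exec replaces it */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent only ever sees one child PID: the exec'd image exits under the same PID that fork() returned, even though the exec was issued by a different thread.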


Comment 4 RHEL Program Management 2008-09-03 13:02:59 UTC
Updating PM score.

Comment 5 RHEL Program Management 2008-09-19 13:52:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jerome Marchand 2008-10-24 13:43:02 UTC
I didn't reproduce the bug as easily as stated above. I had to increase the timeout to a few minutes to reproduce it on x86_64, but it is still systematic. I haven't reproduced it on another architecture so far, but I keep trying. I don't think it's x86_64-specific.

Comment 7 Jerome Marchand 2008-11-12 12:14:00 UTC
I still don't know much about why the crash happens, but at least I have reproduced it on i686. The reproducibility of this bug depends heavily on the machine it runs on.

Comment 8 Jerome Marchand 2008-11-26 15:27:19 UTC
This is a duplicate of bug 452706. It is already fixed in recent kernels.

*** This bug has been marked as a duplicate of bug 452706 ***

Comment 9 Jan Kratochvil 2008-11-26 15:37:43 UTC
Denys,
I found that this testcase and bug were never added to the ptrace testsuite, nor to the RHEL bugs list in tests/kernel/syscalls/ptrace/BUGS.