Description of problem:
Invoking a user daemon in a particular manner crashes the system. The fault is believed to be in the distribution kernel (see details below).

Version-Release number of selected component (if applicable):
Fedora release 13 (Goddard)

How reproducible:
Invoke the compiled fwknop daemon, then restart it using specific options. I have only attempted the following within a customized initrd. (I did not recompile the kernel to create the customized initrd; I just copied additional configuration scripts into the tarred structure and tweaked the init script to invoke the daemon.)

Steps to Reproduce:
1. Download, install, and configure the fwknop-2.0.0rc1-1 source RPM (http://www.cipherdyne.org/fwknop/download/).
2. Issue the following invocation:
   fwknopd -a /etc/fwknop/access.conf -c /etc/fwknop/fwknopd.conf -i eth0 --gpg-home-dir=/root/.gnupg -f > /dev/null 2> /dev/kmsg &
3. Wait for fwknopd to be ready to handle signals:
   sleep 5;
4. Restart the daemon as follows:
   fwknopd -a /etc/fwknop/access.conf -c /etc/fwknop/fwknopd.conf -i eth0 --gpg-home-dir=/root/.gnupg -f --restart;

Actual results:
The following is the error message (abridged to exclude register and stack trace information; I had to type this manually):

BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1074
in_atomic(): 0, irqs_disabled(): 1, pid: 224, name: fwknopd
BUG: unable to handle kernel NULL pointer dereference at 0000000000000258
IP: [<ffffffff8105c6f2>] complete_signal+0x103/0x151
PGD 22fe16067 PUD 22fc93067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/ip6_tables/initstate
CPU 3
Modules linked in: xt_multiport ip6table_filter ip6_tables ipv6 e1000e nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core
Pid: 224, comm: fwknopd Not tainted 2.6.34.7-56.fc13.x86_64 #1 H57M01/DX4831
< ... register dump information ... >
< ... stack trace ... >
Call Trace:
 [<ffffffff8105c9a4>] __send_signal+0x264/0x288
 [<ffffffff8105ca3e>] send_signal+0x76/0x81
 [<ffffffff8105ca94>] do_send_sig_info+0x4b/0x75
 [<ffffffff811cdf7f>] ? security_task_kill+0x16/0x18
 [<ffffffff8105cd1f>] group_send_sig_info+0x39/0x42
 [<ffffffff8105ce79>] __kill_pgrp_info+0x44/0x67
 [<ffffffff8105cfcf>] sys_kill+0xea/0x16e
 [<ffffffff8100e0d7>] ? vfs_write+0xd3/0x10b
 [<ffffffff8100e1e6>] ? sys_write+0x61/0x6e
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b

Expected results:
<no crash>

Additional info:
Available upon request. I have also contacted the fwknop developer about the issue, but the error message above indicates that the kernel handles a userspace error ungracefully (the kernel should be able to cope with it in some manner).
Almost forgot the kernel details: Linux localhost.localnet 2.6.34.7-56.fc13.x86_64 #1 SMP Wed Sep 15 03:36:55 UTC 2010 x86_64 GNU/Linux
_perhaps_ fa2755e20ab0c7215d99c2dc7c262e98a09b01df "INIT_TASK() should initialize ->thread_group list" can help, IFF /sbin/init doesn't change its pgid and fwknopd runs in init's pgrp.

> only attempted to execute the following within a
> customized initrd

This is not clear to me. So, is it possible to reproduce the problem by just doing steps 1-4 on an f13 machine, or not?
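The suggested commit concerns the ->thread_group list, which helpers such as next_thread() and while_each_thread() follow as a circular list until they return to the starting task; complete_signal(), where the oops occurred, uses such walks. If a task's list head was never initialized, the walk follows a NULL pointer. The following is a simplified user-space model under that assumption, with hypothetical names (struct task, count_threads); it is not kernel code and does not claim to reproduce the exact fault address.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical stand-in for task_struct: only the thread_group link. */
struct task {
    int pid;
    struct list_head thread_group;
};

/* Recover the enclosing task from a pointer to its thread_group member. */
#define task_of(ptr) \
    ((struct task *)((char *)(ptr) - offsetof(struct task, thread_group)))

/* Walk the group like while_each_thread(): follow thread_group.next until
 * we are back at the starting task. Returns the thread count, or -1 if the
 * list was never initialized (next == NULL), the point at which the real
 * kernel would dereference NULL instead of returning. */
static int count_threads(struct task *start)
{
    int n = 0;
    struct task *t = start;

    do {
        if (t->thread_group.next == NULL)
            return -1;
        n++;
        t = task_of(t->thread_group.next);
    } while (t != start);
    return n;
}
```

With the list left zeroed, count_threads() reports -1; after init_list_head() the same single-task walk terminates and returns 1, which is what initializing the list in INIT_TASK() would guarantee for that task.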
(In reply to comment #2)
> _perhaps_ fa2755e20ab0c7215d99c2dc7c262e98a09b01df
> "INIT_TASK() should initialize ->thread_group list"
> can help.

I'll prepare a kernel for testing. Thanks, Oleg, for looking at this!
Here is a kernel build with Oleg's patch (currently still compiling): http://koji.fedoraproject.org/koji/taskinfo?taskID=2506347
Please test whether the above kernel build fixes the problem on your system. Note that these scratch builds are removed automatically after about one week.
With 2.6.34.7-56.fc13.x86_64 I get a similar, reproducible crash while mounting certain NFS exports. With the referenced test kernel I no longer get a crash on the NFS mount, but the system eventually died later. Here is the trace from /var/log/messages:

BUG: unable to handle kernel paging request at 0000006e75725f64
IP: [<0000006e75725f64>] 0x6e75725f64
PGD 337562067 PUD 0
Oops: 0010 [#1] SMP
last sysfs file: /sys/module/lockd/initstate
CPU 4
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss autofs4 fuse sunrpc 8021q garp i2c_core ioatdma serio_raw joydev dca iTCO_wdt iTCO_vendor_support raid1 [last unloaded: scsi_wait_sca
Pid: 1298, comm: mdmon Not tainted 2.6.34.7-59.bz637242.fc13.x86_64 #1 X8DTT-H/X8DTT-H
RIP: 0010:[<0000006e75725f64>] [<0000006e75725f64>] 0x6e75725f64
RSP: 0018:ffff880337e49e40 EFLAGS: 00010246
RAX: ffffffff8165b73c RBX: ffff8801b7b69280 RCX: ffff880337e49f58
RDX: ffff880337e49e80 RSI: ffff880337e49e80 RDI: ffff8801b7b69280
RBP: ffff880337e49eb8 R08: ffffffff811267d9 R09: 0000000000000007
R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000000400
R13: ffff880337e49f58 R14: ffff8801b7dbec00 R15: 0000000000000000
FS: 00007ffa86352700(0000) GS:ffff8801c5800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000006e75725f64 CR3: 0000000337563000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mdmon (pid: 1298, threadinfo ffff880337e48000, task ffff880337582ee0)
Stack:
 ffffffff81126943 0000000000102073 ffff880337e49e80 ffff8801b7b692b8
<0> 0000000000000000 00007ffa8635f000 0000000000000000 ffff880336e33800
<0> 0000000000000000 ffff880337e49eb8 ffff8801b8ff2600 ffff8801b7dbec00
Call Trace:
 [<ffffffff81126943>] ? seq_read+0x16a/0x36b
 [<ffffffff81155ce6>] proc_reg_read+0x75/0x8e
 [<ffffffff8110e29e>] vfs_read+0xab/0x108
 [<ffffffff8110e3bb>] sys_read+0x4a/0x6e
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: Bad RIP value.
RIP [<0000006e75725f64>] 0x6e75725f64
 RSP <ffff880337e49e40>
CR2: 0000006e75725f64
---[ end trace b127d97b815b890b ]---
Confused ;)

(In reply to comment #6)
>
> With 2.6.34.7-56.fc13.x86_64 I get a similar, reproducible crash while
> mounting certain NFS exports.

No, it is not similar at all. This bug has nothing to do with the original bug report.

> With the referenced test kernel I no longer get a crash
> on the NFS mount

What about signals? Does this patch fix the original problem with sys_kill() or not?

And we still do not know how the original problem can be reproduced. Let me repeat the question from #2.

> only attempted to execute the following within a
> customized initrd

Is it possible to reproduce the problem by just doing steps 1-4 on an f13 machine? Or do you need a special environment (initrd/etc) to reproduce it?

> BUG: unable to handle kernel paging request at 0000006e75725f64
> IP: [<0000006e75725f64>] 0x6e75725f64
> ...
> [<ffffffff81126943>] ? seq_read+0x16a/0x36b
> [<ffffffff81155ce6>] proc_reg_read+0x75/0x8e

So, this is another problem. Again, how can it be reproduced?
Oleg & Stanislaw,

Sincerest apologies; I was on a backpacking trip for the last several weeks (hence the late reply). If the patch is still available (it looks like it is), I'll try it as soon as I get a chance and will report back my findings (shooting for this weekend or over Thanksgiving). I'll also try the latest pre-built kernel to see whether the problem has been resolved there.

Thanks a ton & will let you know,
Will
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
All,

The following issue was seen during testing:

Checking dmesg for specific failures!
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1097
End of log.

See testing here:
https://beaker.engineering.redhat.com/recipes/208451

http://tinyurl.com/3ejwysx
<-SNIP->
Checking dmesg for specific failures!
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1097
End of log.
<-SNIP->

http://tinyurl.com/3qkop3s
<-SNIP->
BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1097
in_atomic(): 0, irqs_disabled(): 1, pid: 25664, name: rhts-db-submit-
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last enabled at (0): [<(null)>] (null)
hardirqs last disabled at (0): [<ffffffff81069ddd>] copy_process+0x5ed/0x14d0
softirqs last enabled at (0): [<ffffffff81069ddd>] copy_process+0x5ed/0x14d0
softirqs last disabled at (0): [<(null)>] (null)
Pid: 25664, comm: rhts-db-submit- Tainted: G ---------------- T 2.6.32-162.el6.x86_64.debug #1
Call Trace:
 [<ffffffff810a8300>] ? print_irqtrace_events+0xd0/0xe0
 [<ffffffff81055a27>] ? __might_sleep+0xf7/0x130
 [<ffffffff810428a4>] ? __do_page_fault+0x114/0x4e0
 [<ffffffff8128455e>] ? cfq_set_request+0x8e/0x520
 [<ffffffff8128455e>] ? cfq_set_request+0x8e/0x520
 [<ffffffff810ac6cd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8151441e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81511525>] ? page_fault+0x25/0x30
 [<ffffffff812987ec>] ? debug_object_activate+0x5c/0x160
 [<ffffffff81298812>] ? debug_object_activate+0x82/0x160
 [<ffffffff812987ec>] ? debug_object_activate+0x5c/0x160
 [<ffffffff8107ffdf>] ? mod_timer+0xcf/0x240
 [<ffffffff8126a6f2>] ? blk_plug_device+0x72/0x100
 [<ffffffff8126d9a4>] ? __make_request+0x194/0x5e0
 [<ffffffffa0003b2b>] ? dm_request+0x3b/0x1e0 [dm_mod]
 [<ffffffff8126bbb9>] ? generic_make_request+0x329/0x640
 [<ffffffff811c6586>] ? bio_add_page+0x36/0x40
 [<ffffffff811cb5d0>] ? do_mpage_readpage+0x310/0x5f0
 [<ffffffff8126bf5d>] ? submit_bio+0x8d/0x120
 [<ffffffff811cb137>] ? mpage_bio_submit+0x27/0x30
 [<ffffffff811cba35>] ? mpage_readpages+0x115/0x130
 [<ffffffffa0244e30>] ? ext4_get_block+0x0/0x120 [ext4]
 [<ffffffffa0244e30>] ? ext4_get_block+0x0/0x120 [ext4]
 [<ffffffff8116c30a>] ? alloc_pages_current+0xaa/0x110
 [<ffffffffa02409ad>] ? ext4_readpages+0x1d/0x20 [ext4]
 [<ffffffff81139a90>] ? __do_page_cache_readahead+0x1d0/0x260
 [<ffffffff8113996e>] ? __do_page_cache_readahead+0xae/0x260
 [<ffffffff81123be0>] ? find_get_page+0x0/0x120
 [<ffffffff81139b41>] ? ra_submit+0x21/0x30
 [<ffffffff811245f8>] ? filemap_fault+0x4e8/0x530
 [<ffffffff8114e844>] ? __do_fault+0x54/0x4f0
 [<ffffffff812317a4>] ? task_has_capability+0xb4/0x110
 [<ffffffff8114ed70>] ? handle_pte_fault+0x90/0xa90
 [<ffffffff81152d98>] ? vma_link+0x58/0xf0
 [<ffffffff815109eb>] ? _spin_unlock+0x2b/0x40
 [<ffffffff8114f954>] ? handle_mm_fault+0x1e4/0x2b0
 [<ffffffff810428f3>] ? __do_page_fault+0x163/0x4e0
 [<ffffffff81155a2a>] ? do_mmap_pgoff+0x33a/0x380
 [<ffffffff8151441e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81511525>] ? page_fault+0x25/0x30
<-SNIP->

=========================================================================
Note: I was not able to reproduce this issue when testing with the same host. See here:
[] J:101962 2.6.32-162.el6 Cthon X5
https://beaker.engineering.redhat.com/jobs/101962
=========================================================================

Best,
-pbunyan
=========================================================================
(In reply to comment #10)
>
> BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1097
> in_atomic(): 0, irqs_disabled(): 1, pid: 25664, name: rhts-db-submit-

Note that in_atomic() == 0 here.

> [<ffffffff812987ec>] ? debug_object_activate+0x5c/0x160
> [<ffffffff81298812>] ? debug_object_activate+0x82/0x160
> [<ffffffff812987ec>] ? debug_object_activate+0x5c/0x160
> [<ffffffff8107ffdf>] ? mod_timer+0xcf/0x240
> [<ffffffff8126a6f2>] ? blk_plug_device+0x72/0x100

So, according to this trace, debug_object_activate() faults for some unknown reason, which is strange. And since in_atomic() == F, do_page_fault() doesn't do bad_area() but takes mmap_sem and proceeds. It also seems that fault_in_kernel_space() == F; this is strange too.

And why is in_atomic() == F??? We are holding tvec_base->lock at least. Confused.
(In reply to comment #11)
>
> And. why in_atomic() == F ??? We are holding tvec_base->lock
> at least.

Ah, probably !CONFIG_PREEMPT.

> Confused.

Yes.
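The reasoning in the last two comments can be sketched as a small decision model, assuming (as suggested) a kernel built without CONFIG_PREEMPT, where taking a spinlock in kernels of that era did not raise the preempt count that in_atomic() inspects. Under that assumption, holding tvec_base->lock with interrupts off still leaves in_atomic() == 0, so the fault handler skips the early bad_area() exit and continues toward mmap_sem, and the might_sleep() check then reports the disabled interrupts. Every name below (struct cpu_ctx, model_*) is hypothetical; this is an illustration of the argument, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU state relevant to the checks discussed above. */
struct cpu_ctx {
    bool config_preempt; /* kernel built with CONFIG_PREEMPT? */
    int preempt_count;   /* what in_atomic() looks at */
    bool irqs_disabled;
};

/* Model of spin_lock_irqsave(): interrupts always go off, but the preempt
 * count only changes when the kernel was built with CONFIG_PREEMPT. */
static void model_spin_lock_irqsave(struct cpu_ctx *c)
{
    c->irqs_disabled = true;
    if (c->config_preempt)
        c->preempt_count++;
}

static bool model_in_atomic(const struct cpu_ctx *c)
{
    return c->preempt_count != 0;
}

enum fault_outcome {
    FAULT_BAD_AREA,            /* early exit: atomic context */
    FAULT_MIGHT_SLEEP_WARNING, /* "sleeping function called from invalid context" */
    FAULT_OK
};

/* Model of the page-fault path: an atomic context bails out early;
 * otherwise the handler proceeds toward mmap_sem, and might_sleep()
 * warns if interrupts are disabled. */
static enum fault_outcome model_page_fault(const struct cpu_ctx *c)
{
    if (model_in_atomic(c))
        return FAULT_BAD_AREA;
    if (c->irqs_disabled)
        return FAULT_MIGHT_SLEEP_WARNING;
    return FAULT_OK;
}
```

In this model a !CONFIG_PREEMPT context that takes the lock ends in FAULT_MIGHT_SLEEP_WARNING with in_atomic() false and irqs_disabled() true, matching the "in_atomic(): 0, irqs_disabled(): 1" line in the report; the same sequence with CONFIG_PREEMPT set takes the early FAULT_BAD_AREA exit instead.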
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.