Bug 327461
Summary: | NFS crash when service nfs restart | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Raph BIOLLUZ <cougar__74> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | low |
Version: | rawhide | CC: | chris.brown, steved, triage |
Hardware: | i686 | OS: | Linux |
Fixed In Version: | kernel-2.6.23.12-52.fc7 | Doc Type: | Bug Fix |
Keywords: | Reopened | Last Closed: | 2008-06-30 14:12:06 UTC |
Description
Raph BIOLLUZ
2007-10-11 10:04:42 UTC
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage
I am CC'ing myself to this bug and will try to assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists, please close this bug, or I'll do so in a few days if no additional information is lodged.

Hello, I worked on this a little while ago: I compiled my kernel from the 2.6.23-1 sources. Crashes seem to happen less frequently (about once every two or three weeks). I'm now trying the latest Fedora 7 kernel, 2.6.23.12-52.fc7, to see if it works...

Okay, setting to NEEDINFO so we know we're waiting on information from you. You should be running a Fedora kernel for us to be able to troubleshoot the problem, but it's good to know things have improved.

Hmm...the messages just before the "cut here" might have been helpful. It looks like a BUG() call was hit:

Actual results: in /var/log/messages
Serveur kernel: ------------[ cut here ]------------
Message from syslogd@ at Fri Oct 5 11:28:40 2007 ...
Serveur kernel: invalid opcode: 0000 [#1]

I suspect that the problem may be one of these calls in svc_destroy:

BUG_ON(!list_empty(&serv->sv_permsocks));
BUG_ON(!list_empty(&serv->sv_tempsocks));

...is this still a problem on 2.6.23-ish kernels?

Hi! I had to force an nfs restart a week ago because all my accounts (80 users) were blocked. This time "service nfs restart" didn't crash my server: it went back to work, I didn't reboot it, and it's still running. I'm using the latest Fedora 7 kernel, so the bug may have been solved.

Ok. If you think this is solved, then let's go ahead and close this. If it occurs again, please collect as much of the log as you can from the crash and reopen this bug.

I have some news!
After some heavy lag, I tried restarting some services that might be holding processor time. Just after restarting nfs --> crash!! My server crashed on Friday with this message (I was using the latest F7 kernel, 2.6.23.14-64.fc7):

[root@ia74-tournette ~]# service ldap restart
Stopping slapd: [ OK ]
Checking configuration files for slapd: [WARNING]
WARNING: No dynamic config support for database ldbm.
Starting slapd: [ OK ]
[root@ia74-tournette ~]# service cups restart
Stopping cups: [ OK ]
Starting cups: [ OK ]
[root@ia74-tournette ~]# service nfs restart
Stopping NFS mountd: [ OK ]
Stopping NFS daemon: [ OK ]
Stopping NFS quotas: [ OK ]
Stopping NFS services: [ OK ]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
[root@ia74-tournette ~]#
Message from syslogd@ at Fri Feb 8 16:41:26 2008 ...
ia74-tournette kernel: ------------[ cut here ]------------
ia74-tournette kernel: invalid opcode: 0000 [#1]
ia74-tournette kernel: SMP
ia74-tournette kernel: CPU: 2
ia74-tournette kernel: EIP: 0060:[<f8a7ee07>] Not tainted VLI
ia74-tournette kernel: EFLAGS: 00010202 (2.6.23.12-52.fc7 #1)
ia74-tournette kernel: EIP is at svc_destroy+0xc1/0x13d [sunrpc]
ia74-tournette kernel: eax: ed5abb9c ebx: ed5abb80 ecx: ed5abb9c edx: ed5abb94
ia74-tournette kernel: esi: ed5abb94 edi: ed5abba4 ebp: 00000000 esp: d1616fa4
ia74-tournette kernel: ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
ia74-tournette kernel: Process nfsd (pid: 1542, ti=d1616000 task=f6400c20 task.ti=d1616000)
Message from syslogd@ at Fri Feb 8 16:41:27 2008 ...
ia74-tournette kernel: Stack: ed5abc00 f8a7eef4 00000009 00000009 2f976f78 f5160000 f8ace88a f8ae8df5
ia74-tournette kernel: 00000000 fffffeff ffffffff fffffef8 ffffffff f8ace61a 00000000 00000000
ia74-tournette kernel: c0405dbb f5160000 00000000 00000000 00000000 746e6563 a9c34220
ia74-tournette kernel: Call Trace:
ia74-tournette kernel: [<f8a7eef4>] svc_exit_thread+0x71/0x85 [sunrpc]
ia74-tournette kernel: [<f8ace88a>] nfsd+0x270/0x282 [nfsd]
ia74-tournette kernel: [<f8ace61a>] nfsd+0x0/0x282 [nfsd]
ia74-tournette kernel: [<c0405dbb>] kernel_thread_helper+0x7/0x10
ia74-tournette kernel: =======================
ia74-tournette kernel: Code: 8b 72 08 eb 0c 89 d0 e8 38 1f 00 00 89 f2 8b 76 08 8d 4a 08 83 ee 08 8d 43 1c 39 c1 75 e7 39 4b 1c 74 04 0f 0b eb fe 39 3f 74 04 <0f> 0b eb fe 89 d8 e8 12 75 00 00 83 7b 74 00 74 4b b8 20 51 a9
ia74-tournette kernel: EIP: [<f8a7ee07>] svc_destroy+0xc1/0x13d [sunrpc] SS:ESP
[root@ia74-tournette ~]#
Message from syslogd@ at Fri Feb 8 16:41:45 2008 ...
ia74-tournette kernel: Oops: 0000 [#2]
ia74-tournette kernel: SMP
ia74-tournette kernel: CPU: 3
ia74-tournette kernel: EIP: 0060:[<f8ad1ba9>] Tainted: G D VLI
ia74-tournette kernel: EFLAGS: 00010202 (2.6.23.12-52.fc7 #1)
ia74-tournette kernel: EIP is at nfsd_vfs_read+0xee/0x2ed [nfsd]
ia74-tournette kernel: eax: 015d8186 ebx: f8af5e00 ecx: 00000000 edx: e6703838
ia74-tournette kernel: esi: 4d560640 edi: 00000000 ebp: 00000001 esp: e561bea8
ia74-tournette kernel: ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
ia74-tournette kernel: Process nfsd (pid: 2197, ti=e561b000 task=d953b230 task.ti=e561b000)
ia74-tournette kernel: Stack: 00000000 c047f7f4 f54f8240 f3643000 00800003 015d8186 c41bb440 f8ad1eea
ia74-tournette kernel: 00000044 e561bf00 d77dd804 00008000 00000000 d77dd804 f3643000 d77dd8ec
ia74-tournette kernel: f8ad2223 00000000 00000000 f364351c 00000001 d77dd8ec f54f8240 d77dd804
ia74-tournette kernel: Call Trace:
ia74-tournette kernel: [<c047f7f4>] dentry_open+0x50/0x56
ia74-tournette kernel: [<f8ad1eea>] nfsd_open+0x133/0x161 [nfsd]
ia74-tournette kernel: [<f8ad2223>] nfsd_read+0xbe/0xd3 [nfsd]
ia74-tournette kernel: [<f8ad89eb>] nfsd3_proc_read+0x12c/0x175 [nfsd]
ia74-tournette kernel: [<f8ace20d>] nfsd_dispatch+0xd3/0x1c5 [nfsd]
ia74-tournette kernel: [<f8a7f8ee>] svc_process+0x3b1/0x67f [sunrpc]
ia74-tournette kernel: [<f8a824ff>] svc_recv+0x326/0x393 [sunrpc]
ia74-tournette kernel: [<f8ace798>] nfsd+0x17e/0x282 [nfsd]
ia74-tournette kernel: [<f8ace61a>] nfsd+0x0/0x282 [nfsd]
ia74-tournette kernel: [<c0405dbb>] kernel_thread_helper+0x7/0x10
ia74-tournette kernel: =======================
ia74-tournette kernel: Code: 89 c7 29 c3 c1 ef 0f 31 ed 31 df 83 e7 0f 89 f8 c1 e0 07 8d 98 00 5e af f8 8d 43 04 e8 ef ad b4 c7 89 da 31 c9 eb 1d 8b 44 24 14 <39> 46 08 75 09 8b 44 24 10 39 46 0c 74 4d 83 7e 04 00 75 02 89
ia74-tournette kernel: EIP: [<f8ad1ba9>] nfsd_vfs_read+0xee/0x2ed [nfsd] SS:ESP 0068:e561bea8

I suspect this problem is the same as the one described here by Neil Brown:
http://lkml.org/lkml/2007/8/2/473
I've been working on converting lockd to use kthreads on upstream kernels. My plan is to eventually convert nfsd and the nfsv4 callback thread to that as well. We may be able to fix this within the context of that conversion.
Do you happen to have the messages file from these oopses? Was anything logged there?

I don't have any messages in the logs about these crashes; I only have a message that appears in kwrited when it happens. I don't understand why kwrited is used to report that the system is crashing. When it happens, I am forced to do a hard reboot.

This message is a reminder that Fedora 7 is nearing the end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

I'm pretty sure this is an upstream bug, so moving to rawhide.

Pushed an RFC patchset for this upstream over the weekend. The main concern is that it would remove signaling from the shutdown codepath. Most distros do something like "killall nfsd" to take it down, so that would mean a user-visible change. Neil Brown suggested that we don't do that if we can help it. If we keep the shutdown asynchronous, we'll need to fix the locking.
I think this is doable, and we need to remove the BKL from that codepath anyway.

Comments from Neil about this problem (so I don't lose them). His suggestion sounds reasonable and not too difficult to implement.

------------[snip]---------------
I never followed up on this, did I...

The core problem seems to be the principle of "The last one out turns off the lights", but once you've turned off the lights, you can't see if someone else snuck back in, so you aren't the last one. You really have to have only one door and stand in the doorway while switching off the lights....

If we replace the BKL usage with a simple global semaphore, that problem might just go away. We should only need to protect svc_destroy, svc_exit_thread, and svc_set_num_threads from each other. It's long past time to discard the BKL here anyway.

New patchset pushed upstream. It incorporates Neil's patchset to change the BKL usage in this codepath to a global semaphore, and also converts nfsd to the kthread API. Awaiting comments...

Looks like Bruce Fields has pulled this patchset into his tree, along with a few follow-up patches to clean up small bugs that it introduced. There are also some patches flying around to clean up signal handling. I'm pretty confident that we'll have this problem fixed for 2.6.27...

I'm going to go ahead and close this with a resolution of UPSTREAM. The patchset to fix this should make 2.6.27.
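
For readers following the thread, Neil's "one door" idea (a single lock covering svc_destroy, svc_exit_thread, and svc_set_num_threads) can be sketched in userspace C. This is only a toy model and not the upstream patch: struct toy_serv, toy_mutex, and the toy_* functions are hypothetical names, a pthread mutex stands in for the proposed global semaphore, and a plain counter stands in for the sv_permsocks / sv_tempsocks lists.

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-in for the kernel's struct svc_serv (hypothetical layout). */
struct toy_serv {
    int nrthreads;  /* references held by running service threads */
    int nsocks;     /* stands in for the sv_permsocks / sv_tempsocks lists */
    int destroyed;  /* set once teardown has run */
};

/* The "one door": a single global lock, standing in for the semaphore. */
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Teardown path; the caller must hold toy_mutex.
 * Returns 1 if this caller was the last one out and destroyed the service. */
static int toy_destroy_locked(struct toy_serv *serv)
{
    if (--serv->nrthreads > 0)
        return 0;              /* someone is still inside */
    serv->nsocks = 0;          /* close every socket */
    /* Userspace analogue of BUG_ON(!list_empty(&serv->sv_permsocks)).
     * Because toy_mutex is held, nobody can add a socket between the
     * close above and this check. */
    assert(serv->nsocks == 0);
    serv->destroyed = 1;
    return 1;
}

/* A thread leaving the service: drop the reference and, if it is the
 * last one, tear down, all under the same lock. */
int toy_exit_thread(struct toy_serv *serv)
{
    int last;

    pthread_mutex_lock(&toy_mutex);
    last = toy_destroy_locked(serv);
    pthread_mutex_unlock(&toy_mutex);
    return last;
}

/* Add n threads, each bringing one socket. Refused once teardown has run.
 * Returns 0 on success, -1 if the service is already destroyed. */
int toy_add_threads(struct toy_serv *serv, int n)
{
    int ret = -1;

    pthread_mutex_lock(&toy_mutex);
    if (!serv->destroyed) {
        serv->nrthreads += n;
        serv->nsocks += n;
        ret = 0;
    }
    pthread_mutex_unlock(&toy_mutex);
    return ret;
}
```

In this model a restart that races with shutdown either gets in before the last thread leaves, so teardown is deferred, or it arrives after teardown and is turned away; it can no longer slip in between the socket cleanup and the emptiness check.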