Bug 175629
| Summary: | rpc.nfsd kernel panic after starting HA filesystem services | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | kernel | Assignee: | Ric Wheeler <rwheeler> |
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.0 | CC: | jbaron, lhh, steved |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 13:23:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 176344 | | |
Description
Corey Marthaler
2005-12-13 15:53:54 UTC
Bumping the priority, as QA is hitting this bug regularly during rgmanager/NFS testing. I just hit and filed a similar bz (if not a duplicate of this one), 190401. Any update from devel on this issue?

*** Bug 190401 has been marked as a duplicate of this bug. ***

It appears the nfsd_serv pointer is becoming null while it's still in use...

What exactly does the "derringer" test do?

Basically, all derringer was doing was relocating the HA GFS services from one machine to another (clusvcadm -r $servicename -m $newserviceowner) while there was I/O going from the NFS clients to those filesystems. Lately, however, we have seen these panics when just starting the HA GFS services; that is, we get a valid cluster up, mount some GFS filesystems, then do a 'service rgmanager start', which starts clurgmgrd (which I believe then takes care of all the NFS and exportfs work). I'm currently attempting to reproduce this with just ext filesystems.

GFS is not required for this issue; I was able to recreate it using just ext3 filesystems.

link-01:

May 9 08:29:36 taft-01 clurgmgrd[25695]: <notice> Resource Group Manager Starting
May 9 08:29:36 taft-01 clurgmgrd[25695]: <info> Loading Service Data
May 9 08:29:36 taft-01 rgmanager: clurgmgrd startup succeeded
May 9 08:29:36 taft-01 clurgmgrd[25695]: <info> Initializing Services
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <info> Removing export: *:/mnt/taft0
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <info> Removing export: *:/mnt/taft1
May 9 08:29:36 taft-01 kernel: Installing knfsd (copyright (C) 1996 okir.de).
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> NFS daemon nfsd is not running.
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> NFS daemon nfsd is not running.
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> Verify that the NFS service run level script is enable
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> Verify that the NFS service run level script is enable
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> Restarting NFS daemons
May 9 08:29:36 taft-01 clurgmgrd: [25695]: <err> Restarting NFS daemons
May 9 08:29:36 taft-01 rpc.statd[2646]: Caught signal 15, un-registering and exiting.
May 9 08:29:36 taft-01 nfslock: rpc.statd shutdown succeeded
May 9 08:29:36 taft-01 nfslock: rpc.statd shutdown succeeded
May 9 08:29:36 taft-01 rpc.statd[25972]: Version 1.0.6 Starting
May 9 08:29:36 taft-01 rpc.statd[25973]: Version 1.0.6 Starting
May 9 08:29:36 taft-01 rpc.statd[25972]: unable to register (statd, 1, udp).
May 9 08:29:36 taft-01 nfslock: rpc.statd startup succeeded
May 9 08:29:36 taft-01 nfslock: rpc.statd startup succeeded

Unable to handle kernel NULL pointer dereference at 0000000000000038
RIP: <ffffffffa02d11d8>{:nfsd:nfsd_svc+454}
[...]

Steve, I bet the sock->sk socket has been shut down (inet_shutdown) and the socket released (release_sock). The address takeover must happen sometime between sock->ops->listen() and svc_setup_socket() within svc_create_socket(). How to fix this? No idea at the moment.

This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.