Bug 166701
Summary: | Kernel panic with NFS traffic being moved between nodes using service IP address
---|---
Product: | Red Hat Enterprise Linux 4
Reporter: | Henry Harris <henry.harris>
Component: | kernel
Assignee: | Steve Dickson <steved>
Status: | CLOSED ERRATA
QA Contact: | Cluster QE <mspqa-list>
Severity: | high
Docs Contact: |
Priority: | medium
Version: | 4.0
CC: | axel.thimm, bmarzins, jbrassow, kanderso, lhh
Target Milestone: | ---
Target Release: | ---
Hardware: | x86_64
OS: | Linux
Whiteboard: |
Fixed In Version: | RHBA-2005-740
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2005-10-07 16:57:22 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 132823, 167257
Attachments: |
Description
Henry Harris
2005-08-24 18:59:13 UTC
Adding Ben Marzinski to look at the problem.

Created attachment 118092 [details]
script to move service ip address
Use your service name and node name

Forgot to mention that this problem is occurring on a dual processor Opteron server. Also, the more NFS traffic and the more often the service IP address for the mount is moved, the quicker the problem seems to occur. The attached script caused the failure to occur within one hour last night. In that particular instance, the GFS file system that was being exported was also being accessed locally by other software running on the cluster nodes.

We have just discovered that the PCI slot where our quad GigE adapter is located is hardwired to CPU 1. This means this could very well be a multiprocessor issue if CPU 0 is moving the IP addresses while network traffic is being handled by CPU 1.

Given the fact that this happens at seemingly random points in the kernel, it looks like something subtle.

It appears that I jumped the gun when I told people that I had an explanation for the spinlock bugs. It appears that GFS is also compiled with spinlock debugging enabled.

Here is the output from lsmod:

    Module                  Size  Used by
    nfsd                  267104  9
    exportfs                8192  1 nfsd
    lockd                  78896  2 nfsd
    lock_dlm               45684  2
    gfs                   320652  2
    lock_harness            6960  2 lock_dlm,gfs
    autofs4                24072  0
    i2c_dev                14208  0
    i2c_core               29184  1 i2c_dev
    dlm                   129796  9 lock_dlm
    cman                  136224  19 lock_dlm,dlm
    md5                     6272  1
    ipv6                  283104  31
    sunrpc                171128  19 nfsd,lockd
    button                  9504  0
    battery                11656  0
    ac                      7176  0
    ohci_hcd               24976  0
    tg3                    89476  0
    e1000                  96228  0
    bonding                64436  0
    floppy                 66512  0
    sg                     43320  0
    ext3                  137488  4
    jbd                    68784  1 ext3
    dm_mod                 65984  3
    qla2300               124032  0
    qla2xxx               122080  3 qla2300
    scsi_transport_fc      11136  1 qla2xxx
    mptscsih               37808  0
    mptbase                50848  1 mptscsih
    sd_mod                 19328  8
    scsi_mod              140240  5 sg,qla2xxx,scsi_transport_fc,mptscsih,sd_mod

What are the exact iozone command lines that you are using? I've just been guessing and using the defaults in a loop.

Created attachment 118265 [details]
Script to run iozone
Here is the script that runs iozone.

Looking back through the /var/log/messages files, I see that in every case I looked at (4 or 5) the node that fails is the one that the IP service has been moved to. This is also consistent with what we were seeing when the failure occurred in the field every 2 to 4 days.

I'm almost positive I know what is causing the panic in gfs_create. It is a problem in the GFS/NFS interaction. Once we changed our tests to stress this interaction, we were able to reproduce that kernel panic. I am currently working on a test GFS rpm to see if it fixes these kernel panics. Unfortunately, this doesn't look like it has anything to do with the other panics. But there's always hope.

Here's an explanation of the problem, and the workaround in the modified gfs module.

When the VFS layer calls a filesystem-specific create function, it passes down intent data. This tells the filesystem information such as whether or not this is an exclusive create request (the file was opened with O_CREAT | O_EXCL). Before the kernel NFS daemon calls a filesystem-specific create function, it checks whether the file exists. If it does, nfsd never passes the request to the underlying filesystem. Because the file doesn't exist at that point, it doesn't matter whether or not the create is exclusive, so nfsd doesn't pass the intent data to the underlying filesystem.

This works fine for local filesystems. But for cluster filesystems, the file could be getting created on another node after nfsd checks for its existence. GFS cannot reliably check whether a file exists until it locks the directory. The panic in gfs_create was happening because nfsd passed down a create request with no intent information. When GFS checked, the file already existed, so GFS checked the intent information, but found a NULL pointer. Since there is no way to get NFS to pass the intent information for this release, I made GFS assume that the create was not exclusive if it gets into this situation.
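
To make the workaround concrete, here is a minimal user-space sketch of the decision described above. It is not the GFS code: `CreateIntent` and `create_decision` are hypothetical names invented for this illustration, and the sketch only models the choice made once the directory is locked, assuming an absent intent (the NULL pointer case) is represented as `None`.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateIntent:
    """Hypothetical stand-in for the intent data the VFS passes to create."""
    exclusive: bool  # True when the file was opened with O_CREAT | O_EXCL


def create_decision(file_exists: bool, intent: Optional[CreateIntent]) -> int:
    """Model the outcome of a create request after the directory is locked.

    Returns 0 when the request may proceed and -EEXIST when an exclusive
    create finds that the file is already there.
    """
    if not file_exists:
        # Nothing to conflict with, so the intent is irrelevant.
        return 0
    if intent is None:
        # Request arrived without intent data (the nfsd case described
        # above). Assumed workaround: treat it as a non-exclusive create
        # instead of dereferencing the missing intent.
        return 0
    return -errno.EEXIST if intent.exclusive else 0
```

In this model, the panic corresponds to reaching the last line with `intent` missing. The extra check trades strict O_EXCL semantics over NFS for not crashing, which matches the stated assumption that the create was not exclusive.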

Posted attachments to bug #163168 showing strace for clustat hang and clusvcadm hang that occurred while running tests described in this bug.

Just to log this, I compiled the RHEL4 U2 gfs code (including the nfs creation fix from above) against the current crosswalk kernel, and as far as I know, the systems have been running this code for days without problems. Is this correct?

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-740.html

*** Bug 169301 has been marked as a duplicate of this bug. ***