Description of problem:
When connecting to a Solaris 8 sparc server from an FC5 desktop, and automounting the user's home directory from the FC5 desktop, then NFS settings which worked on FC4 cause the FC5 kernel to panic.
Version-Release number of selected component (if applicable):
nfs-utils-1.0.8.rc2-4.FC5.2
How reproducible:
Every time
Steps to Reproduce:
1. Configure a Solaris 8 sparc machine to automount a user's home directory from an FC5 machine, using the automount NFS parameters: -vers=2,proto=udp,rsize=4096,wsize=4096
2. From the FC5 machine, get that user to logon to the Solaris 8 machine
Actual results:
FC5 machine kernel panics.
Expected results:
Successful mount, or error.
Additional info:
Removing the automount NFS parameters on the Solaris machine fixes this problem. But we shouldn't get a kernel panic. The parameters were probably added to the Solaris machine to get around NFS interworking problems with previous RedHat clients.
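For anyone trying to reproduce this without editing the automounter maps, the same options can probably be passed to a manual mount on the Solaris side. The sketch below only prints the command rather than running it, since the mount itself is what is reported to bring the server down. The host name fc5-server, the export path and the mount point are placeholders, not values from this report, and it is an assumption that a manual mount exercises the same path as the automount.

```shell
#!/bin/sh
# Dry-run helper: print the Solaris mount command for a given option string.
# SERVER, EXPORT and MNT are placeholder values (assumptions, not from the report).
SERVER="fc5-server"
EXPORT="/home/testuser"
MNT="/mnt/nfstest"

mount_cmd() {
    # $1 = comma-separated NFS options; empty means "use the client defaults"
    if [ -n "$1" ]; then
        printf 'mount -F nfs -o %s %s:%s %s\n' "$1" "$SERVER" "$EXPORT" "$MNT"
    else
        printf 'mount -F nfs %s:%s %s\n' "$SERVER" "$EXPORT" "$MNT"
    fi
}

# Options from the report (reported to panic the FC5 server):
mount_cmd "vers=2,proto=udp,rsize=4096,wsize=4096"
# No options (reported to work):
mount_cmd ""
```

Run the printed command as root on the Solaris machine, with the server's console visible so any backtrace can be seen.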
It's not just FC5. It's just been duplicated on an FC4 box where the kernel has recently been updated to 2.6.16. So suspect the kernel.
Has this been fixed in recent kernels? If not, could you please post the oops backtrace?
Not fixed in recent kernels. I can't readily provide a backtrace as the fault totally hangs the kernel - guess it would have to involve a video capture of the console output.
Ok... how about either a binary tethereal or snoop trace of when this happens... something similar to: tethereal -w /tmp/data.pcap host <client> ; bzip2 /tmp/data.pcap
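A slightly more defensive take on that capture command, as a sketch only. tethereal was renamed tshark when Ethereal became Wireshark, so the helper picks whichever one is installed. CLIENT is a placeholder for the Solaris machine's address, the function name pick_tool is made up for this example, and the capture needs root.

```shell
#!/bin/sh
# Print the first command from the argument list that is installed.
pick_tool() {
    for t in "$@"; do
        if command -v "$t" >/dev/null 2>&1; then
            printf '%s\n' "$t"
            return 0
        fi
    done
    return 1
}

# Only start a capture when CLIENT is set, e.g. CLIENT=192.0.2.10 sh capture.sh
# Stop it with Ctrl-C once the mount has been attempted, then compress the
# file with: bzip2 /tmp/data.pcap
if [ -n "${CLIENT:-}" ]; then
    tool=$(pick_tool tshark tethereal) || { echo "no capture tool found" >&2; exit 1; }
    "$tool" -w /tmp/data.pcap host "$CLIENT"
fi
```

On the Solaris side, snoop with -o to write a capture file should give an equivalent trace.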
Created attachment 134306: bzip2'd XWD X Window Dump data
Trying to duplicate using an FC5 VMware guest OS. The attached xwd data is the result of exporting a directory from the FC5 VMware guest (2.6.17 kernel) to a Solaris 9 x86 VMware guest, which NFS-mounts that directory with the "-vers=2,proto=udp" NFS options. This isn't hanging the VMware guest, but is giving it an oops. My desktop FC5 is running the SMP kernel, the VMware guest isn't - this may be important.
Created attachment 134308: bzip2'd XWD X Window Dump data
Aha. The hang happens when using an SMP kernel. This is my FC5 VMWare guest again, but running on 2 processors under VMWare. The desktops I reported the bug on were dual-processor machines too. Kernel 2.6.17-1.2174_FC5smp (was 2.6.17-1.2174_FC5 in previous test)
What should I use to read those dumps? Newer and older versions of ethereal don't seem to understand that format...
Use xwud - the files are screendumps of the console showing the backtrace.
Or the gimp. You'll find the first screen dump (from the single processor test) is useless as my shell window overlapped the console window. Pah!
cool... got them... thanks!
Note: the dump in Comment #5 is blocked by a terminal window...
Created attachment 134373: bzip2'd XWD X Window Dump data
New dump of the oops from running the non-SMP kernel. No terminal window obscuring the backtrace this time!
Created attachment 134504: Snoop capture file - version 2 (Ethernet)
Added a snoop capture of the traffic between the machines. 194.217.90.103 is the FC5 machine holding the home directory for user benhaman. 194.217.90.121 is a Solaris 9 x86 machine attempting to automount that directory using NFS v2, proto=udp. The FC5 machine is running the SMP kernel, and as a result of this traffic the kernel panics and the machine hangs. Snoop captures can be read by wireshark 0.99.2; that is what I'm using here.
From the snoop trace, it appears the remote quota query that the Solaris box sends is never responded to... so I'm thinking that could be the problem: FC5 is not handling those messages very well. To see if this is the case, kill rpc.rquotad or edit /etc/init.d/nfs to not start rpc.rquotad, and then have the Solaris machine try the mount.
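To confirm that rpc.rquotad is really gone before retrying the mount, one option is to check what the portmapper still advertises. This is a rough sketch: the helper name rpc_registered is made up for this example, it reads `rpcinfo -p` output on stdin, and 100011 is the standard RPC program number for rquota. The sample lines in the demo are invented, not taken from the machines in this report.

```shell
#!/bin/sh
# Succeed if the given RPC program number appears in `rpcinfo -p` output
# supplied on stdin (columns: program vers proto port service).
rpc_registered() {
    awk -v prog="$1" '$1 == prog { found = 1 } END { exit (found ? 0 : 1) }'
}

# Demo with made-up portmapper output:
sample='   program vers proto   port  service
    100000    2   udp    111  portmapper
    100003    2   udp   2049  nfs'

if printf '%s\n' "$sample" | rpc_registered 100011; then
    echo "rquotad is still registered"
else
    echo "rquotad is not registered"
fi
```

Against a live server this would be used as `rpcinfo -p <server> | rpc_registered 100011`.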
(In reply to comment #14)
> From the snoop trace, it appears the remote quota query that the Solaris box
> sends is never responded to... so I'm thinking that could be the problem:
> FC5 is not handling those messages very well. To see if this is
> the case, kill rpc.rquotad or edit /etc/init.d/nfs to not start rpc.rquotad,
> and then have the Solaris machine try the mount.
OK. I can try that. It's worth mentioning here that if I remove the "-vers=2,proto=udp" NFS mount options from the Solaris machine, then there's no kernel oops, panic, etc. So if we use TCP for NFS it's OK; UDP for NFS is bad.
Created attachment 134528: Snoop capture file - version 2 (Ethernet)
No rpc.rquotad running on the FC5 box. Portmapper replies OK to this effect.
Comment 16 should have stated that the kernel still panics. In case you were about to ask: ~/.bash_profile and ~/.bashrc files are the bog standard copies from /etc/skel, and /etc/bashrc is the unmodified file from the setup rpm.
Created attachment 134529: Snoop capture file - version 2 (Ethernet)
Just for comparison purposes, this trace is from where I commented out the "-vers=2,proto=udp" NFS mount parameters on the Solaris machine. NFS uses TCP and everything (including rquotad) works OK - no kernel panic.
Created attachment 134700: A patch turning off ACL support for v2
Ok... I'm pretty sure I know what the problem is... the capture file from Comment #6 shows Solaris sending inquiries about ACL support for version 2 of the NFS protocol. Unfortunately, the server is saying yes but the answer should be no. So will just the patch work, or would you like me to supply some test kernels? If so, which machine architectures will be needed?
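One way to see which NFSACL versions a server claims to support, before and after a patch like this, is to list the versions the portmapper shows for RPC program 100227 (the NFSACL program number). This is a sketch under assumptions: rpc_versions is a made-up helper name, the sample output is invented, and it assumes the server registers NFSACL with the portmapper at all, which may not hold for every kernel.

```shell
#!/bin/sh
# Print the distinct versions registered for an RPC program over a given
# protocol, reading `rpcinfo -p` output on stdin.
# $1 = program number, $2 = protocol (udp or tcp)
rpc_versions() {
    awk -v prog="$1" -v proto="$2" '
        $1 == prog && $3 == proto && !seen[$2]++ {
            out = out (out == "" ? "" : " ") $2
        }
        END { print out }'
}

# Demo with made-up portmapper output:
sample='   program vers proto   port  service
    100003    2   udp   2049  nfs
    100227    2   udp   2049  nfs_acl
    100227    3   udp   2049  nfs_acl
    100227    3   tcp   2049  nfs_acl'

printf '%s\n' "$sample" | rpc_versions 100227 udp   # prints "2 3"
```

If version 2 no longer shows up after rebuilding, the patch has probably taken effect. A direct probe such as `rpcinfo -u <server> 100227 2` is another way to check.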
I'll rebuild a kernel when I'm back from holiday
Okay.... have a nice holiday and thanks for all your help!
Created attachment 135484: Snoop capture file - version 2 (Ethernet)
That patch didn't help. Wireshark no longer shows the NFSACL traffic, so we know that I built the kernel OK (!), but the kernel still panics. I've also tried with rpc.rquotad not running; the kernel still panics.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Created attachment 143014: This patch addresses the issue that is causing this oops.
(This is a mass-close of kernel bugs in NEEDINFO state.) As indicated previously, there has been no update on the progress of this bug, therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue still occurs for you and I will try to assist in its resolution. Thank you for taking the time to report the initial bug. If you believe that this bug was closed in error, please feel free to reopen this bug.