Description of problem:
mount fails intermittently for NFSv4

Version-Release number of selected component (if applicable):
util-linux-2.12a-16.EL4.29

How reproducible:
Perform multiple mounts of a volume specifying "-t nfs4".

Steps to Reproduce:
1. Run the script below (with appropriate substitutions)

Actual results:
NFSv4 mounting sometimes fails with the (spurious) error "mount: wrong fs type, bad option, bad superblock on <remote-filesystem>, or too many mounted file systems". The error comes and goes on successive mounts, but it will persist in failing or succeeding for a short period of time, e.g. 30-60 seconds (though it may actually be tied more to the number of mounts, since I've seen it continue to fail for many hours when I was only doing individual mounts by hand).

Expected results:
NFSv4 mounts should behave consistently (i.e. always fail or always succeed), as NFSv3 mounts do.

Additional info:
When mounting a volume from a Netapp filer, RHEL 4.6 sometimes produces the error "mount: wrong fs type, bad option, bad superblock on <remote-filesystem>, or too many mounted file systems". This error comes and goes on successive mounts. I've tried destroying and recreating the Netapp volume itself while the error was occurring, but it continued to happen, so it's not a problem with the volume. Also, NFSv3 mounts of the same volume always work fine even when NFSv4 mounts are producing this error, so it's clear that it's a problem with the Red Hat NFSv4 mount code.
The following script demonstrates the issue:

---- 8< -----------------------------------------------------
#!/bin/sh

MOUNTOPTIONS=${@:-"-t nfs4"}
MOUNTS=0
FAILURES=0
OUTFILE=/tmp/mountout.$$
REMOTEVOLUME=netappfiler:/vol/test
MOUNTPOINT=/var/tmp/test

trap 'rm -f $OUTFILE; umount $MOUNTPOINT 2>/dev/null; echo "Exiting after $MOUNTS mounts"; exit 0' INT

while true; do
    MOUNTS=`expr $MOUNTS + 1`
    if mount $MOUNTOPTIONS $REMOTEVOLUME $MOUNTPOINT/ 2>$OUTFILE; then
        if [ $FAILURES -gt 0 -o $MOUNTS -eq 1 ]; then
            echo ""
            echo "mount is SUCCEEDING as of `date` ($MOUNTS mounts)"
        fi
        FAILURES=0
    else
        if [ $FAILURES -eq 0 ]; then
            echo ""
            echo "mount is failing as of `date` ($MOUNTS mounts): `cat $OUTFILE`"
        fi
        FAILURES=`expr $FAILURES + 1`
    fi
    umount $MOUNTPOINT 2>/dev/null
    sleep .5
done
---- 8< -----------------------------------------------------

Here's the output of the script when I ran it over a period of a few minutes with "-t nfs4":

---- 8< -----------------------------------------------------
mount is now SUCCEEDING as of Thu Feb 14 11:25:37 PST 2008 (1 mounts)
mount is now failing as of Thu Feb 14 11:26:02 PST 2008 (31 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
mount is now SUCCEEDING as of Thu Feb 14 11:26:34 PST 2008 (81 mounts)
mount is now failing as of Thu Feb 14 11:27:40 PST 2008 (159 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
mount is now SUCCEEDING as of Thu Feb 14 11:28:12 PST 2008 (209 mounts)
mount is now failing as of Thu Feb 14 11:29:18 PST 2008 (287 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
mount is now SUCCEEDING as of Thu Feb 14 11:29:50 PST 2008 (337 mounts)
mount is now failing as of Thu Feb 14 11:30:58 PST 2008 (415 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
mount is now SUCCEEDING as of Thu Feb 14 11:31:30 PST 2008 (465 mounts)
mount is now failing as of Thu Feb 14 11:32:36 PST 2008 (543 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
mount is now SUCCEEDING as of Thu Feb 14 11:33:09 PST 2008 (593 mounts)
mount is now failing as of Thu Feb 14 11:34:13 PST 2008 (671 mounts): mount: wrong fs type, bad option, bad superblock on netappfiler:/vol/test, or too many mounted file systems
---- 8< -----------------------------------------------------

By contrast, if you run the script with "-o vers=3" the mount never fails (even while mounts with "-t nfs4" are failing). I've opened Red Hat service request 180277 for this NFSv4 bug (and one other), and that case has syslog and tcpdump output for the mount issue along with a sysreport for the system in question.
One possibly important piece of info: setting aside the initial failure count, the mount switches from failing to succeeding after 50 mounts and from succeeding back to failing after 78 mounts. This is 100% consistent on two different RHEL 4.6 servers where I've run the script. You can see it in the output above, but here's more output from another system:

mount is failing as of Thu Feb 14 12:22:32 PST 2008 (77 mounts)
mount is SUCCEEDING as of Thu Feb 14 12:23:04 PST 2008 (127 mounts)
mount is failing as of Thu Feb 14 12:24:09 PST 2008 (205 mounts)
mount is SUCCEEDING as of Thu Feb 14 12:24:42 PST 2008 (255 mounts)
mount is failing as of Thu Feb 14 12:25:47 PST 2008 (333 mounts)
mount is SUCCEEDING as of Thu Feb 14 12:26:19 PST 2008 (383 mounts)
mount is failing as of Thu Feb 14 12:27:24 PST 2008 (461 mounts)
mount is SUCCEEDING as of Thu Feb 14 12:27:56 PST 2008 (511 mounts)
mount is failing as of Thu Feb 14 12:29:02 PST 2008 (589 mounts)

Note the deltas: 50, 78, 50, 78, 50, 78, 50, 78. I'm not sure what this means, though 50+78=128, which seems suggestive.
I've determined what's triggering the failed mounts. In the NFS SETCLIENTID call, there's an r_addr field (aka the "universal address") that contains the IP address plus (I assume) the port of the client, in ASCII. The difference between mounts that fail and mounts that work is that for the working mounts, the port number portion of r_addr has three digits in its last octet. For example, an r_addr of "10.0.0.1.191.98" (where the last octet, 98, is two digits) causes the filer to respond with NFS4ERR_INVAL, but an r_addr of "10.0.0.1.191.100" (where the last octet, 100, is three digits) causes the filer to respond with NFS4_OK.

This explains why the mounts fail for 50 attempts and then work for 78 attempts: the port number is always incremented by 2, so while the second (final) octet of the port number increments from 0 to 98 the mounts fail. But once that octet hits 100, and as it continues to increment up to 254, the mounts work.

So at this point I'm no longer sure this is a bug in RHEL4; it may actually be a bug in the Netapp filer's handling of the r_addr field. Based on the filer's response, it would appear to be expecting each octet of the port number portion of r_addr to be zero-filled, but I can't find a reference that gives the correct format (RFC 1833 says the format is "defined by the addressing authority for the transport", with no pointer to another document). The IP address fields in r_addr aren't zero-filled, though, and that doesn't cause any problem on the successful mounts. I've opened a case with Netapp to see what they say.

If you want to see tcpdumps of failed and successful mounts, they're available as part of RHN support case 1802770.
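As a sanity check on the arithmetic, here's a minimal sketch (shell, like the reproducer above) that encodes the observed rule -- final port octet under 100 fails, 100 or more succeeds -- and counts how many of the 128 even octet values fall on each side. The rule itself is the observed filer behavior, not anything from a spec:

```shell
# Walk the 128 even values the final port octet (p2 = port % 256)
# takes when the client port increments by 2, and count how many have
# fewer than three digits (< 100, observed to fail) vs. three digits
# (>= 100, observed to succeed).
fails=0
works=0
for p2 in $(seq 0 2 254); do
    if [ "$p2" -lt 100 ]; then
        fails=$(( fails + 1 ))
    else
        works=$(( works + 1 ))
    fi
done
echo "$fails failing, $works working"    # prints "50 failing, 78 working"
```

This reproduces exactly the 50/78 cycle seen in the script output above (and 50+78=128, one full pass of the octet).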
John, thanks for the nice investigative work. Quoting from RFC 3530:

-------[snip]-------
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the US-ASCII string:

    h1.h2.h3.h4.p1.p2

The prefix, "h1.h2.h3.h4", is the standard textual form for representing an IPv4 address, which is always four octets long. Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, the first through fourth octets each converted to ASCII-decimal. Assuming big-endian ordering, p1 and p2 are, respectively, the first and second octets each converted to ASCII-decimal. For example, if a host, in big-endian order, has an address of 0x0A010307 and there is a service listening on, in big endian order, port 0x020F (decimal 527), then the complete universal address is "10.1.3.7.2.15".
-------[snip]-------

...given that even the example has port octets that are less than 3 digits, I think this is almost surely a server-side problem.
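The encoding the RFC describes is easy to sketch. The `uaddr` helper below is hypothetical (it's not part of any tool mentioned in this report), and the two non-RFC port values are simply the ones that reproduce the r_addr strings observed earlier in this bug:

```shell
# Encode an RFC 3530 universal address ("h1.h2.h3.h4.p1.p2") from a
# dotted-quad IPv4 address and a port number. Note that no octet is
# zero-padded, which matches the RFC's own example.
uaddr() {
    echo "$1.$(( $2 / 256 )).$(( $2 % 256 ))"
}

uaddr 10.1.3.7 527        # prints "10.1.3.7.2.15" (the RFC 3530 example)
uaddr 10.0.0.1 48994      # prints "10.0.0.1.191.98"  (rejected by the filer)
uaddr 10.0.0.1 48996      # prints "10.0.0.1.191.100" (accepted)
```

Since the RFC's own example ends in the unpadded two-digit octet "15", a server that rejects unpadded octets is out of spec.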
Actually, I was mistaken. The correlation of a mount failure with a final port octet of fewer than 3 digits does hold, except in one case: immediately after a reboot. Right after a reboot, mount requests with a 3-digit final octet may fail and mount requests with a final octet of fewer than 3 digits may succeed. This continues for one cycle, at which point the pattern reverts to the familiar sequence I've already described.

I did find that after this initial anomalous cycle I could consistently make the mounts succeed or fail by manipulating the length of the port number's final octet via /proc/sys/net/ipv4/ip_local_port_range, so from that point on the pattern holds. This would seem to imply that the correlation I found is only a secondary correlation--not the core issue--which makes it less clear (again) whether this is a bug in RHEL4 or a bug on the Netapp filer.

Also, the RHN support tech and I have verified that 1) Mac OS X has no similar problems mounting the same volume from the same Netapp filer with NFS4, and 2) RHEL4 has no similar problems mounting an NFS4-based share from a RHEL5 NFS server. So this is an interoperability issue between RHEL4 and Netapp's ONTAP 7.2.3, and it's hard to say where the bug is.

You can see more details of this in RHN support case 1802770, including a tcpdump of the after-reboot behavior.
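For reference, here is a minimal sketch of how a port window can be chosen so the final r_addr octet is always three digits. The value p1=128 is purely illustrative (any p1 works), and the succeed/fail effect is the observed filer behavior, not documented semantics:

```shell
# Compute a local-port window in which the low octet (p2 = port % 256)
# is always three digits (100..254), so every client connection yields
# a 3-digit final octet in r_addr. p1=128 is an arbitrary example
# value inside the usual ephemeral range.
p1=128
low=$(( p1 * 256 + 100 ))
high=$(( p1 * 256 + 254 ))
echo "$low $high"    # prints "32868 33022"

# To apply for testing (root only; note this narrows the ephemeral
# port range for the whole system, so revert it afterwards):
#   echo "$low $high" > /proc/sys/net/ipv4/ip_local_port_range
```

The inverse window (p1*256 .. p1*256+98) should likewise force failures, matching the manipulation described above.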
I've been able to reproduce this here as well on one of our lab netapps. I've inspected some traces with the setclientid packet and don't see anything wrong with it...

> 1) Mac OS X has no similar problems mounting the same volume from the same
> Netapp filer with NFS4

Does OSX do a setclientid call at mount time?

> 2) RHEL4 has no similar problems mounting an NFS4-based share from a RHEL5 NFS
> server.

Which would seem to imply that RHEL4 is generally sending valid SETCLIENTID calls and that the filer is occasionally rejecting them.
Bikash - any comments for your side?
I *think* the customer has opened a case with Netapp already and the case number is 2855584. Please correct this if that's not right.
Created attachment 297489 [details] binary capture of failed RHEL4 mount attempt against NetApp filer Binary capture of mount attempt against NetApp filer.
From all the errors that we have, it looks like we may have fixed this problem in DATA ONTAP 7.3. But I have asked the case owner to escalate this to Engineering to get confirmation of this behavior.
Yes, Mac OS X does send a SETCLIENTID call at mount time, but it doesn't fill in r_netid or r_addr (which is apparently how Solaris operates as well, according to the Red Hat support engineer). And yes, 2855584 is the Netapp case number.
Ok. Since it sounds like this is really a Netapp bug, I'm going to set this to NEEDINFO. Please update the case after you've upgraded the filer and retested.
I think it's still Bikash who needs to provide the info, since there's nothing for me to upgrade to or test at this point (and there may not be for a while if this really is a Netapp bug and it's really fixed in ONTAP 7.3, which isn't available yet).
John: would you be opposed to us closing this bug? You can re-open it if it isn't fixed in ONTAP 7.3 GA and is indeed a confirmed RHEL issue.
Bikash had speculated that this may be a Netapp bug that may be fixed in ONTAP 7.3, but it was only speculation--there's been no confirmation yet (either here or on my Netapp support case). So until Bikash or someone else from Netapp confirms that Netapp considers this a bug in ONTAP, I'd prefer to keep the case open.
John - That's fine, but we are running out of runway for kernel fixes for 4.7 Beta. Bikash - Any testing/verification of this soon would be great to know for sure.
John - I would push or escalate your NetApp case if you can, to find out for sure, given the tight time left in RHEL 4.7. I don't know what the ONTAP Beta process is like, but ideally it would be great if you could give some sort of ONTAP 7.3 Beta a shot. We really need to know for sure whether this is fixed in ONTAP before we proceed. Hope you understand our situation.
DATA ONTAP 7.3RC1 is currently available on now.netapp.com for download. I would suggest testing with that release to see whether the behavior changes.
Andrius: I've been pushing Netapp very hard, with no luck. I'd assume Redhat has better channels than I do. If/when Netapp takes ownership of the bug, you can feel free to close this bug (as far as I'm concerned). Bikash: I don't have anywhere I can test this--the filers I've been testing on thus far are all production filers. Maybe Redhat does? Regardless, it would seem to me that Netapp should be able to say definitively at this point whether or not this is a bug in ONTAP.
John - Yikes, I feel your pain. I know we *just* received some spanking new filers courtesy of NetApp (thanks!), but they were loaded with some pre-release version of DATA ONTAP 7.3 already. I'm not sure that running the reproducer on them would accurately portray a success/failure, due to differences from your environment--I'm no expert here. On top of that, our filers unfortunately require service calls due to a bad drive, as well as "monitor.shelf.fault:CRITICAL" nastiness that requires replacing the modules that go behind the shelf. I'm not sure when we'll be back up and running. Bikash - do you have any ideas on how we can say for sure whether this may be fixed without disturbing their production environment?
I have this sneaking suspicion that Netapp has plenty of Netapp filers. So they could do the testing themselves. Bikash? I'll ping the owner of my Netapp support case as well and ask him to test it.
John - OK, keep us posted if you find out anything about this issue - our boxes are used specifically for engineering activities (testing RHEL 5.2 and 4.7 currently), rather than support activities. Leaving assigned to Bikash to see if any headway can be made on the NetApp support case on their end.
John, I would appreciate it if you could call support and ask for the case to be escalated to Engineering. I have also done so at my end to expedite progress on this case. Someone should be in touch with you later today. Either way, please call the support line and talk to someone for an update or an escalation. Bikash
John, I am Sam Li, an escalation engineer with NetApp support. I have taken ownership of the case and will get things moving. Sincere apologies on behalf of NetApp support.
No problem, Sam--the urgency is (apparently) on Red Hat's side, not mine. We're not even using NFSv4 at the moment; I just happened to discover this issue while testing Kerberos with NFSv4 a while back, and thought I should report it.
Hi - the only urgency on our part is to either identify a root cause on the Red Hat side (since this was reported in our bug reporting system) or close if it isn't. Please keep us posted, thanks!
FYI: Sam Li of Netapp informed me that based on testing he's performed, the bug still exists between RHEL4 and ONTAP 7.3RC1. So the speculation that it's fixed in ONTAP 7.3 is apparently incorrect, and the root cause still isn't clear.
A root cause still has not been determined - deferring to RHEL 4.8.
Netapp has opened a bug on their side for this (bug number 290200), and they say that the fix is targeted for ONTAP 7.3.1. That would appear to make it their issue, not Red Hat's, in which case this bug could be closed.
John, this is great to hear. Last I heard we thought it wasn't something with ONTAP. Please do not hesitate to reopen this issue if needed.
Andrius, as John mentioned, the fix is targeted for ONTAP 7.3.1. The problem only manifests on Linux clients and does not affect Solaris clients.