Bug 624254
Summary: NFS mount/umount fails if run in rapid succession

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Igor Lvovsky <ilvovsky> |
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED WONTFIX | QA Contact: | yanfu,wang <yanwang> |
| Severity: | high | Priority: | high |
| Version: | 6.0 | Target Milestone: | rc |
| Hardware: | All | OS: | Linux |
| Keywords: | RHELNAK, TestBlocker | Doc Type: | Bug Fix |
| Last Closed: | 2010-11-10 20:34:56 UTC | Bug Blocks: | 624265 |
| CC: | abaron, bazulay, cpelland, cplisko, iheim, ikent, ilvovsky, jkurik, lpeer, rwheeler | | |
Description (Igor Lvovsky, 2010-08-15 08:49:02 UTC)
This issue has been proposed as a blocker at a time when we are only considering blocker issues for the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file it as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

Thank you for your bug report. This issue was evaluated for inclusion in the current release of Red Hat Enterprise Linux. Unfortunately, we are unable to address this request in the current release. Because we are in the final stage of Red Hat Enterprise Linux 6 development, only significant, release-blocking issues involving serious regressions and data corruption can be considered. If you believe this issue meets the release-blocking criteria as defined and communicated to you by your Red Hat Support representative, please ask your representative to file this issue as a blocker for the current release. Otherwise, ask that it be evaluated for inclusion in the next minor release of Red Hat Enterprise Linux.

Please remove the 'soft' mount option and rerun the test.

I removed the 'soft' mount option and got the same failure:

```
[root@white-vdsd x86_64]# ~/stressmount.sh
mount.nfs: mount system call failed
ERROR 32
Iteration 196
```

Thank you for trying that... The problem is that the client is exhausting all of the reserved ports used during the mounts. Reserved ports are network ports, used with TCP connections, that are below 1024; only root is allowed to bind to them. They are used as a somewhat outdated security "feature"... The problem arises when the TCP connection is closed. The connection goes into a state called TIME_WAIT for a minute or so, which in turn ties up that port for a minute or so... Since there is only a finite number of these ports, it does not take long to run out of them when mounts are done this quickly... This is a very common problem across all OSes...
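The stressmount.sh script itself is not reproduced in this bug. As a rough sketch (the server, export, and mount-point names are assumptions, not the original script), a loop of this shape produces the rapid mount/umount pattern and the "ERROR / Iteration" output quoted above:

```shell
# Hypothetical reconstruction of a stressmount.sh-style loop: mount and
# unmount an NFS export in rapid succession until a mount fails, then
# report the mount exit status and the failing iteration.
SERVER=${SERVER:-nfsserver}     # assumed server name
EXPORT=${EXPORT:-/export}       # assumed export path
MNT=${MNT:-/mnt/stress}         # assumed mount point
MAX=${MAX:-1000}

do_mount()  { mount -t nfs "$SERVER:$EXPORT" "$MNT"; }
do_umount() { umount "$MNT"; }

stress_mount() {
    i=1
    while [ "$i" -le "$MAX" ]; do
        do_mount
        rc=$?
        if [ "$rc" -ne 0 ]; then
            # mirror the failure report quoted in the bug
            echo "ERROR $rc"
            echo "Iteration $i"
            return 1
        fi
        do_umount
        i=$((i + 1))
    done
    echo "Completed $MAX iterations"
}
```

Each iteration opens and tears down a TCP connection bound to a reserved source port, and each closed connection lingers in TIME_WAIT for a minute or so, so a few hundred iterations are enough to exhaust the sub-1024 range; that matches the failure around iteration 196 above.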
With RHEL 6, there are a couple of workarounds:

1) Use the -o noresvport mount option. This assumes the server will allow mounts from non-reserved ports, which is generally not the default. With a Linux server, the 'insecure' export option needs to be specified to allow non-secure mounts.

2) Use UDP as the network transport, since it does not tie up the port after the connection is closed... BUT... BUYER BEWARE... UDP is a far inferior network protocol to TCP, especially on a busy network that contains routers. The reason is that TCP knows how to smartly retransmit lost packets and do flow control on busy networks. UDP does not... With UDP, packets are continually blasted out onto the network until an acknowledgement is received, basically making a busy network even worse. So be very, very careful if you decide to go this route... Actually, I would advise against it. The only reason I mention it is that I wanted you to know (and hopefully understand) all the options...

So at the end of the day this is not a bug, it's just a known limitation of NFS mounts.

Sounds like a release note candidate and something to close as "NOTABUG"?

How can we verify that this is indeed the problem we hit? The attached script is just something we wrote thinking it reproduces the problem, but in our tests we do not perform that many connects and disconnects before we hit the fail state. Also, on RHEL 5.5, our tests worked fine. Also, this means that working in the "standard way" limits us to less than 200 concurrent mounts, right? So if we need more, we will have to enable the 'insecure' option on the NFS server (assuming Linux) and update our mount options appropriately. Would this default to try and use a port under 1024 if available? Also, assuming we did not cross this limit, we should probably check to see if we have NFS ports in this state and if so sleep and retry. Steve, what do you think? Any suggestions?

> How can we verify that this is indeed the problem we hit?
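On the verification question, one indicator is to count only the TIME_WAIT connections that are still holding a reserved (sub-1024) local port, since those are the ones that starve new mounts. A sketch, assuming a `netstat -nat`-style output layout (local address in the fourth field as addr:port, state in the sixth):

```shell
# Count TIME_WAIT TCP connections whose LOCAL port is below 1024 --
# i.e. the connections tying up the reserved ports that new NFS mounts
# need. Reads "netstat -nat"-style lines on stdin.
count_resv_time_wait() {
    awk '$6 == "TIME_WAIT" {
        n = split($4, a, ":")          # local address is addr:port
        if (a[n] + 0 < 1024)
            count++
    }
    END { print count + 0 }'
}

# Usage: netstat -nat | count_resv_time_wait
```

A count close to the size of the reserved range around the time mounts start failing would point at port exhaustion rather than some other mount failure.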
I ran the tests like this:

```
# sh /tmp/stressmount.sh ; netstat -na | grep TIME_WAIT | wc -l
mount.nfs: mount system call failed
ERROR 32
Iteration 358
1435
```

The 1435 is the number of connections that are in TIME_WAIT. I also wrote a systemtap probe that monitored one of the kernel routines (xs_bind4()) that does the socket binding; it was failing with EADDRINUSE, which means we ran out of sockets...

> Also, on RHEL 5.5, our tests worked fine

Hmm... I'm a bit surprised at this, since your test script failed in the same way when I ran a quick test.

> Also, this means that working in the "standard way" limits us to less than 200
> concurrent mounts, right?

No. The limitation comes into play with simultaneous mounts, a bunch of mounts all at the same time. I'm not sure what the maximum number of concurrent mounts is, but I'm pretty sure it's in the thousands... I would assume it has something to do with the amount of memory that's available.

> Would this default to try and use a port under 1024 if available?

If I'm understanding the question, no. If you use the -o noresvport option, the default behaviour will be to use a non-reserved port.

> Also, assuming we did not cross this limit, we should probably check to see if
> we have NFS ports in this state and if so sleep and retry. Steve, what do you
> think? Any suggestions?

I need more context as to what you are trying to do...

(In reply to comment #13)
> I ran the tests like this
> # sh /tmp/stressmount.sh ; netstat -na | grep TIME_WAIT | wc -l
> mount.nfs: mount system call failed
> ERROR 32
> Iteration 358
> 1435
> ^^^^ is the number of connections that are in TIME_WAIT
>
> I also wrote a systemtap probe that monitored one of the
> kernel routines (xs_bind4()) that does the socket binding and
> was failing with EADDRINUSE which means we ran out
> of sockets...

Please attach the probe so we can test whether this is really what is hitting us.
By your comment about RHEL 5.5, I'm not sure it is.

> > Also, on RHEL 5.5, our tests worked fine
> Hmm... I'm a bit surprised at this, since your
> test script failed in the same way when I ran a
> quick test.

As I said, the attached script is not what we run; we have a python test suite with many NFS scripts which at some point simply starts to fail with the above error. We wanted to test whether this is due to the mount/umount actions, and wrote the above script. It may be that we are looking at two different problems here.

> > Also, this means that working in the "standard way" limits us to less than 200
> > concurrent mounts, right?
> No. The limitation comes into play with simultaneous mounts, a bunch
> of mounts all at the same time. I'm not sure what the maximum number
> of concurrent mounts is, but I'm pretty sure it's in the thousands... I would assume
> it has something to do with the amount of memory that's available.

Ok, good to know.

> > Would this default to try and use a port under 1024 if available?
> If I'm understanding the question, no. If you use the -o noresvport
> option, the default behaviour will be to use a non-reserved port.

Ok, so we would have to set this parameter per connection, thanks.

> > Also, assuming we did not cross this limit, we should probably check to see if
> > we have NFS ports in this state and if so sleep and retry. Steve, what do you
> > think? Any suggestions?
> I need more context as to what you are trying to do...

We are running in a dynamic environment where, depending on which VMs are running on the system, we need to mount different mounts. The number of mounts required depends on the number of VMs running and their placement on the storage (obviously we can have many VMs on the same mount). Also, we use NFS mounts to store ISO images (which are exposed to VMs instead of CDs). In a cloud environment, the numbers can grow rapidly and VMs can migrate between hosts.
If one host running 100 VMs is moved into "maintenance" mode, all its VMs would be migrated to different hosts (1 or more), which means we might face a "mount storm". IIUC, in this case we would need to take into account the possibility that there are no available sockets (but there will be shortly), in which case we would want to wait and then retry. Before waiting, though, we should make sure the mount failed for this reason and not something else (where retrying is pointless).

> Please attach the probe so we can test whether this is really what is hitting
> us

It's pretty simple...

```
probe module("sunrpc").function("xs_bind4").return
{
    if ($return)
        printf("xs_bind4: %d %s\n", $return, errno_str($return));
}

probe begin { log("starting xprt probe") }
probe end   { log("ending xprt probe") }
```

> As I said, the attached script is not what we run, we have a python test suite
> with many NFS scripts which at some point simply starts to fail on the above
> error. We wanted to test whether this is due to the mount/umount actions
> running and wrote above script. It may be that we are looking at two different
> problems here.

We might be... Run the above probe and see if it fails with 'xs_bind4: -98 EADDRINUSE'.

I'm CC-ing Ian Kent, our autofs maintainer... Autofs has a similar need of doing a large number of mounts during system start. We call them "mount storms". Maybe Ian can shed some light on how autofs deals with running out of ports...
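The "wait and retry" idea raised above could be sketched as a mount wrapper: retry only when the failure coincides with visible TIME_WAIT pressure, and fail fast otherwise. The threshold, retry budget, and helper names below are illustrative assumptions, not anything from this bug:

```shell
# Retry an NFS mount when failure coincides with TIME_WAIT port pressure;
# treat any failure without such pressure as fatal (retrying is pointless).
RETRIES=${RETRIES:-5}               # assumed retry budget
SLEEP=${SLEEP:-15}                  # assumed backoff, in seconds
TW_THRESHOLD=${TW_THRESHOLD:-100}   # assumed "pressure" threshold

time_wait_count() { netstat -nat | grep -c TIME_WAIT; }
do_mount() { mount "$@"; }

mount_with_retry() {
    # "$@" is passed straight to mount, e.g.: -t nfs server:/export /mnt
    try=0
    while :; do
        if do_mount "$@"; then
            return 0
        fi
        try=$((try + 1))
        # Give up if out of retries, or if the failure does not look like
        # port exhaustion (few connections stuck in TIME_WAIT).
        if [ "$try" -ge "$RETRIES" ] || [ "$(time_wait_count)" -lt "$TW_THRESHOLD" ]; then
            echo "mount failed after $try attempt(s)" >&2
            return 1
        fi
        sleep "$SLEEP"
    done
}
```

The mount exit status alone (32, "mount failure") does not distinguish the cause, so the TIME_WAIT count serves as a rough proxy for "the sockets will free up shortly"; the systemtap probe above is the more precise check.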
Yeah, we have the same problem with rapid mounting in autofs. I did a lot of work to minimize the number of ports used, by re-using ports in the RPC code that I wrote for autofs. At one time I also used a code pattern that eliminates the TIME_WAIT state when closing down a connection, but only for connections that had a high likelihood of not having so-called lost duplicates still in transit. That is quite illegal from a TCP protocol standpoint, and potentially dangerous if there are lost duplicates in transit (avoiding these interfering with a subsequent connection is what the TIME_WAIT state is for).

I also seem to remember that the number of reserved ports available had been reduced at some point, so checking that configuration and maximising the number available for use might help a little. Even so, the bottom line, unfortunately, is that the only viable solution at the moment is to use ports above the reserved port threshold. Sorry I couldn't bring better news.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.