Bug 750926
Summary: | Netfs fails to unmount unreachable NFS filesystems
---|---
Product: | [Fedora] Fedora
Component: | systemd
Version: | 18
Status: | CLOSED NOTABUG
Severity: | high
Priority: | unspecified
Hardware: | Unspecified
OS: | Linux
Keywords: | Reopened
Reporter: | Orion Poplawski <orion>
Assignee: | systemd-maint
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | Colin.Simpson, david.halliwell, d.bz-redhat, fedora, harald, iarlyy, jcapik, johannbg, jonathan, lnykryn, marmalodak, mschmidt, msekleta, notting, plautrba, systemd-maint, vpavlin, zbyszek
Doc Type: | Bug Fix
Clone Of: | 735458
Bug Depends On: | 735458
Last Closed: | 2013-09-13 01:38:55 UTC
Description
Orion Poplawski
2011-11-02 20:38:18 UTC
The problem is that umount returns busy (32) first if someone is using the filesystem, so I don't think we can tell the difference between a reachable and an unreachable server. While it would probably be nice to kill the processes using the filesystem, I think for nfs it's probably safer to just unmount.

psmisc 22.14 has a patch to prevent fuser from hanging - https://sourceforge.net/tracker/index.php?func=detail&aid=1963033&group_id=15273&atid=315273 has some more info. I installed it on my F16 machine and it does appear to work, though it spawns a *huge* number of processes and a couple of stale ones get left. Apparently this may be an option in 22.15. There may be better ways to tackle this, but fixing fuser is probably the most correct option.

This message is a notice that Fedora 15 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 15. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 15 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Problem is still present with psmisc-22.16-1.fc17.x86_64:

# umount -a -f -l -t nfs
- hangs,
- eventually (after many minutes) fails with "umount.nfs: /auto/XXX: Stale NFS file handle",
- and the NFS share is not unmounted.

# df
- hangs

# dmesg
...
kernel: [1296121.709474] nfs: server XXX not responding, timed out
... (repeated)

# ps aux | grep umount
root 13070 0.0 0.0 119896 1052 pts/15 S+ 10:26 0:00 umount -a -f -l -t nfs
root 13071 0.0 0.0 26032 1168 pts/15 D+ 10:26 0:00 /sbin/umount.nfs /auto/XXX -l -f

This is extremely annoying when e.g. moving laptops with mounted NFS shares between locations without first unmounting.
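
The `df` hang above is the typical symptom of a blocking call on a mount whose server has gone away. A minimal sketch of one way to probe a mount point without hanging the caller is to run the stat in a child process and give up after a timeout. This is only an illustration of the general idea, not how psmisc or any Fedora script does it; the names `finishes_within` and `path_responds` and the 5 second default are assumptions made up for this example.

```python
import subprocess
import sys


def finishes_within(argv, timeout):
    """Run argv as a child process; return True if it exits within timeout seconds."""
    child = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a child stuck in
        # uninterruptible sleep (state D) may not exit until the
        # server answers again, and waiting would hang the caller too.
        child.kill()
        return False
    return True


def path_responds(path, timeout=5.0):
    """Return True if a stat() of path completes within timeout seconds.

    Hypothetical helper. A completed stat counts as "responding" even if
    it failed (for example with ENOENT or ESTALE), because the filesystem
    did answer. Only a call that never returns is reported as False.
    """
    probe = "import os, sys; os.stat(sys.argv[1])"
    return finishes_within([sys.executable, "-c", probe, path], timeout)
```

For example, `path_responds("/auto/XXX")` would be expected to return False while the server is unreachable on a hard mount. The cost of this design is that timed-out children can be left behind, and a timeout that is too short gives false negatives on a busy system.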
Even with the recently introduced non-blocking mode, changes need to be made in the initscripts. Once the non-blocking mode patch is accepted upstream, I'll change the component to initscripts.

The patch is in upstream, but it needs to be enabled with --enable-timeout-stat. I don't quite understand what the --enable-timeout-stat=static version does though. I also think we could really use a run-time option instead of a compile-time one.

Hi Orion.

Believe it or not, I started fighting this issue quite a long time ago and it's a tricky one. The timeout doesn't solve anything. It's pretty unreliable and has a negative performance impact on the fuser tool. If the timeout interval is too short, you can see unwanted timeouts when the system is busy, and if it's too long, it takes ages for all the hung stat calls to time out. The root cause lies in the whole concept of rebooting / umounting / killing processes / NFS. I don't understand why this bug was created as a clone of a RHEL5 bug. Fedora has systemd and the solution differs from RHEL5. I don't even know if fuser still plays any role here. I'm changing the component to systemd, because there's nothing I can do with psmisc in order to prevent this from happening.

Regards, Jaromir.

The netfs service belonged to initscripts. Also it's gone in F18. Whether it's a bug at a lower level (umount.nfs, kernel?) I don't know. NFS cannot surprise me by anything.

Hi Michal.

AFAIK more causes of the NFS related reboot hangs exist. And as I experienced some of them in F18, it seems to me, that systemd needs to be fine-tuned too. But of course, it's up to you.

Bill must be happy the issue finally got back to him again. This hot potato game lasts too long and users are suffering.

Regards, Jaromir.

(In reply to Jaromír Cápík from comment #9)
> AFAIK more causes of the NFS related reboot hangs exist. And as I
> experienced some of them in F18, it seems to me, that systemd needs to be
> fine-tuned too.

That's quite possible.
However, this BZ does not contain any information implicating systemd.

> Bill must be happy the issue finally got back to him again.
> This hot potato game lasts too long and users are suffering.

Sorry, I did not notice this BZ was already assigned to initscripts before. The problem is that it's not clear what this BZ is meant to be about. It surely does not seem to be about "netfs" anymore.

It's a long-standing conceptual deadlock. I'll tell you more about the scenario I was fighting with in the past. It's possible it has evolved/changed since then. So ... here's the story ...

When a hard-mounted NFS share becomes unreachable, all blocking calls like stat(2) made by the fuser tool wait forever until the server becomes reachable again (and that often doesn't happen). So ... the fuser tool waits for the NFS server forever. But the fuser tool output is used for generating the list of processes which need to be killed because they use the mountpoint. These processes then cannot be killed even if you let the fuser tool time out, and consequently you cannot umount the mountpoint. The processes which block the NFS mountpoint simply cannot be reliably detected and killed. Funny, isn't it?

In the past Bill Nottingham tried to lazy umount such shares, but that apparently resulted in hangs too. Maybe he'll tell you more, because I don't know much about his findings. Does it mean it's caused by umount or the kernel? I don't know ... I only know it often happens on servers with higher uptime. Unfortunately we were unable to reproduce the issue in the lab. Somebody would have to sacrifice himself and do a very deep analysis + document what exactly happens. We're lacking a reliable reproduction scenario. And I have no idea if this story is still valid for Fedora 17 and later. Maybe not.

See also bug 851665

This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17.
It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.

Netfs is no longer in initscripts.

So, netfs doesn't exist in initscripts anymore, and systemd doesn't use fuser or anything. I don't see how this would apply to systemd. Closing.

Wouldn't it be better to assign this bug to a more appropriate component than close it?

(In reply to John Schmitt from comment #17)
> Wouldn't it be better to assign this bug to a more appropriate component
> than close it?

The component is systemd, but it should not be a bug anymore.
Test:

- mount nfs
- pull the network cable
- shutdown

systemd will maybe stall for 3 minutes, but should continue

(In reply to Harald Hoyer from comment #18)
> (In reply to John Schmitt from comment #17)
> > Wouldn't it be better to assign this bug to a more appropriate component
> > than close it?
>
> The component is systemd, but it should not be a bug anymore.
>
> Test:
>
> - mount nfs
> - pull the network cable
> - shutdown
>
> systemd will maybe stall for 3 minutes, but should continue

ok, wrong.. it hangs in the kernel:
https://bugzilla.redhat.com/show_bug.cgi?id=1007607
https://bugzilla.redhat.com/show_bug.cgi?id=1007745

(In reply to John Schmitt from comment #17)
> Wouldn't it be better to assign this bug to a more appropriate component
> than close it?

Well, I don't know any more appropriate one. But certainly not systemd nor initscripts... Knock yourself out and reopen it and assign it to some component...
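
For anyone repeating the mount / pull the cable / shutdown test quoted above, it can help to record which NFS mounts are active before shutting down. The following is a minimal sketch that assumes the Linux /proc/mounts layout (device, mount point, filesystem type, options, two numbers). The function name `nfs_mountpoints` is hypothetical, and only the `\040` escape for spaces is decoded.

```python
def nfs_mountpoints(mounts_text):
    """Return the mount points of all nfs and nfs4 entries in mounts_text.

    mounts_text is expected to be the contents of /proc/mounts.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        if fstype in ("nfs", "nfs4"):
            # /proc/mounts writes a space inside a path as the octal escape \040
            points.append(mountpoint.replace("\\040", " "))
    return points
```

On a Linux system the input can be read with `open("/proc/mounts").read()`. Reading that file does not touch the mounted filesystems themselves, so it should not block on an unreachable server the way `df` does.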