Bug 208103
Summary: Anaconda doesn't umount NFS shares before rebooting

Product: Red Hat Enterprise Linux 4
Component: anaconda
Version: 4.4
Reporter: Bastien Nocera <bnocera>
Assignee: Joel Andres Granados <jgranado>
QA Contact: Alexander Todorov <atodorov>
Status: CLOSED WONTFIX
Severity: high
Priority: high
CC: atodorov, duck, herrold, jgranado, jim, jplans, marcobillpeter, rlerch, tao
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: When installing Red Hat Enterprise Linux 4 through a Network File System (NFS) server, the installer is unable to correctly close the NFS mount points. This might cause the NFS server to misbehave. In these cases Red Hat suggests the use of an HTTP server for installations.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-12 19:22:42 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 391511, 458752
Attachments:
Description (Bastien Nocera, 2006-09-26 14:07:26 UTC):

Created attachment 137140 [details]
anaconda-init-more-umount-debug-3.patch

Created attachment 137141 [details]
Screenshot-VMWare.png (screenshot before reboot)
As you can see, /dev/loop0 and /dev/loop1 are still mounted. I believe those are, respectively, the ISO itself and the stage2.img. To fix this, we'd have to copy the stage2 image from the NFS share to ramfs locally (like we do for HTTP and FTP installs). That would let us umount the NFS shares before reboot.

Setting devel-nak now. I originally gave devel-ack because I thought comment #1 was providing a patch to fix the issue. We can revisit this issue in a later update release, but there's no time to fix it now.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Tested, and I have some questions. I could reproduce, but I don't see exactly the same behavior as the bug report. Background info:
1. Used PXE boot.
2. Used ISO+NFS and normal NFS installs.
3. NFS server: nfs-utils-1.0.9-24.el5 and nfs-utils-lib-1.0.8-7.2.z2.

I started with the ISO+NFS installation; I got up to 30 installs and the NFS server did not refuse further mounts. ichihi, can you confirm that the NFS server stops mounting?

The NFS server did not stop mounting, but the mount point used by anaconda was not correctly unmounted. I could see this in the /var/lib/nfs/rmtab of the NFS server: the count (the third of the colon-separated fields) kept growing as I installed RHEL4.

So I tested with "normal" NFS installs, just for fun :). With "normal" NFS installs, the behavior on the NFS server side was the same; that is, the mount count kept going up as I installed. My conclusion from all this is that the use of the ISO image is not the cause of this bug. I think we are just doing a poor job of unmounting the NFS mount point. Do you see the same behavior with normal NFS installs?

Side note: showmount and /var/lib/nfs/rmtab have the same info (AFAIK). The behavior of this information is very chaotic; in other words, the file can contain references to mounts that are no longer relevant.
If, for example, the host (identified by an IP) being installed failed to correctly unmount an NFS directory in the past (which basically means it will still be listed on the NFS server), it will show up when calling showmount both before and after installation (provided the list is not manually flushed). This may or may not be the case here. My point is to make sure the NFS server has accurate info (that is, the list in /var/lib/nfs/rmtab should be manually checked for consistency); that way the showmount command will show what is really happening. Additionally, /var/lib/nfs/rmtab has an extra piece of info that might be very relevant to this bug: the number of times the host has mounted a share on the NFS server. So using that file might be better.

The ISO images not unmounting and the NFS issue are two different problems. I suspect the loop device is kept open because of an image mounted from a mounted image. I have to keep looking into this.

The NFS directories are unmounted on the client side using umount2(), but the server is never told about it. The server must be told to erase the shared directory from the rmtab and terminate the handle for the remote mounted service (this is not done yet).

The code posted in comment #1 is very misleading :(. I ran it in my test environment and discovered that it freed memory twice and did not reinitialize the numFilesystems counter. This made the output of said patch undefined and invalid. In reality, all the mount points are unmounted on the client side.

Created attachment 298724 [details]
Proper patch to check mount points post undomount

Comments 1 and 2 have no relevance to what is happening.

Created attachment 299736 [details]
yuminstall patch
I think I found the reason anaconda does not report the problem properly: the error information is in the value.
Please ignore comment 21.

Created attachment 300063 [details]
patch to undomounts.c
I think I have an answer to the "unmounting but not telling the server" situation. This is a partial patch, as some changes in the Makefiles are needed.
Created attachment 300064 [details]
one last minute change
Basically the same, except that we no longer have to check whether the mount point starts with "/proc".
Created attachment 301816 [details]
Patch with makefile additions.
Full patch.
Should be available in 10.1.1.84, relative commit acab28e75d11b5ce7d9ece0cdf5a54391dea954b.

While the patch works, it brings too much library code into the init binary (a 20x size increase is too much for this particular bug). A good fix should be developed in Fedora and possibly backported to 4.8.

(In reply to comment #29)
> While the patch works, it brings too much library code into the init binary (20x
> size increase is too much for this particular bug).
>
> A good fix should be developed in Fedora and possibly backported to 4.8.

Dave, do you have a BZ # for Fedora? Is this reported upstream so we can backport?

The effort and the amount of changes that would need to happen to make this work for RHEL 4.8 are not worth it. Reasons:

1. The possibility of introducing additional bugs is very high, as new code that handles the communication with the NFS server would have to be put in.

2. There are workarounds to avoid this behavior: install with an HTTP server instead of an NFS server, and if the behavior occurs, restart the nfs service on the server.

3. I used the same NFS server for all my tests, and it did not stop working after 50 (or more) installs.

For these three reasons I am devel-nacking it. We can have this as a release note stating the situation and the possible workaround. It would go something like:

"When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints. This might cause your NFS server to misbehave. In these cases Red Hat suggests the use of an HTTP server for installations."

That's a note off the top of my head.

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New contents: When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints. This might cause your NFS server to misbehave.
In these cases Red Hat suggests the use of an HTTP server for installations.

This bug requires release notes only (see comment #37); closed.

(In reply to comment #34)
> The effort and the amount of changes that need to happen in order to make this
> work for rhel4.8 is not worth it. Reasons:
>
> 1. The possibility of introducing additional bugs is very hi, As new code that
> handles the comunication with the nfs server has to be put in.
>
> 2. There are workarounds to avoid this behavior. Install with an http server
> instead of an nfs server. If the behavior occurs restart the nfs service in
> the server....
>
> 3. I used the same nfs server for all my tests. and it did not stop working
> after 50 (or more installs) installs.
>
> for these three reasons I am devel_nacking it.

While I do see the advantages of using an HTTP-based install, in some enterprises this is not an acceptable solution. IT risk policies can state that web servers are not allowed on a production subnet. Also, at my enterprise we currently do all deployments via NFS. While there are active plans to change this, it is simply not an option for us at this point to tell our lines of business, "Nope, sorry, Red Hat's answer to your work-stopping bug is to completely re-engineer our provisioning solution RIGHT NOW before you can get back into business." This attitude smacks of being lazy, arrogant, and stupid. There is more than one enterprise-level Linux provider out there, and I am sure that at least one other one in play at my enterprise would love to see this response. I am reopening this issue.

The decision was taken on the basis that we would do more damage to the product than good. Additionally, if this issue is not yet solved in RHEL5, it is certainly solved in Fedora, so lazy is the last thing I would consider for the RHEL product line. We acknowledge the issue; we dedicated a great deal of time trying to fix it, but it was not possible because of the size issues.
At this point in RHEL4's life it is not a good idea to add a huge chunk of network code for something that has a simple workaround. I understand that your policies might be very strict, and for that reason I will check with the owner of nfs-utils to see what can be done to work around the NFS server locking without having to implement an HTTP server.

Guess this issue got lost in the sands of time?

Reported: 2006-09-26 10:07 EDT by Bastien Nocera (bnocera)

Comment #40 from Joel Andres Granados (jgranado), 2008-09-29 09:32:41 EDT:
"At this point of RHEL4's life it is not a good idea to add a huge chunck of network code for something that has a simple workaround."

It is not a simple workaround if you have dozens of NAS appliances that are configured for NFS (but not for HTTP serving), your users expect to reuse the devices for future installs, and significant parts of the install process involve running scripts from an NFS volume. Is the issue fixed in RHEL5 or Fedora?

(In reply to comment #41)
> It is not a simple workaround if you have dozens of NAS appliances that are
> configured for nfs, (but not for http serving), your users expect to reuse the
> devices for future installs, and significant parts of the install process
> involve running scripts from an nfs volume.

Again, I understand your position and am currently working on getting more information from the NFS server side.

> Is the issue fixed in RHEL5 or Fedora?

I'm sure that it is in Fedora; I have to test with the new RHEL5.3 when it is out. Yes, this is definitely fixed in Fedora.

The release note is waiting for review and there are no changes to be done in the code. Going to put it on MODIFIED.

Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed contents:
@@ -1,2 +1 @@
-When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints. This might cause your NFS server to
-misbehave. In these cases Red Hat suggests the use of an HTTP server for installations.
+When installing Red Hat Enterprise Linux 4 through a Network File System (NFS) server, the installer is unable to correctly close the NFS mount points. This might cause the NFS server to misbehave. In these cases Red Hat suggests the use of an HTTP server for installations.