Bug 208103

Summary: Anaconda doesn't umount NFS shares before rebooting

Product: Red Hat Enterprise Linux 4
Reporter: Bastien Nocera <bnocera>
Component: anaconda
Assignee: Joel Andres Granados <jgranado>
Status: CLOSED WONTFIX
QA Contact: Alexander Todorov <atodorov>
Severity: high
Priority: high
Version: 4.4
CC: atodorov, duck, herrold, jgranado, jim, jplans, marcobillpeter, rlerch, tao
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text:
When installing Red Hat Enterprise Linux 4 through a Network File System (NFS) server, the installer is unable to correctly close the NFS mount points. This might cause the NFS server to misbehave. In these cases Red Hat suggests the use of an HTTP server for installations.
Last Closed: 2009-01-12 19:22:42 UTC
Bug Blocks: 391511, 458752    
Attachments:
- anaconda-init-more-umount-debug-3.patch
- Screenshot-VMWare.png
- Proper patch to check mount points post undomount
- yuminstall patch
- patch to undomounts.c
- one last minute change
- Patch with makefile additions.

Description Bastien Nocera 2006-09-26 14:07:26 UTC
Using RHEL4 U4's anaconda, the installer doesn't umount the NFS shares the ISOs are
on before rebooting, leaving dead client entries on the server.

1. Boot using a stage1 on CD (boot.iso), or HD (using isolinux or grub, and the
stage1 initrd.img)

2. Select NFS as an installation method, with the ISOs being available (not
mounted loopback) on the NFS server

3. Go through with the installation, and reboot the machine using anaconda

4. See that, on the server, the client still appears, for example:
# /usr/sbin/showmount -a
All mount points on nfsserver.test.redhat.com:
clientmachine.test.redhat.com:/distribution/RHEL4/U4/ISOs/

The same problem happens with RHEL5 Beta1.

Attached is the patch used to gather more debugging (it doesn't umount /proc,
and re-reads /proc/mounts after having umounted everything it usually does), as
well as the output of anaconda with this patch, before a reboot.

Comment 1 Bastien Nocera 2006-09-26 14:07:28 UTC
Created attachment 137140 [details]
anaconda-init-more-umount-debug-3.patch

Comment 2 Bastien Nocera 2006-09-26 14:08:47 UTC
Created attachment 137141 [details]
Screenshot-VMWare.png

Screenshot before reboot

Comment 3 Bastien Nocera 2006-09-26 14:10:09 UTC
As you can see, /dev/loop0 and /dev/loop1 are still mounted. I believe those are,
respectively, the ISO itself and the stage2.img.

Comment 4 David Cantrell 2007-03-15 20:59:04 UTC
To fix this, we'd have to copy the stage2 image from the NFS share to ramfs
locally (like we do for HTTP and FTP installs).  That will let us umount the NFS
shares before reboot.
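The approach described in this comment can be sketched as follows. This is a minimal simulation, not anaconda's actual code: two temporary directories stand in for the real NFS mount and the ramfs target, and the function name and paths are invented for illustration.

```python
import os
import shutil
import tempfile

def stage_image_locally(nfs_dir, ram_dir, image="stage2.img"):
    """Copy the stage2 image off the NFS share into local (ram-backed)
    storage, so that nothing keeps the share busy and it can be
    unmounted before reboot."""
    src = os.path.join(nfs_dir, image)
    dst = os.path.join(ram_dir, image)
    shutil.copy(src, dst)  # after this, the share holds no open files
    return dst

# Simulated run: temp dirs stand in for the NFS mount and a ramfs.
nfs = tempfile.mkdtemp()
ram = tempfile.mkdtemp()
with open(os.path.join(nfs, "stage2.img"), "wb") as f:
    f.write(b"\x00" * 1024)  # placeholder image contents

local = stage_image_locally(nfs, ram)
print(os.path.exists(local), os.path.getsize(local))  # True 1024
```

This mirrors what the HTTP and FTP install paths already do: once the image lives in RAM, the source filesystem is no longer pinned.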

Comment 9 David Cantrell 2007-08-22 18:46:36 UTC
Setting devel-nak now.  Originally gave devel-ack because I thought comment #1
was providing a patch to fix the issue.  We can revisit this issue in a later
update release, but there's no time to fix it now.

Comment 10 RHEL Program Management 2007-08-22 18:51:29 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 

Comment 13 Joel Andres Granados 2008-03-13 17:30:50 UTC
Tested and have some questions.
I could reproduce it, but I don't see exactly the same behavior as described in the bug report.

Background info:
1. Used PXE boot.
2. Used Iso+nfs and normal nfs install
3. nfs server nfs-utils-1.0.9-24.el5 and nfs-utils-lib-1.0.8-7.2.z2.

I started with the ISO+NFS installation; I got up to 30 installs and the
NFS server never refused further mounts.  ichihi, can you
confirm that the NFS server stops allowing mounts?
The NFS server did not stop mounting, but the mount point used by anaconda was
not correctly unmounted.  I could see this in /var/lib/nfs/rmtab on the NFS
server: the count (the third of the colon-separated fields) kept growing as I
installed RHEL4.
So I tested with "normal" NFS installs, just for fun :).  With "normal" NFS
installs, the behavior on the NFS server side was the same.  That is, the mount
count kept going up as I installed.
My conclusion from all this is that the use of an ISO image is not the
cause of this bug.  I think we are just doing a poor job of unmounting the NFS
mount point.


Do you see the same behavior with normal nfs installs?
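The rmtab check described above can be scripted. A rough sketch of a parser for /var/lib/nfs/rmtab-style lines; the sample data below is invented to mimic the server in this report, and the hex count encoding (e.g. 0x0000001e) matches what nfs-utils writes to that file.

```python
def rmtab_counts(text):
    """Parse /var/lib/nfs/rmtab-style lines (host:directory:count) and
    return {(host, directory): count}.  The count field is hexadecimal."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        host, rest = line.split(":", 1)          # host never contains ':'
        directory, count = rest.rsplit(":", 1)   # count is the last field
        counts[(host, directory)] = int(count, 16)
    return counts

# Hypothetical sample: the install client's count keeps growing because
# anaconda never tells the server about the unmount.
sample = """\
clientmachine.test.redhat.com:/distribution/RHEL4/U4/ISOs:0x0000001e
otherhost.test.redhat.com:/export/home:0x00000001
"""
for (host, directory), n in rmtab_counts(sample).items():
    print(f"{host}:{directory} mounted {n} time(s)")
```

Watching that third field across repeated installs is exactly the evidence gathered in this comment.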

Comment 14 Joel Andres Granados 2008-03-13 17:31:52 UTC
SideNote:
showmount and /var/lib/nfs/rmtab carry the same information (AFAIK).  That
information is rather chaotic: the file can contain references to mounts that
are no longer relevant.  If, for example, the host being installed (identified
by an IP) failed to correctly unmount an NFS directory in the past (meaning the
entry is still present on the NFS server), it will show up in showmount both
before and after the installation, unless the list is manually flushed.
This may or may not be the case here.  My point is to make sure the NFS server
has accurate information, i.e. that the list in /var/lib/nfs/rmtab is manually
checked for consistency.  That way the showmount command will show what is
really happening.
Additionally, /var/lib/nfs/rmtab holds one piece of information that might be
very relevant to this bug: the number of times the host has mounted a share on
the NFS server.  So using that file directly might be the better option.

Comment 18 Joel Andres Granados 2008-03-19 18:04:40 UTC
The ISO images not unmounting and the NFS issue are two different problems.
I suspect the loop device is kept open because an image is mounted from within
another mounted image; I have to keep looking into this.
The NFS directories are unmounted on the client side using umount2(), but the
server is never told about it.  The server must be told to remove the shared
directory from its rmtab and to terminate the handle for the remotely mounted
share (this is not done yet).

Comment 19 Joel Andres Granados 2008-03-20 18:18:57 UTC
The code posted in comment #1 is very misleading :(.  I ran it in my test
environment and discovered that it freed memory twice and did not
reinitialize the numFilesystems counter.  That made the output of the patch
undefined and invalid.  In reality, all the mount points are unmounted on the
client side.
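A robust version of the check that the comment #1 patch attempted, re-reading /proc/mounts fresh instead of reusing freed state, might look like this. The snapshot text below is illustrative, loosely modeled on the attached screenshot, and the filesystem types to flag are assumptions.

```python
def remaining_mounts(proc_mounts, fstypes=("nfs", "iso9660")):
    """Return (device, mountpoint, fstype) entries from /proc/mounts-style
    text whose filesystem type is one the installer should already have
    unmounted."""
    leftover = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in fstypes:
            leftover.append((fields[0], fields[1], fields[2]))
    return leftover

# Hypothetical snapshot: the NFS source and a loop-mounted ISO are still
# present alongside the usual pseudo-filesystems.
snapshot = """\
rootfs / rootfs rw 0 0
proc /proc proc rw 0 0
nfsserver:/distribution/RHEL4/U4/ISOs /mnt/source nfs ro 0 0
/dev/loop0 /mnt/runtime iso9660 ro 0 0
"""
for dev, mnt, fstype in remaining_mounts(snapshot):
    print(f"{fstype} still mounted: {dev} on {mnt}")
```

Reading the kernel's own view each time sidesteps the stale-counter and double-free problems the debug patch ran into.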

Comment 20 Joel Andres Granados 2008-03-20 18:21:06 UTC
Created attachment 298724 [details]
Proper patch to check mount points post undomount

Comments 1 and 2 have no relevance to what is happening.

Comment 21 Joel Andres Granados 2008-03-31 16:32:55 UTC
Created attachment 299736 [details]
yuminstall patch

I think I found the reason anaconda does not report the problem properly.  The
error information is in the value.

Comment 22 Joel Andres Granados 2008-04-01 08:29:21 UTC
please ignore comment 21

Comment 23 Joel Andres Granados 2008-04-02 13:39:55 UTC
Created attachment 300063 [details]
patch to undomounts.c

I think I have an answer to the "unmounting but not telling the server"
situation.  This is a partial patch, as some changes in the Makefiles are
needed.

Comment 24 Joel Andres Granados 2008-04-02 13:45:23 UTC
Created attachment 300064 [details]
one last minute change

Basically the same, except that we don't have to check whether it starts
with "/proc" or not.

Comment 25 Joel Andres Granados 2008-04-09 14:47:59 UTC
Created attachment 301816 [details]
Patch with makefile additions.

Full patch.

Comment 26 Joel Andres Granados 2008-04-09 15:43:37 UTC
Should be available in 10.1.1.84.
relative commit acab28e75d11b5ce7d9ece0cdf5a54391dea954b

Comment 29 David Lehman 2008-05-05 23:25:24 UTC
While the patch works, it brings too much library code into the init binary (20x
size increase is too much for this particular bug).

A good fix should be developed in Fedora and possibly backported to 4.8.

Comment 33 Alexander Todorov 2008-09-02 13:52:13 UTC
(In reply to comment #29)
> While the patch works, it brings too much library code into the init binary (20x
> size increase is too much for this particular bug).
> 
> A good fix should be developed in Fedora and possibly backported to 4.8.

Dave,
do you have BZ # for Fedora? Is this reported upstream so we can backport ?

Comment 34 Joel Andres Granados 2008-09-23 15:13:32 UTC
The effort and the amount of changes needed to make this work for RHEL 4.8 are not worth it.  Reasons:

1. The possibility of introducing additional bugs is very high, as new code that handles the communication with the NFS server would have to be put in.

2. There are workarounds to avoid this behavior: install from an HTTP server instead of an NFS server, and if the behavior occurs, restart the nfs service on the server.

3. I used the same NFS server for all my tests, and it did not stop working after 50 (or more) installs.

For these three reasons I am devel-nacking it.

Comment 35 Joel Andres Granados 2008-09-24 14:02:48 UTC
We can have this as a release note stating the situation and the possible workaround.  It would go something like:
"
When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints.  This might cause your NFS server to misbehave.  In these cases Red Hat suggests the use of an HTTP server for installations.
"

That's a note off the top of my head.

Comment 37 Ludek Smid 2008-09-29 11:27:15 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints.  This might cause your NFS server to
misbehave.  In these cases Red Hat suggests the use of an HTTP server for installations.

Comment 38 Ludek Smid 2008-09-29 11:41:35 UTC
This bug requires release notes only (see comment #37), closed.

Comment 39 Donald Harper 2008-09-29 12:11:51 UTC
(In reply to comment #34)
> The effort and the amount of changes that need to happen in order to make this
> work for rhel4.8 is not worth it.  Reasons:
> 
> 1. The possibility of introducing additional bugs is very hi, As new code that
> handles the comunication with the nfs server has to be put in.
> 
> 2. There are workarounds to avoid this behavior.  Install with an http server
> instead of an nfs server.  If the behavior occurs restart the nfs service in
> the server....
> 
> 3. I used the same nfs server for all my tests. and it did not stop working
> after 50 (or more installs) installs.
> 
> for these three reasons I am devel_nacking it.

While I do see the advantages of using an HTTP-based install, in some enterprises this is not an acceptable solution. IT risk policies can state that web servers are not allowed on a production subnet.  Also, at my enterprise, we currently do all deployments via NFS.  While there are active plans to change this, it is simply not an option for us at this point to tell our lines of business, ``Nope, sorry, Red Hat's answer to your work-stopping bug is to completely re-engineer our provisioning solution RIGHT NOW before you can get back into business.''

This attitude smacks of being lazy, arrogant, and stupid.  There is more than one enterprise-level Linux provider out there, and I am sure that at least one other in play at my enterprise would love to see this response.

I am re-opening this issue.

Comment 40 Joel Andres Granados 2008-09-29 13:32:41 UTC
The decision was taken on the basis that we would do more damage to the product than good.  Additionally, if this issue is not yet solved in RHEL5, it is certainly solved in Fedora, so lazy is the last thing I would call the RHEL product line.

We acknowledge the issue, and we dedicated a great deal of time to trying to fix it.  It was not possible because of the size issues.  At this point in RHEL4's life it is not a good idea to add a huge chunk of network code for something that has a simple workaround.

I understand that your policies might be very strict, and for that reason I will check with the owner of nfs-utils to see what can be done to work around the NFS server locking up without having to implement an HTTP server.

Comment 41 Jim Wildman 2008-09-30 01:58:06 UTC
Guess this issue got lost in the sands of time?

Reported:  	2006-09-26 10:07 EDT by Bastien Nocera (bnocera)

Comment #40 From  Joel Andres Granados (jgranado)  2008-09-29 09:32:41 EDT
"At this point of RHEL4's life it is not a good idea to add a huge chunk of network code for something that has a simple workaround."

It is not a simple workaround if you have dozens of NAS appliances that are configured for NFS (but not for HTTP serving), your users expect to reuse the devices for future installs, and significant parts of the install process involve running scripts from an NFS volume.

Is the issue fixed in RHEL5 or Fedora?

Comment 42 Joel Andres Granados 2008-09-30 14:43:58 UTC
(In reply to comment #41)
> 
> It is not a simple workaround if you have dozens of NAS appliances that are
> configured for nfs, (but not for http serving), your users expect to reuse the
> devices for future installs, and significant parts of the install process
> involve running scripts from an nfs volume.

Again, I understand your position and am currently working on getting more information from the NFS server side.

> 
> Is the issue fixed in RHEL5 or Fedora?
I'm sure that it's fixed in Fedora; I have to test with the new RHEL 5.3 when it is out.

Comment 43 Chris Lumens 2008-10-13 18:01:50 UTC
Yes, this is definitely fixed in Fedora.

Comment 45 Joel Andres Granados 2009-01-09 14:57:25 UTC
The release note is waiting for review and there are no changes to be made in the code.

Going to put it in MODIFIED.

Comment 49 Ryan Lerch 2009-01-22 01:11:01 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,2 +1 @@
-When installing RHEL4.X through an NFS server, the installer is unable to correctly close the NFS mountpoints.  This might cause your NFS server to
-misbehave.  In these cases Red Hat suggests the use of an HTTP server for installations.
+When installing Red Hat Enterprise Linux 4 through a Network File System (NFS) server, the installer is unable to correctly close the NFS mount points.  This might cause the NFS server to misbehave.  In these cases Red Hat suggests the use of an HTTP server for installations.