Description of problem: I have a setup with the following configuration: A configuration where the directories are mounted via NFS as background mounts through /etc/fstab. And trying to mount the file system by issuing service netfs start command. Here are the entries in /etc/fstab file <ip>:/work1 /work1 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 <ip>:/work2 /work2 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 Version-Release number of selected component (if applicable): RHEL4U5 How reproducible: Always Steps to Reproduce: 1. make the entries in /etc/fstab file to mount two NFS directories as; <ip>:/work1 /work1 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 <ip>:/work2 /work2 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 2. make the NFS server not reachable, either plug out the network cable or shutdown the NFS server. 3. start the /etc/rc.d/init.d/netfs service. 4. ps -ef Actual results: The background mount for "/work2" ( second line in /etc/fstab) disappears shortly after it is backgrounded. We see from the "/var/log/messages" file that the mount for "/work1" issues the following message shortly after it is backgrounded. mount: mount to NFS server 'XXX.XXX.XXX.XXX' failed However, we never see that message coming from the background mount for "/work2". But, if I manually run "mount -a" from the client, both mounts issue the message and the background mounts remain running until the server that is exporting the "/work1" and "/work2" file systems is up and running. Expected results: When the "service netfs start" command is issued. The mounts for "/work1" and "/work2" should get backgrounded and wait for the NFS server to export the two file systems. Additional info: mount command submitted through initlog command has this behavior. strace on the mount command shows. Process 8119 attached - interrupt to quit connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("XXX.XXX.XXX.XXX")}, 16) = -1 EHOSTUNREACH (No route to host) close(3) = 0 close(3) = -1 EBADF (Bad file descriptor) gettimeofday({1201283232, 281438}, NULL) = 0 write(2, "mount: mount to NFS server \'172."..., 49) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ a000000000010641 (1fb7) --- Process 8119 detached
If you background yourself, you must be prepared for the fact that you are no longer attached to the terminal. (This could also fail due to SELinux, for example.)
> <ip>:/work1 /work1 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 > <ip>:/work2 /work2 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 > We see from the "/var/log/messages" file that the mount for "/work1" > issues the following message shortly after it is backgrounded. > > mount: mount to NFS server 'XXX.XXX.XXX.XXX' failed > > However, we never see that message coming from the background mount > for "/work2". It seems correct, you shouldn't see any initial message from /work2. See nfs(5) man page: After a mount operation is backgrounded, all subsequent mounts on the same NFS server will be backgrounded immediately, without first attempting the mount. Are you really sure that the second mount process (for /work2) really died?
(In reply to comment #2) > > <ip>:/work1 /work1 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 > > <ip>:/work2 /work2 nfs rw,bg,intr,hard,wsize=32768,rsize=32768 0 0 > > We see from the "/var/log/messages" file that the mount for "/work1" > > issues the following message shortly after it is backgrounded. > > > > mount: mount to NFS server 'XXX.XXX.XXX.XXX' failed > > > > However, we never see that message coming from the background mount > > for "/work2". > > It seems correct, you shouldn't see any initial message from /work2. See nfs(5) > man page: > > After a mount operation is backgrounded, all subsequent mounts on the same > NFS server will be backgrounded immediately, without first attempting the > mount. > > Are you really sure that the second mount process (for /work2) really died? > Yes i am very sure if I submit the mount command through initlog command the second mount process dies.restarting/starting the netfs also has this behavior.
Going through the different cases here. Set up 2 mounts in /etc/fstab: server:/vol/vol0/home/jlayton/rhel4/1 /mnt/netapp1 nfs bg 0 0 server:/vol/vol0/home/jlayton/rhel4/2 /mnt/netapp2 nfs bg 0 0 Running mount -a seems to result in both daemons running after the initial mount backgrounds them: root 3816 1 0 08:26 pts/0 00:00:00 mount -- server:/vol/vol0/home/jlayton/rhel4/1 root 3817 1 0 08:26 pts/0 00:00:00 mount -- server:/vol/vol0/home/jlayton/rhel4/2 ...both are blocked trying to open a socket to the portmapper on the server: connect(3, {sa_family=AF_INET, sin_port=htons(111), sin_addr=inet_addr("10.100.56.139")}, 16 <unfinished ...> ...this might be a factor. std* are still connected to the controlling terminal: mount 3816 root 0u CHR 136,0 2 /dev/pts/0 mount 3816 root 1u CHR 136,0 2 /dev/pts/0 mount 3816 root 2u CHR 136,0 2 /dev/pts/0 I'll do an attempt next by running "service netfs start"...
Doing service nfs start shows a couple of processes just after it starts: root 4074 4061 0 08:35 pts/1 00:00:00 initlog -q -c mount -a -t nfs,nfs4 root 4075 4074 0 08:35 pts/1 00:00:00 mount -a -t nfs,nfs4 ...after running for a bit, I see the log messages come across saying that the mount has been backgrounded, but both mount processes then disappear. My suspicion is that initlog is exiting and that's causing the background mount processes to die...
yep...using gdb in follow-fork-mode: (gdb) set follow-fork-mode child (gdb) c Continuing. Attaching after fork to child process 2517. Program received signal SIGPIPE, Broken pipe. [Switching to process 2517] 0x000000337a2b9f12 in __write_nocancel () from /lib64/tls/libc.so.6 (gdb) bt #0 0x000000337a2b9f12 in __write_nocancel () from /lib64/tls/libc.so.6 #1 0x000000337a265678 in _IO_new_file_write () from /lib64/tls/libc.so.6 #2 0x000000337a265867 in _IO_new_file_xsputn () from /lib64/tls/libc.so.6 #3 0x000000337a23f6ff in buffered_vfprintf () from /lib64/tls/libc.so.6 #4 0x000000337a23f889 in vfprintf () from /lib64/tls/libc.so.6 #5 0x000000337a248576 in fprintf () from /lib64/tls/libc.so.6 #6 0x000000000040b2c8 in mount_errors (server=0x46 <Address 0x46 out of bounds>, will_retry=1, bg=1) at nfsmount.c:139 #7 0x000000000040cf0c in nfsmount (spec=Variable "spec" is not available. ) at nfsmount.c:1263 #8 0x0000000000403fec in try_mount_one (spec0=0x51f7e0 "netapp-f87.gsslab.rdu.redhat.com:/vol/vol0/home/jlayton/rhel4/1", node0=0x51ce20 "/mnt/netapp1", types0=0x51e6a0 "nfs", opts0=0x51f830 "bg", freq=0, pass=0, bg=1, ro=0) at mount.c:859 #9 0x000000000040533a in mount_one (spec=0x51f7e0 "netapp-f87.gsslab.rdu.redhat.com:/vol/vol0/home/jlayton/rhel4/1", node=0x51ce20 "/mnt/netapp1", types=0x51e6a0 "nfs", opts=0x51f830 "bg", cmdlineopts=0x0, freq=0, pass=0) at mount.c:1301 #10 0x0000000000405b5f in main (argc=Variable "argc" is not available. ) at mount.c:1445 #11 0x000000337a21c3fb in __libc_start_main () from /lib64/tls/libc.so.6 #12 0x00000000004030ea in _start () #13 0x0000007fbffffde8 in ?? () #14 0x000000000000001c in ?? () #15 0x0000000000000004 in ?? () #16 0x0000007fbfffff6c in ?? () #17 0x0000000000000000 in ?? () ...mount_errors does a fprintf(stderr, ...) so it does look like the stderr pipe closed here. I suppose the question is -- is initlog expected to stick around for children of the process that it spawns?
No, it isn't. It repeatedly calls waitpid() with WNOHANG to see if it has exited. Once the main process has exited, the parent exits and the children's stderr fd's are closed. They then get a SIGPIPE when trying to write to stderr and exit, but the parent process is now init and they get auto-reaped. There are a couple of options I suppose: 1) change initlog so that it can wait for all children to exit (allow it to follow forks in a sense). This could be problematic though since the shell script probably won't exit if initlog doesn't exit... 2) change mount to run bg children in a silent mode (redirecting all std* fd's to /dev/null) or to log to syslog directly. The former would probably be fairly easy to implement. The latter may be tricky.
This seems to be fixed in rawhide -- the backgrounded children dissociate from the controlling terminal. I'm going to presume that that's ok here...
Created attachment 298722 [details] proposed patch Proposed patch -- make backgrounded mount processes detach from controlling terminal. This patch seems to fix it. It just has the forked mount processes detach from the controlling terminal. This prevents the SIGPIPE that occurs when the process tries to write to stderr. This does mean that there are no log messages produced from the backgrounded processes, but this seems to be consistent with upstream behavior in this regard.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 299336 [details] proposed patch 2 Updated patch. Make the daemonize() call a void function. There's not much we can do differently if it fails, so just make it a best-effort thing.
Committed in util-linux-2.12a-19.el4
Created attachment 299358 [details] patch 3 Patch committed to CVS. Same code, but rename function to be more clear. Also add comment.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0751.html
*** Bug 445037 has been marked as a duplicate of this bug. ***