Bug 773433

Summary: NFS mount fails: network.service starts after remote-fs
Product: [Fedora] Fedora Reporter: Hans Ecke <hansecke>
Component: systemdAssignee: systemd-maint
Status: CLOSED WORKSFORME QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 16CC: fedora, johannbg, metherid, mschmidt, msekleta, notting, plautrba, systemd-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-30 17:58:17 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
output of "dmesg -s10000000"
none
output from "systemctl dump"
none
output from "/bin/systemd --test --system --log-level=debug 2>&1"
none
a copy of /var/log/messages
none
dmesg after setting the fs_passno field of NFS mounts to 0 none

Description Hans Ecke 2012-01-11 19:56:38 UTC
Created attachment 552213 [details]
output of "dmesg -s10000000"

Description of problem:

Systemd attempts to mount remote filesystems before it executes network.service. Accordingly, the NFS mount times out.

This is on Fedora 16, without NetworkManager but instead with the static network.service.

Version-Release number of selected component (if applicable):

37-3.fc16

How reproducible:

Always

Steps to Reproduce:
1. restart system
2. it hangs till the NFS mounts timeout
3. login on console and do a manual "mount -a"
  
Actual results:

NFS does not get mounted because the network is not ready yet

Expected results:

network.service (and rpcbind etc) should execute pretty early in bootup, maybe as soon as udev renames eth0 to em1. Then the NFS mount would succeed.

Additional info:

I worked through a lot of the systemd/network related bugs here and I understand it _should_ work. Something seems to be different on my system. Could you help me find what?

I attach the info requested on http://fedoraproject.org/wiki/How_to_debug_Systemd_problems

/proc/cmdline:

BOOT_IMAGE=/vmlinuz-3.1.7-1.fc16.x86_64 root=UUID=cc66242e-9e3a-401b-8f0b-fcade9972710 ro rd.md=0 rd.lvm=0 rd.dm=0 KEYTABLE=us selinux=0 SYSFONT=latarcyrheb-sun16 rd.luks=0 LANG=en_US.UTF-8 rdblacklist=nouveau nouveau.modeset=0 log_buf_len=1048576 systemd.log_level=debug systemd.log_target=kmsg

dmesg output .. attached

/var/log/messages and output from systemctl dump and systemd --test: I can not see how I can attach more than one file to the initial bug report, so I'll do that momentarily...

Comment 1 Hans Ecke 2012-01-11 19:57:20 UTC
Created attachment 552214 [details]
output from "systemctl dump"

Comment 2 Hans Ecke 2012-01-11 19:58:23 UTC
Created attachment 552215 [details]
output from "/bin/systemd --test --system --log-level=debug 2>&1"

This was done as user "hans". systemd refuses to run this as root.

Comment 3 Hans Ecke 2012-01-11 19:59:19 UTC
Created attachment 552216 [details]
a copy of /var/log/messages

... I deleted everything before the bootup

Comment 4 Hans Ecke 2012-01-11 20:08:06 UTC
A few comments:

1.) I changed the /etc/init.d/network script with these lines:

echo "Running /etc/init.d/network (stderr)" >&2
logger "Running /etc/init.d/network (syslog)"
echo "Running /etc/init.d/network (kmsg)" > /dev/kmsg

at the very top. You can see the output in dmesg.txt at 95.657956. Obviously the network should have been brought up much earlier....

2.) systemd unit files are unchanged. /etc/init.d/network is unchanged except for the above 3 lines.

3.) root@bose:/mpl/bose 518> rpm -qa|fgrep NetworkM

NetworkManager-gtk-0.9.2-1.fc16.x86_64
NetworkManager-glib-0.9.2-1.fc16.x86_64

4.) root@bose:/mpl/bose 519> systemctl is-enabled network.service

network.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig network --level=5
enabled

5.) root@bose:/mpl/bose 521> systemctl is-enabled remote-fs.target
enabled

Comment 5 Michal Schmidt 2012-01-11 23:18:32 UTC
There's a timeout waiting for a non-sensical device unit:
[   93.780946] systemd[1]: Job pavlov:-data.device/start timed out.

Do you have a non-zero value in the sixth fstab field (fs_passno) in /etc/fstab for any of the NFS mounts? I think that's what confuses systemd here.

Comment 6 Hans Ecke 2012-01-12 00:05:38 UTC
That was indeed the case, and now it mounts correctly. Thank you Michal!

However, I still get a huge wait/timeout. I'll attach a new dmesg file so you can see it. Any thoughts?

Comment 7 Hans Ecke 2012-01-12 00:06:55 UTC
Created attachment 552261 [details]
dmesg after setting the fs_passno field of NFS mounts to 0

Comment 8 Michal Schmidt 2012-01-12 01:10:59 UTC
From the log:

[   24.887032] systemd[1]: About to execute: /etc/rc.d/init.d/network start
[   24.911899] systemd[1]: Forked /etc/rc.d/init.d/network as 971
[   24.911941] systemd[1]: network.service changed dead -> start
[   24.922464] logger[983]: Running /etc/init.d/network (syslog)
[   24.922569] Running /etc/init.d/network (kmsg)

OK, the network script has been spawned.

[   27.470124] systemd[1]: Child 971 belongs to network.service
[   27.470137] systemd[1]: network.service: control process exited, code=exited status=0

The network script finished.

[   27.470142] systemd[1]: network.service got final SIGCHLD for state start
[   27.470193] systemd[1]: network.service changed start -> running
[   27.470206] systemd[1]: Job network.service/start finished, result=done
[   27.491866] systemd[1]: network.target changed dead -> active
[   27.491873] systemd[1]: Job network.target/start finished, result=done

Mounting starts at this point.

[   29.360845] systemd[1]: Received SIGCHLD from PID 1439 (ifup-eth).
[   29.360878] systemd[1]: Got SIGCHLD for process 1439 (ifup-eth)
[   29.385851] systemd[1]: Received SIGCHLD from PID 1444 (ifup-eth).
[   29.385882] systemd[1]: Got SIGCHLD for process 1444 (ifup-eth)
[   29.387120] systemd[1]: network.service: cgroup is empty

It's odd that the two ifup-eth processes outlived the network script and finished only now. I don't know where this asynchronicity comes from.

Comment 9 Michal Sekletar 2012-08-03 12:44:35 UTC
Are you experiencing the same issue, with latest versions of systemd, initscripts and bash? I consulted this issue with maintainer of initscripts and we didn't figure out how is it possible for /etc/rc.d/init.d/network to exit earlier than ifup-eth script.

Comment 10 Hans Ecke 2012-08-15 08:56:43 UTC
Hi Michal-

I do not see this bug anymore with current package versions. Feel free to close issue and thank you for your work.

Hans