Bug 754789

Summary: Cannot create SSH tunnel via rc.local, no route to host
Product: [Fedora] Fedora Reporter: vincent.santamaria
Component: systemdAssignee: systemd-maint
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 16CC: fedoraproject, gregscott, johannbg, lpoetter, metherid, mschmidt, notting, plautrba, pnewell0705, scott, systemd-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-37-11.fc16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-30 20:59:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
rc.local file with debugging info
none
file created by the debugging commands none

Description vincent.santamaria 2011-11-17 18:20:53 UTC
Created attachment 534283 [details]
rc.local file with debugging info

Description of problem:
I am unable to create SSH tunnels via /etc/rc.d/rc.local as I have been able to in previous releases of Fedora (Fedora 15 included). I have tested this across three different systems:

1. Physical machine, same subnet as tunnel target, Fedora 16 installed as upgrade
2. Physical machine, different subnet as tunnel target, Fedora 16 installed as upgrade
3. Virtual machine, same subnet as tunnel target, Fedora 16 installed from scratch

/etc/rc.d/rc.local has its permissions set to 755, and if the script is run manually all commands within execute without issue.

/var/log/boot.log shows:

"ssh: Could not resolve hostname server1: Name or service not known
...
Starting /etc/rc.local Compatibility failed, see 'systemctl status rc-local.service' for details"

If I make an entry in /etc/hosts:

'192.168.50.1 server1'

then /var/log/boot.log shows a different error for the ssh command:

'ssh: connect to host server1: No route to host'

Putting some debugging information in /etc/rc.d/rc.local shows that the network interface is up and has an appropriate IP address, and 'netstat -rn' shows the correct routing information, but ping commands to the target host (server1) fail completely.

SELinux is in permissive mode.

Version-Release number of selected component (if applicable):
Fedora 16, running kernel 3.1.1-1.fc16x86_64
systemd-37-3.fc16.x86_64

How reproducible:
Setup host key authentication to another system, insert the command to create an SSH tunnel in /etc/rc.d/rc.local to be run at boot, reboot system

Steps to Reproduce:
1. Create SSH key pair (no passphrase): `$ ssh-keygen -t rsa`
2. Send public key to target host and have it appended to the user's /home/$USER/.ssh/authorized_keys file
3. Verify host key authentication works: `$ ssh user@server1` (no password should be prompted)
4. Create /etc/rc.d/rc.local if it doesn't exist: `$ echo "#!/bin/sh" > /etc/rc.d/rc.local`
5. Append the command to create the SSH tunnel to /etc/rc.d/rc.local: `$ echo "ssh -fNL 3306:127.0.0.1:3306 user@server1" >> /etc/rc.d/rc.local`
6. Make sure /etc/rc.d/rc.local is executable: `$ chmod ug+x /etc/rc.d/rc.local`
7. Make sure the rc-local.service is enabled: `$ systemctl enable rc-local.servce`
8. Reboot system: `$ shutdown -r now`
9. When system boots up, check to see if the tunnel has been created: `ps aux | grep ssh`
10. Check the boot log: `$ cat /var/log/boot.log`
  
Actual results:
Tunnel(s) are not created, rc-local.service failed, ssh command unable to reach host

Expected results:
Tunnel(s) should be created

Additional info:
It doesn't appear that rc.local is truly being run last in the boot process like it was before it was ported to systemd, I just don't know what has not yet loaded that is making the network not totally functional.

Comment 1 vincent.santamaria 2011-11-17 18:22:18 UTC
Created attachment 534284 [details]
file created by the debugging commands

Comment 2 Bill Nottingham 2011-11-17 20:32:37 UTC
How are you configuring your network?

Comment 3 vincent.santamaria 2011-11-17 20:54:59 UTC
1. Physical machine, same subnet as tunnel target, network configured via Network Manager (DHCP)
2. Physical machine, different subnet as tunnel target, network configured manually via /etc/sysconfig/network-scripts/ifcfg-em2 (NetworkManager is disabled)
3. Virtual machine, same subnet as tunnel target, network configured via NetworkManager (DHCP)

Same problem on all systems: the ssh command cannot resolve FQDN to IP (if no entry exists in /etc/hosts), and cannot find a route to the tunnel target host (with specified entry in /etc/hosts).

Let me know if you need more specific network configuration info.

Thanks!

Comment 4 Michal Schmidt 2011-11-18 15:32:20 UTC
Running things "last" in the boot process does not translate well into systemd.
To tell systemd about the ordering you expect, create a unit file /etc/systemd/system/rc-local.service and put these 3 lines in it:

.include /lib/systemd/system/rc-local.service
[Unit]
After=network.target

Or even better, forget rc.local and make your own unit. It's simpler and shorter than writing SysV scripts was.
/etc/systemd/system/my-ssh-tunnel.service:

[Unit]
Description=ssh tunnel to server1
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/ssh -fNL 6360:127.0.0.1:6360 user@server1
[Install]
WantedBy=multi-user.target

Comment 5 Bill Nottingham 2011-11-18 15:37:59 UTC
Given SysVStartPriority=99, should it still be run after /etc/init.d/network, at least?

Comment 6 Michal Schmidt 2011-11-18 16:20:39 UTC
I think the ordering by SysV priorities is not applied in this case because rc-local.service is native and /etc/init.d/network has an LSB header.

From sysv_fix_order():

                /* If both units have modern headers we don't care
                 * about the priorities */
                if ((s->meta.fragment_path || s->sysv_has_lsb) &&
                    (t->meta.fragment_path || t->sysv_has_lsb))
                        continue;

That looks like a bug. Even though rc-local.service is native, the fact that it has SysVStartPriority defined should be taken into account.

Comment 7 Bill Nottingham 2011-11-18 16:29:04 UTC
The other alternative might be to simply add:

After=network.target multi-user.target

to rc-local.service.

Comment 8 Matt Chambers 2011-11-18 19:55:29 UTC
Vincent, I suggest refactoring this bug report to be about a simple ping command failing (so that it's much easier to understand and reproduce). I think we can assume that once the ping starts working, your SSH tunnel will work too. My manifestation of this bug is wget failing to work from rc.local at boot:
"Connecting to 172.20.0.1:81... failed: Network is unreachable."

I tried Bill's suggestion also with no effect. However, I did notice this interesting result from dmesg:
[12.01] ADDRCONF(NETDEV_UP): em1: link is not ready
[12.01] NetworkManager[783]: <info> (em1): preparing device.
[12.01] NetworkManager[783]: NetworkManager[783]: <info> (em1): preparing device.
[12.01] NetworkManager[783]: <info> (em1): deactivating device (reason 'managed') [2]
[12.01] NetworkManager[783]: NetworkManager[783]: <info> (em1): deactivating device (reason 'managed') [2]
[12.02] sshd[814]: Server listening on 0.0.0.0 port 22.
[12.02] sshd[814]: Server listening on :: port 22.
[15.79] e1000e: em1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[15.79] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
[26.11] em1: no IPv6 routers present

Could it be that the network isn't ready to use until the IPv6 request times out? I'll try disabling it...

Comment 9 Matt Chambers 2011-11-18 20:53:32 UTC
Michal, I tried making a separate .service like you suggested, but it fails for probably the same reason as the rc.local script: it's not really waiting until the network is up.

[11.37] ADDRCONF(NETDEV_UP): em1: link is not ready
[11.37] NetworkManager[782]: NetworkManager[782]: <info> (em1): preparing device.
[11.37] NetworkManager[782]: <info> (em1): preparing device.
[11.37] NetworkManager[782]: NetworkManager[782]: <info> (em1): deactivating device (reason 'managed') [2]
[11.37] NetworkManager[782]: <info> (em1): deactivating device (reason 'managed') [2]
[11.39] sshd[813]: Server listening on 0.0.0.0 port 22.
[11.39] sshd[813]: Server listening on :: port 22.
[11.39] wget[815]: --2011-11-18 15:48:48--  http://172.20.0.1:81/configure-node.sh
[11.39] wget[815]: Connecting to 172.20.0.1:81... failed: Network is unreachable.
[11.42] systemd[1]: configure-node.service: control process exited, code=exited status=4
[11.45] systemd[1]: Unit configure-node.service entered failed state.
[15.25] e1000e: em1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[15.25] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready

Here's configure-node.service:
[Unit]
Description=Download and run node configuration script at bootup
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/wget -O /root/configure-node.sh http://172.20.0.1:81/configure-node.sh
ExecStart=/bin/chmod u+x /root/configure-node.sh
ExecStart=/root/configure-node.sh

[Install]
WantedBy=multi-user.target

My successful temporary workaround is doing this before wget:
ExecStart=/bin/sleep 5

:)

Comment 10 Michal Schmidt 2011-11-18 21:05:03 UTC
If you're using NetworkManager, make sure you enable NetworkManager-wait-online.service. Otherwise network.target is reached too early.

Comment 11 Matt Chambers 2011-11-18 21:16:28 UTC
That fixed it. Can services that require the network to be ready depend on that even if it's not enabled?

Comment 12 Michal Schmidt 2011-11-18 21:44:55 UTC
(In reply to comment #7)
> The other alternative might be to simply add:
> 
> After=network.target multi-user.target
> 
> to rc-local.service.

The ordering to network.target we should add, because talking to the net seems to be a common use for rc.local. It's not likely to break anything.

Adding multi-user.target is less clear. It would cause an ordering cycle, because multi-user.target wants rc-local.service and target units are supposed to complement Wants= deps with After= deps. This is solvable using DefaultDependencies=no in rc-local.service.
But there's another reason why this could break stuff. Users may have legitimately added After=rc-local.service to their systemd-user-sessions.service to ensure rc.local is run before anyone logs in. There would be another ordering cycle.

Comment 13 vincent.santamaria 2011-11-18 22:42:34 UTC
Mr. Schmidt's two suggestions fixed two of my three systems. The one that is a physical machine and on a different subnet still reports 'Name or service not known' for any FQDN I don't have an entry for in /etc/hosts, and 'No route to host' for any I do. I re-enabled NetworkManager.service and enabled NetworkManager-wait-online.service, and set it to get its IP from DHCP, but something in the network configuration is still dragging.

Comment 14 Michal Schmidt 2011-11-19 01:07:11 UTC
(In reply to comment #12)
> The ordering to network.target we should add, because talking to the net seems
> to be a common use for rc.local. It's not likely to break anything.

Actually, rc-local.service acts as an ordering barrier even when /etc/rc.d/rc.local does not exist, so adding the ordering dependency would delay prefdm.service and a few other services for everyone.

Comment 15 Michal Schmidt 2011-12-02 10:38:59 UTC
Added After=network.target in upstream:
http://cgit.freedesktop.org/systemd/commit/?id=91b684c7300879a8d2006038f7d9185d92c3c3bf

Avoided the ordering barrier by default with:
http://cgit.freedesktop.org/systemd/commit/?id=156730831730701cada2750e826abbf7b113861f

Comment 16 Michal Schmidt 2012-01-05 10:38:37 UTC
*** Bug 771805 has been marked as a duplicate of this bug. ***

Comment 17 Michal Schmidt 2012-01-09 14:11:42 UTC
*** Bug 772407 has been marked as a duplicate of this bug. ***

Comment 18 Fedora Update System 2012-01-11 15:02:27 UTC
systemd-37-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/systemd-37-6.fc16

Comment 19 Fedora Update System 2012-01-11 20:58:01 UTC
Package systemd-37-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-6.fc16
then log in and leave karma (feedback).

Comment 20 Greg Scott 2012-01-12 04:49:38 UTC
Trying the fix right now. 

(Several minutes later . . .)  

It's not booting back again.  Before putting in the systemd fix above, a reboot cycle would take about 2 minutes.  Almost 20 minutes later, still not booted again.  That system is 20 miles away and the building is locked.  It's a firewall, so the whole customer network is offline to the Internet for the night.  Why do I do this to myself???????

Would this be a good place to mention my other startup problem, where boot.log sometimes ends up empty?  I put in a loop in the script my rc.local calls, waiting for the network to start with some echo statements marking progress.  The first time I tried it, /var/log/boot.log didn't show the changes.  I started to freak out until I noticed boot.log was two weeks old.  Renaming boot.log and rebooting - I got a current boot.log showing I could wait forever for the network to start.  After that, with every reboot and test cycle, I would rename the old boot.log.  About every other reboot cycle, boot.log would be zero length, so nothing from that boot would be recorded.  

- Greg Scott

Comment 21 Greg Scott 2012-01-12 18:22:59 UTC
The systemd fix worked.  My rebooting problem last night turned out to be a hardware issue.  Just my luck, a hardware problem hit me at the same time I put in the systemd update.  

I did notice /var/log/boot.log did not show any of my debugging output once I got it up and running. Here are the last few lines.

.
.
.
Started Xinetd A Powerful Replacement For Inetd
                                       [  OK  ]
Starting NFS file locking service....

Started NFS file locking service.
                                       [  OK  ]
Started /etc/rc.d/rc.local Compatibility
                                       [  OK  ]
Starting Wait for Plymouth Boot Screen to Quit...

Starting Display Manager...

Started Display Manager
                                       [  OK  ]

Looks like all the output from all the individual scripts goes somewhere else now?

Comment 22 Michal Schmidt 2012-01-13 09:01:16 UTC
(In reply to comment #21)
> Looks like all the output from all the individual scripts goes somewhere else
> now?

By default the messages (stdout, stderr) of services go to syslog. Check /var/log/messages.
This is actually more reliable than boot.log, which is generated by plymouth.

It works even for messages that are generated before rsyslogd itself is started! (You may have to set "$KLogPermitNonKernelFacility on" in rsyslog.conf to take full advantage of this.)

Comment 23 Fedora Update System 2012-01-16 02:25:36 UTC
Package systemd-37-7.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-7.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-7.fc16
then log in and leave karma (feedback).

Comment 24 Fedora Update System 2012-01-17 20:23:32 UTC
Package systemd-37-8.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-8.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-8.fc16
then log in and leave karma (feedback).

Comment 25 Fedora Update System 2012-01-22 22:54:49 UTC
Package systemd-37-10.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-10.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-10.fc16
then log in and leave karma (feedback).

Comment 26 Fedora Update System 2012-01-26 22:58:36 UTC
Package systemd-37-11.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-11.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-11.fc16
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2012-01-30 20:59:23 UTC
systemd-37-11.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.