Bug 1637275 - radiusd - Fails to start at boot with VLAN interfaces
Summary: radiusd - Fails to start at boot with VLAN interfaces
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: freeradius
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Alex Scheel
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-09 03:49 UTC by Ben Herrick
Modified: 2019-01-15 02:33 UTC (History)

Fixed In Version: freeradius-3.0.17-2.fc28 freeradius-3.0.17-2.fc29
Clone Of:
Environment:
Last Closed: 2019-01-15 01:53:08 UTC
Type: Bug
Embargoed:



Description Ben Herrick 2018-10-09 03:49:41 UTC
Description of problem:
I have multiple VLAN interfaces defined, with radiusd listening on one of them. The failure only happens at boot; once the system is up, radiusd starts normally with 'systemctl start radiusd'. It appears the radiusd unit file needs to wait for the VLAN interfaces to come up.

Errors from journalctl:
Oct 09 03:09:02 junction1.den1.tauran.net systemd[1]: radiusd.service: Failed with result 'exit-code'.
Oct 09 03:09:02 junction1.den1.tauran.net systemd[1]: Failed to start FreeRADIUS high performance RADIUS server..

Errors from /var/log/radius/radius.log:
Tue Oct  9 03:09:02 2018 : Error: Failed binding to auth address 192.168.254.1 port 1812 bound to server default: Cannot assign requested address 
Tue Oct  9 03:09:02 2018 : Error: /etc/raddb/sites-enabled/default[59]: Error binding to port for 192.168.254.1 port 1812

Version-Release number of selected component (if applicable):
freeradius-3.0.15-12.fc28.x86_64

How reproducible:
Every reboot.

Steps to Reproduce:
1. Reboot
2. Watch radiusd fail
3. Start radiusd manually

Actual results:
radiusd fails

Expected results:
radiusd starts automatically

Additional info:
Yes, radiusd is enabled, see below.
Loaded: loaded (/usr/lib/systemd/system/radiusd.service; enabled; vendor preset: disabled)

Comment 1 Ben Herrick 2018-11-30 05:13:18 UTC
This appears to be a problem with the "After=" line for the systemd unit file.

The unit should start "After=network-online.target".

It should read as follows:
After=syslog.target network-online.target ipa.service dirsrv.target krb5kdc.service
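
For anyone who needs this ordering without editing the packaged unit file, a drop-in is one option. The following is a minimal sketch, not something taken from this report: the file name 10-network-online.conf and the DROPIN_DIR staging variable are assumptions for illustration, and on a real system the directory would be /etc/systemd/system/radiusd.service.d. Wants= is included alongside After= because After= only orders units and does not pull the target into the boot transaction.

```shell
#!/bin/sh
# Sketch: stage a drop-in that orders radiusd after network-online.target.
# DROPIN_DIR is a hypothetical staging path; the live location would be
# /etc/systemd/system/radiusd.service.d
DROPIN_DIR="${DROPIN_DIR:-./radiusd.service.d}"
mkdir -p "$DROPIN_DIR"

# After= and Wants= in a drop-in add to the lists in the packaged unit,
# so the existing syslog/ipa/dirsrv/krb5kdc ordering is left untouched.
cat > "$DROPIN_DIR/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
```

Once the file is copied into place, 'systemctl daemon-reload' picks it up. 'systemctl edit radiusd' creates an equivalent drop-in interactively.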

Comment 2 Alex Scheel 2018-11-30 16:10:12 UTC
I'm slightly hesitant to add the dependency on the latter three services (ipa.service dirsrv.target krb5kdc.service) because, if I understand correctly, it is possible to run FreeRADIUS without any of them, though it definitely depends on configuration.

syslog.target seems fine though, but I'll need to look into that more.


Thoughts?

Comment 3 Ben Herrick 2018-11-30 17:14:18 UTC
ipa, dirsrv, and krb5kdc are in the unit file today. I modified the line from the existing unit file in Fedora 28.

Comment 4 Alex Scheel 2018-11-30 17:44:00 UTC
Ah ah, I misunderstood your comment. My mistake. I'll go ahead and add the network-online.target. 



I thought you said that the current was:
After=network-online.target

And you proposed changing it to:
After=syslog.target network-online.target ipa.service dirsrv.target krb5kdc.service




I'll take a look at where I should fix this and push it out as appropriate. :)

Comment 5 Alex Scheel 2018-12-12 18:09:19 UTC
So looking at this again, I'm thinking I'll make a documentation change and leave it at that.

The issue with network-online.target vs. network.target is that network-online.target will block/hang boot IF the network cannot be started (e.g., misconfiguration). If someone has RADIUS enabled on boot (probably fairly common in server environments) and we change this to network-online.target from network.target, when there's a misconfiguration (again, more common than we'd care to admit), boot will stall for a while, increasing the time to fix. That's why systemd prefers services depend on network.target, not network-online.target. A downside as you mentioned is vlan configuration not working, which seems to require network-online.target.

This also seems configuration specific and something that should be configured on a per-deployment basis, not globally at the package level.



My 2c.


- Alex

Comment 6 Ben Herrick 2018-12-13 04:16:17 UTC
(In reply to Alex Scheel from comment #5)
> So looking at this again, I'm thinking I'll make a documentation change and
> leave it at that.
> 
> The issue with network-online.target vs. network.target is that
> network-online.target will block/hang boot IF the network cannot be started
> (e.g., misconfiguration). If someone has RADIUS enabled on boot (probably
> fairly common in server environments) and we change this to
> network-online.target from network.target, when there's a misconfiguration
> (again, more common than we'd care to admit), boot will stall for a while,
> increasing the time to fix. That's why systemd prefers services depend on
> network.target, not network-online.target. A downside as you mentioned is
> vlan configuration not working, which seems to require network-online.target.
> 
> This also seems configuration specific and something that should be
> configured on a per-deployment basis, not globally at the package level.
> 
> 
> 
> My 2c.
> 
> 
> - Alex

Respectfully I have to disagree. It doesn't make sense to me that we should prevent this service from starting up properly because "something else is misconfigured". How does it help for radiusd to start when the network isn't working properly in the first place?

While I agree that running radiusd on a VLAN interface is probably a very small corner case, and it's trivial for me to keep the unit file modified to support my setup, it's my belief that each service should be idempotent outside of its dependencies and should start automatically assuming said dependencies are met.

My 2c. :)

-Ben

Comment 7 Alex Scheel 2018-12-13 15:59:58 UTC
Playing around with a VM, I guess my reading of https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ was incorrect. 

I thought that depending on network-online.target would cause the system to stall *when the network is misconfigured*, but I guess it appears to start normally, just with all of the services disabled. 

Perhaps I'm remembering the upstart days when Debian/Ubuntu would hang forever if you had a mistake in /etc/network/interfaces... :)



I'm still not convinced that you'd need to depend on network-online in every scenario. Binding only to localhost would be one exception, if you're responding to RADIUS requests only locally. This'd happen if you're using RADIUS for OTP responses where the RADIUS protocol isn't sufficiently secured, so you're forced to proxy traffic over, e.g., ipsec. Another example is binding to 0.0.0.0 instead of an interface IP address.


Point being, "dependencies are met" is the hard part -- for some people, that is just "network.target". For others, it is "network-online.target". Detecting which and depending on the correct one -- in the systemd unit file no less -- seems non-trivial... My 2c.



Out of curiosity, it looks like you're binding directly to the VLANed interface's IP address. Could you try binding to 0.0.0.0 and seeing if the RADIUS server will start with "network.target" and respond to RADIUS requests on VLAN after the VLAN is up?


- A
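
The distinction drawn in this comment (wildcard and loopback binds versus a specific interface address) can be written down as a small decision helper. This is only a sketch of that reasoning: the function name suggest_target is hypothetical, and the assumption that wildcard and loopback listeners never need a configured interface address is taken from the discussion above, not from FreeRADIUS or systemd documentation.

```shell
#!/bin/sh
# Hypothetical helper: given a listen address, print the ordering target
# that the reasoning in this thread would call for.
suggest_target() {
    case "$1" in
        # Wildcard and loopback binds succeed before any VLAN address exists.
        '*'|0.0.0.0|::|127.*|::1|localhost)
            echo network.target ;;
        # A specific interface address must already be assigned at bind time.
        *)
            echo network-online.target ;;
    esac
}

suggest_target 0.0.0.0        # prints network.target
suggest_target 192.168.254.1  # prints network-online.target
```

A packaged unit file cannot run this kind of check per deployment, which is the difficulty the comment points at.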

Comment 8 Ben Herrick 2018-12-14 03:10:09 UTC
(In reply to Alex Scheel from comment #7)
> Playing around with a VM, I guess my reading of
> https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ was
> incorrect. 
> 
> I thought that depending on network-online.target would cause the system to
> stall *when the network is misconfigured*, but I guess it appears to start
> normally, just with all of the services disabled. 
> 
> Perhaps I'm remembering the upstart days when Debian/Ubuntu would hang
> forever if you had a mistake in /etc/network/interfaces... :)
> 
> 
> 
> I'm still not convinced that you'd need to depend on network-online in every
> scenario (binding only to localhost would be one -- if you're responding to
> RADIUS requests only locally for instance. This'd happen if you're using
> RADIUS for OTP responses where the RADIUS protocol isn't sufficiently
> secured, so you're forced to proxy traffic over, e.g., ipsec). Another
> example is binding to 0.0.0.0 instead of an interface IP address.
> 
> 
> Point being, "dependencies are met" is the hard part -- for some people,
> that is just "network.target". For others, it is "network-online.target".
> Detecting which and depending on the correct one -- in the systemd unit file
> no less -- seems non-trivial... My 2c.
> 
> 
> 
> Out of curiosity, it looks like you're binding directly to the VLANed
> interface's IP address. Could you try binding to 0.0.0.0 and seeing if the
> RADIUS server will start with "network.target" and respond to RADIUS
> requests on VLAN after the VLAN is up?
> 
> 
> - A

Alex,

Changing the binding to 0.0.0.0 does indeed work, but just shifts the work of limiting access to firewalld. This is probably okay, but I still think it's a bug in the radiusd unit file.

From reading the systemd description in the link above, I'm really not seeing a downside to just moving to network-online.target. Perhaps the "most right" solution would be to have FreeRADIUS adopt the IP_FREEBIND setsockopt, but how many instances of FreeRADIUS are running on machines where the network interface comes and goes?
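
As a side note on IP_FREEBIND: the Linux ip(7) man page describes it as the per-socket equivalent of the net.ipv4.ip_nonlocal_bind sysctl, so a system-wide approximation is possible without changing the daemon. The sketch below only stages a sysctl.d fragment; the file name 90-nonlocal-bind.conf and the SYSCTL_DIR staging variable are assumptions, and the setting affects every IPv4 socket on the host, so it is a blunt workaround, not a recommendation from this report.

```shell
#!/bin/sh
# Sketch: stage a sysctl fragment that lets any process bind to an IPv4
# address that is not (yet) assigned to a local interface.
# SYSCTL_DIR is a hypothetical staging path; the live location would be
# /etc/sysctl.d
SYSCTL_DIR="${SYSCTL_DIR:-./sysctl.d}"
mkdir -p "$SYSCTL_DIR"

cat > "$SYSCTL_DIR/90-nonlocal-bind.conf" <<'EOF'
# System-wide counterpart of the per-socket IP_FREEBIND option
net.ipv4.ip_nonlocal_bind = 1
EOF
```

After the fragment is installed under /etc/sysctl.d, 'sysctl --system' applies it without a reboot.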

IMHO, any daemon which can be configured to bind to a specific interface or IP address should Want or After network-online.target. Otherwise it could fail to start, even though its config is entirely valid.

I suspect others agree with me as well, here's the list of other services which mention network-online.target in their unit file:

(pts/0)[xxxx] juncxxx:/home/xxxx > grep -c network-online.target /usr/lib/systemd/system/*.service | grep -v ':0'
/usr/lib/systemd/system/auditd.service:1
/usr/lib/systemd/system/chrony-dnssrv@.service:2
/usr/lib/systemd/system/cockpit-motd.service:2
/usr/lib/systemd/system/dhcpd6.service:2
/usr/lib/systemd/system/dhcpd.service:2
/usr/lib/systemd/system/kdump.service:1
/usr/lib/systemd/system/NetworkManager-wait-online.service:2
/usr/lib/systemd/system/nfs-lock.service:2
/usr/lib/systemd/system/nfs-mountd.service:2
/usr/lib/systemd/system/nfs-server.service:2
/usr/lib/systemd/system/nfs.service:2
/usr/lib/systemd/system/rpc-statd-notify.service:2
/usr/lib/systemd/system/rpc-statd.service:2
/usr/lib/systemd/system/systemd-networkd.service:2
/usr/lib/systemd/system/systemd-networkd-wait-online.service:2

I hope this helps.

-Ben
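
The grep -c survey above counts every mention of network-online.target in a unit file, so it does not show which directive the mention sits in or whether the line is commented out. A stricter check might look like the sketch below. The function name orders_after_online is hypothetical, and it only inspects the unit file itself, not any drop-ins.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if an active (uncommented) After= line
# in the given unit file names network-online.target.
orders_after_online() {
    grep -Eq '^After=(.* )?network-online\.target( |$)' "$1"
}

# List packaged services that really order themselves after the target.
for unit in /usr/lib/systemd/system/*.service; do
    [ -f "$unit" ] || continue
    if orders_after_online "$unit"; then
        echo "$unit"
    fi
done
```

On a running system, 'systemctl show -p After radiusd.service' reports the effective ordering with drop-ins applied.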

Comment 9 Alex Scheel 2018-12-14 20:11:45 UTC
This has been added upstream here:

https://github.com/FreeRADIUS/freeradius-server/commit/ed76dd96b5df1dc553a79508b398dcc163244a0d


I'll rebase F29 and F28 off of v3.0.17 + this patch. 



- Alex

Comment 10 Alex Scheel 2018-12-15 00:51:09 UTC
Feel free to test the update from here and give feedback:

https://bodhi.fedoraproject.org/updates/FEDORA-2018-1bc4a63a4f

Note that this also bumps F28 from 3.0.15 to 3.0.17. 

- Alex

Comment 11 Fedora Update System 2018-12-16 03:17:29 UTC
freeradius-3.0.17-2.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-1bc4a63a4f

Comment 12 Fedora Update System 2018-12-16 03:57:48 UTC
freeradius-3.0.17-2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-bcf7fd8277

Comment 13 Fedora Update System 2019-01-15 01:53:08 UTC
freeradius-3.0.17-2.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Fedora Update System 2019-01-15 02:33:06 UTC
freeradius-3.0.17-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

