Bug 1262439 - dhcrelay doesn't forward from VM until restarted [NEEDINFO]
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Laine Stump
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2015-09-11 12:47 EDT by Nicholas Nachefski
Modified: 2016-08-04 13:03 EDT
CC List: 8 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-04 13:03:02 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
laine: needinfo? (nick)


Attachments
systemd-analyze_plot.svg (88.76 KB, application/octet-stream)
2016-04-08 03:29 EDT, Petr Janda

Description Nicholas Nachefski 2015-09-11 12:47:47 EDT
Description of problem:

dhcrelay doesn't forward requests until it is restarted.  When I start my system, it is enabled and starts; however, it does nothing (it does not forward DHCP requests to the server).  When I run 'systemctl restart dhcrelay', it starts working.



[root@hv3 ~]# strace -p 1031
Process 1031 attached
select(22, [4 5 6], [], NULL, NULL
)     = 1 (in [6])
recvfrom(6, "\1\1\6\0Io\263\r\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RT\0("..., 1540, 0, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, [16]) = 300
select(22, [4 5 6], [], NULL, NULL)     = 1 (in [6])
recvfrom(6, "\1\1\6\0Io\263\r\0*\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RT\0("..., 1540, 0, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, [16]) = 300
select(22, [4 5 6], [], NULL, NULL)     = 1 (in [6])
recvfrom(6, "\1\1\6\0\336wKE\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RT\0("..., 1540, 0, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, [16]) = 300
.
.
.
<detached ...>


[root@hv3 ~]# systemctl restart dhcrelay


[root@hv3 ~]# ps -aef |grep dhcrelay
root      2873     1  0 11:39 ?        00:00:00 /usr/sbin/dhcrelay -d --no-pid 192.168.0.254
root      2875  2474  0 11:39 pts/0    00:00:00 grep --color=auto dhcrelay


[root@hv3 ~]# strace -p 2873
Process 2873 attached
select(22, [4 5 6 7 8], [], NULL, NULL) = 3 (in [4 5 8])
recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\377\377\377\377\377\377RT\0(g\t\10\0E\20\1H\0\0\0\0\200\0219\226\0\0\0\0\377\377"..., 1536}], msg_controllen=36, {cmsg_len=36, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
sendto(3, "<30>Sep 11 11:39:34 dhcrelay: Di"..., 110, MSG_NOSIGNAL, NULL, 0) = 110
write(2, "Discarding packet received on vn"..., 80) = 80
write(2, "\n", 1)                       = 1
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=
.
.
.
.

write(2, "\n", 1)                       = 1
recvfrom(8, 0\0\3\300\250"..., 1536}], msg_controllen=36, {cmsg_len=36, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
select(22, [4 5 6 7 8], [], NULL, NULL) = 2 (in [6 8])
recvmsg(6, {msg_name(0)=NULL, msg_iov(1)=[{"\0\24\321*G\305\354\250k\371\377\263\10\0E\0\1H\353\244@\0@\21\310\260\300\250\0\376\300\250"..., 1536}], msg_controllen=36, {cmsg_len=36, cmsg_level=SOL_PACKET, cmsg_type=, ...}, msg_flags=0}, 0) = 342
write(5, "RT\0(g\tRT\0\264T\255\10\0E\20\1H\0\0\0\0\200\21\261\274\300\250\3\1\300\250"..., 342) = 342
write(2, "Forwarded BOOTREPLY for 52:54:00"..., 58) = 58
write(2, "\n", 1)                       = 1
recvfrom(8, "\2\1\6\1\366\353<O\0\0\0\0\0\0\0\0\300\250\3\207\0\0\0\0\300\250\3\1RT\0("...,  <detached ...>


[root@hv3 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

[root@hv3 ~]# rpm -qa |grep dhc
dhcp-libs-4.2.5-36.el7.x86_64
dhclient-4.2.5-36.el7.x86_64
dhcp-common-4.2.5-36.el7.x86_64
dhcp-4.2.5-36.el7.x86_64


Possibly a problem with its systemd unit file? i.e., dependencies not structured properly
Comment 2 Jiri Popelka 2015-09-15 08:20:39 EDT
(In reply to Nicholas Nachefski from comment #0)
> dhcrelay doesn't forward requests until it is restarted.  When I start my
> system, it is enabled and starts; however, it does nothing (it does not
> forward DHCP requests to the server).  When I run 'systemctl restart
> dhcrelay', it starts working.

Hmm, I haven't been able to reproduce the issue in VM.

> Possibly a problem with its systemd unit file? i.e., dependencies not
> structured properly

We recently changed
After=network.target
to
Wants=network-online.target
After=network-online.target
in dhcrelay.service due to bug #1145832, and I have no idea what else could be the culprit.
Comment 3 Nicholas Nachefski 2016-02-24 15:10:04 EST
This is still a problem.  And, if you search for 'dhcrelay not starting' in google you get a ton of hits about this from other BZ sites.  I think this should be looked at a little harder.  I'm running F23 (updated) and this is still happening as of today.

I think I just found the problem.  It's an ordering problem with the systemd unit file.  I need this dhcrelay to proxy DHCP requests from the VM network to my IPA server (which is also running dhcpd).  When dhcrelay starts, virbr0 doesn't exist yet.  However, once the box is fully booted, virbr0 *does* exist, and when I manually restart dhcrelay it works fine.  If you set '-i virbr0' in the unit file (explicitly setting the interface), dhcrelay won't start *at all*.

Please take a harder look as this is apparently a PITA for everyone.

-Nick
Comment 4 Jiri Popelka 2016-02-25 03:44:43 EST
Does it help if you

cp /usr/lib/systemd/system/dhcrelay.service /etc/systemd/system/dhcrelay.service

and add
Wants=libvirtd.service
After=libvirtd.service
to the [Unit] section of /etc/systemd/system/dhcrelay.service?
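The same two directives can also go in a drop-in file instead of a full copy of the unit, so later package updates to /usr/lib/systemd/system/dhcrelay.service are still picked up. This is a minimal sketch only: the function name and the drop-in file name `after-libvirtd.conf` are hypothetical, and the directory argument exists only so the function can be tried against a scratch directory. It carries exactly the same ordering as the copied unit file, so it has the same limitation reported in Comments 9 and 10.

```bash
#!/bin/bash
# Hypothetical helper: write a drop-in that orders dhcrelay.service after
# libvirtd.service, equivalent to the two lines suggested in Comment 4.
# $1: drop-in directory (defaults to the standard location for dhcrelay).
write_dhcrelay_libvirtd_dropin() {
    local dir=${1:-/etc/systemd/system/dhcrelay.service.d}
    mkdir -p "$dir" || return 1
    # Directives in a drop-in are merged into the packaged unit's [Unit] section.
    cat > "$dir/after-libvirtd.conf" <<'EOF'
[Unit]
Wants=libvirtd.service
After=libvirtd.service
EOF
}
```

After writing the file, run `systemctl daemon-reload` so systemd re-reads the unit.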
Comment 5 Nicholas Nachefski 2016-03-05 14:17:37 EST
Sorry for not getting back sooner.  This fixed my problem.

THANKS!

-Nick
Comment 6 Nicholas Nachefski 2016-03-05 14:20:05 EST
This fixed the issue on both my RHEL 7.2 and F23 hypervisors.  Thanks again!
Comment 7 Petr Janda 2016-04-06 11:48:33 EDT
Reproduced on RHEL-7.2 x86_64 Server.
Comment 9 Nicholas Nachefski 2016-04-06 11:57:54 EDT
I spoke too soon when I said that this problem was fixed.  It is *not* fixed in my lab environment; the problem persists.

It's worth noting that if you specify the virbr0 interface for the relay, it won't even start at all.

-Nick
Comment 10 Petr Janda 2016-04-08 03:29 EDT
Created attachment 1145038
systemd-analyze_plot.svg

I can confirm that the proposed changes to the unit file do not fix the problem. Attaching systemd-analyze plot output.
Comment 11 Jiri Popelka 2016-04-08 12:22:48 EDT
So the libvirtd service is considered running even though its virtual bridges are not yet properly configured.

For dhcpd and NetworkManager-managed interfaces we have two mechanisms to mitigate such a scenario: network-online.target and an NM dispatcher script. Neither can be applied here, because libvirtd's virtual bridges are not NM-managed (AFAIK).

I'm at my wit's end here, so I guess we can only ask the libvirt folks whether they have any idea how to properly order the dependencies in a service file when that service needs libvirt's virtual bridges to be configured before it starts.

Reassigning to libvirt so the above question gets some attention. Thanks.
Comment 12 Jaroslav Suchanek 2016-04-29 08:17:46 EDT
Laine, is there anything libvirt can do about this? I guess the daemon returns immediately and does not wait for network initialization to finish.
Comment 13 Laine Stump 2016-06-01 14:17:17 EDT
It can take several seconds for each libvirt virtual network to start, and some hosts have a *lot* of networks. On top of that, the whole point of libvirt's virtual networks is to have a single point of configuration for a simple bridge+nat+dhcp+dns setup. If you're going to be doing parts of that yourself, you can just as well set up the bridge in the host system configuration (using NM, or by directly creating ifcfg files).

I don't think it's a good idea to serialize host system startup in order to benefit the small minority of users who want to set up socket listeners on libvirt's networks. So there's nothing that libvirt can (should) do *by default* for this situation.

However, I was looking at the example of /usr/lib/systemd/system/NetworkManager-wait-online.service, and it seems like it would be reasonable to add a similar service that executes a script that retrieves the device name for a given network, then waits in a loop until that device has an IP address. I'll attach such a script as soon as I'm done typing this comment.
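The attached script itself is not reproduced on this page. As a rough sketch of what such a script could look like (not the attachment): the function names, the 90-second default timeout, reading the bridge name from `virsh net-info`, and treating "has an IPv4 address" as "up" are all assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical sketch of a libvirt-net-wait-online.sh-style script.
# Assumptions: the bridge name comes from `virsh net-info`, and the network
# counts as "up" once that bridge has at least one IPv4 address.

# Print the bridge name from `virsh net-info` output read on stdin.
bridge_from_net_info() {
    awk '$1 == "Bridge:" { print $2 }'
}

# Succeed if device $1 currently has at least one IPv4 address.
bridge_has_ipv4() {
    [[ -n $(ip -4 -o addr show dev "$1" 2>/dev/null) ]]
}

# Poll once per second until libvirt network $1 has a bridge with an IPv4
# address, or give up after $2 seconds (default 90).
wait_for_network() {
    local net=$1 timeout=${2:-90} br
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        br=$(virsh -c qemu:///system net-info "$net" 2>/dev/null | bridge_from_net_info)
        if [[ -n $br ]] && bridge_has_ipv4 "$br"; then
            echo "libvirt network '$net' is up on $br"
            return 0
        fi
        sleep 1
    done
    echo "timed out waiting for libvirt network '$net'" >&2
    return 1
}

# Act only when executed directly; sourcing the file just defines the functions.
if [[ ${BASH_SOURCE[0]} == "$0" ]]; then
    if (( $# < 1 )); then
        echo "usage: $0 NETWORK [TIMEOUT_SECONDS]" >&2
    else
        wait_for_network "$1" "${2:-90}"
        exit $?
    fi
fi
```

Saved as /usr/libexec/libvirt-net-wait-online.sh and made executable, it would match the ExecStart= line in the service file below; a non-zero exit on timeout marks the oneshot unit as failed.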

Here's a service file that I *think* will prevent dhcrelay.service from starting until this new service (I called it libvirt-default-network-wait-online.service) has completed:

[Unit]
Description=Wait for libvirt's default network to be fully operational
Requisite=libvirtd.service
After=libvirtd.service
# Before=dhcrelay.service
# Either uncomment the line above, or add
#"After=libvirt-default-network-wait-online.service"
# to dhcrelay.service

[Service]
Type=oneshot
ExecStart=/usr/libexec/libvirt-net-wait-online.sh default

[Install]
# I don't know if this is needed or not :-)
# WantedBy=dhcrelay.service

I'm no systemd aficionado, but this is what I came up with after about 10 minutes of searching for "systemd network-online.target" and reading related descriptions, and an hour or so of playing around. I'm not 100% sure it does what is needed, but it seems to. If you could give it a try I would appreciate it. Possibly we can add a service like this to the package, leaving it disabled until somebody needs it.
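The service file above leaves open how the new unit actually gets started: its [Install] section is commented out, so enabling it does nothing. One way to wire it in, sketched here on the assumption that the unit is saved as libvirt-default-network-wait-online.service, is a dhcrelay drop-in that both pulls the unit in and orders after it. The helper and drop-in file names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: make dhcrelay.service pull in and wait for
# libvirt-default-network-wait-online.service.
# $1: drop-in directory (defaults to the standard location for dhcrelay).
write_dhcrelay_wait_dropin() {
    local dir=${1:-/etc/systemd/system/dhcrelay.service.d}
    mkdir -p "$dir" || return 1
    # Wants= starts the oneshot unit whenever dhcrelay starts;
    # After= delays dhcrelay until that unit's ExecStart= script has exited.
    cat > "$dir/wait-libvirt-network.conf" <<'EOF'
[Unit]
Wants=libvirt-default-network-wait-online.service
After=libvirt-default-network-wait-online.service
EOF
}
```

Because Wants= is a weak dependency, dhcrelay still starts (just later) if the wait script times out. Run `systemctl daemon-reload` after writing the file.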

Petr - how do I generate the "systemd analyze plot"? That looks very useful!
Comment 14 Laine Stump 2016-06-15 14:19:35 EDT
("systemd-analyze plot". Duh :-)
Comment 15 Laine Stump 2016-06-29 14:38:57 EDT
Nick - if you can try out my suggestion from Comment 13 and give feedback, possibly we can include an example .service file in the libvirt package. Otherwise I guess this BZ should be closed.
Comment 16 Laine Stump 2016-08-04 13:03:02 EDT
No response to my request, so I'm closing this. Re-open if my suggested systemd service file works and you think it should be included in libvirt's examples.
