Bug 1168388
| Field | Value |
|---|---|
| Summary | veth device goes down when ipv4 dhcp lease expires |
| Product | Red Hat Enterprise Linux 7 |
| Component | NetworkManager |
| Version | 7.1 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Vladimir Benes <vbenes> |
| Assignee | Beniamino Galvani <bgalvani> |
| QA Contact | Desktop QE <desktop-qa-list> |
| CC | bgalvani, danw, dcbw, jklimes, lrintel, thaller, tpelka, vbenes |
| Target Milestone | rc |
| Target Release | 7.1 |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Doc Type | Bug Fix |
| Doc Text | Activation of connections with static addresses no longer fails when DHCP server does not respond. |
| Last Closed | 2015-11-19 10:58:00 UTC |
Description
Vladimir Benes
2014-11-26 18:58:45 UTC
At this point, I think that's expected. If DHCP fails on a configured interface, NetworkManager will fail that interface even if there was a static IP assigned in addition to the DHCP. I think we want to change that behavior (to not down the interface, and to also periodically retry DHCP), but that would be an enhancement.

One thought: if you set "ipv4.may-fail=no", does that work around the problem?

Actually, this worked in the older 0.9.9.1-29: instead of going down, the device was taken over by "Wired Connection X", so after these 5 minutes it went up again. Now I can see:

Dec 5 10:05:25 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): Activation: Stage 4 of 5 (IPv6 Configure Timeout) scheduled...
Dec 5 10:05:25 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): Activation: Stage 4 of 5 (IPv6 Configure Timeout) started...
Dec 5 10:05:25 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): Activation: Stage 4 of 5 (IPv6 Configure Timeout) complete.
Dec 5 10:05:25 qe-dell-ovs5-vm-45 dhclient[15667]: DHCPREQUEST on test2 to 192.168.100.1 port 67 (xid=0x17ae6a2a)
Dec 5 10:05:33 qe-dell-ovs5-vm-45 dhclient[15667]: DHCPREQUEST on test2 to 192.168.100.1 port 67 (xid=0x17ae6a2a)
Dec 5 10:05:47 qe-dell-ovs5-vm-45 dhclient[15667]: DHCPREQUEST on test2 to 255.255.255.255 port 67 (xid=0x17ae6a2a)
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): DHCPv4 state changed bound -> fail
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): canceled DHCP transaction, DHCP client pid 15667
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): DHCPv4 state changed fail -> done
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): device state change: activated -> failed (reason 'ip-config-expired') [100 120 6]
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <warn> (test2): Activation: failed for connection 'tc2'
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): device state change: failed -> disconnected (reason 'none') [120 30 0]
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): deactivating device (reason 'none') [0]
Dec 5 10:05:54 qe-dell-ovs5-vm-45 NetworkManager[13884]: <info> (test2): device state change: disconnected -> unmanaged (reason 'none') [30 10 0]

I see this in the logs, and the device goes to unmanaged. This seems to be a regression from RHEL7.0 behavior. May-fail helped in older versions but is not helping here either.

In addition to steps from comment #0:

ip link add test1 type veth peer name test1p
ip link add test2 type veth peer name test2p
brctl addbr vethbr
brctl addif vethbr test1p test2p
ip link set dev test1 up
ip link set dev test1p up
ip link set dev test2 up
ip link set dev test2p up
nmcli connection add type ethernet con-name tc1 ifname test1 ip4 192.168.100.1/24
nmcli connection add type ethernet con-name tc2 ifname test2
service dhcpd start (config from https://bugzilla.redhat.com/show_bug.cgi?id=1139326#c0)
nmcli con up id tc2
service dhcpd stop
when the lease is over, wait some more time (120 s) to let NM finish its two tries
service dhcpd start

After ~5 minutes tc2 should be up again with gateway and IP all set.
This works in 0.9.9.1-29 but not in 0.9.11.0-6. In the older version I can see this after the lease is over:

Dec 5 10:26:24 qe-dell-ovs5-vm-45 dhclient[16547]: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 7 (xid=0x654d4b08)
Dec 5 10:26:24 qe-dell-ovs5-vm-45 NetworkManager: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 7 (xid=0x654d4b08)
Dec 5 10:26:31 qe-dell-ovs5-vm-45 dhclient[16547]: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 12 (xid=0x654d4b08)
Dec 5 10:26:31 qe-dell-ovs5-vm-45 NetworkManager: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 12 (xid=0x654d4b08)
Dec 5 10:26:43 qe-dell-ovs5-vm-45 dhclient[16547]: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 15 (xid=0x654d4b08)
Dec 5 10:26:43 qe-dell-ovs5-vm-45 NetworkManager: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 15 (xid=0x654d4b08)
Dec 5 10:26:58 qe-dell-ovs5-vm-45 dhclient[16547]: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 16 (xid=0x654d4b08)
Dec 5 10:26:58 qe-dell-ovs5-vm-45 NetworkManager: DHCPDISCOVER on test2 to 255.255.255.255 port 67 interval 16 (xid=0x654d4b08)
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <warn> (test2): DHCPv4 request timed out.
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> (test2): canceled DHCP transaction, DHCP client pid 16547
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> Activation (test2) Stage 4 of 5 (IPv4 Configure Timeout) scheduled...
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> Activation (test2) Stage 4 of 5 (IPv4 Configure Timeout) started...
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> (test2): device state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> Disabling autoconnect for connection 'tc2'.
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <warn> Activation (test2) failed for connection 'tc2'
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> Activation (test2) Stage 4 of 5 (IPv4 Configure Timeout) complete.
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> (test2): device state change: failed -> disconnected (reason 'none') [120 30 0]
Dec 5 10:27:04 qe-dell-ovs5-vm-45 NetworkManager[15898]: <info> (test2): deactivating device (reason 'none') [0]
Dec 5 10:27:04 qe-dell-ovs5-vm-45 avahi-daemon[542]: Withdrawing address record for fe80::28d0:5eff:fee1:f752 on test2.

Created attachment 1053329
[PATCH] device: don't disconnect after DHCP failure when there are static IPs
Don't disconnect the device when the DHCP renewal fails and there are
already configured static IP addresses on the device. Instead, keep
the device up and try DHCP again after some time.
This should solve the issue reported in the bug description. Tested for IPv4 only.
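The decision described in the patch summary can be sketched as a small standalone model. This is only an illustration, not NetworkManager code: the type and function names (`device_model`, `on_dhcp_failure`, the `ACTION_*` values) are hypothetical, and the 120-second delay is taken from the snippet quoted in the review comments on this bug rather than from the merged patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the device state;
 * none of these names are NetworkManager API. */
enum dhcp_fail_action {
    ACTION_KEEP_UP_AND_RETRY,  /* static addresses exist: stay activated */
    ACTION_FAIL_DEVICE         /* nothing else configured: device fails */
};

struct device_model {
    size_t   num_static_addrs; /* addresses taken from the connection profile */
    unsigned restart_delay_s;  /* 0 means no DHCP restart is pending */
};

/* Assumed retry delay, matching the value in the quoted review snippet. */
#define DHCP_RESTART_DELAY_S 120u

/* Decide what to do when a DHCP renewal fails on an activated device. */
static enum dhcp_fail_action
on_dhcp_failure(struct device_model *dev)
{
    assert(dev != NULL);

    /* Model of the cleanup step: drop any previously pending restart. */
    dev->restart_delay_s = 0;

    if (dev->num_static_addrs > 0) {
        /* Keep the device up and try DHCP again later. */
        dev->restart_delay_s = DHCP_RESTART_DELAY_S;
        return ACTION_KEEP_UP_AND_RETRY;
    }

    /* No static addresses: fall back to the existing failure path. */
    return ACTION_FAIL_DEVICE;
}
```

In this model, a device with one static address returns `ACTION_KEEP_UP_AND_RETRY` and records a pending restart, while a DHCP-only device returns `ACTION_FAIL_DEVICE` with no restart pending.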
LGTM

How about:

      && nm_ip4_config_get_num_addresses (priv->con_ip4_config) > 0) {
-         _LOGI (LOGD_DHCP4, "Scheduling DHCPv4 restart because device has IP addresses");
-         priv->dhcp4_restart_id = g_timeout_add_seconds (120, dhcp4_restart_cb, self);
+         if (!priv->dhcp4_restart_id) {
+                 _LOGI (LOGD_DHCP4, "Scheduling DHCPv4 restart because device has IP addresses");
+                 priv->dhcp4_restart_id = g_timeout_add_seconds (120, dhcp4_restart_cb, self);
+         }
          return;

and same for IPv6.

Also, what happens if the connection has ipvx.may-fail=yes? I think in that case we also should not tear down the connection -- but I don't see that that is happening...

(In reply to Thomas Haller from comment #9)
> How about:
>
>       && nm_ip4_config_get_num_addresses (priv->con_ip4_config) > 0) {
> -         _LOGI (LOGD_DHCP4, "Scheduling DHCPv4 restart because device has IP addresses");
> -         priv->dhcp4_restart_id = g_timeout_add_seconds (120, dhcp4_restart_cb, self);
> +         if (!priv->dhcp4_restart_id) {
> +                 _LOGI (LOGD_DHCP4, "Scheduling DHCPv4 restart because device has IP addresses");
> +                 priv->dhcp4_restart_id = g_timeout_add_seconds (120, dhcp4_restart_cb, self);
> +         }
>           return;
>
> and same for IPv6.

This isn't required, as priv->dhcp4_restart_id is always cleared some lines above in dhcpx_cleanup().

> Also, what happens if the connection has ipvx.may-fail=yes? I think in that
> case we also should not tear down the connection -- but I don't see that
> that is happening...

If there are no static addresses, dhcpx_fail() schedules nm_device_activate_ipx_config_timeout(), which in turn calls act_stage4_ipx_config_timeout() to set the new device state according to the 'may-fail' setting.

If there are static addresses configured and DHCP fails, the value of ipvx.may-fail is not considered because at least the "static" method succeeded.

Upstream bug https://bugzilla.gnome.org/show_bug.cgi?id=741347 contains a rework of IP configuration failures and includes a more general fix for this issue. Please review the branch posted there.

Since the issue was blocking automated tests, I merged the attached patch. The other improvements mentioned in comment 11 are not so urgent and can be discussed separately in the upstream bug.

master:
abc96ec device: don't disconnect after DHCP failure when there are static IPs
905220b device: fix clearing of dhcp6_restart_id in dhcp6_cleanup()

nm-1-0:
80b3081 device: don't disconnect after DHCP failure when there are static IPs
eb1ccf9 device: fix clearing of dhcp6_restart_id in dhcp6_cleanup()

The veth device doesn't go down and a DHCP request is sent out every 5 minutes. Tested on all supported architectures.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2315.html