Bug 1426068

Summary: Provide fdi.dhcp_timeout kernel option and increase default value to 300 seconds
Product: Red Hat Satellite Reporter: Lukas Zapletal <lzap>
Component: Discovery ImageAssignee: Lukas Zapletal <lzap>
Status: CLOSED ERRATA QA Contact: Radovan Drazny <rdrazny>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.2.7CC: bkearney, jcallaha, lzap, rdrazny
Target Milestone: UnspecifiedKeywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
URL: http://projects.theforeman.org/issues/15195
Whiteboard:
Fixed In Version: foreman-discovery-image-3.4.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-21 16:54:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1426067    
Bug Blocks:    

Description Lukas Zapletal 2017-02-23 07:32:30 UTC
By default, NM spawns DHCP client with 45 seconds timeout. This small patch increases this to 300 second and adds configuration value to override this.

Several customers hit this already.

TRIAGE: Small patch, zero risk. Tested upstream.

To have this fix effective, FDI must be rebased onto RHEL 7.3 as well: https://bugzilla.redhat.com/show_bug.cgi?id=1426067

Comment 2 Satellite Program 2017-02-23 09:15:57 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/15195 has been resolved.

Comment 5 Lukas Zapletal 2017-06-28 10:29:48 UTC
6.3 (FDI 3.4)

Comment 6 Radovan Drazny 2017-12-29 21:40:10 UTC
Tested on foreman-discovery-image-3.4.1-3.
I suppose this fix should give the fdi to acquire an IP address on slow networks. I tried to run "Discover using DHCP" on a network withou a functioning DHCP server, expecting a 300s timeout. Checking the journalctl, the 300s timeout is actually configured for the dhclient. Despite this, the DHCP configuration always timed out exactly after 90s. Is this the expected behaviour?


Relevant journal:
Dec 29 21:24:14 fdi /usr/bin/discovery-menu[766]: Entering screen_info
Dec 29 21:24:14 fdi /usr/bin/discovery-menu[766]: TUI executing: systemctl stop foreman-proxy
Dec 29 21:24:14 fdi polkitd[819]: Registered Authentication Agent for unix-process:1002:1903 (system bus name :1.12 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_U
Dec 29 21:24:14 fdi polkitd[819]: Unregistered Authentication Agent for unix-process:1002:1903 (system bus name :1.12, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Dec 29 21:24:14 fdi /usr/bin/discovery-menu[766]: TUI executing: nm-configure primary '52:54:00:2f:25:59' ''
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.8978] keyfile: new connection /etc/NetworkManager/system-connections/primary (9ec7c7a4-ecde-11e7-ad77-5254002f2559,"primary")
Dec 29 21:24:14 fdi /usr/bin/discovery-menu[766]: TUI executing: nmcli connection up primary
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.8994] policy: auto-activating connection 'primary'
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9004] device (ens3): Activation: starting connection 'primary' (9ec7c7a4-ecde-11e7-ad77-5254002f2559)
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9007] device (ens3): state change: disconnected -> prepare (reason 'none') [30 40 0]
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9010] manager: NetworkManager state is now CONNECTING
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9021] device (ens3): state change: prepare -> config (reason 'none') [40 50 0]
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9208] device (ens3): state change: config -> ip-config (reason 'none') [50 70 0]
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9216] dhcp4 (ens3): activation: beginning transaction (timeout in 300 seconds)
Dec 29 21:24:14 fdi NetworkManager[750]: <info>  [1514582654.9317] dhcp4 (ens3): dhclient started with pid 1046
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0148] device (ens3): disconnecting connection 'primary' for new activation request
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0150] device (ens3): state change: ip-config -> deactivating (reason 'new-activation') [70 110 60]
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0156] manager: NetworkManager state is now DISCONNECTING
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0162] device (ens3): disconnecting for new activation request.
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0164] audit: op="connection-activate" uuid="9ec7c7a4-ecde-11e7-ad77-5254002f2559" name="primary" pid=1041 uid=0 result="success"
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0179] device (ens3): state change: deactivating -> disconnected (reason 'new-activation') [110 30 60]
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0374] dhcp4 (ens3): canceled DHCP transaction, DHCP client pid 1046
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0377] dhcp4 (ens3): state changed unknown -> done
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0389] manager: NetworkManager state is now DISCONNECTED
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0393] device (ens3): Activation: starting connection 'primary' (9ec7c7a4-ecde-11e7-ad77-5254002f2559)
Dec 29 21:24:15 fdi dbus[607]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec 29 21:24:15 fdi systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec 29 21:24:15 fdi dbus-daemon[607]: dbus[607]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0436] device (ens3): state change: disconnected -> prepare (reason 'none') [30 40 0]
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0441] manager: NetworkManager state is now CONNECTING
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0455] device (ens3): state change: prepare -> config (reason 'none') [40 50 0]
Dec 29 21:24:15 fdi dbus[607]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec 29 21:24:15 fdi dbus-daemon[607]: dbus[607]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec 29 21:24:15 fdi systemd[1]: Started Network Manager Script Dispatcher Service.
Dec 29 21:24:15 fdi nm-dispatcher[1049]: req:1 'down' [ens3]: new request (3 scripts)
Dec 29 21:24:15 fdi nm-dispatcher[1049]: req:1 'down' [ens3]: start running ordered scripts...
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0645] device (ens3): state change: config -> ip-config (reason 'none') [50 70 0]
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0647] dhcp4 (ens3): activation: beginning transaction (timeout in 300 seconds)
Dec 29 21:24:15 fdi NetworkManager[750]: <info>  [1514582655.0661] dhcp4 (ens3): dhclient started with pid 1058
Dec 29 21:24:15 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 8 (xid=0xedb86)
Dec 29 21:24:23 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 15 (xid=0xedb86)
Dec 29 21:24:38 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 9 (xid=0xedb86)
Dec 29 21:24:47 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 9 (xid=0xedb86)
Dec 29 21:24:56 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 16 (xid=0xedb86)
Dec 29 21:25:12 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 20 (xid=0xedb86)
Dec 29 21:25:32 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 10 (xid=0xedb86)
Dec 29 21:25:42 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 14 (xid=0xedb86)
Dec 29 21:25:44 fdi /usr/bin/discovery-menu[766]: TUI executing: nm-online -s -q --timeout=45
Dec 29 21:25:56 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 9 (xid=0xedb86)
Dec 29 21:26:05 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 8 (xid=0xedb86)
Dec 29 21:26:13 fdi dhclient[1058]: DHCPDISCOVER on ens3 to 255.255.255.255 port 67 interval 11 (xid=0xedb86)
Dec 29 21:26:18 fdi /usr/bin/discovery-menu[766]: Entering screen_network

Comment 7 Lukas Zapletal 2018-01-01 11:06:41 UTC
Hey, this is indeed not expected. We are missing "--wait" option in nmcli up command.

Comment 9 Radovan Drazny 2018-01-02 14:35:39 UTC
Switching to ASSIGNED, for the protocol.

Comment 11 Lukas Zapletal 2018-01-12 09:26:37 UTC
QA: As part off this fix, we reviewed all timeout options and cleaned it a bit, introduced one new and properly documented all:

https://github.com/theforeman/theforeman.org/pull/1000/files

fdi.dhcp_timeout - DHCP NetworkManager timeout in seconds (300 by default)

fdi.ipwait - wait time for IP to be available in proxy SSL cert start (120 by default)

fdi.nmwait - nmcli --wait option for NetworkManager (120 by default)

DHCP timeout is for NetworkManager DHCP, ipwait is wait time until IPv4 becomes available (no support for IPv6 yet in discovery), nmwait is for nmcli utility (does not make any difference actually)

Comment 12 Radovan Drazny 2018-01-16 13:15:54 UTC
Tested on foreman-discovery-image-3.4.3-1. 
DHCP config times out after 300 seconds (5 minutes) as expected. 

VERIFIED

Comment 13 Bryan Kearney 2018-01-19 14:30:13 UTC
This will be delivered with Satellite 6.3

Comment 14 Lukas Zapletal 2018-01-29 16:01:46 UTC
Googlers or Satellite 6.2.x customers:

Please use this build to workaround the issue:

http://people.redhat.com/~lzapleta/scratch/fdi/foreman-discovery-image-3.4.4-1.el7sat.noarch.rpm

Comment 15 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> > 
> > For information on the advisory, and where to find the updated files, follow the link below.
> > 
> > If the solution does not work for you, open a new bug report.
> > 
> > https://access.redhat.com/errata/RHSA-2018:0336