Created attachment 794807 [details]
sos_report_from_node

Description of problem:
Host installation fails if the host has multiple interfaces configured with a DHCP IP.
The default route changes during install - for me it changed from 10.34.66.254 (GW for interface em1) to 10.34.67.62 (GW for interface p1p1).

Version-Release number of selected component (if applicable):
is13

How reproducible:
100%

Steps to Reproduce:
1. configure 2 interfaces on the host (RHEL 6.4/6.5) with a DHCP IP, each interface with a different gateway
2. add the host to rhevm

Actual results:
host install fails because of an SSH session timeout

Expected results:
host is installed, default route not changed

Additional info:

engine.log:
2013-09-06 16:28:58,562 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (pool-5-thread-50) Host installation failed for host 3511bdeb-2759-4ff7-8f59-191dfe728f9d, dell-07.: javax.naming.TimeLimitExceededException: SSH session timeout host 'root.66.71'
	at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:480) [utils.jar:]
	at org.ovirt.engine.core.utils.ssh.SSHDialog.executeCommand(SSHDialog.java:311) [utils.jar:]
	at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:1039) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:192) [bll.jar:]
	at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:105) [bll.jar:]

deploy log:
2013-09-06 16:32:09 DEBUG otopi.context context._executeMethod:133 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-z2zXEXc3m6/pythonlib/otopi/context.py", line 123, in _executeMethod
    method['method']()
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/cli.py", line 162, in _pre_terminate
    note=_("\nProcessing ended, use 'quit' to quit\nCOMMAND> ")
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/cli.py", line 102, in _runCommandPrompt
    prompt=True,
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/machine.py", line 162, in queryString
    value = self._readline()
  File "/tmp/ovirt-z2zXEXc3m6/pythonlib/otopi/dialog.py", line 259, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2013-09-06 16:32:09 ERROR otopi.context context._executeMethod:142 Failed to execute stage 'Pre-termination': End of file
Created attachment 794808 [details]
sos_report_from_engine
Created attachment 794809 [details]
deploy_log
Sorry, but I do not understand the bug and its logs. Would you restate the reproduction procedure? What is the output of `ip a` and `ip route show table all` before the installation attempt, and when do they change? Could it be that the very same change happens with `service network restart`, regardless of host deployment?
Now I see what happened.

There are 4 interfaces on the machine (em1, em2, p1p1, p1p2). Only em1 is supposed to be up after a clean machine install and boot; this is given by ONBOOT="yes" in the ifcfg file for em1 and ONBOOT="no" for the rest of the NICs. From "ip a l" it is visible that this really happened:

[root@dell-r210ii-07 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
3: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff

A restart of the network service does not change this state:

[root@dell-r210ii-07 ~]# service network restart
Shutting down interface em1:                               [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface em1:  Determining IP information for em1... done.
                                                           [  OK  ]
[root@dell-r210ii-07 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
3: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff

In the process of adding the host to the setup, em2, p1p1 and p1p2 are brought up:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.36/27 brd 10.34.67.63 scope global p1p1
3: p1p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.33/27 brd 10.34.67.63 scope global p1p2
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.3/27 brd 10.34.67.31 scope global em2
7: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether f6:53:46:fb:c2:58 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: bond4: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
11: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: bond3: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

And that changes the default route and prevents the host installation from finishing properly.
Created attachment 795523 [details]
ip commands after host is added

Created attachment 795524 [details]
ip commands before host is added

Created attachment 795525 [details]
ifcfg files
Reproduced it on my machine. Working on figuring it out...
ovirt-host-deploy is running this line: "/sbin/udevadm trigger --type=devices" which is running ifup on all devices and messing up connectivity.
Either change ovirt-host-deploy to issue:

  /sbin/udevadm trigger --action=change --type=devices

or you get what you ask for with ACTION==add (which is the default action for udevadm trigger):

# cat /lib/udev/rules.d/60-net.rules
ACTION=="add", SUBSYSTEM=="net", DEVPATH=="/devices/virtual/net/lo", RUN+="/sbin/ifup $env{INTERFACE}"
ACTION=="add", SUBSYSTEM=="net", PROGRAM="/lib/udev/rename_device", RESULT=="?*", ENV{INTERFACE_NAME}="$result"
SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/net.hotplug"

# rpm -qf /lib/udev/rules.d/60-net.rules
initscripts-9.03.27-1.el6.i686

/etc/sysconfig/network-scripts/net.hotplug:
export IN_HOTPLUG=1
exec /sbin/ifup $INTERFACE

/etc/sysconfig/network-scripts/ifup:
if [ -n "$IN_HOTPLUG" ] && [ "${HOTPLUG}" = "no" -o "${HOTPLUG}" = "NO" ]
then
    exit 0
fi

/usr/share/doc/initscripts-9.03.27/sysconfig.txt:
  Also, interfaces may be brought up via the hotplug scripts; in this case,
  HOTPLUG needs to be set to no to avoid this. This is useful e.g. to prevent
  bonding device activation by merely loading the bonding kernel module.

  /etc/sysconfig/network-scripts/ifcfg-<interface-name> and
  /etc/sysconfig/network-scripts/ifcfg-<interface-name>:<alias-name>:
    HOTPLUG=yes|no
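The guard quoted above from ifup is the crux: an ifup invoked from the udev hotplug path is skipped only when the ifcfg file opts out with HOTPLUG=no. A runnable sketch of that decision logic, assuming the simplification into a function (the name `should_ifup` is mine, not part of initscripts):

```shell
#!/bin/sh
# Sketch of the initscripts ifup hotplug guard: when ifup is reached via
# the udev hotplug path (net.hotplug exports IN_HOTPLUG=1) and the ifcfg
# file sets HOTPLUG=no, the interface is left alone; every other path
# brings the interface up.
should_ifup() {
    IN_HOTPLUG=$1   # "1" when called via net.hotplug, empty otherwise
    HOTPLUG=$2      # value of HOTPLUG= from ifcfg-<interface>
    if [ -n "$IN_HOTPLUG" ] && { [ "$HOTPLUG" = "no" ] || [ "$HOTPLUG" = "NO" ]; }
    then
        echo "skip"
    else
        echo "ifup"
    fi
}

should_ifup 1 no     # hotplug path, HOTPLUG=no  -> skip
should_ifup 1 yes    # hotplug path, HOTPLUG=yes -> ifup
should_ifup "" no    # manual ifup               -> ifup
```

This is why the ACTION=="add" trigger brought every NIC up: none of the reporter's ifcfg files set HOTPLUG=no.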
Fixed per the recommendation of bug #1007476, which is not applicable for us but is correct given the current initscripts implementation:

commit ea1f74af86b6adb1affe8e91dde8308a0363f0f3
Author: Alon Bar-Lev <alonbl>
Date:   Fri Sep 13 10:12:28 2013 +0300

    packaging: tune: iosched: modify udev trigger to change event

    rhel initscripts does not distinguish where a udev event comes from, so
    it triggers hotplug in all cases.

    change udev trigger to issue a change event, as current initscripts
    ignores this event.

    current targeted udev rules handle change events, so they are unaffected.

    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1005278
    Change-Id: I626c30eb1f281cf14a753f657c9313144df1b0db
    Signed-off-by: Alon Bar-Lev <alonbl>
I would not mind the posted fix. But please note that the reporter's DHCP configuration is problematic regardless of oVirt: if a host has two competing default gateways, the outcome is non-deterministic.

To prevent one NIC's gateway from overriding the other's, you should add

  DEFROUTE=no

to at least one of the ifcfg files (before installation).
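A minimal sketch of what such an ifcfg file could look like; the device name and the other keys are illustrative, not the reporter's actual files:

```
# /etc/sysconfig/network-scripts/ifcfg-p1p1 (illustrative sketch)
DEVICE=p1p1
BOOTPROTO=dhcp
ONBOOT=no
# Do not let this NIC's DHCP lease install a default route,
# so it cannot compete with em1's gateway.
DEFROUTE=no
```

With DEFROUTE=no, dhclient still configures the address and subnet route for the interface, but leaves the default route to the other NIC.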
(In reply to Dan Kenigsberg from comment #12)
> I would not mind the posted fix. But please note that the reporter's DHCP
> configuration is problematic with no regards to oVirt: if a host has two
> competing default gateways, the outcome is non-deterministic.
>
> To avoid one nic's gateway to override the other's, you should add
>
> DEFROUTE=no
>
> to at least one of the ifcfg files (before installation).

Finally, a proper root cause analysis! This means that this configuration is unusable anyway. Thanks!
Just to make it clear: the fix in host-deploy does not resolve this issue. Whenever this computer is rebooted, or one of the interfaces is unplugged and plugged back in, there is a chance it becomes unresponsive. So I am closing this as NOTABUG, as this is an invalid configuration.