Bug 1005278 - [RHEVM] host installation fails if host has multiple interfaces configured with DHCP IP (default route changes during install)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-host-deploy
Classification: oVirt
Component: Plugins.VDSM
Version: master
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Alon Bar-Lev
QA Contact: Haim
URL:
Whiteboard: network
Depends On: 1007476
Blocks:
 
Reported: 2013-09-06 14:56 UTC by Martin Pavlik
Modified: 2016-02-10 19:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-16 22:53:31 UTC
oVirt Team: Network
Embargoed:
mpavlik: devel_ack?


Attachments
sos_reoprt_from_node (5.05 MB, application/x-xz)
2013-09-06 14:56 UTC, Martin Pavlik
sos_reoprt_from_engine (5.27 MB, application/x-xz)
2013-09-06 14:58 UTC, Martin Pavlik
deploy_log (35.77 KB, application/x-compressed-tar)
2013-09-06 14:59 UTC, Martin Pavlik
ip commands after host is added (5.20 KB, text/plain)
2013-09-09 08:16 UTC, Martin Pavlik
ip commands before host is added (5.88 KB, text/plain)
2013-09-09 08:16 UTC, Martin Pavlik
ifcfg files (462 bytes, application/gzip)
2013-09-09 08:16 UTC, Martin Pavlik


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 19206 0 None MERGED packaging: tune: iosched: modify udev trigger to change event 2020-02-10 19:06:57 UTC

Description Martin Pavlik 2013-09-06 14:56:13 UTC
Created attachment 794807 [details]
sos_reoprt_from_node

Description of problem:

Host installation fails if the host has multiple interfaces configured with DHCP, because the default route changes during installation. In my case it changed from 10.34.66.254 (gateway for interface em1) to 10.34.67.62 (gateway for interface p1p1).

Version-Release number of selected component (if applicable):
is13

How reproducible:
100%

Steps to Reproduce:
1. Configure 2 interfaces on the host (RHEL 6.4/6.5) with DHCP; each interface should receive a different gateway.
2. Add the host to rhevm.


Actual results:
host install fails because of SSH session timeout

Expected results:
host is installed, default route not changed

Additional info:
engine.log
2013-09-06 16:28:58,562 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (pool-5-thread-50) Host installation failed for host 3511bdeb-2759-4ff7-8f59-191dfe728f9d, dell-07.: javax.naming.TimeLimitExceededException: SSH session timeout host 'root.66.71'
        at org.ovirt.engine.core.utils.ssh.SSHClient.executeCommand(SSHClient.java:480) [utils.jar:]
        at org.ovirt.engine.core.utils.ssh.SSHDialog.executeCommand(SSHDialog.java:311) [utils.jar:]
        at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:1039) [bll.jar:]
        at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:192) [bll.jar:]
        at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:105) [bll.jar:]




deploy log

2013-09-06 16:32:09 DEBUG otopi.context context._executeMethod:133 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-z2zXEXc3m6/pythonlib/otopi/context.py", line 123, in _executeMethod
    method['method']()
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/cli.py", line 162, in _pre_terminate
    note=_("\nProcessing ended, use 'quit' to quit\nCOMMAND> ")
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/cli.py", line 102, in _runCommandPrompt
    prompt=True,
  File "/tmp/ovirt-z2zXEXc3m6/otopi-plugins/otopi/dialog/machine.py", line 162, in queryString
    value = self._readline()
  File "/tmp/ovirt-z2zXEXc3m6/pythonlib/otopi/dialog.py", line 259, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2013-09-06 16:32:09 ERROR otopi.context context._executeMethod:142 Failed to execute stage 'Pre-termination': End of file

Comment 1 Martin Pavlik 2013-09-06 14:58:35 UTC
Created attachment 794808 [details]
sos_reoprt_from_engine

Comment 2 Martin Pavlik 2013-09-06 14:59:02 UTC
Created attachment 794809 [details]
deploy_log

Comment 3 Dan Kenigsberg 2013-09-07 23:07:50 UTC
Sorry, but I do not understand the bug and its logs. Would you restate the reproduction procedure? What is `ip a` and `ip route show table all` before the installation attempt? And when do they change?

Could it be that the very same change happens with `service network restart` regardless of host deployment?

Comment 4 Martin Pavlik 2013-09-09 08:14:43 UTC
Now I see what happened.

There are 4 interfaces on the machine (em1, em2, p1p1, p1p2); only em1 is supposed to be up after a clean install and boot.

This is set by ONBOOT="yes" in the ifcfg file for em1 and ONBOOT="no" for the rest of the NICs.

"ip a l" shows that this is what actually happened:

[root@dell-r210ii-07 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
3: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff


Restarting the network service does not change this state:


[root@dell-r210ii-07 ~]# service network restart
Shutting down interface em1:                               [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface em1:  
Determining IP information for em1... done.
                                                           [  OK  ]

[root@dell-r210ii-07 ~]# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
3: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff


While the host is being added to the setup, em2, p1p1 and p1p2 are brought up:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:04:29:88 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.36/27 brd 10.34.67.63 scope global p1p1
3: p1p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 90:e2:ba:04:29:89 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.33/27 brd 10.34.67.63 scope global p1p2
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:44 brd ff:ff:ff:ff:ff:ff
    inet 10.34.66.71/24 brd 10.34.66.255 scope global em1
5: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether d0:67:e5:f0:82:45 brd ff:ff:ff:ff:ff:ff
    inet 10.34.67.3/27 brd 10.34.67.31 scope global em2
7: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN 
    link/ether f6:53:46:fb:c2:58 brd ff:ff:ff:ff:ff:ff
8: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: bond4: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
11: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: bond3: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff


This changes the default route and prevents the host installation from finishing properly.
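
To make the symptom easier to check, here is a minimal Python sketch that pulls the default routes out of `ip route show` style text. It is not part of ovirt-host-deploy; the function name `default_routes` and the two sample strings are assumptions for illustration. The samples are constructed from the gateways named in the description, not copied from the attached logs.

```python
from typing import List, Tuple


def default_routes(route_output: str) -> List[Tuple[str, str]]:
    """Return (gateway, device) pairs for each default route in `ip route` text."""
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # The value follows its keyword, e.g. "default via <gateway> dev <device>".
        gateway = fields[fields.index("via") + 1] if "via" in fields else ""
        device = fields[fields.index("dev") + 1] if "dev" in fields else ""
        routes.append((gateway, device))
    return routes


# Constructed input (hypothetical), using the gateways from the description.
BEFORE = "default via 10.34.66.254 dev em1\n"
AFTER = "default via 10.34.67.62 dev p1p1\n"

if __name__ == "__main__":
    print("before:", default_routes(BEFORE))
    print("after: ", default_routes(AFTER))
```

Comparing the result before and after the host is added shows whether the default route moved to another interface.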

Comment 5 Martin Pavlik 2013-09-09 08:16:06 UTC
Created attachment 795523 [details]
ip commands after host is added

Comment 6 Martin Pavlik 2013-09-09 08:16:32 UTC
Created attachment 795524 [details]
ip commands before host is added

Comment 7 Martin Pavlik 2013-09-09 08:16:54 UTC
Created attachment 795525 [details]
ifcfg files

Comment 8 Assaf Muller 2013-09-09 15:50:09 UTC
Reproduced it on my machine. Working on figuring it out...

Comment 9 Assaf Muller 2013-09-11 13:02:10 UTC
ovirt-host-deploy runs "/sbin/udevadm trigger --type=devices", which runs ifup on all devices and breaks connectivity.

Comment 10 Harald Hoyer 2013-09-12 08:33:58 UTC
Either change ovirt-host-deploy to issue:
 "/sbin/udevadm trigger --action=change --type=devices" 

or you get what you ask for, which is ACTION==add (the default for udevadm trigger).


# cat /lib/udev/rules.d/60-net.rules 
ACTION=="add", SUBSYSTEM=="net", DEVPATH=="/devices/virtual/net/lo", RUN+="/sbin/ifup $env{INTERFACE}"
ACTION=="add", SUBSYSTEM=="net", PROGRAM="/lib/udev/rename_device", RESULT=="?*", ENV{INTERFACE_NAME}="$result"
SUBSYSTEM=="net", RUN+="/etc/sysconfig/network-scripts/net.hotplug"

# rpm -qf /lib/udev/rules.d/60-net.rules 
initscripts-9.03.27-1.el6.i686

/etc/sysconfig/network-scripts/net.hotplug:

            export IN_HOTPLUG=1
            exec /sbin/ifup $INTERFACE

/etc/sysconfig/network-scripts/ifup:
if [ -n "$IN_HOTPLUG" ] && [ "${HOTPLUG}" = "no" -o "${HOTPLUG}" = "NO" ]
then
    exit 0
fi

/usr/share/doc/initscripts-9.03.27/sysconfig.txt:
  Also, interfaces may be brought up via the hotplug scripts;
  in this case, HOTPLUG=no needs to be set to no to avoid this.
  This is useful e.g. to prevent bonding device activation by merely
  loading the bonding kernel module.

/etc/sysconfig/network-scripts/ifcfg-<interface-name> and
/etc/sysconfig/network-scripts/ifcfg-<interface-name>:<alias-name>:
    HOTPLUG=yes|no
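
The ifup check quoted above can be restated as a small Python predicate. This is only a model of the quoted shell condition, not code from initscripts; the function name `ifup_skips_interface` is hypothetical.

```python
from typing import Optional


def ifup_skips_interface(in_hotplug: str, hotplug: Optional[str]) -> bool:
    """Model of the quoted test:

        [ -n "$IN_HOTPLUG" ] && [ "${HOTPLUG}" = "no" -o "${HOTPLUG}" = "NO" ]

    Returns True when ifup would exit early without bringing the interface up.
    """
    # -n "$IN_HOTPLUG": true for any non-empty string (net.hotplug exports 1).
    # The HOTPLUG comparison is exact, so only "no" and "NO" match.
    return bool(in_hotplug) and hotplug in ("no", "NO")


if __name__ == "__main__":
    for value in ("no", "NO", "No", "yes", None):
        print(repr(value), ifup_skips_interface("1", value))
```

Under this reading, an interface is left alone by a hotplug event only if its ifcfg file sets HOTPLUG to exactly "no" or "NO"; ONBOOT="no" alone does not stop it.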

Comment 11 Alon Bar-Lev 2013-09-13 07:38:29 UTC
Changed per the recommendation in bug#1007476, which is not applicable to us but is correct given the current initscripts implementation.

commit ea1f74af86b6adb1affe8e91dde8308a0363f0f3
Author: Alon Bar-Lev <alonbl>
Date:   Fri Sep 13 10:12:28 2013 +0300

    packaging: tune: iosched: modify udev trigger to change event
    
    rhel initscripts does not distinguish where udev even comes from, so
    triggering hotplug even in all cases.
    
    change udev trigger to issue change event, as current initscripts
    ignores this event.
    
    current targeted udev rules handle change events so uneffected.
    
    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1005278
    Change-Id: I626c30eb1f281cf14a753f657c9313144df1b0db
    Signed-off-by: Alon Bar-Lev <alonbl>
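
For readers who want to see the shape of the change, here is a minimal Python sketch of a trigger call that requests a change event. It is an illustration only and not the code from the commit; the helper names `udev_trigger_command` and `run_trigger` are hypothetical. The `--action` and `--type=devices` options are the ones quoted in comment 10.

```python
import subprocess
from typing import List


def udev_trigger_command(action: str = "change") -> List[str]:
    """Build the udevadm argument list for the requested event type."""
    return [
        "/sbin/udevadm",
        "trigger",
        "--action=%s" % action,
        "--type=devices",
    ]


def run_trigger(action: str = "change") -> None:
    """Run the trigger; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(udev_trigger_command(action))


if __name__ == "__main__":
    # Print instead of executing, so the sketch is safe to run anywhere.
    print(" ".join(udev_trigger_command()))
```

Passing the action explicitly avoids relying on the default, which comment 10 states is add for udevadm trigger on this platform.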

Comment 12 Dan Kenigsberg 2013-09-13 11:05:01 UTC
I would not mind the posted fix. But please note that the reporter's DHCP configuration is problematic regardless of oVirt: if a host has two competing default gateways, the outcome is non-deterministic.

To keep one NIC's gateway from overriding the other's, you should add

  DEFROUTE=no

to at least one of the ifcfg files (before installation).
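
A pre-installation check along these lines could flag the risky configuration. The Python sketch below is an assumption-laden illustration, not an oVirt or initscripts tool: `parse_ifcfg` and `default_route_candidates` are hypothetical names, the ifcfg contents in the example are made up (the attached files are not reproduced here), and it assumes DEFROUTE is treated as "yes" when absent.

```python
from typing import Dict, List


def parse_ifcfg(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines from an ifcfg file, stripping quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def default_route_candidates(ifcfgs: Dict[str, str]) -> List[str]:
    """Names of DHCP interfaces that do not opt out with DEFROUTE=no."""
    names = []
    for name, text in sorted(ifcfgs.items()):
        cfg = parse_ifcfg(text)
        if cfg.get("BOOTPROTO", "").lower() != "dhcp":
            continue
        # Assumption: a missing DEFROUTE behaves like DEFROUTE=yes.
        if cfg.get("DEFROUTE", "yes").lower() == "no":
            continue
        names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = {
        "em1": 'DEVICE="em1"\nBOOTPROTO="dhcp"\nONBOOT="yes"\n',
        "p1p1": 'DEVICE="p1p1"\nBOOTPROTO="dhcp"\nONBOOT="no"\n',
    }
    competing = default_route_candidates(example)
    if len(competing) > 1:
        print("competing default routes possible on:", ", ".join(competing))
```

If more than one name is returned, adding DEFROUTE=no to all but one of those ifcfg files removes the competition.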

Comment 13 Alon Bar-Lev 2013-09-14 18:25:23 UTC
(In reply to Dan Kenigsberg from comment #12)
> I would not mind the posted fix. But please note that the reporter's DHCP
> configuration is problematic regardless of oVirt: if a host has two
> competing default gateways, the outcome is non-deterministic.
> 
> To keep one NIC's gateway from overriding the other's, you should add
> 
>   DEFROUTE=no
> 
> to at least one of the ifcfg files (before installation).

Finally! proper root cause analysis!

This means that this configuration is unusable anyway.

Thanks!

Comment 14 Alon Bar-Lev 2013-09-16 22:53:31 UTC
Just to make it clear, the fix in host-deploy does not fix this issue: whenever this computer is rebooted, or one of the interfaces is unplugged and plugged back in, there is a chance it becomes unresponsive.

So I am closing this as NOTABUG, as this is an invalid configuration.

