Created attachment 1037570 [details]
attached screenshot for vm form pxe

Description of problem:
Configure the network with bond0 on RHEV-H, then enter the hosted engine page and configure the hosted engine using "PXE Boot Engine VM". The VM cannot get an IP address via PXE when "bond0" is used on RHEV-H. (See vm form pxe.png)

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.7-20150609.0
ovirt-node-plugin-hosted-engine-0.2.0-15.0.el6ev.noarch
ovirt-node-3.2.3-3.el6.noarch
ovirt-hosted-engine-ha-1.2.6-2.el6ev.noarch
ovirt-hosted-engine-setup-1.2.4-2.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHEV-H 6.7/7.1 and configure the network with bond0
2. Enter the hosted engine page and configure the hosted engine using "PXE Boot Engine VM"

Actual results:
The VM cannot get an IP address via PXE when "bond0" is used on RHEV-H.

Expected results:
The VM gets an IP address via PXE when "bond0" is used on RHEV-H.

Additional info:
Network info on the RHEV-H side.

[root@hp-z600-03 admin]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.18a905bf8be6       no              bond0
                                                        vnet0

[root@hp-z600-03 admin]# ifconfig
bond0     Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          inet6 addr: fe80::1aa9:5ff:febf:8be6/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:212893 errors:0 dropped:0 overruns:0 frame:0
          TX packets:68394 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:205474380 (195.9 MiB)  TX bytes:16101541 (15.3 MiB)

eth0      Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:99522 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:98593731 (94.0 MiB)  TX bytes:7805421 (7.4 MiB)
          Interrupt:17

eth1      Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:112559 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34340 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:106831829 (101.8 MiB)  TX bytes:8020376 (7.6 MiB)
          Interrupt:37

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7217 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7217 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2014808 (1.9 MiB)  TX bytes:2014808 (1.9 MiB)

rhevm     Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          inet addr:10.66.72.13  Bcast:10.66.73.255  Mask:255.255.254.0
          inet6 addr: fe80::1aa9:5ff:febf:8be6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:155441 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63750 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:198879011 (189.6 MiB)  TX bytes:15180325 (14.4 MiB)

vnet0     Link encap:Ethernet  HWaddr FE:16:3E:62:3F:06
          inet6 addr: fe80::fc16:3eff:fe62:3f06/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1108 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:1716 (1.6 KiB)  TX bytes:95588 (93.3 KiB)
Sandro, is it supported to install hosted-engine on a host with a predefined bond?
(In reply to Fabian Deutsch from comment #2)
> Sandro, is it supported to install hosted-engine on a host with a
> predefined bond?

Yes, support has been added in 3.5 with bug #1078206.
Thanks Sandro. Haiyang, can you please try to reproduce this bug on plain RHEL with hosted-engine, to find out if it is a problem of the network or hardware?
(In reply to Fabian Deutsch from comment #4)
> Thanks Sandro.
>
> Haiyang, can you please try to reproduce this bug on plain RHEL with
> hosted-engine, to find out if it is a problem of the network or hardware?

Hey Fabian,

I think this is a RHEV-H-specific bug, because:
1. According to https://bugzilla.redhat.com/show_bug.cgi?id=1078206#c29, it works on RHEL with hosted-engine.
2. It should not be an environment issue either, because bond0 on the RHEV-H host itself can get an IP address.
I can also reproduce this. The dhcp request is going out, I also see a reply, but the reply does not reach the guest. Dan/Sandro, can you tell if some procfs, sysfs or iptables rule must be set to allow ARP/dhcp replies to reach a VM?
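For reference, one way to confirm where the reply gets lost is to watch the DHCP/BOOTP traffic at each hop between the wire and the guest. This is only a sketch; the interface names are taken from the layout in this bug and should be adjusted as needed:

# If the DHCPOFFER shows up on bond0 but never on vnet0, the reply is being
# dropped between the bond/bridge and the VM's tap device rather than on the
# network.
tcpdump -i bond0 -n -e port 67 or port 68
tcpdump -i rhevm -n -e port 67 or port 68
tcpdump -i vnet0 -n -e port 67 or port 68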
(In reply to Fabian Deutsch from comment #6)
> I can also reproduce this.
>
> The dhcp request is going out, I also see a reply, but the reply does not
> reach the guest.
>
> Dan/Sandro, can you tell if some procfs, sysfs or iptables rule must be set
> to allow ARP/dhcp replies to reach a VM?

DHCP requests are outgoing connections, so no special port needs to be open for DHCP to work. My own laptop has:

# Generated by iptables-save v1.4.21 on Tue Jun 16 08:31:41 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [136726:66659941]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5432 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 3128 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Tue Jun 16 08:31:41 2015

And DHCP works fine with it. I'm not aware of any special requirement for allowing PXE boot to work on the HE host or VM side.
Thanks Sandro. Miroslav, have you seen a bug with symptoms as described in comment 0?
How was bond0 defined? On top of which NICs, and with which options? I'm not sure that this is related to the firewall - can you check whether the bug still reproduces when the firewall is turned off?
I could also reproduce this bug with the firewall turned off.

The network layout is:

+ rhevm
  + bond0
  | + eth0
  | + eth1
  + vnet0 (VM)

networks = {'rhevm': {'addr': '192.168.2.107',
                      'bootproto4': 'dhcp',
                      'bridged': True,
                      'cfg': {'BOOTPROTO': 'dhcp',
                              'DEFROUTE': 'yes',
                              'DELAY': '0',
                              'DEVICE': 'rhevm',
                              'HOTPLUG': 'no',
                              'NM_CONTROLLED': 'no',
                              'ONBOOT': 'yes',
                              'STP': 'off',
                              'TYPE': 'Bridge'},
                      'gateway': '192.168.2.1',
                      'iface': 'rhevm',
                      'ipv4addrs': ['192.168.2.107/24'],
                      'ipv6addrs': ['fd82:6903:8934:42:5054:ff:fe2c:3c8e/64',
                                    '2003:56:ce42:f871:5054:ff:fe2c:3c8e/64',
                                    'fe80::5054:ff:fe2c:3c8e/64'],
                      'ipv6gateway': 'fe80::1',
                      'mtu': '1500',
                      'netmask': '255.255.255.0',
                      'ports': ['bond0', 'vnet0'],
                      'stp': 'off'}}

nics = {'eth0': {'addr': '',
                 'cfg': {'DEVICE': 'eth0',
                         'HWADDR': '52:54:00:2c:3c:8e',
                         'MASTER': 'bond0',
                         'MTU': '1500',
                         'NM_CONTROLLED': 'no',
                         'ONBOOT': 'yes',
                         'SLAVE': 'yes'},
                 'hwaddr': '52:54:00:2c:3c:8e',
                 'ipv4addrs': [],
                 'ipv6addrs': ['fe80::5054:ff:fe2c:3c8e/64'],
                 'mtu': '1500',
                 'netmask': '',
                 'permhwaddr': '52:54:00:2c:3c:8e',
                 'speed': 0},
        'eth1': {'addr': '',
                 'cfg': {'DEVICE': 'eth1',
                         'HWADDR': '52:54:00:c0:7a:da',
                         'MASTER': 'bond0',
                         'MTU': '1500',
                         'NM_CONTROLLED': 'no',
                         'ONBOOT': 'yes',
                         'SLAVE': 'yes'},
                 'hwaddr': '52:54:00:2c:3c:8e',
                 'ipv4addrs': [],
                 'ipv6addrs': ['fe80::5054:ff:fe2c:3c8e/64'],
                 'mtu': '1500',
                 'netmask': '',
                 'permhwaddr': '52:54:00:c0:7a:da',
                 'speed': 0}}
Can you attach your ifcfg-bond0 (with its mode and options)? Does a "ping" traverse bond0? Is it UP and an active slave?
(In reply to Dan Kenigsberg from comment #11)
> Can you attach your ifcfg-bond0 (with its mode and options)?

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
# Generated by VDSM version 4.16.20-1.el6ev
DEVICE=bond0
BONDING_OPTS='mode=balance-rr miimon=100'
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
HOTPLUG=no

# ip link show dev bond0
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 52:54:00:2c:3c:8e brd ff:ff:ff:ff:ff:ff

# ip a show dev bond0
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 52:54:00:2c:3c:8e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe2c:3c8e/64 scope link
       valid_lft forever preferred_lft forever

> Does a "ping" traverse bond0? Is it UP and an active slave?

Yes:

# ping 192.168.2.1 -c1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=2.36 ms

--- 192.168.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 2.362/2.362/2.362/0.000 ms
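For completeness, the bonding driver's own view of the mode and slave state can also be checked directly. A sketch (output obviously differs per setup):

# The bonding driver exposes the mode and per-slave state here; note that
# with balance-rr every slave counts as active, so a switch that is not
# aggregating the ports will not show up in this file, only as lost or
# reordered frames on the wire.
cat /proc/net/bonding/bond0

# Equivalent sysfs views of the mode and the slave list.
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/slaves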
When dropping to the iPXE shell I see that packets are received:

gPXE> ifstat
net0: 00:16:3e:19:c5:63 on PCI00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:73 RXE:0]
gPXE> dhcp net0
DHCP (net0: 00:16:3e:19:c5:63)....................................................... Connection timed out (0x4c106035)
Could not configure net0: Connection timed out (0x4c106035)
gPXE>
For the record: using a plain device (eth0) in the same setup works; occasionally bug 1206884 appears.
This also happens with bond mode=4.
First try: can't reproduce with bond mode 4 on rhev-hypervisor6-6.6-20150609.0.el6ev.
Maybe this is related to bug 1094842. However, after all my research I have not found any clue that this is RHEV-H specific, so I am moving this to hosted-engine-setup for now for further investigation.
Second run: can't reproduce with bond mode 4 on rhev-hypervisor6-6.6-20150609.0.el6ev. The VM is up and got a valid IP address. It's not booting from PXE because I don't have a TFTP server on my network, but I don't see any DHCP issue here. I see the bug was opened against rhev-hypervisor6-6.7-20150609.0 - maybe something related to 6.7?
Need to correct myself: I'm using the 6.7 node too.
If you manage to reproduce, please attach a full sosreport (or at least the /var/log content if sos doesn't work).
The balance-rr mode requires switch configuration, because each successive packet is sent out a different bond port. The ports on the switch have to be aggregated, or you can change the mode to active-backup.
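For illustration, switching the bond from comment 12 to active-backup would look roughly like this. This is only a sketch: everything except BONDING_OPTS is copied from that comment, and the values should be adjusted to the actual setup.

# /etc/sysconfig/network-scripts/ifcfg-bond0
# active-backup (mode=1) works without switch-side link aggregation:
# only one slave transmits at a time, so replies always come back on
# the port the switch learned the MAC from.
DEVICE=bond0
BONDING_OPTS='mode=active-backup miimon=100'
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
HOTPLUG=no

Note that the bonding mode only takes effect after the bond is taken down and brought up again (e.g. ifdown bond0; ifup bond0).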
I cannot provide the sosreport for now, because sos does not work on 6.7 yet. But it is really easy and quick to reproduce.
But it may be as Vlad says: that the bond mode was chosen incorrectly.
So closing as not a bug?
Let us confirm with Haiyang what bonding mode he used, then I'd be fine to close it as NOTABUG, but we need to somehow ensure that only the correct bonding modes are used for VM networks, as described in bug 1094842. Haiyang, what bond mode did you use?
(In reply to Fabian Deutsch from comment #26)
> Let us confirm with Haiyang what bonding mode he used, then I'd be fine to
> close it as NOTABUG, but we need to somehow ensure that only the correct
> bonding modes are used for VM networks, as described in bug 1094842.
>
> Haiyang, what bond mode did you use?

I used the default bond mode, "mode=balance-rr".
Okay, then I'm closing this as a duplicate of bug 1094842, according to comments 22 and 27. *** This bug has been marked as a duplicate of bug 1094842 ***
Sandro, should he-setup - or vdsm? - throw a warning if a known-to-be non-working bond mode is used?
I guess it should be he-setup.
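Until such a warning exists in he-setup, a manual pre-flight check is easy to script. A sketch; the list of VM-network-safe modes below follows the recommendation discussed in bug 1094842 and is an assumption here, not something taken from the he-setup code:

#!/bin/bash
# Pre-flight check: warn if the bond a VM network sits on uses a bonding
# mode that is known to break bridged (VM) traffic.
# Assumption: modes 1 (active-backup), 2 (balance-xor), 3 (broadcast) and
# 4 (802.3ad) are the ones usable under a VM network.
BOND=${1:-bond0}
# /sys/class/net/<bond>/bonding/mode contains "<name> <number>".
mode=$(awk '{print $2}' "/sys/class/net/$BOND/bonding/mode")
case "$mode" in
    1|2|3|4) echo "$BOND uses bonding mode $mode - OK for a VM network." ;;
    *)       echo "WARNING: $BOND uses bonding mode $mode - DHCP/PXE replies may never reach the guest." ;;
esac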