Created attachment 1037570 [details]
attached screenshot for vm form pxe

Description of problem:
Configure the network with bond0 on RHEV-H, then enter the hosted engine page and configure the hosted engine using "PXE Boot Engine VM". The VM cannot get an IP address via PXE when "bond0" is used on RHEV-H. (See vm form pxe.png)

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.7-20150609.0
ovirt-node-plugin-hosted-engine-0.2.0-15.0.el6ev.noarch
ovirt-node-3.2.3-3.el6.noarch
ovirt-hosted-engine-ha-1.2.6-2.el6ev.noarch
ovirt-hosted-engine-setup-1.2.4-2.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHEV-H 6.7/7.1 and configure the network with bond0
2. Enter the hosted engine page and configure the hosted engine using "PXE Boot Engine VM"

Actual results:
The VM cannot get an IP address via PXE when "bond0" is used on RHEV-H.

Expected results:
The VM gets an IP address via PXE when "bond0" is used on RHEV-H.

Additional info:
Network info on the RHEV-H side.

[root@hp-z600-03 admin]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.18a905bf8be6       no              bond0
                                                        vnet0

[root@hp-z600-03 admin]# ifconfig
bond0     Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          inet6 addr: fe80::1aa9:5ff:febf:8be6/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:212893 errors:0 dropped:0 overruns:0 frame:0
          TX packets:68394 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:205474380 (195.9 MiB)  TX bytes:16101541 (15.3 MiB)

eth0      Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:99522 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:98593731 (94.0 MiB)  TX bytes:7805421 (7.4 MiB)
          Interrupt:17

eth1      Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:112559 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34340 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:106831829 (101.8 MiB)  TX bytes:8020376 (7.6 MiB)
          Interrupt:37

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:7217 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7217 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2014808 (1.9 MiB)  TX bytes:2014808 (1.9 MiB)

rhevm     Link encap:Ethernet  HWaddr 18:A9:05:BF:8B:E6
          inet addr:10.66.72.13  Bcast:10.66.73.255  Mask:255.255.254.0
          inet6 addr: fe80::1aa9:5ff:febf:8be6/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:155441 errors:0 dropped:0 overruns:0 frame:0
          TX packets:63750 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:198879011 (189.6 MiB)  TX bytes:15180325 (14.4 MiB)

vnet0     Link encap:Ethernet  HWaddr FE:16:3E:62:3F:06
          inet6 addr: fe80::fc16:3eff:fe62:3f06/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1108 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:1716 (1.6 KiB)  TX bytes:95588 (93.3 KiB)
Sandro, is it supported to install hosted-engine on a host with a predefined bond?
(In reply to Fabian Deutsch from comment #2)
> Sandro, is it supported to install hosted-engine on a host with a
> predefined bond?

Yes, support has been added in 3.5 with bug #1078206.
Thanks Sandro. Haiyang, can you please try to reproduce this bug on plain RHEL with hosted-engine, to find out if it is a problem of the network or hardware?
(In reply to Fabian Deutsch from comment #4)
> Thanks Sandro.
>
> Haiyang, can you please try to reproduce this bug on plain RHEL with
> hosted-engine, to find out if it is a problem of the network or hardware?

Hey Fabian,

I think this is a RHEV-H-specific bug, because:
1. According to https://bugzilla.redhat.com/show_bug.cgi?id=1078206#c29, it works on RHEL with hosted-engine.
2. It should not be an environment issue either, because bond0 on the RHEV-H host itself can get an IP address.
I can also reproduce this. The dhcp request is going out, I also see a reply, but the reply does not reach the guest. Dan/Sandro, can you tell if some procfs, sysfs or iptables rule must be set to allow ARP/dhcp replies to reach a VM?
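For reference, one way to confirm where the reply gets lost is to watch the DHCP/BOOTP traffic at each hop between the wire and the guest. This is only a sketch; the interface names are taken from the layout in this bug and should be adjusted as needed:

# If the DHCPOFFER shows up on bond0 but never on vnet0, the reply is being
# dropped between the bond/bridge and the VM's tap device rather than on the
# network.
tcpdump -i bond0 -n -e port 67 or port 68
tcpdump -i rhevm -n -e port 67 or port 68
tcpdump -i vnet0 -n -e port 67 or port 68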
(In reply to Fabian Deutsch from comment #6)
> I can also reproduce this.
>
> The dhcp request is going out, I also see a reply, but the reply does not
> reach the guest.
>
> Dan/Sandro, can you tell if some procfs, sysfs or iptables rule must be set
> to allow ARP/dhcp replies to reach a VM?

DHCP requests are outgoing connections, so no special port needs to be open for DHCP to work. My own laptop has:

# Generated by iptables-save v1.4.21 on Tue Jun 16 08:31:41 2015
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [136726:66659941]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5432 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 3128 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Tue Jun 16 08:31:41 2015

And DHCP works fine with it. I'm not aware of any special requirement for allowing PXE boot to work on the HE host or VM side.
Thanks Sandro. Miroslav, have you seen a bug with symptoms as described in comment 0?
How was bond0 defined? On top of which NICs, and with which options? I'm not sure that this is related to the firewall - can you check whether the bug still reproduces when the firewall is turned off?
I could also reproduce this bug with the firewall turned off.

The network layout is:

+ rhevm
  + bond0
  | + eth0
  | + eth1
  + vnet0 (VM)

networks = {'rhevm': {'addr': '192.168.2.107',
                      'bootproto4': 'dhcp',
                      'bridged': True,
                      'cfg': {'BOOTPROTO': 'dhcp',
                              'DEFROUTE': 'yes',
                              'DELAY': '0',
                              'DEVICE': 'rhevm',
                              'HOTPLUG': 'no',
                              'NM_CONTROLLED': 'no',
                              'ONBOOT': 'yes',
                              'STP': 'off',
                              'TYPE': 'Bridge'},
                      'gateway': '192.168.2.1',
                      'iface': 'rhevm',
                      'ipv4addrs': ['192.168.2.107/24'],
                      'ipv6addrs': ['fd82:6903:8934:42:5054:ff:fe2c:3c8e/64',
                                    '2003:56:ce42:f871:5054:ff:fe2c:3c8e/64',
                                    'fe80::5054:ff:fe2c:3c8e/64'],
                      'ipv6gateway': 'fe80::1',
                      'mtu': '1500',
                      'netmask': '255.255.255.0',
                      'ports': ['bond0', 'vnet0'],
                      'stp': 'off'}}

nics = {'eth0': {'addr': '',
                 'cfg': {'DEVICE': 'eth0',
                         'HWADDR': '52:54:00:2c:3c:8e',
                         'MASTER': 'bond0',
                         'MTU': '1500',
                         'NM_CONTROLLED': 'no',
                         'ONBOOT': 'yes',
                         'SLAVE': 'yes'},
                 'hwaddr': '52:54:00:2c:3c:8e',
                 'ipv4addrs': [],
                 'ipv6addrs': ['fe80::5054:ff:fe2c:3c8e/64'],
                 'mtu': '1500',
                 'netmask': '',
                 'permhwaddr': '52:54:00:2c:3c:8e',
                 'speed': 0},
        'eth1': {'addr': '',
                 'cfg': {'DEVICE': 'eth1',
                         'HWADDR': '52:54:00:c0:7a:da',
                         'MASTER': 'bond0',
                         'MTU': '1500',
                         'NM_CONTROLLED': 'no',
                         'ONBOOT': 'yes',
                         'SLAVE': 'yes'},
                 'hwaddr': '52:54:00:2c:3c:8e',
                 'ipv4addrs': [],
                 'ipv6addrs': ['fe80::5054:ff:fe2c:3c8e/64'],
                 'mtu': '1500',
                 'netmask': '',
                 'permhwaddr': '52:54:00:c0:7a:da',
                 'speed': 0}}
Can you attach your ifcfg-bond0 (with its mode and options)? Does a "ping" traverse bond0? Is it UP and an active slave?
(In reply to Dan Kenigsberg from comment #11)
> Can you attach your ifcfg-bond0 (with its mode and options)?

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
# Generated by VDSM version 4.16.20-1.el6ev
DEVICE=bond0
BONDING_OPTS='mode=balance-rr miimon=100'
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
HOTPLUG=no

# ip link show dev bond0
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 52:54:00:2c:3c:8e brd ff:ff:ff:ff:ff:ff

# ip a show dev bond0
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 52:54:00:2c:3c:8e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe2c:3c8e/64 scope link
       valid_lft forever preferred_lft forever

> Does a "ping" traverse bond0? Is it UP and an active slave?

Yes:

# ping 192.168.2.1 -c1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=2.36 ms

--- 192.168.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 2.362/2.362/2.362/0.000 ms
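For completeness, the bonding driver's own view of the mode and slave state can also be checked directly. A sketch (output obviously differs per setup):

# The bonding driver exposes the mode and per-slave state here; note that
# with balance-rr every slave counts as active, so a switch that is not
# aggregating the ports will not show up in this file, only as lost or
# reordered frames on the wire.
cat /proc/net/bonding/bond0

# Equivalent sysfs views of the mode and the slave list.
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/slaves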
When dropping to the iPXE shell I see that packets are received:

gPXE> ifstat
net0: 00:16:3e:19:c5:63 on PCI00:03.0 (open)
  [Link:up, TX:0 TXE:0 RX:73 RXE:0]
gPXE> dhcp net0
DHCP (net0: 00:16:3e:19:c5:63)....................................................... Connection timed out (0x4c106035)
Could not configure net0: Connection timed out (0x4c106035)
gPXE>
For the record: using a plain device (eth0) in the same setup works; occasionally bug 1206884 appears.
This also happens with bond mode=4.
First try: can't reproduce with bond mode 4 on rhev-hypervisor6-6.6-20150609.0.el6ev.
Maybe this is related to bug 1094842. However, after all my research I have not found any clue that this is RHEV-H specific, so I am moving this to hosted-engine-setup for now for further investigation.
Second run: can't reproduce with bond mode 4 on rhev-hypervisor6-6.6-20150609.0.el6ev. The VM is up and got a valid IP address. It's not booting from PXE because I don't have a TFTP server on my network, but I don't see any DHCP issue here. I see the bug was opened against rhev-hypervisor6-6.7-20150609.0 - maybe something related to 6.7?
Need to correct myself: I'm using the 6.7 node too.
If you manage to reproduce, please attach a full sosreport (or at least the /var/log content if sos doesn't work).
The balance-rr mode requires switch configuration, because each successive packet is sent out a different bond port. The ports on the switch have to be aggregated, or you can change the mode to active-backup.
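For illustration, switching the bond from comment 12 to active-backup would look roughly like this. This is only a sketch: everything except BONDING_OPTS is copied from that comment, and the values should be adjusted to the actual setup.

# /etc/sysconfig/network-scripts/ifcfg-bond0
# active-backup (mode=1) works without switch-side link aggregation:
# only one slave transmits at a time, so replies always come back on
# the port the switch learned the MAC from.
DEVICE=bond0
BONDING_OPTS='mode=active-backup miimon=100'
BRIDGE=rhevm
ONBOOT=yes
NM_CONTROLLED=no
HOTPLUG=no

Note that the bonding mode only takes effect after the bond is taken down and brought up again (e.g. ifdown bond0; ifup bond0).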
I cannot provide the sosreport for now, because sos does not work on 6.7 yet. But it is really easy and quick to reproduce.
But it may be as Vlad says: that the bond mode was chosen incorrectly.
So closing as not a bug?
Let us confirm with Haiyang what bonding mode he used, then I'd be fine to close it as NOTABUG, but we need to somehow ensure that only the correct bonding modes are used for VM networks, as described in bug 1094842. Haiyang, what bond mode did you use?
(In reply to Fabian Deutsch from comment #26)
> Let us confirm with Haiyang what bonding mode he used, then I'd be fine to
> close it as NOTABUG, but we need to somehow ensure that only the correct
> bonding modes are used for VM networks, as described in bug 1094842.
>
> Haiyang, what bond mode did you use?

I used the default bond mode, "mode=balance-rr".
Okay, then I'm closing this as a duplicate of bug 1094842, according to comments 22 and 27. *** This bug has been marked as a duplicate of bug 1094842 ***
Sandro, should he-setup - or vdsm? - throw a warning if a known-to-be non-working bond mode is used?
I guess it should be he-setup.
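Until such a warning exists in he-setup, a manual pre-flight check is easy to script. A sketch; the list of VM-network-safe modes below follows the recommendation discussed in bug 1094842 and is an assumption here, not something taken from the he-setup code:

#!/bin/bash
# Pre-flight check: warn if the bond a VM network sits on uses a bonding
# mode that is known to break bridged (VM) traffic.
# Assumption: modes 1 (active-backup), 2 (balance-xor), 3 (broadcast) and
# 4 (802.3ad) are the ones usable under a VM network.
BOND=${1:-bond0}
# /sys/class/net/<bond>/bonding/mode contains "<name> <number>".
mode=$(awk '{print $2}' "/sys/class/net/$BOND/bonding/mode")
case "$mode" in
    1|2|3|4) echo "$BOND uses bonding mode $mode - OK for a VM network." ;;
    *)       echo "WARNING: $BOND uses bonding mode $mode - DHCP/PXE replies may never reach the guest." ;;
esac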