Created attachment 1192887 [details]
all log

Description of problem:
Networking is unstable when vlan over bond is configured via anaconda interactive installation or NM TUI.

--- 192.168.20.134 ping statistics ---
115 packets transmitted, 12 received, 89% packet loss, time 114001ms
rtt min/avg/max/mdev = 0.137/0.213/0.405/0.070 ms

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160817.0.x86_64
imgbased-0.8.4-1.el7ev.noarch
redhat-release-virtualization-host-4.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
Scenario 1: Configure vlan over bond via anaconda interactive installation.
1. Install RHVH interactively from the ISO (with the default kickstart).
2. Enter the network configuration page.
3. Add a bond network (select 2 NICs, set bond mode to active-backup) -> save.
4. Add a vlan network (select the bond created above, set the VLAN ID) -> save.
5. Save the network configuration.
6. Continue the installation.
7. Reboot and log in to RHVH.
8. Run 'ip addr'.

Scenario 2: Configure vlan over bond via NMTUI.

Actual results:
Scenario 1:
1. After steps 5 and 8, the vlan-over-bond network is unstable: RHVH sometimes obtains the vlan IP and sometimes does not.
2. 80%+ packet loss in the ping statistics.

Scenario 2:
The issue also reproduces when bond+vlan is configured via NMTUI. With DHCP on the bond+vlan, the IP address only appears occasionally; with a static bond+vlan, the vlan switch can only be pinged occasionally.

Expected results:
The vlan-over-bond network is stable, with no packet loss at any time.

Additional info:
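For reference, the same bond+vlan topology can also be built from a shell with nmcli; the following is a minimal sketch, where the NIC names eno1/eno2 and VLAN ID 20 are placeholders, not taken from this report. Note that this leaves IPv4 set to DHCP on the bond as well as on the VLAN, which matches the failing configuration described below:

# Bond in active-backup mode; ipv4.method defaults to 'auto' (DHCP)
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=active-backup,miimon=100"
# Enslave the two NICs (names are placeholders)
nmcli connection add type ethernet ifname eno1 con-name bond0-port1 master bond0
nmcli connection add type ethernet ifname eno2 con-name bond0-port2 master bond0
# VLAN on top of the bond (ID 20 is a placeholder)
nmcli connection add type vlan ifname bond0.20 con-name bond0.20 dev bond0 id 20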
The attached logs include: /var/log/*.*; /tmp/*.log; sosreport; /etc/sysconfig/network-scripts/*
Created attachment 1192902 [details] sosreport
Created attachment 1192903 [details] network script
Created attachment 1192904 [details] /var/log/*
For the logs and NIC configuration files for scenario 2 (NMTUI), please see #c2, #c3, and #c4.
Moving to NetworkManager; this doesn't look Node-specific.
bond0 is configured for DHCP, but there is no server responding on the interface:

 (bond0): DHCPv4 request timed out.
 (bond0): DHCPv4 state changed unknown -> timeout
 (bond0): canceled DHCP transaction, DHCP client pid 3261
 (bond0): DHCPv4 state changed timeout -> done
 (bond0): device state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
 (bond0): Activation: failed for connection 'bond0'

and so NM keeps retrying the connection, bringing it down and up.

Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP server (IPv6 router) on bond0; in this case it seems that only the VLAN should get a DHCP address.
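Concretely, the suggested change amounts to setting these two lines in the bond's ifcfg file (a sketch; the standard network-scripts path is assumed):

# In /etc/sysconfig/network-scripts/ifcfg-bond0:
BOOTPROTO=none
IPV6INIT=no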
(In reply to Beniamino Galvani from comment #7)
> bond0 is configured for DHCP, but there is no server responding on the
> interface:
>
>  (bond0): DHCPv4 request timed out.
>  (bond0): DHCPv4 state changed unknown -> timeout
>  (bond0): canceled DHCP transaction, DHCP client pid 3261
>  (bond0): DHCPv4 state changed timeout -> done
>  (bond0): device state change: ip-config -> failed (reason
>  'ip-config-unavailable') [70 120 5]
>  (bond0): Activation: failed for connection 'bond0'
>
> and so NM keeps retrying the connection, bringing it down and up.
>
> Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP
> server (IPv6 router) on bond0; in this case it seems that only the VLAN
> should get a DHCP address.

The vlan still can't get an IP address after specifying BOOTPROTO=none.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=7e6e976c-f3f1-4478-89f7-4caa6ac76b39
ONBOOT=yes

Please refer to the attached "790.tar.gz" for more details.
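For comparison, the companion VLAN ifcfg would then be expected to carry the DHCP setting instead of the bond; a hypothetical sketch (the file name and VLAN ID 20 are illustrative, not taken from the attached scripts):

# /etc/sysconfig/network-scripts/ifcfg-bond0.20 (hypothetical name/ID)
DEVICE=bond0.20
TYPE=Vlan
VLAN=yes
PHYSDEV=bond0
BOOTPROTO=dhcp
ONBOOT=yes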
Created attachment 1193157 [details] /var/log/*.*; /tmp/log; sosreport
(In reply to shaochen from comment #10)
> Created attachment 1193157 [details]
> /var/log/*.*; /tmp/log; sosreport

Hi,

I can't say what's wrong from the logs above. Can you please set 'level=DEBUG' in the [logging] section of /etc/NetworkManager/NetworkManager.conf, reboot the system, and attach the output of 'journalctl -u NetworkManager -b'? Thanks!
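For reference, the relevant snippet of /etc/NetworkManager/NetworkManager.conf would look like this:

[logging]
level=DEBUG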
Created attachment 1193316 [details]
journalctl -u NetworkManager -b

We have also provided access to our test environment by mail.
Hi,

this is strange, in the logs I still see DHCP enabled for bond0:

 nm_utils_log_connection_diff(): ++ connection.id = 'bond0'
 nm_utils_log_connection_diff(): ++ connection.interface-name = 'bond0'
 nm_utils_log_connection_diff(): ++ ipv4.method = 'auto'

and the bond0 connection going up and down several times:

 $ grep "Beginning DHCP\|timed out" journalctl.txt | grep \(bond0\)
 22:40:29 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:41:14 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 22:41:18 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:42:03 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 22:42:07 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:42:52 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 [...]

Can you please double-check that the bond0 connection has BOOTPROTO=none as suggested in comment 7? A quick way to verify it: after updating the ifcfg file, run 'nmcli connection reload' as root, then check that the output of 'nmcli connection show bond0' contains 'ipv4.method: disabled'.

What is the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and the output of 'nmcli connection show bond0'? Thanks!
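For example (a sketch of the expected session, assuming the ifcfg file was edited correctly):

# nmcli connection reload
# nmcli connection show bond0 | grep ipv4.method
ipv4.method: disabled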
(In reply to Beniamino Galvani from comment #13)
> Hi,
>
> this is strange, in the logs I still see DHCP enabled for bond0:
>
>  nm_utils_log_connection_diff(): ++ connection.id = 'bond0'
>  nm_utils_log_connection_diff(): ++ connection.interface-name = 'bond0'
>  nm_utils_log_connection_diff(): ++ ipv4.method = 'auto'
>
> and the bond0 connection going up and down several times:
>
>  $ grep "Beginning DHCP\|timed out" journalctl.txt | grep \(bond0\)
>  22:40:29 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:41:14 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  22:41:18 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:42:03 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  22:42:07 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:42:52 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  [...]
>
> Can you please double-check that the bond0 connection has BOOTPROTO=none
> as suggested in comment 7? A quick way to verify it: after updating the
> ifcfg file, run 'nmcli connection reload' as root, then check that the
> output of 'nmcli connection show bond0' contains 'ipv4.method: disabled'.
>
> What is the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and
> the output of 'nmcli connection show bond0'? Thanks!

Sorry for the late reply, I was not in the office last week.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=6b543529-5c52-4272-8123-a2868f5d2de8
ONBOOT=yes

# nmcli connection show bond0 | grep ipv4.method
ipv4.method: disabled

# ping 192.168.20.134
PING 192.168.20.134 (192.168.20.134) 56(84) bytes of data.
64 bytes from 192.168.20.134: icmp_seq=1 ttl=64 time=0.192 ms
64 bytes from 192.168.20.134: icmp_seq=2 ttl=64 time=0.190 ms
64 bytes from 192.168.20.134: icmp_seq=3 ttl=64 time=0.183 ms
64 bytes from 192.168.20.134: icmp_seq=4 ttl=64 time=0.187 ms
64 bytes from 192.168.20.134: icmp_seq=5 ttl=64 time=0.186 ms
64 bytes from 192.168.20.134: icmp_seq=6 ttl=64 time=0.188 ms
64 bytes from 192.168.20.134: icmp_seq=7 ttl=64 time=0.185 ms
64 bytes from 192.168.20.134: icmp_seq=8 ttl=64 time=0.179 ms
64 bytes from 192.168.20.134: icmp_seq=9 ttl=64 time=0.189 ms
64 bytes from 192.168.20.134: icmp_seq=10 ttl=64 time=0.177 ms
^C
--- 192.168.20.134 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.177/0.185/0.192/0.015 ms

The networking seems stable now; no packet loss during ping.

Please refer to the new log attachment "0829" for details.
Created attachment 1195342 [details] 0829
(In reply to shaochen from comment #14)
>
> # nmcli connection show bond0 | grep ipv4.method
> ipv4.method: disabled
>
> The networking seems stable now; no packet loss during ping.

Can this bug be closed then? It seems to be a configuration issue, and this behavior is documented in [1].

[1] https://access.redhat.com/solutions/1608803
(In reply to Beniamino Galvani from comment #16)
> Can this bug be closed then? It seems to be a configuration issue, and this
> behavior is documented in [1].
>
> [1] https://access.redhat.com/solutions/1608803

It seems so; with the workaround applied, the problem is gone. But this is inconvenient. Will this be fixed (so that the workaround is no longer needed) in the future?
(In reply to shaochen from comment #17)
> It seems so; with the workaround applied, the problem is gone. But this is
> inconvenient. Will this be fixed (so that the workaround is no longer
> needed) in the future?

According to the discussion in bug 1261686, this is how NM is supposed to work.

If the bond is used only to provide L2 connectivity for the VLAN, it must not be configured to use DHCP or IPv6 autoconf; otherwise the connection will fail.
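For reference, a minimal sketch of such a setup via nmcli, where the bond carries L2 only and the address lives on the VLAN (NIC names eno1/eno2 and VLAN ID 20 are placeholders):

# Bond provides L2 only: no DHCP, no IPv6 autoconf
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=active-backup,miimon=100" \
    ipv4.method disabled ipv6.method ignore
nmcli connection add type ethernet ifname eno1 con-name bond0-port1 master bond0
nmcli connection add type ethernet ifname eno2 con-name bond0-port2 master bond0
# Only the VLAN requests an address via DHCP
nmcli connection add type vlan ifname bond0.20 con-name bond0.20 \
    dev bond0 id 20 ipv4.method auto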
(In reply to Beniamino Galvani from comment #18)
> According to the discussion in bug 1261686, this is how NM is supposed to
> work.
>
> If the bond is used only to provide L2 connectivity for the VLAN, it must
> not be configured to use DHCP or IPv6 autoconf; otherwise the connection
> will fail.

Thank you for the explanation.

Hi ycui, can we close this bug based on the comments above?
Dan, could you check comment 16 and comment 17 and confirm whether this behavior and the workaround are acceptable for our RHV networking?
Hi, any news regarding this?
Hi Dan, is there any chance of getting some feedback on #c20? Thanks.
I'm closing this since it seems there is nothing to be done on the NM side. Please reopen if needed.