Created attachment 1192887 [details]
all log

Description of problem:
Networking is unstable when vlan over bond is configured via anaconda interactive installation or NM TUI.

--- 192.168.20.134 ping statistics ---
115 packets transmitted, 12 received, 89% packet loss, time 114001ms
rtt min/avg/max/mdev = 0.137/0.213/0.405/0.070 ms

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160817.0.x86_64
imgbased-0.8.4-1.el7ev.noarch
redhat-release-virtualization-host-4.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
Scenario 1: Configure vlan over bond via anaconda interactive installation.
1. Install RHVH interactively from the ISO (with the default kickstart).
2. Enter the network configuration page.
3. Add a bond network (select 2 NICs, set bond mode to active-backup) -> save.
4. Add a vlan network (select the bond created above, set the VLAN ID) -> save.
5. Save the network configuration.
6. Continue the installation.
7. Reboot and log in to RHVH.
8. Run 'ip addr'.

Scenario 2: Configure vlan over bond via NMTUI.

Actual results:
Scenario 1:
1. After steps 5 and 8, the vlan-over-bond network is unstable: RHVH sometimes obtains the vlan IP and sometimes does not.
2. 80%+ packet loss in the ping statistics.

Scenario 2:
The issue also reproduces when bond+vlan is configured via NMTUI. With DHCP on the bond+vlan, the IP address only appears occasionally; with a static bond+vlan, the vlan switch can only be pinged occasionally.

Expected results:
The vlan-over-bond network is stable, with no packet loss at any time.

Additional info:
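For reference, the same bond+vlan topology can also be built from a shell with nmcli; the following is a minimal sketch, where the NIC names eno1/eno2 and VLAN ID 20 are placeholders, not taken from this report. Note that this leaves IPv4 set to DHCP on the bond as well as on the VLAN, which matches the failing configuration described below:

# Bond in active-backup mode; ipv4.method defaults to 'auto' (DHCP)
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=active-backup,miimon=100"
# Enslave the two NICs (names are placeholders)
nmcli connection add type ethernet ifname eno1 con-name bond0-port1 master bond0
nmcli connection add type ethernet ifname eno2 con-name bond0-port2 master bond0
# VLAN on top of the bond (ID 20 is a placeholder)
nmcli connection add type vlan ifname bond0.20 con-name bond0.20 dev bond0 id 20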
The attached logs include: /var/log/*.*; /tmp/*.log; sosreport; /etc/sysconfig/network-scripts/*
Created attachment 1192902 [details] sosreport
Created attachment 1192903 [details] network script
Created attachment 1192904 [details] /var/log/*
For the logs and NIC configuration files for scenario 2 (NMTUI), please see #c2, #c3, and #c4.
Moving to NetworkManager; this doesn't look Node-specific.
bond0 is configured for DHCP, but there is no server responding on the interface:

 (bond0): DHCPv4 request timed out.
 (bond0): DHCPv4 state changed unknown -> timeout
 (bond0): canceled DHCP transaction, DHCP client pid 3261
 (bond0): DHCPv4 state changed timeout -> done
 (bond0): device state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
 (bond0): Activation: failed for connection 'bond0'

and so NM keeps retrying the connection, bringing it down and up.

Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP server (IPv6 router) on bond0; in this case it seems that only the VLAN should get a DHCP address.
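Concretely, the suggested change amounts to setting these two lines in the bond's ifcfg file (a sketch; the standard network-scripts path is assumed):

# In /etc/sysconfig/network-scripts/ifcfg-bond0:
BOOTPROTO=none
IPV6INIT=no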
(In reply to Beniamino Galvani from comment #7)
> bond0 is configured for DHCP, but there is no server responding on the
> interface:
>
>  (bond0): DHCPv4 request timed out.
>  (bond0): DHCPv4 state changed unknown -> timeout
>  (bond0): canceled DHCP transaction, DHCP client pid 3261
>  (bond0): DHCPv4 state changed timeout -> done
>  (bond0): device state change: ip-config -> failed (reason
>  'ip-config-unavailable') [70 120 5]
>  (bond0): Activation: failed for connection 'bond0'
>
> and so NM keeps retrying the connection, bringing it down and up.
>
> Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP
> server (IPv6 router) on bond0; in this case it seems that only the VLAN
> should get a DHCP address.

The vlan still can't get an IP address after specifying BOOTPROTO=none.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=7e6e976c-f3f1-4478-89f7-4caa6ac76b39
ONBOOT=yes

Please refer to the attached "790.tar.gz" for more details.
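For comparison, the companion VLAN ifcfg would then be expected to carry the DHCP setting instead of the bond; a hypothetical sketch (the file name and VLAN ID 20 are illustrative, not taken from the attached scripts):

# /etc/sysconfig/network-scripts/ifcfg-bond0.20 (hypothetical name/ID)
DEVICE=bond0.20
TYPE=Vlan
VLAN=yes
PHYSDEV=bond0
BOOTPROTO=dhcp
ONBOOT=yes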
Created attachment 1193157 [details] /var/log/*.*; /tmp/log; sosreport
(In reply to shaochen from comment #10)
> Created attachment 1193157 [details]
> /var/log/*.*; /tmp/log; sosreport

Hi,

I can't say what's wrong from the logs above. Can you please set 'level=DEBUG' in the [logging] section of /etc/NetworkManager/NetworkManager.conf, reboot the system, and attach the output of 'journalctl -u NetworkManager -b'? Thanks!
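For reference, the relevant snippet of /etc/NetworkManager/NetworkManager.conf would look like this:

[logging]
level=DEBUG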
Created attachment 1193316 [details]
journalctl -u NetworkManager -b

We have also provided access to our test environment by mail.
Hi,

this is strange, in the logs I still see DHCP enabled for bond0:

 nm_utils_log_connection_diff(): ++ connection.id = 'bond0'
 nm_utils_log_connection_diff(): ++ connection.interface-name = 'bond0'
 nm_utils_log_connection_diff(): ++ ipv4.method = 'auto'

and the bond0 connection going up and down several times:

 $ grep "Beginning DHCP\|timed out" journalctl.txt | grep \(bond0\)
 22:40:29 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:41:14 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 22:41:18 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:42:03 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 22:42:07 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
 22:42:52 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
 [...]

Can you please double-check that the bond0 connection has BOOTPROTO=none as suggested in comment 7? A quick way to verify it: after updating the ifcfg file, run 'nmcli connection reload' as root, then check that the output of 'nmcli connection show bond0' contains 'ipv4.method: disabled'.

What is the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and the output of 'nmcli connection show bond0'? Thanks!
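For example (a sketch of the expected session, assuming the ifcfg file was edited correctly):

# nmcli connection reload
# nmcli connection show bond0 | grep ipv4.method
ipv4.method: disabled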
(In reply to Beniamino Galvani from comment #13)
> Hi,
>
> this is strange, in the logs I still see DHCP enabled for bond0:
>
>  nm_utils_log_connection_diff(): ++ connection.id = 'bond0'
>  nm_utils_log_connection_diff(): ++ connection.interface-name = 'bond0'
>  nm_utils_log_connection_diff(): ++ ipv4.method = 'auto'
>
> and the bond0 connection going up and down several times:
>
>  $ grep "Beginning DHCP\|timed out" journalctl.txt | grep \(bond0\)
>  22:40:29 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:41:14 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  22:41:18 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:42:03 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  22:42:07 NetworkManager[1053]: <info> Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
>  22:42:52 NetworkManager[1053]: <warn> (bond0): DHCPv4 request timed out.
>  [...]
>
> Can you please double-check that the bond0 connection has BOOTPROTO=none
> as suggested in comment 7? A quick way to verify it: after updating the
> ifcfg file, run 'nmcli connection reload' as root, then check that the
> output of 'nmcli connection show bond0' contains 'ipv4.method: disabled'.
>
> What is the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and
> the output of 'nmcli connection show bond0'? Thanks!

Sorry for the late reply, I was not in the office last week.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=6b543529-5c52-4272-8123-a2868f5d2de8
ONBOOT=yes

# nmcli connection show bond0 | grep ipv4.method
ipv4.method: disabled

# ping 192.168.20.134
PING 192.168.20.134 (192.168.20.134) 56(84) bytes of data.
64 bytes from 192.168.20.134: icmp_seq=1 ttl=64 time=0.192 ms
64 bytes from 192.168.20.134: icmp_seq=2 ttl=64 time=0.190 ms
64 bytes from 192.168.20.134: icmp_seq=3 ttl=64 time=0.183 ms
64 bytes from 192.168.20.134: icmp_seq=4 ttl=64 time=0.187 ms
64 bytes from 192.168.20.134: icmp_seq=5 ttl=64 time=0.186 ms
64 bytes from 192.168.20.134: icmp_seq=6 ttl=64 time=0.188 ms
64 bytes from 192.168.20.134: icmp_seq=7 ttl=64 time=0.185 ms
64 bytes from 192.168.20.134: icmp_seq=8 ttl=64 time=0.179 ms
64 bytes from 192.168.20.134: icmp_seq=9 ttl=64 time=0.189 ms
64 bytes from 192.168.20.134: icmp_seq=10 ttl=64 time=0.177 ms
^C
--- 192.168.20.134 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.177/0.185/0.192/0.015 ms

The networking seems stable now; no packet loss during ping.

Please refer to the new log attachment "0829" for details.
Created attachment 1195342 [details] 0829
(In reply to shaochen from comment #14)
>
> # nmcli connection show bond0 | grep ipv4.method
> ipv4.method: disabled
>
> The networking seems stable now; no packet loss during ping.

Can this bug be closed then? It seems to be a configuration issue, and this behavior is documented in [1].

[1] https://access.redhat.com/solutions/1608803
(In reply to Beniamino Galvani from comment #16)
> Can this bug be closed then? It seems to be a configuration issue, and this
> behavior is documented in [1].
>
> [1] https://access.redhat.com/solutions/1608803

It seems so; with the workaround applied, the problem is gone. But this is inconvenient. Will this be fixed (so that the workaround is no longer needed) in the future?
(In reply to shaochen from comment #17)
> It seems so; with the workaround applied, the problem is gone. But this is
> inconvenient. Will this be fixed (so that the workaround is no longer
> needed) in the future?

According to the discussion in bug 1261686, this is how NM is supposed to work.

If the bond is used only to provide L2 connectivity for the VLAN, it must not be configured to use DHCP or IPv6 autoconf; otherwise the connection will fail.
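For reference, a minimal sketch of such a setup via nmcli, where the bond carries L2 only and the address lives on the VLAN (NIC names eno1/eno2 and VLAN ID 20 are placeholders):

# Bond provides L2 only: no DHCP, no IPv6 autoconf
nmcli connection add type bond ifname bond0 con-name bond0 \
    bond.options "mode=active-backup,miimon=100" \
    ipv4.method disabled ipv6.method ignore
nmcli connection add type ethernet ifname eno1 con-name bond0-port1 master bond0
nmcli connection add type ethernet ifname eno2 con-name bond0-port2 master bond0
# Only the VLAN requests an address via DHCP
nmcli connection add type vlan ifname bond0.20 con-name bond0.20 \
    dev bond0 id 20 ipv4.method auto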
(In reply to Beniamino Galvani from comment #18)
> According to the discussion in bug 1261686, this is how NM is supposed to
> work.
>
> If the bond is used only to provide L2 connectivity for the VLAN, it must
> not be configured to use DHCP or IPv6 autoconf; otherwise the connection
> will fail.

Thank you for the explanation.

Hi ycui, can we close this bug based on the comments above?
Dan, could you check comment 16 and comment 17 and confirm whether this behavior and the workaround are acceptable for our RHV networking?
Hi, any news regarding this?
Hi Dan, is there any chance of getting some feedback on #c20? Thanks.
I'm closing this since it seems there is nothing to be done on the NM side. Please reopen if needed.