Bug 1369020 - Networking is unstable when vlan over bond configured by anaconda interactive installation and NM TUI.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Beniamino Galvani
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: ovirt-node-ng-platform
 
Reported: 2016-08-22 10:45 UTC by cshao
Modified: 2023-09-14 03:29 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-09 17:38:13 UTC
Target Upstream Version:
Embargoed:


Attachments
all log (6.29 MB, application/x-gzip), 2016-08-22 10:45 UTC, cshao
sosreport (6.24 MB, application/x-xz), 2016-08-22 11:07 UTC, dguo
network script (40.62 KB, application/x-gzip), 2016-08-22 11:07 UTC, dguo
/var/log/* (415.92 KB, application/x-gzip), 2016-08-22 11:08 UTC, dguo
/var/log/*.*; /tmp/log; sosreport (6.18 MB, application/x-gzip), 2016-08-23 06:36 UTC, cshao
journalctl -u NetworkManager -b (2.73 MB, text/plain), 2016-08-23 13:50 UTC, cshao
0829 (6.35 MB, application/x-gzip), 2016-08-29 11:55 UTC, cshao

Description cshao 2016-08-22 10:45:45 UTC
Created attachment 1192887 [details]
all log

Description of problem:
Networking is unstable when vlan over bond is configured by anaconda interactive installation or NM TUI.

--- 192.168.20.134 ping statistics ---
115 packets transmitted, 12 received, 89% packet loss, time 114001ms
rtt min/avg/max/mdev = 0.137/0.213/0.405/0.070 ms


Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160817.0.x86_64
imgbased-0.8.4-1.el7ev.noarch
redhat-release-virtualization-host-4.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

Scenario 1: Configure vlan over bond by anaconda interactive installation.

1. Install RHVH interactively from the ISO via anaconda (with the default kickstart).
2. Enter the network page.
3. Add a bond network (select 2 NICs, set bond mode -> active-backup) -> save.
4. Add a vlan network (select the above bond network, set the vlan ID) -> save.
5. Save the above network settings.
6. Continue the installation.
7. Reboot and log in to RHVH.
8. ip addr

Scenario 2: Configure vlan over bond by NMTUI.

Actual results:
Scenario 1:
1. After steps 5 and 8, the vlan over bond network is unstable; RHVH sometimes can obtain the vlan IP and sometimes can't.
2. 80%+ packet loss in the ping statistics.

Scenario 2:
The issue can be reproduced via NMTUI when configuring bond+vlan.
With DHCP on bond+vlan, the IP address appears only occasionally; with static bond+vlan, the vlan switch can be pinged only occasionally.

Expected results:
The vlan over bond network is stable all the time, with no packet loss.

Additional info:

Comment 1 cshao 2016-08-22 11:00:11 UTC
Above logs include: 
/var/log/*.*; 
/tmp/*.log;  
sosreport 
/etc/sysconfig/network-scripts/*

Comment 2 dguo 2016-08-22 11:07:26 UTC
Created attachment 1192902 [details]
sosreport

Comment 3 dguo 2016-08-22 11:07:52 UTC
Created attachment 1192903 [details]
network script

Comment 4 dguo 2016-08-22 11:08:16 UTC
Created attachment 1192904 [details]
/var/log/*

Comment 5 cshao 2016-08-22 12:29:44 UTC
For the logs and NIC configuration files for scenario 2 (NMTUI), please refer to comments 2, 3 and 4.

Comment 6 Fabian Deutsch 2016-08-22 13:48:13 UTC
Moving to NetworkManager; this doesn't look Node-specific.

Comment 7 Beniamino Galvani 2016-08-22 14:02:21 UTC
bond0 is configured for DHCP, but there is no server responding on the interface:

 (bond0): DHCPv4 request timed out.
 (bond0): DHCPv4 state changed unknown -> timeout
 (bond0): canceled DHCP transaction, DHCP client pid 3261
 (bond0): DHCPv4 state changed timeout -> done
 (bond0): device state change: ip-config -> failed (reason 'ip-config-unavailable') [70 120 5]
 (bond0): Activation: failed for connection 'bond0'

and so NM keeps retrying the connection, bringing it down and up.

Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP server (IPv6 router) on bond0; in this case it seems that only the VLAN should get a DHCP address.
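
For illustration only, a minimal ifcfg pair following this advice might look like the sketch below (the VLAN ID 20, the device name bond0.20 and the trimmed-down bonding options are assumptions for the example, not taken from this report):

# /etc/sysconfig/network-scripts/ifcfg-bond0
# The bond only provides L2 connectivity, so no DHCP and no IPv6 autoconf.
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
BOOTPROTO=none
IPV6INIT=no
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0.20
# Only the VLAN on top of the bond requests a DHCP lease.
DEVICE=bond0.20
TYPE=Vlan
VLAN=yes
PHYSDEV=bond0
VLAN_ID=20
BOOTPROTO=dhcp
ONBOOT=yes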

Comment 9 cshao 2016-08-23 06:34:44 UTC
(In reply to Beniamino Galvani from comment #7)
> Please specify BOOTPROTO=none (and also IPV6INIT=no) if there is no DHCP
> server (IPv6 router) on bond0; in this case it seems that only the VLAN
> should get a DHCP address.

The vlan still can't get an IP address after specifying BOOTPROTO=none.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=7e6e976c-f3f1-4478-89f7-4caa6ac76b39
ONBOOT=yes

Detail info please refer "790.tar.gz" for more details.

Comment 10 cshao 2016-08-23 06:36:02 UTC
Created attachment 1193157 [details]
/var/log/*.*; /tmp/log; sosreport

Comment 11 Beniamino Galvani 2016-08-23 08:16:36 UTC
(In reply to shaochen from comment #10)
> Created attachment 1193157 [details]
> /var/log/*.*; /tmp/log; sosreport

Hi, I can't say what's wrong from the logs above. Can you please set 'level=DEBUG' in the [logging] section of /etc/NetworkManager/NetworkManager.conf, reboot the system and attach the output of 'journalctl -u NetworkManager -b'? Thanks!
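
For reference, the change amounts to adding (or editing) this section in /etc/NetworkManager/NetworkManager.conf:

[logging]
level=DEBUG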

Comment 12 cshao 2016-08-23 13:50:12 UTC
Created attachment 1193316 [details]
journalctl -u NetworkManager -b

I will also provide our test environment to you by mail.

Comment 13 Beniamino Galvani 2016-08-23 19:38:17 UTC
Hi,

this is strange, in the logs I still see DHCP enabled for bond0:

  nm_utils_log_connection_diff(): ++ connection.id             = 'bond0'
  nm_utils_log_connection_diff(): ++ connection.interface-name = 'bond0'
  nm_utils_log_connection_diff(): ++ ipv4.method               = 'auto'

and the bond0 connection going up and down several times:

  $ grep "Beginning DHCP\|timed out" journalctl.txt  | grep \(bond0\)
  22:40:29 NetworkManager[1053]: <info>  Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
  22:41:14 NetworkManager[1053]: <warn>  (bond0): DHCPv4 request timed out.
  22:41:18 NetworkManager[1053]: <info>  Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
  22:42:03 NetworkManager[1053]: <warn>  (bond0): DHCPv4 request timed out.
  22:42:07 NetworkManager[1053]: <info>  Activation (bond0) Beginning DHCPv4 transaction (timeout in 45 seconds)
  22:42:52 NetworkManager[1053]: <warn>  (bond0): DHCPv4 request timed out.
  [...]

Can you please double check if the bond0 connection has BOOTPROTO=none
as suggested in comment 7? A quick method to verify it is, after
updating the ifcfg file, to do a 'nmcli connection reload' as root,
and check that the output of 'nmcli connection show bond0' contains
'ipv4.method: disabled'.

What's the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and
the output of 'nmcli connection show bond0'? Thanks!

Comment 14 cshao 2016-08-29 11:53:35 UTC
(In reply to Beniamino Galvani from comment #13)
> Can you please double check if the bond0 connection has BOOTPROTO=none
> as suggested in comment 7? A quick method to verify it is, after
> updating the ifcfg file, to do a 'nmcli connection reload' as root,
> and check that the output of 'nmcli connection show bond0' contains
> 'ipv4.method: disabled'.
> 
> What's the content of /etc/sysconfig/network-scripts/ifcfg-bond0 and
> the output of 'nmcli connection show bond0'? Thanks!


Sorry for the late reply; I was not in the office last week.

# cat ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="resend_igmp=1 updelay=0 use_carrier=1 miimon=100 downdelay=0 xmit_hash_policy=0 primary_reselect=0 fail_over_mac=0 arp_validate=0 mode=active-backup lacp_rate=0 arp_interval=0 ad_select=0"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=bond0
UUID=6b543529-5c52-4272-8123-a2868f5d2de8
ONBOOT=yes



# nmcli connection show bond0 | grep ipv4.method
ipv4.method:                            disabled


# ping 192.168.20.134
PING 192.168.20.134 (192.168.20.134) 56(84) bytes of data.
64 bytes from 192.168.20.134: icmp_seq=1 ttl=64 time=0.192 ms
64 bytes from 192.168.20.134: icmp_seq=2 ttl=64 time=0.190 ms
64 bytes from 192.168.20.134: icmp_seq=3 ttl=64 time=0.183 ms
64 bytes from 192.168.20.134: icmp_seq=4 ttl=64 time=0.187 ms
64 bytes from 192.168.20.134: icmp_seq=5 ttl=64 time=0.186 ms
64 bytes from 192.168.20.134: icmp_seq=6 ttl=64 time=0.188 ms
64 bytes from 192.168.20.134: icmp_seq=7 ttl=64 time=0.185 ms
64 bytes from 192.168.20.134: icmp_seq=8 ttl=64 time=0.179 ms
64 bytes from 192.168.20.134: icmp_seq=9 ttl=64 time=0.189 ms
64 bytes from 192.168.20.134: icmp_seq=10 ttl=64 time=0.177 ms
^C
--- 192.168.20.134 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.177/0.185/0.192/0.015 ms

The networking seems stable now, with no packet loss during ping.
Please refer to the new log "0829" for details.

Comment 15 cshao 2016-08-29 11:55:20 UTC
Created attachment 1195342 [details]
0829

Comment 16 Beniamino Galvani 2016-09-01 07:37:58 UTC
(In reply to shaochen from comment #14)
> 
> # nmcli connection show bond0 | grep ipv4.method
> ipv4.method:                            disabled

> The networking seems stable now, with no packet loss during ping.

Can this bug be closed then? It seems to be a configuration issue, and this behavior is documented in [1].

[1] https://access.redhat.com/solutions/1608803

Comment 17 cshao 2016-09-01 08:43:07 UTC
(In reply to Beniamino Galvani from comment #16)
> Can this bug be closed then? It seems to be a configuration issue, and this
> behavior is documented in [1].
> 
> [1] https://access.redhat.com/solutions/1608803

It seems so; with the workaround applied, the bug is gone. But I think this is inconvenient; will this be fixed (so that the workaround is not needed) in the future?

Comment 18 Beniamino Galvani 2016-09-01 09:01:52 UTC
(In reply to shaochen from comment #17)
> It seems so; with the workaround applied, the bug is gone. But I think this
> is inconvenient; will this be fixed (so that the workaround is not needed)
> in the future?

According to the discussion in bug 1261686, this is how NM is supposed to work.

If the bond is used only to provide L2 connectivity for the VLAN, it must not be configured to use DHCP or IPv6 autoconf, otherwise the connection will fail.
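
For illustration, a minimal nmcli sketch of such a setup (an existing bond0 connection and the VLAN ID 20 are assumptions for the example, not taken from this report): the bond stays L2-only, and only the VLAN requests a DHCP address.

# nmcli connection modify bond0 ipv4.method disabled ipv6.method ignore
# nmcli connection add type vlan con-name bond0.20 ifname bond0.20 dev bond0 id 20 ipv4.method auto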

Comment 19 cshao 2016-09-01 09:35:49 UTC
(In reply to Beniamino Galvani from comment #18)
> According to the discussion in bug 1261686, this is how NM is supposed to
> work.
> 
> If the bond is used only to provide L2 connectivity for the VLAN, it must
> not be configured to use DHCP or IPv6 autoconf, otherwise the connection
> will fail.

Thank you for your explanation.


Hi ycui,

Can we close this bug based on the comments above?

Comment 20 Ying Cui 2016-09-01 09:49:44 UTC
Dan, could you check comment 16 and comment 17 to confirm whether the behavior and the workaround are OK for our RHV networking?

Comment 21 Beniamino Galvani 2016-10-03 19:15:50 UTC
Hi, any news regarding this?

Comment 22 cshao 2016-10-10 05:13:17 UTC
Hi Dan,

Is there any chance to get some feedback on comment 20?

Thanks.

Comment 23 Beniamino Galvani 2016-11-09 17:38:13 UTC
I'm closing this since it seems there is nothing to be done on the NM side. Please reopen if needed.

Comment 24 Red Hat Bugzilla 2023-09-14 03:29:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

