Bug 1549456 - Instance networks are intermittently disconnected during stack update when using BCF
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z1
Target Release: 13.0 (Queens)
Assignee: Bob Fournier
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-27 06:58 UTC by Chen
Modified: 2022-08-16 11:03 UTC
CC List: 8 users

Fixed In Version: os-net-config-8.4.1-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-13 10:34:37 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1757130 | 0 | None | None | None | 2018-03-20 12:56:24 UTC
OpenStack gerrit | 556998 | 0 | None | MERGED | Don't restart ivs/nfvswitch in os-net-config | 2020-02-06 19:31:14 UTC
Red Hat Issue Tracker | OSP-4927 | 0 | None | None | None | 2022-08-16 11:03:10 UTC

Description Chen 2018-02-27 06:58:54 UTC
Description of problem:

Instance networks are intermittently disconnected during a stack update when using BCF (Big Cloud Fabric).
The ivs service is restarted several times during the update, and the instance network is disconnected while it restarts:

Feb 26 12:26:48 XXX.localnet os-collect-config[7070]: [2018/02/26 12:26:48 PM] [INFO] Restart ivs
Feb 26 12:48:34 XXX.localnet os-collect-config[7070]: [2018/02/26 12:48:34 PM] [INFO] Restart ivs
Feb 26 12:49:56 XXX.localnet os-collect-config[7070]: [2018/02/26 12:49:56 PM] [INFO] Restart ivs
Feb 26 12:52:24 XXX.localnet os-collect-config[7070]: [2018/02/26 12:52:24 PM] [INFO] Restart ivs
Feb 26 12:58:22 XXX.localnet os-collect-config[7070]: [2018/02/26 12:58:22 PM] [INFO] Restart ivs
Feb 26 13:04:20 XXX.localnet os-collect-config[7070]: [2018/02/26 01:04:20 PM] [INFO] Restart ivs
Feb 26 13:10:18 XXX.localnet os-collect-config[7070]: [2018/02/26 01:10:18 PM] [INFO] Restart ivs
Feb 26 13:16:25 XXX.localnet os-collect-config[7070]: [2018/02/26 01:16:25 PM] [INFO] Restart ivs
Feb 26 13:22:35 XXX.localnet os-collect-config[7070]: [2018/02/26 01:22:35 PM] [INFO] Restart ivs
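To quantify how often ivs was restarted during the update, the timestamps can be pulled out of journal lines like the ones above. This is a minimal sketch: the regular expression assumes exactly the os-collect-config line format shown here, and `restart_times` and `restart_gaps_seconds` are hypothetical helpers, not part of os-net-config.

```python
import re
from datetime import datetime

# Assumption: matches only the os-collect-config line format shown in this bug.
RESTART_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} [AP]M)\] \[INFO\] Restart ivs"
)


def restart_times(lines):
    """Return the timestamp of every 'Restart ivs' message, in log order."""
    times = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            # The bracketed timestamp uses a 12-hour clock with AM/PM.
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %I:%M:%S %p"))
    return times


def restart_gaps_seconds(times):
    """Return the number of seconds between consecutive restarts."""
    return [int((later - earlier).total_seconds()) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "Feb 26 12:26:48 XXX.localnet os-collect-config[7070]: "
        "[2018/02/26 12:26:48 PM] [INFO] Restart ivs",
        "Feb 26 12:48:34 XXX.localnet os-collect-config[7070]: "
        "[2018/02/26 12:48:34 PM] [INFO] Restart ivs",
        "Feb 26 13:04:20 XXX.localnet os-collect-config[7070]: "
        "[2018/02/26 01:04:20 PM] [INFO] Restart ivs",
    ]
    found = restart_times(sample)
    print(len(found), restart_gaps_seconds(found))
```

With the three sample lines it prints `3 [1306, 946]`, that is, three restarts roughly 22 and 16 minutes apart.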

Version-Release number of selected component (if applicable):

OSP10
BCF 4.5.1

How reproducible:

100%

Comment 2 Chen 2018-03-01 03:13:40 UTC
Hi,

Is there any workaround to prevent the existing VMs' networks from disconnecting?

Best Regards,
Chen

Comment 3 Bob Fournier 2018-03-01 21:19:07 UTC
> Scaling out compute nodes shouldn't affect existing VMs. Can we simply reload ivs instead of restarting it?

Chen, this is really a question for BigSwitch. The restart, along with all of the IVS support, was added by Xin Wu from BigSwitch in https://review.openstack.org/#/c/274492/

Adding a NeedInfo.

Comment 4 Chen 2018-03-09 02:25:22 UTC
Hi Xinwu,

Can we get any information on this Bugzilla?

Best Regards,
Chen

Comment 7 Chen 2018-03-19 06:54:04 UTC
Hi Bob,

I know nothing about ivs, so I am not sure whether reloading the service would be a proper workaround here.

This is a production environment, and I'm not sure whether removing the "restart" will affect ivs functionality. Do you have any advice, Bob?

Best Regards,
Chen

Comment 8 Bob Fournier 2018-03-21 20:46:04 UTC
We have discussed the ivs restart with Sarath Kumar from BigSwitch, and we feel it is OK to remove the "ivs restart". Here is the text of the email:

***************
I feel it should be fine to have the 'systemctl restart ivs' removed from os-net-config. We are double checking how and where we 'enable' and 'start' IVS on the Compute nodes to confirm that this change doesn't break any assumptions made in the past (if any).
***************
We have confirmed that we do not make any assumptions about IVS start/enable at our end and do the right thing.

The only concern/question we have is the following -
When 'os-net-config' is run, we generate a config file for IVS to consume here[1]. Can we confirm that the first time os-net-config is run, it already has the correct mapping of nicX configs (provided via the RHOSP YAML files) to the correct interface names (e.g. nic1 => eno1, nic3 => p1p1, nic4 => p2p1, etc.) [2]? If this mapping changes between os-net-config calls (or the nicX to actual interface name mapping changes), then we would need to restart IVS so that it picks up the correct interface configs.


[1] https://github.com/openstack/os-net-config/blob/96d17b251737495be2bae1646debfa0fe44da1da/os_net_config/impl_ifcfg.py#L778
[2] https://github.com/openstack/os-net-config/blob/96d17b251737495be2bae1646debfa0fe44da1da/os_net_config/impl_ifcfg.py#L446
***************

It is our (Red Hat) view that the mapping will be correct when os-net-config runs.
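BigSwitch's concern reduces to a comparison: if the nicX-to-interface mapping is identical between os-net-config runs, IVS does not need a restart. A minimal sketch of that check, assuming the mapping is available as a plain dictionary. The representation and the helper names `changed_nics` and `needs_ivs_restart` are hypothetical; os-net-config does not expose this API.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: nicX abbreviation -> actual interface name.
NicMapping = Dict[str, str]


def _alias_sort_key(alias: str) -> Tuple[int, int, str]:
    # Order "nic2" before "nic10"; anything not shaped like nicN sorts last.
    suffix = alias[3:] if alias.startswith("nic") else ""
    if suffix.isdigit():
        return (0, int(suffix), alias)
    return (1, 0, alias)


def changed_nics(previous: NicMapping, current: NicMapping) -> List[str]:
    """Aliases that were added, removed, or now resolve to a different interface."""
    aliases = sorted(set(previous) | set(current), key=_alias_sort_key)
    return [alias for alias in aliases if previous.get(alias) != current.get(alias)]


def needs_ivs_restart(previous: NicMapping, current: NicMapping) -> bool:
    """Following BigSwitch's reasoning: restart only if the mapping changed."""
    return bool(changed_nics(previous, current))


if __name__ == "__main__":
    before = {"nic1": "eno1", "nic3": "p1p1", "nic4": "p2p1"}
    print(needs_ivs_restart(before, dict(before)))
    # "p1p2" is a made-up interface name used only to show a changed mapping.
    print(changed_nics(before, {**before, "nic3": "p1p2"}))
```

With the example mapping from the email, an unchanged mapping prints `False`, and remapping nic3 to the hypothetical p1p2 prints `['nic3']`.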

Comment 9 Bob Fournier 2018-03-22 20:24:19 UTC
The upstream patch will need to be backported: https://review.openstack.org/#/c/555369/

Comment 10 Chen 2018-03-27 05:33:06 UTC
Hi Bob,

Can we get a hotfix for OSP10? Or is it acceptable to just *manually* edit the file on the overcloud node to work around the issue?

Best Regards,
Chen

Comment 11 Bob Fournier 2018-06-26 15:39:28 UTC
This is the upstream Ocata patch for this fix: https://review.openstack.org/#/c/561609/.

The downstream backport to OSP-10 is still pending.

However, I'd like to confirm that this actually fixes the problem.

Chen - have you been able to manually edit the file to see if it fixes this issue?

Comment 13 Bob Fournier 2018-06-26 18:03:29 UTC
Thanks, Chen. As a hotfix is not required, I am retargeting this bug to OSP-13, where the fix is already available.

Comment 15 mlammon 2018-07-12 21:10:54 UTC
We don't have the specific hardware, so all we can do is verify that the new code is in place.
Environment:

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep os-net-config
os-net-config-8.4.1-4.el7ost.noarch

It looks like the code for the ivs restart has been removed from impl_ifcfg.py:

            if ivs_uplinks or ivs_interfaces:
                logger.info("Attach to ivs with "
                            "uplinks: %s, "
                            "interfaces: %s" %
                            (ivs_uplinks, ivs_interfaces))
                for ivs_uplink in ivs_uplinks:
                    self.ifup(ivs_uplink)
                for ivs_interface in ivs_interfaces:
                    self.ifup(ivs_interface)

            if nfvswitch_interfaces or nfvswitch_internal_ifaces:
                logger.info("Attach to nfvswitch with "
                            "interfaces: %s, "
                            "internal interfaces: %s" %
                            (nfvswitch_interfaces, nfvswitch_internal_ifaces))
                for nfvswitch_interface in nfvswitch_interfaces:
                    self.ifup(nfvswitch_interface)
                for nfvswitch_internal in nfvswitch_internal_ifaces:
                    self.ifup(nfvswitch_internal)

If the submitter finds a problem, please reopen this bug or file a new one.
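The same check can be scripted as a quick scan of an installed impl_ifcfg.py for any remaining restart of ivs or nfvswitch. This is a heuristic sketch, not a parser: it assumes a restart would appear on a single line containing "restart" followed by the word "ivs" or "nfvswitch" (as in the "Restart ivs" log message in the description). The script and function names are hypothetical, and the file path must be supplied by the user since it is not given in this bug.

```python
import re
import sys
from typing import List, Tuple

# Heuristic (assumption): a restart shows up as a line containing "restart"
# followed later on the same line by the whole word "ivs" or "nfvswitch".
RESTART_PATTERN = re.compile(r"restart.*\b(ivs|nfvswitch)\b", re.IGNORECASE)


def find_switch_restarts(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) for each suspected ivs/nfvswitch restart."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if RESTART_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def main(path: str) -> int:
    """Print suspect lines from the file at path; return 1 if any were found."""
    with open(path, encoding="utf-8") as handle:
        hits = find_switch_restarts(handle.read())
    for number, line in hits:
        print(f"{number}: {line}")
    if not hits:
        print("no ivs/nfvswitch restart found")
    return 1 if hits else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_switch_restart.py /path/to/impl_ifcfg.py
    sys.exit(main(sys.argv[1]))
```

Run against the excerpt quoted above, it reports nothing, because the "Attach to ivs/nfvswitch" lines only bring interfaces up and never mention a restart.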

Comment 16 Lon Hohberger 2018-07-13 10:34:37 UTC
According to our records, this should be resolved by os-net-config-8.4.1-4.el7ost.  This build is available now.
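The Fixed In Version field lists os-net-config-8.4.1-1.el7ost, while QA verified os-net-config-8.4.1-4.el7ost. To check whether an installed build (for example, the output of `rpm -q os-net-config`) is at or beyond a given fixed-in build, a simplified comparison can be sketched as below. It only compares the leading numeric fields of the version and release and is not a substitute for RPM's own version comparison; the helper names are hypothetical.

```python
from typing import Tuple

# Assumption: a short list of common architecture suffixes is enough here.
KNOWN_ARCHES = {"noarch", "x86_64", "i686", "aarch64", "ppc64le", "s390x"}


def parse_nvr(package: str) -> Tuple[str, str, str]:
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    head, _, tail = package.rpartition(".")
    if tail in KNOWN_ARCHES:
        package = head
    name, version, release = package.rsplit("-", 2)
    return name, version, release


def _leading_ints(text: str) -> Tuple[int, ...]:
    # Keep leading purely numeric dot-separated fields: "8.4.1" -> (8, 4, 1),
    # "4.el7ost" -> (4,). A simplification, not RPM's full comparison rules.
    parts = []
    for field in text.split("."):
        if not field.isdigit():
            break
        parts.append(int(field))
    return tuple(parts)


def includes_fix(installed: str, fixed_in: str) -> bool:
    """True if installed is the same package at an equal or newer version-release."""
    i_name, i_ver, i_rel = parse_nvr(installed)
    f_name, f_ver, f_rel = parse_nvr(fixed_in)
    if i_name != f_name:
        raise ValueError(f"package mismatch: {i_name} vs {f_name}")
    return (_leading_ints(i_ver), _leading_ints(i_rel)) >= (
        _leading_ints(f_ver),
        _leading_ints(f_rel),
    )


if __name__ == "__main__":
    print(includes_fix("os-net-config-8.4.1-4.el7ost.noarch", "os-net-config-8.4.1-1.el7ost"))
```

For the build QA verified, this prints `True`: both builds share version 8.4.1, and release 4 is newer than release 1.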

