Bug 1396605 - OVS DPDK bonds fail to configure during provisioning
Summary: OVS DPDK bonds fail to configure during provisioning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-net-config
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Sanjay Upadhyay
QA Contact: Ofer Blaut
URL:
Whiteboard:
Duplicates: 1416421
Depends On: 1422248 1428719
Blocks: 1335596 1396161 1406834 1416421
Reported: 2016-11-18 18:19 UTC by Andrew Bays
Modified: 2020-04-15 14:59 UTC (History)

Fixed In Version: os-net-config-5.1.0-1.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-16 17:24:49 UTC
Target Upstream Version:
Embargoed:


Attachments
/var/log/messages file from compute node (671.69 KB, text/plain)
2017-01-06 08:59 UTC, surya
ovs-vswitchd.log from compute node (24.89 KB, text/plain)
2017-01-06 09:00 UTC, surya
ovsdb-server.log from compute node (785 bytes, text/plain)
2017-01-06 09:01 UTC, surya


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1643026 0 None None None 2016-11-18 18:19:06 UTC
OpenStack gerrit 417805 0 None MERGED Remove child members activation for OVS-DPDK bond 2020-04-16 19:59:31 UTC
Red Hat Product Errata RHBA-2017:0357 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 director Bug Fix Advisory 2017-03-01 18:32:13 UTC

Description Andrew Bays 2016-11-18 18:19:07 UTC
Details are available here:

https://bugs.launchpad.net/tripleo/+bug/1643026

Comment 3 Sean Merrow 2017-01-05 21:41:40 UTC
A partner has hit this issue, but the workaround listed in the upstream bug did not work for him, and he now gets a new message. Can you please take a look and offer guidance?

Comment 4 surya 2017-01-06 08:59:55 UTC
Created attachment 1237886 [details]
/var/log/messages file from compute node

Comment 5 surya 2017-01-06 09:00:39 UTC
Created attachment 1237887 [details]
ovs-vswitchd.log from compute node

Comment 6 surya 2017-01-06 09:01:27 UTC
Created attachment 1237889 [details]
ovsdb-server.log from compute node

Comment 7 Andrew Bays 2017-01-06 13:10:07 UTC
@surya

I ran into another issue after I found the first workaround that I forgot to include here.  By default, OVS doesn't seem to be running with the "--dpdk" flag and other required parameters (even though it seems to be properly set to do so in the OOO templates).  That's why you see these messages repeated in your ovs-vswitchd.log:

...
2017-01-06T12:03:14.291Z|00070|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
...
2017-01-06T12:03:14.291Z|00072|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
...

Check to see if this is the case with the partner's setup.  What I had to do was to run OVS manually with the missing DPDK flags because setting DPDK_OPTIONS in OOO templates (NeutronDpdkCoreList, NeutronDpdkMemoryChannels, NeutronDpdkSocketMemory and NeutronDpdkDriverType) did not work properly.  Once I had OVS running with DPDK enabled, os-collect-config was able to add the ports when it looped and tried to configure networking again.
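
The check described above can be sketched as a small shell snippet. This is only an illustration, not part of any shipped tooling: the helper names are hypothetical, the warning text is taken from the log lines quoted above, and the log path and `ps` usage in the trailing comments are assumptions about a typical compute node.

```shell
#!/bin/sh
# Hypothetical helper: count the "unknown type dpdk" warnings in a log file.
# Prints the number of matching lines (0 when there are none).
count_dpdk_netdev_warnings() {
    grep -c 'could not create netdev .* of unknown type dpdk' "$1"
}

# Hypothetical helper: succeed when the given command line contains
# --dpdk as a separate argument.
cmdline_has_dpdk_flag() {
    case " $1 " in
        *" --dpdk "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a node (log path is an assumption, adjust as needed):
#   count_dpdk_netdev_warnings /var/log/openvswitch/ovs-vswitchd.log
#   cmdline_has_dpdk_flag "$(ps -o args= -C ovs-vswitchd)" \
#       && echo "ovs-vswitchd was started with --dpdk" \
#       || echo "ovs-vswitchd is running without --dpdk"
```

A non-zero warning count together with a missing --dpdk flag would match the situation described in this comment.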

Comment 8 pillala 2017-01-10 05:17:53 UTC
OVS DPDK BONDING [ ACTIVE BACKUP ] 

ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 2048,2048 -- unix:$DB_SOCK --pidfile --detach


NOTE: This is a completely manual workaround, and it works.

NOTE: Our deployment is fully automated, so we are looking for a patch that makes this work without manual steps.

We require a patch for OVS DPDK bonding in automated deployments.

Comment 9 Yariv 2017-01-11 07:55:43 UTC
(In reply to Andrew Bays from comment #7)
> @surya
> 
> I ran into another issue after I found the first workaround that I forgot to
> include here.  By default, OVS doesn't seem to be running with the "--dpdk"
> flag and other required parameters (even though it seems to be properly set
> to do so in the OOO templates).  That's why you see these messages repeated
> in your ovs-vswitchd.log:
> 
> ...
> 2017-01-06T12:03:14.291Z|00070|netdev|WARN|could not create netdev dpdk0 of
> unknown type dpdk
> ...
> 2017-01-06T12:03:14.291Z|00072|netdev|WARN|could not create netdev dpdk1 of
> unknown type dpdk
> ...
> 
> Check to see if this is the case with the partner's setup.  What I had to do
> was to run OVS manually with the missing DPDK flags because setting
> DPDK_OPTIONS in OOO templates (NeutronDpdkCoreList,
> NeutronDpdkMemoryChannels, NeutronDpdkSocketMemory and
> NeutronDpdkDriverType) did not work properly.  Once I had OVS running with
> DPDK enabled, os-collect-config was able to add the ports when it looped and
> tried to configure networking again.
Please refer to:
https://bugzilla.redhat.com/show_bug.cgi?id=1366356

Comment 11 Sanjay Upadhyay 2017-01-11 10:46:19 UTC
patch submitted -  https://review.openstack.org/#/c/417805/ 
moving to QA

Comment 12 hrushi 2017-01-11 16:35:07 UTC
(In reply to Sanjay Upadhyay from comment #11)
> patch submitted -  https://review.openstack.org/#/c/417805/ 
> moving to QA

Does this patch take care of the workarounds mentioned in comments #7 and #8?

Comment 13 Sean Merrow 2017-01-13 14:51:01 UTC
Hi Sanjay, does the patch you submitted include the workarounds mentioned in comments 7 and 8?

Comment 15 Sanjay Upadhyay 2017-01-16 08:01:53 UTC
(In reply to Sean Merrow from comment #13)
> Hi Sanjay, does the patch you submitted include the work-arounds for those
> mentioned in comments 7 and 8?

Some parts of the workaround are not required, i.e. restarting ovs-vswitchd. ovs-vswitchd is started by the installer in step 4 of the installation.

Comment 16 surya 2017-01-17 15:02:23 UTC
Thanks Sanjay for the workaround. We are able to deploy a compute node with an OVS DPDK bond in active-standby mode.

Comment 17 Yariv 2017-01-19 11:24:24 UTC
Changed back to ON_DEV.
The patch is not merged yet; 10z is to be released on January 25th.
Please change the status back once the patch has received +2.

Comment 18 Sanjay Upadhyay 2017-01-20 08:08:13 UTC
(In reply to Yariv from comment #17)
> Changed back to On_DEV
> Patch is not merged yet.. 10z released on January 25th
> Please put change status Back, once the patch received +2

https://review.openstack.org/#/c/417805 is merged now, placing the bug on_qa again.

Comment 19 Vijay Chundury 2017-01-24 08:43:11 UTC
Sanjay, I believe this patch is merged. Can you create a new BZ and start the backport ASAP?

Yariv,
We will start the backport, but let us know whether this change gets us past the issue.

Regards
Vijay.

Comment 20 Jon Schlueter 2017-01-25 14:52:09 UTC
moving back to POST as patch is not in a build yet, but has been proposed/merged upstream

Comment 21 Sanjay Upadhyay 2017-01-25 15:00:24 UTC
*** Bug 1416421 has been marked as a duplicate of this bug. ***

Comment 24 Sanjay Upadhyay 2017-02-14 05:17:24 UTC
Thanks @Jschlueter.

This is now packaged, and I am moving the bug to QA for testing and validating. I have run a test and this is fixed.

Comment 26 Eyal Dannon 2017-02-16 12:31:44 UTC
I have verified DPDK + bonding deployment with "os-net-config-5.1.0-1.el7ost.noarch".

Comment 27 Sanjay Upadhyay 2017-02-21 08:12:02 UTC
os-net-config-5.1.0-1.el7ost.noarch released

Comment 28 Chen 2017-03-02 06:33:35 UTC
Hi,

Sorry for reopening the bugzilla.

I tried the newest 10.0.1 image from the official site, but the os-net-config version is still os-net-config-5.0.0-5.el7ost.noarch. Where can I get images that contain os-net-config-5.1.0-1?

Best Regards,
Chen

Comment 29 Chen 2017-03-03 06:10:28 UTC
Hi Sanjay,

Can you tell me where I can get an image with os-net-config-5.1.0-1?

Best Regards,
Chen

Comment 30 Sanjay Upadhyay 2017-03-03 07:11:59 UTC
[root@rhelv73 ~]# yum list os-net-config
Loaded plugins: search-disabled-repos
Available Packages
os-net-config.noarch      5.1.0-1.el7ost     rhel-7-server-openstack-10-rpms


Can you please compare the output above with your setup? I am guessing it is simply a matter of enabling the correct repository; the repo list should include rhel-7-server-openstack-10-rpms.

Comment 31 Chen 2017-03-03 07:45:44 UTC
Hi Sanjay,

I am using director to deploy the overcloud nodes. My understanding is that the os-net-config inside the overcloud image, not the package in the official repository, is what configures the ovs-dpdk bond. I understand that the new os-net-config is available in the official repo, but the os-net-config inside the overcloud image is too old to configure the ovs-dpdk bond.

Or do I need to update os-net-config on the undercloud side? Please correct me if I'm wrong.

Best Regards,
Chen

Comment 32 Sanjay Upadhyay 2017-03-03 08:55:06 UTC
Hi Chen,

You are right, the overcloud image has an older os-net-config.

Eyal, can you check on this?

[stack@rhelv73 ~]$ sudo yum list rhosp-director-images
Loaded plugins: search-disabled-repos
Installed Packages
rhosp-director-images.noarch  10.0-20170201.1.el7ost    @rhel-7-server-openstack-10-rpms
[root@rhelv73 ~]# LIBGUESTFS_BACKEND=direct guestmount -a ./overcloud-full.qcow2 -m /dev/sda /mnt
[root@rhelv73 mnt]# logout
[stack@rhelv73 ~]$ sudo chroot /mnt
[root@rhelv73 /]# cd
[root@rhelv73 ~]# rpm -q os-net-config
os-net-config-5.0.0-5.el7ost.noarch

^^^ overcloud image in the latest package has the old os-net-config, which would lead to a failed deploy.
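
The version check in the transcript above could be scripted roughly as follows. This is a minimal sketch with hypothetical helper names; it assumes GNU `sort -V` ordering is close enough for these particular version strings, which is not the same as RPM's own version comparison, and it takes 5.1.0-1 as the first fixed build per the "Fixed In Version" field of this bug.

```shell
#!/bin/sh
# Hypothetical helper: pull "version-release" out of an os-net-config
# package string such as os-net-config-5.0.0-5.el7ost.noarch.
nvr_version() {
    printf '%s\n' "$1" | sed -E 's/^os-net-config-([0-9.]+-[0-9]+)\..*$/\1/'
}

# Hypothetical helper: succeed when $1 >= $2 under GNU sort -V ordering
# (an approximation, not the RPM comparison algorithm).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage inside the mounted image or on a deployed node:
#   installed=$(nvr_version "$(rpm -q os-net-config)")
#   version_at_least "$installed" 5.1.0-1 \
#       && echo "os-net-config contains the fix" \
#       || echo "os-net-config is older than 5.1.0-1"
```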

Comment 33 Chen 2017-03-06 03:23:17 UTC
Hi Sanjay,

This case currently affects a series of Huawei PoC projects, so it is a high priority from the Red Hat side.

Best Regards,
Chen

Comment 34 Eyal Dannon 2017-03-06 07:53:49 UTC
(In reply to Sanjay Upadhyay from comment #32)
> Hi Chen,
> 
> you are right, the overcloud image is having older os-net-config. -
> 
> Eyal, can you check on this?
> 
> [stack@rhelv73 ~]$ sudo yum list rhosp-director-images
> Loaded plugins: search-disabled-repos
> Installed Packages
> rhosp-director-images.noarch  10.0-20170201.1.el7ost   
> @rhel-7-server-openstack-10-rpms
> [root@rhelv73 ~]# LIBGUESTFS_BACKEND=direct guestmount -a
> ./overcloud-full.qcow2 -m /dev/sda /mnt
> [root@rhelv73 mnt]# logout
> [stack@rhelv73 ~]$ sudo chroot /mnt
> [root@rhelv73 /]# cd
> [root@rhelv73 ~]# rpm -q os-net-config
> os-net-config-5.0.0-5.el7ost.noarch
> 
> ^^^ overcloud image in the latest package has the old os-net-config, which
> would lead to a failed deploy.

Hi,

Not in my case; both have the same version.

[root@controller-0 ~]# rpm -qa | grep os-net-config
os-net-config-5.1.0-1.el7ost.noarch

Comment 35 Chen 2017-03-06 08:00:47 UTC
Hi Eyal,

Which image version are you using ?

Comment 36 Eyal Dannon 2017-03-06 08:53:14 UTC
Hi Chen,

[root@panther05 ~]# rpm -qa | grep rhosp
rhosp-director-images-ipa-10.0-20170214.1.el7ost.noarch
rhosp-director-images-10.0-20170214.1.el7ost.noarch
[root@panther05 ~]# yum list rhosp-director-images
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
rhosp-director-images.noarch       10.0-20170214.1.el7ost        @rhelosp-10.0-puddle

Any further information?

Comment 37 Chen 2017-03-08 08:59:26 UTC
Hi Eyal, Sanjay and Jon,

The customer confirmed that 10.0-20170228.1 from the puddle solved the bonding issue, but it is not practical to let the customer use our internal images. Do you have any idea when the new images will be released?

Best Regards,
Chen

Comment 38 Jon Schlueter 2017-03-08 13:11:49 UTC
(In reply to Chen from comment #37)
> Hi Eyal. Sanjay and Jon,
> 
> The customer confirmed that 10.0-20170228.1 from puddle solved the bonding
> issue but it is not practical to let the customer use our internal images.
> Do you have any idea when the new images will be released ?
> 
> Best Regards,
> Chen

Should be shipping today.

Comment 39 Jon Schlueter 2017-03-16 17:24:49 UTC
This has already shipped live. If you have additional issues, please open a new bug.

