Bug 1396605

Summary: OVS DPDK bonds fail to configure during provisioning
Product: Red Hat OpenStack Reporter: Andrew Bays <abays>
Component: os-net-config Assignee: Sanjay Upadhyay <supadhya>
Status: CLOSED ERRATA QA Contact: Ofer Blaut <oblaut>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton) CC: akarlsso, atelang, cchen, edannon, emacchi, fbaudin, hbrock, hrushikesh.gangur, jschluet, jslagle, lpeer, mburns, pillala, rhel-osp-director-maint, smerrow, srevivo, supadhya, suryanarayana.nayani, trozet, vchundur, yrachman
Target Milestone: --- Keywords: Reopened, Triaged, ZStream
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: os-net-config-5.1.0-1.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1416421 (view as bug list) Environment:
Last Closed: 2017-03-16 17:24:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1422248, 1428719    
Bug Blocks: 1335596, 1396161, 1406834, 1416421    
Attachments:
/var/log/messages file from compute node (Flags: none)
ovs-vswitchd.log from compute node (Flags: none)
ovsdb-server.log from compute node (Flags: none)

Description Andrew Bays 2016-11-18 18:19:07 UTC
Details are available here:

https://bugs.launchpad.net/tripleo/+bug/1643026

Comment 3 Sean Merrow 2017-01-05 21:41:40 UTC
A partner has hit this issue, but the workaround listed in the upstream bug did not work for him, and he got a new error message. Can you please take a look and offer guidance?

Comment 4 surya 2017-01-06 08:59:55 UTC
Created attachment 1237886 [details]
/var/log/messages file from compute node

Comment 5 surya 2017-01-06 09:00:39 UTC
Created attachment 1237887 [details]
ovs-vswitchd.log from compute node

Comment 6 surya 2017-01-06 09:01:27 UTC
Created attachment 1237889 [details]
ovsdb-server.log from compute node

Comment 7 Andrew Bays 2017-01-06 13:10:07 UTC
@surya

I ran into another issue after I found the first workaround that I forgot to include here.  By default, OVS doesn't seem to be running with the "--dpdk" flag and other required parameters (even though it seems to be properly set to do so in the OOO templates).  That's why you see these messages repeated in your ovs-vswitchd.log:

...
2017-01-06T12:03:14.291Z|00070|netdev|WARN|could not create netdev dpdk0 of unknown type dpdk
...
2017-01-06T12:03:14.291Z|00072|netdev|WARN|could not create netdev dpdk1 of unknown type dpdk
...

Check to see if this is the case with the partner's setup.  What I had to do was to run OVS manually with the missing DPDK flags because setting DPDK_OPTIONS in OOO templates (NeutronDpdkCoreList, NeutronDpdkMemoryChannels, NeutronDpdkSocketMemory and NeutronDpdkDriverType) did not work properly.  Once I had OVS running with DPDK enabled, os-collect-config was able to add the ports when it looped and tried to configure networking again.
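
The check described above can be sketched in shell. This is only an illustration: the helper names `count_dpdk_netdev_warnings` and `has_dpdk_flag` are hypothetical and not part of OVS or TripleO, and the log path in the usage comment is an assumption about a typical install.

```shell
#!/bin/sh
# Hypothetical helpers for spotting an ovs-vswitchd that was started without DPDK.

# Print the number of "unknown type dpdk" warnings in an ovs-vswitchd.log file.
# $1: path to the log file
count_dpdk_netdev_warnings() {
    grep -c 'could not create netdev .* of unknown type dpdk' "$1"
}

# Succeed if the given command line contains the --dpdk flag as a separate word.
# $1: full command line of the ovs-vswitchd process
has_dpdk_flag() {
    case " $1 " in
        *" --dpdk "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a compute node (log path is an assumption):
#   count_dpdk_netdev_warnings /var/log/openvswitch/ovs-vswitchd.log
#   has_dpdk_flag "$(ps -o args= -C ovs-vswitchd)" || echo "OVS is running without --dpdk"
```

A non-zero warning count together with a missing `--dpdk` flag would match the symptom described in this comment.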

Comment 8 pillala 2017-01-10 05:17:53 UTC
OVS DPDK BONDING [ ACTIVE BACKUP ] 

ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 2048,2048 -- unix:$DB_SOCK --pidfile --detach


NOTE: This is a completely manual workaround, and it is working.

NOTE: Our deployment is fully automated, so we are looking for a patch that makes this work in automation.

We require a patch for OVS DPDK bonding in automated deployments.
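
For anyone scripting the manual workaround in the meantime, the invocation above could be parameterized along these lines. This is a rough sketch, not the fix itself: `build_vswitchd_dpdk_cmd` is a hypothetical helper, and the default socket path `/var/run/openvswitch/db.sock` is an assumption that should be checked against the actual `DB_SOCK` on the node.

```shell
#!/bin/sh
# Hypothetical helper: print the ovs-vswitchd command line used in the manual workaround.
# $1: DPDK core mask              (e.g. 0x1)
# $2: number of memory channels   (e.g. 4)
# $3: per-socket memory in MB     (e.g. 2048,2048)
# $4: ovsdb socket path (optional; the default below is an assumption)
build_vswitchd_dpdk_cmd() {
    printf 'ovs-vswitchd --dpdk -c %s -n %s --socket-mem %s -- unix:%s --pidfile --detach\n' \
        "$1" "$2" "$3" "${4:-/var/run/openvswitch/db.sock}"
}

# Print the command for review before running it by hand:
build_vswitchd_dpdk_cmd 0x1 4 2048,2048
```

The helper only prints the command, so the values can be reviewed before anything is started on the node.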

Comment 9 Yariv 2017-01-11 07:55:43 UTC
(In reply to Andrew Bays from comment #7)
> @surya
> 
> I ran into another issue after I found the first workaround that I forgot to
> include here.  By default, OVS doesn't seem to be running with the "--dpdk"
> flag and other required parameters (even though it seems to be properly set
> to do so in the OOO templates).  That's why you see these messages repeated
> in your ovs-vswitchd.log:
> 
> ...
> 2017-01-06T12:03:14.291Z|00070|netdev|WARN|could not create netdev dpdk0 of
> unknown type dpdk
> ...
> 2017-01-06T12:03:14.291Z|00072|netdev|WARN|could not create netdev dpdk1 of
> unknown type dpdk
> ...
> 
> Check to see if this is the case with the partner's setup.  What I had to do
> was to run OVS manually with the missing DPDK flags because setting
> DPDK_OPTIONS in OOO templates (NeutronDpdkCoreList,
> NeutronDpdkMemoryChannels, NeutronDpdkSocketMemory and
> NeutronDpdkDriverType) did not work properly.  Once I had OVS running with
> DPDK enabled, os-collect-config was able to add the ports when it looped and
> tried to configure networking again.
Please refer to:
https://bugzilla.redhat.com/show_bug.cgi?id=1366356

Comment 11 Sanjay Upadhyay 2017-01-11 10:46:19 UTC
patch submitted -  https://review.openstack.org/#/c/417805/ 
moving to QA

Comment 12 hrushi 2017-01-11 16:35:07 UTC
(In reply to Sanjay Upadhyay from comment #11)
> patch submitted -  https://review.openstack.org/#/c/417805/ 
> moving to QA

Does this patch take care of the workarounds mentioned in comments #7 and #8?

Comment 13 Sean Merrow 2017-01-13 14:51:01 UTC
Hi Sanjay, does the patch you submitted include the workarounds mentioned in comments 7 and 8?

Comment 15 Sanjay Upadhyay 2017-01-16 08:01:53 UTC
(In reply to Sean Merrow from comment #13)
> Hi Sanjay, does the patch you submitted include the work-arounds for those
> mentioned in comments 7 and 8?

Some parts of the workaround are not required, i.e. the restart of ovs-vswitchd. ovs-vswitchd is started by the installer at step 4 of the installation.

Comment 16 surya 2017-01-17 15:02:23 UTC
Thanks, Sanjay, for the workaround. We are able to deploy a compute node with an OVS DPDK bond in active-standby mode.

Comment 17 Yariv 2017-01-19 11:24:24 UTC
Changed back to ON_DEV.
The patch is not merged yet; 10z releases on January 25th.
Please change the status back once the patch has received +2.

Comment 18 Sanjay Upadhyay 2017-01-20 08:08:13 UTC
(In reply to Yariv from comment #17)
> Changed back to On_DEV
> Patch is not merged yet.. 10z released on January 25th
> Please put change status Back, once the patch received +2

https://review.openstack.org/#/c/417805 is merged now, placing the bug on_qa again.

Comment 19 Vijay Chundury 2017-01-24 08:43:11 UTC
Sanjay, I believe this patch is merged. Can you create a new BZ and start the backport ASAP?

Yariv,
We will start the backport, but let us know whether this change gets us past the issue.

Regards
Vijay.

Comment 20 Jon Schlueter 2017-01-25 14:52:09 UTC
Moving back to POST: the patch has been proposed and merged upstream but is not in a build yet.

Comment 21 Sanjay Upadhyay 2017-01-25 15:00:24 UTC
*** Bug 1416421 has been marked as a duplicate of this bug. ***

Comment 24 Sanjay Upadhyay 2017-02-14 05:17:24 UTC
Thanks @Jschlueter.

This is now packaged, and I am moving the bug to QA for testing and validating. I have run a test and this is fixed.

Comment 26 Eyal Dannon 2017-02-16 12:31:44 UTC
I have verified DPDK + bonding deployment with "os-net-config-5.1.0-1.el7ost.noarch".

Comment 27 Sanjay Upadhyay 2017-02-21 08:12:02 UTC
os-net-config-5.1.0-1.el7ost.noarch released

Comment 28 Chen 2017-03-02 06:33:35 UTC
Hi,

Sorry for reopening the bugzilla.

I tried the newest 10.0.1 image from the official site, but the os-net-config version is still os-net-config-5.0.0-5.el7ost.noarch. Where can I get images that contain os-net-config-5.1.0-1?

Best Regards,
Chen

Comment 29 Chen 2017-03-03 06:10:28 UTC
Hi Sanjay,

Can you point me to where I can get an image with os-net-config-5.1.0-1?

Best Regards,
Chen

Comment 30 Sanjay Upadhyay 2017-03-03 07:11:59 UTC
[root@rhelv73 ~]# yum list os-net-config
Loaded plugins: search-disabled-repos
Available Packages
os-net-config.noarch      5.1.0-1.el7ost     rhel-7-server-openstack-10-rpms


Can you please compare the output above with your setup? I am guessing it is a simple case of enabling the correct repository. Your repolist should include rhel-7-server-openstack-10-rpms.

Comment 31 Chen 2017-03-03 07:45:44 UTC
Hi Sanjay,

I am using director to deploy the overcloud nodes. My understanding is that the os-net-config inside the overcloud image, not the package in the official repository, is what configures the ovs-dpdk bond. I understand that a new os-net-config is available in the official repo, but the os-net-config inside the overcloud image is too old to configure the ovs-dpdk bond.

Or do I need to update os-net-config on the undercloud side? Please correct me if I'm wrong.

Best Regards,
Chen

Comment 32 Sanjay Upadhyay 2017-03-03 08:55:06 UTC
Hi Chen,

You are right, the overcloud image has an older os-net-config.

Eyal, can you check on this?

[stack@rhelv73 ~]$ sudo yum list rhosp-director-images
Loaded plugins: search-disabled-repos
Installed Packages
rhosp-director-images.noarch  10.0-20170201.1.el7ost    @rhel-7-server-openstack-10-rpms
[root@rhelv73 ~]# LIBGUESTFS_BACKEND=direct guestmount -a ./overcloud-full.qcow2 -m /dev/sda /mnt
[root@rhelv73 mnt]# logout
[stack@rhelv73 ~]$ sudo chroot /mnt
[root@rhelv73 /]# cd
[root@rhelv73 ~]# rpm -q os-net-config
os-net-config-5.0.0-5.el7ost.noarch

^^^ overcloud image in the latest package has the old os-net-config, which would lead to a failed deploy.
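
The version check in the transcript above could be automated roughly as follows. This is a sketch under stated assumptions: both helper names are hypothetical, and GNU `sort -V` is used as an approximation of RPM version ordering rather than RPM's own comparison.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether an image carries the fixed os-net-config.

# Extract "version-release" (e.g. "5.0.0-5") from an `rpm -q os-net-config` line.
# $1: package string such as os-net-config-5.0.0-5.el7ost.noarch
os_net_config_version() {
    printf '%s\n' "$1" |
        sed -n 's/^os-net-config-\([0-9][0-9.]*-[0-9][0-9]*\)\..*$/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort -V, which only approximates RPM's version comparison.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: flag the package version found in the overcloud image above.
found=$(os_net_config_version "os-net-config-5.0.0-5.el7ost.noarch")
if version_at_least "$found" "5.1.0-1"; then
    echo "os-net-config $found contains the fix"
else
    echo "os-net-config $found is older than 5.1.0-1"
fi
```

Inside the chroot, the first argument would come from `rpm -q os-net-config` instead of the literal string.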

Comment 33 Chen 2017-03-06 03:23:17 UTC
Hi Sanjay,

This case currently affects a series of Huawei PoC projects, so it is a high priority on the Red Hat side.

Best Regards,
Chen

Comment 34 Eyal Dannon 2017-03-06 07:53:49 UTC
(In reply to Sanjay Upadhyay from comment #32)
> Hi Chen,
> 
> you are right, the overcloud image is having older os-net-config. -
> 
> Eyal, can you check on this?
> 
> [stack@rhelv73 ~]$ sudo yum list rhosp-director-images
> Loaded plugins: search-disabled-repos
> Installed Packages
> rhosp-director-images.noarch  10.0-20170201.1.el7ost   
> @rhel-7-server-openstack-10-rpms
> [root@rhelv73 ~]# LIBGUESTFS_BACKEND=direct guestmount -a
> ./overcloud-full.qcow2 -m /dev/sda /mnt
> [root@rhelv73 mnt]# logout
> [stack@rhelv73 ~]$ sudo chroot /mnt
> [root@rhelv73 /]# cd
> [root@rhelv73 ~]# rpm -q os-net-config
> os-net-config-5.0.0-5.el7ost.noarch
> 
> ^^^ overcloud image in the latest package has the old os-net-config, which
> would lead to a failed deploy.

Hi,

Not in my case; both have the same version.

[root@controller-0 ~]# rpm -qa | grep os-net-config
os-net-config-5.1.0-1.el7ost.noarch

Comment 35 Chen 2017-03-06 08:00:47 UTC
Hi Eyal,

Which image version are you using ?

Comment 36 Eyal Dannon 2017-03-06 08:53:14 UTC
Hi Chen,

[root@panther05 ~]# rpm -qa | grep rhosp
rhosp-director-images-ipa-10.0-20170214.1.el7ost.noarch
rhosp-director-images-10.0-20170214.1.el7ost.noarch
[root@panther05 ~]# yum list rhosp-director-images
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
Installed Packages
rhosp-director-images.noarch       10.0-20170214.1.el7ost        @rhelosp-10.0-puddle

Any further information?

Comment 37 Chen 2017-03-08 08:59:26 UTC
Hi Eyal, Sanjay and Jon,

The customer confirmed that 10.0-20170228.1 from puddle solved the bonding issue but it is not practical to let the customer use our internal images. Do you have any idea when the new images will be released ?

Best Regards,
Chen

Comment 38 Jon Schlueter 2017-03-08 13:11:49 UTC
(In reply to Chen from comment #37)
> Hi Eyal. Sanjay and Jon,
> 
> The customer confirmed that 10.0-20170228.1 from puddle solved the bonding
> issue but it is not practical to let the customer use our internal images.
> Do you have any idea when the new images will be released ?
> 
> Best Regards,
> Chen

Should be shipping today.

Comment 39 Jon Schlueter 2017-03-16 17:24:49 UTC
This has already shipped live. If you have additional issues, please open a new bug.