Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1583976

Summary: Contrail: Support for qemu reconnect in client mode of operation
Product: Red Hat OpenStack
Reporter: Jeya ganesh babu J <jjeya>
Component: openstack-nova
Assignee: smooney
Status: CLOSED NOTABUG
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Docs Contact:
Priority: medium
Version: 14.0 (Rocky)
CC: cylopez, dasmith, egallen, eglynn, jhakimra, jjeya, kchamart, knoel, lyarwood, mbooth, mprivozn, mrussell, sbauza, sferdjao, sgordon, smooney, srevivo, virt-maint, vromanso
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-17 14:09:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jeya ganesh babu J 2018-05-30 06:03:36 UTC
Description of problem:

When qemu runs in client mode for a vhostuser network interface and the vhostuser application restarts, qemu does not reconnect to the vhostuser application.
This causes loss of connectivity to the guest VMs.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev.x86_64               10:2.10.0-21.el7_5.3

How reproducible:
The issue can be reproduced on a contrail dpdk setup by restarting the contrail-vrouter-dpdk application.

Steps to Reproduce:
1. Provision a contrail dpdk setup.
2. Launch a VM.
3. Restart the dpdk vhostuser application using 'service supervisor-vrouter restart'

Actual results:
VM connectivity is lost.

Expected results:
VM connectivity is restored once the dpdk application comes back up.

Additional info:
The issue was fixed by adding a retry count in qemu:
https://github.com/Juniper/contrail-dpdk-extra-packages/blob/kilo/patches/trusty/qemu/CODE-chardev-reconnect.patch

Comment 1 Kashyap Chamarthy 2018-05-31 12:05:04 UTC
I don't see anything specific to OpenStack here.

If you want to request changes in 'qemu-kvm-rhev', the Product should be Red Hat Enterprise Linux.  So I've changed the bug 'Product'.

Comment 3 Sahid Ferdjaoui 2018-06-01 14:55:09 UTC
Based on the QEMU patch shared I imagine the request is on Nova to configure the vhostuser interfaces with reconnect enabled [0].

Could you confirm that?

[0] https://libvirt.org/formatdomain.html#elementVhostuser

Comment 4 Jeya ganesh babu J 2018-06-13 16:26:38 UTC
Yes, ideally it should come from nova. But I am not sure whether there is a way to set this from nova.

Comment 5 Amnon Ilan 2018-06-18 10:13:52 UTC
Michal, can you have a look? 
Should the reconnect config mentioned above (comment#3) be done 
by libvirt or Nova?

Comment 6 Sahid Ferdjaoui 2018-06-18 10:25:18 UTC
This seems to be supported in libvirt already:

  https://libvirt.org/formatdomain.html#elementVhostuser

The question is what a good value would be, or whether we have to let operators decide.

Comment 7 Michal Privoznik 2018-06-19 04:15:14 UTC
(In reply to Amnon Ilan from comment #5)
>

Yes, libvirt already supports reconnect (from version 4.1.0 onwards). So the only part that is missing is Nova putting the attribute in the domain XML.

(In reply to Sahid Ferdjaoui from comment #6)
> 

I think the value should be on the order of seconds, with ten seconds being the upper limit. A restart of vhostuser does not take too long, and reconnect basically tells qemu how long to wait between connect retries. In other words, it's not like qemu gives up reconnecting after the first failed attempt.
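
To make the missing piece concrete, here is a minimal sketch of the domain XML fragment in question, built with Python's standard library. It is an illustration only and is not Nova's actual code: the helper name vhostuser_interface_xml and the socket path are hypothetical, and the element layout follows the libvirt vhostuser documentation linked in comment 3.

```python
import xml.etree.ElementTree as ET


def vhostuser_interface_xml(socket_path, mode="client", reconnect_timeout=None):
    """Build a libvirt <interface type='vhostuser'> element as a string.

    reconnect_timeout is in seconds; None leaves <reconnect> out entirely.
    (Hypothetical helper, not part of Nova or libvirt.)
    """
    iface = ET.Element("interface", {"type": "vhostuser"})
    source = ET.SubElement(
        iface, "source", {"type": "unix", "path": socket_path, "mode": mode}
    )
    if reconnect_timeout is not None:
        # The timeout is the wait between connect retries, in seconds.
        ET.SubElement(
            source,
            "reconnect",
            {"enabled": "yes", "timeout": str(reconnect_timeout)},
        )
    ET.SubElement(iface, "model", {"type": "virtio"})
    return ET.tostring(iface, encoding="unicode")


# Hypothetical socket path, with the ten-second upper limit suggested above.
xml_text = vhostuser_interface_xml("/var/run/vrouter/uvh_vif_tap0", reconnect_timeout=10)
print(xml_text)
```

Leaving reconnect_timeout unset produces the same interface element without the reconnect child.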


(In reply to Kashyap Chamarthy from comment #1)
> 

I think this bug should be moved back to OpenStack.

Comment 9 smooney 2018-08-02 19:50:57 UTC
The behaviour described, related to restarting the vhost-user server (the contrail vrouter in this instance), is the expected behaviour of the vhost-user protocol.

vhost-user reconnect is not supported by qemu when qemu is the client.
The feature was developed specifically to work only when qemu is the server and dpdk is the client.

The reason the connection is broken is that when the vhost-user server restarts, the unix sockets it created are closed, but qemu is still holding an open file descriptor to the old socket. When the backend restarts, it creates new unix sockets at the same file paths with different file descriptors. qemu will not detect this and reconnect when it is in client mode.
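
The stale-descriptor behaviour can be reproduced with plain unix sockets. The following is a small sketch using only Python's standard library; it does not speak the vhost-user protocol, the socket name vhu.sock is made up, and the "client" merely plays the role qemu has in client mode.

```python
import os
import socket
import tempfile


def start_server(path):
    """Bind and listen on a fresh unix socket at path, like a backend starting up."""
    if os.path.exists(path):
        os.unlink(path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def connect_client(path):
    """Open a new client connection to the unix socket at path."""
    cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    cli.connect(path)
    return cli


path = os.path.join(tempfile.mkdtemp(), "vhu.sock")

# First backend instance; the client (playing qemu's role) connects to it.
server = start_server(path)
client = connect_client(path)
conn, _ = server.accept()

# Backend restarts: its sockets are closed and a new one is created at the same path.
conn.close()
server.close()
server = start_server(path)

# The old client socket only sees end-of-file; it is not attached to the new server.
old_connection_dead = client.recv(1) == b""

# Connectivity comes back only when the client opens a new connection.
client.close()
client = connect_client(path)
conn, _ = server.accept()
conn.sendall(b"ok")
reply = client.recv(2)
print(old_connection_dead, reply)
```

The path is identical before and after the restart, but the client's existing descriptor still refers to the closed connection, which is why an explicit reconnect is needed.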

The decision not to support this when qemu is the client was made between the dpdk and qemu communities to simplify both codebases and converge on a common deployment configuration where the lifetime of the unix socket is tied to the lifetime of the VM, by making qemu the server and dpdk the client.

From a nova/neutron design perspective, nova requires neutron to pass the vhost-user socket mode as part of the vif binding details from the neutron ml2 driver.
Introducing a nova config option for enabling retry violates several previous decisions relating to not adding networking options to nova.

As such, I do not believe this is a bug but rather a misconfiguration of the vrouter. If you wish to use reconnect, you should set the vhost-user mode in the ml2 driver to server, to indicate that qemu is the server, and configure the vrouter as the client.
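
As a rough illustration of that arrangement, the sketch below reads the socket mode from a binding details dictionary and derives which side plays which role. The keys vhostuser_mode and vhostuser_socket are assumed to be the names used in the vif binding details, the helper describe_roles and the example socket path are hypothetical, and the mode value is taken to describe qemu's role, as stated above.

```python
# Assumed key names in the vif binding details passed from the ml2 driver.
VHOSTUSER_MODE_KEY = "vhostuser_mode"
VHOSTUSER_SOCKET_KEY = "vhostuser_socket"


def describe_roles(binding_details):
    """Return (qemu_role, vswitch_role, socket_path) from vif binding details.

    Hypothetical helper: the mode value is treated as qemu's role, so the
    vswitch (for example the vrouter) takes the opposite role.
    """
    mode = binding_details[VHOSTUSER_MODE_KEY]
    if mode not in ("server", "client"):
        raise ValueError(f"unexpected vhost-user mode: {mode!r}")
    vswitch_role = "client" if mode == "server" else "server"
    return mode, vswitch_role, binding_details[VHOSTUSER_SOCKET_KEY]


# Example binding details for the recommended setup (the path is made up).
details = {
    VHOSTUSER_MODE_KEY: "server",
    VHOSTUSER_SOCKET_KEY: "/var/run/vrouter/uvh_vif_tap0",
}
roles = describe_roles(details)
print(roles)
```

With the mode set to server, qemu owns the socket for the lifetime of the VM and the vrouter connects to it as the client.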

Comment 10 Sahid Ferdjaoui 2018-08-03 08:36:33 UTC
Sean, I think the context of the request is vhu server, so QEMU in client mode.

QEMU now provides a reconnect timeout, so the socket will not be closed. That looks reasonable for vswitches other than OVS that use vhu server (QEMU in client mode). It has been accepted upstream and is exposed by libvirt [0]. It looks reasonable to provide such a tunable in Nova. Or do you see something not right here?

[0] https://www.redhat.com/archives/libvir-list/2017-September/msg00180.html

Comment 11 smooney 2018-08-03 09:46:49 UTC
Allowing the use of the feature is reasonable.
Providing a nova config option for it is not.

It would be reasonable to add the XML generation code to nova, but enabling it needs to be done via neutron.

Comment 12 smooney 2018-08-03 15:44:11 UTC
*** Bug 1608531 has been marked as a duplicate of this bug. ***

Comment 13 smooney 2018-08-03 15:50:11 UTC
As per the bug triage call, I have closed
https://bugzilla.redhat.com/show_bug.cgi?id=1608531 as a duplicate, and we will use this BZ to track the delivery of this feature.

As also noted, this has expressly been requested for backport to OSP 13, which is the version on which the customer is currently deployed.
As we cannot determine how feasible a backport is at this time, it should be reassessed as part of closing this BZ.

Comment 17 Irina Petrova 2018-11-03 10:32:22 UTC
(In reply to Sahid Ferdjaoui from comment #10)
> Sean, I think the context of the request is vhu server, so QEMU in client
> mode.
> 
> QEMU now provides a reconnect timeout, so the socket will not be closed.
> That looks reasonable for vswitches other than OVS that use vhu server
> (QEMU in client mode). It has been accepted upstream and is exposed by
> libvirt [0]. It looks reasonable to provide such a tunable in Nova. Or do
> you see something not right here?
> 
> [0] https://www.redhat.com/archives/libvir-list/2017-September/msg00180.html

Trying to be practical here: what is the technical need/justification for this?
Why can't we just set contrail vrouter as a vhost user client and QEMU as a vhost user server? Same way OVS/DPDK and QEMU have been re-designed.

Why are we trying to fix something that we've already decided to move away from? See Sean's c#9.

Comment 19 smooney 2019-04-24 20:39:50 UTC
As of contrail version 5.0 ( https://github.com/Juniper/contrail-controller/releases/tag/r5.0 ),
released in July 2018, the contrail controller no longer uses vhost-user client mode and exclusively uses
vhost-user server mode, where qemu is the server and the vrouter is the client:
https://github.com/Juniper/contrail-controller/blob/R5.0/src/config/api-server/vnc_cfg_api_server/vnc_cfg_types.py#L2239-L2242
https://github.com/Juniper/contrail-controller/blob/R5.0/src/config/api-server/vnc_cfg_api_server/vnc_cfg_types.py#L2266-L2271
https://github.com/Juniper/contrail-controller/blob/R5.0/src/config/api-server/vnc_cfg_api_server/vnc_cfg_types.py#L2425-L2431
https://github.com/Juniper/contrail-controller/blob/R5.0/src/config/api-server/vnc_cfg_api_server/vnc_cfg_types.py#L2443-L2448

According to the Juniper documentation at
https://www.juniper.net/documentation/en_US/contrail5.0/topics/concept/Deploying-Contrail-with-RedHatOpenStack.html#jd0e214
the supported version of contrail with OSP 13 was 5.0.1.

As such, this is not required: contrail 4.1 is not supported with OSP 13 or later,
and features can no longer be backported to OSP 10, so I am closing this as "not a bug".