Bug 1561870 - OSP10: Support for dpdkvhostuserclient mode
Summary: OSP10: Support for dpdkvhostuserclient mode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: Sahid Ferdjaoui
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1557850 1568355 1568356 1572510
Blocks: 1561869
 
Reported: 2018-03-29 05:25 UTC by Saravanan KR
Modified: 2023-03-21 18:46 UTC
CC: 19 users

Fixed In Version: openstack-nova-14.0.3-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1557850
Environment:
Last Closed: 2018-06-27 17:23:28 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport to analyze error log after applying vhostuserclient patches (10.76 MB, application/x-xz), 2018-04-04 07:42 UTC, Saravanan KR
Migration Hang - compute-0 (migration target) (13.00 MB, application/x-xz), 2018-04-24 05:23 UTC, Saravanan KR
Migration Hang - compute-1 (migration source) (12.98 MB, application/x-xz), 2018-04-24 05:25 UTC, Saravanan KR
Migration fail (comment #9) compute-1 migration target (10.71 MB, application/x-xz), 2018-04-26 10:33 UTC, Saravanan KR
Migration fail (comment #9) compute-0 migration source (10.87 MB, application/x-xz), 2018-04-26 10:37 UTC, Saravanan KR

Comment 1 Sahid Ferdjaoui 2018-03-29 07:04:37 UTC
It's probably something we can do in Nova (os-vif), but an update is also needed on the Neutron side so that the OVS agent returns the correct vhost-user mode based on the OVS capabilities.
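The capability-based selection suggested above can be sketched as follows. This is an illustrative outline, not the actual os-vif/Neutron patch; the function name is hypothetical. The idea: if OVS supports the dpdkvhostuserclient interface type, OVS acts as the vhost-user client, so QEMU must create the socket and the port should be reported in "server" mode; otherwise the legacy dpdkvhostuser behavior applies and OVS creates the socket.

```python
def choose_vhostuser_mode(ovs_iface_types):
    """Pick the vhost-user mode to report for a port, based on the
    interface types OVS advertises (e.g. from
    `ovs-vsctl get Open_vSwitch . iface_types`).

    Hypothetical helper, not the real agent code.
    """
    if "dpdkvhostuserclient" in ovs_iface_types:
        return "server"  # OVS connects; QEMU must create the socket
    return "client"      # legacy dpdkvhostuser: OVS creates the socket

# OVS 2.9-style capabilities -> "server"
print(choose_vhostuser_mode(["dpdk", "dpdkvhostuser", "dpdkvhostuserclient"]))
# OVS 2.6-style capabilities -> "client"
print(choose_vhostuser_mode(["dpdk", "dpdkvhostuser"]))
```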

Comment 3 Saravanan KR 2018-04-04 07:38:53 UTC
While validating the changes in the OSP10 puddle, an ERROR log appears after the following changes:

* Deploy OSP10 puddle 2018-04-02.1
* Create a VM with DPDK network
* Integrate all 3 patches on controller and compute
* On Compute node
    mkdir /var/lib/vhost_sockets
    chown qemu:qemu /var/lib/vhost_sockets
    Modified file /etc/neutron/plugins/ml2/openvswitch_agent.ini
      vhostuser_socket_dir = /var/lib/vhost_sockets
* Restart neutron and nova services
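The openvswitch_agent.ini edit in the steps above can also be applied programmatically. A minimal sketch using Python's configparser, assuming the option lives in the [ovs] section as in the OVS agent sample config (the helper name is illustrative):

```python
import configparser
import io

def set_vhostuser_socket_dir(ini_text, socket_dir):
    """Return the ini text with ovs/vhostuser_socket_dir set.

    Illustrative helper for editing an openvswitch_agent.ini snippet.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    if not cfg.has_section("ovs"):
        cfg.add_section("ovs")
    cfg.set("ovs", "vhostuser_socket_dir", socket_dir)
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()

updated = set_vhostuser_socket_dir("[ovs]\ndatapath_type = netdev\n",
                                   "/var/lib/vhost_sockets")
print(updated)
```

In a deployment the file would be read from and written back to /etc/neutron/plugins/ml2/openvswitch_agent.ini, followed by the service restarts listed above.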

Looking at the log files, the ERROR below appears in nova-compute.log after restarting the services (sosreport attached):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [req-abbb0aaf-9c95-4622-9ea4-9ab81e62c351 - - - - -] [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] An error occurred while refreshing the network cache.
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] Traceback (most recent call last):
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5814, in _heal_instance_info_cache
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     self.network_api.get_instance_nw_info(context, instance)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/base_api.py", line 249, in get_instance_nw_info
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     result = self._get_instance_nw_info(context, instance, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1300, in _get_instance_nw_info
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     preexisting_port_ids)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2220, in _build_network_info_model
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     data = client.list_ports(**search_opts)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 743, in list_ports
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     **_params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 376, in list
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     for r in self._pagination(collection, path, **params):
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 391, in _pagination
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     res = self.get(path, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 361, in get
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     headers=headers, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 338, in retry_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     headers=headers, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 289, in do_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp, replybody = self.httpclient.do_request(action, method, body=body)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 311, in do_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return self.request(url, method, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 299, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp = super(SessionClient, self).request(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 112, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return self.session.request(url, method, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return wrapped(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 579, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp = send(**kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 620, in _send_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     raise exceptions.ConnectTimeout(msg)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] ConnectTimeout: Request to http://10.150.81.25:9696/v2.0/ports.json?tenant_id=6c5e5823e570438b9d770eb0bc097418&device_id=4c6cc3fd-096a-416d-9f55-a7ce093d3a0e timed out
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Comment 4 Saravanan KR 2018-04-04 07:42:23 UTC
Created attachment 1417082 [details]
sosreport to analyze error log after applying vhostuserclient patches

Comment 6 Saravanan KR 2018-04-24 05:21:12 UTC
Setup: 1 controller and 2 compute nodes

* Deployed OSP10 with OVS in server mode (dpdkvhostuser)
* Created a VM on compute-1
* Executed a minor update with the neutron and nova changes required for dpdkvhostuserclient mode, and the other tripleo change required for ovs2.9
* Rebooted compute-0 after the update completed
* Created a new VM; it came up successfully on compute-0 in dpdkvhostuserclient mode
* Migrated the existing VM from compute-1 to compute-0
* Migration failed
* The Neutron port is updated with the new socket directory path
* The virsh XML is created with the old directory, whereas OVS is listening on the new directory
* Migration hangs
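The directory mismatch described above (virsh XML pointing at the old socket directory while OVS listens on the new one) can be checked mechanically by comparing the vhost-user socket paths in the domain XML against the configured socket directory. A sketch with an illustrative XML snippet (values are examples, not taken from the attached sosreports):

```python
import xml.etree.ElementTree as ET

def vhostuser_socket_paths(domain_xml):
    """Return the vhost-user socket paths found in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    return [src.get("path")
            for iface in root.iter("interface")
            if iface.get("type") == "vhostuser"
            for src in iface.iter("source")]

# Illustrative snippet in the shape libvirt uses for vhost-user interfaces.
xml = """
<domain>
  <devices>
    <interface type='vhostuser'>
      <source type='unix' path='/var/run/openvswitch/vhu41649dbf-6c' mode='client'/>
    </interface>
  </devices>
</domain>
"""
ovs_socket_dir = "/var/lib/vhost_sockets"  # new directory from the minor update
for path in vhostuser_socket_paths(xml):
    if not path.startswith(ovs_socket_dir):
        print("mismatch:", path, "is outside", ovs_socket_dir)
```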

Comment 7 Saravanan KR 2018-04-24 05:23:24 UTC
Created attachment 1425799 [details]
Migration Hang - compute-0 (migration target)

Comment 8 Saravanan KR 2018-04-24 05:25:11 UTC
Created attachment 1425800 [details]
Migration Hang - compute-1 (migration source)

Comment 9 Saravanan KR 2018-04-26 10:16:07 UTC
This time, instead of changing the vhost socket directory to the new "/var/lib/vhost_sockets" as in comment #6, I tried migration with the same directory "/var/run/openvswitch". Here is the behavior:

1 controller + 2 computes, 1 VM on each compute.

scenario 1:
-----------
compute-0 on ovs2.9 after reboot
compute-1 on ovs2.6 without reboot
migrate the VM from compute-1 to compute-0
migration is successful
VM migrated to compute-0 with the same mode - dpdkvhostuser

scenario 2:
-----------
after the above step, reboot compute-1
now compute-1 is up with ovs2.9
migrate the same VM from compute-0 to compute-1
migration fails
ovs adds the port in dpdkvhostuserclient mode
qemu is also in client mode

---------------------------------
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -- --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --may-exist add-port br-int vhu41649dbf-6c -- set Interface vhu41649dbf-6c external-ids:iface-id=41649dbf-6cc1-4779-a9c4-fbeff1f2f4b0 external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:8e:dd:59 external-ids:vm-uuid=930eee66-589b-4a60-92f8-ff8f7197e8d7 type=dpdkvhostuserclient options:vhost-server-path=/var/run/openvswitch/vhu41649dbf-6c
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- set interface vhu41649dbf-6c mtu_request=1496
Apr 26 09:59:33 overcloud-compute-1 nova_migration_wrapper: Allowing connection='10.150.81.23 51804 10.150.81.27 22' command=['nc', '-U', '/var/run/libvirt/libvirt-sock']
Apr 26 09:59:33 overcloud-compute-1 systemd-machined: New machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 systemd: Started Virtual Machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 systemd: Starting Virtual Machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 libvirtd: 2018-04-26 09:59:33.165+0000: 3567: error : qemuMonitorIORead:595 : Unable to read from monitor: Connection reset by peer
Apr 26 09:59:33 overcloud-compute-1 libvirtd: 2018-04-26 09:59:33.165+0000: 3567: error : qemuProcessReportLogError:1912 : internal error: qemu unexpectedly closed the monitor: 2018-04-26T09:59:33.157557Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu41649dbf-6c: Failed to connect socket /var/run/openvswitch/vhu41649dbf-6c: No such file or directory
Apr 26 09:59:33 overcloud-compute-1 systemd-machined: Machine qemu-3-instance-00000001 terminated.
------------------------
sosreport will be attached for this scenario.
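The failure in scenario 2 fits a simple rule: exactly one endpoint of a vhost-user link must create (serve) the socket. With type=dpdkvhostuserclient, OVS only connects to vhost-server-path, and a QEMU -chardev without the server option also only connects, so nobody creates the socket and QEMU fails with "No such file or directory", as in the log above. A sketch of that rule (hypothetical helper, not nova code):

```python
def socket_creator(ovs_port_type, qemu_chardev):
    """Return which side creates the vhost-user socket, or None on deadlock.

    ovs_port_type: 'dpdkvhostuser' (OVS serves) or 'dpdkvhostuserclient'
                   (OVS connects, so QEMU must serve).
    qemu_chardev:  the -chardev argument string; a 'server' option means
                   QEMU creates the socket.
    """
    ovs_serves = ovs_port_type == "dpdkvhostuser"
    qemu_serves = "server" in qemu_chardev.split(",")
    if ovs_serves and not qemu_serves:
        return "ovs"
    if qemu_serves and not ovs_serves:
        return "qemu"
    return None  # both client (nobody creates it) or both server

# The combination from the log above: OVS client + QEMU client -> deadlock.
chardev = "socket,id=charnet0,path=/var/run/openvswitch/vhu41649dbf-6c"
print(socket_creator("dpdkvhostuserclient", chardev))  # None
```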

Comment 10 Saravanan KR 2018-04-26 10:33:21 UTC
Created attachment 1427124 [details]
Migration fail (comment #9) compute-1 migration target

Comment 11 Saravanan KR 2018-04-26 10:37:39 UTC
Created attachment 1427127 [details]
Migration fail (comment #9) compute-0 migration source

Comment 12 Lon Hohberger 2018-05-15 10:37:10 UTC
According to our records, this should be resolved by openstack-nova-14.1.0-3.el7ost.  This build is available now.

Comment 14 Yariv 2018-06-21 10:40:47 UTC
Live Migration w/o cpu pinning is working 
see BZ https://bugzilla.redhat.com/show_bug.cgi?id=1542107

