It's probably something we can do for Nova (os-vif), but an update should also be done on the Neutron side so that the OVS agent returns the correct vhost-user mode based on OVS capabilities.
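A minimal sketch of what such a capability check could look like. The function name `pick_vhost_user_mode` and the list-based `iface_types` input are illustrative, not the actual Neutron/os-vif API; the mode semantics assume that with `dpdkvhostuserclient` interfaces OVS acts as the vhost-user client, so the hypervisor (QEMU) side must run in server mode.

```python
# Hypothetical helper, not real Neutron code: choose the vhost-user mode to
# advertise for port binding based on the interface types OVS supports.

def pick_vhost_user_mode(iface_types):
    """Return the QEMU-side vhost-user mode for the given OVS iface_types.

    If OVS supports dpdkvhostuserclient, OVS connects as the client, so
    QEMU must be the server. Otherwise fall back to legacy dpdkvhostuser,
    where OVS is the server and QEMU connects as a client.
    """
    if "dpdkvhostuserclient" in iface_types:
        return "server"  # QEMU creates the socket; OVS reconnects to it
    return "client"      # legacy: OVS creates the socket; QEMU connects

# iface_types would come from the running OVS, e.g. via
# `ovs-vsctl get Open_vSwitch . iface_types` (assumption: queried externally).
mode = pick_vhost_user_mode(["dpdk", "dpdkvhostuser", "dpdkvhostuserclient"])
```

The server-mode choice is what makes vhost-user ports survive an OVS restart, since QEMU keeps the socket alive and OVS simply reconnects.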
While validating the changes in the OSP10 puddle, an ERROR log appears after the changes below:
* Deploy OSP10 puddle 2018-04-02.1
* Create a VM with a DPDK network
* Integrate all 3 patches on controller and compute
* On the compute node:
  mkdir /var/lib/vhost_sockets
  chown qemu:qemu /var/lib/vhost_sockets
* Modify /etc/neutron/plugins/ml2/openvswitch_agent.ini:
  vhostuser_socket_dir = /var/lib/vhost_sockets
* Restart the neutron and nova services

Looking at the log files, the ERROR below pops up in nova-compute.log after restarting the services (sosreport attached):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [req-abbb0aaf-9c95-4622-9ea4-9ab81e62c351 - - - - -] [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] An error occurred while refreshing the network cache.
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] Traceback (most recent call last):
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 5814, in _heal_instance_info_cache
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     self.network_api.get_instance_nw_info(context, instance)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/base_api.py", line 249, in get_instance_nw_info
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     result = self._get_instance_nw_info(context, instance, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1300, in _get_instance_nw_info
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     preexisting_port_ids)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 2220, in _build_network_info_model
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     data = client.list_ports(**search_opts)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 743, in list_ports
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     **_params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 376, in list
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     for r in self._pagination(collection, path, **params):
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 391, in _pagination
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     res = self.get(path, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 361, in get
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     headers=headers, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 338, in retry_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     headers=headers, params=params)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 110, in wrapper
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     ret = obj(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 289, in do_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp, replybody = self.httpclient.do_request(action, method, body=body)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 311, in do_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return self.request(url, method, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 299, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp = super(SessionClient, self).request(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 112, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return self.session.request(url, method, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/positional/__init__.py", line 101, in inner
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     return wrapped(*args, **kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 579, in request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     resp = send(**kwargs)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]   File "/usr/lib/python2.7/site-packages/keystoneauth1/session.py", line 620, in _send_request
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]     raise exceptions.ConnectTimeout(msg)
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e] ConnectTimeout: Request to http://10.150.81.25:9696/v2.0/ports.json?tenant_id=6c5e5823e570438b9d770eb0bc097418&device_id=4c6cc3fd-096a-416d-9f55-a7ce093d3a0e timed out
2018-04-03 15:00:19.441 102408 ERROR nova.compute.manager [instance: 4c6cc3fd-096a-416d-9f55-a7ce093d3a0e]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Created attachment 1417082 [details] sosreport to analyze error log after applying vhostuserclient patches
Setup: 1 controller and 2 compute nodes
* Deployed OSP10 with OVS in server mode (dpdkvhostuser)
* Created a VM on compute-1
* Executed the minor update with the neutron and nova changes required for dpdkvhostuserclient mode, plus the tripleo change required for OVS 2.9
* Rebooted compute-0 after the update completed
* Created a new VM; it was created successfully on compute-0 with dpdkvhostuserclient mode
* Migrated the existing VM from compute-1 to compute-0
* Migration failed:
  * The Neutron port is updated with the new socket directory path
  * The virsh XML is created with the old directory, whereas OVS is listening on the new directory
  * Migration hangs
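The mismatch described above (libvirt XML pointing at the old socket directory while OVS listens on the new one) can be checked directly from the domain XML. This is an illustrative sketch: the XML snippet, the helper name `vhostuser_socket_paths`, and the expected directory are assumptions for demonstration, not values from the failed guest.

```python
# Sketch: extract the vhost-user socket paths a libvirt guest will use and
# flag any that do not live under the directory OVS was reconfigured to use.
import xml.etree.ElementTree as ET

# Illustrative domain XML fragment (shape as produced by `virsh dumpxml`).
DOMAIN_XML = """
<domain>
  <devices>
    <interface type='vhostuser'>
      <source type='unix' path='/var/run/openvswitch/vhu41649dbf-6c' mode='client'/>
    </interface>
  </devices>
</domain>
"""

def vhostuser_socket_paths(domain_xml):
    """Return the vhost-user socket paths defined in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    return [src.get("path")
            for src in root.findall(".//interface[@type='vhostuser']/source")]

paths = vhostuser_socket_paths(DOMAIN_XML)
# Directory OVS actually uses after the update (assumed value):
expected_dir = "/var/lib/vhost_sockets"
stale = [p for p in paths if not p.startswith(expected_dir)]
# Any entry in `stale` is a socket path QEMU will use but OVS will not serve.
```

In the failure above, such a stale path is exactly why the target QEMU and OVS never rendezvous and the migration hangs.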
Created attachment 1425799 [details] Migration Hang - compute-0 (migration target)
Created attachment 1425800 [details] Migration Hang - compute-1 (migration source)
This time, instead of changing the vhost socket directory to the new "/var/lib/vhost_sockets" as in comment #6, I tried migration with the same directory "/var/run/openvswitch". Here is the behavior (1 controller + 2 computes, 1 VM on each compute):

scenario 1:
-----------
compute-0 on ovs2.9 after reboot
compute-1 on ovs2.6 without reboot
migrate the VM from compute-1 to compute-0
migration is successful
the VM migrated to compute-0 with the same mode - dpdkvhostuser

scenario 2:
-----------
after the above step, reboot compute-1
now compute-1 is up with ovs2.9
migrate the same VM from compute-0 to compute-1
migration fails
OVS creates the port in dpdkvhostuserclient mode, but qemu is also in client mode
---------------------------------
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -- --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --may-exist add-port br-int vhu41649dbf-6c -- set Interface vhu41649dbf-6c external-ids:iface-id=41649dbf-6cc1-4779-a9c4-fbeff1f2f4b0 external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:8e:dd:59 external-ids:vm-uuid=930eee66-589b-4a60-92f8-ff8f7197e8d7 type=dpdkvhostuserclient options:vhost-server-path=/var/run/openvswitch/vhu41649dbf-6c
Apr 26 09:59:31 overcloud-compute-1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- set interface vhu41649dbf-6c mtu_request=1496
Apr 26 09:59:33 overcloud-compute-1 nova_migration_wrapper: Allowing connection='10.150.81.23 51804 10.150.81.27 22' command=['nc', '-U', '/var/run/libvirt/libvirt-sock']
Apr 26 09:59:33 overcloud-compute-1 systemd-machined: New machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 systemd: Started Virtual Machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 systemd: Starting Virtual Machine qemu-3-instance-00000001.
Apr 26 09:59:33 overcloud-compute-1 libvirtd: 2018-04-26 09:59:33.165+0000: 3567: error : qemuMonitorIORead:595 : Unable to read from monitor: Connection reset by peer
Apr 26 09:59:33 overcloud-compute-1 libvirtd: 2018-04-26 09:59:33.165+0000: 3567: error : qemuProcessReportLogError:1912 : internal error: qemu unexpectedly closed the monitor: 2018-04-26T09:59:33.157557Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhu41649dbf-6c: Failed to connect socket /var/run/openvswitch/vhu41649dbf-6c: No such file or directory
Apr 26 09:59:33 overcloud-compute-1 systemd-machined: Machine qemu-3-instance-00000001 terminated.
------------------------
sosreport will be attached for this scenario.
Created attachment 1427124 [details] Migration fail (comment #9) compute-1 migration target
Created attachment 1427127 [details] Migration fail (comment #9) compute-0 migration source
According to our records, this should be resolved by openstack-nova-14.1.0-3.el7ost. This build is available now.
Live migration without CPU pinning is working; see BZ https://bugzilla.redhat.com/show_bug.cgi?id=1542107