Bug 1804752

Summary: Not able to spawn VMs after minor update (OSP13 z4 to latest)
Product: Red Hat OpenStack
Component: openstack-nova
Version: 13.0 (Queens)
Reporter: Miguel Angel Nieto <mnietoji>
Assignee: Piotr Kopec <pkopec>
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED NOTABUG
Severity: urgent
Priority: unspecified
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
CC: dasmith, eglynn, ekuris, fbaudin, gmuthukr, jhakimra, kchamart, lyarwood, mbracho, morazi, oblaut, pkopec, sbauza, sgordon, smooney, supadhya, vromanso
Doc Type: If docs needed, set a value
Last Closed: 2020-05-06 14:48:49 UTC
Type: Bug

Description Miguel Angel Nieto 2020-02-19 14:54:26 UTC
Description of problem:
I installed OSP13 z4 and updated to the latest release. I didn't get any errors during the update, but afterwards all tempest test cases fail because it is not possible to spawn a VM.


Version-Release number of selected component (if applicable):


How reproducible:
Minor update from OSP13 z4 to latest using these templates:
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-13-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid-minor-update;h=458669a799ae1037efde30094bac0e639ba72c38;hb=refs/heads/ci

Spawn a VM:
openstack server create --flavor $flavor_id --image $image_id --key-name mykeypair --nic net-id=$network_id myinstance2


Actual results:

The VM stays in the BUILD state and never goes to ACTIVE:
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| ID                                   | Name        | Status | Networks               | Image                                 | Flavor   |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| 25002de0-8743-4597-af26-548ad985368d | myinstance2 | BUILD  | mynetwork=192.168.20.5 | rhel-guest-image-7-6-210-x86-64-qcow2 | myflavor |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+

Expected results:
The VM should reach the ACTIVE state.


Additional info:
Some errors from the log files:
COMPUTE:
messages:Feb 19 13:03:04 compute-1 systemd-machined: New machine qemu-1-instance-00000060.
messages:Feb 19 13:03:04 compute-1 systemd: Started Virtual Machine qemu-1-instance-00000060.
messages:Feb 19 13:04:07 compute-1 journal: 2020-02-19 13:04:07.777+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (63s, 0s, 63s)
messages:Feb 19 13:04:37 compute-1 journal: 2020-02-19 13:04:37.783+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (93s, 0s, 93s)

[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000060              paused

Comment 7 Stephen Finucane 2020-03-06 15:48:17 UTC
Assuming this can be reproduced, can we get debug-level logs for libvirt also, please? You can follow the guide at [1] for more information.

[1] https://kashyapc.fedorapeople.org/Notes/docs/qemu-and-libvirt-docs/request-nova-libvirt-qemu-debug-logs.txt
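For reference, the guide above amounts to raising libvirt's daemon log verbosity. A typical snippet for `/etc/libvirt/libvirtd.conf` (this exact filter set is a common recommendation for QEMU/nova debugging, not quoted from the linked guide) looks like:

```
# /etc/libvirt/libvirtd.conf -- enable debug-level logging,
# then restart libvirtd (or the nova_libvirt container) to apply.
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```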

Comment 10 Stephen Finucane 2020-03-27 11:27:36 UTC
What's the output of 'virt-host-validate'? Also, are you still able to run 'virsh list'? I can't see anything suspect in the libvirt logs provided yet.

Comment 11 Miguel Angel Nieto 2020-03-27 11:35:30 UTC
Stephen, I don't have the setup right now; next week I can configure it and give you access.

Comment 12 Miguel Angel Nieto 2020-03-30 08:23:03 UTC
[heat-admin@compute-1 ~]$ virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point                  : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point                 : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point                   : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point                  : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point                 : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point                  : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point                 : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point                   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : FAIL (Load the 'fuse' module to enable /proc/ overrides)

Comment 13 Miguel Angel Nieto 2020-03-30 13:23:59 UTC
[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000002              paused

Comment 14 Miguel Angel Nieto 2020-03-30 13:36:39 UTC
Stephen Finucane, I have an environment ready in case you want to check it. Contact me by chat (mnietoji).

Comment 15 Stephen Finucane 2020-03-31 13:43:36 UTC
Had a look at the compute node. We're seeing the following errors in the 'nova_libvirt' container:

  2020-03-31 13:40:39.297+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused
  2020-03-31 13:40:39.298+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused

Considering the socket is created by that same container (it disappears when the container is stopped), I'm not sure why that would be the case. There's nothing in the SELinux logs on the host to suggest SELinux is to blame.

Someone from deployment will have to look at this.
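As a starting point for whoever picks this up, a quick sanity check of the virtlogd socket could look like the following. The socket path comes from the error message above; the `nova_libvirt` container name is the OSP13 default, and the `docker exec` line in the comment is an assumed invocation, not a verified procedure:

```shell
#!/bin/sh
# Sketch: report whether the virtlogd UNIX socket exists at the given path.
check_virtlogd_sock() {
    sock="$1"
    if [ -S "$sock" ]; then
        echo "present: $sock"
    else
        echo "missing: $sock"
    fi
}

# Path taken from the connection-refused error above.
check_virtlogd_sock /var/run/libvirt/virtlogd-sock

# On the affected compute node this would be run inside the container, e.g.:
#   docker exec nova_libvirt sh -c \
#     '[ -S /var/run/libvirt/virtlogd-sock ] && echo present || echo missing'
```

If the socket is missing while libvirtd is up, the next step would be checking how virtlogd is started inside that container.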