Description of problem:

I have installed OSP13 z4 and updated to the latest. I didn't get any errors during the update, but after the update all Tempest test cases are failing because it is not possible to spawn a VM.

Version-Release number of selected component (if applicable):

How reproducible:

Minor update from OSP13 z4 to latest using these templates:
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-13-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid-minor-update;h=458669a799ae1037efde30094bac0e639ba72c38;hb=refs/heads/ci

Spawn a VM:

openstack server create --flavor $flavor_id --image $image_id --key-name mykeypair --nic net-id=$network_id myinstance2

Actual results:

The VM stays in BUILD state; it never goes to ACTIVE.

(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| ID                                   | Name        | Status | Networks               | Image                                 | Flavor   |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| 25002de0-8743-4597-af26-548ad985368d | myinstance2 | BUILD  | mynetwork=192.168.20.5 | rhel-guest-image-7-6-210-x86-64-qcow2 | myflavor |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+

Expected results:

VM should be in ACTIVE state.

Additional info:

Some errors in the log files on the compute node:

messages:Feb 19 13:03:04 compute-1 systemd-machined: New machine qemu-1-instance-00000060.
messages:Feb 19 13:03:04 compute-1 systemd: Started Virtual Machine qemu-1-instance-00000060.
messages:Feb 19 13:04:07 compute-1 journal: 2020-02-19 13:04:07.777+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (63s, 0s, 63s)
messages:Feb 19 13:04:37 compute-1 journal: 2020-02-19 13:04:37.783+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (93s, 0s, 93s)

[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000060              paused
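If it helps narrow things down while the domain is stuck, a couple of read-only virsh queries could show what the paused domain is waiting on (a sketch; the domain name is taken from the virsh list output above, and the commands need to run wherever virsh can reach libvirt on the compute node):

    # Why is the domain paused? (reason string, e.g. "booted" vs. a migration/job reason)
    virsh domstate --reason instance-00000060

    # Is there a long-running job holding the domain lock, as the libvirt warning suggests?
    virsh domjobinfo instance-00000060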
Assuming this can be reproduced, can we get debug-level logs for libvirt also, please? You can follow the guide at [1] for more information.

[1] https://kashyapc.fedorapeople.org/Notes/docs/qemu-and-libvirt-docs/request-nova-libvirt-qemu-debug-logs.txt
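For reference, enabling debug logging typically means setting log filters and outputs in /etc/libvirt/libvirtd.conf and restarting libvirtd (a sketch; the exact filter string and the config path inside the containerized deployment may differ from this, so the guide at [1] is authoritative):

    # /etc/libvirt/libvirtd.conf (inside the nova_libvirt container on a
    # containerized deployment -- path is an assumption, check the guide)
    log_filters="1:libvirt 1:qemu 1:security 3:object 3:json 3:event 3:util"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"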
What's the output of 'virt-host-validate'? Also, are you still able to run 'virsh list'? I can't see anything suspect in the libvirt logs provided yet.
Stephen, I don't have the setup right now; next week I can configure the setup and give you access.
[heat-admin@compute-1 ~]$ virt-host-validate
  QEMU: Checking for hardware virtualization                 : PASS
  QEMU: Checking if device /dev/kvm exists                   : PASS
  QEMU: Checking if device /dev/kvm is accessible            : PASS
  QEMU: Checking if device /dev/vhost-net exists             : PASS
  QEMU: Checking if device /dev/net/tun exists               : PASS
  QEMU: Checking for cgroup 'memory' controller support      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point  : PASS
  QEMU: Checking for cgroup 'cpu' controller support         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point : PASS
  QEMU: Checking for cgroup 'cpuset' controller support      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point  : PASS
  QEMU: Checking for cgroup 'devices' controller support     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point : PASS
  QEMU: Checking for cgroup 'blkio' controller support       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point   : PASS
  QEMU: Checking for device assignment IOMMU support         : PASS
  QEMU: Checking if IOMMU is enabled by kernel               : PASS
   LXC: Checking for Linux >= 2.6.26                         : PASS
   LXC: Checking for namespace ipc                           : PASS
   LXC: Checking for namespace mnt                           : PASS
   LXC: Checking for namespace pid                           : PASS
   LXC: Checking for namespace uts                           : PASS
   LXC: Checking for namespace net                           : PASS
   LXC: Checking for namespace user                          : PASS
   LXC: Checking for cgroup 'memory' controller support      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point  : PASS
   LXC: Checking for cgroup 'cpu' controller support         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point : PASS
   LXC: Checking for cgroup 'cpuset' controller support      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point  : PASS
   LXC: Checking for cgroup 'devices' controller support     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point : PASS
   LXC: Checking for cgroup 'blkio' controller support       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists   : FAIL (Load the 'fuse' module to enable /proc/ overrides)
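(Note: the only FAIL above is the fuse check, which only affects the LXC driver, not QEMU/KVM, so it is almost certainly unrelated to this bug. If anyone wants to clear it anyway, loading the module should be enough -- a sketch, the modules-load.d filename is just a suggestion:)

    sudo modprobe fuse
    # persist across reboots
    echo fuse | sudo tee /etc/modules-load.d/fuse.conf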
[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000002              paused
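Since the domain is still listed as paused, it might also be worth asking QEMU directly what state it thinks it is in (a read-only monitor query; assumes virsh can reach the domain's monitor from the same shell):

    virsh qemu-monitor-command instance-00000002 --pretty '{"execute":"query-status"}'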
Stephen Finucane, I have an environment ready in case you want to take a look. Contact me by chat (mnietoji).
Had a look at the compute node. We're seeing the following errors in the 'nova_libvirt' container:

2020-03-31 13:40:39.297+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused
2020-03-31 13:40:39.298+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused

Considering the socket file is created by that same container (it disappears when the container is stopped), I'm not sure why that would be the case. There's nothing in the SELinux logs on the host to suggest SELinux is to blame. Someone from the deployment side will have to look at this.
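For whoever picks this up, a quick way to confirm whether virtlogd is actually running and where its socket lives (a sketch assuming the docker-based OSP13 container layout; on some releases virtlogd runs in its own container rather than inside nova_libvirt, so adjust the container name accordingly):

    # Is there a dedicated virtlogd container?
    sudo docker ps --format '{{.Names}}' | grep -i virtlogd

    # Is a virtlogd process alive inside nova_libvirt, and does the socket exist?
    sudo docker exec nova_libvirt pgrep -a virtlogd
    sudo docker exec nova_libvirt ls -l /var/run/libvirt/virtlogd-sock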