Description of problem:

I have installed OSP13 z4 and updated to the latest. I didn't get any errors during the update, but after the update all Tempest test cases are failing because it is not possible to spawn a VM.

Version-Release number of selected component (if applicable):

How reproducible:

Minor update from OSP13 z4 to latest using these templates:
https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-13-vxlan-dpdk-sriov-ctlplane-dataplane-bonding-hybrid-minor-update;h=458669a799ae1037efde30094bac0e639ba72c38;hb=refs/heads/ci

Spawn a VM:

openstack server create --flavor $flavor_id --image $image_id --key-name mykeypair --nic net-id=$network_id myinstance2

Actual results:

The VM stays in BUILD state; it never goes to ACTIVE.

(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| ID                                   | Name        | Status | Networks               | Image                                 | Flavor   |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+
| 25002de0-8743-4597-af26-548ad985368d | myinstance2 | BUILD  | mynetwork=192.168.20.5 | rhel-guest-image-7-6-210-x86-64-qcow2 | myflavor |
+--------------------------------------+-------------+--------+------------------------+---------------------------------------+----------+

Expected results:

VM should be in ACTIVE state.

Additional info:

Some errors in the log files on the compute node:

messages:Feb 19 13:03:04 compute-1 systemd-machined: New machine qemu-1-instance-00000060.
messages:Feb 19 13:03:04 compute-1 systemd: Started Virtual Machine qemu-1-instance-00000060.
messages:Feb 19 13:04:07 compute-1 journal: 2020-02-19 13:04:07.777+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (63s, 0s, 63s)
messages:Feb 19 13:04:37 compute-1 journal: 2020-02-19 13:04:37.783+0000: 5165: warning : qemuDomainObjBeginJobInternal:6722 : Cannot start job (query, none, none) for domain instance-00000060; current job is (async nested, none, start) owned by (5167 remoteDispatchDomainCreateWithFlags, 0 <null>, 5167 remoteDispatchDomainCreateWithFlags (flags=0x1)) for (93s, 0s, 93s)

[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000060              paused
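If it helps narrow things down while the domain is stuck, a couple of read-only virsh queries could show what the paused domain is waiting on (a sketch; the domain name is taken from the virsh list output above, and the commands need to run wherever virsh can reach libvirt on the compute node):

    # Why is the domain paused? (reason string, e.g. "booted" vs. a migration/job reason)
    virsh domstate --reason instance-00000060

    # Is there a long-running job holding the domain lock, as the libvirt warning suggests?
    virsh domjobinfo instance-00000060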
Assuming this can be reproduced, can we get debug-level logs for libvirt also, please? You can follow the guide at [1] for more information.

[1] https://kashyapc.fedorapeople.org/Notes/docs/qemu-and-libvirt-docs/request-nova-libvirt-qemu-debug-logs.txt
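For reference, enabling debug logging typically means setting log filters and outputs in /etc/libvirt/libvirtd.conf and restarting libvirtd (a sketch; the exact filter string and the config path inside the containerized deployment may differ from this, so the guide at [1] is authoritative):

    # /etc/libvirt/libvirtd.conf (inside the nova_libvirt container on a
    # containerized deployment -- path is an assumption, check the guide)
    log_filters="1:libvirt 1:qemu 1:security 3:object 3:json 3:event 3:util"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"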
What's the output of 'virt-host-validate'? Also, are you still able to run 'virsh list'? I can't see anything suspect in the libvirt logs provided yet.
Stephen, I don't have the setup right now; next week I can configure the setup and give you access.
[heat-admin@compute-1 ~]$ virt-host-validate
  QEMU: Checking for hardware virtualization                 : PASS
  QEMU: Checking if device /dev/kvm exists                   : PASS
  QEMU: Checking if device /dev/kvm is accessible            : PASS
  QEMU: Checking if device /dev/vhost-net exists             : PASS
  QEMU: Checking if device /dev/net/tun exists               : PASS
  QEMU: Checking for cgroup 'memory' controller support      : PASS
  QEMU: Checking for cgroup 'memory' controller mount-point  : PASS
  QEMU: Checking for cgroup 'cpu' controller support         : PASS
  QEMU: Checking for cgroup 'cpu' controller mount-point     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support     : PASS
  QEMU: Checking for cgroup 'cpuacct' controller mount-point : PASS
  QEMU: Checking for cgroup 'cpuset' controller support      : PASS
  QEMU: Checking for cgroup 'cpuset' controller mount-point  : PASS
  QEMU: Checking for cgroup 'devices' controller support     : PASS
  QEMU: Checking for cgroup 'devices' controller mount-point : PASS
  QEMU: Checking for cgroup 'blkio' controller support       : PASS
  QEMU: Checking for cgroup 'blkio' controller mount-point   : PASS
  QEMU: Checking for device assignment IOMMU support         : PASS
  QEMU: Checking if IOMMU is enabled by kernel               : PASS
   LXC: Checking for Linux >= 2.6.26                         : PASS
   LXC: Checking for namespace ipc                           : PASS
   LXC: Checking for namespace mnt                           : PASS
   LXC: Checking for namespace pid                           : PASS
   LXC: Checking for namespace uts                           : PASS
   LXC: Checking for namespace net                           : PASS
   LXC: Checking for namespace user                          : PASS
   LXC: Checking for cgroup 'memory' controller support      : PASS
   LXC: Checking for cgroup 'memory' controller mount-point  : PASS
   LXC: Checking for cgroup 'cpu' controller support         : PASS
   LXC: Checking for cgroup 'cpu' controller mount-point     : PASS
   LXC: Checking for cgroup 'cpuacct' controller support     : PASS
   LXC: Checking for cgroup 'cpuacct' controller mount-point : PASS
   LXC: Checking for cgroup 'cpuset' controller support      : PASS
   LXC: Checking for cgroup 'cpuset' controller mount-point  : PASS
   LXC: Checking for cgroup 'devices' controller support     : PASS
   LXC: Checking for cgroup 'devices' controller mount-point : PASS
   LXC: Checking for cgroup 'blkio' controller support       : PASS
   LXC: Checking for cgroup 'blkio' controller mount-point   : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists   : FAIL (Load the 'fuse' module to enable /proc/ overrides)
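(Note: the only FAIL above is the fuse check, which only affects the LXC driver, not QEMU/KVM, so it is almost certainly unrelated to this bug. If anyone wants to clear it anyway, loading the module should be enough -- a sketch, the modules-load.d filename is just a suggestion:)

    sudo modprobe fuse
    # persist across reboots
    echo fuse | sudo tee /etc/modules-load.d/fuse.conf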
[root@compute-1 heat-admin]# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000002              paused
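Since the domain is still listed as paused, it might also be worth asking QEMU directly what state it thinks it is in (a read-only monitor query; assumes virsh can reach the domain's monitor from the same shell):

    virsh qemu-monitor-command instance-00000002 --pretty '{"execute":"query-status"}'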
Stephen Finucane, I have an environment ready in case you want to take a look. Contact me by chat (mnietoji).
Had a look at the compute node. We're seeing the following errors in the 'nova_libvirt' container:

2020-03-31 13:40:39.297+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused
2020-03-31 13:40:39.298+0000: 901849: error : virNetSocketNewConnectUNIX:712 : Failed to connect socket to '/var/run/libvirt/virtlogd-sock': Connection refused

Considering the socket file is created by that same container (it disappears when the container is stopped), I'm not sure why that would be the case. There's nothing in the SELinux logs on the host to suggest SELinux is to blame. Someone from the deployment side will have to look at this.
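For whoever picks this up, a quick way to confirm whether virtlogd is actually running and where its socket lives (a sketch assuming the docker-based OSP13 container layout; on some releases virtlogd runs in its own container rather than inside nova_libvirt, so adjust the container name accordingly):

    # Is there a dedicated virtlogd container?
    sudo docker ps --format '{{.Names}}' | grep -i virtlogd

    # Is a virtlogd process alive inside nova_libvirt, and does the socket exist?
    sudo docker exec nova_libvirt pgrep -a virtlogd
    sudo docker exec nova_libvirt ls -l /var/run/libvirt/virtlogd-sock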