With an OSP13 ospd installation of 1 controller, 1 compute, and 1 ceph node, plenty of tempest tests fail because VMs cannot be spawned in the overcloud. The issue has been consistent so far. It looks like libvirt/qemu fails to spawn the VM, though the reason for that is unclear to me.

nova-scheduler.log lines related to the uuid of one of the VMs that failed:

> 2018-02-13 22:00:48.185 1 DEBUG nova.scheduler.manager [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Starting to schedule for instances: [u'23f946c1-7a8c-49b5-a921-9d23d66023df'] select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:110
> 2018-02-13 22:00:48.296 1 DEBUG nova.filters [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Filtering removed all hosts for the request with instance ID '23f946c1-7a8c-49b5-a921-9d23d66023df'. Filter results: [('RetryFilter', [(u'compute-0.localdomain', u'compute-0.localdomain')]), ('AvailabilityZoneFilter', [(u'compute-0.localdomain', u'compute-0.localdomain')]), ('ComputeFilter', None)] get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:129
> 2018-02-13 22:00:48.296 1 INFO nova.filters [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Filtering removed all hosts for the request with instance ID '23f946c1-7a8c-49b5-a921-9d23d66023df'. Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 0)']

Aside from the whole scheduler.log, I will also attach scheduler.log-snippet-one-vm, which contains the whole section between these three lines. Also attaching the whole /var/log/messages from the compute-0 node; a few notable things there:

ovs_neutron_agent MessagingTimeout (followed by a traceback):

> 30256 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!: MessagingTimeout: Timed out waiting for a reply to message ID 2aadcd21e

nova libvirt monitor connection issue (libvirt's qemu failed, see below), again with a traceback in the log:

> Feb 13 16:46:01 compute-0 journal: : libvirtError: internal error: process exited while connecting to monitor
> Feb 13 16:46:01 compute-0 journal: 2018-02-13 21:46:01.502 1 ERROR nova.virt.libvirt.driver [req-34fa3b3e-3b4e-45a3-9333-1f0bf5f97735 f6a83356f4214fe7b81ef524dbef2919 05ae87111c274a24b86d6a4002be373d - default default] [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Failed to start libvirt guest: libvirtError: internal error: process exited while connecting to monitor
> ...
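The filter summary in the INFO line can be read mechanically: each filter reports how many hosts it received (`start`) and how many it passed on (`end`), so the first filter whose `end` count drops to 0 is the one that rejected the last candidate (here ComputeFilter, which typically means nova-compute on the node is disabled or reported down). A minimal sketch of that reading, assuming only the summary format shown in the log above:

```python
import re

# Filter summary exactly as printed by nova.filters in the log above.
summary = [
    "RetryFilter: (start: 1, end: 1)",
    "AvailabilityZoneFilter: (start: 1, end: 1)",
    "ComputeFilter: (start: 1, end: 0)",
]

def eliminating_filter(entries):
    """Return the name of the first filter that left zero hosts."""
    for entry in entries:
        m = re.match(r"(\w+): \(start: (\d+), end: (\d+)\)", entry)
        name, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start > 0 and end == 0:
            return name
    return None

print(eliminating_filter(summary))  # ComputeFilter
```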
> Feb 13 16:46:02 compute-0 journal: 2018-02-13 21:46:02.046 1 ERROR nova.compute.manager [req-34fa3b3e-3b4e-45a3-9333-1f0bf5f97735 f6a83356f4214fe7b81ef524dbef2919 05ae87111c274a24b86d6a4002be373d - default default] [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Instance failed to spawn: libvirtError: internal error: process exited while connecting to monitor
> Feb 13 16:46:02 compute-0 journal: 2018-02-13 21:46:02.046 1 ERROR nova.compute.manager [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Traceback (most recent call last):

nova containers running on controller-0:

> # docker ps -a|grep nova|grep ' Up '
> 4cd2b527b6a2 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_metadata
> c0a807f1f85f 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_api
> 0c2401dbb206 192.168.24.1:8787/rhosp13/openstack-nova-conductor:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_conductor
> 6f3aabb1da55 192.168.24.1:8787/rhosp13/openstack-nova-novncproxy:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_vnc_proxy
> 06e8f9a6c3eb 192.168.24.1:8787/rhosp13/openstack-nova-consoleauth:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_consoleauth
> 2fae7e102d61 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_api_cron
> d95ec90a8fe5 192.168.24.1:8787/rhosp13/openstack-nova-scheduler:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_scheduler
> f6293c6e3aa4 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_placement

containers running on compute-0:

> [root@compute-0 containers]# docker ps -a
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 21776dc78709 192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours (healthy) neutron_ovs_agent
> 112832cd61d3 192.168.24.1:8787/rhosp13/openstack-cron:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours logrotate_crond
> 0b0bcc95757f 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "/bin/bash -c '/us..." 18 hours ago Exited (0) 18 hours ago nova_libvirt_init_secret
> 5eae5be2282e 192.168.24.1:8787/rhosp13/openstack-ceilometer-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours ceilometer_agent_compute
> e490c4176217 192.168.24.1:8787/rhosp13/openstack-nova-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours (healthy) nova_compute
> c8aa0ecbc6ea 192.168.24.1:8787/rhosp13/openstack-nova-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_migration_target
> 22546fbdaeec 192.168.24.1:8787/rhosp13/openstack-iscsid:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours iscsid
> 5f46cd62bbe3 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_libvirt
> cd9912b87c3b 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_virtlogd
> a2d57cab00a3 192.168.24.1:8787/rhosp13/openstack-neutron-server:2018-02-07.4 "puppet apply --mo..." 18 hours ago Exited (0) 18 hours ago neutron_ovs_bridge

nova/libvirt packages in the nova_compute/nova_libvirt containers:

> libvirt-client-3.9.0-10.el7.x86_64
> libvirt-daemon-3.9.0-10.el7.x86_64
> libvirt-daemon-config-nwfilter-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-interface-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-lxc-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-network-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-nodedev-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-nwfilter-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-qemu-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-secret-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-core-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-disk-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-gluster-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-iscsi-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-logical-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-mpath-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-rbd-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-scsi-3.9.0-10.el7.x86_64
> libvirt-daemon-kvm-3.9.0-10.el7.x86_64
> libvirt-libs-3.9.0-10.el7.x86_64
> libvirt-python-3.9.0-1.el7.x86_64
> openstack-nova-common-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> openstack-nova-compute-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> openstack-nova-migration-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> puppet-nova-12.2.0-0.20180119085024.c5771fe.el7ost.noarch
> python-nova-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> python-novaclient-9.1.1-1.el7ost.noarch
> qemu-img-rhev-2.10.0-19.el7.x86_64
> qemu-kvm-common-rhev-2.10.0-19.el7.x86_64
> qemu-kvm-rhev-2.10.0-19.el7.x86_64

Not many service logs on compute-0 itself; libvirt is below:

> [root@compute-0 ~]# find /var/log/containers/
> /var/log/containers/
> /var/log/containers/ceilometer
> /var/log/containers/ceilometer/compute.log
> /var/log/containers/neutron
> /var/log/containers/nova
> /var/log/containers/libvirt
> /var/log/containers/libvirt/qemu

Multiple 'shutting down, failed' libvirt qemu instances:

> [root@compute-0 containers]# find /var/log
> ... snip ...
> /var/log/libvirt
> /var/log/libvirt/qemu
> /var/log/libvirt/qemu/instance-00000001.log
> /var/log/libvirt/qemu/instance-00000003.log
> /var/log/libvirt/qemu/instance-00000004.log
> /var/log/libvirt/qemu/instance-00000005.log
> /var/log/libvirt/qemu/instance-00000006.log
> /var/log/libvirt/qemu/instance-00000002.log
> /var/log/libvirt/qemu/instance-00000007.log
> /var/log/libvirt/qemu/instance-00000008.log
> /var/log/libvirt/qemu/instance-0000000a.log
> /var/log/libvirt/qemu/instance-00000009.log
> /var/log/libvirt/qemu/instance-0000000c.log
> /var/log/libvirt/qemu/instance-0000000b.log
> /var/log/nova
> /var/log/openvswitch
> ...

A single one below; all the others look similar and all have shutting down/failed:

> [root@compute-0 containers]# cat /var/log/libvirt/qemu/instance-00000001.log
> 2018-02-13 21:46:00.992+0000: starting up libvirt version: 3.9.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-01-31-11:40:38, x86-041.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-19.el7), hostname: compute-0.localdomain
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000001,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-00000001/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Broadwell,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,abm=on,invpcid=off,rtm=on,hle=on -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48b99034-3d7c-4958-bf24-21642a89ed4b -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=17.0.0-0.20180123163703.27eadbc.el7ost,serial=bd7f6cfe-a628-4725-b53a-042bced74e9d,uuid=48b99034-3d7c-4958-bf24-21642a89ed4b,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-00000001/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=IK5A1lbS+HyHL0IqgO4FSkP2SurqPetkPKftm24O0MQ=,keyid=masterKey0,iv=Ze9xF8Ou+BaPFw2gswKSLA==,format=base64 -drive 'file=rbd:vms/48b99034-3d7c-4958-bf24-21642a89ed4b_disk:id=openstack:auth_supported=cephx\;none:mon_host=172.17.3.11\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -add-fd set=0,fd=80 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.17.1.22:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
> 2018-02-13 21:46:01.493+0000: shutting down, reason=failed

BTW, the above qemu command line has 'accel=kvm' and it seems kvm_intel is in use, in case that could be related:

> [root@compute-0 containers]# lsmod |grep -i 'vhost\|kvm'
> vhost_net 22499 0
> vhost 48474 1 vhost_net
> macvtap 22497 1 vhost_net
> tun 31665 1 vhost_net
> kvm_intel 175300 0
> kvm 574422 1 kvm_intel
> irqbypass 13503 1 kvm
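As a quick sanity check on the module list above: the first column of lsmod is the module name, so KVM availability on the host can be confirmed by looking for both the generic kvm module and the vendor backend (kvm_intel here). A minimal sketch over the captured output:

```python
# `lsmod` output as captured on compute-0 above (first column = module name).
lsmod_output = """\
vhost_net 22499 0
vhost 48474 1 vhost_net
macvtap 22497 1 vhost_net
tun 31665 1 vhost_net
kvm_intel 175300 0
kvm 574422 1 kvm_intel
irqbypass 13503 1 kvm
"""

loaded = {line.split()[0] for line in lsmod_output.splitlines()}
# Both modules are present, so qemu's accel=kvm should find a working
# /dev/kvm; the monitor failure is therefore unlikely to be a missing-KVM issue.
print("kvm" in loaded and "kvm_intel" in loaded)  # True
```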
Created attachment 1396045 [details] controller-0: snippet from nova-scheduler.log
Created attachment 1396047 [details] 1545350-logs/ compute-0-messages and full nova-scheduler.log
Is SELinux enforcing? Are there any AVC denials in audit.log?
Most likely a duplicate of bug 1543914.
Agreed, this is a duplicate of bug 1543914. The following denials are seen in /var/log/audit/audit.log on the compute:

type=AVC msg=audit(1518631963.510:3776): avc: denied { write } for pid=320440 comm="qemu-kvm" path="pipe:[5525852]" dev="pipefs" ino=5525852 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518631971.886:3786): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518632035.573:3817): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518632191.935:3818): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file

These initial failures to launch an instance lead to n-cpu being auto-disabled, which in turn leads to the failure to schedule the remaining instances.

*** This bug has been marked as a duplicate of bug 1543914 ***
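One note on reading those denials: in three of them the comm= field is hex-encoded, which auditd does when the command name contains whitespace. Decoding it shows the denied thread is a qemu vCPU thread, consistent with the guest process dying while connecting to the monitor:

```python
# comm= value from the AVC denials above; auditd hex-encodes names with spaces.
comm_hex = "43505520302F4B564D"
comm = bytes.fromhex(comm_hex).decode("ascii")
print(comm)  # CPU 0/KVM
```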