With an OSP13 ospd installation of 1 controller, 1 compute, and 1 ceph node, plenty of tempest tests fail because VMs cannot be spawned in the overcloud. The issue has been consistent so far. It looks like libvirt/qemu fails to spawn the VM, though the reason for that is unclear to me.

nova-scheduler.log lines related to the uuid of one of the VMs that failed:

> 2018-02-13 22:00:48.185 1 DEBUG nova.scheduler.manager [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Starting to schedule for instances: [u'23f946c1-7a8c-49b5-a921-9d23d66023df'] select_destinations /usr/lib/python2.7/site-packages/nova/scheduler/manager.py:110
> 2018-02-13 22:00:48.296 1 DEBUG nova.filters [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Filtering removed all hosts for the request with instance ID '23f946c1-7a8c-49b5-a921-9d23d66023df'. Filter results: [('RetryFilter', [(u'compute-0.localdomain', u'compute-0.localdomain')]), ('AvailabilityZoneFilter', [(u'compute-0.localdomain', u'compute-0.localdomain')]), ('ComputeFilter', None)] get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:129
> 2018-02-13 22:00:48.296 1 INFO nova.filters [req-d3ffcf0c-11d7-45e7-8383-69427df98dfc a2bd7f1c9d9f4292a2856df5db7c722e fd797297047347b4a41892f29f554bbb - default default] Filtering removed all hosts for the request with instance ID '23f946c1-7a8c-49b5-a921-9d23d66023df'. Filter results: ['RetryFilter: (start: 1, end: 1)', 'AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1, end: 0)']

Aside from the whole scheduler.log, I will also attach scheduler.log-snippet-one-vm, which contains the whole section between these three lines. Also attaching the whole /var/log/messages from the compute-0 node; a few notable things there:

ovs_neutron_agent MessagingTimeout (followed by a traceback):

> 30256 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!: MessagingTimeout: Timed out waiting for a reply to message ID 2aadcd21e

nova libvirt monitor connection issue (libvirt's qemu failed, see below), again with a traceback in the log:

> Feb 13 16:46:01 compute-0 journal: : libvirtError: internal error: process exited while connecting to monitor
> Feb 13 16:46:01 compute-0 journal: 2018-02-13 21:46:01.502 1 ERROR nova.virt.libvirt.driver [req-34fa3b3e-3b4e-45a3-9333-1f0bf5f97735 f6a83356f4214fe7b81ef524dbef2919 05ae87111c274a24b86d6a4002be373d - default default] [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Failed to start libvirt guest: libvirtError: internal error: process exited while connecting to monitor
> ...
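The filter summary in the INFO line can be read mechanically: each filter reports how many hosts it received (`start`) and how many it passed on (`end`), so the first filter whose `end` count drops to 0 is the one that rejected the last candidate (here ComputeFilter, which typically means nova-compute on the node is disabled or reported down). A minimal sketch of that reading, assuming only the summary format shown in the log above:

```python
import re

# Filter summary exactly as printed by nova.filters in the log above.
summary = [
    "RetryFilter: (start: 1, end: 1)",
    "AvailabilityZoneFilter: (start: 1, end: 1)",
    "ComputeFilter: (start: 1, end: 0)",
]

def eliminating_filter(entries):
    """Return the name of the first filter that left zero hosts."""
    for entry in entries:
        m = re.match(r"(\w+): \(start: (\d+), end: (\d+)\)", entry)
        name, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start > 0 and end == 0:
            return name
    return None

print(eliminating_filter(summary))  # ComputeFilter
```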
> Feb 13 16:46:02 compute-0 journal: 2018-02-13 21:46:02.046 1 ERROR nova.compute.manager [req-34fa3b3e-3b4e-45a3-9333-1f0bf5f97735 f6a83356f4214fe7b81ef524dbef2919 05ae87111c274a24b86d6a4002be373d - default default] [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Instance failed to spawn: libvirtError: internal error: process exited while connecting to monitor
> Feb 13 16:46:02 compute-0 journal: 2018-02-13 21:46:02.046 1 ERROR nova.compute.manager [instance: 48b99034-3d7c-4958-bf24-21642a89ed4b] Traceback (most recent call last):

nova containers running on controller-0:

> # docker ps -a|grep nova|grep ' Up '
> 4cd2b527b6a2 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_metadata
> c0a807f1f85f 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_api
> 0c2401dbb206 192.168.24.1:8787/rhosp13/openstack-nova-conductor:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_conductor
> 6f3aabb1da55 192.168.24.1:8787/rhosp13/openstack-nova-novncproxy:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_vnc_proxy
> 06e8f9a6c3eb 192.168.24.1:8787/rhosp13/openstack-nova-consoleauth:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_consoleauth
> 2fae7e102d61 192.168.24.1:8787/rhosp13/openstack-nova-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_api_cron
> d95ec90a8fe5 192.168.24.1:8787/rhosp13/openstack-nova-scheduler:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours (healthy) nova_scheduler
> f6293c6e3aa4 192.168.24.1:8787/rhosp13/openstack-nova-placement-api:2018-02-07.4 "kolla_start" 19 hours ago Up 19 hours nova_placement

containers running on compute-0:

> [root@compute-0 containers]# docker ps -a
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 21776dc78709 192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours (healthy) neutron_ovs_agent
> 112832cd61d3 192.168.24.1:8787/rhosp13/openstack-cron:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours logrotate_crond
> 0b0bcc95757f 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "/bin/bash -c '/us..." 18 hours ago Exited (0) 18 hours ago nova_libvirt_init_secret
> 5eae5be2282e 192.168.24.1:8787/rhosp13/openstack-ceilometer-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours ceilometer_agent_compute
> e490c4176217 192.168.24.1:8787/rhosp13/openstack-nova-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours (healthy) nova_compute
> c8aa0ecbc6ea 192.168.24.1:8787/rhosp13/openstack-nova-compute:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_migration_target
> 22546fbdaeec 192.168.24.1:8787/rhosp13/openstack-iscsid:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours iscsid
> 5f46cd62bbe3 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_libvirt
> cd9912b87c3b 192.168.24.1:8787/rhosp13/openstack-nova-libvirt:2018-02-07.4 "kolla_start" 18 hours ago Up 18 hours nova_virtlogd
> a2d57cab00a3 192.168.24.1:8787/rhosp13/openstack-neutron-server:2018-02-07.4 "puppet apply --mo..." 18 hours ago Exited (0) 18 hours ago neutron_ovs_bridge

nova/libvirt packages in the nova_compute/nova_libvirt containers:

> libvirt-client-3.9.0-10.el7.x86_64
> libvirt-daemon-3.9.0-10.el7.x86_64
> libvirt-daemon-config-nwfilter-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-interface-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-lxc-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-network-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-nodedev-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-nwfilter-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-qemu-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-secret-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-core-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-disk-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-gluster-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-iscsi-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-logical-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-mpath-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-rbd-3.9.0-10.el7.x86_64
> libvirt-daemon-driver-storage-scsi-3.9.0-10.el7.x86_64
> libvirt-daemon-kvm-3.9.0-10.el7.x86_64
> libvirt-libs-3.9.0-10.el7.x86_64
> libvirt-python-3.9.0-1.el7.x86_64
> openstack-nova-common-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> openstack-nova-compute-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> openstack-nova-migration-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> puppet-nova-12.2.0-0.20180119085024.c5771fe.el7ost.noarch
> python-nova-17.0.0-0.20180123163703.27eadbc.el7ost.noarch
> python-novaclient-9.1.1-1.el7ost.noarch
> qemu-img-rhev-2.10.0-19.el7.x86_64
> qemu-kvm-common-rhev-2.10.0-19.el7.x86_64
> qemu-kvm-rhev-2.10.0-19.el7.x86_64

Not many service logs on compute-0 itself; libvirt is below:

> [root@compute-0 ~]# find /var/log/containers/
> /var/log/containers/
> /var/log/containers/ceilometer
> /var/log/containers/ceilometer/compute.log
> /var/log/containers/neutron
> /var/log/containers/nova
> /var/log/containers/libvirt
> /var/log/containers/libvirt/qemu

Multiple 'shutting down, failed' libvirt qemu instances:

> [root@compute-0 containers]# find /var/log
> ... snip ...
> /var/log/libvirt
> /var/log/libvirt/qemu
> /var/log/libvirt/qemu/instance-00000001.log
> /var/log/libvirt/qemu/instance-00000003.log
> /var/log/libvirt/qemu/instance-00000004.log
> /var/log/libvirt/qemu/instance-00000005.log
> /var/log/libvirt/qemu/instance-00000006.log
> /var/log/libvirt/qemu/instance-00000002.log
> /var/log/libvirt/qemu/instance-00000007.log
> /var/log/libvirt/qemu/instance-00000008.log
> /var/log/libvirt/qemu/instance-0000000a.log
> /var/log/libvirt/qemu/instance-00000009.log
> /var/log/libvirt/qemu/instance-0000000c.log
> /var/log/libvirt/qemu/instance-0000000b.log
> /var/log/nova
> /var/log/openvswitch
> ...

A single one below; all the others look similar and all have shutting down/failed:

> [root@compute-0 containers]# cat /var/log/libvirt/qemu/instance-00000001.log
> 2018-02-13 21:46:00.992+0000: starting up libvirt version: 3.9.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-01-31-11:40:38, x86-041.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-19.el7), hostname: compute-0.localdomain
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000001,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-00000001/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Broadwell,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,abm=on,invpcid=off,rtm=on,hle=on -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 48b99034-3d7c-4958-bf24-21642a89ed4b -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=17.0.0-0.20180123163703.27eadbc.el7ost,serial=bd7f6cfe-a628-4725-b53a-042bced74e9d,uuid=48b99034-3d7c-4958-bf24-21642a89ed4b,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-00000001/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=IK5A1lbS+HyHL0IqgO4FSkP2SurqPetkPKftm24O0MQ=,keyid=masterKey0,iv=Ze9xF8Ou+BaPFw2gswKSLA==,format=base64 -drive 'file=rbd:vms/48b99034-3d7c-4958-bf24-21642a89ed4b_disk:id=openstack:auth_supported=cephx\;none:mon_host=172.17.3.11\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -add-fd set=0,fd=80 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.17.1.22:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
> 2018-02-13 21:46:01.493+0000: shutting down, reason=failed

BTW, the above qemu command line has 'accel=kvm' and it seems kvm_intel is in use, in case that could be related:

> [root@compute-0 containers]# lsmod |grep -i 'vhost\|kvm'
> vhost_net 22499 0
> vhost 48474 1 vhost_net
> macvtap 22497 1 vhost_net
> tun 31665 1 vhost_net
> kvm_intel 175300 0
> kvm 574422 1 kvm_intel
> irqbypass 13503 1 kvm
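As a quick sanity check on the module list above: the first column of lsmod is the module name, so KVM availability on the host can be confirmed by looking for both the generic kvm module and the vendor backend (kvm_intel here). A minimal sketch over the captured output:

```python
# `lsmod` output as captured on compute-0 above (first column = module name).
lsmod_output = """\
vhost_net 22499 0
vhost 48474 1 vhost_net
macvtap 22497 1 vhost_net
tun 31665 1 vhost_net
kvm_intel 175300 0
kvm 574422 1 kvm_intel
irqbypass 13503 1 kvm
"""

loaded = {line.split()[0] for line in lsmod_output.splitlines()}
# Both modules are present, so qemu's accel=kvm should find a working
# /dev/kvm; the monitor failure is therefore unlikely to be a missing-KVM issue.
print("kvm" in loaded and "kvm_intel" in loaded)  # True
```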
Created attachment 1396045 [details] controller-0: snippet from nova-scheduler.log
Created attachment 1396047 [details] 1545350-logs/ compute-0-messages and full nova-scheduler.log
Is SELinux enforcing? Are there any AVC denials in audit.log?
Most likely a duplicate of bug 1543914.
Agreed, this is a duplicate of bug 1543914. The following denials are seen in /var/log/audit/audit.log on the compute:

type=AVC msg=audit(1518631963.510:3776): avc: denied { write } for pid=320440 comm="qemu-kvm" path="pipe:[5525852]" dev="pipefs" ino=5525852 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518631971.886:3786): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518632035.573:3817): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file
type=AVC msg=audit(1518632191.935:3818): avc: denied { append } for pid=320440 comm=43505520302F4B564D path="pipe:[5531164]" dev="pipefs" ino=5531164 scontext=system_u:system_r:svirt_t:s0:c27,c469 tcontext=system_u:system_r:spc_t:s0 tclass=fifo_file

These initial failures to launch an instance lead to n-cpu being auto-disabled, which in turn leads to the failure to schedule the remaining instances.

*** This bug has been marked as a duplicate of bug 1543914 ***
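One note on reading those denials: in three of them the comm= field is hex-encoded, which auditd does when the command name contains whitespace. Decoding it shows the denied thread is a qemu vCPU thread, consistent with the guest process dying while connecting to the monitor:

```python
# comm= value from the AVC denials above; auditd hex-encodes names with spaces.
comm_hex = "43505520302F4B564D"
comm = bytes.fromhex(comm_hex).decode("ascii")
print(comm)  # CPU 0/KVM
```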