Bug 1022561
| Field | Value |
| --- | --- |
| Summary | nova: killing qemu pid which remains after delete of instances will cause openstack-nova-compute to die |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 4.0 |
| Status | CLOSED DUPLICATE |
| Severity | medium |
| Priority | unspecified |
| Reporter | Dafna Ron <dron> |
| Assignee | Solly Ross <sross> |
| QA Contact | Ami Jeain <ajeain> |
| CC | dallan, dron, hateya, ndipanov, sclewis, sross, xqueralt, yeylon |
| Keywords | Reopened, Unconfirmed, ZStream |
| Target Milestone | --- |
| Target Release | 4.0 |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | storage |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-04-07 15:02:53 UTC |
---

Can you get me logs from the other compute services (namely the scheduler)? Additionally, what is your setup (I want to make sure I duplicate it correctly) -- which nodes, etc.? Also, does this only happen when running Cinder with a Gluster backend?

---

Will give the setup details privately (the logs are there). I am not sure whether this is Gluster-related, since I only have a Gluster setup at this time.

---

Notes: the processes are left in STAT=S (interruptible sleep) on account of poll_schedule_timeout (i.e. they are waiting for a poll call). This makes me suspect that the cause is the Gluster driver. Will do some more digging.

---

Could not reproduce this anymore.

---

I'm re-opening, since this is 100% reproducible on my setup (4.0, last puddle release). If you would like to investigate further and cannot reproduce it on your setup, please contact me and I will give you access to mine.

```
[root@puma31 ~]# ps -elf | grep qemu
0 S root   9993  9708  0 80 0 -  25813 pipe_w 16:05 pts/0 00:00:00 grep qemu
2 Z nova  16552  8454  0 80 0 -      0 exit   Feb21 ?     00:03:03 [qemu-kvm] <defunct>
2 S nova  18210  8454  0 80 0 - 212852 poll_s Feb21 ?     00:02:54 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.8454 -initrd /var/tmp/.guestfs-162/initrd.8454 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/c1d544cb-1d1d-435f-9672-af5347d29c0e/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.8454,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsRV7oiz/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm
[root@puma31 ~]# ps -elf | grep qemu
0 S root   9996  9708  0 80 0 -  25813 pipe_w 16:05 pts/0 00:00:00 grep qemu
2 Z nova  16552  8454  0 80 0 -      0 exit   Feb21 ?     00:03:03 [qemu-kvm] <defunct>
2 S nova  18210  8454  0 80 0 - 212852 poll_s Feb21 ?     00:02:54 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.8454 -initrd /var/tmp/.guestfs-162/initrd.8454 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/c1d544cb-1d1d-435f-9672-af5347d29c0e/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.8454,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsRV7oiz/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm
[root@puma31 ~]# kill -9 8454
[root@puma31 ~]# /etc/init.d/openstack-nova-compute status
openstack-nova-compute dead but pid file exists
```

---

@Dafna: is this related to https://bugzilla.redhat.com/show_bug.cgi?id=1022627 ? Did you try the proposed fix for that?

---

I believe the two are related. Please try the fix proposed upstream for that bug.

---

Solly, can you provide Dafna a scratch build with the proposed fix, or work with her to patch her systems so she can test?

---

I've tested the patch on the QE systems, and it appears to work. The backport is being checked upstream, so it should be in the next time someone rebases RHOS 4.0.z off of stable/havana upstream.

---

I think this could be closed as a duplicate of bug 1022627. What do you think, Solly?

---

@Xavier Queralt: yeah, sounds good.

*** This bug has been marked as a duplicate of bug 1022627 ***
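The STAT=S / poll_schedule_timeout observation in the comments above can be checked on any process with ps's state and wait-channel columns. A minimal sketch (the current shell's PID is used as a stand-in so the snippet runs anywhere; substitute the leftover qemu-kvm PID in practice):

```shell
# Show state (STAT) and kernel wait channel (WCHAN) for one process.
# A leftover qemu-kvm stuck in a poll would show state "S" and a
# poll-related wait channel such as poll_schedule_timeout.
pid=$$   # stand-in PID; substitute the leftover qemu-kvm PID, e.g. 18210
ps -o pid,ppid,stat,wchan:25,comm -p "$pid"
```

The `wchan:25` format widens the column so the kernel function name is not truncated.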
Created attachment 815437 [details]
log

Description of problem:

I am working with Gluster as the Cinder backend, and when I boot an instance I can see two different qemu PIDs running:

```
[root@cougar07 ~(keystone_admin)]# ps -elf | grep qemu
2 S nova  13775 13645  6 80 0 - 212831 poll_s 16:51 ? 00:00:02 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.13645 -initrd /var/tmp/.guestfs-162/initrd.13645 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/2f469c9d-2bf8-44ba-ac9a-e280289802bf/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.13645,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsWTO2Jc/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm
6 S qemu  14018     1 72 80 0 - 217865 poll_s 16:51 ? 00:00:19 /usr/libexec/qemu-kvm -name instance-00000035 -S -M rhel6.5.0 -cpu Opteron_G3,+nodeid_msr,+wdt,+skinit,+ibs,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+3dnow,+3dnowext,+pdpe1gb,+fxsr_opt,+mmxext,+ht,+vme -enable-kvm -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 2f469c9d-2bf8-44ba-ac9a-e280289802bf -smbios type=1,manufacturer=Red Hat Inc.,product=OpenStack Nova,version=2013.2-0.25.rc1.el6ost,serial=44454c4c-4a00-1044-804c-b5c04f39354a,uuid=2f469c9d-2bf8-44ba-ac9a-e280289802bf -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000035.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/2f469c9d-2bf8-44ba-ac9a-e280289802bf/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:37:43:23,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/2f469c9d-2bf8-44ba-ac9a-e280289802bf/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 10.35.160.135:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
```

When I delete the instance, I can see that only one of these PIDs is left:

```
[root@cougar07 ~(keystone_admin)]# ps -elf | grep qemu
2 S nova  13775 13645  4 80 0 - 212831 poll_s 16:51 ? 00:00:02 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.13645 -initrd /var/tmp/.guestfs-162/initrd.13645 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/2f469c9d-2bf8-44ba-ac9a-e280289802bf/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.13645,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsWTO2Jc/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm
0 S root  14135 13613  0 80 0 -  25813 pipe_w 16:52 pts/2 00:00:00 grep qemu
```

If I kill it, nova-compute will die as well:

```
[root@cougar07 ~(keystone_admin)]# kill -9 13645
[root@cougar07 ~(keystone_admin)]# ps -elf | grep qemu
0 S root  14140 13613  0 80 0 -  25813 pipe_w 16:52 pts/2 00:00:00 grep qemu
[root@cougar07 ~(keystone_admin)]# /etc/init.d/openstack-nova-compute status
openstack-nova-compute dead but pid file exists
```

Version-Release number of selected component (if applicable):
openstack-nova-compute-2013.2-0.25.rc1.el6ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Configure Gluster as the Cinder backend.
2. Boot an instance and run ps on qemu.
3. Delete the instance.
4. Kill the leftover PID.

Actual results:
nova-compute dies as well.

Expected results:
nova should not die.

Additional info:
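A note on the kill step above: in the transcript, the PID passed to `kill -9` (13645) is the *parent* of the leftover qemu process, and the transcripts suggest that parent is the compute service itself (the leftover qemu is a libguestfs-style appliance spawned by it), which would explain why nova-compute dies. A hedged sketch of checking the parent before killing; the current shell's PID is used as a stand-in so the snippet runs anywhere:

```shell
# Before sending kill -9, check who the parent of the leftover process is.
pid=$$   # stand-in; substitute the leftover qemu PID from "ps -elf | grep qemu"
ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
# If the parent turns out to be nova-compute, killing the parent
# (as happened here) takes the compute service down with it.
ps -o pid,comm -p "$ppid"
```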