Bug 672895

Summary: [vdsm] split brain in vdsm in case destroy is called before call create returns
Product: Red Hat Enterprise Linux 6 Reporter: Haim <hateya>
Component: vdsm Assignee: Igor Lvovsky <ilvovsky>
Status: CLOSED CURRENTRELEASE QA Contact: yeylon <yeylon>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1 CC: abaron, bazulay, danken, dnaori, iheim, ilvovsky, lpeer, mgoldboi, srevivo, yeylon
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: vdsm-4.9-48 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-08-19 15:28:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description    Flags
vdsm logs. hsm and spm. problematic state started in nott-vds1 (spm)    none
split brain logs. both hosts. nott is problematic    none

Description Haim 2011-01-26 16:56:19 UTC
Created attachment 475438 [details]
vdsm logs. hsm and spm. problematic state started in nott-vds1 (spm)

Description of problem:

VM split brain in vdsm (two identical qemu processes on two different hosts in the same cluster) when destroy is called before the create call returns.
Note that when trying to launch the VM again, vdsm outputs the following error:

libvirtError: Requested operation is not valid: domain is already active as 'RH_LVEXT'

since the LV is still open ...

Danken mentioned possible raise in the following code:

        try:
            self._dom.destroy()
        except:
            pass
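
A minimal sketch of how this pattern can hide a failed destroy. It is a hypothetical model, not the vdsm code: the class name SketchVm, the return value, and the assumption that _dom stays None until create has finished are all illustrative.

```python
class SketchVm(object):
    """Hypothetical model of the pattern above; not the vdsm Vm class."""

    def __init__(self):
        # Assumption: the domain handle is only set once create has finished.
        self._dom = None

    def destroy(self):
        try:
            # While _dom is still None this raises AttributeError.
            self._dom.destroy()
        except:
            # The failure is discarded, exactly as in the quoted code.
            pass
        # The caller is told the VM is gone although nothing was stopped.
        return "destroyed"
```

Under these assumptions, a destroy that arrives before create returns reports success, while the create thread is free to go on and start qemu.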

[root@nott-vds1 tmp]# ps -ww `pgrep qemu ` |grep --color RH_LVEXT
19991 ?        Sl     1:00 /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name RH_LVEXT -uuid 5f91ee05-25ac-48e8-b53b-aec7fedf2bda -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/RH_LVEXT.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2011-01-26T16:00:07 -boot dcn -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/5592bc97-8b9d-4cfa-9946-d4e2bbbcb14f/efc8a873-437f-4b66-81eb-e0f9d9801f33/images/5ea7594f-8298-4370-8dec-2ef13da3545d/cb43e2f0-dac3-490f-bfd2-234bd537d6c0,if=none,id=drive-virtio-disk0,format=qcow2,serial=70-8dec-2ef13da3545d,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/rhev/data-center/5592bc97-8b9d-4cfa-9946-d4e2bbbcb14f/ae6e433e-6820-420a-bbba-d163d654d026/images/11111111-1111-1111-1111-111111111111/Fedora-13-x86_64-Live.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=23,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:16:87:83,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/RH_LVEXT.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:0,password -k en-us -vga cirrus


13116 ?        Sl     0:45 /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name RH_LVEXT -uuid 5f91ee05-25ac-48e8-b53b-aec7fedf2bda -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/RH_LVEXT.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2011-01-26T16:03:04 -boot dcn -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/5592bc97-8b9d-4cfa-9946-d4e2bbbcb14f/efc8a873-437f-4b66-81eb-e0f9d9801f33/images/5ea7594f-8298-4370-8dec-2ef13da3545d/cb43e2f0-dac3-490f-bfd2-234bd537d6c0,if=none,id=drive-virtio-disk0,format=qcow2,serial=70-8dec-2ef13da3545d,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/rhev/data-center/5592bc97-8b9d-4cfa-9946-d4e2bbbcb14f/ae6e433e-6820-420a-bbba-d163d654d026/images/11111111-1111-1111-1111-111111111111/Fedora-13-i686-Live.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=23,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:1a:4a:16:87:83,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/RH_LVEXT.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:0,password -k en-us -vga cirrus

repro steps (try both):

1) start the VM and restart the vdsm service
2) start the VM while making sure SELinux is on

libvirt-0.8.7-3.el6.x86_64

Comment 2 Haim 2011-02-02 09:12:07 UTC
Created attachment 476531 [details]
split brain logs. both hosts. nott is problematic

Comment 3 Ayal Baron 2011-02-10 09:24:46 UTC
Should be fixed.

Comment 4 Ayal Baron 2011-02-10 09:25:58 UTC
Should be fixed.

Comment 5 Haim 2011-02-28 19:00:17 UTC
verified, got the following log when trying to reproduce: 

Thread-335::ERROR::2011-02-28 20:31:08,449::vm::643::vm.Vm::(_startUnderlyingVm) vmId=`2117a5f8-634b-48e1-90ff-76ad8185846f`::Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 613, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 787, in _run
    self._domDependentInit()
  File "/usr/share/vdsm/libvirtvm.py", line 718, in _domDependentInit
    raise Exception('destroy() called before Vm started')
Exception: destroy() called before Vm started

no split brain occurred. 

vdsm-4.9-51.el6.x86_64
libvirt-0.8.7-8.el6.x86_64
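
The traceback above suggests that the start path now checks whether destroy() was already requested. A minimal sketch of such a guard follows, under stated assumptions: GuardedVm, FakeDomain, the _destroyed flag and the lock are hypothetical, and only the method name _domDependentInit and the exception message come from the log. Whether vdsm also tears down the new domain at this point is an assumption.

```python
import threading


class FakeDomain(object):
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self):
        self.running = True

    def destroy(self):
        self.running = False


class GuardedVm(object):
    """Hypothetical model of the guard; not the vdsm implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._destroyed = False
        self._dom = None

    def destroy(self):
        # Record the request even if the domain does not exist yet.
        with self._lock:
            self._destroyed = True
            dom = self._dom
        if dom is not None:
            dom.destroy()

    def _domDependentInit(self, dom):
        # Called by the create thread once the domain exists.
        with self._lock:
            if not self._destroyed:
                self._dom = dom
                return
        # destroy() came first: stop the new domain and abort the start.
        dom.destroy()
        raise Exception('destroy() called before Vm started')
```

With the flag checked under the same lock that destroy() takes, an early destroy is either seen by the create thread or finds the domain already registered, so no qemu process is left running unnoticed.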