Bug 857889

Summary: beta3 - vm with direct lun fails to start
Product: Red Hat Enterprise Linux 6
Reporter: Andrew Cathrow <acathrow>
Component: vdsm
Assignee: Greg Padgett <gpadgett>
Status: CLOSED WORKSFORME
QA Contact: Haim <hateya>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.3
CC: abaron, acathrow, amureini, bazulay, dyasny, gickowic, iheim, lpeer, lvroyce, mgoldboi, rbalakri, yeylon, ykaul
Target Milestone: beta
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 844130
Environment:
Last Closed: 2012-10-04 12:00:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 844130
Bug Blocks:

Description Andrew Cathrow 2012-09-17 12:29:03 UTC
Cloning from upstream to RHEL6 since this has been reported with RHEV 3.1 beta2




+++ This bug was initially created as a clone of Bug #844130 +++

Description of problem:
Successfully added a LUN to a VM as its only disk; the LUN status is active. However, when the VM is started, the following error occurs and the VM fails to start:


Thread-61739::ERROR::2012-07-27 10:47:45,808::vm::600::vm.Vm::(_startUnderlyingVm) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 566, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1361, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2420, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Thread-61739::DEBUG::2012-07-27 10:47:45,810::vm::916::vm.Vm::(setDownStatus) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::Changed state to Down: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Dummy-254::DEBUG::2012-07-27 10:47:46,505::__init__::1249::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-254::DEBUG::2012-07-27 10:47:46,537::__init__::1249::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0167183 s, 61.3 MB/s\n'; <rc> = 0
Thread-61742::DEBUG::2012-07-27 10:47:46,960::BindingXMLRPC::872::vds::(wrapper) client [9.115.122.67]::call vmGetStats with ('0cab4e53-3c10-400f-90dc-c445af68da2d',) {} flowID [20d378e8]
Thread-61742::DEBUG::2012-07-27 10:47:46,961::BindingXMLRPC::879::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '0', 'exitMessage': 'internal error Process exited while reading console log output: char device redirected to /dev/pts/5\ndo_spice_init: starting 0.10.1\nspice_server_add_interface: SPICE_INTERFACE_MIGRATION\nspice_server_add_interface: SPICE_INTERFACE_KEYBOARD\nspice_server_add_interface: SPICE_INTERFACE_MOUSE\nqemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied\n', 'vmId': '0cab4e53-3c10-400f-90dc-c445af68da2d', 'timeOffset': '0', 'exitCode': 1}]


libvirt log:
2012-07-29 07:05:27.521+0000: 624: debug : virCommandRunAsync:2183 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+popcnt,+sse4.2,+sse4.1,+ssse3 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid 0cab4e53-3c10-400f-90dc-c445af68da2d -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=17-1,serial=0269D49C-BE4F-11E1-B3AB-4F7B23F93C00_00:21:86:2a:aa:f9,uuid=0cab4e53-3c10-400f-90dc-c445af68da2d -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-07-29T07:05:27,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/vm1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice port=5901,tls-port=5902,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
2012-07-29 07:05:27.522+0000: 624: debug : virCommandRunAsync:2201 : Command result 0, with PID 5177
2012-07-29 07:05:27.523+0000: 624: debug : virCommandRun:1999 : Result status 0, stdout: '(null)' stderr: '(null)'
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3596 : Writing early domain status to disk
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3601 : Waiting for handshake from child
2012-07-29 07:05:27.523+0000: 624: debug : virCommandHandshakeWait:2491 : Wait for handshake on 26
2012-07-29 07:05:27.566+0000: 624: debug : qemuProcessStart:3606 : Setting domain security labels
2012-07-29 07:05:27.566+0000: 624: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso with FS magic 26985 is shared
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3635 : Labelling done, completing handshake to child
2012-07-29 07:05:27.567+0000: 624: debug : virCommandHandshakeNotify:2558 : Notify handshake on 30
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3639 : Handshake complete, child running
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3650 : Waiting for monitor to show up
2012-07-29 07:05:27.668+0000: 624: error : qemuProcessReadLogOutput:1298 : internal error Process exited while reading console log output: char device redirected to /dev/pts/2
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied
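
For context, the "Permission denied" above can be confirmed from the host by checking who owns the device node qemu was asked to open. Below is a minimal diagnostic sketch (not part of vdsm): the device path is copied from the qemu error line above, and the root:disk expectation is based on the later comments in this bug.

# Minimal diagnostic sketch (not part of vdsm): report the ownership of the
# /dev/mapper node that qemu failed to open.
import grp
import os
import pwd
import stat

DEVICE = "/dev/mapper/1IET_00010002"  # path copied from the qemu error above

st = os.stat(DEVICE)
owner = pwd.getpwuid(st.st_uid).pw_name
group = grp.getgrgid(st.st_gid).gr_name
print("%s is %s:%s, mode %o" % (DEVICE, owner, group, stat.S_IMODE(st.st_mode)))

if (owner, group) == ("root", "disk"):
    # An unprivileged qemu process cannot open such a node -> EACCES, i.e.
    # the "could not open disk image ... Permission denied" error above.
    print("ownership was not (yet) adjusted for qemu")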


Version-Release number of selected component (if applicable):
[lvroyce@localhost test_base]$ rpm -q libvirt
libvirt-0.9.11.4-3.fc17.x86_64
[lvroyce@localhost test_base]$ rpm -q vdsm
vdsm-4.10.0-0.185.gitb52165e.fc17.lvroyce1343027059.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

--- Additional comment from lvroyce.ibm.com on 2012-08-02 03:17:01 EDT ---

*** Bug 845179 has been marked as a duplicate of this bug. ***

--- Additional comment from ewarszaw on 2012-08-12 04:26:45 EDT ---

Please add libvirt, vdsm and engine complete logs.

--- Additional comment from ewarszaw on 2012-08-12 07:07:13 EDT ---

Tested in a loop, the VM always starts.

--- Additional comment from gickowic on 2012-08-12 07:38:40 EDT ---

Created attachment 603768 [details]
vdsm logs

Tested using a host with an iSCSI session already set up.

The loop created a VM, checked with virsh that it was up, and then destroyed the VM.

500 iterations ran successfully.

--- Additional comment from lvroyce.ibm.com on 2012-08-12 22:18:18 EDT ---

Our test environment is all-in-one: the tgtd iSCSI server runs on the same machine, and the iSCSI session has no password.
I recommend reproducing it the following way:
1. Create a new LUN and log in to it.
2. Check that /dev/mapper/xxx (xxx is the symlink for the LUN) points to a device with owner/group root/disk (see the sketch after this comment).
3. Create a new VM using this direct LUN.

Gadi,
I think that if you make sure the disk owner/group is root/disk before creating the VM, this problem can be reproduced.
I checked your log and found that you had used this direct LUN disk before your test. Once the disk has been used, vdsm has already changed its owner/group, so however many times you retry afterwards the VM will start successfully. This bug is triggered the first time a fresh direct LUN disk is used.

Edward,
I checked the udev-182 source code: the rule writing and rule reloading really do run asynchronously, so there is indeed a race here.
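
A minimal sketch of the precondition check in step 2 above (a hypothetical helper, not part of vdsm or the test suite; the device path is only an example):

# Hypothetical helper for step 2 above: verify that the fresh LUN's device
# node still has the default root:disk ownership before the VM is created,
# so that the first-use race can actually be hit.
import grp
import os
import pwd
import sys

def is_fresh_lun(mapper_path):
    """Return True if the device behind the /dev/mapper symlink is root:disk."""
    real = os.path.realpath(mapper_path)  # resolve the symlink to the dm-* node
    st = os.stat(real)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    print("%s -> %s is %s:%s" % (mapper_path, real, owner, group))
    return (owner, group) == ("root", "disk")

if __name__ == "__main__":
    # Example: python check_lun.py /dev/mapper/1IET_00010002
    sys.exit(0 if is_fresh_lun(sys.argv[1]) else 1)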

--- Additional comment from lvroyce.ibm.com on 2012-08-12 22:22:19 EDT ---

Created attachment 603837 [details]
engine_vdsm_libvirt logs

--- Additional comment from gickowic on 2012-08-13 02:20:20 EDT ---

(In reply to comment #5)
> Our test environment is all-in-one: the tgtd iSCSI server runs on the same
> machine, and the iSCSI session has no password.
> I recommend reproducing it the following way:
> 1. Create a new LUN and log in to it.
> 2. Check that /dev/mapper/xxx (xxx is the symlink for the LUN) points to a
> device with owner/group root/disk (see the sketch after this comment).
> 3. Create a new VM using this direct LUN.
> 
> Gadi,
> I think that if you make sure the disk owner/group is root/disk before
> creating the VM, this problem can be reproduced.
> I checked your log and found that you had used this direct LUN disk before
> your test. Once the disk has been used, vdsm has already changed its
> owner/group, so however many times you retry afterwards the VM will start
> successfully. This bug is triggered the first time a fresh direct LUN disk
> is used.
> 
> Edward,
> I checked the udev-182 source code: the rule writing and rule reloading
> really do run asynchronously, so there is indeed a race here.

I actually changed the ownership back to root/disk after each run in the test loop, once each machine was destroyed.
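
(The exact reset used in the loop is not shown in the attached logs; conceptually it amounts to something like the following, with the device path as an example only.)

# Illustration only -- the actual commands used in the test loop are not in
# the logs.  Putting the node back to root:disk between iterations restores
# the "fresh LUN" state for the next run.
import grp
import os
import pwd

def reset_to_root_disk(mapper_path):
    real = os.path.realpath(mapper_path)
    os.chown(real, pwd.getpwnam("root").pw_uid, grp.getgrnam("disk").gr_gid)

reset_to_root_disk("/dev/mapper/1IET_00010002")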

--- Additional comment from lvroyce.ibm.com on 2012-08-13 10:25:59 EDT ---

Gadi,
Thanks for the clarification.
My udev is udev-182, shipped with fc17. According to its changelog:
move rules dirs to udev context; replace inotify with time-controlled stat()

I guess that with an older udev, inotify makes the udev rules get loaded into memory immediately, but the newest udev does this instead:
/* check for changed config, every 3 seconds at most */
if ((now_usec() - last_usec) > 3 * 1000 * 1000) {
        if (check_rules_timestamp(udev))
                reload = true;
        if (udev_builtin_validate(udev))
                reload = true;

        last_usec = now_usec();
}

So it also depends on the udev version.
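
Given that 3-second window, a test (or the management layer) can take the reload delay out of the picture by reloading the rules explicitly and waiting until the ownership has actually changed. A rough sketch using the standard udevadm tool (an assumption about how to work around the race, not vdsm's actual code):

# Sketch of a workaround for the reload delay quoted above (an assumption,
# not vdsm's actual code): explicitly reload the udev rules, re-trigger
# block-device events so the new rules are applied, wait for the event
# queue to drain, then poll until the device ownership really changed.
import os
import subprocess
import time

def wait_for_ownership(device, expected_uid, timeout=10.0):
    subprocess.check_call(["udevadm", "control", "--reload-rules"])
    subprocess.check_call(["udevadm", "trigger", "--subsystem-match=block"])
    subprocess.check_call(["udevadm", "settle"])
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.stat(device).st_uid == expected_uid:  # rule has been applied
            return True
        time.sleep(0.2)
    return False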

Comment 1 Ayal Baron 2012-10-04 11:59:47 UTC
(In reply to comment #0)
> Cloning from upstream to RHEL6 since this has been reported with RHEV 3.1
> beta2
> 

After checking, this was *not* reproduced with RHEV 3.1. The bug that was reproduced was Bug 854141.
The thread that was referenced is: http://post-office.corp.redhat.com/archives/rhev-beta/2012-September/msg00196.html

Comment 2 Ayal Baron 2012-10-04 12:00:48 UTC
Note that, just to verify, we tried to reproduce this (on RHEL) and failed.