Bug 844130

Summary: vm with direct lun fails to start
Product: [Retired] oVirt
Component: vdsm
Version: unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Target Release: 3.3.4
Whiteboard: storage
oVirt Team: Storage
Doc Type: Bug Fix
Type: Bug
Reporter: Royce Lv <lvroyce>
Assignee: Eduardo Warszawski <ewarszaw>
CC: abaron, acathrow, amureini, bazulay, gickowic, iheim, jkt, mgoldboi
Bug Blocks: 857889
Last Closed: 2013-09-04 21:13:33 UTC
Attachments:
  vdsm logs (flags: none)
  engine_vdsm_libvirt logs (flags: none)

Description Royce Lv 2012-07-29 07:22:13 UTC
Description of problem:
A LUN is successfully added to a VM as its only disk and the LUN status is active,
but when the VM is started it fails with the following error:


Thread-61739::ERROR::2012-07-27 10:47:45,808::vm::600::vm.Vm::(_startUnderlyingVm) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 566, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1361, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2420, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Thread-61739::DEBUG::2012-07-27 10:47:45,810::vm::916::vm.Vm::(setDownStatus) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::Changed state to Down: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Dummy-254::DEBUG::2012-07-27 10:47:46,505::__init__::1249::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-254::DEBUG::2012-07-27 10:47:46,537::__init__::1249::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0167183 s, 61.3 MB/s\n'; <rc> = 0
Thread-61742::DEBUG::2012-07-27 10:47:46,960::BindingXMLRPC::872::vds::(wrapper) client [9.115.122.67]::call vmGetStats with ('0cab4e53-3c10-400f-90dc-c445af68da2d',) {} flowID [20d378e8]
Thread-61742::DEBUG::2012-07-27 10:47:46,961::BindingXMLRPC::879::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '0', 'exitMessage': 'internal error Process exited while reading console log output: char device redirected to /dev/pts/5\ndo_spice_init: starting 0.10.1\nspice_server_add_interface: SPICE_INTERFACE_MIGRATION\nspice_server_add_interface: SPICE_INTERFACE_KEYBOARD\nspice_server_add_interface: SPICE_INTERFACE_MOUSE\nqemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied\n', 'vmId': '0cab4e53-3c10-400f-90dc-c445af68da2d', 'timeOffset': '0', 'exitCode': 1}]


libvirt log:
2012-07-29 07:05:27.521+0000: 624: debug : virCommandRunAsync:2183 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+popcnt,+sse4.2,+sse4.1,+ssse3 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid 0cab4e53-3c10-400f-90dc-c445af68da2d -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=17-1,serial=0269D49C-BE4F-11E1-B3AB-4F7B23F93C00_00:21:86:2a:aa:f9,uuid=0cab4e53-3c10-400f-90dc-c445af68da2d -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-07-29T07:05:27,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/vm1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice port=5901,tls-port=5902,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
2012-07-29 07:05:27.522+0000: 624: debug : virCommandRunAsync:2201 : Command result 0, with PID 5177
2012-07-29 07:05:27.523+0000: 624: debug : virCommandRun:1999 : Result status 0, stdout: '(null)' stderr: '(null)'
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3596 : Writing early domain status to disk
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3601 : Waiting for handshake from child
2012-07-29 07:05:27.523+0000: 624: debug : virCommandHandshakeWait:2491 : Wait for handshake on 26
2012-07-29 07:05:27.566+0000: 624: debug : qemuProcessStart:3606 : Setting domain security labels
2012-07-29 07:05:27.566+0000: 624: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso with FS magic 26985 is shared
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3635 : Labelling done, completing handshake to child
2012-07-29 07:05:27.567+0000: 624: debug : virCommandHandshakeNotify:2558 : Notify handshake on 30
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3639 : Handshake complete, child running
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3650 : Waiting for monitor to show up
2012-07-29 07:05:27.668+0000: 624: error : qemuProcessReadLogOutput:1298 : internal error Process exited while reading console log output: char device redirected to /dev/pts/2
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied


Version-Release number of selected component (if applicable):
[lvroyce@localhost test_base]$ rpm -q libvirt
libvirt-0.9.11.4-3.fc17.x86_64
[lvroyce@localhost test_base]$ rpm -q vdsm
vdsm-4.10.0-0.185.gitb52165e.fc17.lvroyce1343027059.x86_64



Comment 1 Royce Lv 2012-08-02 07:17:01 UTC
*** Bug 845179 has been marked as a duplicate of this bug. ***

Comment 2 Eduardo Warszawski 2012-08-12 08:26:45 UTC
Please add libvirt, vdsm and engine complete logs.

Comment 3 Eduardo Warszawski 2012-08-12 11:07:13 UTC
Tested in a loop, the VM always starts.

Comment 4 Gadi Ickowicz 2012-08-12 11:38:40 UTC
Created attachment 603768 [details]
vdsm logs

Tested using a host with an iSCSI session already set up.

The loop created a VM, verified it was up using virsh, and then destroyed the VM.

500 iterations ran successfully.

Comment 5 Royce Lv 2012-08-13 02:18:18 UTC
Our test environment is all-in-one: the tgtd iSCSI target runs on the same machine, and the iSCSI session uses no password.
I recommend reproducing it the following way:
1. Create a new LUN and log in to it.
2. Check that /dev/mapper/xxx (xxx is the symlink for the LUN) points to a device with owner/group root/disk (see the sketch below).
3. Create a new VM using this direct LUN.

Gadi,
if you make sure the disk's owner/group is root/disk before creating the VM, the problem should be reproducible.
I checked your log and found that you had used this direct LUN disk before your test. Once the disk has been used, vdsm has already changed its owner/group, so no matter how many times you retry afterwards, the VM will start successfully. This bug is triggered the first time a fresh direct LUN disk is used.

Eduardo,
I checked the udev-182 source code; the rule writing and the rule reloading really do run asynchronously, so there is indeed a race here.
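
For step 2, a minimal sketch of the ownership check (not part of vdsm; the device path is taken from the logs above, and the expected owner/group is an assumption):

import grp
import os
import pwd

# Hypothetical device path for illustration; substitute the /dev/mapper
# name of the fresh direct LUN from step 2 above.
DEVICE = "/dev/mapper/1IET_00010002"

st = os.stat(DEVICE)
owner = pwd.getpwuid(st.st_uid).pw_name
group = grp.getgrgid(st.st_gid).gr_name
print("%s is owned by %s:%s" % (DEVICE, owner, group))

# On a fresh direct LUN the node is still root:disk; only after vdsm's
# udev rule has been loaded and applied does the ownership change to
# something the qemu process can open. Starting the VM before that
# happens produces the "Permission denied" error above.
if (owner, group) == ("root", "disk"):
    print("udev rule not applied yet; starting a VM now would fail")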

Comment 6 Royce Lv 2012-08-13 02:22:19 UTC
Created attachment 603837 [details]
engine_vdsm_libvirt logs

Comment 7 Gadi Ickowicz 2012-08-13 06:20:20 UTC
(In reply to comment #5)

In the test loop I actually changed the ownership back to root/disk after each run, once the machine was destroyed.

Comment 8 Royce Lv 2012-08-13 14:25:59 UTC
Gadi,
thanks for the clarification.
My udev is udev-182, shipped with fc17. According to its changelog:
"move rules dirs to udev context; replace inotify with time-controlled stat()"

I guess that with an older udev, inotify makes rule changes load into memory immediately, but the newer udev does it like this:
/* check for changed config, every 3 seconds at most */
if ((now_usec() - last_usec) > 3 * 1000 * 1000) {
        if (check_rules_timestamp(udev))
                reload = true;
        if (udev_builtin_validate(udev))
                reload = true;

        last_usec = now_usec();
}

So whether the bug reproduces also depends on the udev version.

Comment 9 Royce Lv 2012-09-19 03:13:29 UTC
Upstream patch:
http://gerrit.ovirt.org/#/c/6780/

The reload command in RHEL udev-173 is:
# udevadm control --reload-rules
but in fc17 udev-182 it is:
# udevadm control --reload

Because of this command change, the patch chooses to sleep and periodically wait for the rules to take effect instead of reloading them synchronously.
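
A minimal sketch of that approach (names, timings, and the example uid/gid are illustrative assumptions, not the actual vdsm patch): after writing the udev rule, poll the device node until its ownership changes, rather than assuming that udevadm control --reload(-rules) takes effect immediately.

import os
import time

class DeviceOwnershipTimeout(Exception):
    pass

def wait_for_device_ownership(device, expected_uid, expected_gid,
                              timeout=10.0, interval=0.5):
    """Poll the device node until udev has applied the ownership rule.

    udev-182 rechecks rule timestamps at most every 3 seconds, so a
    freshly written rule may not be in effect when the VM is started;
    sleeping in a loop covers both the old and the new reload behaviour.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        st = os.stat(device)
        if st.st_uid == expected_uid and st.st_gid == expected_gid:
            return
        time.sleep(interval)
    raise DeviceOwnershipTimeout(
        "%s still not owned by %d:%d after %.1fs" %
        (device, expected_uid, expected_gid, timeout))

# Example with hypothetical values: wait until the LUN belongs to the
# qemu user/group before asking libvirt to create the domain.
# wait_for_device_ownership("/dev/mapper/1IET_00010002", 107, 107)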