Bug 844130 - vm with direct lun fails to start
Summary: vm with direct lun fails to start
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.4
Assignee: Eduardo Warszawski
QA Contact:
URL:
Whiteboard: storage
Duplicates: 845179 (view as bug list)
Depends On:
Blocks: 857889
 
Reported: 2012-07-29 07:22 UTC by Royce Lv
Modified: 2016-02-10 17:08 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 857889 (view as bug list)
Environment:
Last Closed: 2013-09-04 21:13:33 UTC
oVirt Team: Storage
Embargoed:


Attachments
vdsm logs (774.33 KB, application/x-gzip)
2012-08-12 11:38 UTC, Gadi Ickowicz
no flags Details
engine_vdsm_libvirt logs (61.78 KB, text/plain)
2012-08-13 02:22 UTC, Royce Lv
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 10758 0 None None None Never

Description Royce Lv 2012-07-29 07:22:13 UTC
Description of problem:
A LUN is successfully added to a VM as its only disk and the LUN status is active,
but when the VM is started, the following error occurs and the VM fails to start:


Thread-61739::ERROR::2012-07-27 10:47:45,808::vm::600::vm.Vm::(_startUnderlyingVm) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 566, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1361, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2420, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Thread-61739::DEBUG::2012-07-27 10:47:45,810::vm::916::vm.Vm::(setDownStatus) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::Changed state to Down: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Dummy-254::DEBUG::2012-07-27 10:47:46,505::__init__::1249::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-254::DEBUG::2012-07-27 10:47:46,537::__init__::1249::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0167183 s, 61.3 MB/s\n'; <rc> = 0
Thread-61742::DEBUG::2012-07-27 10:47:46,960::BindingXMLRPC::872::vds::(wrapper) client [9.115.122.67]::call vmGetStats with ('0cab4e53-3c10-400f-90dc-c445af68da2d',) {} flowID [20d378e8]
Thread-61742::DEBUG::2012-07-27 10:47:46,961::BindingXMLRPC::879::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '0', 'exitMessage': 'internal error Process exited while reading console log output: char device redirected to /dev/pts/5\ndo_spice_init: starting 0.10.1\nspice_server_add_interface: SPICE_INTERFACE_MIGRATION\nspice_server_add_interface: SPICE_INTERFACE_KEYBOARD\nspice_server_add_interface: SPICE_INTERFACE_MOUSE\nqemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied\n', 'vmId': '0cab4e53-3c10-400f-90dc-c445af68da2d', 'timeOffset': '0', 'exitCode': 1}]


libvirt log:
2012-07-29 07:05:27.521+0000: 624: debug : virCommandRunAsync:2183 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+popcnt,+sse4.2,+sse4.1,+ssse3 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid 0cab4e53-3c10-400f-90dc-c445af68da2d -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=17-1,serial=0269D49C-BE4F-11E1-B3AB-4F7B23F93C00_00:21:86:2a:aa:f9,uuid=0cab4e53-3c10-400f-90dc-c445af68da2d -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-07-29T07:05:27,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/vm1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice port=5901,tls-port=5902,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
2012-07-29 07:05:27.522+0000: 624: debug : virCommandRunAsync:2201 : Command result 0, with PID 5177
2012-07-29 07:05:27.523+0000: 624: debug : virCommandRun:1999 : Result status 0, stdout: '(null)' stderr: '(null)'
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3596 : Writing early domain status to disk
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3601 : Waiting for handshake from child
2012-07-29 07:05:27.523+0000: 624: debug : virCommandHandshakeWait:2491 : Wait for handshake on 26
2012-07-29 07:05:27.566+0000: 624: debug : qemuProcessStart:3606 : Setting domain security labels
2012-07-29 07:05:27.566+0000: 624: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso with FS magic 26985 is shared
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3635 : Labelling done, completing handshake to child
2012-07-29 07:05:27.567+0000: 624: debug : virCommandHandshakeNotify:2558 : Notify handshake on 30
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3639 : Handshake complete, child running
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3650 : Waiting for monitor to show up
2012-07-29 07:05:27.668+0000: 624: error : qemuProcessReadLogOutput:1298 : internal error Process exited while reading console log output: char device redirected to /dev/pts/2
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied


Version-Release number of selected component (if applicable):
[lvroyce@localhost test_base]$ rpm -q libvirt
libvirt-0.9.11.4-3.fc17.x86_64
[lvroyce@localhost test_base]$ rpm -q vdsm
vdsm-4.10.0-0.185.gitb52165e.fc17.lvroyce1343027059.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Royce Lv 2012-08-02 07:17:01 UTC
*** Bug 845179 has been marked as a duplicate of this bug. ***

Comment 2 Eduardo Warszawski 2012-08-12 08:26:45 UTC
Please add libvirt, vdsm and engine complete logs.

Comment 3 Eduardo Warszawski 2012-08-12 11:07:13 UTC
Tested in a loop; the VM always starts.

Comment 4 Gadi Ickowicz 2012-08-12 11:38:40 UTC
Created attachment 603768 [details]
vdsm logs

Tested using a host with an iSCSI session already set up.

The loop created a VM, checked with virsh that it was up, and then destroyed the VM.

500 iterations ran successfully.

Comment 5 Royce Lv 2012-08-13 02:18:18 UTC
Our test environment is all-in-one: the tgtd iSCSI server runs on the same machine, and the iSCSI session has no password.
I recommend reproducing it the following way:
1. Create a new LUN and log in to it.
2. Check that /dev/mapper/xxx (xxx is the symlink of the LUN) points to a device with owner/group root/disk (a quick check sketch follows these steps).
3. Create a new VM using this direct LUN.
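
For step 2, a minimal stand-alone check (this is not vdsm code; the device path is simply the one that appears in the logs above):

import grp
import os
import pwd
import sys

def show_owner(path):
    # os.stat() follows the /dev/mapper symlink to the underlying dm-N node
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    print("%s -> %s:%s" % (path, owner, group))
    return owner, group

if __name__ == "__main__":
    # e.g. python check_owner.py /dev/mapper/1IET_00010002
    show_owner(sys.argv[1])

If this prints root:disk, the LUN is still "fresh" in the sense described below.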

Gadi,
I think this problem can be reproduced if you make sure the disk owner/group is root/disk before creating the VM.
I checked your log and found that you had used this direct LUN disk before your test. Once the disk has been used, vdsm has already changed its owner/group, so no matter how many times you try afterwards the VM will start successfully. This bug is triggered the first time a fresh direct LUN disk is used.

Eduardo,
I checked the udev-182 source code: the rule writing and rule reloading really do run asynchronously, so there is indeed a race here.
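
As a rough illustration of that race (this is not vdsm code; the rule file name and rule below are made up, modelled on the kind of ownership rule vdsm writes for a direct LUN):

import os
import subprocess
import time

# Hypothetical rule file and rule; the real rule vdsm writes may differ.
RULE_FILE = "/etc/udev/rules.d/99-example-direct-lun.rules"
RULE = 'SYMLINK=="mapper/1IET_00010002", OWNER="vdsm", GROUP="qemu"\n'
DEV = "/dev/mapper/1IET_00010002"

with open(RULE_FILE, "w") as f:
    f.write(RULE)

# Ask udevd to pick up the new rule and re-process the device.
# (udev-173 spells the first command --reload-rules, udev-182 --reload.)
subprocess.call(["udevadm", "control", "--reload"])
subprocess.call(["udevadm", "trigger", "--sysname-match=dm-*"])

# The reload and the resulting ownership change are applied asynchronously,
# so immediately after these calls the node can still be root/disk; starting
# qemu in that window gives the "Permission denied" seen in the logs.
st = os.stat(DEV)
print("right after reload: uid=%d gid=%d" % (st.st_uid, st.st_gid))
time.sleep(4)
st = os.stat(DEV)
print("a few seconds later: uid=%d gid=%d" % (st.st_uid, st.st_gid))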

Comment 6 Royce Lv 2012-08-13 02:22:19 UTC
Created attachment 603837 [details]
engine_vdsm_libvirt logs

Comment 7 Gadi Ickowicz 2012-08-13 06:20:20 UTC
(In reply to comment #5)
> Our test environment is all-in-one: the tgtd iSCSI server runs on the same
> machine, and the iSCSI session has no password.
> I recommend reproducing it the following way:
> 1. Create a new LUN and log in to it.
> 2. Check that /dev/mapper/xxx (xxx is the symlink of the LUN) points to a
> device with owner/group root/disk.
> 3. Create a new VM using this direct LUN.
> 
> Gadi,
> I think this problem can be reproduced if you make sure the disk
> owner/group is root/disk before creating the VM.
> I checked your log and found that you had used this direct LUN disk before
> your test. Once the disk has been used, vdsm has already changed its
> owner/group, so no matter how many times you try afterwards the VM will
> start successfully. This bug is triggered the first time a fresh direct LUN
> disk is used.
> 
> Eduardo,
> I checked the udev-182 source code: the rule writing and rule reloading
> really do run asynchronously, so there is indeed a race here.

I actually changed the ownership back to root/disk in the test loop after each machine was destroyed, before the next run.

Comment 8 Royce Lv 2012-08-13 14:25:59 UTC
Gadi,
Thanks for the clarification.
My udev is udev-182, shipped with fc17. According to its changelog:
move rules dirs to udev context; replace inotify with time-controlled stat()

I guess that with an older udev, inotify would make udevd load the rules into memory immediately, but the newer udev does this instead:
                /* check for changed config, every 3 seconds at most */
                if ((now_usec() - last_usec) > 3 * 1000 * 1000) {
                        if (check_rules_timestamp(udev))
                                reload = true;
                        if (udev_builtin_validate(udev))
                                reload = true;

                        last_usec = now_usec();
                }

So it also depends on the udev version.

Comment 9 Royce Lv 2012-09-19 03:13:29 UTC
Upstream patch:
http://gerrit.ovirt.org/#/c/6780/
The reload command in RHEL udev-173 is:
# udevadm control --reload-rules
but in fc17 udev-182 it is:
# udevadm control --reload

Because of this udev command change, the patch chooses to sleep and periodically wait for the rules to be loaded instead of synchronously reloading them.
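
A minimal sketch of that sleep-and-poll idea (this is not the actual patch from the gerrit change; the uid/gid values and timeout are placeholders):

import os
import time

def wait_for_ownership(path, uid, gid, timeout=10.0, interval=0.5):
    # Poll path until it is owned by uid:gid, or give up after timeout seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        st = os.stat(path)
        if st.st_uid == uid and st.st_gid == gid:
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    # Placeholder ids; vdsm would use its own vdsm uid and the qemu gid.
    if wait_for_ownership("/dev/mapper/1IET_00010002", 36, 107):
        print("udev rule applied, safe to start the VM")
    else:
        print("timed out waiting for the udev rule to be applied")

Polling like this trades a small startup delay for independence from the udevadm command-line differences between udev versions.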

