Bug 857889 - beta3 - vm with direct lun fails to start
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: ---
Assigned To: Greg Padgett
QA Contact: Haim
Whiteboard: storage
Depends On: 844130
Blocks:
Reported: 2012-09-17 08:29 EDT by Andrew Cathrow
Modified: 2016-06-12 19:15 EDT
CC: 13 users

See Also:
Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 844130
Environment:
Last Closed: 2012-10-04 08:00:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Andrew Cathrow 2012-09-17 08:29:03 EDT
Cloning from upstream to RHEL6 since this has been reported with RHEV 3.1 beta2




+++ This bug was initially created as a clone of Bug #844130 +++

Description of problem:
A LUN is successfully added to a VM as its only disk and the LUN status is active, but when the VM is started, it fails with the following error:


Thread-61739::ERROR::2012-07-27 10:47:45,808::vm::600::vm.Vm::(_startUnderlyingVm) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 566, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1361, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2420, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Thread-61739::DEBUG::2012-07-27 10:47:45,810::vm::916::vm.Vm::(setDownStatus) vmId=`0cab4e53-3c10-400f-90dc-c445af68da2d`::Changed state to Down: internal error Process exited while reading console log output: char device redirected to /dev/pts/5
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied

Dummy-254::DEBUG::2012-07-27 10:47:46,505::__init__::1249::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Dummy-254::DEBUG::2012-07-27 10:47:46,537::__init__::1249::Storage.Misc.excCmd::(_log) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0167183 s, 61.3 MB/s\n'; <rc> = 0
Thread-61742::DEBUG::2012-07-27 10:47:46,960::BindingXMLRPC::872::vds::(wrapper) client [9.115.122.67]::call vmGetStats with ('0cab4e53-3c10-400f-90dc-c445af68da2d',) {} flowID [20d378e8]
Thread-61742::DEBUG::2012-07-27 10:47:46,961::BindingXMLRPC::879::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Down', 'hash': '0', 'exitMessage': 'internal error Process exited while reading console log output: char device redirected to /dev/pts/5\ndo_spice_init: starting 0.10.1\nspice_server_add_interface: SPICE_INTERFACE_MIGRATION\nspice_server_add_interface: SPICE_INTERFACE_KEYBOARD\nspice_server_add_interface: SPICE_INTERFACE_MOUSE\nqemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied\n', 'vmId': '0cab4e53-3c10-400f-90dc-c445af68da2d', 'timeOffset': '0', 'exitCode': 1}]


libvirt log:
2012-07-29 07:05:27.521+0000: 624: debug : virCommandRunAsync:2183 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+popcnt,+sse4.2,+sse4.1,+ssse3 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid 0cab4e53-3c10-400f-90dc-c445af68da2d -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=17-1,serial=0269D49C-BE4F-11E1-B3AB-4F7B23F93C00_00:21:86:2a:aa:f9,uuid=0cab4e53-3c10-400f-90dc-c445af68da2d -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2012-07-29T07:05:27,driftfix=slew -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/vm1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice port=5901,tls-port=5902,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record -k en-us -vga qxl -global qxl-vga.vram_size=67108864 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
2012-07-29 07:05:27.522+0000: 624: debug : virCommandRunAsync:2201 : Command result 0, with PID 5177
2012-07-29 07:05:27.523+0000: 624: debug : virCommandRun:1999 : Result status 0, stdout: '(null)' stderr: '(null)'
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3596 : Writing early domain status to disk
2012-07-29 07:05:27.523+0000: 624: debug : qemuProcessStart:3601 : Waiting for handshake from child
2012-07-29 07:05:27.523+0000: 624: debug : virCommandHandshakeWait:2491 : Wait for handshake on 26
2012-07-29 07:05:27.566+0000: 624: debug : qemuProcessStart:3606 : Setting domain security labels
2012-07-29 07:05:27.566+0000: 624: debug : virStorageFileIsSharedFSType:1031 : Check if path /rhev/data-center/8b2002e6-9f7f-490d-a8d7-022c1dfe98a3/0240590c-a924-4daf-b3cf-a1de10df567a/images/11111111-1111-1111-1111-111111111111/ubuntu-10.04.3-server-amd64.iso with FS magic 26985 is shared
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3635 : Labelling done, completing handshake to child
2012-07-29 07:05:27.567+0000: 624: debug : virCommandHandshakeNotify:2558 : Notify handshake on 30
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3639 : Handshake complete, child running
2012-07-29 07:05:27.567+0000: 624: debug : qemuProcessStart:3650 : Waiting for monitor to show up
2012-07-29 07:05:27.668+0000: 624: error : qemuProcessReadLogOutput:1298 : internal error Process exited while reading console log output: char device redirected to /dev/pts/2
do_spice_init: starting 0.10.1
spice_server_add_interface: SPICE_INTERFACE_MIGRATION
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
qemu-kvm: -drive file=/dev/mapper/1IET_00010002,if=none,id=drive-virtio-disk0,format=raw,serial=,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /dev/mapper/1IET_00010002: Permission denied


Version-Release number of selected component (if applicable):
[lvroyce@localhost test_base]$ rpm -q libvirt
libvirt-0.9.11.4-3.fc17.x86_64
[lvroyce@localhost test_base]$ rpm -q vdsm
vdsm-4.10.0-0.185.gitb52165e.fc17.lvroyce1343027059.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

--- Additional comment from lvroyce@linux.vnet.ibm.com on 2012-08-02 03:17:01 EDT ---

*** Bug 845179 has been marked as a duplicate of this bug. ***

--- Additional comment from ewarszaw@redhat.com on 2012-08-12 04:26:45 EDT ---

Please add complete libvirt, vdsm and engine logs.

--- Additional comment from ewarszaw@redhat.com on 2012-08-12 07:07:13 EDT ---

Tested in a loop, the VM always starts.

--- Additional comment from gickowic@redhat.com on 2012-08-12 07:38:40 EDT ---

Created attachment 603768 [details]
vdsm logs

Tested using a host with an iSCSI session already set up.

The loop created a VM, tested to ensure it was up using virsh and then destroyed the VM.

500 iterations ran successfully

--- Additional comment from lvroyce@linux.vnet.ibm.com on 2012-08-12 22:18:18 EDT ---

Our test env is all-in-one; the tgtd iSCSI server is on the same machine, and the iSCSI session has no password.
I recommend reproducing in the following way:
1. create a new LUN and log in to it
2. check that /dev/mapper/xxx (xxx is the symlink of the LUN) points to a device with owner/group root/disk (see the ownership-check sketch at the end of this comment)
3. create a new VM using this direct LUN

Gadi,
I believe that if you make sure the disk owner/group is root/disk before creating the VM, this problem can be reproduced.
I checked your log and found that you had used this direct LUN disk before your test. Once the disk has been used, vdsm has already changed its owner/group, so no matter how many times you try afterwards, the VM will start successfully. This bug is triggered the first time a fresh direct LUN disk is used.

Edward,
I checked the udev-182 source code; writing the rules and reloading the rules really do run asynchronously, so there is indeed a race here.
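
For reference, a minimal sketch (in Python, vdsm's language) of how one might confirm the root/disk precondition from step 2 before starting the VM. The device path is taken from the logs above; the script itself is only an illustration and is not part of vdsm:

    import grp
    import os
    import pwd

    DEVICE = "/dev/mapper/1IET_00010002"  # direct LUN device from the logs above

    def device_ownership(path):
        # Resolve the /dev/mapper symlink to the underlying dm device node
        real = os.path.realpath(path)
        st = os.stat(real)
        owner = pwd.getpwuid(st.st_uid).pw_name
        group = grp.getgrgid(st.st_gid).gr_name
        return real, owner, group

    if __name__ == "__main__":
        real, owner, group = device_ownership(DEVICE)
        print("%s -> %s owned by %s:%s" % (DEVICE, real, owner, group))
        # A fresh LUN is expected to be root:disk; qemu cannot open it until
        # the ownership has been changed, which is why the first start of a
        # VM using the fresh LUN fails with "Permission denied".
        if (owner, group) == ("root", "disk"):
            print("precondition met: first VM start should reproduce the bug")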

--- Additional comment from lvroyce@linux.vnet.ibm.com on 2012-08-12 22:22:19 EDT ---

Created attachment 603837 [details]
engine_vdsm_libvirt logs

--- Additional comment from gickowic@redhat.com on 2012-08-13 02:20:20 EDT ---

(In reply to comment #5)
> Our test env is all in one,tgtd SCSI server also on the same machine.ISCSI
> session with no passwd.
> I recomand to reproduce in the following way :
> 1.create a new LUN, login
> 2.check /dev/mapper/xxx (xxx is the symlink of the LUN) pointed to a device
> with owner/group:root/disk
> 3.create a new vm using this direct LUN
> 
> Gadi,
> I guess if you can make sure your disk owner/grp is root/disk before create
> vm,
> this problem can be reproduced.
> I checked your log and found you used this direct lun disk before your
> test.And if you already used the disk, it means the vdsm changed it's
> owner/group, so how many time you tried after, vm will be started
> successfully.This bug is triggered when first time use a fresh direct Lun
> disk. 
> 
> Edward,
> I checked the udev-182 source code, the write rules and reload rules are
> really ran in an async way,so there is indeed some race here.

I actually changed the ownership back to root/disk in the test loop after each machine was destroyed.

--- Additional comment from lvroyce@linux.vnet.ibm.com on 2012-08-13 10:25:59 EDT ---

Gadi,
Thanks for the clarification.
My udev is udev-182, shipped with fc17. According to its changelog:
move rules dirs to udev context; replace inotify with time-controlled stat()

I guess that if you are using an older udev, inotify makes udevd load the rules into memory immediately, but the newest udev does it like this:
/* check for changed config, every 3 seconds at most */
if ((now_usec() - last_usec) > 3 * 1000 * 1000) {
        if (check_rules_timestamp(udev))
                reload = true;
        if (udev_builtin_validate(udev))
                reload = true;

        last_usec = now_usec();
}

So it also depends on the udev version.
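
For illustration only (this is not the shipped fix), a sketch of one way to sidestep the time-controlled reload: after writing a rule, explicitly ask udevd to reload its rules and re-process the device instead of waiting up to 3 seconds for the stat()-based check. The rule file path, the rule itself, and the vdsm:qemu ownership are assumptions made for this example:

    import subprocess

    RULE_FILE = "/etc/udev/rules.d/99-direct-lun-example.rules"  # hypothetical rule file
    DM_NAME = "1IET_00010002"                                    # LUN name from the logs above

    def apply_udev_rule():
        # Hypothetical rule handing the direct LUN device over to vdsm:qemu
        rule = 'ENV{DM_NAME}=="%s", OWNER="vdsm", GROUP="qemu"\n' % DM_NAME
        with open(RULE_FILE, "w") as f:
            f.write(rule)
        # Force udevd to reload the rules now instead of waiting for its
        # periodic (up to 3 s) stat()-based check, then re-trigger block
        # devices and wait for the event queue to drain before starting qemu.
        subprocess.check_call(["udevadm", "control", "--reload-rules"])
        subprocess.check_call(["udevadm", "trigger", "--action=change",
                               "--subsystem-match=block"])
        subprocess.check_call(["udevadm", "settle", "--timeout=30"])

    if __name__ == "__main__":
        apply_udev_rule()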
Comment 1 Ayal Baron 2012-10-04 07:59:47 EDT
(In reply to comment #0)
> Cloning from upstream to RHEL6 since this has been reported with RHEV 3.1
> beta2
> 

After checking, this was *not* reproduced with RHEV 3.1. The bug that was reproduced was Bug 854141.
The thread that was referenced is: http://post-office.corp.redhat.com/archives/rhev-beta/2012-September/msg00196.html
Comment 2 Ayal Baron 2012-10-04 08:00:48 EDT
Note that, just to verify, we tried to reproduce this (on RHEL) and failed.
