Bug 1150427

Summary: [hosted-engine-setup] [iSCSI support] Deployment fails with a 'Sanlock lockspace write failure'
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: high
Priority: urgent
Reporter: Elad <ebenahar>
Assignee: Simone Tiraboschi <stirabos>
QA Contact: Aharon Canan <acanan>
CC: amureini, bazulay, didi, ebenahar, ecohen, fsimonce, gklein, iheim, jmoskovc, lpeer, lsurette, lveyde, msivak, nsoffer, sbonazzo, sherold, stirabos, yeylon
Target Milestone: ---
Target Release: 3.5.0
Whiteboard: virt
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-15 07:33:31 UTC
Bug Blocks: 1067162, 1149634
Attachments:
- vdsm and setup logs (flags: none)
- sanlock.log and messages (flags: none)

Description Elad 2014-10-08 09:42:40 UTC
Created attachment 944895 [details]
vdsm and setup logs

Description of problem:
Executed hosted-engine --deploy using iSCSI. The deployment fails every time with:

[ ERROR ] Failed to execute stage 'Misc configuration': (19, 'Sanlock lockspace write failure', 'No such device')

Version-Release number of selected component (if applicable):
rhev3.5 vt5
vdsm-4.16.6-1.el7.x86_64
ovirt-hosted-engine-setup-1.2.1-1.el7.noarch


How reproducible:
Every time, with 2 different setups

Steps to Reproduce:
1. Run hosted-engine --deploy using iSCSI, with a 'clean' LUN (no storage domain on it)


Actual results:
Deployment fails.

vdsm.log:

Thread-56::ERROR::2014-10-08 12:32:49,116::task::866::Storage.TaskManager.Task::(_setError) Task=`edc7d5bf-5919-4a62-a314-843cc863bcdf`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3164, in getVolumePath
    volUUID=volUUID).getVolumePath()
  File "/usr/share/vdsm/storage/sd.py", line 400, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 80, in __init__
    volume.Volume.__init__(self, repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 144, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 88, in validate
    raise se.VolumeDoesNotExist(self.volUUID)  # Fix me
VolumeDoesNotExist: Volume does not exist: ('158c27c7-be1a-4f51-b6f5-d13eadd7ebd7',)




Additional info:
vdsm and setup logs

Comment 1 Sandro Bonazzola 2014-10-08 11:22:57 UTC
raise se.VolumeDoesNotExist(self.volUUID)  # Fix me

While I try to understand why this happens, can someone from storage check whether this "Fix me" is the cause of the issue?

Comment 2 Sandro Bonazzola 2014-10-08 15:05:14 UTC
2014-10-08 12:32:50 DEBUG otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace lockspace._misc:215 Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627)
2014-10-08 12:32:50 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/sanlock/lockspace.py", line 225, in _misc
    offset=offset
SanlockException: (19, 'Sanlock lockspace write failure', 'No such device')
2014-10-08 12:32:50 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Misc configuration': (19, 'Sanlock lockspace write failure', 'No such device')

So the lockspace has been created, but the write operation failed.

Lockspace UUIDs are:
OVEHOSTED_STORAGE/lockspaceImageUUID=str:'8a50ea18-6e67-435b-87d2-2d20bd6df2d4'
OVEHOSTED_STORAGE/lockspaceVolumeUUID=str:'8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627'
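The SanlockException tuple above follows the usual C errno convention, which already hints at the failure mode: errno 19 is ENODEV, i.e. sanlock could not access the lease block device at all. A minimal sketch (the helper function is illustrative only, not hosted-engine code):

```python
import errno

# SanlockException's args follow the C errno convention:
# (errno, message, strerror). Illustrative helper, not hosted-engine code.
def explain_sanlock_error(args):
    code, message, strerror = args
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s (errno %d, %s): %s" % (name, code, strerror, message)

# The exact tuple from the setup log above:
print(explain_sanlock_error(
    (19, 'Sanlock lockspace write failure', 'No such device')))
# errno 19 is ENODEV: the lease device was not accessible when sanlock wrote
```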


Looking at VDSM logs around 12:32:50:

Thread-62::INFO::2014-10-08 12:32:50,368::logUtils::44::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='8a50ea18-6e67-435b-87d2-2d20bd6df2d4', leafUUID='8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627')
...
Thread-62::DEBUG::2014-10-08 12:32:50,606::task::1191::Storage.TaskManager.Task::(prepare) Task=`f57e3805-7c1b-411f-887c-9756d70bf81a`::finished: {'info': {'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 109051904, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'volumeID': '8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '8a50ea18-6e67-435b-87d2-2d20bd6df2d4'}, 'path': '/var/run/vdsm/storage/5eeaea5c-b19a-4458-9321-ce5af82219fd/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'imgVolumesInfo': [{'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 109051904, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'volumeID': '8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '8a50ea18-6e67-435b-87d2-2d20bd6df2d4'}]}


So the volume should be there.

Comment 3 Nir Soffer 2014-10-08 19:37:18 UTC
What I see in the log is a successful attempt to check whether the volumes exist
and to create two volumes. Nothing in the log shows any unexpected behavior or
error condition.

There is no storage issue in this log - the maintainers of the engine setup program
should debug this issue.

1. Trying to get volume 8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627 path:

Thread-50::INFO::2014-10-08 12:32:47,796::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumePath(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='b8862be2-e541-4b95-a462-81b12344eb8c', imgUUID='8a50ea18-6e67-435b-87d2-2d20bd6df2d4', volUUID='8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', options=None)

2. Volume does not exist

Thread-50::DEBUG::2014-10-08 12:32:47,799::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c5447600007|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 5eeaea5c-b19a-4458-9321-ce5af82219fd (cwd None)

Thread-50::WARNING::2014-10-08 12:32:47,844::lvm::600::Storage.LVM::(getLv) lv: 8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627 not found in lvs vg: 5eeaea5c-b19a-4458-9321-ce5af82219fd response

3. This raises the VolumeDoesNotExist exception - expected behavior.

4. Volume 8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627 is created

Thread-51::INFO::2014-10-08 12:32:47,863::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='b8862be2-e541-4b95-a462-81b12344eb8c', imgUUID='8a50ea18-6e67-435b-87d2-2d20bd6df2d4', size='1048576', volFormat=5, preallocate=1, diskType=2, volUUID='8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', desc='hosted-engine.lockspace', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

5. The logical volume for 8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627 is created

487e1667-167c-4a81-ad71-573803f4cf59::DEBUG::2014-10-08 12:32:48,114::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c5447600007|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 1m --addtag OVIRT_VOL_INITIALIZING --name 8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627 5eeaea5c-b19a-4458-9321-ce5af82219fd (cwd None)

6. Operation is successful

487e1667-167c-4a81-ad71-573803f4cf59::DEBUG::2014-10-08 12:32:48,253::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  WARNING: This metadata update is NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0

7. Result reported to client

Thread-53::INFO::2014-10-08 12:32:48,999::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='487e1667-167c-4a81-ad71-573803f4cf59', spUUID=None, options=None)
Thread-53::DEBUG::2014-10-08 12:32:49,000::taskManager::106::Storage.TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '487e1667-167c-4a81-ad71-573803f4cf59'}

8. Attempt to get volume 158c27c7-be1a-4f51-b6f5-d13eadd7ebd7 path

Thread-56::INFO::2014-10-08 12:32:49,058::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumePath(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='b8862be2-e541-4b95-a462-81b12344eb8c', imgUUID='5e971f6f-4c46-4fd5-b37d-10ce3624559e', volUUID='158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', options=None)

9. Volume does not exist (expected)

Thread-56::WARNING::2014-10-08 12:32:49,116::lvm::600::Storage.LVM::(getLv) lv: 158c27c7-be1a-4f51-b6f5-d13eadd7ebd7 not found in lvs vg: 5eeaea5c-b19a-4458-9321-ce5af82219fd response

10. Attempt to create the volume:

Thread-57::INFO::2014-10-08 12:32:49,143::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='b8862be2-e541-4b95-a462-81b12344eb8c', imgUUID='5e971f6f-4c46-4fd5-b37d-10ce3624559e', size='1028096', volFormat=5, preallocate=1, diskType=2, volUUID='158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', desc='hosted-engine.metadata', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

11. Operation succeeds (lots of warnings from the lvm command, but these are not related to the operation)

bcbc87da-a8df-450f-986d-c0c63bf01a45::DEBUG::2014-10-08 12:32:49,441::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c5447600004|/dev/mapper/3514f0c5447600005|/dev/mapper/3514f0c5447600006|/dev/mapper/3514f0c5447600007|/dev/mapper/3514f0c5447600014|/dev/mapper/3514f0c5447600015|/dev/mapper/3514f0c5447600016|/dev/mapper/360060160128230005ed025a00a02e411|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 1m --addtag OVIRT_VOL_INITIALIZING --name 158c27c7-be1a-4f51-b6f5-d13eadd7ebd7 5eeaea5c-b19a-4458-9321-ce5af82219fd (cwd None)

bcbc87da-a8df-450f-986d-c0c63bf01a45::DEBUG::2014-10-08 12:32:49,578::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n  /dev/mapper/3514f0c5447600014: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/3514f0c5447600014: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/3514f0c5447600014: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3514f0c5447600014 was disabled\n  /dev/mapper/3514f0c5447600015: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/3514f0c5447600015: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/3514f0c5447600015: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3514f0c5447600015 was disabled\n  /dev/mapper/360060160128230005ed025a00a02e411: read failed after 0 of 4096 at 32212189184: Input/output error\n  /dev/mapper/360060160128230005ed025a00a02e411: read failed after 0 of 4096 at 32212246528: Input/output error\n  /dev/mapper/360060160128230005ed025a00a02e411: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/360060160128230005ed025a00a02e411 was disabled\n  WARNING: This metadata update is NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0

12. Success reported to client

Thread-59::INFO::2014-10-08 12:32:50,305::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='bcbc87da-a8df-450f-986d-c0c63bf01a45', spUUID=None, options=None)
Thread-59::DEBUG::2014-10-08 12:32:50,306::taskManager::106::Storage.TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': 'bcbc87da-a8df-450f-986d-c0c63bf01a45'}

13. Prepare image

Thread-62::INFO::2014-10-08 12:32:50,368::logUtils::44::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='8a50ea18-6e67-435b-87d2-2d20bd6df2d4', leafUUID='8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627')

14. Success

Thread-62::INFO::2014-10-08 12:32:50,606::logUtils::47::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'info': {'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 109051904, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'volumeID': '8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '8a50ea18-6e67-435b-87d2-2d20bd6df2d4'}, 'path': '/var/run/vdsm/storage/5eeaea5c-b19a-4458-9321-ce5af82219fd/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'imgVolumesInfo': [{'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 109051904, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'volumeID': '8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '8a50ea18-6e67-435b-87d2-2d20bd6df2d4'}]}

15. Another prepare image

Thread-64::INFO::2014-10-08 12:32:50,638::logUtils::44::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='5eeaea5c-b19a-4458-9321-ce5af82219fd', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='5e971f6f-4c46-4fd5-b37d-10ce3624559e', leafUUID='158c27c7-be1a-4f51-b6f5-d13eadd7ebd7')

16. Success

Thread-64::INFO::2014-10-08 12:32:50,882::logUtils::47::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'info': {'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 110100480, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/5e971f6f-4c46-4fd5-b37d-10ce3624559e/158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', 'volumeID': '158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '5e971f6f-4c46-4fd5-b37d-10ce3624559e'}, 'path': '/var/run/vdsm/storage/5eeaea5c-b19a-4458-9321-ce5af82219fd/5e971f6f-4c46-4fd5-b37d-10ce3624559e/158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', 'imgVolumesInfo': [{'domainID': '5eeaea5c-b19a-4458-9321-ce5af82219fd', 'volType': 'path', 'leaseOffset': 110100480, 'path': '/rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/5e971f6f-4c46-4fd5-b37d-10ce3624559e/158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', 'volumeID': '158c27c7-be1a-4f51-b6f5-d13eadd7ebd7', 'leasePath': '/dev/5eeaea5c-b19a-4458-9321-ce5af82219fd/leases', 'imageID': '5e971f6f-4c46-4fd5-b37d-10ce3624559e'}]}

After that, the SPM is stopped and the pool is disconnected.

Comment 4 Nir Soffer 2014-10-08 19:42:41 UTC
(In reply to Sandro Bonazzola from comment #2)
> 2014-10-08 12:32:50 DEBUG
> otopi.plugins.ovirt_hosted_engine_setup.sanlock.lockspace
> lockspace._misc:215 Ensuring lease for lockspace hosted-engine, host id 1 is
> acquired (file:
> /rhev/data-center/mnt/blockSD/5eeaea5c-b19a-4458-9321-ce5af82219fd/images/
> 8a50ea18-6e67-435b-87d2-2d20bd6df2d4/8474bd7d-bd3c-4bf6-b86a-01e8b6e0e627)
> 2014-10-08 12:32:50 DEBUG otopi.context context._executeMethod:152 method
> exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in
> _executeMethod
>     method['method']()
>   File
> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-
> setup/sanlock/lockspace.py", line 225, in _misc
>     offset=offset
> SanlockException: (19, 'Sanlock lockspace write failure', 'No such device')
> 2014-10-08 12:32:50 ERROR otopi.context context._executeMethod:161 Failed to
> execute stage 'Misc configuration': (19, 'Sanlock lockspace write failure',
> 'No such device')

The maintainer of the ovirt-hosted-engine-setup should debug this. It seems that the device does not exist when the program tries to use it.

(Also returning the needinfo flag for Federico, removed by Bugzilla.)

Comment 5 Nir Soffer 2014-10-08 20:01:07 UTC
Please add /var/log/sanlock.log and /var/log/messages - they may show why sanlock failed to write to the lockspace.

Comment 6 Federico Simoncelli 2014-10-08 21:48:58 UTC
(In reply to Sandro Bonazzola from comment #1)
> raise se.VolumeDoesNotExist(self.volUUID)  # Fix me
> 
> While I'll try to understand why this happens, someone in storage can check
> if this Fix me is the cause of the issue?

For 3.6 you should stop using getVolumePath, since it was finally
dropped. The correct API (since December 2011) is prepareImage
(which returns all the information you need, path included).

Also, the logic getVolumePath(id1) => createVolume(id1) is flawed if
it is intended to avoid creating the volume when it is already
present (it is racy).

Just try to create the volume and handle the VolumeAlreadyExists error.

In reply to comment #1, it seems that the "Fix me" comment was added
in 2010 for no apparent reason (we'll remove it with
http://gerrit.ovirt.org/33949 )
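Federico's point about the check-then-create race can be shown with a self-contained sketch; FakeStorage and its method names are stand-ins for illustration only, not the real vdsm client API:

```python
class VolumeAlreadyExists(Exception):
    pass

class FakeStorage:
    """Toy stand-in for a storage backend (not the real vdsm API)."""
    def __init__(self):
        self.volumes = set()

    def createVolume(self, volUUID):
        if volUUID in self.volumes:
            raise VolumeAlreadyExists(volUUID)
        self.volumes.add(volUUID)

def ensure_volume(storage, vol_uuid):
    # Race-free pattern: attempt the creation unconditionally and treat
    # "already exists" as success, instead of the check-then-act sequence
    # getVolumePath() -> createVolume(), which can race with another client
    # creating the volume between the two calls.
    try:
        storage.createVolume(vol_uuid)
        return True    # we created it
    except VolumeAlreadyExists:
        return False   # someone else created it first; still fine

storage = FakeStorage()
assert ensure_volume(storage, '8474bd7d') is True   # first creation succeeds
assert ensure_volume(storage, '8474bd7d') is False  # duplicate is tolerated
```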

Comment 7 Sandro Bonazzola 2014-10-09 06:28:07 UTC
Waiting on sanlock and messages logs as per comment #5.
Please also add audit.log, just for excluding selinux issues.

Comment 8 Sandro Bonazzola 2014-10-09 06:31:56 UTC
Nir, thanks for the analysis and explanation. BTW, the write happens after the prepareImage calls and before the SPM stop (a cleanup action after the error), so the explanation should be in the missing logs.

Federico, thanks for the info. Would you mind reviewing the hosted-engine code and pushing a patch to the master branch with TODO/FIXME comments on obsolete commands and their replacements?

Comment 9 Sandro Bonazzola 2014-10-10 13:03:37 UTC
Tried to reproduce on a clean EL7 system without success.

Comment 10 Elad 2014-10-12 06:33:45 UTC
Created attachment 946074 [details]
sanlock.log and messages

Comment 11 Elad 2014-10-12 07:04:20 UTC
(In reply to Sandro Bonazzola from comment #9)
> Tried to reproduce on a clean EL7 system without success.

It reproduces only when using the latest downstream build - vt5 on RHEL 7. Deployment with previous builds didn't raise this specific failure.

Comment 12 Elad 2014-10-12 07:12:43 UTC
For some reason, the /var/log/audit/ directory is empty.

Comment 13 Simone Tiraboschi 2014-10-14 17:34:52 UTC
Reproduced on RHEL 7 with vt5.

 [ INFO  ] Creating Volume Group
 [ INFO  ] Creating Storage Domain
 [ INFO  ] Creating Storage Pool
 [ INFO  ] Connecting Storage Pool
 [ INFO  ] Verifying sanlock lockspace initialization
 [ ERROR ] Failed to execute stage 'Misc configuration': (19, 'Sanlock lockspace write failure', 'No such device')
 [ INFO  ] Stage: Clean up

With sanlock
 [root@r70st1 ~]# rpm -qa | grep -i sanlock
 sanlock-lib-3.2.2-1.el7.x86_64
 sanlock-3.2.2-1.el7.x86_64
 sanlock-python-3.2.2-1.el7.x86_64
 libvirt-lock-sanlock-1.2.8-5.el7.x86_64

I'll investigate it.

Comment 14 Simone Tiraboschi 2014-10-14 17:39:33 UTC
It seems to be SELinux-related:

 time->Tue Oct 14 19:27:22 2014
 type=SYSCALL msg=audit(1413307642.872:1663): arch=c000003e syscall=2 success=no exit=-13 a0=7f2a48966410 a1=105002 a2=0 a3=1 items=0 ppid=1 pid=5195 auid=4294967295 uid=179 gid=179 euid=179 suid=179 fsuid=179 egid=179 sgid=179 fsgid=179 tty=(none) ses=4294967295 comm="sanlock" exe="/usr/sbin/sanlock" subj=system_u:system_r:sanlock_t:s0-s0:c0.c1023 key=(null)
 type=AVC msg=audit(1413307642.872:1663): avc:  denied  { read write } for  pid=5195 comm="sanlock" name="dm-9" dev="devtmpfs" ino=76303 scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 tcontext=system_u:object_r:svirt_image_t:s0 tclass=blk_file
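The AVC record can be decoded mechanically. A small regex-based sketch (illustrative only, not a replacement for ausearch/audit2allow) pulls out the denied permissions and the source/target contexts, confirming that sanlock_t was denied read/write on a block device labeled svirt_image_t:

```python
import re

# The AVC record from this comment, verbatim.
AVC = ('type=AVC msg=audit(1413307642.872:1663): avc:  denied  '
       '{ read write } for  pid=5195 comm="sanlock" name="dm-9" '
       'dev="devtmpfs" ino=76303 '
       'scontext=system_u:system_r:sanlock_t:s0-s0:c0.c1023 '
       'tcontext=system_u:object_r:svirt_image_t:s0 tclass=blk_file')

def parse_avc(line):
    """Extract denied permissions and key=value fields from one AVC line."""
    perms = re.search(r'denied\s+\{ ([^}]+) \}', line).group(1).split()
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return perms, fields

perms, fields = parse_avc(AVC)
# sanlock (sanlock_t) was denied read/write on a blk_file labeled svirt_image_t
print(perms, fields['scontext'], '->', fields['tcontext'], fields['tclass'])
```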

Comment 15 Sandro Bonazzola 2014-10-15 07:33:31 UTC
Same issue as in bug #1148712. Marking as duplicate.

*** This bug has been marked as a duplicate of bug 1148712 ***

Comment 16 Simone Tiraboschi 2014-10-15 07:42:23 UTC

*** This bug has been marked as a duplicate of bug 1146529 ***