Bug 1142710

Summary: Volume creation failed while deploying Hosted Engine on iSCSI
Product: [Retired] oVirt
Reporter: Antoni Segura Puimedon <asegurap>
Component: ovirt-hosted-engine-setup
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: meital avital <mavital>
Severity: high
Priority: unspecified
Version: 3.5
CC: amureini, bazulay, bugs, ecohen, fsimonce, gklein, iheim, jmoskovc, kwolf, lsurette, mgoldboi, msivak, nsoffer, rbalakri, sbonazzo, stirabos, vered, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 3.5.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: integration
Fixed In Version: ovirt-3.5.1_rc1
Doc Type: Bug Fix
Doc Text:
Cause: Hosted Engine needs enough space on the selected LUN to store the whole preallocated engine VM plus the ancillary data structures used to manage it. The size of the ancillary structures (about 5 GiB) was not computed. Consequence: if not enough free space was available, deployment failed with an unclear error message. Fix: a safety margin for the ancillary structures is now included in the space check. Result: if there is not enough free space, the user is prompted again for the engine image size until a valid value is provided.
Story Points: ---
Last Closed: 2015-01-21 16:02:48 UTC
Type: Bug
Bug Blocks: 1155170, 1192937
Attachments:
  vdsm log (flags: none)
  supervdsm log (flags: none)

Description Antoni Segura Puimedon 2014-09-17 09:02:30 UTC
Description of problem:
When installing the hosted engine with iSCSI storage, everything works fine
until, after creating the VM, setup tries to set a temporary VNC password;
this fails and the deployment aborts.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.2.0-0.1.master.20140908073936.git480e272.el6.noarch


How reproducible:



Steps to Reproduce:
1. Get some iSCSI storage (in my case 20 GB)
2. hosted-engine --deploy
3. Provide the iSCSI details

Actual results:

    [ INFO  ] Stage: Setup validation
             
              --== CONFIGURATION PREVIEW ==--
             
              Bridge interface                   : em1
              Engine FQDN                        : testday3-engine
              Bridge name                        : ovirtmgmt
              SSH daemon port                    : 22
              Firewall manager                   : iptables
              Gateway address                    : xxxxxxxxxxxxxx
              Host name for web application      : hosted_engine_mpavlik
              iSCSI Target Name                  : xxxxxxxxxxxxxxxx:mpavlik
              iSCSI Portal port                  : 3260
              Host ID                            : 1
              iSCSI LUN ID                       : 2
              Image size GB                      : 18
              iSCSI Portal IP Address            : xxxxxxxxxxx
              iSCSI Portal user                  : mpavlik
              Console type                       : vnc
              Memory size MB                     : 4096
              MAC address                        : 00:16:3e:58:4a:4f
              Boot type                          : pxe
              Number of CPUs                     : 2
              CPU Type                           : model_SandyBridge
             
              Please confirm installation settings (Yes, No)[Yes]: 
    [ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
    [ INFO  ] Stage: Transaction setup
    [ INFO  ] Stage: Misc configuration
    [ INFO  ] Stage: Package installation
    [ INFO  ] Stage: Misc configuration
    [ INFO  ] Configuring libvirt
    [ INFO  ] Configuring VDSM
    [ INFO  ] Starting vdsmd
    [ INFO  ] Waiting for VDSM hardware info
    [ INFO  ] Waiting for VDSM hardware info
    [ INFO  ] Waiting for VDSM hardware info
    [ INFO  ] Configuring the management bridge
    [ INFO  ] Creating Volume Group
    [ INFO  ] Creating Storage Domain
    [ INFO  ] Creating Storage Pool
    [ INFO  ] Connecting Storage Pool
    [ INFO  ] Verifying sanlock lockspace initialization
    [ INFO  ] Creating VM Image
    [ INFO  ] Disconnecting Storage Pool
    [ INFO  ] Start monitoring domain
    [ INFO  ] Configuring VM
    [ INFO  ] Updating hosted-engine configuration
    [ INFO  ] Stage: Transaction commit
    [ INFO  ] Stage: Closing up
    [ INFO  ] Creating VM
    [ ERROR ] Failed to execute stage 'Closing up': Cannot set temporary password for console connection. The VM may not have been created: please check VDSM logs
    [ INFO  ] Stage: Clean up
    [ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
    [ INFO  ] Answer file '/etc/ovirt-hosted-engine/answers.conf' has been updated
    [ INFO  ] Stage: Pre-termination
    [ INFO  ] Stage: Termination
    [root@dell-r210ii-07 ~]# 


Expected results:
Get data for connecting to the VM and installing the engine there.

Additional info:

Comment 1 Antoni Segura Puimedon 2014-09-17 09:06:44 UTC
Created attachment 938393 [details]
vdsm log

Comment 2 Antoni Segura Puimedon 2014-09-17 09:07:13 UTC
Created attachment 938394 [details]
supervdsm log

Comment 3 Antoni Segura Puimedon 2014-09-17 09:08:02 UTC
From the log, I see a lot of:

Thread-99::ERROR::2014-09-17 10:56:23,185::BindingXMLRPC::1151::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1135, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 383, in vmSetTicket
    return vm.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/API.py", line 634, in setTicket
    return v.setTicket(password, ttl, existingConnAction, params)
  File "/usr/share/vdsm/virt/vm.py", line 4721, in setTicket
    graphics = _domParseStr(self._dom.XMLDesc(0)).childNodes[0]. \
AttributeError: 'NoneType' object has no attribute 'XMLDesc'

Comment 4 Sandro Bonazzola 2014-09-18 06:26:21 UTC
The real failure here is that the VM has not been started:

Thread-88::ERROR::2014-09-17 10:56:12,964::vm::2326::vm.Vm::(_startUnderlyingVm) vmId=`cb8b3cd0-cc10-4315-8c57-adbe013d3416`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2266, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3301, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/virt/vm.py", line 2063, in buildConfDevices
    self._normalizeVdsmImg(drv)
  File "/usr/share/vdsm/virt/vm.py", line 1986, in _normalizeVdsmImg
    drv['volumeID'])
StorageUnavailableError: ('Failed to get size for volume %s', '9507f07f-79aa-4c7c-a963-521a13e5d446')

Comment 5 Sandro Bonazzola 2014-09-18 06:27:11 UTC
This is caused by:
Thread-88::ERROR::2014-09-17 10:56:12,963::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Logical volume does not exist: ('8d38fe02-fbfe-4e70-9785-94954d33358c/9507f07f-79aa-4c7c-a963-521a13e5d446',)", 'code': 610}}

Comment 6 Sandro Bonazzola 2014-09-18 06:28:32 UTC
The volume has not been created because:

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 334, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1821, in createVolume
    desc=desc, srcImgUUID=srcImgUUID, srcVolUUID=srcVolUUID)
  File "/usr/share/vdsm/storage/sd.py", line 430, in createVolume
    preallocate, diskType, volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 440, in create
    raise e
CannotCreateLogicalVolume: Cannot create Logical Volume: ('8d38fe02-fbfe-4e70-9785-94954d33358c', '9507f07f-79aa-4c7c-a963-521a13e5d446')

Comment 7 Sandro Bonazzola 2014-09-18 06:31:32 UTC
So the issue on the hosted engine side is a duplicate of Bug #1142098 - hosted engine setup doesn't detect volume creation failures.

I'll move this bug to vdsm/storage for further investigation of why the volume has not been created.

Comment 8 Allon Mureinik 2014-09-21 16:50:24 UTC
The error creating the volume does not seem like a bug in the storage layer, but like a misuse:

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::ERROR::2014-09-17 10:55:31,201::volume::438::Storage.Volume::(create) Failed to create volume: /rhev/data-center/3991f222-a746-4534-a62d-2d7a644e6b2e/8d38fe02-fbfe-4e70-9785-94954d33358c/images/eb86912d-cb4e-441b-9932-9634f2dac436/9507f07f-79aa-4c7c-a963-521a13e5d446, volume already exist

Sandro, why is this call issued?

Comment 9 Sandro Bonazzola 2014-09-22 07:00:12 UTC
(In reply to Allon Mureinik from comment #8)
> The error creating the volume does not seem like a bug in the storage layer,
> but as a misuse:
> 
> 148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::ERROR::2014-09-17
> 10:55:31,201::volume::438::Storage.Volume::(create) Failed to create volume:
> /rhev/data-center/3991f222-a746-4534-a62d-2d7a644e6b2e/8d38fe02-fbfe-4e70-
> 9785-94954d33358c/images/eb86912d-cb4e-441b-9932-9634f2dac436/9507f07f-79aa-
> 4c7c-a963-521a13e5d446, volume already exist
> 
> Sandro, why is this call issued?

I guess it's because volume existence is not checked on a second run that passes the answer file from the first run. On the first run the UUID is generated automatically, so if it already exists it must have been set by the answer file.

This doesn't explain why it later fails with "Logical volume does not exist", though.

Comment 10 Allon Mureinik 2014-09-22 16:23:32 UTC
The HE setup tries to create the volume twice, which is obviously wrong. However, the real problem here is bug 1142710 which destroys the image when this happens.

I'm moving THIS bug to integration to handle the faulty sending of createVolume, and we'll use bug 1142710 to track proper handling of it if it's sent.

Comment 11 Allon Mureinik 2014-09-22 17:21:08 UTC
I reset the assignee to the default, which apparently is Dan.
Sandro, do you want to take this one?

Comment 12 Allon Mureinik 2014-09-22 19:08:54 UTC
(In reply to Allon Mureinik from comment #10)
> The HE setup tries to create the volume twice, which is obviously wrong.
> However, the real problem here is bug 1142710 which destroys the image when
> this happens.
> 
> I'm moving THIS bug to integration to handle the faulty sending of
> createVolume, and we'll use bug 1142710 to track proper handling of it if
> it's sent.
bug 1145253, of course.

Comment 13 Nir Soffer 2014-09-23 19:01:31 UTC
Here is what happens according to vdsm log:

1. Vdsm is asked to create the volume "hosted-engine.lockspace":

Thread-53::INFO::2014-09-17 10:55:25,041::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='8d38fe02-fbfe-4e70-9785-94954d33358c', spUUID='3991f222-a746-4534-a62d-2d7a644e6b2e', imgUUID='95689e65-a4f3-4dbf-8363-3c45ea1ad452', size='1048576', volFormat=5, preallocate=1, diskType=2, volUUID='4fbc58ba-f726-4adc-9822-3a824b5b940f', desc='hosted-engine.lockspace', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

2. The operation is scheduled in task 84bc4379-bd7a-411c-b97b-f5ce258e5161:

Thread-53::DEBUG::2014-09-17 10:55:25,096::taskManager::78::Storage.TaskManager::(scheduleJob) scheduled job createVolume for task 84bc4379-bd7a-411c-b97b-f5ce258e5161 
Thread-53::INFO::2014-09-17 10:55:25,097::logUtils::47::dispatcher::(wrapper) Run and protect: createVolume, Return response: None

3. The task creates a new lv:

84bc4379-bd7a-411c-b97b-f5ce258e5161::DEBUG::2014-09-17 10:55:25,433::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1mpavlik02|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 1m --addtag OVIRT_VOL_INITIALIZING --name 4fbc58ba-f726-4adc-9822-3a824b5b940f 8d38fe02-fbfe-4e70-9785-94954d33358c (cwd None)

4. The lv operation is successful:

84bc4379-bd7a-411c-b97b-f5ce258e5161::DEBUG::2014-09-17 10:55:25,597::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: This metadata update is NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0

5. Vdsm reports the task status:

Thread-56::INFO::2014-09-17 10:55:27,329::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='84bc4379-bd7a-411c-b97b-f5ce258e5161', spUUID=None, options=None)
Thread-56::INFO::2014-09-17 10:55:27,329::logUtils::47::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '84bc4379-bd7a-411c-b97b-f5ce258e5161'}}

6. Vdsm is asked to create the volume "hosted-engine.metadata":

Thread-60::INFO::2014-09-17 10:55:27,677::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='8d38fe02-fbfe-4e70-9785-94954d33358c', spUUID='3991f222-a746-4534-a62d-2d7a644e6b2e', imgUUID='6670cd49-58a1-4f3c-af70-fcc183f90428', size='1028096', volFormat=5, preallocate=1, diskType=2, volUUID='b51116dc-fa0b-4408-92ad-afcb4d812922', desc='hosted-engine.metadata', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

7. The operation is scheduled in task 311f0e7e-43b9-4829-be3b-0dbcb9c64509:

Thread-60::DEBUG::2014-09-17 10:55:27,715::taskManager::78::Storage.TaskManager::(scheduleJob) scheduled job createVolume for task 311f0e7e-43b9-4829-be3b-0dbcb9c64509 
Thread-60::INFO::2014-09-17 10:55:27,715::logUtils::47::dispatcher::(wrapper) Run and protect: createVolume, Return response: None

8. The task creates a new lv:

311f0e7e-43b9-4829-be3b-0dbcb9c64509::DEBUG::2014-09-17 10:55:28,057::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1mpavlik01|/dev/mapper/1mpavlik02|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 1m --addtag OVIRT_VOL_INITIALIZING --name b51116dc-fa0b-4408-92ad-afcb4d812922 8d38fe02-fbfe-4e70-9785-94954d33358c (cwd None)

9. The lv operation is successful:

311f0e7e-43b9-4829-be3b-0dbcb9c64509::DEBUG::2014-09-17 10:55:28,197::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: This metadata update is NOT backed up\n  WARNING: This metadata update is NOT backed up\n'; <rc> = 0

10. Vdsm reports the task status:

Thread-62::INFO::2014-09-17 10:55:28,904::logUtils::44::dispatcher::(wrapper) Run and protect: getTaskStatus(taskID='311f0e7e-43b9-4829-be3b-0dbcb9c64509', spUUID=None, options=None)
Thread-62::DEBUG::2014-09-17 10:55:28,906::taskManager::106::Storage.TaskManager::(getTaskStatus) Return. Response: {'code': 0, 'message': '1 jobs completed successfully', 'taskState': 'finished', 'taskResult': 'success', 'taskID': '311f0e7e-43b9-4829-be3b-0dbcb9c64509'}

11. Vdsm is asked to create the volume "Hosted Engine Image":

Thread-69::INFO::2014-09-17 10:55:30,459::logUtils::44::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='8d38fe02-fbfe-4e70-9785-94954d33358c', spUUID='3991f222-a746-4534-a62d-2d7a644e6b2e', imgUUID='eb86912d-cb4e-441b-9932-9634f2dac436', size=37748736, volFormat=5, preallocate=1, diskType=2, volUUID='9507f07f-79aa-4c7c-a963-521a13e5d446', desc='Hosted Engine Image', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000')

12. The operation is scheduled in task 148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1:

Thread-69::DEBUG::2014-09-17 10:55:30,602::taskManager::78::Storage.TaskManager::(scheduleJob) scheduled job createVolume for task 148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1 
Thread-69::INFO::2014-09-17 10:55:30,603::logUtils::47::dispatcher::(wrapper) Run and protect: createVolume, Return response: None

13. The task creates a new lv:

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::DEBUG::2014-09-17 10:55:31,095::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1mpavlik01|/dev/mapper/1mpavlik02|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --contiguous n --size 18432m --addtag OVIRT_VOL_INITIALIZING --name 9507f07f-79aa-4c7c-a963-521a13e5d446 8d38fe02-fbfe-4e70-9785-94954d33358c (cwd None)

14. The operation fails because of insufficient free space in the VG:

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::DEBUG::2014-09-17 10:55:31,199::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  Volume group "8d38fe02-fbfe-4e70-9785-94954d33358c" has insufficient free space (124 extents): 144 required.\n'; <rc> = 5

15. But volume.py line 438 reports that the volume already exists(!):

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::ERROR::2014-09-17 10:55:31,201::volume::438::Storage.Volume::(create) Failed to create volume: /rhev/data-center/3991f222-a746-4534-a62d-2d7a644e6b2e/8d38fe02-fbfe-4e70-9785-94954d33358c/images/eb86912d-cb4e-441b-9932-9634f2dac436/9507f07f-79aa-4c7c-a963-521a13e5d446, volume already exists

16. Fortunately, at line 440 the code re-raises the exception, and we can see
that the error is indeed CannotCreateLogicalVolume and not VolumeAlreadyExists.

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::ERROR::2014-09-17 10:55:31,201::volume::474::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 440, in create
    raise e
CannotCreateLogicalVolume: Cannot create Logical Volume: ('8d38fe02-fbfe-4e70-9785-94954d33358c', '9507f07f-79aa-4c7c-a963-521a13e5d446')

17. Vdsm reports (correctly) the task failure:

Thread-71::DEBUG::2014-09-17 10:55:31,910::taskManager::106::Storage.TaskManager::(getTaskStatus) Return. Response: {'code': 550, 'message': 'Cannot create Logical Volume', 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': '148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1'}

18. If we look at volume.py line 438, we can see that the code handles both
se.VolumeAlreadyExists and se.CannotCreateLogicalVolume, but unfortunately lies
about the nature of the error.

432             try:
433                 metaId = cls._create(dom, imgUUID, volUUID, size, volFormat,
434                                      preallocate, volParent, srcImgUUID,
435                                      srcVolUUID, volPath)
436             except (se.VolumeAlreadyExists, se.CannotCreateLogicalVolume) as e:
437                 cls.log.error("Failed to create volume: %s, volume already "
438                               "exists", volPath)
439                 vars.task.popRecovery()
440                 raise e


So from the storage point of view, there is no bug here. We will improve the
logging to prevent this confusion.
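For illustration, a minimal sketch of a less misleading handler; this is not the actual patch (see comment 15 for the real fix), only the shape of the change to the excerpt above:

    try:
        metaId = cls._create(dom, imgUUID, volUUID, size, volFormat,
                             preallocate, volParent, srcImgUUID,
                             srcVolUUID, volPath)
    except (se.VolumeAlreadyExists, se.CannotCreateLogicalVolume) as e:
        # Log the actual exception instead of unconditionally claiming
        # that the volume already exists.
        cls.log.error("Failed to create volume %s: %s", volPath, e)
        vars.task.popRecovery()
        raise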

On the hosted engine side, there may be either a user or a deployment error,
leading to the creation of a VG with insufficient size for the logical volumes
needed.

Looking at the lvm error "...(124 extents): 144 required", it seems that either
the VG size or the LV sizes are calculated incorrectly.

Comment 14 Nir Soffer 2014-09-23 19:02:37 UTC
Reopening; somebody needs to resolve the insufficient size issue.

Comment 15 Nir Soffer 2014-09-23 20:13:07 UTC
The bad storage log message "volume already exists" is fixed in http://gerrit.ovirt.org/33301

Comment 16 Allon Mureinik 2014-09-23 21:07:09 UTC
(In reply to Allon Mureinik from comment #12)
> (In reply to Allon Mureinik from comment #10)
> > The HE setup tries to create the volume twice, which is obviously wrong.
> > However, the real problem here is bug 1142710 which destroys the image when
> > this happens.
> > 
> > I'm moving THIS bug to integration to handle the faulty sending of
> > createVolume, and we'll use bug 1142710 to track proper handling of it if
> > it's sent.
> bug 1145253, of course.
False alarm, as explained by comment 13. Removing this dependency.

Comment 17 Sandro Bonazzola 2014-10-29 11:57:23 UTC
About:

(In reply to Nir Soffer from comment #14)
> Reopening, somebody need to resolve the insufficient size issue.

14. The operation fails because of insufficient free space in the VG:

148a6ecb-cfe8-4d00-b085-1bcd70d7bbf1::DEBUG::2014-09-17 10:55:31,199::lvm::288::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  Volume group "8d38fe02-fbfe-4e70-9785-94954d33358c" has insufficient free space (124 extents): 144 required.\n'; <rc> = 5

Hosted Engine setup checks the device capacity by using the 'capacity' field of the answer to the getDeviceList call.
The value is converted to megabytes by dividing by pow(2, 20) and compared against the desired image size in megabytes.

I guess that having a device with 25 GB capacity and requesting a 25 GB image somehow passes the test and then fails later on volume creation.

Do you know how to get the maximum size of a volume that can fit in a device of a given capacity?

Would just requiring 1 GB more than the image needs be enough?
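For reference, a rough reconstruction of the check described above; the names are illustrative, not the actual ovirt-hosted-engine-setup code:

    def lun_big_enough(device, image_size_gb):
        # 'capacity' is the byte count from vdsm's getDeviceList reply.
        available_mb = int(device['capacity']) / pow(2, 20)
        requested_mb = image_size_gb * 1024
        return requested_mb <= available_mb

    # A 25 GiB LUN and a 25 GiB image pass this check (25600 <= 25600),
    # but LVM metadata consumes part of the device, so the later
    # lvcreate fails with "insufficient free space".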

Comment 19 Nir Soffer 2014-10-29 23:41:22 UTC
(In reply to Sandro Bonazzola from comment #17)
> Do you know how to get the maximum size of a volume that may fit in a device
> of a given capacity?

There are two issues here:

1. Volume format - raw vs. qcow2 - qcow2 needs additional space, so you cannot
   store 25G of data on a 25G volume in qcow2 format.

2. vdsm creates VGs with a 128MB extent size (the LVM default is 4MB). An LV
   can use 1 or more extents. So if you have 200MB free, you can use only
   128MB, since the rest of the space is smaller than one extent.

The correct calculation is probably to compute how many extents you need
and then add some space for qcow2 overhead.
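As a minimal sketch of that calculation, assuming a 128MiB extent size and a placeholder 1% qcow2 allowance (the real overhead figure is what Kevin is asked about below):

    import math

    EXTENT_MIB = 128  # vdsm VG extent size; the LVM default is 4 MiB

    def lv_size_mib(data_mib, qcow2_overhead_ratio=0.01):
        # Add the (assumed) qcow2 allowance, then round up to whole extents.
        needed = data_mib * (1 + qcow2_overhead_ratio)
        return math.ceil(needed / EXTENT_MIB) * EXTENT_MIB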

But including vdsm internal implementation details such as the extent size in
client code is the wrong way - you should use only data available in the vdsm
response.

Kevin, what would be a safe estimate for the qcow2 header overhead when
creating a volume for a known amount of data?

Comment 20 Nir Soffer 2014-10-30 12:52:41 UTC
We fail to create the LV - so qcow2 overhead is not relevant. And considering the extent size can explain only an error of one extent (required x, available x-1), but here we have 144 required and 124 available.

So it may be a bad capacity value returned by vdsm, or an incorrect calculation. I will check the vdsm side.

Comment 21 Simone Tiraboschi 2014-10-31 15:53:58 UTC
Now, on hosted-engine over iSCSI, the user chooses a LUN among the available ones, so we know its size; then we ask again about the VM disk size.

 [ INFO  ] Discovering iSCSI node
 [ INFO  ] Connecting to the storage server
           The following luns have been found on the requested target:
                  0 - sda - 32768MB
           Please specify the lun id (0) [0]:
 ...
           Please specify the disk size of the VM in GB [Defaults to minimum requirement: 25]:

But that LUN is dedicated just to that VM disk, and that storage domain shouldn't be available for other purposes.
So in my opinion it would be better to do the capacity check directly on the LUN size, evaluate all the overheads (qcow2, VG...), and simply use all the available space for the engine VM disk, in order to avoid wasting space and to prevent any insufficient-size issue.

Comment 22 Nir Soffer 2014-11-01 13:17:12 UTC
Using device capacity is wrong.

When vdsm creates a PV, it creates 2 metadata areas, each 128MiB. The first starts at sector 5 (an lvm thing), and the second is at the end of the device. Since we use an extent size of 128MiB, this practically uses 3 extents.

What hosted engine setup needs to do is check the free space on the VG before creating an LV. You can do this by calling "getVGInfo" and using the "vgfree" value.

When you compute the LV size, you should make it a multiple of the extent size (128MiB). It seems that vdsm does not expose this constant, so hosted engine setup will have to hard-code it for now. I think we will add this value to getVGInfo in the next version.
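A sketch of that check; getVGInfo is a real vdsm verb, but the helper below and the way its reply is fed in are assumptions:

    EXTENT_BYTES = 128 * 2**20  # hard-coded, since vdsm does not expose it yet

    def max_lv_bytes(vgfree):
        # Round the VG free space down to a whole number of extents;
        # vgfree comes from the getVGInfo reply (see comment 25 for a
        # sample).
        return (int(vgfree) // EXTENT_BYTES) * EXTENT_BYTES

For the VG shown in comment 25, max_lv_bytes(20937965568) returns 20937965568, i.e. exactly 156 free extents of 128MiB, matching the vgdisplay output in comment 27.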

Moving back to integration, as no issue was found on the storage side.

Comment 23 Nir Soffer 2014-11-01 13:24:57 UTC
(In reply to Simone Tiraboschi from comment #21)
> Now, on hosted-engine over iSCSI, the user chooses a LUN between the
> available ones and so we know its size, than we ask him again about the VM
> disk size.
> 
>  [ INFO  ] Discovering iSCSI node
>  [ INFO  ] Connecting to the storage server
>            The following luns have been found on the requested target:
>                   0 - sda - 32768MB
>            Please specify the lun id (0) [0]:
>  ...
>            Please specify the disk size of the VM in GB [Defaults to minimum
> requirement: 25]:

This may be confusing, as a 25GiB device cannot hold a 25GiB disk.

Hosted engine should let you choose the device, displaying the device size. But when you specify the disk size, the installer should display the maximum LV size you can create (see comment 22).

Comment 24 Simone Tiraboschi 2014-11-03 09:38:23 UTC
(In reply to Nir Soffer from comment #23)
> This may be confusing, as 25GiB device cannot hold 25GiB disk.
> 
> Hosted engine should let you chose the device, displaying device size. But
> when you specify the disk size, the installer should display the maximum lv
> size you can create (see comment 22).

Yes, I agree: so we need to round down the vgfree value to a multiple of the extent size.
In my opinion it also doesn't make much sense to leave unclaimed free space on an iSCSI LUN, so once the user chooses a suitable device, we can simply allocate all the usable extents for the VM without asking again about the size, which is indeed confusing.

Comment 25 Simone Tiraboschi 2014-11-10 17:52:28 UTC
I created a VG over iSCSI via hosted-engine-setup

The iSCSI LUN was sized at 24 GiB.
I got a vgsize of 23.625 GiB
with a vgfree of 19.5 GiB.

Should I expect this kind of overhead before creating the image?


[root@c65thei ~]# vdsClient -s 0 getVGInfo Vpk4W0-kXf1-v3hn-ctAj-qgAv-eQ49-cM74BP
============================
vgsize = 25367150592 
============================
name = 9e7566af-857e-4cfb-889e-f4d92598caec 
============================
vgUUID = Vpk4W0-kXf1-v3hn-ctAj-qgAv-eQ49-cM74BP 
============================
pvlist:
vendorID = FreeBSD  capacity = 25367150592  fwrev = 0000  vgUUID = Vpk4W0-kXf1-v3hn-ctAj-qgAv-eQ49-cM74BP  pathlist = [{'connection': '192.168.1.125', 'iqn': 'iqn.2011-03.org.example.istgt:ovirt1', 'portal': '1', 'port': '3260', 'initiatorname': 'default'}]  pathstatus = [{'physdev': 'sdc', 'type': 'iSCSI', 'state': 'active', 'lun': '2'}]  devtype = iSCSI  pvUUID = 5XvYb9-kz62-QCSL-jhkY-CNhM-0l77-972fXM  serial = SFreeBSD_iSCSI_Disk_001a4a4fbd9c002  GUID = 330000000ec116e3f  devcapacity = 25769803776  productID = iSCSI Disk 
============================
state = OK 
============================
vgfree = 20937965568 
============================
type = 3 
============================
attr = {'partial': '-', 'permission': 'w', 'allocation': 'n', 'exported': '-', 'clustered': '-', 'resizeable': 'z'}

Comment 26 Nir Soffer 2014-11-10 18:57:40 UTC
(In reply to Simone Tiraboschi from comment #25)
> I created a VG over iSCSI via hosted-engine-setup
> 
> The iSCSI LUN was sized at 24 GiB.
> I got a vgsize of 23,625 Gib
> with a vgfree of 19,5 Gib
> 
> Should I expect this kind of overhead before creating the image?

No, you should expect 384MiB of overhead: devcapacity - vgsize = 25769803776 - 25367150592 = 402653184 bytes = 384MiB, i.e. exactly 3 extents of 128MiB (see comment 22). Did you create other LVs on this VG?

Please check which LVs you have on this VG:

    lvs vgname

And show detailed vginfo using:

    vgdisplay vgname

Note: if you are working on RHEL 7 or Fedora, you should update the lvmetad cache before running these commands:

    pvscan --cache

Comment 27 Simone Tiraboschi 2014-11-11 09:28:28 UTC
I got this situation:

[root@c65thei ~]# lvs 9e7566af-857e-4cfb-889e-f4d92598caec 
  LV                                   VG                                   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  48e7d9cc-2689-4c70-b556-c17c63d7bd15 9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 128,00m                                                    
  efd55606-427f-4c19-891c-3b7fd7a7fe26 9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 128,00m                                                    
  ids                                  9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 128,00m                                                    
  inbox                                9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 128,00m                                                    
  leases                               9e7566af-857e-4cfb-889e-f4d92598caec -wi-------   2,00g                                                    
  master                               9e7566af-857e-4cfb-889e-f4d92598caec -wi-------   1,00g                                                    
  metadata                             9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 512,00m                                                    
  outbox                               9e7566af-857e-4cfb-889e-f4d92598caec -wi------- 128,00m                                                    
[root@c65thei ~]# vgdisplay  9e7566af-857e-4cfb-889e-f4d92598caec
  --- Volume group ---
  VG Name               9e7566af-857e-4cfb-889e-f4d92598caec
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  20
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                8
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               23,62 GiB
  PE Size               128,00 MiB
  Total PE              189
  Alloc PE / Size       33 / 4,12 GiB
  Free  PE / Size       156 / 19,50 GiB
  VG UUID               Vpk4W0-kXf1-v3hn-ctAj-qgAv-eQ49-cM74BP


I don't think I deliberately created any LVs, and at least I shouldn't have had to.

Comment 28 Nir Soffer 2014-11-11 09:52:55 UTC
(In reply to Simone Tiraboschi from comment #27)
> I got this situation:
> 
> [root@c65thei ~]# lvs 9e7566af-857e-4cfb-889e-f4d92598caec 
>   LV                                   VG                                  
> Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   48e7d9cc-2689-4c70-b556-c17c63d7bd15 9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 128,00m                                                    
>   efd55606-427f-4c19-891c-3b7fd7a7fe26 9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 128,00m                                                    
>   ids                                  9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 128,00m                                                    
>   inbox                                9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 128,00m                                                    
>   leases                               9e7566af-857e-4cfb-889e-f4d92598caec
> -wi-------   2,00g                                                    
>   master                               9e7566af-857e-4cfb-889e-f4d92598caec
> -wi-------   1,00g                                                    
>   metadata                             9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 512,00m                                                    
>   outbox                               9e7566af-857e-4cfb-889e-f4d92598caec
> -wi------- 128,00m                                                    

This is the vdsm storage domain structure; I'm not sure why you have this when
you only need a new empty VG.

Please create a new bug for this issue, since this bug is about not correctly
detecting the amount of available space, and the new issue is about creating
an empty VG instead of a storage domain.

Comment 30 Sandro Bonazzola 2015-01-15 14:36:01 UTC
This is an automated message: 
This bug should be fixed in oVirt 3.5.1 RC1, moving to QA

Comment 31 Sandro Bonazzola 2015-01-21 16:02:48 UTC
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.

Comment 32 Simone Tiraboschi 2015-02-18 14:38:24 UTC
*** Bug 1143820 has been marked as a duplicate of this bug. ***