Description of problem:
Using 2 hosts connected to iSCSI storage, created 665 VMs with 3165 disks created from a template. The following errors occur:

- "VolumeCreationError: Error creating a new volume: (u"Volume creation 'Sanlock resource init failure', 'No space left on device')",)" even though the storage is not actually out of space.
- Vdsm SchemaCache warning: provided parameters do not match any union VmStats values
- ERROR could not allocate request thread "raise TOOManyTasks"
- ERROR Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new volume, code = 205 (engine.log)

Current # of LVs: 3068

After hitting "VolumeCreationError: Error creating a new volume" whenever attempting to create another disk, I verified that I can create an LV manually and that the total number of LVs increases after lvcreate is executed by hand.

VG                                   #PV  #LV #SN Attr   VSize   VFree
86f1a1bc-7749-4a6b-8ea6-4a825c8cbda3   1    8   0 wz--n- 499.62g 495.50g
b5067b73-9954-45b2-909a-b26f09f7105f   3    9   0 wz--n-   2.98t   2.97t
c6838f59-c474-457f-8464-9771a318b752   2 3068   0 wz--n-  10.00t   6.54t
d4baca8b-3aa4-4f0a-8ff1-5222812082c9   1   13   0 wz--n-   5.00t   4.98t
vg0                                    1    3   0 wz--n- 277.97g  60.00m

Version-Release number of selected component (if applicable):
vdsm-api-4.18.2-0.el7ev.noarch
vdsm-4.18.2-0.el7ev.x86_64
vdsm-yajsonrpc-4.18.2-0.el7ev.noarch
vdsm-python-4.18.2-0.el7ev.noarch
vdsm-jsonrpc-4.18.2-0.el7ev.noarch
vdsm-xmlrpc-4.18.2-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.2-0.el7ev.noarch
vdsm-infra-4.18.2-0.el7ev.noarch
vdsm-cli-4.18.2-0.el7ev.noarch

How reproducible:
Seems consistent.

Steps to Reproduce:
1. Set up 2 hosts and 1 engine with iSCSI LUNs
2. Create 4 large SDs (2 or 5 TB)
3. Create a VM from a template which has 5 disks in total; execute more than 3,100 times

Actual results:
Unable to add more disks to the system.
LV / VG operations take 1-2 s; pvscan --cache took minutes (improved after a restart).
The Dashboard may show information inconsistent with Storage Domain UI changes (e.g. the size of a LUN after extension).

Expected results:
Ability to add more disks, since I am able to create more LVs manually.

Additional info:
See private comment for further details / attachment info.
(In reply to mlehrer from comment #0)
> - "VolumeCreationError: Error creating a new volume: (u"Volume creation
>   'Sanlock resource init failure', 'No space left on device')",)" even
>   though the storage is not actually out of space.

Each volume has a 1 MiB lease in the leases volume. The leases volume is
2 GiB, so it can hold 2048 leases. The first 100 leases are reserved, so you
have space for 1948 leases in each storage domain, limiting the number of
volumes to 1948. If you tried to create more volumes, this failure is
expected.

We can improve the error message in this case; please open a separate bug
for improving the error message if needed.

> - Vdsm SchemaCache warning: provided parameters do not match any union
>   VmStats values

This is not relevant to this bug. It should not be in the logs, and it shows
that you are using an old, unsupported version.

> - ERROR could not allocate request thread "raise TOOManyTasks"

This means there are too many tasks in the relevant executor, probably
meaning you are overloading the host with requests. This is not relevant to
this bug; please open an infra bug for it.

> - ERROR Failed to HSMGetAllTasksStatusesVDS, error = Error creating a new
>   volume, code = 205 (engine.log)

Probably related to the previous issue - an infra bug.

> Current # of LVs: 3068
>
> After hitting "VolumeCreationError: Error creating a new volume" whenever
> attempting to create another disk, I verified that I can create an LV
> manually and that the total number of LVs increases after lvcreate is
> executed by hand.
>
> VG                                   #PV  #LV #SN Attr   VSize   VFree
> 86f1a1bc-7749-4a6b-8ea6-4a825c8cbda3   1    8   0 wz--n- 499.62g 495.50g
> b5067b73-9954-45b2-909a-b26f09f7105f   3    9   0 wz--n-   2.98t   2.97t
> c6838f59-c474-457f-8464-9771a318b752   2 3068   0 wz--n-  10.00t   6.54t
> d4baca8b-3aa4-4f0a-8ff1-5222812082c9   1   13   0 wz--n-   5.00t   4.98t
> vg0                                    1    3   0 wz--n- 277.97g  60.00m

You can create LVs manually, but they cannot be used as oVirt volumes, since
there is no room for their leases. I wonder how you could create 3068
volumes; this should have failed when creating the 1949th volume.

I think we should limit testing to the maximum number of volumes we can
support, so this does not block further testing. If we find that 1948
volumes are usable (unlikely), we can increase the size of the leases volume
in the next storage format to make room for more volumes.

The maximum number of volumes should be documented.

> Version-Release number of selected component (if applicable):
> vdsm-api-4.18.2-0.el7ev.noarch
> vdsm-4.18.2-0.el7ev.x86_64

This is an old, unsupported version; please use vdsm-4.18.11 or later.

> vdsm-yajsonrpc-4.18.2-0.el7ev.noarch
> vdsm-python-4.18.2-0.el7ev.noarch
> vdsm-jsonrpc-4.18.2-0.el7ev.noarch
> vdsm-xmlrpc-4.18.2-0.el7ev.noarch
> vdsm-hook-vmfex-dev-4.18.2-0.el7ev.noarch
> vdsm-infra-4.18.2-0.el7ev.noarch
> vdsm-cli-4.18.2-0.el7ev.noarch
>
> How reproducible:
> Seems consistent.
>
> Steps to Reproduce:
> 1. Set up 2 hosts and 1 engine with iSCSI LUNs
> 2. Create 4 large SDs (2 or 5 TB)
> 3. Create a VM from a template which has 5 disks in total; execute more
>    than 3,100 times
>
> Actual results:
> Unable to add more disks to the system.
> LV / VG operations take 1-2 s; pvscan --cache took minutes (improved
> after a restart).
> The Dashboard may show information inconsistent with Storage Domain UI
> changes (e.g. the size of a LUN after extension).
>
> Expected results:
> Ability to add more disks, since I am able to create more LVs manually.

We don't support this now.

> Additional info:
> See private comment for further details / attachment info.
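For reference, a quick sanity check of the limit described above, using plain Python arithmetic on the constants stated in this comment (nothing here is read from a live system):

    # Per-domain lease capacity, using the numbers from this comment.
    LEASES_VOLUME_MIB = 2 * 1024   # the leases LV is 2 GiB
    LEASE_SIZE_MIB = 1             # each volume lease takes 1 MiB
    RESERVED_LEASES = 100          # the first 100 lease slots are reserved

    max_volumes = LEASES_VOLUME_MIB // LEASE_SIZE_MIB - RESERVED_LEASES
    print(max_volumes)             # 1948 volumes per block storage domain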
The relevant flow in run 1:

Initializing the 1949th lease:

d300c9f0-c3b2-4380-bfbf-554258ecb519::DEBUG::2016-08-23 16:03:31,552::blockVolume::299::Storage.VolumeManifest::(newVolumeLease) Initializing volume lease volUUID=bdfa9a28-014d-4d8f-a0bd-d9f9e77f9c94 sdUUID=c6838f59-c474-457f-8464-9771a318b752, metaId=('c6838f59-c474-457f-8464-9771a318b752', 1949)

Sanlock fails to initialize a lease whose offset is past the end of the leases volume:

d300c9f0-c3b2-4380-bfbf-554258ecb519::ERROR::2016-08-23 16:03:32,636::volume::843::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 837, in create
    cls.newVolumeLease(metaId, sdUUID, volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 1156, in newVolumeLease
    return cls.manifestClass.newVolumeLease(metaId, sdUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 307, in newVolumeLease
    sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)])
SanlockException: (28, 'Sanlock resource init failure', 'No space left on device')

This error is expected, as explained in comment 3. The rest of the creation attempts fail in the same way.

Bottom line:
1. We should document the maximum number of leases.
2. We should check the metadata slot offset and fail the entire volume creation *before* doing any work once we have reached the maximum.

Since we currently recommend using fewer than 500 LVs, these changes can be scheduled for the next version.
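A minimal sketch of the check proposed in item 2, assuming the layout described in comment 3 (2048 one-MiB lease slots, the first 100 reserved) and that the lease offset is derived from the metadata slot. The names here are hypothetical, not the actual vdsm API:

    # Hypothetical pre-check: fail fast before creating the LV, instead of
    # letting sanlock fail at init_resource() time with ENOSPC.
    TOTAL_LEASE_SLOTS = 2048   # 2 GiB leases LV / 1 MiB per lease
    RESERVED_LEASES = 100      # reserved slots at the start of the LV

    class NoSpaceForLease(Exception):
        """Raised when a metadata slot has no matching lease slot."""

    def check_lease_slot(meta_slot):
        # meta_slot is the second element of metaId (1949 in the failing
        # run). Assuming lease offset = (RESERVED_LEASES + meta_slot) * 1
        # MiB, any slot above TOTAL_LEASE_SLOTS - RESERVED_LEASES lands
        # past the end of the leases LV.
        if meta_slot > TOTAL_LEASE_SLOTS - RESERVED_LEASES:
            raise NoSpaceForLease(
                "metadata slot %d exceeds the %d usable lease slots"
                % (meta_slot, TOTAL_LEASE_SLOTS - RESERVED_LEASES))

With these constants, check_lease_slot(1948) passes and check_lease_slot(1949) raises, matching the failure seen in the log above.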
(In reply to Nir Soffer from comment #4)
> Since we currently recommend using fewer than 500 LVs, these changes can
> be scheduled for the next version.

I'm interested in knowing what the issue limiting us to 500 LVs is.
(Separately, we should of course look at the limit of ~1950 leases.)
(In reply to Yaniv Kaul from comment #6)
> (In reply to Nir Soffer from comment #4)
> > Since we currently recommend using fewer than 500 LVs, these changes
> > can be scheduled for the next version.
>
> I'm interested in knowing what the issue limiting us to 500 LVs is.
> (Separately, we should of course look at the limit of ~1950 leases.)

Yes, but we already have an RFE for this. This bug was about the failure to
create 5000 disks, plus other issues that should move to other bugs.

To allow 5000 disks for this test, we can do the following:

1. Put the storage domain into maintenance
2. lvextend -L 5100m vg_uuid/leases --config 'global { use_lvmetad=0 }'
3. Activate the domain
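For what it's worth, the 5100m value in step 2 matches the arithmetic from comment 3, assuming the same layout of 1 MiB per lease with the first 100 slots reserved:

    # Why 5100m: room for the 100 reserved slots plus 5000 usable leases,
    # at 1 MiB per lease.
    RESERVED_LEASES = 100
    TARGET_DISKS = 5000
    new_leases_lv_mib = RESERVED_LEASES + TARGET_DISKS  # 1 MiB per lease
    print("-L %dm" % new_leases_lv_mib)                 # -L 5100m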
(In reply to Nir Soffer from comment #7)
> To allow 5000 disks for this test, we can do the following:
>
> 1. Put the storage domain into maintenance
> 2. lvextend -L 5100m vg_uuid/leases --config 'global { use_lvmetad=0 }'
> 3. Activate the domain

Confirming that Nir's recommendation resolved the issue: I was able to
create additional LVs once the leases volume was extended as described
above.
This looks like a dup of bug 1386732?
*** This bug has been marked as a duplicate of bug 1386732 ***