Bug 1856065

Summary: [Scale] While create snapshot to VM with 13Disks (Diff SDs) - "dictionary changed size during iteration"
Product: [oVirt] vdsm Reporter: David Vaanunu <dvaanunu>
Component: General Assignee: Amit Bawer <abawer>
Status: CLOSED CURRENTRELEASE QA Contact: David Vaanunu <dvaanunu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.40.22 CC: abawer, aoconnor, bugs, dagur, dfodor, nsoffer, sfishbai, tnisan
Target Milestone: ovirt-4.4.3 Keywords: Performance, Regression, ZStream
Target Release: 4.40.27 Flags: abawer: needinfo-
abawer: needinfo-
aoconnor: blocker-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.40.27 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-11 06:39:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Vaanunu 2020-07-12 10:44:05 UTC
Description of problem:

Found as part of testing https://bugzilla.redhat.com/show_bug.cgi?id=1837199,
while creating snapshots of a VM with 13 disks (each disk on a different SD).

Errors seen:
aborting: Task is aborted: 'value=dictionary changed size during iteration abortedcode=100' (task:1190)



Version-Release number of selected component (if applicable):

RHV-4.4.1-11
VDSM-4.40.22-1

How reproducible:

Create a VM with 13 disks.
The disks are distributed across 12 SDs (FC).

Steps to Reproduce:
1. Run the following flow in 2 concurrent runs
2. Create VM from template
3. Create 10 Snapshots
4. Remove VM

Actual results:
Got errors during the test 

Expected results:

No errors


Additional info:

Test timestamp:
 
Start - 2020-07-08 14:30
End - 2020-07-09 5:30
 
VDSM & ENGINE log files:
 
https://drive.google.com/drive/folders/1KsaQh2f0Ej487VVcazOjZwZh6jUkyHHG


LAB INFO:

The test runs on a DC with:
10 Hosts
1350 VMs
50 Active SDs (FC) (+27 unattached SDs)
Total LUNs: 328 (multipath)

The 1350 VMs are defined on 12 SDs, the same SDs used by the VMs with 13 disks.


 VG                                   #PV #LV #SN Attr   VSize    VFree   
  09876054-0698-4b1f-b258-6fc2689807c2   1 126   0 wz--n-   <5.00t    4.87t
  10afc0f1-b8c8-4f3a-ad37-5517431a6ccd   1 135   0 wz--n-   <5.00t    4.86t
  37657106-dea4-494e-a828-f24ecbb26b3d   1   9   0 wz--n-   <5.00t    4.99t
  3895ca01-a88e-42db-806a-d8ea4937cc35   1 150   0 wz--n-   <5.00t    4.85t
  3c994815-e4f3-47f2-81c5-3d5cc38d4ad6   1 107   0 wz--n-   <5.00t    4.89t
  47f6c8c7-83db-4e13-8efe-9d2c9c951656   1 126   0 wz--n-   <5.00t   <4.66t
  628a0b20-4fff-4922-99dd-6f1ce765f6c5   1 157   0 wz--n-   <5.00t   <4.55t
  6f261e89-8fa9-467a-9d50-58e05901ff4a   1 144   0 wz--n-   <5.00t   <4.86t
  87964d59-974f-4297-a00c-53f06a2a7061   1 131   0 wz--n-   <5.00t   <4.87t
  de355e04-2f60-4019-a7a0-ecf59408736b   1 115   0 wz--n-   <5.00t   <4.89t
  f021c5cf-55ad-4b05-9020-7c06f2fff7a9   1 153   0 wz--n-   <5.00t   <4.85t
  f479c782-236d-4a15-a1a5-40b09d417fc9   1 156   0 wz--n-   <5.00t    4.84t
  f8c33fab-29b5-4598-ab9c-7eabc1b24c56   1 150   0 wz--n-   <5.00t   <4.85t
  vg_f01-h05-000-r620                    1   2   0 wz--n- <930.50g       0 



Total SDs of cluster "L0_Group_0":
12 SDs with 'VMs'
38 SDs - Empty
27 SDs - Unattached


*******************************


2020-07-09 02:48:23,454-0400 INFO  (jsonrpc/3) [vdsm.api] FINISH deleteImage error=dictionary changed size during iteration from=::ffff:10.1.41.200,56428, flow_id=1be05f02-f2c7-47a3-82c9-22960efbabe8, task_id=579e5d8d-20c1-4b8b-94f8-1494d3d9d424 (api:52)
2020-07-09 02:48:23,455-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='579e5d8d-20c1-4b8b-94f8-1494d3d9d424') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-63>", line 2, in deleteImage
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1529, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 841, in getAllVolumes
    return self._manifest.getAllVolumes()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 765, in getAllVolumes
    vols, rems = self.getAllVolumesImages()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 752, in getAllVolumesImages
    allVols = getAllVolumes(self.sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 285, in getAllVolumes
    vols = _getVolsTree(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 203, in _getVolsTree
    for lv in _iter_volumes(sdUUID):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 216, in _iter_volumes
    for lv in lvm.getLV(sdUUID):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1319, in getLV
    lv = _lvminfo.getLv(vgName, lvName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 952, in getLv
    lvs = self._reloadlvs(vgName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in _reloadlvs
    staleLVs = [lvName for v, lvName in self._lvs
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in <listcomp>
    staleLVs = [lvName for v, lvName in self._lvs
RuntimeError: dictionary changed size during iteration
2020-07-09 02:48:23,456-0400 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='579e5d8d-20c1-4b8b-94f8-1494d3d9d424') aborting: Task is aborted: 'value=dictionary changed size during iteration abortedcode=100' (task:1190)
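
The list comprehension at lvm.py line 734 iterates over the `self._lvs` dict directly, so any entry added or removed during the loop (presumably by another thread) raises this RuntimeError. Below is a minimal single-threaded sketch of the same failure mode. It assumes the cache is a dict keyed by (vgName, lvName) tuples, which is inferred only from the `for v, lvName in self._lvs` unpacking in the traceback; all names in the sketch are hypothetical and it is not vdsm code.

```python
# Hypothetical stand-in for the LVM cache: a dict keyed by (vg_name, lv_name).
# This is an assumption based on the traceback, not the real vdsm structure.
lvs = {("vg1", "lv%d" % i): object() for i in range(3)}


def stale_lvs_unsafe(cache, vg_name):
    """Iterate the dict directly while it is mutated mid-iteration."""
    stale = []
    for vg, lv_name in cache:
        if vg == vg_name:
            stale.append(lv_name)
        # Simulates another thread removing an entry during the iteration.
        cache.pop(("vg1", "lv2"), None)
    return stale


def stale_lvs_snapshot(cache, vg_name):
    """Iterate over a list copy of the keys, so mutation cannot break the loop."""
    stale = []
    for vg, lv_name in list(cache):
        if vg == vg_name:
            stale.append(lv_name)
        cache.pop(("vg1", "lv2"), None)
    return stale


try:
    stale_lvs_unsafe(dict(lvs), "vg1")
    error = None
except RuntimeError as e:
    error = str(e)

safe_result = stale_lvs_snapshot(dict(lvs), "vg1")
```

The snapshot variant never raises, but it may return names that were removed after the copy was taken. In multi-threaded code the copy should be taken under the same lock that guards mutations of the dict.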

Comment 1 Amit Bawer 2020-07-12 11:10:07 UTC
Would we need a 4.3 clone for this ticket (GA?)?
I don't see that it's any different there as far as popping items from the lvm cache dict outside of locks goes.
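
The locking concern raised in this comment can be sketched as follows. This is an illustration only, assuming a cache dict keyed by (vg_name, lv_name) tuples; the `LvCache` class and its methods are hypothetical and are not the vdsm API or the actual fix. The point is that every read and every mutation of the dict takes the same lock.

```python
import threading


class LvCache:
    """Hypothetical cache; every access to the dict holds the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lvs = {}

    def add(self, vg_name, lv_name, info):
        with self._lock:
            self._lvs[(vg_name, lv_name)] = info

    def remove(self, vg_name, lv_name):
        # Popping under the lock, so no reader is mid-iteration.
        with self._lock:
            self._lvs.pop((vg_name, lv_name), None)

    def lv_names(self, vg_name):
        # Iterating under the lock, so no writer can resize the dict.
        with self._lock:
            return [lv for vg, lv in self._lvs if vg == vg_name]


cache = LvCache()
for i in range(1000):
    cache.add("vg1", "lv%d" % i, {})

errors = []


def reader():
    try:
        for _ in range(200):
            cache.lv_names("vg1")
    except RuntimeError as e:
        errors.append(e)


def writer():
    for i in range(1000):
        cache.remove("vg1", "lv%d" % i)


threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock held on both sides, the reader thread should never observe the dict changing size mid-iteration, so `errors` stays empty.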

Comment 2 Nir Soffer 2020-07-12 11:32:36 UTC
(In reply to Amit Bawer from comment #1)
> Would we need a 4.3 clone for this ticket (GA?)?
> I don't see that it's any different there as far as popping items from the
> lvm cache dict outside of locks goes.

Same issue exists in 4.3 of course. We can clone the bug and if we get acks
we can backport the fixes.

Comment 3 Amit Bawer 2020-07-12 13:47:08 UTC
Fixes should be part of build 4.40.23 when available

Comment 5 Amit Bawer 2020-07-13 10:23:57 UTC
we want to add more fixes under this ticket, so moving back to NEW

Comment 6 Amit Bawer 2020-07-14 06:50:41 UTC
Can it go back to MODIFIED now?

Comment 7 David Vaanunu 2020-08-24 11:58:43 UTC
Seen while the system was being upgraded to 4.4.2:



2020-08-23 10:18:32,001-0400 WARN  (MainThread) [storage.HSM] Failed to stop RepoStats thread (hsm:3432)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3429, in prepareForShutdown
    self.domainMonitor.shutdown()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 254, in shutdown
    self._stopMonitors(self._monitors.values(), shutdown=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 273, in _stopMonitors
    for monitor in monitors:
RuntimeError: dictionary changed size during iteration
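
This second traceback passes `self._monitors.values()` to `_stopMonitors`. In Python 3 that is a live view of the dict, so the loop fails if the dict changes while it runs. The sketch below shows the general pattern of copying the view into a list before iterating. The `Monitor` class, the registry, and `stop_monitors` are hypothetical; whether vdsm mutates the dict from this loop or from another thread is not shown in the log, so the in-loop removal here is an assumption made to keep the example deterministic.

```python
class Monitor:
    """Hypothetical monitor object identified by a domain name."""

    def __init__(self, name):
        self.name = name


def stop_monitors(monitors, registry):
    """Stop each monitor and drop it from the registry."""
    stopped = []
    for monitor in monitors:
        # Removing the entry changes the size of the registry dict.
        registry.pop(monitor.name, None)
        stopped.append(monitor.name)
    return stopped


def make_registry():
    return {name: Monitor(name) for name in ("sd-a", "sd-b", "sd-c")}


# Passing the live view: the dict shrinks while the view is iterated.
registry = make_registry()
try:
    stop_monitors(registry.values(), registry)
    view_error = None
except RuntimeError as e:
    view_error = str(e)

# Passing a list copy: the iteration is independent of later mutations.
registry = make_registry()
stopped = stop_monitors(list(registry.values()), registry)
```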

Comment 8 Sandro Bonazzola 2020-08-28 12:20:22 UTC
Since this bug has been marked as fixed in version 4.40.27 by the assignee, and the latest version included in oVirt 4.4.2 is 4.40.26, I'm moving this bug to 4.4.3.

Comment 9 David Vaanunu 2020-10-04 11:08:46 UTC
Verified version:

redhat-release-8.3-1.0
rhv-release-4.4.3-5-001
vdsm-4.40.30-1



Tested scenarios:
VM snapshot with 13 Disks
50 Users VM snapshot

No "RuntimeError: dictionary changed size during iteration" error occurred during execution.

Comment 10 Sandro Bonazzola 2020-11-11 06:39:48 UTC
This bugzilla is included in the oVirt 4.4.3 release, published on November 10th, 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.