Bug 1856065 - [Scale] While create snapshot to VM with 13Disks (Diff SDs) - "dictionary changed size during iteration"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.22
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.4.3
Target Release: 4.40.27
Assignee: Amit Bawer
QA Contact: David Vaanunu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-12 10:44 UTC by David Vaanunu
Modified: 2020-11-11 06:43 UTC
CC List: 8 users

Fixed In Version: vdsm-4.40.27
Clone Of:
Environment:
Last Closed: 2020-11-11 06:39:48 UTC
oVirt Team: Storage
Embargoed:
abawer: needinfo-
aoconnor: blocker-


Links
System        ID      Private  Priority     Status  Summary                                                      Last Updated
oVirt gerrit  110243  0        master       MERGED  lvm: Remove lvs from lvs dict while under lock               2020-11-29 13:08:59 UTC
oVirt gerrit  110244  0        master       MERGED  lvm: Remove vgs from vgs dict while under lock               2020-11-29 13:09:01 UTC
oVirt gerrit  110245  0        master       MERGED  lvm: Avoid iterating pvs dict without locking                2020-11-29 13:08:59 UTC
oVirt gerrit  110260  0        master       MERGED  lvm: Use dict.copy() instead of dict() for copies            2020-11-29 13:09:01 UTC
oVirt gerrit  110913  0        ovirt-4.4.2  MERGED  monitor: Use a list copy of the monitor values for stopping  2020-11-29 13:09:01 UTC

Description David Vaanunu 2020-07-12 10:44:05 UTC
Description of problem:

As part of testing https://bugzilla.redhat.com/show_bug.cgi?id=1837199,
ran create snapshot on a VM with 13 disks (each disk on a different SD).

The following error was logged:
aborting: Task is aborted: 'value=dictionary changed size during iteration abortedcode=100' (task:1190)



Version-Release number of selected component (if applicable):

RHV-4.4.1-11
VDSM-4.40.22-1

How reproducible:

Create VM with 13 Disks.
The disks are distributed on 12 SDs (FC)

Steps to Reproduce:
1. Run 2 concurrent instances of the steps below
2. Create VM from template
3. Create 10 Snapshots
4. Remove VM

Actual results:
Got errors during the test 

Expected results:

No errors


Additional info:

Test timestamp:
 
Start - 2020-07-08 14:30
End - 2020-07-09 05:30
 
VDSM & ENGINE log files:
 
https://drive.google.com/drive/folders/1KsaQh2f0Ej487VVcazOjZwZh6jUkyHHG


LAB INFO:

The test running on DC with:
10 Hosts
1350VM
50 Active SDs (FC) (+27 unattached SDs)
Total LUNs: 328 (multipath)

The 1350 VMs are defined on 12 SDs, the same SDs used by the VMs with 13 disks.


  VG                                   #PV #LV #SN Attr   VSize    VFree   
  09876054-0698-4b1f-b258-6fc2689807c2   1 126   0 wz--n-   <5.00t    4.87t
  10afc0f1-b8c8-4f3a-ad37-5517431a6ccd   1 135   0 wz--n-   <5.00t    4.86t
  37657106-dea4-494e-a828-f24ecbb26b3d   1   9   0 wz--n-   <5.00t    4.99t
  3895ca01-a88e-42db-806a-d8ea4937cc35   1 150   0 wz--n-   <5.00t    4.85t
  3c994815-e4f3-47f2-81c5-3d5cc38d4ad6   1 107   0 wz--n-   <5.00t    4.89t
  47f6c8c7-83db-4e13-8efe-9d2c9c951656   1 126   0 wz--n-   <5.00t   <4.66t
  628a0b20-4fff-4922-99dd-6f1ce765f6c5   1 157   0 wz--n-   <5.00t   <4.55t
  6f261e89-8fa9-467a-9d50-58e05901ff4a   1 144   0 wz--n-   <5.00t   <4.86t
  87964d59-974f-4297-a00c-53f06a2a7061   1 131   0 wz--n-   <5.00t   <4.87t
  de355e04-2f60-4019-a7a0-ecf59408736b   1 115   0 wz--n-   <5.00t   <4.89t
  f021c5cf-55ad-4b05-9020-7c06f2fff7a9   1 153   0 wz--n-   <5.00t   <4.85t
  f479c782-236d-4a15-a1a5-40b09d417fc9   1 156   0 wz--n-   <5.00t    4.84t
  f8c33fab-29b5-4598-ab9c-7eabc1b24c56   1 150   0 wz--n-   <5.00t   <4.85t
  vg_f01-h05-000-r620                    1   2   0 wz--n- <930.50g       0 



Total SDs of cluster "L0_Group_0":
12 SDs with 'VMs'
38 SDs - Empty
27 SDs - Unattached


*******************************


2020-07-09 02:48:23,454-0400 INFO  (jsonrpc/3) [vdsm.api] FINISH deleteImage error=dictionary changed size during iteration from=::ffff:10.1.41.200,56428, flow_id=1be05f02-f2c7-47a3-82c9-22960efbabe8, task_id=579e5d8d-20c1-4b8b-94f8-1494d3d9d424 (api:52)
2020-07-09 02:48:23,455-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='579e5d8d-20c1-4b8b-94f8-1494d3d9d424') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-63>", line 2, in deleteImage
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1529, in deleteImage
    allVols = dom.getAllVolumes()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 841, in getAllVolumes
    return self._manifest.getAllVolumes()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 765, in getAllVolumes
    vols, rems = self.getAllVolumesImages()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 752, in getAllVolumesImages
    allVols = getAllVolumes(self.sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 285, in getAllVolumes
    vols = _getVolsTree(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 203, in _getVolsTree
    for lv in _iter_volumes(sdUUID):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 216, in _iter_volumes
    for lv in lvm.getLV(sdUUID):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1319, in getLV
    lv = _lvminfo.getLv(vgName, lvName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 952, in getLv
    lvs = self._reloadlvs(vgName)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in _reloadlvs
    staleLVs = [lvName for v, lvName in self._lvs
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 734, in <listcomp>
    staleLVs = [lvName for v, lvName in self._lvs
RuntimeError: dictionary changed size during iteration
2020-07-09 02:48:23,456-0400 INFO  (jsonrpc/3) [storage.TaskManager.Task] (Task='579e5d8d-20c1-4b8b-94f8-1494d3d9d424') aborting: Task is aborted: 'value=dictionary changed size during iteration abortedcode=100' (task:1190)
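
For illustration, the failure in the traceback above can be sketched in a few lines. This is not vdsm code: the dict, the lock, and both function names are hypothetical stand-ins, and the concurrent removal is simulated in the same thread so the error is deterministic. It assumes only what the traceback shows, namely that the cache keys are (vgName, lvName) pairs and that the keys are iterated directly.

```python
import threading

# Hypothetical stand-in for the LVM cache: keys are (vg_name, lv_name) tuples.
lvs = {("vg1", "lv1"): "info", ("vg1", "lv2"): "info", ("vg2", "lv1"): "info"}
lock = threading.Lock()


def lv_names_unsafe(cache, vg_name):
    """Iterate the live dict; a size change during the loop raises RuntimeError."""
    names = []
    for vg, lv in cache:
        if vg == vg_name:
            names.append(lv)
            # Simulates another thread popping an entry while we iterate.
            cache.pop((vg, lv))
    return names


def lv_names_safe(cache, vg_name):
    """Snapshot the keys while holding the lock, then iterate the snapshot."""
    with lock:
        keys = list(cache)
    return [lv for vg, lv in keys if vg == vg_name]


try:
    lv_names_unsafe(dict(lvs), "vg1")
    error = None
except RuntimeError as e:
    error = str(e)

safe_result = lv_names_safe(lvs, "vg1")
print(error)
print(safe_result)
```

The safe variant only helps if every writer takes the same lock before changing the dict; a snapshot taken under a lock that writers ignore can still race.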

Comment 1 Amit Bawer 2020-07-12 11:10:07 UTC
Would we need a 4.3 clone for this ticket (GA?)?
I don't see that it is any different there, as far as popping items from the lvm cache dict outside of locks goes.
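
A rough sketch of the pattern being discussed, where removals happen only while the lock is held. The LVCache class and its method names are hypothetical and are not vdsm's actual API; the point is only that readers and writers share one lock.

```python
import threading


class LVCache:
    """Hypothetical cache used for illustration; not vdsm's actual class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lvs = {}

    def add(self, vg_name, lv_name, info):
        with self._lock:
            self._lvs[(vg_name, lv_name)] = info

    def remove(self, vg_name, lv_name):
        # Pop while holding the lock, so a reader iterating under the same
        # lock never observes a size change.
        with self._lock:
            self._lvs.pop((vg_name, lv_name), None)

    def lv_names(self, vg_name):
        with self._lock:
            return [lv for vg, lv in self._lvs if vg == vg_name]


cache = LVCache()
errors = []


def writer():
    for i in range(2000):
        cache.add("vg1", "lv%d" % i, {})
        cache.remove("vg1", "lv%d" % i)


def reader():
    try:
        for _ in range(2000):
            cache.lv_names("vg1")
    except RuntimeError as e:
        errors.append(e)


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(errors)
```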

Comment 2 Nir Soffer 2020-07-12 11:32:36 UTC
(In reply to Amit Bawer from comment #1)
> Would we need 4.3 clone for this ticket (GA?) ?
> I don't see it's different there as far is it goes for popping items from
> lvm cache dict outside of locks.

Same issue exists in 4.3 of course. We can clone the bug and if we get acks
we can backport the fixes.

Comment 3 Amit Bawer 2020-07-12 13:47:08 UTC
Fixes should be part of build 4.40.23 when available

Comment 5 Amit Bawer 2020-07-13 10:23:57 UTC
we want to add more fixes under this ticket, so moving back to NEW

Comment 6 Amit Bawer 2020-07-14 06:50:41 UTC
Can it go back to MODIFIED now?

Comment 7 David Vaanunu 2020-08-24 11:58:43 UTC
Seen while the system was being upgraded to 4.4.2:



2020-08-23 10:18:32,001-0400 WARN  (MainThread) [storage.HSM] Failed to stop RepoStats thread (hsm:3432)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 3429, in prepareForShutdown
    self.domainMonitor.shutdown()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 254, in shutdown
    self._stopMonitors(self._monitors.values(), shutdown=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 273, in _stopMonitors
    for monitor in monitors:
RuntimeError: dictionary changed size during iteration
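
This second traceback iterates over a dict values() view, which tracks the live dict. A minimal sketch of the difference between passing the view and passing a list copy follows. The Monitor class and helper functions are hypothetical, and the assumption that stopping a monitor removes it from the registry is made only to trigger the size change deterministically in one thread.

```python
class Monitor:
    """Hypothetical monitor; not vdsm's actual class."""

    def __init__(self, sd_uuid, registry):
        self.sd_uuid = sd_uuid
        self._registry = registry

    def stop(self):
        # Assumed for this sketch: stopping removes the monitor from the registry.
        self._registry.pop(self.sd_uuid, None)


def stop_monitors(monitors):
    stopped = []
    for monitor in monitors:
        monitor.stop()
        stopped.append(monitor.sd_uuid)
    return stopped


def make_registry():
    registry = {}
    for sd_uuid in ("sd-1", "sd-2"):
        registry[sd_uuid] = Monitor(sd_uuid, registry)
    return registry


# Passing the live view: the dict shrinks during iteration and the loop fails.
registry = make_registry()
try:
    stop_monitors(registry.values())
    view_error = None
except RuntimeError as e:
    view_error = str(e)

# Passing a list copy: the loop iterates a snapshot and completes.
registry = make_registry()
stopped = stop_monitors(list(registry.values()))
print(view_error)
print(stopped)
```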

Comment 8 Sandro Bonazzola 2020-08-28 12:20:22 UTC
Since this bug has been marked as fixed in version 4.40.27 by the assignee, and the latest version included in oVirt 4.4.2 is 4.40.26, I'm moving this bug to 4.4.3.

Comment 9 David Vaanunu 2020-10-04 11:08:46 UTC
verified version:

redhat-release-8.3-1.0
rhv-release-4.4.3-5-001
vdsm-4.40.30-1



Tested scenarios:
VM snapshot with 13 Disks
50 Users VM snapshot

No "RuntimeError: dictionary changed size during iteration" error during execution

Comment 10 Sandro Bonazzola 2020-11-11 06:39:48 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

