Bug 1811391

Summary: Creating many thin clones corrupts vg metadata [RHV clone - 4.3.9]
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: vdsm
Assignee: Amit Bawer <abawer>
Status: CLOSED ERRATA
QA Contact: mlehrer
Severity: urgent
Priority: urgent
Version: 4.2.8-4
CC: adevolder, aefrat, amarchuk, aoconnor, aperotti, bubrown, bugs, christian.grundmann, dagur, dvaanunu, ebenahar, eedri, frolland, guchen, guillaume.pavese, gveitmic, gwatson, izuckerm, jcall, jortialc, kkulkarn, lsurette, lsvaty, mavital, michal.skrivanek, mkalinin, mlehrer, mshriver, mwest, nashok, ngavrilo, nkshirsa, nsoffer, oliver.albl, pelauter, schandle, srevivo, stanislav.polasek, teigland, tnisan, vendors, vjuranek, ycui
Target Milestone: ovirt-4.3.9-1
Keywords: Performance, ZStream
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Clone Of: 1553133
Last Closed: 2020-04-02 17:09:28 UTC
oVirt Team: Scale
Bug Depends On: 1812441
Bug Blocks: 1553133

Description RHV bug bot 2020-03-08 09:44:04 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1553133 +++
======================================================================

Description of problem:

After creating many thin clones from a template, the storage domain is deactivated with the error "Storage Domain <LUN> (Data Center <NAME>) was deactivated by system because it's not visible by any of the hosts."

Trying to activate the storage domain shows the following error for all hosts in the datacenter: VDSM <HOSTNAME> command GetVGInfoVDS failed: Volume Group does not exist: (u'vg_uuid: jJnOUj-Ju4j-0YlS-r04l-Z1tD-Hjz0-evKdjz',)

pvs output on host shows:
/dev/mapper/3600601603cc045002e83815aed1324f2: Checksum error
Couldn't read volume group metadata.
/dev/mapper/3600601603cc045002e83815aed1324f2: Checksum error
Couldn't read volume group metadata.

We run 30 hosts connected to 30 FC storage domains on 2 all-flash storage systems.

Version-Release number of selected component (if applicable):
oVirt 4.2.1
CentOS 7.4

How reproducible:
Occurred 2 times within 10 days

Steps to Reproduce:
1. Create and operate many (100) thin clones from a template.

Actual results:
Storage domain inactive, metadata corrupt

Expected results:
Storage domain active.

Additional info:
Logfiles attached

(Originally by oliver.albl)

Comment 1 RHV bug bot 2020-03-08 09:44:13 UTC
Can we have the logs, please?

(Originally by amureini)

Comment 2 RHV bug bot 2020-03-08 09:44:17 UTC
Created attachment 1405862 [details]
engine and vdsm logs

(Originally by oliver.albl)

Comment 3 RHV bug bot 2020-03-08 09:44:27 UTC
Oliver, can you share the first 129 MiB of the bad PV?

You can copy it like this:

    dd if=/dev/mapper/xxxyyy of=pv.md bs=1M count=129 iflag=direct conv=fsync

And then compress the generated file (e.g. xz pv.md). The output may be small
enough to attach to the bug; if not, please share it via Dropbox or a similar service.

It will also help if you share the backups of this vg metadata from /etc/lvm/backup.
The problem with the backup is that you will have to search for the latest backup
on all the hosts that ran as SPM in your setup. The last host used as SPM with
this vg will have the most up-to-date backup info.
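
For example, a minimal sketch like this (the VG name is a placeholder) can be run
on each host to print the seqno and creation time recorded in its local backup,
so the freshest copy can be identified:

    import re

    VG_NAME = "REPLACE_WITH_VG_NAME"   # placeholder - use the real VG name
    path = "/etc/lvm/backup/" + VG_NAME

    with open(path) as f:
        text = f.read()

    # LVM backup files record the metadata sequence number and creation time.
    seqno = re.search(r"seqno = (\d+)", text)
    ctime = re.search(r"creation_time = (\d+)\s*#\s*(.*)", text)

    print("backup for %s" % VG_NAME)
    print("  seqno: %s" % (seqno.group(1) if seqno else "not found"))
    print("  creation time: %s" % (ctime.group(2).strip() if ctime else "not found"))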

The metadata of the pv should not contain any sensitive info.

(Originally by Nir Soffer)

Comment 4 RHV bug bot 2020-03-08 09:44:32 UTC
I already ran vgcfgrestore on the LUNs to be able to recover vms. Would the PV data still be relevant?

(Originally by oliver.albl)

Comment 5 RHV bug bot 2020-03-08 09:44:36 UTC
(In reply to oliver.albl from comment #4)
> I already ran vgcfgrestore on the LUNs to be able to recover vms. Would the
> PV data still be relevant?

The PV metadata contains all the changes made to the pv, including the date, the
host, and the lvm version for every change. If we are lucky, we will have the complete
history of this PV. If we are less lucky, some of the history will have been deleted by
the restore operation. I hope we will have enough info to understand why the PV
disappeared.

You can see an example of the data here:
https://bugzilla.redhat.com/attachment.cgi?id=1385115

and here you can see how to see all the entries:
https://bugzilla.redhat.com/show_bug.cgi?id=1500395#c3

bug 1500395 is a similar case; the conclusion was that another host running a different
lvm version accessed the LUN and removed the vg/pv.
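
A small sketch along these lines (assuming the dump file is named pv.md, as in
comment 3) can list every metadata header, seqno, creation_host, and creation_time
found in the dump together with its byte offset, so the entries can be collated:

    import re

    PATTERNS = {
        "header": br"# Generated by LVM2 version [^\n]*",
        "seqno": br"seqno = \d+",
        "creation_host": br'creation_host = "[^"]*"',
        "creation_time": br"creation_time = \d+[^\n]*",
    }

    with open("pv.md", "rb") as f:   # the dump created as described in comment 3
        data = f.read()

    matches = []
    for name, pattern in PATTERNS.items():
        for m in re.finditer(pattern, data):
            matches.append((m.start(), name, m.group().decode("utf-8", "replace")))

    # Print every field with its byte offset; fields of one metadata copy sit
    # close to each other, so sorting by offset groups them together.
    for offset, name, text in sorted(matches):
        print("%12d  %-14s %s" % (offset, name, text))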

(Originally by Nir Soffer)

Comment 6 RHV bug bot 2020-03-08 09:44:40 UTC
Created attachment 1407244 [details]
dd dump of two luns

(Originally by oliver.albl)

Comment 7 RHV bug bot 2020-03-08 09:44:46 UTC
Thanks for the dump Oliver!

In LUN204 we see history only from Tue Mar 6 23:19:46 until Wed Mar 7
23:09:00 - is this the date when you restored the metadata?

$ grep -a '# Generated by LVM2 version' LUN204 | head
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:46 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:47 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:48 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:50 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:52 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:52 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:54 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:57 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:19:59 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar  6 23:20:02 2018

$ grep -a '# Generated by LVM2 version' LUN204 | tail
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:07:56 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:06 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:43 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:45 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:46 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:48 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:57 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:58 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:08:59 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 23:09:00 2018

We also have some strange metadata lines:

# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 00:52:45 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:28:12 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 00:52:48 2018

Maybe this is a leftover from older metadata - do you remember anything special about
Tue Feb 27 09:28:12 2018?

# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 01:50:32 2018
# Generated by LVM2 version 2.02.171(237e56b4-9f3a-436a-84b6-0a2b09133b54 {
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 01:50:33 2018

Looks like a leftover from a previous metadata version, not wiped when writing the new metadata.

# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 03:16:37 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon M237e56b4-9f3a-436a-84b6-0a2b09133b54 {
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar  7 03:16:38 2018

Another leftover?

Oliver, can you provide a timeline - when did the domains start to fail? What did you change
in your deployment, and when, in this time frame?

(Originally by Nir Soffer)

Comment 8 RHV bug bot 2020-03-08 09:44:49 UTC
Nir,

I have the following details for LUN 204:

March 8th 2018 around 0.55am CET - Lost access to storage domain on LUN 204
March 8th 2018 around 6.40am CET - Tried to activate the storage domain
March 8th 2018 around 1.00pm CET - Ran vgcfgrestore with creation_time Mon Mar  5 12:54:56 2018.

We run automated tasks against this system, creating/running/dropping vms and snapshots. I do not see anything special at Tue Feb 27 09:28:12 2018.

(Originally by oliver.albl)

Comment 9 RHV bug bot 2020-03-08 09:44:53 UTC
Looking at LUN 205:

1. A lot of history from Feb 12, 14, 27, and 28:

$ grep -a '# Generated by LVM2 version' LUN205
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:57 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018
...
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:38 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:38 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:39 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:41 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:41 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:42 2018
...
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:03 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:28 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:44 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:44 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:30:06 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:30:09 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:00 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:15 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:29 2018
...
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:00:01 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:15 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:17 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:24 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:27 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:38 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:40 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:41 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:44 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:44 2018


2. Then history from Mar 8 - is this the point when you restored the vg?

# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:37:30 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:53:37 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:53:41 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:53:51 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:53:54 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:54:30 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:54:34 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:54:41 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:54:45 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:57:02 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:57:11 2018
...
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:44:56 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:44:56 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:47:27 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:47:27 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:47:33 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 15:47:40 2018
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Fri Mar  9 11:54:42 2018


3. Looking at the hosts that created the changes:

$ egrep -a "creation_host = \".+\"\s+# Linux" LUN205 | sort | uniq -c
     71 creation_host = "cloud5.sq.fabasoft.com"	# Linux cloud5.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
   1771 creation_host = "fabavmhost002"	# Linux fabavmhost002 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
     62 creation_host = "fabavmhost003"	# Linux fabavmhost003 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
      1 creation_host = "fabavmhost008"	# Linux fabavmhost008 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
      1 creation_host = "fabavmhost015"	# Linux fabavmhost015 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
      1 creation_host = "fabavmhost019"	# Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
      1 creation_host = "fabavmhost021"	# Linux fabavmhost021 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
      1 creation_host = "fabavmhost030"	# Linux fabavmhost030 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
      1 creation_host = "sqdx11.sq.fabasoft.com"	# Linux sqdx11.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64

Are these hosts part of the oVirt DC? Did you run the SPM on all of these hosts?


Let's look at the changes from the hosts that made only one change:


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 13:46:43 2018
creation_host = "sqdx11.sq.fabasoft.com"        # Linux sqdx11.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
creation_time = 1518612403      # Wed Feb 14 13:46:43 2018
seqno = 73


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:27:10 2018
creation_host = "fabavmhost021" # Linux fabavmhost021 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
creation_time = 1519810030      # Wed Feb 28 10:27:10 2018
seqno = 1483


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:38:34 2018
creation_host = "fabavmhost015" # Linux fabavmhost015 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
creation_time = 1519810714      # Wed Feb 28 10:38:34 2018
seqno = 1765


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 14:45:29 2018
creation_host = "fabavmhost008" # Linux fabavmhost008 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64
creation_time = 1518615929      # Wed Feb 14 14:45:29 2018
seqno = 74


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018
creation_host = "fabavmhost019" # Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
creation_time = 1519810930      # Wed Feb 28 10:42:10 2018
seqno = 74


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar  8 13:37:30 2018
creation_host = "fabavmhost030" # Linux fabavmhost030 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
creation_time = 1520512650      # Thu Mar  8 13:37:30 2018
seqno = 75


Note: we have 2 entries with the same seqno! (seqno = 74)


Looking at seqno numbers:

$ grep -a 'seqno = ' LUN205
seqno = 1
seqno = 2
seqno = 3
seqno = 4
seqno = 5
seqno = 6
seqno = 7
seqno = 8
seqno = 9
seqno = 10
...
seqno = 1837
seqno = 1838
seqno = 1839
seqno = 1840
seqno = 1841
seqno = 1842
seqno = 1843
seqno = 1844
seqno = 1845
seqno = 1846
seqno = 1847   << note the switch here
seqno = 74
seqno = 75
seqno = 76
seqno = 77
seqno = 78
seqno = 79
seqno = 80
seqno = 81
seqno = 82
seqno = 83
...
seqno = 124
seqno = 125
seqno = 126
seqno = 127
seqno = 128
seqno = 129
seqno = 130
seqno = 131
seqno = 132
seqno = 133
seqno = 134
seqno = 135
seqno = 136


After seqno = 1847 we started over. Let's look at this version:


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018
creation_host = "fabavmhost002" # Linux fabavmhost002 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
creation_time = 1519810930      # Wed Feb 28 10:42:10 2018

b83c159c-4ad6-4613-ba16-bab95ccd10c0 {
id = "813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf"
seqno = 1847
format = "lvm2"
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
tags = ["MDT_LEASETIMESEC=60", "MDT_VERSION=4", "MDT_SDUUID=b83c159c-4ad6-4613-ba16-bab95ccd10c0", "MDT_LOGBLKSIZE=512", "MDT_DESCRIPTION=FABAVMHOST_LUN_205", "MDT_LEASERETRIES=3", "MDT_IOOPTIMEOUTSEC=10", "MDT_LOCKRENEWALINTERVALSEC=5", "MDT_ROLE=Regular", "MDT_PHYBLKSIZE=512", "MDT_PV0=pv:3600601603cc045005d83815aff3a307c&44&uuid:pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3&44&pestart:0&44&pecount:65533&44&mapoffset:0", "MDT_CLASS=Data", "MDT_TYPE=FCP", "MDT_LOCKPOLICY=", "MDT_VGUUID=813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf", "RHAT_storage_domain", "MDT__SHA_CKSUM=536309f4105827dc86058e897c68a613873223a5", "MDT_POOL_UUID=5849b030-626e-47cb-ad90-3ce782d831b3"]
extent_size = 262144
max_lv = 0
max_pv = 0
metadata_copies = 0

physical_volumes {

pv0 {
id = "pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3"
device = "/dev/mapper/3600601603cc045005d83815aff3a307c"
...


The next version is seqno = 74:


# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018
creation_host = "fabavmhost019" # Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64
creation_time = 1519810930      # Wed Feb 28 10:42:10 2018

b83c159c-4ad6-4613-ba16-bab95ccd10c0 {
id = "813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf"
seqno = 74
format = "lvm2"
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
tags = ["MDT_LEASETIMESEC=60", "MDT_VERSION=4", "MDT_SDUUID=b83c159c-4ad6-4613-ba16-bab95ccd10c0", "MDT_LOGBLKSIZE=512", "MDT_DESCRIPTION=FABAVMHOST_LUN_205", "MDT_LEASERETRIES=3", "MDT_IOOPTIMEOUTSEC=10", "MDT_LOCKRENEWALINTERVALSEC=5", "MDT_ROLE=Regular", "MDT_PHYBLKSIZE=512", "MDT_PV0=pv:3600601603cc045005d83815aff3a307c&44&uuid:pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3&44&pestart:0&44&pecount:65533&44&mapoffset:0", "MDT_CLASS=Data", "MDT_TYPE=FCP", "MDT_LOCKPOLICY=", "MDT_VGUUID=813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf", "RHAT_storage_domain", "MDT__SHA_CKSUM=536309f4105827dc86058e897c68a613873223a5", "MDT_POOL_UUID=5849b030-626e-47cb-ad90-3ce782d831b3"]
extent_size = 262144
max_lv = 0
max_pv = 0
metadata_copies = 0

physical_volumes {

pv0 {
id = "pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3"
device = "/dev/mapper/3600601603cc045005d83815aff3a307c"
...


Note that both metadata changes happened in the *same* second
(1519810930 # Wed Feb 28 10:42:10 2018) from these hosts:
- fabavmhost002
- fabavmhost019

Only one of them could be the SPM at this time.

Can you extract the messages from the engine log about the SPM?
(Storage Pool Manager runs on Host xxx.yyy (Address: xxx.yyy), Data Center v42.)

Also, we need the following from both hosts:
- /var/log/vdsm/vdsm.log
- /var/log/messages
- /var/log/sanlock.log

Around that time - Wed Feb 28 10:42:10 2018.

(Originally by Nir Soffer)

Comment 10 RHV bug bot 2020-03-08 09:44:57 UTC
For LUN 205:

February 02nd 2018 around 02.40pm CET - Imported LUN 205 that was detached from another oVirt installation.
February 28th 2018 around 10.40am CET - Lost access to storage domain on LUN 205. In this case we also saw messages regarding the storage system. We did not see those messages when we lost access to the storage domain on LUN 204.
March 3rd 2018 around 05.00pm CET - Managed to switch the storage domain from inactive to maintenance mode
March 8th 2018 around 01.30pm CET - Ran vgcfgrestore with creation_time  Mon Feb 26 14:55:43 2018

All hosts named FABAVMHOST* are part of one oVirt DC. All hosts named *.sq.fabasoft.com are part of another oVirt DC (we detached the storage domain and imported it). Here are the SPM events:

# select log_time,severity,message from audit_log where message like '%Storage Pool%Data Center Production%' order by 1;
          log_time          | severity |                                                                  message
----------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------
 2018-02-12 08:23:46.455+01 |        0 | Storage Pool Manager runs on Host fabavmhost013.fabagl.fabasoft.com (Address: fabavmhost013.fabagl.fabasoft.com), Data Center Production.
 2018-02-14 14:44:56.353+01 |        0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production.
 2018-02-15 17:16:08.397+01 |        0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production.
 2018-02-15 17:45:22.47+01  |        0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production.
 2018-02-15 17:49:43.772+01 |        0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production.
 2018-02-16 11:54:34.9+01   |        0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:42:52.152+01 |        0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:43:02.328+01 |        0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:43:09.621+01 |        0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:43:48.412+01 |        0 | Storage Pool Manager runs on Host fabavmhost020.fabagl.fabasoft.com (Address: fabavmhost020.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:44:15.516+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:44:34.17+01  |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 10:44:42.878+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 11:23:58.188+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-02-28 11:25:48.534+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-01 17:31:40.922+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-01 21:40:47.256+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-02 15:02:12.661+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-02 16:54:28.204+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-06 15:21:46.471+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-06 16:40:48.419+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-06 16:51:38.168+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:07.208+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:19.464+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:25.753+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:31.151+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:47.339+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:54:56.896+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:06.913+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:15.058+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:20.855+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:28.16+01  |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:34.43+01  |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:49.68+01  |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:55:55.388+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:01.199+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:12.104+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:18.472+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:30.562+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:40.967+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:56:53.223+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:57:00.707+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:57:17.991+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:57:29.793+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 00:58:09.52+01  |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-08 06:40:13.027+01 |        0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production.
 2018-03-10 11:06:02.589+01 |        0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production.

I attached the logs from fabavmhost002 and fabavmhost019 (vdsm logs from fabavmhost019 are not available any more).

(Originally by oliver.albl)

Comment 11 RHV bug bot 2020-03-08 09:45:01 UTC
Created attachment 1407476 [details]
Logfiles fabavmhost002 and fabavmhost019

(Originally by oliver.albl)

Comment 13 RHV bug bot 2020-03-08 09:45:11 UTC
Created attachment 1423463 [details]
qe-logs

(In reply to Yaniv Kaul from comment #12)
> Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for
> 4.2.3.
> Elad - was this reproduced by QE? Storage or Scale QE?

Might be. I examined our latest tier3 automation execution. I'm observing the following; not sure it's related. It might be related to an issue with a LUN that was unmasked from the hosts while it was part of a storage connection for an iSCSI domain in the setup. Nir, please take a look.



engine.log:

2018-04-14 19:24:09,971+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), 
Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed) was deactivated by system because it's not visible by any of the hosts.



From around the same time, the following appears in vdsm:

2018-04-14 19:22:44,628+0300 ERROR (hsm/init) [storage.Multipath] Could not resize device 3600a09803830447a4f244c4657595264 (multipath:98)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 96, in resize_devices
    _resize_if_needed(guid)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 120, in _resize_if_needed
    supervdsm.getProxy().resizeMap(name)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in resizeMap
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
Error: Resizing map 'dm-147' failed: out='fail\n' err=''
2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod] Returning last result (misc:395)

(Originally by Elad Ben Aharon)

Comment 14 RHV bug bot 2020-03-08 09:45:15 UTC
(In reply to Elad from comment #13)
> Created attachment 1423463 [details]
> qe-logs
> 
> (In reply to Yaniv Kaul from comment #12)
> > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for
> > 4.2.3.
> > Elad - was this reproduced by QE? Storage or Scale QE?
> 
> Might be. Examined our latest tier3 automation execution. I'm observing the
> following, not sure it's related. It might be related to an issue with a LUN
> that has been unmasked from the hosts while it was part of a storage
> connection for an iSCSI domain in the setup. Nir, please take a look.
> 
> engine.log:
> 
> 2018-04-14 19:24:09,971+03 WARN 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID:
> SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), 
> Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed)
> was deactivated by system because it's not visible by any of the hosts.

I don't think this is related, if you changed a lun configuration and it became
invisible on all hosts, it does not mean that it was corrupted.

> From around the same time, the following appears in vdsm:
> 
> Error: Resizing map 'dm-147' failed: out='fail\n' err=''
> 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod]
> Returning last result (misc:395)

This is likely to happen when a lun is not available.

Elad, I think we need to write a specific script for this test, creating many
thin clones from template. Do we have such test?

We reproduced it if a storage domain which is configured properly becomes
deactivated after some time, and we cannot activate it again.

Did you try to activate the storage domain after fixing the lun zoning?

(Originally by Nir Soffer)

Comment 15 RHV bug bot 2020-03-08 09:45:20 UTC
(In reply to Nir Soffer from comment #14)
> (In reply to Elad from comment #13)
> > Created attachment 1423463 [details]
> > qe-logs
> > 
> > (In reply to Yaniv Kaul from comment #12)
> > > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for
> > > 4.2.3.
> > > Elad - was this reproduced by QE? Storage or Scale QE?
> > 
> > Might be. Examined our latest tier3 automation execution. I'm observing the
> > following, not sure it's related. It might be related to an issue with a LUN
> > that has been unmasked from the hosts while it was part of a storage
> > connection for an iSCSI domain in the setup. Nir, please take a look.
> > 
> > engine.log:
> > 
> > 2018-04-14 19:24:09,971+03 WARN 
> > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> > (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID:
> > SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), 
> > Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed)
> > was deactivated by system because it's not visible by any of the hosts.
> 
> I don't think this is related, if you changed a lun configuration and it
> became
> invisible on all hosts, it does not mean that it was corrupted.
> 
> > From around the same time, the following appears in vdsm:
> > 
> > Error: Resizing map 'dm-147' failed: out='fail\n' err=''
> > 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod]
> > Returning last result (misc:395)
> 
> This is like to happen when a lun is not available.
> 
> Elad, I think we need to write a specific script for this test, creating many
> thin clones from template. Do we have such test?
We can create one
Natalie, please create one, python sdk is an option. 

> We reproduced it if a storage domain which is configured properly becomes
> deactivated after some time, and we cannot activate it again.
> 
> Did you try to activate the storage domain after fixing the lun zoning?
Yes

(Originally by Elad Ben Aharon)

Comment 16 RHV bug bot 2020-03-08 09:45:25 UTC
(In reply to Elad from comment #15)
> (In reply to Nir Soffer from comment #14)
> > (In reply to Elad from comment #13)
> > > Created attachment 1423463 [details]
> > > qe-logs
> > > 
> > > (In reply to Yaniv Kaul from comment #12)
> > > > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for
> > > > 4.2.3.
> > > > Elad - was this reproduced by QE? Storage or Scale QE?
> > > 
> > > Might be. Examined our latest tier3 automation execution. I'm observing the
> > > following, not sure it's related. It might be related to an issue with a LUN
> > > that has been unmasked from the hosts while it was part of a storage
> > > connection for an iSCSI domain in the setup. Nir, please take a look.
> > > 
> > > engine.log:
> > > 
> > > 2018-04-14 19:24:09,971+03 WARN 
> > > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> > > (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID:
> > > SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), 
> > > Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed)
> > > was deactivated by system because it's not visible by any of the hosts.
> > 
> > I don't think this is related, if you changed a lun configuration and it
> > became
> > invisible on all hosts, it does not mean that it was corrupted.
> > 
> > > From around the same time, the following appears in vdsm:
> > > 
> > > Error: Resizing map 'dm-147' failed: out='fail\n' err=''
> > > 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod]
> > > Returning last result (misc:395)
> > 
> > This is like to happen when a lun is not available.
> > 
> > Elad, I think we need to write a specific script for this test, creating many
> > thin clones from template. Do we have such test?

Nir, isn't it the same as creating a VM pool?

> We can create one
> Natalie, please create one, python sdk is an option. 
> 
> > We reproduced it if a storage domain which is configured properly becomes
> > deactivated after some time, and we cannot activate it again.
> > 
> > Did you try to activate the storage domain after fixing the lun zoning?
> Yes

(Originally by Natalie Gavrilov)

Comment 17 RHV bug bot 2020-03-08 09:45:28 UTC
(In reply to Natalie Gavrielov from comment #16)
> > > Elad, I think we need to write a specific script for this test, creating many
> > > thin clones from template. Do we have such test?
> 
> Nir, isn't it the same as creating a VM pool?

Maybe; we need to ask the reporter what flow reproduces this issue.

Oliver, can you explain what "creating many thin clones from template" means?

Are you using the oVirt SDK? Can you show example code you use to create
these clones, so we can reproduce this in the lab and include this flow in our
automated tests?

(Originally by Nir Soffer)

Comment 18 RHV bug bot 2020-03-08 09:45:31 UTC
(In reply to Nir Soffer from comment #17)
> (In reply to Natalie Gavrielov from comment #16)
> > > > Elad, I think we need to write a specific script for this test, creating many
> > > > thin clones from template. Do we have such test?
> > 
> > Nir, isn't it the same as creating a VM pool?
> 
> Maybe, we need to ask the reporter what is the flow that reproduces this
> issue.
> 
> Oliver, can you explain what is "creating many thin clones from template"?
> 
> Are you using the ovirt SDK? can you show an example code you use to create
> these clones, so we can reproduce this in the lab, and include this flow in
> our
> automated tests?


Hi,
we are using the ovirt vagrant provider https://www.ovirt.org/develop/projects/proposals/vagrant-provider
which uses the Ruby SDK.
But I think it should be the same just using the SDK with the following flow:
  - Create thin clones from Template
  - Use the clones (-> extends the disk)
  - Remove the clones (-> discard)
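
A minimal sketch of that flow with the oVirt Python SDK might look like this
(the engine URL, credentials, cluster and template names are placeholders, and
the in-guest workload that extends the disk is not shown):

    import time

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",
        username="admin@internal",
        password="password",
        insecure=True,
    )
    vms_service = connection.system_service().vms_service()

    # 1. Create a thin clone from the template (thin/dependent is the default).
    vm = vms_service.add(types.Vm(
        name="thin-clone-1",
        cluster=types.Cluster(name="Default"),
        template=types.Template(name="mytemplate"),
    ))
    vm_service = vms_service.vm_service(vm.id)

    # Wait until the disks are created and the vm is down, then start it.
    while vm_service.get().status != types.VmStatus.DOWN:
        time.sleep(5)
    vm_service.start()

    # 2. "Use" the clone: the workload inside the guest writes data, which
    #    triggers the thin disk extensions; that part is not shown here.

    # 3. Remove the clone; with discard-after-delete enabled on the storage
    #    domain the LV is discarded on removal.
    vm_service.stop()
    while vm_service.get().status != types.VmStatus.DOWN:
        time.sleep(5)
    vm_service.remove()

    connection.close()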

(Originally by christian.grundmann)

Comment 19 RHV bug bot 2020-03-08 09:45:35 UTC
Please see Christian's comment 18

(Originally by oliver.albl)

Comment 21 RHV bug bot 2020-03-08 09:45:44 UTC
(In reply to christian.grundmann from comment #18)
> (In reply to Nir Soffer from comment #17)
> > (In reply to Natalie Gavrielov from comment #16)
> > > > > Elad, I think we need to write a specific script for this test, creating many
> > > > > thin clones from template. Do we have such test?
> > > 
> > > Nir, isn't it the same as creating a VM pool?
> > 
> > Maybe, we need to ask the reporter what is the flow that reproduces this
> > issue.
> > 
> > Oliver, can you explain what is "creating many thin clones from template"?
> > 
> > Are you using the ovirt SDK? can you show an example code you use to create
> > these clones, so we can reproduce this in the lab, and include this flow in
> > our
> > automated tests?
> 
> 
> Hi,
> we are using the ovirt vagrant provider
> https://www.ovirt.org/develop/projects/proposals/vagrant-provider
> which uses the Ruby SDK.
> But I think it should be the same just using the SDK with the following flow:
>   - Create thin clones from Template
>   - Use the clones (-> extends the disk)
>   - Remove the clones (-> discard)

Thanks for the above; a few more questions:

What's the rate at which these actions were performed? Was there any concurrency involved?

What is the distribution of the templates/VMs to SDs?

Were all disks extended at the same time? If not, what level of concurrency was involved in the disk extensions?

Were all VMs removed at the same time? If not, what level of concurrency was involved in the VM removal?

(Originally by Mordechai Lehrer)

Comment 22 RHV bug bot 2020-03-08 09:45:47 UTC
Yes, there is concurrency in each step.
We create up to 50 VMs with a few seconds' delay between them.

The VMs are created on the same SD as the templates.

The disks get extended at the same time because the use case in every VM is the same.

The removal has the same concurrency as the creation.

(Originally by christian.grundmann)

Comment 23 RHV bug bot 2020-03-08 09:45:51 UTC
I recreated the scenario described above with the help of JMeter on my environment.

I monitored engine log and vdsm logs per each host (3 in my env).

No ERRORs were found in any of the logs.
No failures were seen in the UI.

What my JMeter script did (all tasks were executed concurrently as 50 users):

Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently
Start those VMs
Validate they are up
Power them Down
Validate they are down
Extend Disk per each VM from 10G to 11G
Remove Those VMs

Logs collected by ovirt-logs-collector can be found in a private message below.

(Originally by Ilan Zuckerman)

Comment 26 RHV bug bot 2020-03-08 09:46:03 UTC
(In reply to Ilan Zuckerman from comment #23)
> I recreated the described above scenario with a help of Jmeter on my
> Environment.
> 
> I monitored engine log and vdsm logs per each host (3 in my env).
> 
> No ERRORS were found in neither of logs.
> No failures were seen from the UI.
> 
> What my Jmeter script did (all tasks were executed concurrently as 50 users)
> :
> 
> Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently
> Start those VMs
> Validate they are up
> Power them Down
> Validate they are down
> Extend Disk per each VM from 10G to 11G
> Remove Those VMs

I'm not sure what you did. Can you attach a python script using the oVirt SDK that
we can run on any setup, so we can understand what calls were made and how they were
issued?

Also, what do you mean by concurrency of 50 users? Please show the command line
used to get the concurrent calls.

Another important detail - on the reporter's system wipe-after-delete and
discard-after-delete were enabled. To test this we need storage supporting
discard, and we need to choose these options when creating the disks.

Finally, what was the storage used in this setup?

(Originally by Nir Soffer)

Comment 27 RHV bug bot 2020-03-08 09:46:07 UTC
> (In reply to Nir Soffer from comment #17)
> we are using the ovirt vagrant provider
> https://www.ovirt.org/develop/projects/proposals/vagrant-provider
> which uses the Ruby SDK.
> But I think it should be the same just using the SDK with the following flow:
>   - Create thin clones from Template
>   - Use the clones (-> extends the disk)

What do you mean by use the clones?
What is the workload running on the vm?
How much data is written to the vm disk? At which rate?
How much time is the vm running before it is shut down and removed?

>   - Remove the clones (-> discard)

Do you use both discard-after-delete and wipe-after-delete?

Can you share the script, so we can make sure what we do in the lab matches what
you did on your system?

(Originally by Nir Soffer)

Comment 28 RHV bug bot 2020-03-08 09:46:10 UTC
(In reply to Ilan Zuckerman from comment #23)
> Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently
> Start those VMs
> Validate they are up
> Power them Down
> Validate they are down
> Extend Disk per each VM from 10G to 11G

This does not do any LVM extension. It is not clear what Christian means by
"Use the clones (-> extend the disk)" (see comment 27).

I assume that what we need to do is to run code on the vm that will write data
to the disk, causing the disk to be extended.

Lets wait for more info from Christian on this.

> Remove Those VMs

Since all extensions happen on the SPM host, maybe we can simulate this by simply
running concurrent lvextend commands on a single host, and at the same time
concurrent lvchange --refresh on other hosts, so we don't need to script the engine.

If we find an issue in LVM, we will need such a script that reproduces the issue from
the shell without a dependency on oVirt, so the LVM developers can reproduce it easily.
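
A rough sketch of such a stress script (the VG and LV names are placeholders, and
the same loop would run "lvchange --refresh" instead of lvextend on the non-SPM
hosts):

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    VG = "REPLACE_WITH_VG_NAME"                      # placeholder
    LVS = ["test-lv-%02d" % i for i in range(50)]    # placeholder LV names

    def extend(lv):
        # On the "SPM" host: grow the LV by 1G, as the SPM does for thin disks.
        return subprocess.call(["lvextend", "-L", "+1g", "%s/%s" % (VG, lv)])

    # On the other hosts, run the same loop with:
    #   subprocess.call(["lvchange", "--refresh", "%s/%s" % (VG, lv)])

    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(extend, LVS))

    print("failed: %d" % sum(1 for rc in results if rc != 0))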

(Originally by Nir Soffer)

Comment 29 RHV bug bot 2020-03-08 09:46:14 UTC
(In reply to Nir Soffer from comment #27)
> > (In reply to Nir Soffer from comment #17)
> > we are using the ovirt vagrant provider
> > https://www.ovirt.org/develop/projects/proposals/vagrant-provider
> > which uses the Ruby SDK.
> > But I think it should be the same just using the SDK with the following flow:
> >   - Create thin clones from Template
> >   - Use the clones (-> extends the disk)
> 
> What do you mean by use the clones?
  Running Software and Tests inside the VMs

> What is the workload running on the vm?
  High CPU Usage (8 Cores)
  High Memory Usage(30GB)
  Mixed Windows and Linux Clients and Servers (CentOS, Ubuntu, Server 2012R2, Server 2016, Windows 10)

> How much data is written to the vm disk? which rate?
  Around 10-15 GB in the first 5-10 minutes after VM is created/started

> How much time the vm is running before it is shut down and removed?
  Around 30-60 minutes

> Do you use both discard-after-delete and wipe-after-delete?
  We only use discard-after-delete
 
> Can you share the script, so we can make sure what we do in the lab matches
> what
> you did on your system?
  No, not really; our scripts/Vagrant boxes are very specific to our use cases.

(Originally by christian.grundmann)

Comment 30 RHV bug bot 2020-03-08 09:46:19 UTC
testing jira sync, please ignore

(Originally by Eyal Edri)

Comment 31 RHV bug bot 2020-03-08 09:46:23 UTC
Based on comment 29, I think we need to try to reproduce this again, as follows.

The test should:
- create a lot of vms at the same time (stress the SPM)
- perform a lot of automatic disk extensions at the same time (stress the SPM)
- keep the host cpu busy (this may affect lvm or the SPM)
- perform many disk discards at the same time (I suspect this is related to the issue)


## Setup

- Create storage domain on a FC storage supporting discard, and enable discard on
  this storage domain.

- Create vm with 30g raw disk

- Make sure the vm has at least 15G free in the root file system. If not, resize
  the root file system as needed.

- Create template from this vm.


## Basic flow for every vm

1. Create a vm based on the template.
   Make sure the vm disk is marked for discard-after-delete.

2. Start the vm

3. Wait until the vm is up

4. Perform io in the vm, to cause extension of the disk

Connect to the vm using ssh, and run:

dd if=/dev/urandom of=/var/tmp/data bs=1M count=15360 oflag=direct conv=fsync

We can run additional stuff in the vm during the test to simulate what the reporter is
doing, but I don't think this is related.

5. Wait until the dd command is finished.

The vm disk will be extended by 1G 15 times while dd is running.

It is possible that the vm will pause during the test, but the system should resume it.

6. Shut down the vm

7. Delete the vm

Make sure the vm disk is discarded when the vm is deleted. You can see messages
about discarding the volume in the vdsm log, and using the "ps" command you will see that
vdsm is running "blkdiscard /dev/vgname/lvname" for this volume.

I would first test that running this flow with one vm in a loop for an hour or so works as
expected.


## Complete flow

1. Start the basic flow in 50 threads, with a 10-second delay between thread starts.
2. When a vm is deleted, repeat the basic flow in that thread.
3. Run this for a day or so.
4. Collect logs from all hosts and the engine for inspection.
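
A skeleton of this orchestration in Python (basic_flow() is a placeholder for
steps 1-7 of the basic flow above, e.g. implemented with the oVirt Python SDK
plus ssh into the vm for the dd step):

    import threading
    import time

    THREADS = 50
    START_DELAY = 10          # seconds between thread starts
    RUN_SECONDS = 24 * 3600   # run for about a day

    def basic_flow(worker_id):
        # Placeholder: create a vm from the template, start it, write 15G
        # with dd over ssh, shut the vm down, and delete it (steps 1-7 above).
        pass

    def worker(worker_id):
        deadline = time.time() + RUN_SECONDS
        while time.time() < deadline:
            basic_flow(worker_id)

    threads = []
    for n in range(THREADS):
        t = threading.Thread(target=worker, args=(n,))
        t.start()
        threads.append(t)
        time.sleep(START_DELAY)

    for t in threads:
        t.join()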


## Version to test

We know that discarding disks before 4.2.6 could cause delays in other processes;
this was reported by the same reporter as this bug.

- Reproduce with the version reported in this bug (oVirt 4.2.1) or 4.2.3 if you want
  to test with RHV.

- Test with 4.2.6 - we improved the way volumes are discarded in this version.


It would be best if the test is written using the oVirt Python SDK, so we can inspect the test
scripts before we run the test.

Elad, this looks like work for the scale team, or at least something to run on one of their servers.

(Originally by Nir Soffer)

Comment 32 RHV bug bot 2020-03-08 09:46:26 UTC
Moving to scale as per comment #31

(Originally by Elad Ben Aharon)

Comment 34 RHV bug bot 2020-03-08 09:46:33 UTC
Daniel, this is a test-only bug. If we can reproduce this, we will investigate
further. If we cannot reproduce this, we will close it.

See comment 31, explaining how it should be tested.

(Originally by Nir Soffer)

Comment 35 RHV bug bot 2020-03-08 09:46:39 UTC
We are working on reproducing the bug.
Please advise on the following questions:

1. At the reproduction it says "Create vm with 30g raw disk"; as per http://www.ovirt.org/documentation/admin-guide/chap-Virtual_Machine_Disks/
we can only use QCOW2 with Sparse for the extends. Please advise if we can use QCOW2 with Sparse for the bug reproduction.
2. At the reproduction it says to extend the root file system - is it OK to extend a disk other than the root file system, or is it a must to extend the disk the root FS resides on?
3. Is FC storage a must, or can we use an iSCSI SD?
4. Please advise whether testing only on 4.2.6 is acceptable, since we do not have 4.2.3 currently.

(Originally by Guy Chen)

Comment 36 RHV bug bot 2020-03-08 09:46:43 UTC
(In reply to guy chen from comment #35)
> We are working on reproducing the bug.
> Please advise the following questions :
> 
> 1. At the reproduction it say's "Create vm with 30g raw disk", as for
> http://www.ovirt.org/documentation/admin-guide/chap-Virtual_Machine_Disks/
> we can onlly use QCOW2 with Sparse for the extends, please advise if we can
> use QCOW2 with Sparse for the bug reproduction.

Read comment 31 again.

"Create vm with 30g raw disk" refer to the template. You should create a vm based
on this template. It will have qcow2 disk with the template as the base image.

> 2. At the reproduction it say's to extend the root file system - is it O.k
> to extend a disk other then root file system or it's a must to extend the
> disk the root FS resides on ?

I don't know what "At the reproduction" refers to. Please read comment 31 again. If the
instructions are not clear, quote here the text which is not clear.

> 3. FC storage is a must or can we use iscsi SD ?

Better to use FC so we can create more load (disks will be extended and deleted
faster, so we get more iterations per hour, or need fewer hours to reproduce).

> 4. Please advise it will be test only on 4.2.6 since we do not have 4.2.3
> currently.

I think we should test with the older version, since this increases the chance to
reproduce, and it can show whether the changes in 4.2.6 are related, as I explained in
comment 31.

(Originally by Nir Soffer)

Comment 37 RHV bug bot 2020-03-08 09:46:47 UTC
Thanks Nir.

Regarding number 2, this is the relevant section:

"
- Create vm with 30g raw disk

- Make sure the vm has at least 15G free in the root file system. If not, resize
  the root file system as needed.
"

Instead of 1 disk with 30g, can I have a VM with 2 disks of 30g, where one is bootable and I do the extend on the other?

(Originally by Guy Chen)

Comment 38 RHV bug bot 2020-03-08 09:46:53 UTC
(In reply to guy chen from comment #37)
> Can i have have instead of 1 disk with 30g a VM with 2 disks of 30g, one is
> bootable and on the other i will do the extend on ?

You can, but I don't see why it is better. Preparing the base image is something you
can do once when you create the template; then all vms will use the same image
with at least 15g free in the root file system.

(Originally by Nir Soffer)

Comment 43 RHV bug bot 2020-03-08 09:47:13 UTC
Nir, please take a look and suggest.

(Originally by Daniel Gur)

Comment 44 RHV bug bot 2020-03-08 09:47:20 UTC
(In reply to guy chen from comment #39)
> Run the scenario as discussed.

With the original vdsm version, or the latest 4.2?

> During the removal of the VMS started to get IO error's, after that got read
> only FS that prevents me doing any actions on the SD, logs attached.
> 
> -=>>grep ERROR /var/log/vdsm/vdsm.log
...
> 2018-09-20 06:49:26,346+0000 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH
> getStoragePoolInfo error=(-202, 'Unable to read resource owners', 'IO
> timeout') (dispatcher:86)

Looks like your storage was not accessible or very slow here...

> 2018-09-20 06:51:02,174+0000 ERROR (check/loop) [storage.Monitor] Error
> checking path /dev/f1168ddf-917c-4ef4-b553-e034ee2983ec/metadata
> (monitor:498)

Storage monitoring failed to access this domain metadata lv...

...
> 2018-09-20 07:21:12,023+0000 ERROR (Thread-22) [storage.TaskManager.Task]
> (Task='846d39ec-cd8e-41cb-914f-e8a789ca2ef7') Unexpected error (task:875)
> 2018-09-20 07:21:12,027+0000 ERROR (Thread-22) [storage.Dispatcher] FINISH
> downloadImageFromStream error=Can't persist task:
> (u'846d39ec-cd8e-41cb-914f-e8a789ca2ef7: cannot access/create taskdir
> /rhev/data-center/98dea9a8-0a7e-11e8-b98d-b8ca3a638904/mastersd/master/tasks/
> 846d39ec-cd8e-41cb-914f-e8a789ca2ef7: [Errno 30] Read-only file system',)
> (dispatcher:82)

Looks like the storage was overloaded by the test, and you did not configure multipath,
so the master file system became read-only.

Did you run this setup with FC and multiple paths to storage? iSCSI? Do you have
iSCSI multipathing configured?

(Originally by Nir Soffer)

Comment 45 RHV bug bot 2020-03-08 09:47:24 UTC
re-applying need info for comment 44 https://bugzilla.redhat.com/show_bug.cgi?id=1553133#c44

(Originally by Mordechai Lehrer)

Comment 50 RHV bug bot 2020-03-08 09:47:46 UTC
Looking at vdsm logs, it looks like we reproduced the original issue in the last
test run.

The test was started at 07:53:24 (host time), with SPM on host b01-h17-r620.

1. First extend (b01-h17):

-=>>xzgrep 'extending volume' /var/log/vdsm/vdsm.log.3 | head
2018-10-07 07:55:10,948+0000 INFO  (mailbox-spm/0) [storage.SPM.Messages.Extend] processRequest: extending volume b074676d-dcba-4c0a-92d3-3f32ab8edb15 in domain f1168ddf-917c-4ef4-b553-e034ee2983ec (pool 98dea9a8-0a7e-11e8-b98d-b8ca3a638904) to size 2048 (mailbox:161)


2. The first error in the SPM log is:

2018-10-07 08:02:13,304+0000 ERROR (mailbox-spm/4) [storage.SPM.Messages.Extend] processRequest: Exception caught while trying to extend volume: ab2c4b0c-2709-4726-83bd-5671ff5697f1 in domain: f1168ddf-917c-4ef4-b553-e034ee2983ec (mailbox:172)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 166, in processRequest
    pool.extendVolume(volume['domainID'], volume['volumeID'], size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1420, in extendVolume
    sdCache.produce(sdUUID).extendVolume(volumeUUID, size, isShuttingDown)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1564, in extendVolume
    lvm.extendLV(self.sdUUID, volumeUUID, size)  # , isShuttingDown)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1224, in extendLV
    extent_size = int(vg.extent_size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 82, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: f1168ddf-917c-4ef4-b553-e034ee2983ec

This error means that this VG was invalidated and we failed to reload the VG metadata.
Unfortunately, the original error from the failed reload was never logged, so we
have no information about the first failure.
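
For context, this is roughly the pattern behind that AttributeError (a sketch only,
not the actual vdsm code): a failed reload leaves a stub object in the cache, and
the original lvm error is already gone by the time the stub is used.

    class Unreadable(object):
        """Placeholder cached when reloading VG/LV metadata fails."""

        def __init__(self, name):
            self.name = name

        def __getattr__(self, attr):
            # Called only for attributes that were never set, i.e. everything
            # except self.name - so the failure surfaces later, without the
            # original lvm error.
            raise AttributeError("Failed reload: %s" % self.name)

    vg = Unreadable("f1168ddf-917c-4ef4-b553-e034ee2983ec")
    # vg.extent_size would now raise: AttributeError: Failed reload: f1168ddf-...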


3. Trying to access this VG using vgs fails with:

-=>>vgs f1168ddf-917c-4ef4-b553-e034ee2983ec
  /dev/mapper/3600a098038304437415d4b6a59676d51: Checksum error at offset 22089216
  Couldn't read volume group metadata from /dev/mapper/3600a098038304437415d4b6a59676d51.
  Metadata location on /dev/mapper/3600a098038304437415d4b6a59676d51 at 22089216 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/3600a098038304437415d4b6a59676d51
  Failed to scan VG from /dev/mapper/3600a098038304437415d4b6a59676d51
  Volume group "f1168ddf-917c-4ef4-b553-e034ee2983ec" not found
  Cannot process volume group f1168ddf-917c-4ef4-b553-e034ee2983ec


4. Using lsblk we can still see the contents of the corrupted VG:

sdd                                                                                     8:48   0     1T  0 disk  
└─3600a098038304437415d4b6a59676d51                                                   253:4    0     1T  0 mpath 
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-metadata                                 253:17   0   512M  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-ids                                      253:18   0   128M  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-leases                                   253:19   0     2G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-outbox                                   253:20   0   128M  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-xleases                                  253:21   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-inbox                                    253:22   0   128M  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-master                                   253:23   0     1G  0 lvm   /rhev/data-center/mnt/blockSD/f1168ddf-917c-4ef4-b553-e034ee2983ec/master
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-5f94fc20--93ce--4694--9e75--6e367eecd7a3 253:38   0    50G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-e1d81892--57e3--4c9c--bef2--bfae08fa4c67 253:39   0    10G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-cb2c6de9--82b2--4467--9d7b--696d5523eff6 253:40   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-dfc7d9b7--b916--4565--8aa0--98f73f814aee 253:41   0     9G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-aae97ceb--af8d--4d8a--ace5--496cb93f09ae 253:42   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-b9582616--78cc--440e--b1f0--2f2e195faeb1 253:43   0     8G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-025d51bd--4d25--4393--850f--d1da09cf1c49 253:44   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-5afb917d--0e51--4cde--9608--0563be327d95 253:45   0     6G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-e4e4e606--c6b2--4503--b4f7--6eae0a1d0e42 253:46   0     4G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-b074676d--dcba--4c0a--92d3--3f32ab8edb15 253:47   0    16G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-68243011--06df--4e86--8ad4--714b89170d5b 253:48   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-d1c2a11e--75c8--4851--903e--ce251552793a 253:49   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-452169f5--0967--473a--8a1e--a4b3922ab915 253:50   0     3G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-4a5db397--2d03--4fa8--a1aa--842ac2dd2c8a 253:51   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-4530f234--1204--42a8--905e--e4ba284bf045 253:52   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-85db5760--6e55--40b4--9319--d7cbc11ab493 253:53   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-0a584504--e2a6--4df2--a03c--81bfbef59c23 253:54   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-ca7e84cb--7ab0--4102--b38e--a6b63fe65943 253:55   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-96f491a1--922c--4297--a075--39c5b757ecdd 253:56   0    16G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-80faa090--d967--4a4c--a017--a354b2f6416c 253:57   0     1G  0 lvm   
  ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-77c695cd--7126--44c2--b5d0--dd7989d6526d 253:58   0     1G  0 lvm   
  └─f1168ddf--917c--4ef4--b553--e034ee2983ec-915b0e60--ac97--4c4a--aad6--a619663857ee 253:60   0    16G  0 lvm   


The attached patch (https://gerrit.ovirt.org/c/94751/) improves logging, showing
detailed errors when reloading VGs and LVs fails.

(Originally by Nir Soffer)

Comment 51 RHV bug bot 2020-03-08 09:47:50 UTC
Created attachment 1491367 [details]
vdsm log from SPM during test

(Originally by Nir Soffer)

Comment 52 RHV bug bot 2020-03-08 09:47:55 UTC
Created attachment 1491368 [details]
dump of lvm metadata area 1

This was created using:

    dd if=/dev/mapper/3600a098038304437415d4b6a59676d51 \
        of=3600a098038304437415d4b6a59676d51.metadata.1 \
        bs=1M count=129 iflag=direct

(Originally by Nir Soffer)

Comment 53 RHV bug bot 2020-03-08 09:47:59 UTC
Created attachment 1491369 [details]
dump of lvm metadata area 2

This was created using:

    dd if=/dev/mapper/3600a098038304437415d4b6a59676d51 \
        of=3600a098038304437415d4b6a59676d51.metadata.2 \
        bs=1M skip=$((8191*128)) count=128 iflag=direct

(Originally by Nir Soffer)

Comment 54 RHV bug bot 2020-03-08 09:48:05 UTC
Created attachment 1491370 [details]
lvm backup for the corrupted vg

(Originally by Nir Soffer)

Comment 55 RHV bug bot 2020-03-08 09:48:09 UTC
The patch only improves logging; we are still investigating the root cause.

(Originally by Nir Soffer)

Comment 56 RHV bug bot 2020-03-08 09:48:13 UTC
I want to do another run with the attached patch on all hosts, and a clean
LUN. This may help to detect the reason for this corruption.

Please do this:

1. Install vdsm on all hosts from this build:
https://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/1018/artifact/exported-artifacts/

2. Add a new LUN for this test

3. Zero the entire LUN - we don't want old junk data to be mixed with lvm metadata:

    blkdiscard -z -p 32m /dev/mapper/xxxyyy

4. Create new storage domain from this LUN - best in a new DC. We don't want
    to touch the old corrupted VG.

5. Change vdsm log level to DEBUG - edit /etc/vdsm/logger.conf, and set:

[logger_root]
level=DEBUG
handlers=syslog,logthread
propagate=0

[logger_vds]
level=DEBUG
handlers=syslog,logthread
qualname=vds
propagate=0

[logger_storage]
level=DEBUG
handlers=logthread
qualname=storage
propagate=0

6. Run the test again.

If we are lucky we will reproduce this again and we will have better logs
and cleaner metadata to investigate.

(Originally by Nir Soffer)

Comment 57 RHV bug bot 2020-03-08 09:48:17 UTC
Removing Performance and TestOnly since this is not about performance, and it looks
like we reproduced the issue, so this is not a test-only bug.

(Originally by Nir Soffer)

Comment 61 RHV bug bot 2020-03-08 09:48:35 UTC
Created attachment 1511697 [details]
logs-kaminario

Reproduced with Kaminario storage.

Storage migration failed with Failed reload domain:


2018-12-04 15:20:20,033+0200 ERROR (monitor/9745f66) [storage.Monitor] Error checking domain 9745f66e-8d8e-4257-a652-4346601ecd4f (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1637, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 958, in __init__
    manifest = self.manifestClass(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 405, in __init__
    sd.StorageDomainManifest.__init__(self, sdUUID, domaindir, metadata)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 327, in __init__
    self._domainLock = self._makeDomainLock()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 530, in _makeDomainLock
    domVersion = self.getVersion()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 401, in getVersion
    return self.getMetaParam(DMDK_VERSION)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 398, in getMetaParam
    return self._metadata[key]
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 91, in __getitem__
    return dec(self._dict[key])
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 201, in __getitem__
    with self._accessWrapper():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 154, in _accessWrapper
    self.refresh()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 232, in refresh
    lines = self._metaRW.readlines()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 271, in readlines
    for tag in vg.tags:
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 81, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: 9745f66e-8d8e-4257-a652-4346601ecd4f



After that, the VG is reported as [unknown] with Checksum error at offset..


2018-12-04 15:20:40,419+0200 ERROR (monitor/9745f66) [storage.Monitor] Error checking domain 9745f66e-8d8e-4257-a652-4346601ecd4f (monitor:424)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1637, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1577, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'9745f66e-8d8e-4257-a652-4346601ecd4f',)





[root@puma45 ~]# pvs
  /dev/mapper/20024f40058540033: Checksum error at offset 65298432
  Couldn't read volume group metadata.
  /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016
  Couldn't read volume group metadata.
  /dev/mapper/20024f40058540033: Checksum error at offset 65298432
  Couldn't read volume group metadata.
  /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016
  Couldn't read volume group metadata.
  /dev/mapper/20024f40058540033: Checksum error at offset 65298432
  Couldn't read volume group metadata.
  /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016
  Couldn't read volume group metadata.
  WARNING: PV /dev/mapper/20024f40058540033 is marked in use but no VG was found using it.
  WARNING: PV /dev/mapper/20024f40058540033 might need repairing.
  PV                                            VG                                   Fmt  Attr PSize   PFree
  /dev/mapper/20024f40058540033                 [unknown]                            lvm2 u--   15.00t    0 
  /dev/mapper/360002ac0000000000000009600021f6b f30b23ee-d99c-43a5-9794-fd22010e8d46 lvm2 a--  <12.00t 7.97t
  /dev/sda3                                     VolGroup01                           lvm2 a--  461.07g    0




This issue also keeps repeating in RHV automation on NetApp, as reported in bug 1637405, and now we see it with other storage as well.




vdsm-4.20.28-1.el7ev.x86_64
kernel - 3.10.0-862.11.6.el7.x86_64 
device-mapper-multipath-0.4.9-119.el7_5.1.x86_64
lvm2-2.02.177-4.el7.x86_64

(Originally by Elad Ben Aharon)

Comment 62 RHV bug bot 2020-03-08 09:48:41 UTC
I could reproduce this issue outside of oVirt using the attached reproducer script.

The script simulates the extend flow in an oVirt system using 2 hosts:
- one host runs the manager extend flow - this host runs commands like
  lvcreate, lvextend, and lvremove
- the other host runs the regular host refresh flow - this host runs only
  "lvchange --available y" once and many "lvchange --refresh" commands.
  We expect both of these commands to be read-only.
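
For readers without the attachment, a stripped-down sketch of the two flows
(the attached extend.py is the authoritative version; it also passes the full
--config with a devices filter and runs many concurrent workers, and the VG/LV
names and sizes below are placeholders taken from the logs):

    import subprocess
    import sys
    import time

    VG = "test-vg"
    LV = "worker-000"

    def run(*args):
        # Run a single lvm command (simplified: no --config, no filter).
        subprocess.check_call(["lvm"] + list(args))

    def manager_flow(iterations=20):
        # Read-write flow, run only on the manager host.
        run("lvcreate", "--size", "128m", "--name", LV, VG)
        for _ in range(iterations):
            run("lvextend", "--size", "+128m", "%s/%s" % (VG, LV))
        run("lvremove", "-f", "%s/%s" % (VG, LV))

    def regular_flow(iterations=20):
        # Read-only flow, run concurrently on the other host.
        run("lvchange", "--available", "y", "%s/%s" % (VG, LV))
        for _ in range(iterations):
            run("lvchange", "--refresh", "%s/%s" % (VG, LV))
            time.sleep(1)

    if __name__ == "__main__":
        {"run-manager": manager_flow, "run-regular": regular_flow}[sys.argv[1]]()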

Here are the results of an example run:

1. Create test vg

# python extend.py create-vg test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d}
2019-01-04 16:29:31,213 INFO    (MainThread) creating vg test-vg

2. Start manager flow on this host:

# time python extend.py run-manager test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d} > extend-run-manager.log 2>&1

3. After a few seconds, start the regular flow on another host:

# time python extend.py run-regular test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d} > extend-run-regular.log 2>&1

The manager flow fails after 6 seconds with a corrupted VG.


Here are errors from the manager node log:

Traceback (most recent call last):
  File "extend.py", line 310, in manager_worker
    lvm.extend_lv(lv_name, options.lv_size_mb)
  File "extend.py", line 157, in extend_lv
    full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvextend', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--autobackup', 'n', '--size', '+128m', 'test-vg/worker-003'] failed
rc=5
out=
err=  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 1994752
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 1994752 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  Volume group "test-vg" not found
  Cannot process volume group test-vg
...
2019-01-04 16:30:45,661 ERROR   (worker-038) failed
Traceback (most recent call last):
  File "extend.py", line 310, in manager_worker
    lvm.extend_lv(lv_name, options.lv_size_mb)
  File "extend.py", line 157, in extend_lv
    full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvextend', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_
count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d
32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  reta
in_min=50  retain_days=0 } ', '--autobackup', 'n', '--size', '+128m', 'test-vg/worker-038'] failed
rc=5
out=
err=  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 143 vs previous 142.
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Volume group "test-vg" not found
  Cannot process volume group test-vg


Here are some errors from the regular node run:


2019-01-04 16:30:45,691 ERROR   (worker-034) failed to refresh worker-034
Traceback (most recent call last):
  File "extend.py", line 276, in regular_worker
    lvm.refresh_lv(lv_name)
  File "extend.py", line 162, in refresh_lv
    return self.run("lvchange", "--refresh", full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--refresh', 'test-vg/worker-034'] failed
rc=5
out=
err=  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Volume group "test-vg" not found
  Cannot process volume group test-vg
...
2019-01-04 16:30:45,729 ERROR   (worker-015) failed
Traceback (most recent call last):
  File "extend.py", line 272, in regular_worker
    lvm.activate_lv(lv_name)
  File "extend.py", line 171, in activate_lv
    return self.run("lvchange", "--available", "y", full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--available', 'y', 'test-vg/worker-015'] failed
rc=5
out=
err=  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 88 vs previous 87.
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg


My theory is that the "lvchange --refresh" command sees inconsistent metadata
on storage, since the storage is being modified very quickly by the manager host,
and tries to "recover" the metadata:

    Recovery of volume group "test-vg" failed.

This corrupts the VG metadata, since only the manager host may change the VG metadata.

Tested on CentOS 7.6:

# rpm -qa | grep lvm2
lvm2-2.02.180-10.el7_6.2.x86_64
udisks2-lvm2-2.7.3-8.el7.x86_64
lvm2-libs-2.02.180-10.el7_6.2.x86_64


David, do we have a way to ensure that lvchange is readonly like "lvs --readonly"?

Or maybe we need to use the (deprecated) global:locking_type=4 option for all
commands on regular nodes?

(Originally by Nir Soffer)

Comment 63 RHV bug bot 2020-03-08 09:48:45 UTC
Created attachment 1518421 [details]
reproducer script

(Originally by Nir Soffer)

Comment 64 RHV bug bot 2020-03-08 09:48:49 UTC
Created attachment 1518422 [details]
log from reproducer script running manager flow

(Originally by Nir Soffer)

Comment 65 RHV bug bot 2020-03-08 09:48:53 UTC
Created attachment 1518423 [details]
log from reproducer script running regular node flow

(Originally by Nir Soffer)

Comment 66 RHV bug bot 2020-03-08 09:48:57 UTC
In RHEL7 locking_type=4 is still the way to do this (it's deprecated in RHEL8).

Reading a VG has this hidden and dangerous mis-feature of writing the VG if it happens to see something wrong with it.  I've been meaning to fix this for a long time, but there was a lot of preparatory work required.  It happens that over the past couple of weeks I finally began working on this and am nearly finished.  Reading a VG will no longer write it and will just use a good copy of the metadata.  An explicit metadata repair command will be required to fix corrupt or stale metadata on a PV.

(Originally by David Teigland)

Comment 67 RHV bug bot 2020-03-08 09:49:01 UTC
Created attachment 1518426 [details]
reproducer script with locking_type 4 change

The same reproducer script using locking_type=1 for manager flows and locking_type=4 for regular node flows.

(Originally by Nir Soffer)

Comment 68 RHV bug bot 2020-03-08 09:49:07 UTC
Created attachment 1518427 [details]
log from reproducer script  with locking_type 4 running manager flow

(Originally by Nir Soffer)

Comment 69 RHV bug bot 2020-03-08 09:49:11 UTC
Created attachment 1518429 [details]
log from reproducer script  with locking_type 4 running regular node flow

This log from the regular node shows about a 36% error rate refreshing an LV
while the LV is being manipulated on the manager node.

$ xzgrep -E 'INFO.+refreshing' extend-run-regular.log.xz | wc -l
54822

$ xzgrep -E 'ERROR.+failed to refresh' extend-run-regular.log.xz | wc -l
19785

(Originally by Nir Soffer)

Comment 70 RHV bug bot 2020-03-08 09:49:18 UTC
You can also try setting --config global/metadata_read_only=1 in the refresh command.

(Originally by David Teigland)

Comment 71 RHV bug bot 2020-03-08 09:49:21 UTC
(In reply to David Teigland from comment #66)
Thanks David, I wonder how this issue did not pop up earlier.

I added locking_type=4 to the new reproducer script (attachment 1518426 [details]).

With this, the script successfully completed creating, extending, and deleting
500 LVs, with 50 concurrent threads on the manager node and 50 concurrent
refresh threads on the regular node. You can see the log in
attachment 1518427 [details].

However, on the regular node about 36% of the read-only lvchange operations
failed, as can be seen in attachment 1518429 [details].

Most of the errors were in "lvchange --refresh":

$ xzgrep CommandError extend-run-regular.log.xz | grep '\--refresh' | wc -l
19785
$ xzgrep CommandError extend-run-regular.log.xz | grep '\--available' | wc -l
11


Here are some example failures:

CommandError: Command ['lvm', 'lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=4  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--refresh', 'test-vg/worker-006'] failed
rc=5
out=
err=  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 109 vs previous 108.
  Read-only locking type set. Write locks are prohibited.
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg

CommandError: Command ['lvm', 'lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=4  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--available', 'y', 'test-vg/worker-001'] failed
rc=5
out=
err=  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 99 vs previous 98.
  Read-only locking type set. Write locks are prohibited.
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg


If we apply locking_type=4 to regular nodes in vdsm, we will have a new issue
with these refresh failures. This may fail randomly in many flows with 
unpredictable results. Way better than corrupting vg metadata, but still an
issue.

It looks like "lvchange --refresh" is not trying hard enough to deal with 
concurent remote updates, doing multiple reads from storage and failing if storage
was changed between the reads. I would expect that when running with locking_type=4
it will retry to read from storage if storage was changed in the middle of
the operation, and fail only after reasonable amount of retries.

For now I think we can treat failures of read-only commands on regular nodes as
temporary, log a warning, and retry the operation 2-3 times before failing the flow.

Based on the info in this bug, the corruption rate was once every few weeks, so
these errors would probably be very rare, and the cost of retrying the operation
would be negligible.

David, can we configure lvm to be more robust regarding remote updates in RHEL 7?

Is there any advantage to using --config global/metadata_read_only=1?

(Originally by Nir Soffer)

Comment 72 RHV bug bot 2020-03-08 09:49:25 UTC
Before the scanning improvements in 7.6, lvm would ignore the checksum errors during the scan and continue reading the VG.  When reading the VG it would re-read the disks and might not see the same problem, or it would potentially try to repair the metadata.  Without locking_type=4 or readonly, that repair could end up corrupting the metadata.

The scanning/reading of metadata has needed an overhaul for a long time.  I completed the scanning rewrite, and now am doing the reading rewrite.  7.6 fell between the two, and this seems to be one effect of the incomplete transition.

Given the way vdsm uses lvm, without locking between reads and writes across nodes, we naturally expect to see issues like this.  The new reading code will allow this to work because it will just take and use a good copy of the metadata when doing reporting or activation/refresh commands (and it won't ever try to repair something while reading it.)  Even when this work is done, if you are unlucky, all the copies of metadata seen by the refresh command could be partially written and fail with a checksum error, leaving the command with nothing to use.  We could look into adding an automatic retry for this case, but that's still no guarantee, it just reduces the chances.

For now I suspect that retrying the refresh command in vdsm is the only option.

(Originally by David Teigland)

Comment 73 RHV bug bot 2020-03-08 09:49:29 UTC
(In reply to David Teigland from comment #70)
> You can also try setting --config global/metadata_read_only=1 in the refresh
> command.

I tried metadata_read_only=1, and it is not allowed for lvchange --refresh:

CommandError: Command ['lvm', 'lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0  metadata_read_only=1 } backup {  retain_min=50  retain_days=0 } ', '--refresh', 'test-vg/worker-006'] failed
rc=5
out=
err=  Operation prohibited while global/metadata_read_only is set.
  Failed to suspend test-vg/worker-006.

So locking_type=4 seems to be the only way for RHEL 7.


(In reply to David Teigland from comment #72)
> The way vdsm uses lvm without locking between reads and writes between
> nodes, we naturally expect to see issues like this.  The new reading code
> will allow this to work because it will just take and use a good copy of the
> metadata when doing reporting or activation/refresh commands (and it won't
> ever try to repair something while reading it.)  Even when this work is
> done, if you are unlucky, all the copies of metadata seen by the refresh
> command could be partially written and fail with a checksum error, 

How can we see partly written metadata?

I thought that lvm does the following:
- take the metadata lock
- write the metadata content to the next area in the circular buffer
- if the write was successful, write a new metadata header pointing to the new
  metadata content
- release the metadata lock

With this flow partial metadata is not possible, and metadata changes are
atomic.

(Originally by Nir Soffer)

Comment 74 RHV bug bot 2020-03-08 09:49:34 UTC
Correct, but you are modifying and reading the metadata from different hosts, right?  In that case you do not have a lock to prevent reading at any point during the modification of the buffer or header.

(Originally by David Teigland)

Comment 75 RHV bug bot 2020-03-08 09:49:39 UTC
We need to check if limiting the number of concurrent lvm commands
(https://gerrit.ovirt.org/c/66932/) can also lower the chance of hitting this
issue. This could be an easy change that we can ship in 4.2.z. I don't think
we can support locking_type=4 in 4.2.z.

(Originally by Nir Soffer)

Comment 76 RHV bug bot 2020-03-08 09:49:44 UTC
Moving to 4.2.9 given comment #75. When the 4.2.8 async version becomes available, we'll try to push the concurrent-command-limiting patch there after it has been tested.

(Originally by Tal Nisan)

Comment 77 RHV bug bot 2020-03-08 09:49:47 UTC
I tested limiting the number of concurrent lvm operations to 10, and it does
not prevent the corruption.

I created an improved simulator for locking type 4, which better simulates
what the real system does during the extend flow.

The simulator has 2 modes:
- run-manager - simulates manager node flows
- run-regular - simulates regular node extend flow

The run-manager command starts a socket server, waiting for client
commands.

The run-regular command starts multiple workers based on the --concurrency
option. Every worker connects to the server and performs this flow:
    
    for every iteration:
        create LV
        activate LV
        repeat 20 times:
            extend LV
            refresh LV
            sleep --extend-delay seconds
        deactivate LV
        remove LV

The read-write operations (create LV, extend LV, remove LV) are 
performed on the manager node. The read-only operations (activate LV,
refresh LV, deactivate LV) are performed locally.

Read-only commands are performed using locking_type=4, ensuring that LVM
does not attempt to recover the metadata if a command finds the storage in
an inconsistent state. The command will fail in this case.

To handle the expected failures of read-only commands, we use a retry
mechanism with exponential back-off. When a command fails, we wait and
retry it; each time the command fails we double the wait time to let the
manager finish pending metadata updates.
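
For illustration, a minimal sketch of this retry mechanism (not the simulator's
actual code; the --config string is shortened and the VG/LV name is a placeholder):

    import subprocess
    import time

    def run_readonly(cmd, retries=3, delay=1.0):
        # Run a read-only lvm command. If it fails (e.g. because the manager
        # was rewriting the metadata at that moment), wait and retry,
        # doubling the wait each time.
        for attempt in range(retries + 1):
            p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            out, err = p.communicate()
            if p.returncode == 0:
                return out
            if attempt == retries:
                raise RuntimeError("%s failed: %s" % (cmd, err))
            time.sleep(delay)
            delay *= 2

    # Example (assumes the simulator's test VG and LV exist):
    run_readonly(["lvm", "lvchange",
                  "--config", "global { locking_type=4 }",
                  "--refresh", "test-vg/worker-001"])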

I tested creating 50,000 LVs and then extending and refreshing each of them 20
times, for a total of 1,000,000 successful operations at a rate of 29.7
operations per second.

I also did some short runs creating 500 LVs, to check the retry failure rate
at lower operation rates.

Here is a table with the results from all runs:

extend-delay  extends  retries  max-retry  total-time  extend-rate  retry-rate
------------------------------------------------------------------------------
           1  1000000   237250          8       33677        29.69      21.56%
           2    10000     1027          5         491        20.36       9.33%
           5    10000      221          2        1138         8.78       2.00%
          10    10000      100          2        2220         4.50       0.90%
          20    10000       35          1        4382         2.28       0.31%
          40    10000        8          1        8585         1.16       0.09%

This shows that locking type 4 works well even with extreme load, but we must
be prepared to handle failures of lvm commands running in read-only mode.

The next step is to support locking type 4 in vdsm:

- When the host runs as SPM, all lvm commands must run in read-write mode,
  recovering VG metadata if needed.

- When the host is not the SPM, all lvm commands must run in read-only mode so
  they do not try to recover VG metadata (see the sketch after this list).

- When running in read-only mode, retry failed commands. Testing shows that 1-2
  retries are enough under normal load.

- When the host changes read-only mode, wait for ongoing commands running in the
  old mode.

- Creating and removing storage domains can be done on any host. These flows
  should be able to override lvm read-only mode if they do not run on the SPM.
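
As a rough illustration of the SPM / non-SPM split above, a minimal sketch
(function names and the shortened --config string are illustrative only; the
real vdsm command line carries the full devices filter, as in the logs above):

    def lvm_config(is_spm):
        # locking_type=1 allows metadata writes (SPM host only);
        # locking_type=4 makes all locks read-only, so lvm will not try to
        # "recover" metadata that a read-only command merely scans.
        locking_type = 1 if is_spm else 4
        return ("global { locking_type=%d prioritise_write_locks=1 "
                "wait_for_locks=1 use_lvmetad=0 }" % locking_type)

    def lvm_command(args, is_spm):
        # args is e.g. ["lvchange", "--refresh", "vg-name/lv-name"]
        return ["lvm", args[0], "--config", lvm_config(is_spm)] + args[1:]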


Open issues:

- Before the SPM can start, it must be able to activate the special LVs. If
  there is an issue with the VG metadata, activating the special LVs may fail,
  since we must activate them in read-only mode.

- To acquire the SPM lease, we must be able to activate the ids and leases
  special volumes, but we may fail to activate them in read-only mode.

(Originally by Nir Soffer)

Comment 78 RHV bug bot 2020-03-08 09:49:53 UTC
Created attachment 1519091 [details]
locking type 4 simulator

(Originally by Nir Soffer)

Comment 79 RHV bug bot 2020-03-08 09:49:56 UTC
Created attachment 1519092 [details]
log for running 50 threads, 1000 lvs per thread, 1 second extend delay

(Originally by Nir Soffer)

Comment 80 RHV bug bot 2020-03-08 09:50:00 UTC
Created attachment 1519093 [details]
log for running 50 threads, 100 lvs per thread, 2 second extend delay

(Originally by Nir Soffer)

Comment 81 RHV bug bot 2020-03-08 09:50:03 UTC
Created attachment 1519094 [details]
log for running 50 threads, 100 lvs per thread, 5 second extend delay

(Originally by Nir Soffer)

Comment 82 RHV bug bot 2020-03-08 09:50:07 UTC
Created attachment 1519095 [details]
log for running 50 threads, 100 lvs per thread, 10 second extend delay

(Originally by Nir Soffer)

Comment 83 RHV bug bot 2020-03-08 09:50:10 UTC
Created attachment 1519096 [details]
log for running 50 threads, 100 lvs per thread, 20 second extend delay

(Originally by Nir Soffer)

Comment 84 RHV bug bot 2020-03-08 09:50:15 UTC
Created attachment 1519097 [details]
log for running 50 threads, 100 lvs per thread, 40 second extend delay

(Originally by Nir Soffer)

Comment 85 RHV bug bot 2020-03-08 09:50:20 UTC
Nir, does this mean support will also need to use locking_type 4 when issuing LVM commands that modify the metadata, and we should also use metadata_read_only when giving commands to just read data?

Maybe this is another reason to get those tools merged...

(Originally by Germano Veit Michel)

Comment 86 RHV bug bot 2020-03-08 09:50:23 UTC
(In reply to Germano Veit Michel from comment #85)
Yes, using raw lvm commands on a system without lvm filter is risky.
See also changes in vdsm sos plugin:
https://gerrit.ovirt.org/c/96774/
https://github.com/sosreport/sos/issues/1533

Maybe we need to have /usr/share/vdsm/lvm/lvm.conf with the proper setting
for accessing shared storage. This can be used by lvm commands like this:

    LVM_SYSTEM_DIR=/usr/share/vdsm/lvm lvs -o +tags

Or add "vdsm-tool lvm" wrapper that will do the right thing.

(Originally by Nir Soffer)

Comment 87 RHV bug bot 2020-03-08 09:50:29 UTC
Tal, Nir: as the 4.2.8 final build has already arrived, should we change the target to 4.3?

(Originally by Daniel Gur)

Comment 88 RHV bug bot 2020-03-08 09:50:32 UTC
(In reply to Daniel Gur from comment #87)
> Tal, Nir As 4.2.8 Final build is already arrived should we change the Target
> top 4.3?

This is targeted to 4.2.8 async release

(Originally by Tal Nisan)

Comment 89 RHV bug bot 2020-03-08 09:50:36 UTC
Looking forward to a resolution of this issue; RE: WoRH CASE 02265434.

(Originally by vendors)

Comment 90 RHV bug bot 2020-03-08 09:50:39 UTC
(In reply to Nir Soffer from comment #86)
> (In reply to Germano Veit Michel from comment #85)
> Yes, using raw lvm commands on a system without lvm filter is risky.
> See also changes in vdsm sos plugin:
> https://gerrit.ovirt.org/c/96774/
> https://github.com/sosreport/sos/issues/1533
> 
> Maybe we need to have /usr/share/vdsm/lvm/lvm.conf with the proper setting
> for accessing shared storage. This can be used by lvm commands like this:
> 
>     LVM_SYSTEM_DIR=/usr/share/vdsm/lvm lvs -o +tags
> 
> Or add "vdsm-tool lvm" wrapper that will do the right thing.

Thanks Nir. Nice catch on the lvm2 plugin too.

Ideally I want all of support to use the snapshot-tools; then no manual lvm commands will be run and everything is done via the vdsm API.
But in the meantime, I'll start writing an article explaining these changes and how we should run LVM commands. We already use some options for filters and lvmetad in our "base" syntax.

(Originally by Germano Veit Michel)

Comment 91 RHV bug bot 2020-03-08 09:50:43 UTC
Targeting to 4.2.8-2 when it will be available

(Originally by Fred Rolland)

Comment 92 RHV bug bot 2020-03-08 09:50:46 UTC
Is this still to be fixed? If so, can you please re-target to 4.2.8-3 ?

(Originally by Sandro Bonazzola)

Comment 93 RHV bug bot 2020-03-08 09:50:51 UTC
(In reply to Sandro Bonazzola from comment #92)
> Is this still to be fixed? If so, can you please re-target to 4.2.8-3 ?

We are working on the master fix. When we have something for master we will
evaluate again if this can be delivered in 4.2.

(Originally by Nir Soffer)

Comment 95 RHV bug bot 2020-03-08 09:50:59 UTC
Can't - there is no 4.2.9 version yet. I suspect we might have to change the product to RHEV. Aileen?

(Originally by Tal Nisan)

Comment 98 RHV bug bot 2020-03-08 09:51:12 UTC
sync2jira

(Originally by Daniel Gur)

Comment 99 RHV bug bot 2020-03-08 09:51:16 UTC
sync2jira

(Originally by Daniel Gur)

Comment 100 RHV bug bot 2020-03-08 09:51:19 UTC
Please move to 4.4

(Originally by Mordechai Lehrer)

Comment 104 RHV bug bot 2020-03-08 09:51:35 UTC
pm_ack+

(Originally by Peter Lauterbach)

Comment 108 RHV bug bot 2020-03-08 09:51:51 UTC
(In reply to Allie DeVolder from comment #107)

Can we get vdsm logs from all hosts in the DC at the time of the error?

Or, if this is easier, the output of:

    xzgrep WARN /var/log/vdsm/vdsm.log*

And the estimated time of the first error about bad checksum.

If the issue is indeed LVM corrupting its metadata during a read-only command
such as "lvs" or "lvchange --refresh", we expect to see warnings like:

    WARNING: Inconsistent metadata found for VG xxx-yyy-zzz - updating
    to use version 42

To see these warnings, you need to run vdsm 4.30.20.

The log was added in:

commit 0d6d17dbe3d926460b13f0e31f4c97d55572d09a
Author: Nir Soffer <nsoffer>
Date:   Wed May 29 00:26:11 2019 +0300

    lvm: Log LVM warnings for successful commands

(Originally by Nir Soffer)

Comment 109 RHV bug bot 2020-03-08 09:51:54 UTC
nijin, please check comment 108. Did we see these warnings in other cases?

(Originally by Nir Soffer)

Comment 138 RHV bug bot 2020-03-08 09:53:49 UTC
To fix this in RHV we need to use locking_type=4 in all LVM commands running on
non-SPM hosts. However, the pvs command cannot be used with locking_type=4 in some cases.

We depend on bug 1809660. Once this fix is available in RHEL 7.7 we can require it
and use locking_type=4.

(Originally by Nir Soffer)

Comment 141 Daniel Gur 2020-03-08 10:04:46 UTC
Last weekend QE already received what should be the last build of 4.3.9, so this bug will probably need an async build, an additional build, or to be moved to 4.3.10.

Comment 145 David Vaanunu 2020-03-22 12:32:50 UTC
Verify versions:
rhv-release-4.3.9-7-001
vdsm-4.30.43-1
lvm2-2.02.186-7.el7_8.1

We have hosts with RHEL 7.7 & RHEL 7.8.

I ran the tests on both RHEL versions in parallel:
 * 2 hosts with RHEL 7.7
 * 2 hosts with RHEL 7.8


Test list:
  * 10 workers, 100 iterations - Passed
  * 50 workers, 100 iterations - Failed ("Extreme test")
  * 50 workers, 100 iterations, delay 10 - Partially passed: only one error (RHEL 7.8); need Nir's input


2020-03-21 21:19:01,239 ERROR   (worker-002) worker failed
Traceback (most recent call last):
  File "extend_new.py", line 480, in regular_worker
    lvm.refresh_lv(lv_name)
  File "extend_new.py", line 138, in refresh_lv
    return self.run("lvchange", "--refresh", full_name)
  File "extend_new.py", line 219, in run
    return run(cmd)
  File "extend_new.py", line 509, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvchange', '--config', ' devices {  preferred_names = ["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=[ "a|^/dev/mapper/3600a098038304437415d4b6a59684962$|", "r|.*|" ] } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 } ', '--refresh', 'VG78/worker-002-000080'] failed
rc=5
out=
err=  /dev/mapper/3600a098038304437415d4b6a59684962: Checksum error at offset 7425536
  Couldn't read volume group metadata from /dev/mapper/3600a098038304437415d4b6a59684962.
  Metadata location on /dev/mapper/3600a098038304437415d4b6a59684962 at 7425536 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/3600a098038304437415d4b6a59684962
  Failed to scan VG from /dev/mapper/3600a098038304437415d4b6a59684962
  Volume group "VG78" not found
  Cannot process volume group VG78



Log files + Summarize results:
https://drive.google.com/drive/u/0/folders/12BzuEpQVuHYCp1gCff_K8YJJ3DwyVpI0
(RHEL7_Results_100Iters - Include commands, warning & errors)

Comment 146 Nir Soffer 2020-03-22 13:11:34 UTC
(In reply to David Vaanunu from comment #145)
The reproducer script does not use RHV, so the results are not affected
by the fix in RHV.

The reproducer is expected to fail also on RHEL 7.8.

It would be nice to reproduce this with RHV, but we never succeeded in doing this,
so we don't have a good way to verify this change with RHV.

It would be useful to run our standard scale test with this version to make
sure this version does not introduce performance regressions.

Comment 147 David Vaanunu 2020-04-02 15:44:32 UTC
Verify versions:
rhv-release-4.3.9-7-001
vdsm-4.30.43-1

Running 50 users with the flow below:
1. Create VM 
2. Create Snapshots (without memory)
3. Delete VM

Test duration: 18 hours.
The problem was not reproduced.



Also, tested on an older version:
RHV-4.3.6.-7 
vdsm-4.30.30

Running 50 users with the flow below:
1. Create VM 
2. Create Snapshots (without memory)
3. Delete VM

Test duration: 19.5 hours.
The problem was not reproduced.



Link to summary doc:
https://docs.google.com/document/d/1sbMC54-1KfM-g-WF9AIIJv9vXUA0Ls3VIhHB7C4SpLM/edit?ts=5e85c8db#

Comment 149 errata-xmlrpc 2020-04-02 17:09:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1312