Bug 1553133
Description
oliver.albl
2018-03-08 11:54:11 UTC
Can we have the logs, please? Created attachment 1405862 [details]
engine and vdsm logs
Oliver, can you share the first 129 MiB of the bad PV? You can copy it like this:

dd if=/dev/mapper/xxxyyy of=pv.md bs=1M count=129 iflag=direct conv=fsync

And then compress the generated file (e.g. xz pv.md). The output may be small enough to attach to the bug; if not, please share it via Dropbox or a similar service. It will also help if you share the backups of this vg metadata from /etc/lvm/backup. The problem with the backup is that you will have to search for the latest backup on all the hosts running as SPM in your setup. The last host used as SPM with this vg will have the most up to date backup info. The metadata of the pv should not contain any sensitive info.

I already ran vgcfgrestore on the LUNs to be able to recover vms. Would the PV data still be relevant?

(In reply to oliver.albl from comment #4)
> I already ran vgcfgrestore on the LUNs to be able to recover vms. Would the
> PV data still be relevant?

The PV metadata contains all the changes made to the pv, including the date, the host, and the lvm version for every change. If we are lucky, we will have a complete history of this PV. If we are less lucky, some of the history will have been deleted by the restore operation. I hope we will have enough info to understand why the PV disappeared. You can see an example of the data here: https://bugzilla.redhat.com/attachment.cgi?id=1385115 and here you can see how to list all the entries: https://bugzilla.redhat.com/show_bug.cgi?id=1500395#c3 Bug 1500395 is a similar case; the conclusion there was that another host running a different lvm version accessed the LUN and removed the vg/pv.

Created attachment 1407244 [details]
dd dump of two luns
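On the backup side, the most recent copy of the VG metadata will be on whichever host last acted as SPM, so the same check has to be repeated on every candidate host. Below is a minimal Python sketch (not part of vdsm; it assumes the standard LVM locations /etc/lvm/backup/<vgname> and /etc/lvm/archive/<vgname>_*.vg, and takes the VG name as an argument) that prints the newest backup and archive entry found on a host:

# Sketch: print the newest LVM metadata backup/archive files for a VG.
# Run on every host that may have acted as SPM; standard LVM paths assumed.
import glob
import os
import sys

def newest(paths):
    # (mtime, path) of the most recently modified existing file, or None.
    found = [(os.path.getmtime(p), p) for p in paths if os.path.isfile(p)]
    return max(found) if found else None

vg_name = sys.argv[1]

backup = newest(["/etc/lvm/backup/%s" % vg_name])
archive = newest(glob.glob("/etc/lvm/archive/%s_*.vg" % vg_name))

for label, entry in (("backup", backup), ("archive", archive)):
    if entry:
        print("%s: %s (mtime %d)" % (label, entry[1], entry[0]))
    else:
        print("%s: not found" % label)

The host whose files have the newest mtime is the one whose backup is worth attaching.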
Thanks for the dump Oliver! In LUN204 we see this history only from Tue Mar 6 23:19:46 until Wed Mar 7 23:09:00, is this the date when you restored the metadata? $ grep -a '# Generated by LVM2 version' LUN204 | head # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:46 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:47 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:48 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:50 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:52 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:52 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:54 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:57 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:19:59 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Mar 6 23:20:02 2018 $ grep -a '# Generated by LVM2 version' LUN204 | tail # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:07:56 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:06 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:43 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:45 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:46 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:48 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:57 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:58 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:08:59 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 23:09:00 2018 We also have some strange metadata lines: # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 00:52:45 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:28:12 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 00:52:48 2018 Maybe this is leftover from older metadata - do you remember something special about Tue Feb 27 09:28:12 2018? # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 01:50:32 2018 # Generated by LVM2 version 2.02.171(237e56b4-9f3a-436a-84b6-0a2b09133b54 { # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 01:50:33 2018 Looks like leftover form previous metadata version, not wiped when writing new metadata. # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 03:16:37 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon M237e56b4-9f3a-436a-84b6-0a2b09133b54 { # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Mar 7 03:16:38 2018 Another leftover? Oliver, can provide timeline - when the domains started to fail? what did you change in your deployment and when in this time frame? Nir, I have the following details for LUN 204: March 8th 2018 around 0.55am CET - Lost access to storage domain on LUN 204 March 8th 2018 around 6.40am CET - Tried to activate the storage domain March 8th 2018 around 1.00pm CET - Ran vgcfgrestore with creation_time Mon Mar 5 12:54:56 2018. We run automated tasks against this system, creating/running/dropping vms and snapshots. I do not see anything special at Tue Feb 27 09:28:12 2018. 
Looking at LUN 205: 1. Lot of history from Feb 12, 14, 27, 28, $ grep -a '# Generated by LVM2 version' LUN205 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:57 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:58 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Mon Feb 12 13:36:59 2018 ... # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:38 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:38 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:39 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:40 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:41 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:41 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 12:23:42 2018 ... # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:03 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:28 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:44 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:29:44 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:30:06 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:30:09 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:00 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:15 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Tue Feb 27 09:34:29 2018 ... # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:00:01 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:15 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:17 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:24 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:27 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:38 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:40 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:41 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:44 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 00:01:44 2018 2. Then history from Mar 8 - is this the point when you restored the vg? 
# Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:37:30 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:53:37 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:53:41 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:53:51 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:53:54 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:54:30 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:54:34 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:54:41 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:54:45 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:57:02 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:57:11 2018 ... # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:44:56 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:44:56 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:47:27 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:47:27 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:47:33 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 15:47:40 2018 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Fri Mar 9 11:54:42 2018 3. Looking at hosts creating the changes $ egrep -a "creation_host = \".+\"\s+# Linux" LUN205 | sort | uniq -c 71 creation_host = "cloud5.sq.fabasoft.com" # Linux cloud5.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 1771 creation_host = "fabavmhost002" # Linux fabavmhost002 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 62 creation_host = "fabavmhost003" # Linux fabavmhost003 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 1 creation_host = "fabavmhost008" # Linux fabavmhost008 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 1 creation_host = "fabavmhost015" # Linux fabavmhost015 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 1 creation_host = "fabavmhost019" # Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 1 creation_host = "fabavmhost021" # Linux fabavmhost021 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 1 creation_host = "fabavmhost030" # Linux fabavmhost030 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 1 creation_host = "sqdx11.sq.fabasoft.com" # Linux sqdx11.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 Are these hosts part of the oVirt DC? did you run SPM on all of these hosts? 
Lets look at the changes of the hosts that did 1 change: # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 13:46:43 2018 creation_host = "sqdx11.sq.fabasoft.com" # Linux sqdx11.sq.fabasoft.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 creation_time = 1518612403 # Wed Feb 14 13:46:43 2018 seqno = 73 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:27:10 2018 creation_host = "fabavmhost021" # Linux fabavmhost021 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 creation_time = 1519810030 # Wed Feb 28 10:27:10 2018 seqno = 1483 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:38:34 2018 creation_host = "fabavmhost015" # Linux fabavmhost015 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 creation_time = 1519810714 # Wed Feb 28 10:38:34 2018 seqno = 1765 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 14 14:45:29 2018 creation_host = "fabavmhost008" # Linux fabavmhost008 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 creation_time = 1518615929 # Wed Feb 14 14:45:29 2018 seqno = 74 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018 creation_host = "fabavmhost019" # Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 creation_time = 1519810930 # Wed Feb 28 10:42:10 2018 seqno = 74 # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Thu Mar 8 13:37:30 2018 creation_host = "fabavmhost030" # Linux fabavmhost030 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 creation_time = 1520512650 # Thu Mar 8 13:37:30 2018 seqno = 75 Note: we have 2 entries with the same seqno! (seqno = 74) Looking at seqno numbers: $ grep -a 'seqno = ' LUN205 seqno = 1 seqno = 2 seqno = 3 seqno = 4 seqno = 5 seqno = 6 seqno = 7 seqno = 8 seqno = 9 seqno = 10 ... seqno = 1837 seqno = 1838 seqno = 1839 seqno = 1840 seqno = 1841 seqno = 1842 seqno = 1843 seqno = 1844 seqno = 1845 seqno = 1846 seqno = 1847 << note the switch here seqno = 74 seqno = 75 seqno = 76 seqno = 77 seqno = 78 seqno = 79 seqno = 80 seqno = 81 seqno = 82 seqno = 83 ... seqno = 124 seqno = 125 seqno = 126 seqno = 127 seqno = 128 seqno = 129 seqno = 130 seqno = 131 seqno = 132 seqno = 133 seqno = 134 seqno = 135 seqno = 136 After seqno = 1847 we started over. 
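This check can also be scripted so it is easy to repeat on other dumps: walk the raw metadata area, collect every seqno in on-disk order, and report the places where the sequence goes backwards (such as the 1847 -> 74 switch above) or repeats. A minimal sketch, assuming a dump file like LUN205 produced with the dd command given earlier in this bug; it only automates the grep workflow shown here:

# Sketch: flag seqno resets/repeats in a raw LVM metadata dump.
# Usage: python check_seqno.py LUN205
import re
import sys

with open(sys.argv[1], "rb") as f:
    data = f.read()

# seqno entries appear in on-disk order inside the metadata area.
seqnos = [int(m.group(1)) for m in re.finditer(br"seqno = (\d+)", data)]

prev = None
for i, seqno in enumerate(seqnos):
    if prev is not None and seqno <= prev:
        print("entry %d: seqno went from %d to %d" % (i, prev, seqno))
    prev = seqno

print("total seqno entries: %d" % len(seqnos))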
Lets look at this version: # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018 creation_host = "fabavmhost002" # Linux fabavmhost002 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 creation_time = 1519810930 # Wed Feb 28 10:42:10 2018 b83c159c-4ad6-4613-ba16-bab95ccd10c0 { id = "813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf" seqno = 1847 format = "lvm2" status = ["RESIZEABLE", "READ", "WRITE"] flags = [] tags = ["MDT_LEASETIMESEC=60", "MDT_VERSION=4", "MDT_SDUUID=b83c159c-4ad6-4613-ba16-bab95ccd10c0", "MDT_LOGBLKSIZE=512", "MDT_DESCRIPTION=FABAVMHOST_LUN_205", "MDT_LEASERETRIES=3", "MDT_IOOPTIMEOUTSEC=10", "MDT_LOCKRENEWALINTERVALSEC=5", "MDT_ROLE=Regular", "MDT_PHYBLKSIZE=512", "MDT_PV0=pv:3600601603cc045005d83815aff3a307c&44&uuid:pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3&44&pestart:0&44&pecount:65533&44&mapoffset:0", "MDT_CLASS=Data", "MDT_TYPE=FCP", "MDT_LOCKPOLICY=", "MDT_VGUUID=813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf", "RHAT_storage_domain", "MDT__SHA_CKSUM=536309f4105827dc86058e897c68a613873223a5", "MDT_POOL_UUID=5849b030-626e-47cb-ad90-3ce782d831b3"] extent_size = 262144 max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3" device = "/dev/mapper/3600601603cc045005d83815aff3a307c" ... Next version is seqo = 74: # Generated by LVM2 version 2.02.171(2)-RHEL7 (2017-05-03): Wed Feb 28 10:42:10 2018 creation_host = "fabavmhost019" # Linux fabavmhost019 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 creation_time = 1519810930 # Wed Feb 28 10:42:10 2018 b83c159c-4ad6-4613-ba16-bab95ccd10c0 { id = "813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf" seqno = 74 format = "lvm2" status = ["RESIZEABLE", "READ", "WRITE"] flags = [] tags = ["MDT_LEASETIMESEC=60", "MDT_VERSION=4", "MDT_SDUUID=b83c159c-4ad6-4613-ba16-bab95ccd10c0", "MDT_LOGBLKSIZE=512", "MDT_DESCRIPTION=FABAVMHOST_LUN_205", "MDT_LEASERETRIES=3", "MDT_IOOPTIMEOUTSEC=10", "MDT_LOCKRENEWALINTERVALSEC=5", "MDT_ROLE=Regular", "MDT_PHYBLKSIZE=512", "MDT_PV0=pv:3600601603cc045005d83815aff3a307c&44&uuid:pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3&44&pestart:0&44&pecount:65533&44&mapoffset:0", "MDT_CLASS=Data", "MDT_TYPE=FCP", "MDT_LOCKPOLICY=", "MDT_VGUUID=813oRe-64r8-mloU-k9G2-LFsS-dXSG-hpN4kf", "RHAT_storage_domain", "MDT__SHA_CKSUM=536309f4105827dc86058e897c68a613873223a5", "MDT_POOL_UUID=5849b030-626e-47cb-ad90-3ce782d831b3"] extent_size = 262144 max_lv = 0 max_pv = 0 metadata_copies = 0 physical_volumes { pv0 { id = "pyN2Sr-QOh0-8QSX-tShd-x3oI-dh9B-Ck59H3" device = "/dev/mapper/3600601603cc045005d83815aff3a307c" ... Note that both metadata changes happened in the *same* second (1519810930 # Wed Feb 28 10:42:10 2018) from these hosts: - fabavmhost002 - fabavmhost019 Only one of them could be the SPM at this time. Can you extract messages from engine log about the SPM? (Storage Pool Manager runs on Host xxx.yyy (Address: xxx.yyy), Data Center v42.) Also we need from both hosts - /var/log/vdsm/vdsm.log - /var/log/messages - /var/log/sanlock.log Around that time - Wed Feb 28 10:42:10 2018. For LUN 205: February 02nd 2018 around 02.40pm CET - Imported LUN 205 that was detached from another oVirt installation. February 28th 2018 around 10.40am CET - Lost access to storage domain on LUN 205. In this case we also saw messages regaring the storage system. We did not see those messages when we lost access to storage domain on LUN 204. 
March 3rd 2018 around 05.00pm CET - Managed to switch the storage domain from inactive to maintenance mode March 8th 2018 around 01.30pm CET - Ran vgcfgrestore with creation_time Mon Feb 26 14:55:43 2018 All hosts named FABAVMHOST* are part of one oVirt DC. All hosts named *.sq.fabasoft.com are part of another oVirt DC (we detached the storage domain and imported it). Here are the SPM events: # select log_time,severity,message from audit_log where message like '%Storage Pool%Data Center Production%' order by 1; log_time | severity | message ----------------------------+----------+------------------------------------------------------------------------------------------------------------------------------------------- 2018-02-12 08:23:46.455+01 | 0 | Storage Pool Manager runs on Host fabavmhost013.fabagl.fabasoft.com (Address: fabavmhost013.fabagl.fabasoft.com), Data Center Production. 2018-02-14 14:44:56.353+01 | 0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production. 2018-02-15 17:16:08.397+01 | 0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production. 2018-02-15 17:45:22.47+01 | 0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production. 2018-02-15 17:49:43.772+01 | 0 | Storage Pool Manager runs on Host fabavmhost008.fabagl.fabasoft.com (Address: fabavmhost008.fabagl.fabasoft.com), Data Center Production. 2018-02-16 11:54:34.9+01 | 0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:42:52.152+01 | 0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:43:02.328+01 | 0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:43:09.621+01 | 0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:43:48.412+01 | 0 | Storage Pool Manager runs on Host fabavmhost020.fabagl.fabasoft.com (Address: fabavmhost020.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:44:15.516+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:44:34.17+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-02-28 10:44:42.878+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-02-28 11:23:58.188+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-02-28 11:25:48.534+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-01 17:31:40.922+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 
2018-03-01 21:40:47.256+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-02 15:02:12.661+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-02 16:54:28.204+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-06 15:21:46.471+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-06 16:40:48.419+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-06 16:51:38.168+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:07.208+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:19.464+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:25.753+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:31.151+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:47.339+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:54:56.896+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:06.913+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:15.058+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:20.855+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:28.16+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:34.43+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:49.68+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:55:55.388+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:56:01.199+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 
2018-03-08 00:56:12.104+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:56:18.472+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:56:30.562+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:56:40.967+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:56:53.223+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:57:00.707+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:57:17.991+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:57:29.793+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 00:58:09.52+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-08 06:40:13.027+01 | 0 | Storage Pool Manager runs on Host fabavmhost003.fabagl.fabasoft.com (Address: fabavmhost003.fabagl.fabasoft.com), Data Center Production. 2018-03-10 11:06:02.589+01 | 0 | Storage Pool Manager runs on Host fabavmhost002.fabagl.fabasoft.com (Address: fabavmhost002.fabagl.fabasoft.com), Data Center Production. I attached the logs from fabavmhost002 and fabavmhost019 (vdsm logs from fabavmhost019 are not available any more). Created attachment 1407476 [details]
Logfiles fabavmhost002 and fabavmhost019
Created attachment 1423463 [details] qe-logs (In reply to Yaniv Kaul from comment #12) > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for > 4.2.3. > Elad - was this reproduced by QE? Storage or Scale QE? Might be. Examined our latest tier3 automation execution. I'm observing the following, not sure it's related. It might be related to an issue with a LUN that has been unmasked from the hosts while it was part of a storage connection for an iSCSI domain in the setup. Nir, please take a look. engine.log: 2018-04-14 19:24:09,971+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID: SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed) was deactivated by system because it's not visible by any of the hosts. From around the same time, the following appears in vdsm: 2018-04-14 19:22:44,628+0300 ERROR (hsm/init) [storage.Multipath] Could not resize device 3600a09803830447a4f244c4657595264 (multipath:98) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 96, in resize_devices _resize_if_needed(guid) File "/usr/lib/python2.7/site-packages/vdsm/storage/multipath.py", line 120, in _resize_if_needed supervdsm.getProxy().resizeMap(name) File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__ return callMethod() File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda> **kwargs) File "<string>", line 2, in resizeMap File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod raise convert_to_error(kind, result) Error: Resizing map 'dm-147' failed: out='fail\n' err='' 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod] Returning last result (misc:395) (In reply to Elad from comment #13) > Created attachment 1423463 [details] > qe-logs > > (In reply to Yaniv Kaul from comment #12) > > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for > > 4.2.3. > > Elad - was this reproduced by QE? Storage or Scale QE? > > Might be. Examined our latest tier3 automation execution. I'm observing the > following, not sure it's related. It might be related to an issue with a LUN > that has been unmasked from the hosts while it was part of a storage > connection for an iSCSI domain in the setup. Nir, please take a look. > > engine.log: > > 2018-04-14 19:24:09,971+03 WARN > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID: > SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), > Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed) > was deactivated by system because it's not visible by any of the hosts. I don't think this is related, if you changed a lun configuration and it became invisible on all hosts, it does not mean that it was corrupted. > From around the same time, the following appears in vdsm: > > Error: Resizing map 'dm-147' failed: out='fail\n' err='' > 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod] > Returning last result (misc:395) This is like to happen when a lun is not available. Elad, I think we need to write a specific script for this test, creating many thin clones from template. Do we have such test? 
We reproduced it if a storage domain which is configured properly becomes deactivated after some time, and we cannot activate it again. Did you try to activate the storage domain after fixing the lun zoning? (In reply to Nir Soffer from comment #14) > (In reply to Elad from comment #13) > > Created attachment 1423463 [details] > > qe-logs > > > > (In reply to Yaniv Kaul from comment #12) > > > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for > > > 4.2.3. > > > Elad - was this reproduced by QE? Storage or Scale QE? > > > > Might be. Examined our latest tier3 automation execution. I'm observing the > > following, not sure it's related. It might be related to an issue with a LUN > > that has been unmasked from the hosts while it was part of a storage > > connection for an iSCSI domain in the setup. Nir, please take a look. > > > > engine.log: > > > > 2018-04-14 19:24:09,971+03 WARN > > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > > (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID: > > SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), > > Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed) > > was deactivated by system because it's not visible by any of the hosts. > > I don't think this is related, if you changed a lun configuration and it > became > invisible on all hosts, it does not mean that it was corrupted. > > > From around the same time, the following appears in vdsm: > > > > Error: Resizing map 'dm-147' failed: out='fail\n' err='' > > 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod] > > Returning last result (misc:395) > > This is like to happen when a lun is not available. > > Elad, I think we need to write a specific script for this test, creating many > thin clones from template. Do we have such test? We can create one Natalie, please create one, python sdk is an option. > We reproduced it if a storage domain which is configured properly becomes > deactivated after some time, and we cannot activate it again. > > Did you try to activate the storage domain after fixing the lun zoning? Yes (In reply to Elad from comment #15) > (In reply to Nir Soffer from comment #14) > > (In reply to Elad from comment #13) > > > Created attachment 1423463 [details] > > > qe-logs > > > > > > (In reply to Yaniv Kaul from comment #12) > > > > Yaniv, I'm moving this to 4.2.4 - we are not going to be able to have it for > > > > 4.2.3. > > > > Elad - was this reproduced by QE? Storage or Scale QE? > > > > > > Might be. Examined our latest tier3 automation execution. I'm observing the > > > following, not sure it's related. It might be related to an issue with a LUN > > > that has been unmasked from the hosts while it was part of a storage > > > connection for an iSCSI domain in the setup. Nir, please take a look. > > > > > > engine.log: > > > > > > 2018-04-14 19:24:09,971+03 WARN > > > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > > > (EE-ManagedThreadFactory-engine-Thread-29518) [7b4b5cfb] EVENT_ID: > > > SYSTEM_DEACTIVATED_STORAGE_DOMAIN(970), > > > Storage Domain sd_TestCase10147_1418564266 (Data Center golden_env_mixed) > > > was deactivated by system because it's not visible by any of the hosts. > > > > I don't think this is related, if you changed a lun configuration and it > > became > > invisible on all hosts, it does not mean that it was corrupted. 
> > > > > From around the same time, the following appears in vdsm: > > > > > > Error: Resizing map 'dm-147' failed: out='fail\n' err='' > > > 2018-04-14 19:22:44,629+0300 DEBUG (hsm/init) [storage.SamplingMethod] > > > Returning last result (misc:395) > > > > This is like to happen when a lun is not available. > > > > Elad, I think we need to write a specific script for this test, creating many > > thin clones from template. Do we have such test? Nir, isn't it the same as creating a VM pool? > We can create one > Natalie, please create one, python sdk is an option. > > > We reproduced it if a storage domain which is configured properly becomes > > deactivated after some time, and we cannot activate it again. > > > > Did you try to activate the storage domain after fixing the lun zoning? > Yes (In reply to Natalie Gavrielov from comment #16) > > > Elad, I think we need to write a specific script for this test, creating many > > > thin clones from template. Do we have such test? > > Nir, isn't it the same as creating a VM pool? Maybe, we need to ask the reporter what is the flow that reproduces this issue. Oliver, can you explain what is "creating many thin clones from template"? Are you using the ovirt SDK? can you show an example code you use to create these clones, so we can reproduce this in the lab, and include this flow in our automated tests? (In reply to Nir Soffer from comment #17) > (In reply to Natalie Gavrielov from comment #16) > > > > Elad, I think we need to write a specific script for this test, creating many > > > > thin clones from template. Do we have such test? > > > > Nir, isn't it the same as creating a VM pool? > > Maybe, we need to ask the reporter what is the flow that reproduces this > issue. > > Oliver, can you explain what is "creating many thin clones from template"? > > Are you using the ovirt SDK? can you show an example code you use to create > these clones, so we can reproduce this in the lab, and include this flow in > our > automated tests? Hi, we are using the ovirt vagrant provider https://www.ovirt.org/develop/projects/proposals/vagrant-provider which uses the Ruby SDK. But I think it should be the same just using the SDK with the following flow: - Create thin clones from Template - Use the clones (-> extends the disk) - Remove the clones (-> discard) Please see Christian's comment 18 (In reply to christian.grundmann from comment #18) > (In reply to Nir Soffer from comment #17) > > (In reply to Natalie Gavrielov from comment #16) > > > > > Elad, I think we need to write a specific script for this test, creating many > > > > > thin clones from template. Do we have such test? > > > > > > Nir, isn't it the same as creating a VM pool? > > > > Maybe, we need to ask the reporter what is the flow that reproduces this > > issue. > > > > Oliver, can you explain what is "creating many thin clones from template"? > > > > Are you using the ovirt SDK? can you show an example code you use to create > > these clones, so we can reproduce this in the lab, and include this flow in > > our > > automated tests? > > > Hi, > we are using the ovirt vagrant provider > https://www.ovirt.org/develop/projects/proposals/vagrant-provider > which uses the Ruby SDK. > But I think it should be the same just using the SDK with the following flow: > - Create thin clones from Template > - Use the clones (-> extends the disk) > - Remove the clones (-> discard) Thanks for the above a few more questions: Whats the rate that these actions were performed - was there any concurrency involved? 
What is the distribution of the templates/VMs to SDs? Were all disks extended at same time? if not what level of concurrency was involved in the disks extension? Were all VMs removed at same time? if not what level of concurrency was invoked in the VM removal? Yes there is concurrency in each step. We create up to 50 VMs with a few seconds delay The VMs are created on the same SD as the templates. The disk will get extended at the same time because the use cases in every VM is the same. The removal has the same concurrency as the creation. I recreated the described above scenario with a help of Jmeter on my Environment. I monitored engine log and vdsm logs per each host (3 in my env). No ERRORS were found in neither of logs. No failures were seen from the UI. What my Jmeter script did (all tasks were executed concurrently as 50 users) : Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently Start those VMs Validate they are up Power them Down Validate they are down Extend Disk per each VM from 10G to 11G Remove Those VMs logs collected by ovirt-logs-collector can be found in private message below. (In reply to Ilan Zuckerman from comment #23) > I recreated the described above scenario with a help of Jmeter on my > Environment. > > I monitored engine log and vdsm logs per each host (3 in my env). > > No ERRORS were found in neither of logs. > No failures were seen from the UI. > > What my Jmeter script did (all tasks were executed concurrently as 50 users) > : > > Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently > Start those VMs > Validate they are up > Power them Down > Validate they are down > Extend Disk per each VM from 10G to 11G > Remove Those VMs I'm not sure what you did. Can you attach a python script using oVirt SDK that we can run on any setup, and understand what call were made and how they were issued? Also, what do you mean by concurrency of 50 users? Please show the command line used to get the concurrent calls. Another important detail - on the reporter system wipe-after-delete and discard-after-delete were enabled. To test this we need storage supporting discard, and we need to choose these options when creating the disks. Finally, what was the storage used in this setup? > (In reply to Nir Soffer from comment #17) > we are using the ovirt vagrant provider > https://www.ovirt.org/develop/projects/proposals/vagrant-provider > which uses the Ruby SDK. > But I think it should be the same just using the SDK with the following flow: > - Create thin clones from Template > - Use the clones (-> extends the disk) What do you mean by use the clones? What is the workload running on the vm? How much data is written to the vm disk? which rate? How much time the vim is running before it is shut down and removed? > - Remove the clones (-> discard) Do you use both discard-after-delete and wipe-after-delete? Can you share the script, so we can make sure what we do in the lab matches what you did on your system? (In reply to Ilan Zuckerman from comment #23) > Create 50 VMs from 1 HDD Thin template with RampUp 120 Sec concurrently > Start those VMs > Validate they are up > Power them Down > Validate they are down > Extend Disk per each VM from 10G to 11G This does not do any LVM extension. It is not clear what Christian means by "Use the clones (-> extend the disk)" (see comment 27). I assume that what we need to do is to run code on the vm that will write data to the disk, causing the disk to be extended. 
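As a starting point for the SDK-based script requested above, the per-VM flow could look roughly like the following sketch using the Python SDK (ovirtsdk4). The engine URL, credentials, cluster and template names are placeholders, and the in-guest write workload that actually triggers the LV extensions is only marked as a comment; this is an illustration of the flow, not the reporter's script.

# Sketch of the per-VM flow: create a thin clone from a template,
# start it, (write data inside it), shut it down and remove it.
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",  # placeholder
    username="admin@internal",
    password="secret",
    ca_file="ca.pem",
)
vms_service = connection.system_service().vms_service()

def wait_for_status(vm_service, status, timeout=600):
    deadline = time.time() + timeout
    while vm_service.get().status != status:
        if time.time() > deadline:
            raise RuntimeError("timed out waiting for %s" % status)
        time.sleep(5)

def basic_flow(name):
    # Create a thin clone from the template (qcow2 layer over the base).
    vm = vms_service.add(types.Vm(
        name=name,
        cluster=types.Cluster(name="Default"),        # placeholder
        template=types.Template(name="extend-test"),  # placeholder
    ))
    vm_service = vms_service.vm_service(vm.id)
    wait_for_status(vm_service, types.VmStatus.DOWN)
    # Start the VM and wait until it is up.
    vm_service.start()
    wait_for_status(vm_service, types.VmStatus.UP)
    # Write enough data inside the guest (e.g. via ssh + dd) to force
    # repeated automatic extension of the thin disk -- not implemented here.
    # Shut down, then delete; with discard enabled the LV is discarded.
    vm_service.stop()
    wait_for_status(vm_service, types.VmStatus.DOWN)
    vm_service.remove()

basic_flow("extend-test-vm-001")
connection.close()

Running basic_flow concurrently from many threads, against the same template and storage domain, would approximate the reported workload.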
Lets wait for more info from Christian on this. > Remove Those VMs Since all extensions happen on the SPM host, maybe we can simulate this by simply running concurrent lvextend commands on a single host, and in the same time concurrent lvchange --refersh on other hosts, and we don't need to script engine. If we find an issue in LVM, we will need such script that reproduce the issue from the shell without dependency on oVirt, so LVM developer can reproduce this easily. (In reply to Nir Soffer from comment #27) > > (In reply to Nir Soffer from comment #17) > > we are using the ovirt vagrant provider > > https://www.ovirt.org/develop/projects/proposals/vagrant-provider > > which uses the Ruby SDK. > > But I think it should be the same just using the SDK with the following flow: > > - Create thin clones from Template > > - Use the clones (-> extends the disk) > > What do you mean by use the clones? Running Software and Tests inside the VMs > What is the workload running on the vm? High CPU Usage (8 Cores) High Memory Usage(30GB) Mixed Windows and Linux Clients and Servers (CentOS, Ubuntu, Server 2012R2, Server 2016, Windows 10) > How much data is written to the vm disk? which rate? Around 10-15 GB in the first 5-10 minutes after VM is created/started > How much time the vm is running before it is shut down and removed? Around 30-60 minutes > Do you use both discard-after-delete and wipe-after-delete? We only use discard-after-delete > Can you share the script, so we can make sure what we do in the lab matches > what > you did on your system? No, not really our Scripts/vagrant Boxes are very specifiy to our Uses Cases. testing jira sync, please ignore Based on comment 29, I think we need to reproduce this again like this. The test should: - create lot of vms in the same time (stress the SPM) - perform lot of automatic disk extension in the same time (stress the SPM) - keep the host cpu busy (this may effect lvm or SPM) - perform many disk discards in the same time (I suspect this is related to the issue) ## Setup - Create storage domain on a FC storage supporting discard, and enable discard on this storage domain. - Create vm with 30g raw disk - Make sure the vm has at least 15G free in the root file system. If not, resize the root file system as needed. - Create template from this vm. ## Basic flow for every vm 1. Create vm based on the template. Make sure the vm disk is marked for discard-after-delete 2. Start the vm 3. Wait until the vm is up 4. Perform io in the vm, to cause extension of the disk Connect to the vm using ssh, and run: dd if=/dev/urandom of=/var/tmp/data bs=1M count=15360 oflag=direct conv=fsync We can run additional stuff in the vm during the test to simulate what the reporter is doing, but I don't think this is related. 5. Wait until the dd command is finished. The vm disk will be extended by 1G 15 times while dd is running. It is possible that the vm will pause during the test, but the system should resume it. 6. Shut down the vm 7. delete the vm Make sure the vm disk is discarded when the vm is deleted. You can see messages about discarding the volume in vdsm log, and using "ps" command you will see that vdsm is running "blkdiscard /dev/vgname/lvname" for this volume. I would first test that running this flow wit one vm in a loop for a hour or so works as expected. ## Complete flow 1. Start basic flow in 50 threads, with 10 seconds delay between threads. 2. When a vm is deleted, repeat the basic flow for every thread 3. Run this for a day or so. 4. 
collect logs from all hosts and engine for inspection ## Version to test We know that discarding disks before 4.2.6 could cause delays in other processes, this was reported by the same reporter of this bug. - Reproduce with the version reported in this bug (oVirt 4.2.1) or 4.2.3 if you want to test with RHV. - Test with 4.2.6 - we improved the way volume are discarded in this version. It would be best if the test will written using oVirt python SDK, so we can inspect the test scripts before we run the test. Elad, this looks like a work for scale team, or at least using one of them servers. Moving to scale as per comment #31 Daniel, this is test only bug. If we can reproduce this we will make further investigation. If we cannot reproduce this, we will close it. See comment 31, explaning how it should be tested. We are working on reproducing the bug. Please advise the following questions : 1. At the reproduction it say's "Create vm with 30g raw disk", as for http://www.ovirt.org/documentation/admin-guide/chap-Virtual_Machine_Disks/ we can onlly use QCOW2 with Sparse for the extends, please advise if we can use QCOW2 with Sparse for the bug reproduction. 2. At the reproduction it say's to extend the root file system - is it O.k to extend a disk other then root file system or it's a must to extend the disk the root FS resides on ? 3. FC storage is a must or can we use iscsi SD ? 4. Please advise it will be test only on 4.2.6 since we do not have 4.2.3 currently. (In reply to guy chen from comment #35) > We are working on reproducing the bug. > Please advise the following questions : > > 1. At the reproduction it say's "Create vm with 30g raw disk", as for > http://www.ovirt.org/documentation/admin-guide/chap-Virtual_Machine_Disks/ > we can onlly use QCOW2 with Sparse for the extends, please advise if we can > use QCOW2 with Sparse for the bug reproduction. Read comment 31 again. "Create vm with 30g raw disk" refer to the template. You should create a vm based on this template. It will have qcow2 disk with the template as the base image. > 2. At the reproduction it say's to extend the root file system - is it O.k > to extend a disk other then root file system or it's a must to extend the > disk the root FS resides on ? I don't know what is "At the reproduction". Please read again comment 31. If the instructions are not clear, quote here the text which is not clear. > 3. FC storage is a must or can we use iscsi SD ? Better to use FC so we can create more load (disks will be extended and deleted faster, so we get more iterations per hour, or need less hours to reproduce. > 4. Please advise it will be test only on 4.2.6 since we do not have 4.2.3 > currently. I think we should test with older version, since this increases the chance to reproduce, and it can show if changes in 4.2.6 are related, as I explained in comment 31. Thanks Nir. Regarding number 2, this is the relevant section : " - Create vm with 30g raw disk - Make sure the vm has at least 15G free in the root file system. If not, resize the root file system as needed. " Can i have have instead of 1 disk with 30g a VM with 2 disks of 30g, one is bootable and on the other i will do the extend on ? (In reply to guy chen from comment #37) > Can i have have instead of 1 disk with 30g a VM with 2 disks of 30g, one is > bootable and on the other i will do the extend on ? You can, but I don't see why it better. 
Preparing the base image is something you can do once when you create the template, then all vms will use the same image with at least 15g free in the root file system. Nir please take a look and suggest (In reply to guy chen from comment #39) > Run the scenario as discussed. With original vdsm version, or latest 4.2? > During the removal of the VMS started to get IO error's, after that got read > only FS that prevents me doing any actions on the SD, logs attached. > > -=>>grep ERROR /var/log/vdsm/vdsm.log ... > 2018-09-20 06:49:26,346+0000 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH > getStoragePoolInfo error=(-202, 'Unable to read resource owners', 'IO > timeout') (dispatcher:86) Looks like your storage was not accessible or very slow here... > 2018-09-20 06:51:02,174+0000 ERROR (check/loop) [storage.Monitor] Error > checking path /dev/f1168ddf-917c-4ef4-b553-e034ee2983ec/metadata > (monitor:498) Storage monitoring failed to access this domain metadata lv... ... > 2018-09-20 07:21:12,023+0000 ERROR (Thread-22) [storage.TaskManager.Task] > (Task='846d39ec-cd8e-41cb-914f-e8a789ca2ef7') Unexpected error (task:875) > 2018-09-20 07:21:12,027+0000 ERROR (Thread-22) [storage.Dispatcher] FINISH > downloadImageFromStream error=Can't persist task: > (u'846d39ec-cd8e-41cb-914f-e8a789ca2ef7: cannot access/create taskdir > /rhev/data-center/98dea9a8-0a7e-11e8-b98d-b8ca3a638904/mastersd/master/tasks/ > 846d39ec-cd8e-41cb-914f-e8a789ca2ef7: [Errno 30] Read-only file system',) > (dispatcher:82) Looks like storage was overloaded by the test, and you did not configure multiapth so the master file system became read only. Did you run this setup with FC and multiple paths to storage? iSCSI? do you have iSCSI multipathing configured? re-applying need info for comment 44 https://bugzilla.redhat.com/show_bug.cgi?id=1553133#c44 Looking at vdsm logs, it looks like we reproduced the original issue in the last test run. The test was started at 07:53:24 (host time), with SPM on host b01-h17-r620. 1. First extend (b01-h17): -=>>xzgrep 'extending volume' /var/log/vdsm/vdsm.log.3 | head 2018-10-07 07:55:10,948+0000 INFO (mailbox-spm/0) [storage.SPM.Messages.Extend] processRequest: extending volume b074676d-dcba-4c0a-92d3-3f32ab8edb15 in domain f1168ddf-917c-4ef4-b553-e034ee2983ec (pool 98dea9a8-0a7e-11e8-b98d-b8ca3a638904) to size 2048 (mailbox:161) 2. 
The first error in the SPM log is: 2018-10-07 08:02:13,304+0000 ERROR (mailbox-spm/4) [storage.SPM.Messages.Extend] processRequest: Exception caught while trying to extend volume: ab2c4b0c-2709-4726-83bd-5671ff5697f1 in domain: f1168ddf-917c-4ef4-b553-e034ee2983ec (mailbox:172) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/mailbox.py", line 166, in processRequest pool.extendVolume(volume['domainID'], volume['volumeID'], size) File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper return method(self, *args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1420, in extendVolume sdCache.produce(sdUUID).extendVolume(volumeUUID, size, isShuttingDown) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1564, in extendVolume lvm.extendLV(self.sdUUID, volumeUUID, size) # , isShuttingDown) File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 1224, in extendLV extent_size = int(vg.extent_size) File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 82, in __getattr__ raise AttributeError("Failed reload: %s" % self.name) AttributeError: Failed reload: f1168ddf-917c-4ef4-b553-e034ee2983ec This error means that this VG was invalidated and we failed to reload VG metadata. Unfortunately the original error when we failed to reload the metadata was never logged, so we don't have any info on the first error. 3. Trying to access this VG using vgs fail with: # vgs f1168ddf-917c-4ef4-b553-e034ee2983ec -=>>vgs f1168ddf-917c-4ef4-b553-e034ee2983ec /dev/mapper/3600a098038304437415d4b6a59676d51: Checksum error at offset 22089216 Couldn't read volume group metadata from /dev/mapper/3600a098038304437415d4b6a59676d51. Metadata location on /dev/mapper/3600a098038304437415d4b6a59676d51 at 22089216 has invalid summary for VG. Failed to read metadata summary from /dev/mapper/3600a098038304437415d4b6a59676d51 Failed to scan VG from /dev/mapper/3600a098038304437415d4b6a59676d51 Volume group "f1168ddf-917c-4ef4-b553-e034ee2983ec" not found Cannot process volume group f1168ddf-917c-4ef4-b553-e034ee2983ec 4. 
Using lsblk we can still see the contents of the corrupted VG: sdd 8:48 0 1T 0 disk └─3600a098038304437415d4b6a59676d51 253:4 0 1T 0 mpath ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-metadata 253:17 0 512M 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-ids 253:18 0 128M 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-leases 253:19 0 2G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-outbox 253:20 0 128M 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-xleases 253:21 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-inbox 253:22 0 128M 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-master 253:23 0 1G 0 lvm /rhev/data-center/mnt/blockSD/f1168ddf-917c-4ef4-b553-e034ee2983ec/master ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-5f94fc20--93ce--4694--9e75--6e367eecd7a3 253:38 0 50G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-e1d81892--57e3--4c9c--bef2--bfae08fa4c67 253:39 0 10G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-cb2c6de9--82b2--4467--9d7b--696d5523eff6 253:40 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-dfc7d9b7--b916--4565--8aa0--98f73f814aee 253:41 0 9G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-aae97ceb--af8d--4d8a--ace5--496cb93f09ae 253:42 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-b9582616--78cc--440e--b1f0--2f2e195faeb1 253:43 0 8G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-025d51bd--4d25--4393--850f--d1da09cf1c49 253:44 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-5afb917d--0e51--4cde--9608--0563be327d95 253:45 0 6G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-e4e4e606--c6b2--4503--b4f7--6eae0a1d0e42 253:46 0 4G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-b074676d--dcba--4c0a--92d3--3f32ab8edb15 253:47 0 16G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-68243011--06df--4e86--8ad4--714b89170d5b 253:48 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-d1c2a11e--75c8--4851--903e--ce251552793a 253:49 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-452169f5--0967--473a--8a1e--a4b3922ab915 253:50 0 3G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-4a5db397--2d03--4fa8--a1aa--842ac2dd2c8a 253:51 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-4530f234--1204--42a8--905e--e4ba284bf045 253:52 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-85db5760--6e55--40b4--9319--d7cbc11ab493 253:53 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-0a584504--e2a6--4df2--a03c--81bfbef59c23 253:54 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-ca7e84cb--7ab0--4102--b38e--a6b63fe65943 253:55 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-96f491a1--922c--4297--a075--39c5b757ecdd 253:56 0 16G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-80faa090--d967--4a4c--a017--a354b2f6416c 253:57 0 1G 0 lvm ├─f1168ddf--917c--4ef4--b553--e034ee2983ec-77c695cd--7126--44c2--b5d0--dd7989d6526d 253:58 0 1G 0 lvm └─f1168ddf--917c--4ef4--b553--e034ee2983ec-915b0e60--ac97--4c4a--aad6--a619663857ee 253:60 0 16G 0 lvm The attached patch (https://gerrit.ovirt.org/c/94751/) improve logging, showing detailed errors when reloading vgs and lvs fail. Created attachment 1491367 [details]
vdsm log from SPM during test
Created attachment 1491368 [details]
dump of lvm metadata area 1
This was created using:
dd if=/dev/mapper/3600a098038304437415d4b6a59676d51 \
of=3600a098038304437415d4b6a59676d51.metadata.1 \
bs=1M count=129 iflag=direct
Created attachment 1491369 [details]
dump of lvm metadata area 2
This was created using:
dd if=/dev/mapper/3600a098038304437415d4b6a59676d51 \
of=3600a098038304437415d4b6a59676d51.metadata.2 \
bs=1M skip=$((8191*128)) count=128 iflag=direct
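The skip value is not magic: the LUN is 1 TiB (see the lsblk output earlier), i.e. 8192 x 128 MiB, so skipping 8191 x 128 MiB with bs=1M reads exactly the last 128 MiB of the device, which is where this second metadata area was dumped from. A small helper sketch (an illustration only, equivalent to blockdev --getsize64 plus some arithmetic) for computing the same skip for a device of any size:

# Sketch: compute the bs=1M skip that selects the last 128 MiB of a device,
# matching the "metadata area 2" dump above. Device path is an argument.
import os
import sys

def last_128m_skip(device):
    fd = os.open(device, os.O_RDONLY)
    try:
        size_bytes = os.lseek(fd, 0, os.SEEK_END)  # block device size in bytes
    finally:
        os.close(fd)
    return size_bytes // (1024 * 1024) - 128  # use with bs=1M count=128

# For a 1 TiB LUN this prints 1048448, which equals 8191 * 128.
print(last_128m_skip(sys.argv[1]))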
Created attachment 1491370 [details]
lvm backup for the corrupted vg
The patch only improves logging, we still investigating the root cause. I want to do another run with the attached patch on all hosts, and a clean LUN. This may help to detect the reason for this corruption. Please do this: 1. Install vdsm on all hosts from this build: https://jenkins.ovirt.org/job/vdsm_master_build-artifacts-on-demand-el7-x86_64/1018/artifact/exported-artifacts/ 2. Add a new LUN for this test 3. Zero the entire LUN - we don't want old junk data to be mixed with lvm metadata: blkdiscard -z -p 32m /dev/mapper/xxxyyy 4. Create new storage domain from this LUN - best in a new DC. We don't want to touch the old corrupted VG. 5. Change vdsm log level to DEBUG - edit /etc/vdsm/logger.conf, and set: [logger_root] level=DEBUG handlers=syslog,logthread propagate=0 [logger_vds] level=DEBUG handlers=syslog,logthread qualname=vds propagate=0 [logger_storage] level=DEBUG handlers=logthread qualname=storage propagate=0 6. Run the test again. If we are lucky we will reproduce this again and we will have better logs and cleaner metadata to investigate. Removing Performance and TestOnly since this not about performance, and looks like we reproduce the issue, so this is not test only bug. Created attachment 1511697 [details] logs-kaminario Reproduced with Kaminario storage. Storage migration failed with Failed reload domain: 2018-12-04 15:20:20,033+0200 ERROR (monitor/9745f66) [storage.Monitor] Error checking domain 9745f66e-8d8e-4257-a652-4346601ecd4f (monitor:424) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus self.domain.selftest() File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__ return getattr(self.getRealDomain(), attrName) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain return findMethod(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1637, in findDomain return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 958, in __init__ manifest = self.manifestClass(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 405, in __init__ sd.StorageDomainManifest.__init__(self, sdUUID, domaindir, metadata) File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 327, in __init__ self._domainLock = self._makeDomainLock() File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 530, in _makeDomainLock domVersion = self.getVersion() File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 401, in getVersion return self.getMetaParam(DMDK_VERSION) File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 398, in getMetaParam return self._metadata[key] File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 91, in __getitem__ return dec(self._dict[key]) File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 201, in __getitem__ with self._accessWrapper(): File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 154, in _accessWrapper self.refresh() File 
"/usr/lib/python2.7/site-packages/vdsm/storage/persistent.py", line 232, in refresh lines = self._metaRW.readlines() File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 271, in readlines for tag in vg.tags: File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 81, in __getattr__ raise AttributeError("Failed reload: %s" % self.name) AttributeError: Failed reload: 9745f66e-8d8e-4257-a652-4346601ecd4f After that, the VG is reported as [unknown] with Checksum error at offset.. 2018-12-04 15:20:40,419+0200 ERROR (monitor/9745f66) [storage.Monitor] Error checking domain 9745f66e-8d8e-4257-a652-4346601ecd4f (monitor:424) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 405, in _checkDomainStatus self.domain.selftest() File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48, in __getattr__ return getattr(self.getRealDomain(), attrName) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain return self._cache._realProduce(self._sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce domain = self._findDomain(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain return findMethod(sdUUID) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1637, in findDomain return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID)) File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1577, in findDomainPath raise se.StorageDomainDoesNotExist(sdUUID) StorageDomainDoesNotExist: Storage domain does not exist: (u'9745f66e-8d8e-4257-a652-4346601ecd4f',) [root@puma45 ~]# pvs /dev/mapper/20024f40058540033: Checksum error at offset 65298432 Couldn't read volume group metadata. /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016 Couldn't read volume group metadata. /dev/mapper/20024f40058540033: Checksum error at offset 65298432 Couldn't read volume group metadata. /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016 Couldn't read volume group metadata. /dev/mapper/20024f40058540033: Checksum error at offset 65298432 Couldn't read volume group metadata. /dev/mapper/20024f40058540033: Checksum error at offset 16492605014016 Couldn't read volume group metadata. WARNING: PV /dev/mapper/20024f40058540033 is marked in use but no VG was found using it. WARNING: PV /dev/mapper/20024f40058540033 might need repairing. PV VG Fmt Attr PSize PFree /dev/mapper/20024f40058540033 [unknown] lvm2 u-- 15.00t 0 /dev/mapper/360002ac0000000000000009600021f6b f30b23ee-d99c-43a5-9794-fd22010e8d46 lvm2 a-- <12.00t 7.97t /dev/sda3 VolGroup01 lvm2 a-- 461.07g 0 This issue keeps repeating also in RHV automation on Netapp as reported in bug 1637405 and now we see it also with other storages. vdsm-4.20.28-1.el7ev.x86_64 kernel - 3.10.0-862.11.6.el7.x86_64 device-mapper-multipath-0.4.9-119.el7_5.1.x86_64 lvm2-2.02.177-4.el7.x86_64 I could reproduce this issue outside of oVirt using the attached reproducer script. The script simulate extend flow in oVirt system using 2 hosts: - one host run manger extend flow - this host runs commands like lvcreate, lvextend, and lvremove - the other host run regular host refresh flow - this host runs only "lvchange --available y" once and many "lvchange --refresh" commands. we expect both to be readonly. Here is results of example run: 1. 
1. Create the test VG:

   # python extend.py create-vg test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d}
   2019-01-04 16:29:31,213 INFO    (MainThread) creating vg test-vg

2. Start the manager flow on this host:

   # time python extend.py run-manager test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d} > extend-run-manager.log 2>&1

3. After a few seconds, start the regular flow on another host:

   # time python extend.py run-regular test-vg /dev/mapper/{36001405560f9d98242841cea375c479e,36001405847ec541476043d4ac9260d41,360014055fc8e87631db462caf5970d32,360014052557433e46b14abd91cfaae1d} > extend-run-regular.log 2>&1

The manager flow fails after 6 seconds with a corrupted VG. Here are errors from the manager node log:

Traceback (most recent call last):
  File "extend.py", line 310, in manager_worker
    lvm.extend_lv(lv_name, options.lv_size_mb)
  File "extend.py", line 157, in extend_lv
    full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvextend', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--autobackup', 'n', '--size', '+128m', 'test-vg/worker-003'] failed rc=5 out= err=
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 1994752
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 1994752 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  Volume group "test-vg" not found
  Cannot process volume group test-vg

...

2019-01-04 16:30:45,661 ERROR   (worker-038) failed
Traceback (most recent call last):
  File "extend.py", line 310, in manager_worker
    lvm.extend_lv(lv_name, options.lv_size_mb)
  File "extend.py", line 157, in extend_lv
    full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvextend', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--autobackup', 'n', '--size', '+128m', 'test-vg/worker-038'] failed rc=5 out= err=
  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 143 vs previous 142.
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Volume group "test-vg" not found
  Cannot process volume group test-vg

Here are some errors from the regular node run:

2019-01-04 16:30:45,691 ERROR   (worker-034) failed to refresh worker-034
Traceback (most recent call last):
  File "extend.py", line 276, in regular_worker
    lvm.refresh_lv(lv_name)
  File "extend.py", line 162, in refresh_lv
    return self.run("lvchange", "--refresh", full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvchange', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--refresh', 'test-vg/worker-034'] failed rc=5 out= err=
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Volume group "test-vg" not found
  Cannot process volume group test-vg

...

2019-01-04 16:30:45,729 ERROR   (worker-015) failed
Traceback (most recent call last):
  File "extend.py", line 272, in regular_worker
    lvm.activate_lv(lv_name)
  File "extend.py", line 171, in activate_lv
    return self.run("lvchange", "--available", "y", full_name)
  File "extend.py", line 225, in run
    return run(cmd)
  File "extend.py", line 341, in run
    raise CommandError(cmd, p.returncode, out, err)
CommandError: Command ['lvm', 'lvchange', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--available', 'y', 'test-vg/worker-015'] failed rc=5 out= err=
  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 88 vs previous 87.
  /dev/mapper/36001405560f9d98242841cea375c479e: Checksum error at offset 2432000
  Couldn't read volume group metadata from /dev/mapper/36001405560f9d98242841cea375c479e.
  Metadata location on /dev/mapper/36001405560f9d98242841cea375c479e at 2432000 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36001405560f9d98242841cea375c479e
  Failed to scan VG from /dev/mapper/36001405560f9d98242841cea375c479e
  VG info not found after rescan of test-vg
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg

My theory is that the "lvchange --refresh" command sees inconsistent metadata on storage, since the storage is being modified very quickly by the manager host, and then tries to "recover" the metadata:

  Recovery of volume group "test-vg" failed.

This corrupts the VG metadata, since only the manager host may change the VG metadata.

Tested on CentOS 7.6:

# rpm -qa | grep lvm2
lvm2-2.02.180-10.el7_6.2.x86_64
udisks2-lvm2-2.7.3-8.el7.x86_64
lvm2-libs-2.02.180-10.el7_6.2.x86_64

David, do we have a way to ensure that lvchange is read-only, like "lvs --readonly"? Or maybe we need to use the (deprecated) global/locking_type=4 option for all commands on regular nodes?

Created attachment 1518421 [details]
reproducer script
Created attachment 1518422 [details]
log from reproducer script running manager flow
Created attachment 1518423 [details]
log from reproducer script running regular node flow
In RHEL 7, locking_type=4 is still the way to do this (it's deprecated in RHEL 8).

Reading a VG has this hidden and dangerous mis-feature of writing the VG if it happens to see something wrong with it. I've been meaning to fix this for a long time, but there was a lot of preparatory work required. It happens that over the past couple of weeks I finally began working on this and am nearly finished. Reading a VG will no longer write it and will just use a good copy of the metadata. An explicit metadata repair command will be required to fix corrupt or stale metadata on a PV.

Created attachment 1518426 [details]
reproducer script with locking_type 4 change
The same reproducer script using locking_type=1 for manager flows and locking_type=4 for regular node flows.
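For illustration only, here is a rough sketch of how the per-role --config string might be built. The helper name is hypothetical and the attached script may differ; the point is that only global/locking_type changes between the two roles (locking_type=1 on the manager, locking_type=4 on regular nodes), matching the config strings visible in the command errors elsewhere in this bug:

    # Hypothetical helper, not taken from the attached script.
    def lvm_config(devices, read_only):
        # Regular-node (read-only) commands use locking_type=4 so a read can
        # never turn into a metadata "recovery" write; the manager keeps
        # locking_type=1 and remains the only writer.
        locking_type = 4 if read_only else 1
        filters = " ".join('"a|^%s$|",' % dev for dev in devices)
        return (
            ' devices { preferred_names = ["^/dev/mapper/"]'
            ' ignore_suspended_devices=1 write_cache_state=0'
            ' disable_after_error_count=3 filter=[ %s "r|.*|" ] }'
            ' global { locking_type=%d prioritise_write_locks=1'
            ' wait_for_locks=1 use_lvmetad=0 }'
            ' backup { retain_min=50 retain_days=0 } '
        ) % (filters, locking_type)

A regular-node refresh would then run something like ['lvm', 'lvchange', '--config', lvm_config(devices, read_only=True), '--refresh', lv_name].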
Created attachment 1518427 [details]
log from reproducer script with locking_type 4 running manager flow
Created attachment 1518429 [details]
log from reproducer script with locking_type 4 running regular node flow
This log from the regular node shows about a 36% error rate refreshing LVs while the LVs
are being manipulated on the manager node.
$ xzgrep -E 'INFO.+refreshing' extend-run-regular.log.xz | wc -l
54822
$ xzgrep -E 'ERROR.+failed to refresh' extend-run-regular.log.xz | wc -l
19785
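(For reference, 19785 failed refreshes out of 54822 refresh attempts is 19785 / 54822 ≈ 0.36, which is where the ~36% figure above comes from.)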
You can also try setting --config global/metadata_read_only=1 in the refresh command.

(In reply to David Teigland from comment #66)

Thanks David, I wonder how this issue did not pop up earlier.

I added locking_type=4 to the new reproducer script (attachment 1518426 [details]). With this, the script completed successfully creating, extending and deleting 500 LVs, with 50 concurrent threads on the manager node and 50 concurrent refresh threads on the regular node. You can see the log in attachment 1518427 [details].

However, on the regular node about 36% of the read-only lvchange operations failed, as can be seen in attachment 1518429 [details]. Most of the errors were in "lvchange --refresh":

$ xzgrep CommandError extend-run-regular.log.xz | grep '\--refresh' | wc -l
19785

$ xzgrep CommandError extend-run-regular.log.xz | grep '\--available' | wc -l
11

Here are some example failures:

CommandError: Command ['lvm', 'lvchange', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=4 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--refresh', 'test-vg/worker-006'] failed rc=5 out= err=
  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 109 vs previous 108.
  Read-only locking type set. Write locks are prohibited.
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg

CommandError: Command ['lvm', 'lvchange', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=4 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 } ', '--available', 'y', 'test-vg/worker-001'] failed rc=5 out= err=
  Scan of VG test-vg from /dev/mapper/36001405560f9d98242841cea375c479e found metadata seqno 99 vs previous 98.
  Read-only locking type set. Write locks are prohibited.
  Recovery of volume group "test-vg" failed.
  Cannot process volume group test-vg

If we apply locking_type=4 to regular nodes in vdsm, we will have a new issue with these refresh failures. They may happen randomly in many flows, with unpredictable results. Way better than corrupting VG metadata, but still an issue.

It looks like "lvchange --refresh" is not trying hard enough to deal with concurrent remote updates: it does multiple reads from storage and fails if the storage was changed between the reads. I would expect that, when running with locking_type=4, it would retry the read from storage if the storage was changed in the middle of the operation, and fail only after a reasonable number of retries.

For now I think we can treat all failures of read-only commands on regular nodes as temporary, log a warning, and retry the operation 2-3 times before failing the flow. Based on the info in this bug, the corruption rate was once every few weeks, so these errors would probably be very rare, and the cost of retrying the operation would be negligible.
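A minimal sketch of that retry idea, with hypothetical names (this is not vdsm code): treat a failing read-only lvm command as a transient error and retry it a few times with an increasing delay before failing the flow.

    import logging
    import subprocess
    import time

    def run_readonly(cmd, retries=3, delay=1.0):
        # Retry read-only lvm commands (e.g. "lvchange --refresh" with
        # locking_type=4) that fail because another host changed the VG
        # metadata while this command was scanning it.
        for attempt in range(1, retries + 1):
            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            out, err = p.communicate()
            if p.returncode == 0:
                return out
            logging.warning("attempt %d/%d failed: rc=%d err=%r",
                            attempt, retries, p.returncode, err)
            time.sleep(delay)
            delay *= 2  # back off, giving the writer time to finish the update
        raise RuntimeError("%s failed after %d attempts" % (cmd, retries))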
David, can we configure lvm to be more robust regarding remote updates in RHEL 7? Is there any advantage to using --config global/metadata_read_only=1?

Before the scanning improvements in 7.6, lvm would ignore the checksum errors during the scan and continue with reading the VG. When reading the VG it would repeat reading the disks and might not see the same problem, or it would potentially try to repair the metadata. Without locking_type=4 or readonly, that repair could end up corrupting the metadata.

The scanning/reading of metadata has needed an overhaul for a long time. I completed the scanning rewrite, and now I am doing the reading rewrite. 7.6 fell between the two, and this seems to be one effect of the incomplete transition.

The way vdsm uses lvm, without locking between reads and writes on different nodes, we naturally expect to see issues like this. The new reading code will allow this to work because it will just take and use a good copy of the metadata when doing reporting or activation/refresh commands (and it won't ever try to repair something while reading it). Even when this work is done, if you are unlucky, all the copies of metadata seen by the refresh command could be partially written and fail with a checksum error, leaving the command with nothing to use. We could look into adding an automatic retry for this case, but that's still no guarantee, it just reduces the chances. For now I suspect that retrying the refresh command in vdsm is the only option.

(In reply to David Teigland from comment #70)
> You can also try setting --config global/metadata_read_only=1 in the refresh
> command.

I tried metadata_read_only=1, and it is not allowed for lvchange --refresh:

CommandError: Command ['lvm', 'lvchange', '--config', ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=[ "a|^/dev/mapper/36001405560f9d98242841cea375c479e$|", "a|^/dev/mapper/36001405847ec541476043d4ac9260d41$|", "a|^/dev/mapper/360014055fc8e87631db462caf5970d32$|", "a|^/dev/mapper/360014052557433e46b14abd91cfaae1d$|", "r|.*|" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 metadata_read_only=1 } backup { retain_min=50 retain_days=0 } ', '--refresh', 'test-vg/worker-006'] failed rc=5 out= err=
  Operation prohibited while global/metadata_read_only is set.
  Failed to suspend test-vg/worker-006.

So locking_type=4 seems to be the only way for RHEL 7.

(In reply to David Teigland from comment #72)
> The way vdsm uses lvm without locking between reads and writes between
> nodes, we naturally expect to see issues like this. The new reading code
> will allow this to work because it will just take and use a good copy of the
> metadata when doing reporting or activation/refresh commands (and it won't
> ever try to repair something while reading it.) Even when this work is
> done, if you are unlucky, all the copies of metadata seen by the refresh
> command could be partially written and fail with a checksum error,

How can we see partly written metadata? I thought that lvm does this:

- take the metadata lock
- write the new metadata content to the next area in the circular buffer
- if the write was successful, write a new metadata header pointing to the new metadata content
- release the metadata lock

With this flow partial metadata is not possible, and metadata changes are atomic.

Correct, but you are modifying and reading the metadata from different hosts, right?
In that case you do not have a lock to prevent reading at any point during the modification of the buffer or header.

We need to check whether limiting the number of concurrent lvm commands (https://gerrit.ovirt.org/c/66932/) can also lower the chance of this issue. This could be an easy change that we can ship for 4.2.z. I don't think we can support locking_type=4 in 4.2.z.

Moving to 4.2.9 given comment #75; when the 4.2.8 async version is available we'll try to push the patch limiting concurrent commands there, after it has been tested.

I tested limiting the number of concurrent lvm operations to 10, and it does not prevent the corruption.

I created an improved simulator for locking type 4, simulating better what the real system does during the extend flow. The simulator has 2 modes:

- run-manager - simulates the manager node flows
- run-regular - simulates the regular node extend flow

The run-manager command starts a socket server, waiting for client commands. The run-regular command starts multiple workers based on the --concurrency option. Every worker connects to the server and performs this flow (see also the sketch after the simulator attachment below):

    for every iteration:
        create LV
        activate LV
        repeat 20 times:
            extend LV
            refresh LV
            sleep --extend-delay seconds
        deactivate LV
        remove LV

The read-write operations (create LV, extend LV, remove LV) are performed on the manager node. The read-only operations (activate LV, refresh LV, deactivate LV) are performed locally.

Read-only commands are performed using locking_type=4, ensuring that LVM does not attempt to recover the metadata if a command finds the storage in an inconsistent state. The command will fail in this case.

To handle the expected failures of read-only commands, we use a retry mechanism with exponential back-off. When a command fails, we wait and retry the command; each time the command fails we double the wait time, to let the manager finish pending metadata updates.

I tested creating 50,000 LVs, and then extending and refreshing them 20 times, a total of 1,000,000 successful operations at a rate of 29.7 operations per second. I also did some short runs creating 500 LVs, to check the retry failure rate at lower operation rates. Here is a table with the results from all runs:

    extend-delay   extends   retries  max-retry  total-time  extend-rate  retry-rate
    ---------------------------------------------------------------------------------
               1   1000000    237250          8       33677        29.69      21.56%
               2     10000      1027          5         491        20.36       9.33%
               5     10000       221          2        1138         8.78       2.00%
              10     10000       100          2        2220         4.50       0.90%
              20     10000        35          1        4382         2.28       0.31%
              40     10000         8          1        8585         1.16       0.09%

This shows that locking type 4 works well even under extreme load, but we must be prepared to handle failures of lvm commands running in read-only mode.

The next step is to support locking type 4 in vdsm:

- When the host runs as SPM, all lvm commands must run in read-write mode, recovering the VG metadata if needed.
- When the host is not the SPM, all lvm commands must run in read-only mode so they do not try to recover the VG metadata.
- When running in read-only mode, retry failed commands. Testing shows that 1-2 retries are enough under normal load.
- When the host changes its read-only mode, wait for ongoing commands running in the old mode.
- Creating and removing storage domains can be done on any host. These flows should be able to override the lvm read-only mode if they do not run on the SPM.

Open issues:

- Before the SPM can start, it must be able to activate the special LVs. If there is an issue with the VG metadata, activating the special LVs may fail, since we must activate them in read-only mode.
- To acquire the SPM lease, we must be able to activate the ids and leases special volumes, but we may fail to activate them in read-only mode.

Created attachment 1519091 [details]
locking type 4 simulator
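As referenced in the flow description above, here is a rough sketch of what each simulator worker does. The names are hypothetical and the attached script may differ in detail:

    import time

    def worker_flow(manager, lvm, lv_name, iterations, extend_delay):
        # Read-write operations (create/extend/remove) are requested from the
        # manager node over the socket; read-only operations (activate/refresh/
        # deactivate) run locally with locking_type=4 and are retried on failure.
        for _ in range(iterations):
            manager.create_lv(lv_name)
            lvm.activate_lv(lv_name)          # local, read-only
            for _ in range(20):
                manager.extend_lv(lv_name)
                lvm.refresh_lv(lv_name)       # local, read-only
                time.sleep(extend_delay)
            lvm.deactivate_lv(lv_name)        # local, read-only
            manager.remove_lv(lv_name)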
Created attachment 1519092 [details]
log for running 50 threads, 1000 lvs per thread, 1 second extend delay
Created attachment 1519093 [details]
log for running 50 threads, 100 lvs per thread, 2 second extend delay
Created attachment 1519094 [details]
log for running 50 threads, 100 lvs per thread, 5 second extend delay
Created attachment 1519095 [details]
log for running 50 threads, 100 lvs per thread, 10 second extend delay
Created attachment 1519096 [details]
log for running 50 threads, 100 lvs per thread, 20 second extend delay
Created attachment 1519097 [details]
log for running 50 threads, 100 lvs per thread, 40 second extend delay
Nir, does this mean support will also need to use locking_type=4 when issuing LVM commands that modify the metadata, and we should also use metadata_read_only when issuing commands that just read data? Maybe this is another reason to get those tools merged...

(In reply to Germano Veit Michel from comment #85)

Yes, using raw lvm commands on a system without an lvm filter is risky. See also the changes in the vdsm sos plugin:
https://gerrit.ovirt.org/c/96774/
https://github.com/sosreport/sos/issues/1533

Maybe we need to have /usr/share/vdsm/lvm/lvm.conf with the proper settings for accessing shared storage. This could be used by lvm commands like this:

    LVM_SYSTEM_DIR=/usr/share/vdsm/lvm lvs -o +tags

Or add a "vdsm-tool lvm" wrapper that will do the right thing.

Tal, Nir, as the 4.2.8 final build has already arrived, should we change the target to 4.3?

(In reply to Daniel Gur from comment #87)
> Tal, Nir As 4.2.8 Final build is already arrived should we change the Target
> top 4.3?

This is targeted to the 4.2.8 async release.

Looking forward to a resolution of this issue; RE: WoRH CASE 02265434.

(In reply to Nir Soffer from comment #86)
> Yes, using raw lvm commands on a system without lvm filter is risky.
> See also changes in vdsm sos plugin:
> https://gerrit.ovirt.org/c/96774/
> https://github.com/sosreport/sos/issues/1533
>
> Maybe we need to have /usr/share/vdsm/lvm/lvm.conf with the proper setting
> for accessing shared storage. This can be used by lvm commands like this:
>
> LVM_SYSTEM_DIR=/usr/share/vdsm/lvm lvs -o +tags
>
> Or add "vdsm-tool lvm" wrapper that will do the right thing.

Thanks Nir. Nice catch on the lvm2 plugin too. Ideally I want all support to use the snapshot-tools, so that no manual lvm command is run and everything is done via the vdsm API. But in the meantime, I'll start writing an article explaining these changes and how we should run LVM commands. We already use some options for filters and lvmetad in our "base" syntax.

Targeting to 4.2.8-2 when it becomes available.

Is this still to be fixed? If so, can you please re-target to 4.2.8-3?

(In reply to Sandro Bonazzola from comment #92)
> Is this still to be fixed? If so, can you please re-target to 4.2.8-3 ?

We are working on the master fix. When we have something for master we will evaluate again whether this can be delivered in 4.2.

Hi Tal, this bug is not in the 4.2.8-4 bug list (taken from the 'RHV Bugzilla Searches' doc) [1]. Please re-target.

[1] https://bugzilla.redhat.com/buglist.cgi?action=wrap&classification=Red%20Hat&f2=cf_ovirt_team&f3=OP&f4=product&f5=classification&f6=CP&j3=OR&list_id=10078056&o2=notsubstring&o4=equals&o5=equals&resolution=---&resolution=CURRENTRELEASE&resolution=ERRATA&target_milestone=ovirt-4.2.8-4&v2=docs&v4=Red%20Hat%20Enterprise%20Virtualization%20Manager&v5=oVirt

Can't, no 4.2.9 version yet. I suspect we might have to change the product to RHEV, Aileen?

Please move to 4.4.

(In reply to Allie DeVolder from comment #107)

Can we get vdsm logs from all hosts in the DC at the time of the error? Or, if this is easier, the output of:

    xzgrep WARN /var/log/vdsm/vdsm.log*

and the estimated time of the first error about a bad checksum.

If the issue is indeed LVM corrupting its metadata during a read-only command such as "lvs" or "lvchange --refresh", we expect to see warnings like:

    WARNING: Inconsistent metadata found for VG xxx-yyy-zzz - updating to use version 42

To see these warnings, you need to run vdsm 4.30.20.
The log was added in:

commit 0d6d17dbe3d926460b13f0e31f4c97d55572d09a
Author: Nir Soffer <nsoffer>
Date:   Wed May 29 00:26:11 2019 +0300

    lvm: Log LVM warnings for successful commands

nijin, please check comment 108. Did we see these warnings in other cases?

To fix this in RHV we need to use locking_type=4 in all LVM commands running on a non-SPM host. However, the pvs command cannot be used with locking_type=4 in some cases. We depend on bug 1809660; once this fix is available in RHEL 7.7 we can require it and use locking_type=4.

The ovirt-4.3 branch includes all the patches required to fix the data corruption except the issue with the pvs command, which can be resolved only when we have a build for bug 1809660. QE can start testing this build now to check for regressions:
https://jenkins.ovirt.org/job/vdsm_standard-on-merge/2405/artifact/build-artifacts.el7.x86_64/

Note that this version includes some unwanted side effects, like retrying lvm read-only commands when they should not be retried, which may cause delays in some flows. We are working on fixing these issues.

The fix is included in ovirt-4.3.9, in vdsm-4.30.43.

Note: On CentOS the pvs command uses locking_type=1 instead of locking_type=4. This is unsafe and may cause data corruption. However, the pvs command is rarely used, so the risk is very low. Once lvm2-2.02.186-7.el7_8.1 is available in CentOS, we will release an update fixing this issue.

*** Bug 1693075 has been marked as a duplicate of this bug. ***

Due to the change in the locking mechanism in LVM for RHEL 8, this issue does not exist in RHV 4.4 (also tested by the RHV scale team), so we are closing this 4.4 bug. For tracking the 4.3.9 fix, see bug 1811391.

As QE validated this in 4.4, I changed it to closed current-release.

David, can you describe how you tested this on RHEL 8.2?

1. Used the latest updated extend.py script.

2. On the first host:

   python2 extend.py create-vg <VG_NAME> /dev/mapper/<LUN_ID>
   firewall-cmd --add-port=8000/tcp
   python2 extend.py run-manager -p 8000 <VG_NAME> /dev/mapper/<LUN_ID> 2>1_run-manager.log

3. On the second host:

   python2 extend.py run-regular --iterations 1000 -p 8000 <First Host hostname/IP> <VG_NAME> /dev/mapper/<LUN_ID> 2>1_run-regular.log

*** Bug 1692230 has been marked as a duplicate of this bug. ***