Bug 1081962 - [SCALE] block storage domain monitoring thread slows down when storage operations are running
Summary: [SCALE] block storage domain monitoring thread slows down when storage operat...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: ovirt-4.0.4
: ---
Assignee: Nir Soffer
QA Contact: mlehrer
URL:
Whiteboard:
Duplicates: 1214584
Depends On: 1091389 1098791 1099081 1110140
Blocks:
 
Reported: 2014-03-28 10:15 UTC by Roman Hodain
Modified: 2020-06-11 12:36 UTC (History)
44 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-17 13:11:10 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
repostats tool - generate repostats statistics from vdsm logs (899 bytes, text/plain)
2014-03-31 10:46 UTC, Nir Soffer
no flags Details
lastCheck degradation (52.47 KB, image/png)
2015-08-11 11:51 UTC, Pavel Zhukov
no flags Details
last check statistics - bug verification (12.99 KB, text/plain)
2016-02-02 13:53 UTC, Yuri Obshansky
no flags Details
vdsm logs (8.03 MB, application/x-gzip)
2016-02-06 06:36 UTC, Pavel Zhukov
no flags Details
engine logs (2.28 MB, application/x-gzip)
2016-02-06 06:38 UTC, Pavel Zhukov
no flags Details
vdsm log (17.04 MB, text/plain)
2016-02-09 17:51 UTC, Pavel Zhukov
no flags Details
engine log (2.44 MB, text/plain)
2016-02-09 17:52 UTC, Pavel Zhukov
no flags Details
engine db dump (1019.04 KB, application/x-gzip)
2016-02-09 18:30 UTC, Pavel Zhukov
no flags Details
vdsm log (217.18 KB, application/x-gzip)
2016-02-15 14:13 UTC, Pavel Zhukov
no flags Details
targetcli configuration for testing this issue (82.76 KB, text/plain)
2016-02-15 14:15 UTC, Nir Soffer
no flags Details
engine log (582.24 KB, application/x-gzip)
2016-02-15 14:15 UTC, Pavel Zhukov
no flags Details
Monitor delays for vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain (79.75 KB, application/pdf)
2016-02-23 12:23 UTC, Nir Soffer
no flags Details
Monitor delays for vdsm log 2014-03-31 03:46 EDT, akotov@redhat.com (48.66 KB, application/pdf)
2016-02-23 12:27 UTC, Nir Soffer
no flags Details
Monitor delays for vdsm logs 2016-02-06 01:36 EST, Pavel Zhukov (654.48 KB, application/pdf)
2016-02-23 12:35 UTC, Nir Soffer
no flags Details
Monitor delays for vdsm log 2016-02-09 12:51 EST, Pavel Zhukov (53.22 KB, application/pdf)
2016-02-23 12:50 UTC, Nir Soffer
no flags Details
Monitor delays for vdsm log 2016-02-15 09:13 EST, Pavel Zhukov (29.57 KB, application/pdf)
2016-02-23 12:53 UTC, Nir Soffer
no flags Details
Repoplot of vdsm.log vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain (140.77 KB, application/pdf)
2016-02-27 01:49 UTC, Nir Soffer
no flags Details
Repoplot of vdsm.log vdsm logs from 2014-03-31 03:46 EDT, akotov@redhat.com (153.01 KB, application/pdf)
2016-02-27 01:52 UTC, Nir Soffer
no flags Details
Repoplot of vdsm.log vdsm logs from 2016-02-06 01:36 EST, Pavel Zhukov (3.46 MB, application/pdf)
2016-02-27 02:03 UTC, Nir Soffer
no flags Details
Repoplot of vdsm.log vdsm logs from 2016-02-09 12:51 EST, Pavel Zhukov (296.03 KB, application/pdf)
2016-02-27 02:05 UTC, Nir Soffer
no flags Details
Repoplot of vdsm.log vdsm logs from 2016-02-15 09:13 EST, Pavel Zhukov (88.66 KB, application/pdf)
2016-02-27 02:07 UTC, Nir Soffer
no flags Details
August 4, vdsm.log (380.99 KB, application/x-gzip)
2016-08-04 13:23 UTC, Yuri Obshansky
no flags Details
August 4, vdsm.log.1 (6.72 MB, application/x-gzip)
2016-08-04 13:24 UTC, Yuri Obshansky
no flags Details
August 4, vdsm.log.2 (7.17 MB, application/x-gzip)
2016-08-04 13:25 UTC, Yuri Obshansky
no flags Details
August 4, vdsm.log.3 (4.73 MB, application/x-gzip)
2016-08-04 13:27 UTC, Yuri Obshansky
no flags Details
repoplot from Yuri logs uploaded on August 4 (565.22 KB, application/pdf)
2016-08-20 23:30 UTC, Nir Soffer
no flags Details
Repoplot showing differences between vdsm during concurrent disk move test (603.17 KB, application/zip)
2016-09-05 15:11 UTC, mlehrer
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Article) 375203 0 None None None Never
Red Hat Product Errata RHEA-2017:0998 0 normal SHIPPED_LIVE VDSM bug fix and enhancement update 4.1 GA 2017-04-18 20:11:39 UTC
oVirt gerrit 53395 0 master ABANDONED monitor: Refresh domains only if needed 2020-05-27 03:00:29 UTC
oVirt gerrit 53540 0 master MERGED lvm: Simplify locking, increasing concurrency 2020-05-27 03:00:29 UTC
oVirt gerrit 53861 0 master MERGED repoplot: Visualize storage domain monitoring 2020-05-27 03:00:29 UTC
oVirt gerrit 54757 0 master MERGED monitoring: Introduce the check module 2020-05-27 03:00:29 UTC
oVirt gerrit 54849 0 master MERGED monitoring: Introduce the asyncevent module 2020-05-27 03:00:30 UTC
oVirt gerrit 55254 0 master MERGED monitoring: Add process helpers 2020-05-27 03:00:30 UTC
oVirt gerrit 55490 0 master MERGED monitoring: Port vdsm.storage.check to python 3 2020-05-27 03:00:30 UTC
oVirt gerrit 57432 0 master MERGED monitoring: Use new monitoring infrastructure 2020-05-27 03:00:30 UTC
oVirt gerrit 57471 0 master MERGED misc: Remove unused readspeed function 2020-05-27 03:00:30 UTC
oVirt gerrit 58943 0 master ABANDONED lvm: Use global_filter instead of filter 2020-05-27 03:00:30 UTC
oVirt gerrit 59100 0 ovirt-3.6 ABANDONED lvm: Use global_filter instead of filter 2020-05-27 03:00:30 UTC
oVirt gerrit 73702 0 ovirt-3.6 MERGED misc: Improve concurrency in operation mutex 2020-05-27 03:00:30 UTC

Description Roman Hodain 2014-03-28 10:15:49 UTC
Description of problem:
	The monitoring thread is triggered every 10 seconds by default. As soon
as a storage operation is triggered ( copy image for instance ), the monitoring
thread slows down. 

Version-Release number of selected component (if applicable):
	vdsm-4.13.2-0.11.el6ev.x86_64

How reproducible:
	Almost always when a storage operation is triggered in certain
environments

Steps to Reproduce:
	Not clear yet

Actual results:
	The storage domains are marked as in Problems and the hypervisor is
marked as non-operational

Expected results:
	The monitoring thread is not slowed down.

Comment 9 Nir Soffer 2014-03-30 18:55:19 UTC
(In reply to Roman Hodain from comment #0)
> Description of problem:
> 	The monitoring thread is triggered every 10 seconds by default. As soon
> as a storage operation is triggered ( copy image for instance ), the
> monitoring
> thread slows down.

Your assumption that the monitor thread checks storage every 10 seconds is not accurate. The implementation *delays* at least 10 seconds between check operations. If a check operation takes 60 seconds, the next check starts about 10 seconds after the last operation finished.

Every 5 minutes, the monitor performs a refresh operation on the domain. This operation takes only a few seconds on an idle system with a few tiny domains, but it may take much longer, depending on the load on the machine, the number of storage domains, the number of lvs, and the storage.
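
To illustrate the timing (a minimal Python sketch of the behavior described above, not vdsm's actual code; the names check, refresh and the constants are assumptions):

import time

CHECK_DELAY = 10        # seconds slept *between* cycles - a delay, not a schedule
REFRESH_INTERVAL = 300  # the domain refresh happens roughly every 5 minutes

def monitor_domain(check, refresh):
    # Hypothetical loop: the next check starts CHECK_DELAY seconds after the
    # previous cycle *finished*, so lastCheck grows to cycle duration + delay.
    last_refresh = 0.0
    while True:
        if time.monotonic() - last_refresh >= REFRESH_INTERVAL:
            refresh()                 # may take many seconds under load
            last_refresh = time.monotonic()
        check()                       # the dd read of the metadata LV
        time.sleep(CHECK_DELAY)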

Do you know of any documentation that promises the system behaves differently?

> 
> Version-Release number of selected component (if applicable):
> 	vdsm-4.13.2-0.11.el6ev.x86_64
> 
> How reproducible:
> 	Almost always when a storage operation is triggered in certain
> environments

What is a "certain environment"? We need many more details.

> 
> Steps to Reproduce:
> 	Not clear yet
> 
> Actual results:
> 	The storage domains are marked as in Problems

True, if the lastCheck value is above 30 seconds (the default), the domain is marked as problematic.

> and the hypervisor is marked as non-operational

But the hypervisor is *not* marked as non-operational because one check took a long time. When a domain is marked as problematic, we start a timer on the engine, and only if the domain is still in trouble after about 5 minutes does the host become non-operational.

So delays while monitoring a domain should not lead to the host becoming non-operational, unless they are long and repeated, which means the host has trouble accessing the storage.

Can you point us to the events when a host became non-operational in this case?

> 
> Expected results:
> 	The monitoring thread is not slowed down.

No, in the current implementation we cannot guarantee that. A monitor may slow down from time to time, and this is normal behavior, although it is usually a warning sign that the host or the storage is in trouble.

I will explain the example logs we see here in more detail later.

Comment 10 Nir Soffer 2014-03-30 18:57:13 UTC
Liron, can you explain more about the logic for marking a domain as non-operational, and check the engine log to see if we have something abnormal here?

Comment 11 Mark Huth 2014-03-30 21:20:08 UTC
engine.log showing the domain 'in problem'...

2014-03-25 00:52:01,985 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-44) domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA in problem. vds: ecp08.thw.itio.com

... 5 mins later, put host into non-operational state ...

2014-03-25 00:57:01,987 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-48) starting ProcessDomainRecovery for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA
2014-03-25 00:57:02,031 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] Lock Acquired to object EngineLock [exclusiveLocks= key: d1f92602-0536-4e8e-bb6a-7af1a0afd4c1 value: VM
, sharedLocks= ]
2014-03-25 00:57:02,037 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] START, IsVmDuringInitiatingVDSCommand( vmId = d1f92602-0536-4e8e-bb6a-7af1a0afd4c1), log id: 61b0176a
2014-03-25 00:57:02,037 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 61b0176a
2014-03-25 00:57:02,067 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-48) vds ecp08.thw.itio.com reported domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA as in problem, attempting to move the vds to status NonOperational

===================================================================================================================

The RHEVM decides the domain is 'in problem' from the results of repoStats, which shows the lastCheck to be > 30 seconds.

Thread-988992::INFO::2014-03-25 00:51:17,194::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}...

... however, for some reason repoStats is not run again for about 5 minutes, and then unfortunately on the next run the domain had a lastCheck > 30 seconds again ...

Thread-989035::INFO::2014-03-25 00:56:11,641::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}...

... and again, there is another delay in running repoStats of about 100 seconds and again the domain has a lastCheck > 30 seconds ...

Thread-989059::INFO::2014-03-25 00:57:55,527::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-989059::INFO::2014-03-25 00:57:55,553::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000213322', 'lastCheck': '52.5', 'valid': True}...

... but you can see from these that there was no problem reading the actual storage. It responds quite quickly. The problem is that the monitoring thread isn't checking very often, as Roman showed in his update ...

===================================================================================================================

As Roman showed, the lastCheck > 30 was caused by the monitoring thread not running regularly:

When the storage domain was reported as 'in problem' the monitoring thread was only able to perform the checks sporadically.  Here we can see a > 90 second gap between successive getReadDelay runs:

Thread-21::DEBUG::2014-03-25 00:49:36,938::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:49:43,177::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000255575 s, 16.0 MB/s\n'; <rc> = 0

... > 90 seconds before the next getReadDelay ...

Thread-21::DEBUG::2014-03-25 00:51:25,000::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:51:33,854::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000181862 s, 22.5 MB/s\n'; <rc> = 0

And this delay then shows up in the large repoStats lastCheck value which is reported back to RHEVM (in the first repoStats response shown above); here it is 94 seconds, which is about the gap between those two successive getReadDelay runs above ...

Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {...
u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}...


But as we can see, there is no read delay in reading the storage domain disks!  The only delay here is with vdsm not running the monitoring checks and then telling RHEVM the storage domain is 'in problem', when actually it is vdsm that is 'in problem'!

---------------------------------------------------------------------------------------------------------------------------

And here is another example where there is a 60 second gap between running successive getReadDelay (which is reported back to the RHEVM in the second repoStats response shown above) ...

Thread-21::DEBUG::2014-03-25 00:55:09,178::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:55:09,434::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000190666 s, 21.5 MB/s\n'; <rc> = 0

Thread-21::DEBUG::2014-03-25 00:56:11,646::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:56:11,876::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00028596 s, 14.3 MB/s\n'; <rc> = 0

And then repoStats reports a lastCheck of 60 seconds back to the RHEVM ...

Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {...
u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}...

... and so the RHEVM continues to consider the domain as 'in problem', but really there is no problem reading the storage.  It is a result of the large intervals between domain checks, which are reported back to the RHEVM, which then falsely assumes there is a problem with the domain.

===================================================================================================================

The solution here for the customer is probably to adjust the MaxStorageVdsTimeoutCheckSec on the RHEVM to be > 60 seconds to minimize the chances of the large lastCheck making RHEVM think the domain is 'in problem'.

But perhaps a longer term solution is to have RHEVM collect an average of readDelay values and, if the lastCheck is large *and* the associated readDelay is significantly larger than the average, *then* perhaps consider the domain as 'in problem'.  What do you think?
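
To make that concrete, here is a rough sketch of such a heuristic (the function and threshold names are hypothetical, not existing engine code):

from statistics import mean

def domain_in_problem(last_check, delay, delay_history,
                      last_check_threshold=30.0, delay_factor=3.0):
    # Flag the domain only when the check is stale *and* the measured read
    # delay is well above its recent average - i.e. the storage itself looks
    # slow, not just the monitoring thread.
    if last_check <= last_check_threshold:
        return False
    if not delay_history:
        return True
    return delay > delay_factor * mean(delay_history)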

Comment 12 Nir Soffer 2014-03-30 21:33:06 UTC
(In reply to Mark Huth from comment #11)
Thanks Mark! We will check the events you showed - hopefully they are all in the logs attached to this bug? If not, please attach the relevant engine and vdsm logs here.

I think you are right that the storage is not the problem. Looking at the vdsm log, it looks like the host, not the storage, is in trouble, taking multiple seconds to run a dd or lvm process.

Do we have any information about the load on the machine while these monitoring issues happened?

Comment 13 Mark Huth 2014-03-30 22:58:47 UTC
Hi Nir,

Yes, the information in comment #12 (and all the comments) is in the attached engine.log and vdsm.log.1.xz files.  As yet we don't have any data about the load on the host (interestingly, the customer doesn't like to run sosreport on their hosts because it seems to place additional load on them!).  Anyway, I will request load information from the customer.

Thanks,
Mark

Comment 16 Nir Soffer 2014-03-31 10:44:28 UTC
This is a more useful view of repoStats for attachment 880585 [details].

domain: f737ce8c-d2c1-4cdf-8340-63fd449d2542
  delay      avg: 0.002497 min: 0.000293 max: 0.030579
  last check avg: 18.965657 min: 1.100000 max: 76.500000
domain: 07c1f411-8c34-4c3e-a096-48a133216016
  delay      avg: 0.001886 min: 0.000341 max: 0.015333
  last check avg: 21.066667 min: 1.200000 max: 109.300000
domain: 7f591cbf-1787-435d-881a-d55fb77ee636
  delay      avg: 0.002125 min: 0.000327 max: 0.031766
  last check avg: 18.268687 min: 0.100000 max: 50.600000
domain: 891c4f53-bee3-4054-9fe8-48b54fe38d65
  delay      avg: 0.000669 min: 0.000281 max: 0.009746
  last check avg: 32.796970 min: 0.300000 max: 84.500000
domain: 8b7ef24a-57c0-43e6-9f9e-b71a8701eacc
  delay      avg: 0.002053 min: 0.000000 max: 0.013383
  last check avg: 16.969792 min: 0.200000 max: 75.000000
domain: 99bd47ea-2c85-4988-ad28-ae925a0ea3f6
  delay      avg: 0.001679 min: 0.000342 max: 0.010271
  last check avg: 17.405051 min: -0.100000 max: 56.700000
domain: 58953ce4-c34d-49e1-85a8-1d0033978059
  delay      avg: 0.001879 min: 0.000324 max: 0.015990
  last check avg: 18.948485 min: 0.200000 max: 64.200000
domain: 145656b4-c764-4b4f-8932-f0cc790ac7f5
  delay      avg: 0.001933 min: 0.000298 max: 0.014536
  last check avg: 18.145455 min: -0.200000 max: 69.300000
domain: 1851daa3-660f-44b2-8949-3e1a821b9b87
  delay      avg: 0.001924 min: 0.000296 max: 0.008656
  last check avg: 18.884848 min: 0.100000 max: 83.500000
domain: beb7a15c-2041-4298-95b9-7c73657a140e
  delay      avg: 0.001827 min: 0.000293 max: 0.020137
  last check avg: 16.979798 min: -0.100000 max: 53.600000
domain: 216ae8e3-888f-41f0-9770-82af114a845f
  delay      avg: 0.002230 min: 0.000315 max: 0.027663
  last check avg: 15.493939 min: -0.200000 max: 50.100000
domain: fe1c8078-1bf1-4cab-bea5-e0d9f665d082
  delay      avg: 0.001957 min: 0.000329 max: 0.019197
  last check avg: 17.994949 min: 0.100000 max: 60.800000
domain: 80767ba5-7c87-45dd-9a32-7330ed889af5
  delay      avg: 0.001659 min: 0.000321 max: 0.006276
  last check avg: 29.701010 min: 0.000000 max: 102.600000
domain: 9b66b8d0-1a39-444a-bef5-c3b9b432cb49
  delay      avg: 0.001557 min: 0.000309 max: 0.009697
  last check avg: 19.275758 min: -0.700000 max: 98.700000
domain: ea38354b-1638-427f-b07a-fb7d68965573
  delay      avg: 0.002269 min: 0.000328 max: 0.020631
  last check avg: 15.559596 min: -0.300000 max: 54.200000
domain: 47e996a5-7031-4a67-bbd3-a982ddbaa385
  delay      avg: 0.001853 min: 0.000315 max: 0.010552
  last check avg: 19.190909 min: -0.100000 max: 68.300000
domain: 1f8ab569-e10c-4db2-89ea-1299c33657dc
  delay      avg: 0.002147 min: 0.000286 max: 0.019115
  last check avg: 17.875758 min: -0.100000 max: 67.300000
domain: fadcbebe-e7e9-41d7-acc2-c6f7c88ddc21
  delay      avg: 0.001892 min: 0.000355 max: 0.017745
  last check avg: 19.050505 min: -0.000000 max: 75.000000
domain: 16ff2d50-2d03-44e7-aee4-bbde70bbc866
  delay      avg: 0.001890 min: 0.000298 max: 0.019597
  last check avg: 18.421212 min: -0.600000 max: 67.800000
domain: 8fa43837-ac15-48f4-ad00-c928899709a1
  delay      avg: 0.001713 min: 0.000287 max: 0.013944
  last check avg: 19.128283 min: -0.100000 max: 55.000000
domain: 582f520c-cf40-498a-b999-6c5e0056509c
  delay      avg: 0.001845 min: 0.000329 max: 0.012668
  last check avg: 17.516162 min: -0.500000 max: 74.900000
domain: f60fde72-53e0-4ead-b998-a5d985112707
  delay      avg: 0.001730 min: 0.000285 max: 0.021188
  last check avg: 19.022222 min: 1.200000 max: 88.100000
domain: 15552cb3-d62d-4868-bc1c-64dcf8542d12
  delay      avg: 0.005119 min: 0.000325 max: 0.136264
  last check avg: 18.754545 min: 0.400000 max: 56.600000
domain: 50885a30-5bec-4b34-98ed-ed3c3306bad0
  delay      avg: 0.000447 min: 0.000297 max: 0.000926
  last check avg: 29.818182 min: -0.600000 max: 82.500000
domain: c14a4d42-7e18-493c-96ed-8549bdd4b0bb
  delay      avg: 0.001669 min: 0.000306 max: 0.011661
  last check avg: 19.109091 min: 0.200000 max: 103.300000
domain: 8ea8a6d5-b5e6-4120-bed0-539dd45a2457
  delay      avg: 0.001729 min: 0.000318 max: 0.012942
  last check avg: 15.146465 min: -0.600000 max: 65.600000
domain: 57e4de3a-a5dd-4b94-8062-d2665d294268
  delay      avg: 0.001642 min: 0.000305 max: 0.014575
  last check avg: 18.091919 min: -0.400000 max: 114.100000
domain: c5084f93-aa00-4f65-8532-65b3ca9dea7e
  delay      avg: 0.002153 min: 0.000311 max: 0.023065
  last check avg: 18.705051 min: 0.000000 max: 63.300000

Comment 17 Nir Soffer 2014-03-31 10:46:42 UTC
Created attachment 880688 [details]
repostats tool - generate repostats statistics from vdsm logs
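
The attached script is the reference; as a rough idea of the kind of processing it needs to do (the regex and field names below are assumptions based on the repoStats lines quoted in this bug, not the tool's actual code):

import re
from collections import defaultdict

# Matches the per-domain dicts in "repoStats, Return response: ..." lines.
DOMAIN_RE = re.compile(
    r"u?'(?P<uuid>[0-9a-f-]{36})': \{[^}]*"
    r"'delay': '(?P<delay>[\d.]+)'[^}]*"
    r"'lastCheck': '(?P<last>-?[\d.]+)'")

def repostats(log_path):
    stats = defaultdict(lambda: {'delay': [], 'last': []})
    with open(log_path) as log:
        for line in log:
            if 'repoStats, Return response' not in line:
                continue
            for m in DOMAIN_RE.finditer(line):
                stats[m.group('uuid')]['delay'].append(float(m.group('delay')))
                stats[m.group('uuid')]['last'].append(float(m.group('last')))
    for uuid, s in stats.items():
        print('domain:', uuid)
        for name, values in (('delay', s['delay']), ('last check', s['last'])):
            print('  %-10s avg: %f min: %f max: %f'
                  % (name, sum(values) / len(values), min(values), max(values)))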

Comment 18 Nir Soffer 2014-03-31 11:00:23 UTC
Here is what we can see on the vdsm side. To make it clearer, I include only Thread-21, the monitor thread for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62, and some repoStats calls.


At 00:36, one minute before the copyImage task started:

Thread-21::DEBUG::2014-03-25 00:36:04,687::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:36:04,704::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0246864 s, 16.6 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:36:14,744::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:36:14,794::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0511668 s, 8.0 MB/s\n'; <rc> = 0

We can see that running dd takes 30-50 milliseconds.
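
(For reference, the 'delay' value in repoStats is simply the copy time dd reports for that 4 KiB direct read. A simplified sketch of the measurement - not vdsm's actual getReadDelay code, and the of=/dev/null redirection is added here for convenience:)

import re
import subprocess

def read_delay(metadata_path):
    # Same direct-I/O read vdsm issues; parse the time dd reports.
    cmd = ['/bin/dd', 'iflag=direct', 'if=%s' % metadata_path,
           'bs=4096', 'count=1', 'of=/dev/null']
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # stderr ends with e.g. "4096 bytes (4.1 kB) copied, 0.000255575 s, 16.0 MB/s"
    match = re.search(r'copied, ([\d.e-]+) s', result.stderr)
    return float(match.group(1)) if match else None

# e.g. read_delay('/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata')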


The copyImage task starts:

Thread-988661::INFO::2014-03-25 00:37:41,432::logUtils::44::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='b45645cf-072b-43e6-95f3-0387caa14847', spUUID='97b195c8-23ba-49e9-aea3-22080512ac66', vmUUID='', srcImgUUID='f11f2572-54e4-4a1c-af35-18c1b8d416f9', srcVolUUID='cdfa0040-4885-4340-94ef-2d11590c1911', dstImgUUID='467c8de5-3260-4f84-9f7a-a9d097b67107', dstVolUUID='e2dc71b8-3ddd-4e02-a720-c69270c21408', description='Active VM', dstSdUUID='b45645cf-072b-43e6-95f3-0387caa14847', volType=6, volFormat=4, preallocate=2, postZero='false', force='false')


One minute later - we see no effect on the monitor thread:

Thread-21::DEBUG::2014-03-25 00:38:35,426::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:38:35,441::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0404696 s, 10.1 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:38:45,485::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:38:45,527::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000206213 s, 19.9 MB/s\n'; <rc> = 0


At 00:39:35, the monitor refreshes the domain (this happens every 5 minutes):

Thread-21::DEBUG::2014-03-25 00:39:35,676::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::ERROR::2014-03-25 00:39:35,677::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::ERROR::2014-03-25 00:39:35,677::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::DEBUG::2014-03-25 00:39:35,677::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend
Thread-21::DEBUG::2014-03-25 00:39:35,682::blockSD::326::Storage.Misc.excCmd::(readlines) '/bin/dd iflag=direct skip=0 bs=2048 if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:39:35,696::blockSD::326::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.000306704 s, 6.7 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:39:35,697::misc::289::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.000306704 s, 6.7 MB/s'], size: 2048
Thread-21::DEBUG::2014-03-25 00:39:35,697::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[]
Thread-21::WARNING::2014-03-25 00:39:35,697::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-21::DEBUG::2014-03-25 00:39:35,698::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend
Thread-21::DEBUG::2014-03-25 00:39:35,698::lvm::516::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::518::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::526::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,700::lvm::386::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,701::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_01|/dev/mapper/ECPB_02|/dev/mapper/ECPB_03|/dev/mapper/ECPB_04|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None)
Thread-21::DEBUG::2014-03-25 00:39:35,979::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:39:35,981::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:39:35,981::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['LEASETIMESEC=60', 'SDUUID=2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62', 'LOCKRENEWALINTERVALSEC=5', 'LOGBLKSIZE=512', u'PV0=pv:ECPB_SATA_01,uuid:gr8Rdi-tJNf-iF8K-WW3m-rjSe-DJsa-HAV90O,pestart:0,pecount:8189,mapoffset:0', 'LEASERETRIES=3', 'DESCRIPTION=THW_B_SATA_DATA', 'IOOPTIMEOUTSEC=10', 'VGUUID=f0bA1D-h1SV-UuBs-YIan-e5nM-d5AX-meaq8z', u'PV1=pv:ECPB_SATA_02,uuid:FegIWm-G3Sw-Hf32-4gpl-GTuA-nspj-gDb2Eg,pestart:0,pecount:8189,mapoffset:8189', 'PHYBLKSIZE=512', 'CLASS=Data', 'TYPE=FCP', 'LOCKPOLICY=', 'VERSION=3', 'POOL_UUID=97b195c8-23ba-49e9-aea3-22080512ac66', 'POOL_DESCRIPTION=THW', 'ROLE=Master', 'MASTER_VERSION=2', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=12', u'POOL_DOMAINS=b45645cf-072b-43e6-95f3-0387caa14847:Active,2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:Active,f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4:Active', '_SHA_CKSUM=c6cde602c4f6b7662e8044a091dfe3a8037c819b']
Thread-21::INFO::2014-03-25 00:39:35,984::sd::375::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_imageNS already registered
Thread-21::INFO::2014-03-25 00:39:35,984::sd::383::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_volumeNS already registered
Thread-21::INFO::2014-03-25 00:39:35,985::blockSD::446::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_lvmActivationNS already registered
Thread-21::DEBUG::2014-03-25 00:39:35,985::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None)
Thread-21::DEBUG::2014-03-25 00:39:36,199::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:39:36,205::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:39:36,222::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00

The domain refresh took about 1 second - the task has no effect on the monitor thread so far.

We start to see trouble here about 10 minutes after starting the copyImage task - so it seems that the copyImage task is not related to this issue. Running dd now takes 3-10 seconds:

Thread-21::DEBUG::2014-03-25 00:47:21,183::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:47:24,227::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0293797 s, 13.9 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:47:43,756::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:47:53,835::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0245865 s, 16.7 MB/s\n'; <rc> = 0


And now the monitor thread tries to refresh the domain again, and it takes about 94 seconds:

Thread-21::DEBUG::2014-03-25 00:49:53,611::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::ERROR::2014-03-25 00:50:04,486::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::ERROR::2014-03-25 00:50:04,494::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62
Thread-21::DEBUG::2014-03-25 00:50:04,559::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend
Thread-21::DEBUG::2014-03-25 00:50:04,731::blockSD::326::Storage.Misc.excCmd::(readlines) '/bin/dd iflag=direct skip=0 bs=2048 if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/meta
data count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:50:08,419::blockSD::326::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.00018
7592 s, 10.9 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:50:10,066::misc::289::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.000187592 s, 10.
9 MB/s'], size: 2048
Thread-21::DEBUG::2014-03-25 00:50:15,917::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[]
Thread-21::WARNING::2014-03-25 00:50:15,930::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-21::DEBUG::2014-03-25 00:50:15,945::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend
Thread-21::DEBUG::2014-03-25 00:50:15,985::lvm::516::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:16,016::lvm::518::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:16,042::lvm::526::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:16,066::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:16,087::lvm::386::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:18,907::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ign
ore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_01|/dev/mapper/ECPB_02|/dev/mapper/ECPB_0
3|/dev/mapper/ECPB_04|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  ret
ain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg
_mda_size,vg_mda_free,lv_count,pv_count,pv_name 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None)
Thread-21::DEBUG::2014-03-25 00:50:36,728::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:50:36,772::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-21::DEBUG::2014-03-25 00:50:36,781::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['LEASETIMESEC=60', 'SDUUID=2c5f2291-ac31-4c9e-8f
0a-3d317f3f0d62', 'LOCKRENEWALINTERVALSEC=5', 'LOGBLKSIZE=512', u'PV0=pv:ECPB_SATA_01,uuid:gr8Rdi-tJNf-iF8K-WW3m-rjSe-DJsa-HAV90O,pestart:0,pecount:8189,mapoffset:0', 'LEASERET
RIES=3', 'DESCRIPTION=THW_B_SATA_DATA', 'IOOPTIMEOUTSEC=10', 'VGUUID=f0bA1D-h1SV-UuBs-YIan-e5nM-d5AX-meaq8z', u'PV1=pv:ECPB_SATA_02,uuid:FegIWm-G3Sw-Hf32-4gpl-GTuA-nspj-gDb2Eg,
pestart:0,pecount:8189,mapoffset:8189', 'PHYBLKSIZE=512', 'CLASS=Data', 'TYPE=FCP', 'LOCKPOLICY=', 'VERSION=3', 'POOL_UUID=97b195c8-23ba-49e9-aea3-22080512ac66', 'POOL_DESCRIPT
ION=THW', 'ROLE=Master', 'MASTER_VERSION=2', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=12', u'POOL_DOMAINS=b45645cf-072b-43e6-95f3-0387caa14847:Active,2c5f2291-ac31-4c9e-8f0a-3d317f3f0d6
2:Active,f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4:Active', '_SHA_CKSUM=c6cde602c4f6b7662e8044a091dfe3a8037c819b']
Thread-21::INFO::2014-03-25 00:50:58,323::sd::375::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_imageNS already 
registered
Thread-21::INFO::2014-03-25 00:50:58,336::sd::383::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_volumeNS already
 registered
Thread-21::INFO::2014-03-25 00:50:58,358::blockSD::446::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_lvmActivationNS already registered
Thread-21::DEBUG::2014-03-25 00:50:58,361::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ig
nore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None)
Thread-21::DEBUG::2014-03-25 00:51:17,182::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0
Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'b45645cf-072b-43e6-95f3-0387caa14847': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000187753', 'lastCheck': '22.1', 'valid': True}, u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}, u'f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00036752', 'lastCheck': '18.8', 'valid': True}}
Thread-21::DEBUG::2014-03-25 00:51:25,000::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:51:33,854::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000181862 s, 22.5 MB/s\n'; <rc> = 0

Notable delays while refreshing the domain:

- persistentDict::234::Storage.PersistentDict::(refresh) - 22 seconds
- lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs - 18 seconds
- lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck - 18 seconds

It seems that the host is under extreme load, or swapping memory. We know that the storage is OK because the last readDelay shows a tiny delay and a high rate.


Running dd continues to be randomly slow from time to time:

Thread-21::DEBUG::2014-03-25 00:52:03,742::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:52:04,043::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0264309 s, 15.5 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:52:36,920::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096
 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:52:58,171::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00
0229393 s, 17.9 MB/s\n'; <rc> = 0
Thread-21::DEBUG::2014-03-25 00:53:10,253::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None)
Thread-21::DEBUG::2014-03-25 00:53:10,440::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00021057 s, 19.5 MB/s\n'; <rc> = 0


At 00:55:21,879, the next domain refresh starts:

Thread-21::DEBUG::2014-03-25 00:55:21,879::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62

This time the refresh took 50 seconds.


And now comes the next repoStats call:

Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'b45645cf-072b-43e6-95f3-0387caa14847': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000218881', 'lastCheck': '20.3', 'valid': True}, u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}, u'f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00377639', 'lastCheck': '0.6', 'valid': True}}

The last repoStats call was at 00:51:17,200 - it seems that the engine stopped polling the host for 5 minutes.


It seems that the engine's 5-minute timeout and the domain monitor's 5-minute interval between refreshes work together to ensure that a host becomes non-operational if it is under high load.

Comment 19 Liron Aravot 2014-04-01 11:58:52 UTC
I'll provide the RCA here and what needs to be done:


We have a successful vds refresh in which the domain is reported as problematic.

-----------------------------------------------------------------------------
2014-03-25 00:52:01,985 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-44) domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA in problem. vds: ecp08.thw.itio.com
-----------------------------------------------------------------------------


Then an attempt is made to stop a VM - this execution takes an exclusive lock (synchronized block) on _vdsManager.getLockObj(), which is also the lock used for performing the vds refresh.
----------------------------------------------------------------------------- 
[org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] START, DestroyVmVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 1d9ba1b7

2014-03-25 00:52:48,132 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [77dded4d] START, DestroyVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 103c2234
-----------------------------------------------------------------------------


The VM stop operation takes a long time (more than 3 minutes) due to various reasons on the vdsm side (but that's unrelated to this bug).
-----------------------------------------------------------------------------
Thread-989013::DEBUG::2014-03-25 00:55:51,385::BindingXMLRPC::982::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
-----------------------------------------------------------------------------


In the meanwhile (during those 3 minutes) there were multiple attempts to stop the VM, causing multiple threads to pile up waiting for the exclusive lock:
-----------------------------------------------------------------------------
1:
2014-03-25 00:53:25,168 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-44) [582e54a6] START, DestroyVmVDSCommand(HostName = ecp08.
thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log i
d: 19c8e61b


2:
2014-03-25 00:54:13,921 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-48) [73933d5d] START, DestroyVmVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 1ad87aca
-----------------------------------------------------------------------------

Only now does the first stop attempt end:
-----------------------------------------------------------------------------
2014-03-25 00:55:48,133 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] VDS::destroy Failed destroying vm d1f92602-0536-4e8e-bb6a-7af1a0afd4c1 in vds = 26c25a9a-7af7-4e62-8289-4e04d2f5767f : ecp08.thw.itio.com, error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException
2014-03-25 00:55:48,133 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] FINISH, DestroyVmVDSCommand, log id: 1d9ba1b7
-----------------------------------------------------------------------------

Since there were a few threads waiting for the exclusive lock, including the VdsManager.OnTimer() method, the refresh does not occur (thus there is no call to repoStats in the meantime). Because the host did report the domain as problematic and no "opposite" report was collected (the vds manager cannot collect stats while waiting for the lock), the host moves to non-operational.
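
(A toy illustration of the effect in plain Python threading - not the engine's Java code; the sleep durations merely stand in for the real ones:)

import threading
import time

host_lock = threading.Lock()          # stands in for _vdsManager.getLockObj()

def destroy_vm():
    with host_lock:                   # DestroyVmVDSCommand holds the host lock
        time.sleep(5)                 # stands in for a destroy taking >3 minutes

def on_timer():
    # VdsManager.OnTimer() needs the same lock, so the refresh (and the
    # repoStats result it would collect) is blocked until destroy_vm returns.
    with host_lock:
        print('refresh ran at', time.strftime('%H:%M:%S'))

threading.Thread(target=destroy_vm).start()
time.sleep(0.1)
on_timer()                            # only runs after the lock is released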

It seems there is no reason that at least "regular" stats (let's call them stats "unrelated" to the VM's status) should not be collected from the host while a VM stop is being attempted (VM stats may be collected as well if it is safe).

Allon, how do you want to handle it?

Comment 20 Liron Aravot 2014-04-01 12:08:44 UTC
Regardless, we shouldn't allow multiple executions of DestroyVmVDSCommand, which will hide this problem for "normal" config values; it seems there's no reason to allow multiple concurrent executions.

Comment 21 Allon Mureinik 2014-04-01 13:52:05 UTC
(In reply to Liron Aravot from comment #19)
> I'll provide RCA here and what needs to be done - 
> 
> as there were few threads waiting for the exclusive lock including the
> VdsManager.OnTimer() method - refresh won't occur (thus there's no call to
> repoStats meanwhile) - as the host did report the domain as problematic and
> no "opposite" report was collected (as vds manager won't collect stats
> waiting for the lock), the host will move to non operational.
> 
> Seems like there's no reason that at least "regular" (or let's call it -
> "unrelated" to vm's status) stats won't be collected from the host when it's
> attempted to stop a vm (stats like vm stats may be collected as well if it's
> safe).

Hi Barak,

Please see Liron's analysis above (comment 19 has the full details).
In a nutshell, host monitoring and some of the VM lifecycle commands (starting and stopping) take the same lock. Since these may be long operations (in this case - stopping a VM), a host may MISTAKENLY be perceived as though it's not updating its stats, and then be moved to non-op.

Can one of your guys please take a look at this?
If this is true, the issue is embedded in the host/vm lifecycle, and should be handled by the appropriate experts.

Comment 22 Liron Aravot 2014-04-01 13:54:44 UTC
Regardless, we can also take a look into why we take an exclusive lock that prevents simultaneous operations on different VMs on the same host.

Comment 23 Allon Mureinik 2014-04-01 14:25:28 UTC
(In reply to Allon Mureinik from comment #21)
> (In reply to Liron Aravot from comment #19)
> > I'll provide RCA here and what needs to be done - 
> > 
> > as there were few threads waiting for the exclusive lock including the
> > VdsManager.OnTimer() method - refresh won't occur (thus there's no call to
> > repoStats meanwhile) - as the host did report the domain as problematic and
> > no "opposite" report was collected (as vds manager won't collect stats
> > waiting for the lock), the host will move to non operational.
> > 
> > Seems like there's no reason that at least "regular" (or let's call it -
> > "unrelated" to vm's status) stats won't be collected from the host when it's
> > attempted to stop a vm (stats like vm stats may be collected as well if it's
> > safe).
> 
> Hi Barak,
> 
> Please see Liron's analysis above (comment 19 has the full details).
> In a nutshell, host monitoring and some of the VM lifecycle commands
> (starting and stopping) take the same lock. Since these may be long
> operations (in this case - stopping a VM), a host may MISTAKENLY be
> perceived as though it's not updating its stats, and then be moved to non-op.
> 
> Can one of your guys please take a look at this?
> If this is true, the issue is embedded in the host/vm lifecycle, and should
> be handled by the appropriate experts.


Hi Omer,

Barak suggested you may be the right contact point for this, as the problem seems to be initiated by a VM stop operation taking too long.

Can you take a look please?
Thanks!

Comment 25 Michal Skrivanek 2014-04-02 10:44:03 UTC
continuing from comment #19, vdsm side, Thread-989013:

1) system seems to be heavily overloaded, all the operations seem to take very long time
2) 
  a) the destroy operation got stuck in libvirt for about 16s
  b) this is followed by 30s wait in vdsm before we retry to kill it
  c) the second attempt to destroy the domain failed in libvirt after another 16s,
       followed by a bogus WARN message that the VM is not running when it actually is
  d) we continue with freeing up the resources regardless
  e) teardownVolumePath is called at ~00:54:01, followed by a flood of storage-related delays and errors
  f) finally "destroyed" at 00:55:51, returning "Machine destroyed" (even though it is not)
 
Assuming the host time is in sync with the engine (it looks OK judging by the end time), the overall performance is really bad. The destroy command was logged in vdsm at 00:52:56, whereas the engine sent it at 00:52:48.

we can consider spawning a separate thread for destroy flow in the future.

Comment 26 Michal Skrivanek 2014-04-02 11:18:57 UTC
From my perspective, NonOperational is the best resolution here.
The state of the VM in question is serious: it cannot be killed, and it is reported to the engine as down when it is not.

We do not want to make the change suggested in comment #22, as we don't want to get stats during state-transition commands.

IMO, with this kind of unresponsive system we can't do anything but tweak (increase) the timeouts all over the place. Perhaps some process/IO priorities should be set differently.

@Roman, libvirt log would be handy

@Barak - infra insights regarding the responsiveness?

@Sean - I think it's back on storage. The induced load and delay in lvm operations seem to be the most substantial part of the delays

@Francesco - let's try to make "destroy" more reliable and tweak the wait times

Comment 27 Liron Aravot 2014-04-02 12:09:47 UTC
Michal, I don't think that this is the issue here.
The destroy operation, or other operations that take that lock, can always take time for whatever reason.

I'm not sure why we don't want to collect VM stats, or why we have an exclusive lock between various unrelated operations (I think we should change that anyway). Specifically, other stats (let's leave VM stats out of the discussion) can and should be collected and processed, as they aren't related to VMs and shouldn't be using the same exclusive lock.

Comment 28 Michal Skrivanek 2014-04-02 15:47:25 UTC
Liron, it's not designed to take time. Operations that are supposed to take time spawn their own thread. The fact that it takes 62s is a mistake we should address; however, in the overall duration it's not that significant. There is something seriously wrong with that VM since it can't be killed.
VM stats:
It's true we don't need to lock for unrelated operations, but this one is related - destroy is a state transition and we don't want to get stats in the middle.
Host stats:
These, on the other hand, should be collected regardless of VM-related operations. The trouble is that the code is all in one place; there is only a single timer, scheduled after the previous call finishes. This needs infra changes.
I'll follow up with Roy Golan on this
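
(One hypothetical way to decouple them, sketched in Python rather than the engine's Java: arm the next host-stats timer before running the current collection, so a slow call - or a VM command blocking it - delays only itself.)

import threading

def schedule_host_stats(collect, interval=15.0):
    # Re-arm first, then collect; the next collection is no longer chained
    # to the completion of the previous one.
    threading.Timer(interval, schedule_host_stats, (collect, interval)).start()
    collect()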


As noted before, the host is under extreme load, to the point where it is almost unusable; there's no silver bullet for that. I believe that needs to be addressed (or at least understood) first.

Comment 29 Roy Golan 2014-04-03 08:37:35 UTC
(In reply to Michal Skrivanek from comment #28)
> Liron, it's not designed to take time. Operations which are supposed to take
> time are spawning its own thread. The fact that it takes 62s is a mistake we
> should address, however in the overall duration it's not that significant.
> There is something seriously wrong with that VM since it can't be killed. 
> VM stats:
> It's true we don't need to lock for unrelated operations, but this one is
> related - destroy is a state transition and we don't want to get stats in
> the middle.
> Host stats:
> these, on the other hand, should be collected/called regardless VM-related
> operations. The trouble is the code is all in one place, there is only a
> single timer scheduled after the previous call finishes. This needs infra
> changes.
> I'll follow up with Roy Golan on this
> 
> 
> As noted before, the host is under extreme load, to the state when it is
> almost unusable, there's not silver bullet for that. I believe that needs to
> be addressed (or at least understood) first.

Agreed - from the system's point of view, this host, being under stress and slow to react to engine commands, is INDEED Not Operational.

All the lock and thread contention on the shared lock is a side effect and merely noise for that matter.

Comment 30 Nir Soffer 2014-04-03 11:28:26 UTC
(In reply to Michal Skrivanek from comment #28)
> Liron, it's not designed to take time

Then the design is wrong and should be fixed.
 
> Host stats:
> these, on the other hand, should be collected/called regardless VM-related
> operations. 

These *must* be collected regardless of other operations.

> As noted before, the host is under extreme load, to the state when it is
> almost unusable, there's not silver bullet for that. I believe that needs to
> be addressed (or at least understood) first.

An overloaded host is not a bug - it is a normal situation that the engine must handle.

Comment 31 Nir Soffer 2014-04-03 11:43:09 UTC
(In reply to Roy Golan from comment #29)
> (In reply to Michal Skrivanek from comment #28)
> agree - from the system point of view, this host, being under stress and
> slow to react to the engine command is INDEED Not Operational. 

It is operational, but slow; it is possible that vdsm is very slow because of memory swapping while other processes are in a good enough state. We don't have enough information about the host's health.

> all the lock and threads contention on the shared lock is a side effect and
> merely noise for that matter.

Taking an exclusive lock and blocking unrelated operations is not noise.

Comment 32 Nir Soffer 2014-04-03 11:43:43 UTC
Doron: do we collect and analyze any information about host health in the engine that could help avoid such issues on the host?

Comment 33 Michal Skrivanek 2014-04-03 12:37:01 UTC
(In reply to Nir Soffer from comment #30)
An overloaded host is not a bug, but it's not a normal situation either. If the vdsm service is not operational, then from RHEV's point of view the host is not operational. I don't care about other services for this resolution.

(In reply to Nir Soffer from comment #32)
Yes, the host statistics. Though AFAIK we don't collect "load", just memory-related info, which is not really enough.
We should probably improve host monitoring; on the other hand, it's not the core purpose of RHEV, so there needs to be a line.

Comment 41 Doron Fediuck 2014-04-06 11:49:43 UTC
(In reply to Nir Soffer from comment #32)
> Doron: do we collect and analyze some info about host health in engine, that
> could help avoid such issues on the host?

Nir, the engine gets the results of getVdsStats in the form of:
        cpuIdle = '99.72'
        cpuLoad = '0.14'
        cpuSys = '0.11'
        cpuSysVdsmd = '0.50'
        cpuUser = '0.17'
        cpuUserVdsmd = '0.62'

But we cannot really tell which process goes wild.
In 3.5 we plan to use cgroups limits for VMs, to make sure a VM will not go wild.
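
(To illustrate what can and cannot be derived from those fields - a hypothetical check, not existing engine logic:)

def host_under_pressure(stats, idle_threshold=10.0):
    # Flag the host when almost no CPU is idle; the cpu*Vdsmd fields hint
    # whether vdsm itself is busy, but none of these fields identify the
    # specific process that went wild.
    return float(stats['cpuIdle']) < idle_threshold

# e.g. host_under_pressure({'cpuIdle': '99.72', 'cpuLoad': '0.14'})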

Comment 56 Michal Skrivanek 2014-05-19 11:25:46 UTC
Bug 1091389 may help in the specific case of a problematic VM (as happened here), bug 1098791 should address the excessive duration of VM statistics processing (which blocks the host stats), and rgolan will open another one for the actual separation of VM stats from host-related stats (including storage domain monitoring) - Roy, please fill it in.

Comment 57 Roy Golan 2014-05-19 12:22:44 UTC
Opened Bug 1099068

Comment 77 Allon Mureinik 2015-04-27 11:38:29 UTC
*** Bug 1214584 has been marked as a duplicate of this bug. ***

Comment 89 Pavel Zhukov 2015-08-11 11:51:12 UTC
Created attachment 1061456 [details]
lastCheck degradation

Comment 92 Nir Soffer 2015-09-09 16:31:00 UTC
May be related to bug 1247075.

I suggest testing with https://gerrit.ovirt.org/45738 when the patch is ready.

Comment 105 Yuri Obshansky 2016-02-02 13:52:27 UTC
The bug was verified on
rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch
ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch
vdsm-python-4.17.19-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch
vdsm-jsonrpc-4.17.19-0.el7ev.noarch
vdsm-yajsonrpc-4.17.19-0.el7ev.noarch
vdsm-xmlrpc-4.17.19-0.el7ev.noarch
vdsm-cli-4.17.19-0.el7ev.noarch
vdsm-4.17.19-0.el7ev.noarch
vdsm-infra-4.17.19-0.el7ev.noarch 

Since the reproduction flow is not clear, we decided to verify with the following scenario:
- RHEVM has 20 NFS Storage Domains and 100 VMs
- I ran an automatic load test of 100 concurrent threads performing different operations on VMs, such as getinfo/stop/start/etc., using JMeter
- At the same time I manually performed actions such as deactivate/activate/detach/attach on several of the 20 SDs
- During the test I collected lastCheck statistics for each SD and saved them to a file (see the attached lastcheck_sd2.txt)

I don't see any slowdown in the storage domain monitoring operations.

We can close the bug if the developers are satisfied.

Comment 106 Yuri Obshansky 2016-02-02 13:53:36 UTC
Created attachment 1120440 [details]
last check statistics - bug verification

Comment 107 Marina Kalinin 2016-02-02 14:09:29 UTC
Yuri, I am curious whether you ran the same tests before the fix and saw a significant difference, so that we can say this is definitely the fix.

Comment 108 Yuri Obshansky 2016-02-02 14:13:01 UTC
Unfortunately, I didn't run it before - only with the latest version.
So you must decide, taking into consideration that we don't have the time/resources to reproduce it with an earlier version.

Comment 109 Pavel Zhukov 2016-02-03 15:32:06 UTC
(In reply to Yuri Obshansky from comment #105)
> The bug was verified on
> rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch
> ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
> rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch
> vdsm-python-4.17.19-0.el7ev.noarch
> vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch
> vdsm-jsonrpc-4.17.19-0.el7ev.noarch
> vdsm-yajsonrpc-4.17.19-0.el7ev.noarch
> vdsm-xmlrpc-4.17.19-0.el7ev.noarch
> vdsm-cli-4.17.19-0.el7ev.noarch
> vdsm-4.17.19-0.el7ev.noarch
> vdsm-infra-4.17.19-0.el7ev.noarch 
> 
> Since reproducing flow is not clear we decided to verify in following
> scenario:
> - RHEVM has 20 NFS Storage Domains and 100 VMs
> - I ran automatic load test of 100 concurrent threads which perform
> different operations on VMs such as getinfo/stop/start/etc using JMeter
> - in the same time I manually performed actions such as
> deactivate/activate/dettach/attach of several SDs from 20 
> - during test I collected statistics data of lastCheck per each SD and saved
> to file (see attached lastcheck_sd2.txt)
> 
> I don't see any slow down in Storage domain monitoring operations.
> 
> We can close bug if it is satisfied for developers.

You have collected only 167 stat samples, and I don't think that's enough to verify the bug at all. In any case, compare the first few lines with the last few lines - it looks like the degradation is still there. At least there are values >5 secs (and even >10 secs), which is not normal.

Comment 110 Yuri Obshansky 2016-02-03 15:42:01 UTC
(In reply to Pavel Zhukov from comment #109)
> (In reply to Yuri Obshansky from comment #105)
> > The bug was verified on
> > rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch
> > ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
> > rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch
> > vdsm-python-4.17.19-0.el7ev.noarch
> > vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch
> > vdsm-jsonrpc-4.17.19-0.el7ev.noarch
> > vdsm-yajsonrpc-4.17.19-0.el7ev.noarch
> > vdsm-xmlrpc-4.17.19-0.el7ev.noarch
> > vdsm-cli-4.17.19-0.el7ev.noarch
> > vdsm-4.17.19-0.el7ev.noarch
> > vdsm-infra-4.17.19-0.el7ev.noarch 
> > 
> > Since reproducing flow is not clear we decided to verify in following
> > scenario:
> > - RHEVM has 20 NFS Storage Domains and 100 VMs
> > - I ran automatic load test of 100 concurrent threads which perform
> > different operations on VMs such as getinfo/stop/start/etc using JMeter
> > - in the same time I manually performed actions such as
> > deactivate/activate/dettach/attach of several SDs from 20 
> > - during test I collected statistics data of lastCheck per each SD and saved
> > to file (see attached lastcheck_sd2.txt)
> > 
> > I don't see any slow down in Storage domain monitoring operations.
> > 
> > We can close bug if it is satisfied for developers.
> 
> You have collected 167 stat info data and I don't think it's enough to
> verify the bug at all. Anyway you can check first few lines and last few
> lines. Looks like degradation is still here. At least there are values >5
> secs (and even >10 secs) which is not normal.

Actually, those stats are from the second part of the test.
The first part was load only; the second was load plus operations on SDs.
If you say it is not normal, I'll reopen the bug.

Comment 111 Nir Soffer 2016-02-03 16:48:25 UTC
(In reply to Pavel Zhukov from comment #109)
> You have collected 167 stat info data and I don't think it's enough to
> verify the bug at all. Anyway you can check first few lines and last few
> lines. Looks like degradation is still here. At least there are values >5
> secs (and even >10 secs) which is not normal.

lastCheck values around 10 seconds are normal; there is no difference between 0 and 10
seconds (the value just shows the difference between the time the engine made the query
and the time the domain monitor thread last woke up).

Values bigger than 20 seconds mean that the domain monitor is stuck on storage; this
is not a bug in itself.
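
To make the semantics concrete, here is a minimal sketch (illustrative only, not
vdsm's actual code; the names are made up) of what the reported value represents -
the time since the monitor thread last completed a check, sampled whenever the
engine asks for repoStats:

import time

class DomainMonitorSketch:
    """Hypothetical stand-in for a single domain monitor thread."""

    CHECK_INTERVAL = 10  # seconds between checks (the default)

    def __init__(self):
        self._last_completed = time.monotonic()

    def run_check(self):
        # Read metadata, run checks against storage (omitted here),
        # then record when the check completed.
        self._last_completed = time.monotonic()

    def last_check(self):
        # The number reported to the engine. Any value between 0 and
        # CHECK_INTERVAL only reflects where the thread happens to be
        # in its sleep cycle, so 0.1 and 10.1 are equally healthy.
        return time.monotonic() - self._last_completed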

Yuri, can you explain what results you expected, and why you think
the behavior is incorrect?

Comment 112 Nir Soffer 2016-02-03 16:51:26 UTC
Yuri, the description of the bug says:

Description of problem:
	The monitoring thread is triggered every 10 seconds by default. As soon
as a storage operation is triggered ( copy image for instance ), the monitoring
thread slows down. 

Did you do any copy image operations? Can you show that the monitor threads
slow down significantly while storage operations are running?

Comment 113 Yuri Obshansky 2016-02-04 07:43:58 UTC
Nir, this is a weird bug.

The description of the bug says:
Steps to Reproduce:
	Not clear yet

So I don't have any idea how to reproduce it.
I did what I did.
PMs and developers must decide whether to close it or not.
If not, Roman or somebody else must provide a clear scenario and expected results for reproducing this weird bug.

I have 20 scale bugs in the queue for reproduction,
so I'm not able to invest more time in this one.

I hope you all understand the situation.

Comment 114 Pavel Zhukov 2016-02-04 09:19:02 UTC
(In reply to Nir Soffer from comment #111)
> (In reply to Pavel Zhukov from comment #109)
> > You have collected 167 stat info data and I don't think it's enough to
> > verify the bug at all. Anyway you can check first few lines and last few
> > lines. Looks like degradation is still here. At least there are values >5
> > secs (and even >10 secs) which is not normal.
> 
Hi Nir,
Even in this small test we can see values of 10.1. I think that if disk operations are triggered as you suggested, and the test runs for at least a few hours, we'll see bigger values. So, as suggested, it's better to test it under load and for a reasonable amount of time:
1) Attach 40 SDs
2) Create VMs (XX per domain?)
3) Create templates from, let's say, 2 VMs, wait for 15 minutes, create more templates, and so on (LSM can be used instead of templates, or in addition to them)
4) Check vdsm.log

Does it make sense?

> lastCheck values around 10 seconds are normal, there is no difference
> between 0-10
> seconds (it just show the difference between the time engine did the query
> and the 
> time the domain monitor thread wakes up.
Agreed, but if the domain monitor wakes up after 20 secs (for example), it means it has slowed down, as the bug description says. Please correct me if I'm mistaken.
> 
> Values bigger then 20 seconds mean that domain monitor is stuck on storage,
> this
> is not a bug in itself.
Not sure I understood this. The "delay" value there is usually pretty stable, but the lastCheck values grow constantly. I may be mistaken here, but it doesn't look like the monitoring thread is stuck on storage. And the problem is that this bug can overlap with the general vdsm performance bug we have...
> 
> Yuri, can you explain what are the result you expected, and why you think
> that 
> the behavior is incorrect?

Comment 115 Nir Soffer 2016-02-04 10:02:35 UTC
(In reply to Pavel Zhukov from comment #114)
> (In reply to Nir Soffer from comment #111)
> > (In reply to Pavel Zhukov from comment #109)
> > > You have collected 167 stat info data and I don't think it's enough to
> > > verify the bug at all. Anyway you can check first few lines and last few
> > > lines. Looks like degradation is still here. At least there are values >5
> > > secs (and even >10 secs) which is not normal.
> > 
> Hi Nir,
> Even in this small test we can see values 10.1 .

10.1 does not mean anything; it is as good as 0.1.

> I think if disks operation
> will be triggered as you suggested  and test will take at least few hours
> we'll see bigger values. So as suggested It's better to test it under load
> and for some reasonable amount of time...
> 1) Attach 40 SDs
> 2) Create VMs (XX per domain?)
> 3) Create templates from let's say 2 VMs wait for 15 minutes, create
> templates etc LSM can be used instead of templates or in addition to
> 4) Check vdsm.log
> 
> Does it make sense?

Yes.

I would do this:

1. baseline
- 50 storage domains
- run one vm per core
- no storage operations

2. with storage operations
- 50 storage domains
- run one vm per core
- perform LSM on each vm disk, keeping 10 concurrent operations (vdsm will
  not run more than 10 at the same time)

Compare the logs of both runs.

> > lastCheck values around 10 seconds are normal, there is no difference
> > between 0-10
> > seconds (it just show the difference between the time engine did the query
> > and the 
> > time the domain monitor thread wakes up.
> Agree but if domain monitor wakes up in 20 secs (for example) it means it
> slows down as bug description says. Please correct me if I'm mistaken.

20 seconds means that the domain monitor was blocked during monitoring for
about 10 seconds. If during this time the storage is accessible, it means
that the monitor thread was not blocked on storage but on some lock in vdsm.

Due to the way vdsm is designed, we cannot guarantee the interval of the
monitor threads; each thread sleeps 10 seconds, then performs operations
that usually take less than a second. But every 5 minutes the domain
monitor performs a manual refresh of the storage domain, and this may involve
a scsi/hba/multipath scan, which can block for 30 seconds or even more
(in abnormal conditions, we have seen multipath calls blocked forever).

Refreshing storage and rescanning multipath are protected by locks, so only
one thread can perform them at a time. If multiple threads are waiting on these
locks, only one of them performs the operation, and the rest use
the return value of the thread that got the lock. However, if the
operation takes a lot of time, all the monitor threads will be blocked,
causing high and increasing lastCheck values until the thread that
took the lock finishes and all the threads return.

I think this part of vdsm requires a redesign, but that can be done
only in 4.x.
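
A rough sketch of that loop (illustrative only; the names, the single shared lock
and the intervals are simplifications, and the real code lets waiting threads reuse
the result of the thread that got the lock):

import threading
import time

REFRESH_LOCK = threading.Lock()  # shared by all domain monitor threads

def monitor_domain(domain, check_interval=10, refresh_interval=300):
    last_refresh = time.monotonic()
    while True:
        time.sleep(check_interval)
        if time.monotonic() - last_refresh >= refresh_interval:
            # Only one thread refreshes/rescans at a time; the others
            # block here, which is what shows up as growing lastCheck
            # values when the scan is slow.
            with REFRESH_LOCK:
                domain.refresh()  # may trigger a scsi/hba/multipath scan
            last_refresh = time.monotonic()
        domain.check()  # normally completes in under a second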

So what can we expect from the current code?
- lastCheck values mostly around 10 seconds when storage is accessible
- Occasional high values (e.g. up to 30 seconds) are possible due to the issue
  described above

lastCheck values up to 30 seconds do not affect the normal operation
of the system, since the engine waits about 5 minutes before deciding that
a domain is in trouble.

If the engine decides that a storage domain is in trouble while the domain is
accessible, the monitoring mechanism has failed and should be improved.

> > Values bigger then 20 seconds mean that domain monitor is stuck on storage,
> > this
> > is not a bug in itself.
> Not sure if I understood this. We have "delay" value there which is pretty
> stable usually but lastCheck values grows constantly.

Can you show an example of lastCheck values growing constantly in this test?

This can be caused if one monitor thread is blocked forever calling multipath,
and other threads are blocked waiting for this thread. We have seen this
in RHEL 7 (also in our lab), and it should be fixed in 7.2.

Please point me to a bug or customer case with such an issue.

> I can be mistaken here
> but it doesn't look as monitoring thread is stuck on storage. And the
> problem is this bug can overlap with general vdsm performance bug we have...

The performance issue when running many VMs on many cores should be solved
in 3.6 (and in 3.5.6 with manual configuration) using CPU affinity.

Comment 116 Yuri Obshansky 2016-02-04 14:06:37 UTC
No info - I'm in the middle of verifying other bugs.
Reassigning to Roman.

Comment 117 Pavel Zhukov 2016-02-06 06:35:33 UTC
So.
rhevm-3.6.2.6-0.1.el6.noarch
vdsm-4.17.18-0.el7ev.noarch

I tried to add 50 iSCSI SDs with a script. After adding the 30th SD the host became
unresponsive and the lastCheck value reached 70+ secs. That was the first attempt, and
nested virtualization was used (I know it's unsupported).
The second attempt used a more powerful bare-metal host (12 cores / 32GB RAM), and again,
exactly after adding the 30th SD, the host started flipping between Active and
NonResponsive with a "TimeOut exception". The host was accessible via ping/ssh.
All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter (rdu).
VMs/images were not created yet.

Comment 118 Pavel Zhukov 2016-02-06 06:36:45 UTC
Created attachment 1121610 [details]
vdsm logs

Comment 119 Pavel Zhukov 2016-02-06 06:38:01 UTC
Created attachment 1121611 [details]
engine logs

Comment 124 Nir Soffer 2016-02-09 16:29:46 UTC
(In reply to Pavel Zhukov from comment #117)
> So.
> rhevm-3.6.2.6-0.1.el6.noarch
> vdsm-4.17.18-0.el7ev.noarch
> 
> I tried to add 50 iscsi SDs with a script. After adding 30th SD host became
> unresponsive and lastCheck value reached 70+ secs. It was first attempt and
> nested virtualization was used (I know it's unsupported). 
> Second attempt was using more powerful bare metal host: 12 cores / 32GB RAM
> and the exactly after adding 30th SD host started flipped between active and
> NonResponsive with "TimeOut exception". Host was accessible via ping/ssh
> All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter
> (rdu). VMs/Images were not created yet

These are interesting results, but I'm not sure they are related to this bug, since
adding 50 domains using a script is not a common use case, and this is not what
this bug is about.

Please open a new bug for this issue, and we will continue there.

Comment 125 Pavel Zhukov 2016-02-09 16:57:39 UTC
(In reply to Nir Soffer from comment #124)
> (In reply to Pavel Zhukov from comment #117)
> > So.
> > rhevm-3.6.2.6-0.1.el6.noarch
> > vdsm-4.17.18-0.el7ev.noarch
> > 
> > I tried to add 50 iscsi SDs with a script. After adding 30th SD host became
> > unresponsive and lastCheck value reached 70+ secs. It was first attempt and
> > nested virtualization was used (I know it's unsupported). 
> > Second attempt was using more powerful bare metal host: 12 cores / 32GB RAM
> > and the exactly after adding 30th SD host started flipped between active and
> > NonResponsive with "TimeOut exception". Host was accessible via ping/ssh
> > All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter
> > (rdu). VMs/Images were not created yet
> 
> This is interesting results, but I'm not sure it is related to this bug,
> since
> adding 50 domains using a script is not a common use case, and this is not
> what
> this bug is about.
> 
> Please open a new bug for this issue, and we will continue there.
I will investigate it, but to me it looks like the same issue (a storage operation slowing a vdsm thread down).
In any case, the bug was reproduced many times with block SDs, but QA tried to "verify" it using NFS.

Comment 126 Pavel Zhukov 2016-02-09 17:49:36 UTC
Installation as per comment #117 (physical host).

1) A DC with 28 SDs was stable for the last two days.
2) Created 4 disks of 1GB each (no VMs, so VM monitoring is not involved here either). No script was involved. lastCheck grew a bit during the creation (up to 40 secs). See logs.
3) Copied the 4 disks to another SD.
4) After some time the lastCheck values started growing and reached more than 200 secs (it seems a few SDs - more than the 2 affected by the copying - are not updating at all).
5) Host marked as non-responsive.

Comment 127 Pavel Zhukov 2016-02-09 17:51:15 UTC
Created attachment 1122471 [details]
vdsm log

Comment 128 Pavel Zhukov 2016-02-09 17:52:20 UTC
Created attachment 1122472 [details]
engine log

Comment 129 Nir Soffer 2016-02-09 17:58:13 UTC
(In reply to Pavel Zhukov from comment #126)
> Installation as per #117 (physical host).
> 
> 1) DC with 28 SDs was stable last two days. 
> 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as
> well). No script were involved. lastCheck grew a bit during the creation (up
> to 40 secs). See logs
> 3) Copy 4 disks to another SD
> 4) After some time lastCheck value start growing and reached values more
> than 200 secs (seems like few SDs (more than 2 affected by coping) are not
> updating at all)
> 5) Host marked as non-responsive.

That looks like a good reproduction, thanks!

Can you give more details:
- engine, vdsm versions
- OS version
- storage domain setup - maybe you can dump the relevant parts of the engine db so
  we can reproduce your setup here?

Comment 130 Pavel Zhukov 2016-02-09 18:21:35 UTC
I *think* the problem is in the LVM locking mechanism. vdsm locks __all__ lvm calls at once and holds the lock. In my scenario I can see a vgs command take ~5-10 secs to execute. Because of the global lock that is 28*10 = 280 secs (worst case) + <time to acquire/release lock>. Sequential execution is not good in any case.

I *think* the vgs calls can be grouped somehow. vgs allows specifying several VGs at once [1]. So at the moment the first vgs releases the lock, the second command could be called with the remaining 27 VGs as arguments, saving ~270 secs of doing nothing (see the sketch below).

Anyway, I don't see the reason for such an aggressive "global" lock, since parallel vgs calls with different VG names should not be a problem. It's a read-only operation, nothing more.
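
As an illustration of the idea (hypothetical code, not a patch; the option list is
trimmed from the vgs invocations in the logs above), grouping whatever VG names
queued up while the lock was held into a single call:

import subprocess

def reload_vgs(vg_names):
    # One vgs invocation covering all queued VGs instead of one call
    # (and one lock round-trip) per VG.
    cmd = ["/usr/sbin/lvm", "vgs",
           "--noheadings", "--units", "b", "--nosuffix",
           "--separator", "|", "--ignoreskippedcluster",
           "-o", "uuid,name,attr,size,free"] + list(vg_names)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()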

Nir, Does it make sense?

-- 
Pavel 

[1]
[root@]# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114

real	0m12.203s
user	0m0.091s
sys	0m0.381s




[root@ ~]# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762  afcc0f16-817f-42ea-b6ea-e8c2db00db6b
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  tOdMfQ-su1f-M2Jn-VJ1i-QL1m-QneI-mRtWio|afcc0f16-817f-42ea-b6ea-e8c2db00db6b|wz--n-|15703474176|6979321856|134217728|117|52|MDT_CLASS=Data,MDT_DESCRIPTION=lunx2,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:36001405cd364bdaf252491b8bd1edbb8&44&uuid:EEoLFE-a2pP-K4xp-PrvE-9e3q-lewh-TrFbuM&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=afcc0f16-817f-42ea-b6ea-e8c2db00db6b,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tOdMfQ-su1f-M2Jn-VJ1i-QL1m-QneI-mRtWio,MDT__SHA_CKSUM=63dad6e7206036bae63eb3476489c59d3a597438,RHAT_storage_domain|134217728|67102208|12|1|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8
  tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114

real	0m9.989s
user	0m0.068s
sys	0m0.166s

Comment 131 Pavel Zhukov 2016-02-09 18:30:04 UTC
(In reply to Nir Soffer from comment #129)
> Can give more details:
> - engine, vdsm versions
It's in comment #117
rhevm-3.6.2.6-0.1.el6.noarch
vdsm-4.17.18-0.el7ev.noarch

> - os version
RHEL6 engine with latest updates 
RHEL7 hypervisor with latest updates
> - storage domain setup - maybe you can dump relevant parts from engine db so
> we
>   can reproduce your setup here?
I'm attaching db dump. 
Nothing special. LIO target with 50 LUNs. Underlying backstores are partitions 15GB each. I don't have many options here because of lack of hardware.
Set as:
for index in {1..50}; do targetcli "/iscsi/iqn.2003-01.org.linux-iscsi.dell-per420-1.x8664:sn.d25a24ca3cf2/tpg1/luns create /backstores/block/lunx$index"; done

Comment 132 Pavel Zhukov 2016-02-09 18:30:49 UTC
Created attachment 1122476 [details]
engine db dump

Comment 133 Nir Soffer 2016-02-09 18:36:19 UTC
(In reply to Pavel Zhukov from comment #130)
> I *think* the problem is in LVM locking mechanism. vdsm locks __all__ lvm
> calls at once and holds it. 

There is no such global lock. We have an operation mutex that lets many
concurrent calls of one type run while blocking other call types. This was added
because of some issue in lvm with mixing several types of calls.

I guess we need to revisit this lock again; maybe it is not needed with the current
lvm version.
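
For readers unfamiliar with the mechanism, a minimal, simplified sketch of such an
operation mutex (an illustration only, not the vdsm implementation): callers of the
same operation type run concurrently, while a different operation type has to wait
until the current type drains:

import threading

class OperationMutex(object):
    def __init__(self):
        self._cond = threading.Condition()
        self._active = None   # operation type currently holding the mutex
        self._holders = 0

    def acquire(self, operation):
        with self._cond:
            # Wait while a *different* operation type holds the mutex.
            while self._active not in (None, operation):
                self._cond.wait()
            self._active = operation
            self._holders += 1

    def release(self):
        with self._cond:
            self._holders -= 1
            if self._holders == 0:
                self._active = None
                self._cond.notify_all()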

> So in my scenario I can see vgs command takes
> ~5-10 secs to execute. Because of global lock 28*10 = 280 secs (worst case)
> + <time to acquire/release lock>. Sequential execution is not good in any
> case.
> 
> I *think* vgs calls can be groupped somehow. Vgs allows to specify several
> VGs at once [1]. So at the moment of first vgs releases the lock second
> command can be called with 27 VGs as agrument and it'll save 270 secs of
> doing nothing.
> 
> Anyway I don't know the reason of so aggressive "global" lock as far as
> parallel calls of vgs with different vg names should not be a problem. It's
> read-only operation, nothing more.
> 
> Nir, Does it make sense?

Many threads waiting on the lvm lock can explain this, especially if accessing storage
becomes slow because of the other storage operations.

I have not looked at the logs yet.

Thanks for this research; I think we have a good direction here.

Comment 134 Eldad Marciano 2016-02-10 00:06:49 UTC
(In reply to Nir Soffer from comment #129)
> (In reply to Pavel Zhukov from comment #126)
> > Installation as per #117 (physical host).
> > 
> > 1) DC with 28 SDs was stable last two days. 
> > 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as
> > well). No script were involved. lastCheck grew a bit during the creation (up
> > to 40 secs). See logs
> > 3) Copy 4 disks to another SD
> > 4) After some time lastCheck value start growing and reached values more
> > than 200 secs (seems like few SDs (more than 2 affected by coping) are not
> > updating at all)
> > 5) Host marked as non-responsive.
> 
> Thats looks like good reproduction, thanks!
> 
> Can give more details:
> - engine, vdsm versions
> - os version
> - storage domain setup - maybe you can dump relevant parts from engine db so
> we
>   can reproduce your setup here?

Pavel,
I tried to reproduce your case in our lab, but unfortunately it does not reproduce.

our lab distribution:
rhevm-3.6.3
vdsm-4.17.19-0.el7ev.noarch + enabling CPU affinity.

storage:
nfs on top of netapp gear

network:
10gig

bare-metal hosts (engine and vdsm) with 24 cores 64GB RAM.

vdsm:
running 50 SDs and 24 VMs.

4 new disks were created and copied nice & fast.

Comment 135 Pavel Zhukov 2016-02-10 07:23:47 UTC
(In reply to Eldad Marciano from comment #134)
> (In reply to Nir Soffer from comment #129)
> > (In reply to Pavel Zhukov from comment #126)
> > > Installation as per #117 (physical host).
> > > 
> > > 1) DC with 28 SDs was stable last two days. 
> > > 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as
> > > well). No script were involved. lastCheck grew a bit during the creation (up
> > > to 40 secs). See logs
> > > 3) Copy 4 disks to another SD
> > > 4) After some time lastCheck value start growing and reached values more
> > > than 200 secs (seems like few SDs (more than 2 affected by coping) are not
> > > updating at all)
> > > 5) Host marked as non-responsive.
> > 
> > Thats looks like good reproduction, thanks!
> > 
> > Can give more details:
> > - engine, vdsm versions
> > - os version
> > - storage domain setup - maybe you can dump relevant parts from engine db so
> > we
> >   can reproduce your setup here?
> 
> Pavel,
> I try to reproduce your case in our lab, but unfortunately it is not
> reproduce.
> 
> our lab distribution:
> rhevm-3.6.3
> vdsm-4.17.19-0.el7ev.noarch + enabling CPU affinity.
> 
> storage:
> nfs on top of netapp gear
Eldad,
Please do NOT use NFS storage to reproduce the issue, because it is (most probably) in the LVM locking.
Use iSCSI or FC (see comments #122, #130, #131).
> 
> network:
> 10gig
> 
> bare-metal hosts (engine and vdsm) with 24 cores 64GB RAM.
> 
> vdsm:
> running 50SD's 24VM's.
> 
> 4 new disks were created and copied nice & fast.

Comment 136 Pavel Zhukov 2016-02-10 07:48:43 UTC
I tried to disable the locking completely by commenting out the acquire call, and it made the environment completely unstable (vgs took 50-60 secs) even with no storage operation running. So running 20+ vgs calls in parallel is definitely not a good idea.

Comment 137 ania 2016-02-10 08:17:29 UTC
-- SRM update

The customer affected by this bug is willing to test snapshot 3 of the beta and report results.

Do you think it would help with work on this BZ, or should they wait? If you agree, it would be helpful if they tested:

1. What test scenario should we agree on so that it is compatible with our tests?
2. What type of output / results format should they provide?
3. Should they attach it directly to this BZ?

kind regards

Ania

Comment 138 Nir Soffer 2016-02-10 08:45:40 UTC
(In reply to ania from comment #137)
> -- srm upate 
> 
> The customer affected by this bug is willing to test the snapshot 3 of beta
> and report results.  
> 
> Do you think it would help to work on this bz or should they wait?  If you
> agree it would be helpful if they tested 

It looks like the issue with many block storage domains is not fixed yet,
so the only reason to test it is to check whether they are affected by this
scale issue.

> 1. What test scenario should we agree on to be tested so it is compatible
> with out tests
> 2. what type of output / results format should they provide?
> 3. should they attach it directly to this BZ?

They should test their normal setup and flows, and provide engine and vdsm
logs.

If their setup includes file-based storage domains, or only a few block storage
domains, they can test that the system is stable when performing many storage
operations (e.g. copying disks from one domain to another, creating
vms by cloning templates).

Otherwise, if their setup includes many block storage domains, they should
wait until we improve the situation.

Note that this is not a new issue (it should be the same in 3.3, 3.4 and 3.5), so it
should not delay 3.6 upgrade plans.

Comment 139 Nir Soffer 2016-02-10 09:30:54 UTC
(In reply to Pavel Zhukov from comment #136)
> I tried to  disable locking completely by commenting out acquire call and it
> made the env. completely unstable (vgs took 50-60 secs) even if not storage
> operation performed. So running 20+ vgs calls in parallel is not good idea
> for sure.

Pavel, the operation mutex does not prevent parallel vgs/lvs/pvs commands; it
only prevents running them at the same time the lvm cache is invalidated, or
the other way around, invalidating the cache while any of these commands is
running.

Invalidate ops are very fast - they only clear the lvs/vgs/pvs cache -
so they never delay reload operations.

Reload operations are slow, since they must go to storage. If we
perform a lot of I/O at the same time, they become much slower.

So the result of this locking mechanism is starvation of invalidate-cache
operations waiting on reload operations. Given enough
reload operations, invalidate operations take a lot of time to get
the lock.

It seems the common flow is:

1. perform an invalidate operation
2. perform a reload operation

In the log we can see many threads waiting for the invalidate operation;
then the first thread gets in, finishes the invalidate operation, and takes
the lock again for the reload operation before any other invalidate operation
gets in.

This effectively behaves as one big lock, letting only one thread run at a time
instead of running multiple operations concurrently.
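
A hypothetical sketch of that per-thread flow (stub names only, assuming an
operation-mutex-like object as sketched under comment 133): the invalidate step is
over in microseconds, and the same thread immediately re-takes the mutex for its
slow reload, so the waiting threads rarely get a turn in between:

def refresh_vg(mutex, vg_name):
    # Fast step: clear the cached vg/lv information.
    mutex.acquire("lvm invalidate operation")
    try:
        pass  # cache invalidation: effectively instantaneous
    finally:
        mutex.release()

    # Slow step: re-taken almost immediately by the same thread,
    # before any waiting invalidate gets the mutex.
    mutex.acquire("lvm reload operation")
    try:
        run_vgs(vg_name)  # goes to storage; takes seconds under heavy I/O
    finally:
        mutex.release()

def run_vgs(vg_name):
    # Placeholder for the real vgs command shown in the log excerpts.
    pass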

So the issues are:
1. Bad locking limiting concurrency and causing unneeded delays
2. We need to check why we need so many invalidate and refresh operations,
   and whether we can separate them from the monitoring threads.

Here is an example of such flow:

1. A task thread tries to do an lvm reload, taking the operation mutex

d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:38:48,140::lvm::427::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' got the operation mutex

2. The task starts the lvs command...

d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:38:48,140::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags afcc0f16-817f-42ea-b6ea-e8c2db00db6b (cwd None)

3. While this command is running, other threads are waiting for the lock

Thread-377::DEBUG::2016-02-09 12:38:54,428::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-270::DEBUG::2016-02-09 12:38:54,609::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-254::DEBUG::2016-02-09 12:38:56,831::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-105::DEBUG::2016-02-09 12:38:56,998::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-161::DEBUG::2016-02-09 12:39:00,019::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-692::DEBUG::2016-02-09 12:39:01,075::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

4. The task finishes the lvs command and releases the operation mutex

d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:01,145::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:01,146::lvm::462::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' released the operation
mutex

5. Thread-377 gets the operation mutex for invalidate

Thread-377::DEBUG::2016-02-09 12:39:01,160::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-377::DEBUG::2016-02-09 12:39:01,161::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-377::DEBUG::2016-02-09 12:39:01,161::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-377::DEBUG::2016-02-09 12:39:01,162::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex

6. But before any other invalidation thread gets the mutex, it takes the mutex again for a reload operation

Thread-377::DEBUG::2016-02-09 12:39:01,162::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex

7. And runs the vgs command

Thread-377::DEBUG::2016-02-09 12:39:01,163::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name e7c912df-a24e-447e-8d3a-8d1f484004df (cwd None)

8. Other invalidation threads are waiting again...

Thread-270::DEBUG::2016-02-09 12:39:01,176::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-254::DEBUG::2016-02-09 12:39:01,191::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-105::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-161::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-692::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-563::DEBUG::2016-02-09 12:39:01,775::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1122::DEBUG::2016-02-09 12:39:01,925::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1225::DEBUG::2016-02-09 12:39:06,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-240::DEBUG::2016-02-09 12:39:11,163::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-845::DEBUG::2016-02-09 12:39:11,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

9. Thread-377 finishes the reload operation and releases the mutex

Thread-377::DEBUG::2016-02-09 12:39:12,059::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
Thread-377::DEBUG::2016-02-09 12:39:12,059::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex

10. And now Thread-270 gets in, invalidates, and takes the operation mutex for its own reload

Thread-270::DEBUG::2016-02-09 12:39:12,076::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-270::DEBUG::2016-02-09 12:39:12,077::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-270::DEBUG::2016-02-09 12:39:12,077::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-270::DEBUG::2016-02-09 12:39:12,078::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-270::DEBUG::2016-02-09 12:39:12,078::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex

11. Other threads still waiting

Thread-254::DEBUG::2016-02-09 12:39:12,092::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-105::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-161::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-692::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-563::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1122::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1225::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-240::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-845::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

12. A new task thread joins and waits for the reload

d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:12,668::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

13. More waiting threads

Thread-223::DEBUG::2016-02-09 12:39:14,679::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-726::DEBUG::2016-02-09 12:39:16,732::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-359::DEBUG::2016-02-09 12:39:21,238::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-488::DEBUG::2016-02-09 12:39:23,232::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-322::DEBUG::2016-02-09 12:39:25,733::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-339::DEBUG::2016-02-09 12:39:28,718::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-419::DEBUG::2016-02-09 12:39:31,299::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

14. Thread-270 finishes the reload and releases the operation mutex

Thread-270::DEBUG::2016-02-09 12:39:31,375::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enablin
g it!\n'; <rc> = 0
Thread-270::DEBUG::2016-02-09 12:39:31,375::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex

And so on...

Comment 140 Nir Soffer 2016-02-10 10:29:57 UTC
Following the monitor thread for domain 338cd3ef-732b-4408-8882-028b7c47c791,
since it had high average lastCheck values:

domain: 338cd3ef-732b-4408-8882-028b7c47c791
  delay      avg: 0.003690 min: 0.000478 max: 0.104589
  last check avg: 14.743452 min: 0.000000 max: 202.500000


1. Starting domain refresh (done every 5 minutes)

Thread-1172::DEBUG::2016-02-09 12:40:06,265::monitor::314::Storage.Monitor::(_refreshDomain) Refreshing domain 338cd3ef-732b-4408-8882-028b7c47c791

2. Recreating domain object, reading domain metadata from storage

Thread-1172::DEBUG::2016-02-09 12:40:06,266::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend
Thread-1172::DEBUG::2016-02-09 12:40:06,266::blockSD::337::Storage.Misc.excCmd::(readlines) /usr/bin/taskset --cpu-list 0-23 /usr/bin/dd iflag=direct skip=0 bs=2048 if=/dev/
338cd3ef-732b-4408-8882-028b7c47c791/metadata count=1 (cwd None)
Thread-1172::DEBUG::2016-02-09 12:40:06,308::blockSD::337::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.
0270738 s, 75.6 kB/s\n'; <rc> = 0
Thread-1172::DEBUG::2016-02-09 12:40:06,308::misc::261::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.0270738 s,
75.6 kB/s'], size: 2048
Thread-1172::DEBUG::2016-02-09 12:40:06,308::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[]
Thread-1172::DEBUG::2016-02-09 12:40:06,309::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata
Thread-1172::DEBUG::2016-02-09 12:40:06,309::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend

3. Waiting to invalidate vg (done before reading vg metadata)

Thread-1172::DEBUG::2016-02-09 12:40:06,309::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:12,514::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:20,636::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:29,500::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:37,923::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:45,764::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:40:53,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:02,184::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:11,778::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:20,863::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:30,624::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:37,700::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:44,187::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:50,419::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:41:56,994::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:03,425::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:09,877::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:16,233::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:22,733::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:29,298::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:35,757::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:42,313::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:48,643::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:42:55,124::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1172::DEBUG::2016-02-09 12:43:01,609::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

4. Got the lock after 3 minutes, invalidating the vg

Thread-1172::DEBUG::2016-02-09 12:43:07,896::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-1172::DEBUG::2016-02-09 12:43:07,896::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex

5. Invalidating vg also invalidates the vg lvs

Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex

6. Finally reloading vg

Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None)

7. vg reload finished in 7 seconds

Thread-1172::DEBUG::2016-02-09 12:43:14,318::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

8. Performing vgck (part of refresh domain flow)

Thread-1172::DEBUG::2016-02-09 12:43:14,321::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405716af920e2394b14a09d4c7be|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None)


9. vgck finished in 7 seconds

Thread-1172::DEBUG::2016-02-09 12:43:20,923::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Comment 141 Nir Soffer 2016-02-10 11:03:10 UTC
A secondary issue that may cause vg reload commands to become much slower is not
using a short filter.

This command:

Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None)

Will look for metadata on all the devices, instead of only on the devices belonging to this vg.

We don't have a choice when reading the vg info when vdsm starts, but once
we have the vg details, we should continue to use only the vg's pvs in the
filter.
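
For illustration only (not vdsm code), here is a short Python sketch of the
difference between the two filters; lvm_filter() is a hypothetical helper:

    def lvm_filter(wwids):
        # Accept only the given /dev/mapper devices, reject everything else.
        accept = "a|" + "|".join("/dev/mapper/" + w for w in wwids) + "|"
        return "filter = [ '%s', 'r|.*|' ]" % accept

    # Full filter: every multipath device visible on the host (slow: lvm has
    # to scan all of them for metadata).
    # Short filter: only the pvs of the vg being reloaded (fast).
    print(lvm_filter(["360014050775ed117965422bab1d61114"]))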

The vg filter should be invalidated only during the extend vg flow, where the engine
notifies the hosts about the event.

Pavel, can you check the difference on your environment when using only the
devices that are part of this vg?

Comment 143 Pavel Zhukov 2016-02-10 12:36:22 UTC
It's a good idea; the difference is huge:

# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114

real	0m10.811s
user	0m0.085s
sys	0m0.146s

# time  /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014050775ed117965422bab1d61114'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  Invalid filter pattern "a|/dev/mapper/360014050775ed117965422bab1d61114".

real	0m0.014s
user	0m0.005s
sys	0m0.009s

Comment 144 Pavel Zhukov 2016-02-10 12:39:13 UTC
Sorry, please ignore the last result: the short filter pattern was invalid
("Invalid filter pattern"), so the 0.014s timing is meaningless.

With a corrected short filter the difference is not as big:
# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114

real	0m9.370s
user	0m0.084s
sys	0m0.171s


# time  /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014050775ed117965422bab1d61114|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114

real	0m6.255s
user	0m0.065s
sys	0m0.161s

Comment 146 Nir Soffer 2016-02-11 09:24:37 UTC
Pavel, would you like to test this patch?
http://gerrit.ovirt.org/53395

It cannot be applied to 3.6, so you have to build vdsm from master.

Comment 147 Pavel Zhukov 2016-02-11 18:44:31 UTC
Nir,
I'm sorry, but I can't do it right now. The reservation time of my machines in Beaker has expired. I will test it later.

Comment 148 Pavel Zhukov 2016-02-15 13:02:03 UTC
(In reply to Nir Soffer from comment #146)
> Pavel, would you like to test this patch?
> http://gerrit.ovirt.org/53395
> 
> It cannot be applied to 3.6, so you have to build vdsm from master.

Nir,
I've tested the patch.
Everything works fine. I added 50 SDs with a script and copied raw disks between storage domains.
The host did not flip to Unresponsive, and there were no error messages in the engine log.
lastCheck grew up to 50 secs during the most critical period (adding SDs AND copying the images), but the host was OK.
Do you need the logs?
 
Thank you!

Comment 150 Yaniv Lavi 2016-02-15 13:26:31 UTC
We will probably need full automation testing on the patch sometime before GA.

Comment 151 Pavel Zhukov 2016-02-15 13:31:28 UTC
I did more tests. 

Once the host (SPM) was restarted, the locking hell came back. I think it's because all vgs have to be refreshed during "bootstrap".
The lastCheck value doesn't grow constantly as before, but it can reach 110+ secs and the host is marked as NonResponsive again. After some time the host comes back and contending starts, but then it happens again, in a loop:

jsonrpc.Executor/6::DEBUG::2016-02-15 08:28:11,003::task::1192::Storage.TaskManager.Task::(prepare) Task=`320babd8-5707-4486-9d5d-313895c4346f`::finished: {u'21e029a3-0b54-4f8b-8ad0-b701895671a2': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0092922', 'lastCheck': '0.0', 'valid': True}, u'0f8241ba-76d7-402b-85c3-8e4eff0d443a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.034504', 'lastCheck': '189.4', 'valid': True}, u'6d78d86f-53c1-4b10-a5f0-ad62805b1363': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000952666', 'lastCheck': '194.2', 'valid': True}, u'aaf40b64-9dd3-414a-9911-f67bf33b8d6a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0153762', 'lastCheck': '189.5', 'valid': True}, u'd80a8920-deac-4b05-9301-6d61bbb53575': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00751565', 'lastCheck': '163.8', 'valid': True}, u'6944d637-f2d0-43ba-8a72-289bec00cc05': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0104843', 'lastCheck': '8.4', 'valid': True}, u'f69d5159-ee0e-4f08-af44-f7a9b34830b7': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0119308', 'lastCheck': '164.4', 'valid': True}, u'399899db-daad-4672-a4cf-fe5160ce3330': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00639321', 'lastCheck': '0.8', 'valid': True}, u'f9b91801-0aa5-437c-a5d4-418ada236e51': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0071744', 'lastCheck': '2.3', 'valid': True}, u'74d8f43a-2d88-4493-8d0c-a218c63fe46a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00921745', 'lastCheck': '8.0', 'valid': True}, u'552beec7-39cf-4ad0-b86f-9b2bc6985971': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000655045', 'lastCheck': '186.4', 'valid': True}, u'52895b61-44d8-4115-9113-240bbce232d3': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00884128', 'lastCheck': '9.2', 'valid': True}, u'34a397b1-33f4-4048-b663-471edd729d93': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0066063', 'lastCheck': '11.7', 'valid': True}, u'cb95b818-a8c3-41e4-8479-c5d7430e251f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00130233', 'lastCheck': '192.1', 'valid': True}, u'8f468a9f-ce70-408d-8043-07c29aea55ec': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00738504', 'lastCheck': '5.8', 'valid': True}, u'0a8eeeb5-0b6f-42a6-9322-c3ba95e0276c': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0146081', 'lastCheck': '192.7', 'valid': True}, u'38ef7afc-9d9e-4abe-93c2-dc98f86a31f8': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.012955', 'lastCheck': '187.7', 'valid': True}, u'2112c800-e008-4d7b-9ead-ad35391632f8': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00912894', 'lastCheck': '187.9', 'valid': True}, u'c0c0d03c-a8c0-43d9-b2ad-7b6eb3a78163': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000700416', 'lastCheck': '188.6', 'valid': True}, u'05458af5-a8d3-4976-913b-baaede6e4dac': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0140394', 'lastCheck': '163.8', 'valid': True}, u'1a681425-4472-41b4-a87d-066c31fca4a9': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0107165', 'lastCheck': '9.8', 'valid': True}, 
u'f47faf12-8ac5-4f30-8f7d-d9d7658c54c2': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00296497', 'lastCheck': '164.1', 'valid': True}, u'1bd88c11-b163-4c78-b8d4-045e38e7bb66': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00836018', 'lastCheck': '4.2', 'valid': True}, u'9c7ab86d-5928-4c45-babe-7e3d24ceed7d': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00122086', 'lastCheck': '164.5', 'valid': True}, u'32710ccd-22d3-4585-809c-56bcecd83aa6': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000782966', 'lastCheck': '190.3', 'valid': True}, u'af17670b-1b1f-4174-8fd1-0dbcd8a0779f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000632146', 'lastCheck': '164.4', 'valid': True}, u'6d1d310e-ef0e-4791-a1ce-e20e6811be0a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00889699', 'lastCheck': '190.8', 'valid': True}, u'e3a34534-f66a-46f4-80b3-3e225b553c39': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00923788', 'lastCheck': '186.0', 'valid': True}, u'de55fd4d-ae83-4749-9415-ac004a66e652': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0108847', 'lastCheck': '190.4', 'valid': True}, u'ef5378b5-2296-411a-a5eb-5b14123a1c29': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0110197', 'lastCheck': '169.7', 'valid': True}, u'790da84a-ea36-4d63-82ea-e5fd18f25a86': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00920705', 'lastCheck': '3.6', 'valid': True}, u'9e1f121c-20d6-4982-837c-70ca226cf21f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0527725', 'lastCheck': '164.2', 'valid': True}, u'05b1eaf7-e7d9-4985-90bb-fdbb21f8fdd3': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00540501', 'lastCheck': '7.1', 'valid': True}, u'7608c0aa-383d-4f68-8c94-82f002749007': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000696937', 'lastCheck': '7.3', 'valid': True}}
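
For reference, a minimal Python sketch (not one of the vdsm tools) that pulls
the high lastCheck values out of a repoStats result like the one pasted above:

    import ast

    def high_last_check(repostats, limit=30.0):
        # repostats: the dict logged by repoStats, e.g. the one above.
        return {sd: float(info["lastCheck"])
                for sd, info in repostats.items()
                if float(info["lastCheck"]) > limit}

    # stats = ast.literal_eval(text)  # text: only the "finished: {...}" dict
    # for sd, last in sorted(high_last_check(stats).items(), key=lambda x: -x[1]):
    #     print(sd, last)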

Comment 152 Nir Soffer 2016-02-15 13:33:41 UTC
(In reply to Pavel Zhukov from comment #148)
> (In reply to Nir Soffer from comment #146)
> > Pavel, would you like to test this patch?
> > http://gerrit.ovirt.org/53395
> > 
> > It cannot be applied to 3.6, so you have to build vdsm from master.
> 
> Nir,
> I've tested the patch.
> Everything works fine. Added 50SDs with script. Copying raw disks between
> storage domains.
> Host is not flipped to Unresponsive.No error messages in engine log. 
> lastCheck grew up to 50 secs during the most critical period (adding SD AND
> copying of the images) but host was OK.
> Do you need logs?

Yes, please attach the logs of this run.

It would be useful to see your test script.

Comment 153 Aharon Canan 2016-02-15 13:43:26 UTC
Automation runs on every build; if you need to execute it on a specific patch, please ping me directly.

Comment 154 Pavel Zhukov 2016-02-15 14:11:42 UTC
(In reply to Nir Soffer from comment #152)
> (In reply to Pavel Zhukov from comment #148)
> > (In reply to Nir Soffer from comment #146)
> > > Pavel, would you like to test this patch?
> > > http://gerrit.ovirt.org/53395
> > > 
> > > It cannot be applied to 3.6, so you have to build vdsm from master.
> > 
> > Nir,
> > I've tested the patch.
> > Everything works fine. Added 50SDs with script. Copying raw disks between
> > storage domains.
> > Host is not flipped to Unresponsive.No error messages in engine log. 
> > lastCheck grew up to 50 secs during the most critical period (adding SD AND
> > copying of the images) but host was OK.
> > Do you need logs?
> 
> Yes, please attach the logs of this run.
> 
> It would be useful to see your test script.

I don't use a test script. I use a script to attach many SDs (it's quite annoying to attach 30+ iSCSI SDs via the UI).
Script to attach SDs: http://www.ovirt.org/Testing/PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any modifications

Comment 155 Pavel Zhukov 2016-02-15 14:13:06 UTC
Created attachment 1127299 [details]
vdsm log

Comment 156 Nir Soffer 2016-02-15 14:15:11 UTC
Created attachment 1127303 [details]
targetcli configuration for testing this issue

Comment 157 Pavel Zhukov 2016-02-15 14:15:23 UTC
Created attachment 1127304 [details]
engine log

Comment 158 Nir Soffer 2016-02-15 14:21:48 UTC
(In reply to Pavel Zhukov from comment #151)
> I did more tests.
>
> Once host was restarted (SPM) the locking hell came back. I think it's
> because all vgs should be refreshed during "bootstrap".
> The lastCheck value doesn't grow constantly as before but it can reach 110+
> secs and host is marked as NonReponsive again. After some time host is back
> and contending starts but it happens again in the loop:

I will need to see the logs for this failure.

I have a setup with 50 iscsi storage domains, and I can deactivate and activate
and reboot a host without failures.

I'm using targetcli on a real machine, with a fileio backend (see attachment 1127303 [details]).
Each storage domain uses one lun from target3.

The storage server is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using
a local disk for storage and a 1Gbit nic.

The host consuming this storage is a vm (using nested kvm).

We do enter an lvm invalidate/reload storm each time you start vdsm, connect
to the storage server, connect the pool and get the device list, but on my setup
a single lvm invalidate/reload storm is quick enough. I see lastCheck values
up to 150 seconds.

We will have to fix the invalidate/reload issue, as currently our monitoring is harmful
when using a lot of block storage domains.

The first step would be to reproduce the issue and test the fix with real storage.

Comment 159 Nir Soffer 2016-02-15 15:16:17 UTC
(In reply to Pavel Zhukov from comment #154)
> Script to attach SD:
> http://www.ovirt.org/Testing/
> PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any
> modifications

This script creates v2 storage domains; you should use v3.

Comment 160 Pavel Zhukov 2016-02-16 10:13:41 UTC
(In reply to Nir Soffer from comment #159)
> (In reply to Pavel Zhukov from comment #154)
> > Script to attach SD:
> > http://www.ovirt.org/Testing/
> > PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any
> > modifications
> 
> This script is creating storage domains v2, you should use v3.

Nir,
Please check the logs and the DB I have provided several times. I changed to V3 but forgot to mention it.

Comment 162 Pavel Zhukov 2016-02-16 10:32:31 UTC
(In reply to Nir Soffer from comment #158)
> (In reply to Pavel Zhukov from comment #151)
> > I did more tests.
> >
> > Once host was restarted (SPM) the locking hell came back. I think it's
> > because all vgs should be refreshed during "bootstrap".
> > The lastCheck value doesn't grow constantly as before but it can reach 110+
> > secs and host is marked as NonReponsive again. After some time host is back
> > and contending starts but it happens again in the loop:
> 
> I will need to see the logs for this failure.
> 
> I have a setup with 50 iscsi storage domains, and I can deactivate and
> activate
> and reboot a host without failures.
> 
> I'm using targetcli on real machine, using fileio backend (see attachment
> 1127303 [details])
> Each storage domain uses one lun from target3.
> 
> Storage server is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using
> local disk for storage, and 1Gbit nic.
> 
> Host consuming this storage is a vm (using nested kvm).
With nested kvm you win some time because no network latency is involved. And in fact the host machine and the VM use the same storage. Not a big difference, but still.
> 
> We do enter lvm invlidate/reload storm on each time you start vdsm, connect
> to storage server, connect the pool and get the device list, but on my setup,
> single lvm invalidate/reload storm is quick enough. I see lastCheck values
> up to 150 seconds.
I had similar results (lastCheck up to ...).
Was it with https://gerrit.ovirt.org/#/c/53540 applied? It should improve concurrent lvm operations a lot. I was trying to get rid of this fake "ContextManager" but didn't have time.
Anyway, I don't have the test environment anymore. Thank you for fixing this and for all the explanations!

Comment 163 Nir Soffer 2016-02-16 10:46:43 UTC
(In reply to Pavel Zhukov from comment #162)
> (In reply to Nir Soffer from comment #158)
> > (In reply to Pavel Zhukov from comment #151)
> > > I did more tests.
> > >
> > > Once host was restarted (SPM) the locking hell came back. I think it's
> > > because all vgs should be refreshed during "bootstrap".
> > > The lastCheck value doesn't grow constantly as before but it can reach 110+
> > > secs and host is marked as NonReponsive again. After some time host is back
> > > and contending starts but it happens again in the loop:
> > 
> > I will need to see the logs for this failure.
> > 
> > I have a setup with 50 iscsi storage domains, and I can deactivate and
> > activate
> > and reboot a host without failures.
> > 
> > I'm using targetcli on real machine, using fileio backend (see attachment
> > 1127303 [details])
> > Each storage domain uses one lun from target3.
> > 
> > Storage server is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using
> > local disk for storage, and 1Gbit nic.
> > 
> > Host consuming this storage is a vm (using nested kvm).
> With nested kvm you win some time because of no network latency is involved.
> And in fact host machine and VM use same storage. Not so much but anyway.

The vm consuming this storage is running on another machine.

> > We do enter lvm invlidate/reload storm on each time you start vdsm, connect
> > to storage server, connect the pool and get the device list, but on my setup,
> > single lvm invalidate/reload storm is quick enough. I see lastCheck values
> > up to 150 seconds.
> I had similar results (lastCheck up to 
> Was it with https://gerrit.ovirt.org/#/c/53540 applied? It should improve
> concurrent lvm operations a lot I was trying to get rid of this fake
> "ContextManager" but didn't have time.

This was without https://gerrit.ovirt.org/53540. That patch improves the
situation further, although connecting 50 sds concurrently still takes a lot
of time. Most monitors have a lastCheck of 100 seconds, some 120 or 140, and one
155 seconds.

We will need much more work to optimize this, probably changing the way
we invalidate and refresh the entire cache.

Thanks for the research.

Comment 170 Nir Soffer 2016-02-23 12:23:33 UTC
Created attachment 1129733 [details]
Monitor delays for vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain

This graph shows monitor delays in vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain

Comment 171 Nir Soffer 2016-02-23 12:27:44 UTC
Created attachment 1129734 [details]
Monitor delays for vdsm log 2014-03-31 03:46 EDT, akotov

This graph shows monitor delays for vdsm log 2014-03-31 03:46 EDT, akotov

Comment 172 Nir Soffer 2016-02-23 12:35:31 UTC
Created attachment 1129737 [details]
Monitor delays for vdsm logs  2016-02-06 01:36 EST, Pavel Zhukov

This graph shows monitor delays for vdsm logs 2016-02-06 01:36 EST, Pavel Zhukov

Comment 173 Nir Soffer 2016-02-23 12:50:33 UTC
Created attachment 1129751 [details]
Monitor delays for vdsm log 2016-02-09 12:51 EST, Pavel Zhukov

This graph shows monitor delays for vdsm log 2016-02-09 12:51 EST, Pavel Zhukov

Comment 174 Nir Soffer 2016-02-23 12:53:53 UTC
Created attachment 1129754 [details]
Monitor delays for vdsm log 2016-02-15 09:13 EST, Pavel Zhukov

This graph shows monitor delays for vdsm log 2016-02-15 09:13 EST, Pavel Zhukov

Vdsm was using patch https://gerrit.ovirt.org/53395.

Comment 177 Nir Soffer 2016-02-27 01:49:52 UTC
Created attachment 1130967 [details]
Repoplot of vdsm.log vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain

Comment 178 Nir Soffer 2016-02-27 01:52:59 UTC
Created attachment 1130968 [details]
Repoplot of vdsm.log vdsm logs from 2014-03-31 03:46 EDT, akotov

Comment 179 Nir Soffer 2016-02-27 02:03:15 UTC
Created attachment 1130982 [details]
Repoplot of vdsm.log vdsm logs from 2016-02-06 01:36 EST, Pavel Zhukov

Comment 180 Nir Soffer 2016-02-27 02:05:47 UTC
Created attachment 1130983 [details]
Repoplot of vdsm.log vdsm logs from 2016-02-09 12:51 EST, Pavel Zhukov

Comment 181 Nir Soffer 2016-02-27 02:07:32 UTC
Created attachment 1130984 [details]
Repoplot of vdsm.log vdsm logs from 2016-02-15 09:13 EST, Pavel Zhukov

Comment 182 Mike McCune 2016-03-28 23:14:23 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 183 Yaniv Lavi 2016-05-09 11:06:06 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 188 Allon Mureinik 2016-06-02 13:49:49 UTC
Nir, is there anything else we need to merge, or can this bug be moved to MODIFIED?

Comment 189 Nir Soffer 2016-06-02 14:18:26 UTC
Patches are merged in master and ovirt-4.0.

Comment 191 Yuri Obshansky 2016-08-04 13:21:10 UTC
Bug verified on
RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository)
VDSM Version:vdsm-4.18.9-1.el7ev
LIBVIRT Version:libvirt-1.2.17-13.el7_2.5

Environment the same:
- 1 DC, 1 Cluster,
- 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM)
- 1 NFS Master SD
- 10 iSCSI SDs
- 50 VM (1 CPU, 1 G RAM, 5 G Disk)

Since the scenario is not clear,
I ran an automatic JMeter script which executed operations on iSCSI SDs for almost 2 hours:
- 01.Find.ISCSI.SD.Trx
- 02.Deactivate.ISCSI,SD.Trx
- 03.Detach.ISCSI,SD.Trx
- 04.Attach.ISCSI,SD.Trx
and collected statistics from the vdsm log files using vdsm-tools/stats.
vdsm-tools/stats/repoplot still failed, so I cannot present the graph,
but vdsm-tools/stats/repostat shows that the max lastCheck is over 10 sec even with 10 iSCSI SDs:

$ /home/yobshans/vdsm-tools/stats/repostat vdsm.log vdsm.log.1 vdsm.log.2 vdsm.log.3
domain: deadbeab-dc37-4224-95ed-e64420404861
  delay      avg: 0.001470 min: 0.000000 max: 0.062856
  last check avg: 3.363483 min: 0.000000 max: 13.200000
domain: 67f146ee-e410-4987-97b5-d74abaa6085f
  delay      avg: 0.001364 min: 0.000000 max: 0.077210
  last check avg: 3.708943 min: 0.000000 max: 13.100000
domain: b42b847d-8be3-430d-bcb0-ccb08df59233
  delay      avg: 0.001603 min: 0.000000 max: 0.055443
  last check avg: 3.851433 min: 0.000000 max: 13.200000
domain: f59f9d17-8373-441f-9924-67110a90755c
  delay      avg: 0.001398 min: 0.000000 max: 0.078949
  last check avg: 3.616427 min: 0.000000 max: 11.000000
domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe
  delay      avg: 0.001744 min: 0.000000 max: 0.057214
  last check avg: 3.778623 min: 0.000000 max: 14.300000
domain: f04c71c0-9604-46b6-adb5-78660d11bf20
  delay      avg: 0.001873 min: 0.000000 max: 0.077009
  last check avg: 3.681272 min: 0.000000 max: 12.700000
domain: bec0d23b-2a94-4265-ad71-40c059c456d8
  delay      avg: 0.001387 min: 0.000000 max: 0.077029
  last check avg: 3.508897 min: 0.000000 max: 12.200000
domain: 12e87c9e-2526-4107-97cd-80e6298e3da5
  delay      avg: 0.001267 min: 0.000000 max: 0.070465
  last check avg: 3.484298 min: 0.000000 max: 10.300000
domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35
  delay      avg: 0.001536 min: 0.000000 max: 0.077022
  last check avg: 3.465584 min: 0.000000 max: 12.200000
domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c
  delay      avg: 0.001326 min: 0.000000 max: 0.067024
  last check avg: 3.635219 min: 0.000000 max: 16.300000


Transactions elapsed time (msec) :

Transactions name             Average   90% Line   Min     Max
01.Find.ISCSI.SD.Trx          810       864        458     2196
02.Deactivate.ISCSI,SD.Trx    142677    223249     19767   333741
03.Detach.ISCSI,SD.Trx        56427     78512      29854   107366
04.Attach.ISCSI,SD.Trx        232918    321849     90015   344879

I uploaded the vdsm.log files.

I think the issue still exists.

Comment 192 Yuri Obshansky 2016-08-04 13:23:39 UTC
Created attachment 1187507 [details]
August 4, vdsm.log

Comment 193 Yuri Obshansky 2016-08-04 13:24:44 UTC
Created attachment 1187508 [details]
August 4, vdsm.log.1

Comment 194 Yuri Obshansky 2016-08-04 13:25:50 UTC
Created attachment 1187510 [details]
August 4, vdsm.log.2

Comment 195 Yuri Obshansky 2016-08-04 13:27:08 UTC
Created attachment 1187511 [details]
August 4, vdsm.log.3

Comment 196 Yuri Obshansky 2016-08-07 12:28:27 UTC
Performed additional 2 tests on 
RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository)
VDSM Version:vdsm-4.18.9-1.el7ev
LIBVIRT Version:libvirt-1.2.17-13.el7_2.5

But, this time with 50 iSCSI SDs

Environment the same:
- 1 DC, 1 Cluster,
- 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM)
- 1 NFS Master SD
- 50 iSCSI SDs
- 50 VM (1 CPU, 1 G RAM, 5 G Disk)

Test 1 - idle mode (no activities)
Test 2 - load test with 50 concurrent threads which simulate operations on iSCSI SDs.

See the statistics for both tests below:

Test 1:

$ /home/yobshans/vdsm-tools/stats/repostat vdsm.log
domain: be9f85fe-3d24-403b-bc78-3f91cefcf246
  delay      avg: 0.000828 min: 0.000451 max: 0.032423
  last check avg: 3.360823 min: 0.000000 max: 9.600000
domain: fa744f59-5255-434f-a0f4-96f518f49de6
  delay      avg: 0.000833 min: 0.000439 max: 0.076603
  last check avg: 3.426840 min: 0.000000 max: 9.400000
domain: aa859a83-ebc1-4ea0-9277-8af413deeaad
  delay      avg: 0.000905 min: 0.000427 max: 0.075148
  last check avg: 3.304978 min: 0.100000 max: 9.100000
domain: 2398d30a-43f3-4b26-a0ed-a212b1b8881e
  delay      avg: 0.000708 min: 0.000438 max: 0.006968
  last check avg: 3.462338 min: 0.000000 max: 9.300000
domain: f59f9d17-8373-441f-9924-67110a90755c
  delay      avg: 0.000723 min: 0.000433 max: 0.036150
  last check avg: 3.186580 min: 0.000000 max: 9.800000
domain: 923ce1a6-cf90-48ab-a02b-8545f649dc9b
  delay      avg: 0.000962 min: 0.000420 max: 0.083579
  last check avg: 3.042208 min: 0.000000 max: 9.900000
domain: 12e3c939-2837-46af-b994-46b2901c0883
  delay      avg: 0.000709 min: 0.000429 max: 0.017996
  last check avg: 3.111039 min: 0.000000 max: 9.900000
domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe
  delay      avg: 0.000984 min: 0.000417 max: 0.040286
  last check avg: 3.157576 min: 0.000000 max: 9.700000
domain: c3d6fe21-fbf9-4d80-bb92-1939877a5f82
  delay      avg: 0.000856 min: 0.000418 max: 0.044383
  last check avg: 3.268615 min: 0.000000 max: 9.600000
domain: f7b789fd-f9e2-4f09-8d1e-8b87f451f6b9
  delay      avg: 0.001091 min: 0.000475 max: 0.046067
  last check avg: 3.129870 min: 0.000000 max: 9.500000
domain: f04c71c0-9604-46b6-adb5-78660d11bf20
  delay      avg: 0.000973 min: 0.000413 max: 0.034026
  last check avg: 3.271212 min: 0.000000 max: 9.300000
domain: 307fd25f-bf3f-40b8-ae19-f87cf105c099
  delay      avg: 0.000688 min: 0.000411 max: 0.026836
  last check avg: 3.250866 min: 0.000000 max: 9.600000
domain: bec0d23b-2a94-4265-ad71-40c059c456d8
  delay      avg: 0.001107 min: 0.000398 max: 0.036731
  last check avg: 3.513203 min: 0.000000 max: 10.000000
domain: d2498977-1b74-4875-91ba-08cc74c99629
  delay      avg: 0.000834 min: 0.000459 max: 0.045234
  last check avg: 3.354762 min: 0.000000 max: 9.800000
domain: f5f5b3fc-1495-4423-934f-89ea932fa333
  delay      avg: 0.000752 min: 0.000422 max: 0.038126
  last check avg: 3.185498 min: 0.000000 max: 9.900000
domain: 45c73813-8c0c-4d4f-badd-7c00f3a330a8
  delay      avg: 0.000761 min: 0.000426 max: 0.048208
  last check avg: 3.369913 min: 0.000000 max: 9.700000
domain: eba6461e-1771-4f8d-8e09-7d702c2dd028
  delay      avg: 0.001303 min: 0.000436 max: 0.064898
  last check avg: 3.383117 min: 0.000000 max: 9.500000
domain: e1d16f8d-013c-4be5-9a45-f7a12475636d
  delay      avg: 0.001317 min: 0.000473 max: 0.072406
  last check avg: 3.737662 min: 0.000000 max: 9.800000
domain: 1003d3d7-139e-486a-a22e-20f6467dc8c4
  delay      avg: 0.000885 min: 0.000424 max: 0.038167
  last check avg: 3.478139 min: 0.000000 max: 9.700000
domain: 30ed7ebd-439e-483a-8a03-9a7a1e3d049d
  delay      avg: 0.000939 min: 0.000420 max: 0.031377
  last check avg: 3.370996 min: 0.000000 max: 9.700000
domain: e3150f16-76a1-46cd-bf53-c321ee4d2c9a
  delay      avg: 0.000682 min: 0.000435 max: 0.005944
  last check avg: 3.262338 min: 0.000000 max: 9.500000
domain: 2caf8e26-5938-4cb2-9d98-d8cfdaa444a4
  delay      avg: 0.000987 min: 0.000456 max: 0.064760
  last check avg: 3.201948 min: 0.000000 max: 9.600000
domain: db48b33a-6856-40ff-badd-933ec20619ee
  delay      avg: 0.000732 min: 0.000434 max: 0.016702
  last check avg: 3.083983 min: 0.000000 max: 9.700000
domain: 6e165862-4a19-477f-98be-e078e699f2ae
  delay      avg: 0.001411 min: 0.000425 max: 0.079374
  last check avg: 3.311688 min: 0.000000 max: 9.700000
domain: 9884db84-f4d0-4402-a952-36c685ab0d33
  delay      avg: 0.001438 min: 0.000442 max: 0.076374
  last check avg: 3.645887 min: 0.000000 max: 10.000000
domain: ecb0dba1-3c1b-4ac8-9699-069431e2de03
  delay      avg: 0.000741 min: 0.000424 max: 0.032231
  last check avg: 3.188528 min: 0.000000 max: 9.100000
domain: 67f146ee-e410-4987-97b5-d74abaa6085f
  delay      avg: 0.000809 min: 0.000384 max: 0.047425
  last check avg: 3.460173 min: 0.000000 max: 10.000000
domain: b42b847d-8be3-430d-bcb0-ccb08df59233
  delay      avg: 0.000959 min: 0.000468 max: 0.035521
  last check avg: 3.261472 min: 0.000000 max: 9.600000
domain: ee163499-26c7-43fb-bd04-b0cf5e7d965e
  delay      avg: 0.000676 min: 0.000413 max: 0.003722
  last check avg: 3.503247 min: 0.000000 max: 9.600000
domain: a42cf644-6820-4dd6-ac19-5af69be4cf94
  delay      avg: 0.000642 min: 0.000402 max: 0.015142
  last check avg: 3.456710 min: 0.000000 max: 9.200000
domain: d5f72e36-c4f1-4e45-acab-01d15c4ccb73
  delay      avg: 0.000915 min: 0.000438 max: 0.032101
  last check avg: 3.201082 min: 0.000000 max: 9.900000
domain: f4b2db57-4b3d-4e52-8b3f-d4347beb32bd
  delay      avg: 0.000735 min: 0.000483 max: 0.025288
  last check avg: 3.262554 min: 0.000000 max: 9.500000
domain: b697ceb5-cb4b-4bc6-9fa7-527945aedb92
  delay      avg: 0.000732 min: 0.000409 max: 0.038098
  last check avg: 3.552597 min: 0.000000 max: 9.800000
domain: cad3037e-0b2d-493b-a81c-37e9244098cc
  delay      avg: 0.001552 min: 0.000460 max: 0.190254
  last check avg: 2.877922 min: 0.000000 max: 9.600000
domain: 12e87c9e-2526-4107-97cd-80e6298e3da5
  delay      avg: 0.001328 min: 0.000442 max: 0.063413
  last check avg: 3.109957 min: 0.000000 max: 9.500000
domain: 1ef035ec-7dbf-4bf2-80b0-cadf63dca4ed
  delay      avg: 0.000656 min: 0.000441 max: 0.004900
  last check avg: 3.332468 min: 0.000000 max: 9.300000
domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35
  delay      avg: 0.000888 min: 0.000406 max: 0.037597
  last check avg: 3.508225 min: 0.000000 max: 9.700000
domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c
  delay      avg: 0.000804 min: 0.000443 max: 0.028700
  last check avg: 3.320346 min: 0.000000 max: 9.900000
domain: eb6184cc-89be-47ab-afd3-23ebbcad55d9
  delay      avg: 0.000665 min: 0.000463 max: 0.004340
  last check avg: 3.126407 min: 0.000000 max: 10.000000
domain: deadbeab-dc37-4224-95ed-e64420404861
  delay      avg: 0.000843 min: 0.000418 max: 0.032031
  last check avg: 3.337013 min: 0.000000 max: 10.000000
domain: a48a1cc9-a6ff-412c-99ab-3d211fc24122
  delay      avg: 0.000844 min: 0.000415 max: 0.025883
  last check avg: 3.372727 min: 0.000000 max: 9.300000
domain: 8bde8676-76e6-4ad4-862e-7461e710f339
  delay      avg: 0.001316 min: 0.000475 max: 0.078356
  last check avg: 3.003247 min: 0.000000 max: 9.700000
domain: 032648cd-8f70-406f-9f19-a94dcb5d6e26
  delay      avg: 0.000977 min: 0.000438 max: 0.074872
  last check avg: 3.492424 min: 0.000000 max: 9.500000
domain: 97d7259e-f300-4862-897b-a62af647c851
  delay      avg: 0.000710 min: 0.000412 max: 0.035667
  last check avg: 3.345238 min: 0.000000 max: 9.200000
domain: d59c3d5c-8ff2-4488-af44-e2eecd838b82
  delay      avg: 0.000768 min: 0.000414 max: 0.021306
  last check avg: 3.590043 min: 0.000000 max: 10.000000
domain: 1d596506-6157-4d72-87c1-3c4c10aaa980
  delay      avg: 0.000905 min: 0.000430 max: 0.038977
  last check avg: 3.300000 min: 0.000000 max: 9.600000

Test 2:

$ /home/yobshans/vdsm-tools/stats/repostat vdsm.log.2 vdsm.log.1 vdsm.log
domain: be9f85fe-3d24-403b-bc78-3f91cefcf246
  delay      avg: 0.000303 min: 0.000000 max: 0.001830
  last check avg: 7.549147 min: 0.000000 max: 32.000000
domain: fa744f59-5255-434f-a0f4-96f518f49de6
  delay      avg: 0.000389 min: 0.000000 max: 0.008781
  last check avg: 7.488842 min: 0.000000 max: 34.000000
domain: aa859a83-ebc1-4ea0-9277-8af413deeaad
  delay      avg: 0.000229 min: 0.000000 max: 0.019545
  last check avg: 8.670147 min: 0.000000 max: 39.600000
domain: 2398d30a-43f3-4b26-a0ed-a212b1b8881e
  delay      avg: 0.000391 min: 0.000000 max: 0.046709
  last check avg: 6.963296 min: 0.000000 max: 28.000000
domain: f59f9d17-8373-441f-9924-67110a90755c
  delay      avg: 0.000438 min: 0.000000 max: 0.022101
  last check avg: 7.589327 min: 0.000000 max: 32.000000
domain: 923ce1a6-cf90-48ab-a02b-8545f649dc9b
  delay      avg: 0.000396 min: 0.000000 max: 0.001464
  last check avg: 8.071979 min: 0.000000 max: 49.800000
domain: 12e3c939-2837-46af-b994-46b2901c0883
  delay      avg: 0.000204 min: 0.000000 max: 0.009425
  last check avg: 7.431481 min: 0.000000 max: 48.100000
domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe
  delay      avg: 0.000502 min: 0.000000 max: 0.000911
  last check avg: 7.413652 min: 0.000000 max: 24.200000
domain: c3d6fe21-fbf9-4d80-bb92-1939877a5f82
  delay      avg: 0.000385 min: 0.000000 max: 0.055027
  last check avg: 7.689199 min: 0.000000 max: 43.900000
domain: f7b789fd-f9e2-4f09-8d1e-8b87f451f6b9
  delay      avg: 0.000665 min: 0.000410 max: 0.003727
  last check avg: 4.909091 min: 0.000000 max: 17.600000
domain: f04c71c0-9604-46b6-adb5-78660d11bf20
  delay      avg: 0.000273 min: 0.000000 max: 0.036084
  last check avg: 7.797574 min: 0.000000 max: 29.000000
domain: 307fd25f-bf3f-40b8-ae19-f87cf105c099
  delay      avg: 0.000335 min: 0.000000 max: 0.018856
  last check avg: 7.786441 min: 0.000000 max: 31.900000
domain: bec0d23b-2a94-4265-ad71-40c059c456d8
  delay      avg: 0.000627 min: 0.000000 max: 0.014717
  last check avg: 8.271315 min: 0.000000 max: 33.400000
domain: d2498977-1b74-4875-91ba-08cc74c99629
  delay      avg: 0.001186 min: 0.000406 max: 0.017322
  last check avg: 4.835821 min: 0.000000 max: 22.000000
domain: f5f5b3fc-1495-4423-934f-89ea932fa333
  delay      avg: 0.000406 min: 0.000000 max: 0.008962
  last check avg: 8.109375 min: 0.000000 max: 41.900000
domain: 45c73813-8c0c-4d4f-badd-7c00f3a330a8
  delay      avg: 0.000505 min: 0.000000 max: 0.000748
  last check avg: 7.674003 min: 0.000000 max: 31.100000
domain: eba6461e-1771-4f8d-8e09-7d702c2dd028
  delay      avg: 0.000423 min: 0.000000 max: 0.007291
  last check avg: 8.333729 min: 0.000000 max: 38.800000
domain: e1d16f8d-013c-4be5-9a45-f7a12475636d
  delay      avg: 0.000536 min: 0.000000 max: 0.028912
  last check avg: 8.158541 min: 0.000000 max: 55.500000
domain: 1003d3d7-139e-486a-a22e-20f6467dc8c4
  delay      avg: 0.000577 min: 0.000425 max: 0.002126
  last check avg: 7.675868 min: 0.000000 max: 35.500000
domain: 30ed7ebd-439e-483a-8a03-9a7a1e3d049d
  delay      avg: 0.000611 min: 0.000394 max: 0.004189
  last check avg: 5.300000 min: 0.000000 max: 23.100000
domain: e3150f16-76a1-46cd-bf53-c321ee4d2c9a
  delay      avg: 0.001409 min: 0.000399 max: 0.029304
  last check avg: 5.792965 min: 0.000000 max: 28.200000
domain: 2caf8e26-5938-4cb2-9d98-d8cfdaa444a4
  delay      avg: 0.000368 min: 0.000000 max: 0.004120
  last check avg: 7.398485 min: 0.000000 max: 35.100000
domain: db48b33a-6856-40ff-badd-933ec20619ee
  delay      avg: 0.000546 min: 0.000394 max: 0.002481
  last check avg: 6.473643 min: 0.000000 max: 26.400000
domain: 6e165862-4a19-477f-98be-e078e699f2ae
  delay      avg: 0.000671 min: 0.000000 max: 0.039189
  last check avg: 8.502239 min: 0.000000 max: 31.000000
domain: 9884db84-f4d0-4402-a952-36c685ab0d33
  delay      avg: 0.000385 min: 0.000000 max: 0.006285
  last check avg: 7.591445 min: 0.000000 max: 30.000000
domain: ecb0dba1-3c1b-4ac8-9699-069431e2de03
  delay      avg: 0.001009 min: 0.000407 max: 0.024016
  last check avg: 5.280000 min: 0.000000 max: 19.900000
domain: b42b847d-8be3-430d-bcb0-ccb08df59233
  delay      avg: 0.000288 min: 0.000000 max: 0.026870
  last check avg: 7.901077 min: 0.000000 max: 34.600000
domain: ee163499-26c7-43fb-bd04-b0cf5e7d965e
  delay      avg: 0.000558 min: 0.000431 max: 0.004720
  last check avg: 7.646213 min: 0.000000 max: 33.400000
domain: deadbeab-dc37-4224-95ed-e64420404861
  delay      avg: 0.000361 min: 0.000000 max: 0.000755
  last check avg: 7.569044 min: 0.000000 max: 33.600000
domain: a42cf644-6820-4dd6-ac19-5af69be4cf94
  delay      avg: 0.000340 min: 0.000000 max: 0.040448
  last check avg: 7.609359 min: 0.000000 max: 32.700000
domain: d5f72e36-c4f1-4e45-acab-01d15c4ccb73
  delay      avg: 0.000628 min: 0.000413 max: 0.013137
  last check avg: 7.016000 min: 0.000000 max: 33.600000
domain: f4b2db57-4b3d-4e52-8b3f-d4347beb32bd
  delay      avg: 0.000793 min: 0.000380 max: 0.013701
  last check avg: 4.703937 min: 0.000000 max: 18.000000
domain: b697ceb5-cb4b-4bc6-9fa7-527945aedb92
  delay      avg: 0.000392 min: 0.000000 max: 0.001502
  last check avg: 7.884919 min: 0.000000 max: 32.800000
domain: cad3037e-0b2d-493b-a81c-37e9244098cc
  delay      avg: 0.000355 min: 0.000000 max: 0.040631
  last check avg: 7.352632 min: 0.000000 max: 27.800000
domain: 12e87c9e-2526-4107-97cd-80e6298e3da5
  delay      avg: 0.000611 min: 0.000000 max: 0.012651
  last check avg: 8.134072 min: 0.000000 max: 30.100000
domain: 1ef035ec-7dbf-4bf2-80b0-cadf63dca4ed
  delay      avg: 0.000477 min: 0.000000 max: 0.040202
  last check avg: 7.723826 min: 0.000000 max: 30.600000
domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35
  delay      avg: 0.000559 min: 0.000000 max: 0.038589
  last check avg: 7.688473 min: 0.000000 max: 38.400000
domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c
  delay      avg: 0.000619 min: 0.000442 max: 0.022269
  last check avg: 9.100179 min: 0.000000 max: 35.100000
domain: eb6184cc-89be-47ab-afd3-23ebbcad55d9
  delay      avg: 0.000530 min: 0.000417 max: 0.004869
  last check avg: 7.707956 min: 0.000000 max: 36.700000
domain: 67f146ee-e410-4987-97b5-d74abaa6085f
  delay      avg: 0.000143 min: 0.000000 max: 0.007179
  last check avg: 8.189730 min: 0.000000 max: 36.600000
domain: a48a1cc9-a6ff-412c-99ab-3d211fc24122
  delay      avg: 0.000335 min: 0.000000 max: 0.009734
  last check avg: 7.850738 min: 0.000000 max: 31.900000
domain: 8bde8676-76e6-4ad4-862e-7461e710f339
  delay      avg: 0.000308 min: 0.000000 max: 0.012916
  last check avg: 7.707309 min: 0.000000 max: 53.600000
domain: 032648cd-8f70-406f-9f19-a94dcb5d6e26
  delay      avg: 0.000163 min: 0.000000 max: 0.004134
  last check avg: 7.669752 min: 0.000000 max: 34.700000
domain: 97d7259e-f300-4862-897b-a62af647c851
  delay      avg: 0.000577 min: 0.000405 max: 0.005700
  last check avg: 6.459859 min: 0.000000 max: 23.800000
domain: d59c3d5c-8ff2-4488-af44-e2eecd838b82
  delay      avg: 0.000298 min: 0.000000 max: 0.000843
  last check avg: 7.821731 min: 0.000000 max: 34.300000
domain: 1d596506-6157-4d72-87c1-3c4c10aaa980
  delay      avg: 0.000298 min: 0.000000 max: 0.037302
  last check avg: 7.771409 min: 0.000000 max: 45.300000
domain: 61de827a-a78b-4d6c-94ec-292fb2d8fb2d
  delay      avg: 0.000511 min: 0.000417 max: 0.004799
  last check avg: 7.392241 min: 0.000000 max: 31.200000
domain: 6e6f8bd4-c10a-41b8-8b1a-b20d70ee132f
  delay      avg: 0.000600 min: 0.000437 max: 0.004435
  last check avg: 7.838505 min: 0.000000 max: 31.500000
domain: 959c90bd-c25d-4fa4-bf0c-5bc7e6494e5d
  delay      avg: 0.000355 min: 0.000210 max: 0.000701
  last check avg: 5.872021 min: 0.000000 max: 21.500000
domain: 56324a25-9789-4bdf-bf23-50bffa0577f0
  delay      avg: 0.000516 min: 0.000000 max: 0.000851
  last check avg: 7.819863 min: 0.000000 max: 35.800000
domain: 7604eebf-b55b-4ac3-b02a-1176b3519687
  delay      avg: 0.000552 min: 0.000439 max: 0.003322
  last check avg: 6.652416 min: 0.000000 max: 31.700000

I didn't upload the vdsm.log files,
but if needed I can do it later upon request.

Yuri

Comment 197 Nir Soffer 2016-08-20 23:30:54 UTC
Created attachment 1192516 [details]
repoplot from Yuri logs uploaded on August 4

This plot shows very low (<10s) lastCheck values, but also no long-running lvm
commands and no disk copy commands during this test. I assume that no
storage operations (such as copying disks) were done during this test, so
this test run is not helpful for verifying this bug.

Comment 198 Nir Soffer 2016-08-20 23:53:52 UTC
(In reply to Yuri Obshansky from comment #191)
> Bug verified on
> RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository)
> VDSM Version:vdsm-4.18.9-1.el7ev
> LIBVIRT Version:libvirt-1.2.17-13.el7_2.5
> 
> Environment the same:
> - 1 DC, 1 Cluster,
> - 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM)
> - 1 NFS Master SD
> - 10 iSCSI SDs

To reproduce this issue, you need to test a much higher number of storage domains.
I would recommend running with 50 storage domains, which is our target for now.

> - 50 VM (1 CPU, 1 G RAM, 5 G Disk)

We don't need to run any vm for this test, as idle vms do not generate any
io that may affect storage domain monitoring.

> Since that scenario is not clear
> I ran JMeter automatic script which executes operations on iSCSI SDs during
> almost 2 hours:
> - 01.Find.ISCSI.SD.Trx
> - 02.Deactivate.ISCSI,SD.Trx
> - 03.Detach.ISCSI,SD.Trx
> - 04.Attach.ISCSI,SD.Trx

I'm not sure what you tested, but based on the names of the files and the
test results seen in the logs, the flows tested are not helpful for verifying
this bug.

> and collected statistics from vdsm log files using vdsm-tools/stats
> The vdsm-tools/stats/repoplot still failed, 

It works for me: I downloaded the logs, un-gzipped them, and ran repoplot like this:

    repoplot -n out vdsm.log.3 vdsm.log.2 vdsm.log.1 vdsm.log

Note that multiple files must be given in the correct order, the same order
you would use with cat if you wanted to combine them into one big log file.

> so I cannot represent the graph
> But, vdsm-tools/stats/repostat show us that max lastCheck over the 10 sec
> even with 10 iSCSC SDs

Values higher than 10 seconds are ok and expected when a storage domain is
activated. One of the steps during activation is acquiring the host id, which
can be very slow.

> Transactions elapsed time (msec) :
> 
> Transactions name             Average   90% Line   Min     Max
> 01.Find.ISCSI.SD.Trx          810       864        458     2196
> 02.Deactivate.ISCSI,SD.Trx    142677    223249     19767   333741
> 03.Detach.ISCSI,SD.Trx        56427     78512      29854   107366
> 04.Attach.ISCSI,SD.Trx        232918    321849     90015   344879

I don't know what these numbers mean, but timing storage domain activation,
deactivation and attachment is not related to this bug.

> I think, the issue is still existed.

I think the tests done are not helpful for verifying this bug, so we cannot
reach any conclusion yet.

Comment 199 Nir Soffer 2016-08-21 00:44:15 UTC
(In reply to Yuri Obshansky from comment #196)
> Performed additional 2 tests on 
> RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository)
> VDSM Version:vdsm-4.18.9-1.el7ev
> LIBVIRT Version:libvirt-1.2.17-13.el7_2.5
> 
> But, this time with 50 iSCSI SDs

This is better...

> Test 1 - idle mode (no activities)

This does not test much...

> Test 2 - load test with 50 concurrent threads which simulate operations on
> iSCSI SDs.

The interesting operations are copying disks, not activating and deactivating
storage domains.

Running 50 threads activating and deactivating storage domains is also not an
interesting use case for any bug. Can you think of a real-world use case
including such a workload?

> I don't vdsm.log file 
> But, id needed I can do it later up to request

Since we did not test any disk copies, I don't think the logs would be helpful.

Comment 200 Nir Soffer 2016-08-21 01:58:33 UTC
This bug has a huge number of comments and a lot of noise, so I'll try to make it
clear what the issue is and how it should be tested.

When performing storage operations such as creating templates, copying
disks, or deleting vms or disks, the storage monitoring threads slow down,
and vdsm reports high lastCheck values.

When the engine finds a lastCheck time bigger than 30 seconds, it starts a 5 second timer.
If vdsm does not report a normal lastCheck value (<30) by the time this timer expires, the
domain is treated as faulty. If this domain is ok on other hosts, this host is
considered non-operational.
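
A minimal sketch of this check, assuming a hypothetical get_last_check()
callable that returns the reported lastCheck value in seconds (this is an
illustration only, not actual engine code):

    import time

    MAX_LAST_CHECK = 30  # seconds, per the description above
    GRACE_PERIOD = 5     # seconds, the short timer started by engine

    def domain_looks_faulty(get_last_check, sleep=time.sleep):
        # Treat the domain as faulty only if lastCheck is still above the
        # threshold when the grace timer expires.
        if get_last_check() <= MAX_LAST_CHECK:
            return False
        sleep(GRACE_PERIOD)
        return get_last_check() > MAX_LAST_CHECK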

Looking at the repoplots, we can see that while the lastCheck value may reach high
values (>300s), the read delay values are typically tiny (<0.01s): the storage
monitoring threads were not actually monitoring storage.

We can also see very slow lvm operations and sometimes also very slow
mailbox commands.

Please check these attachments to understand this issue:
- https://bugzilla.redhat.com/attachment.cgi?id=1130967
  Repoplot of vdsm.log vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain
- https://bugzilla.redhat.com/attachment.cgi?id=1130968
  Repoplot of vdsm.log vdsm logs from 2014-03-31 03:46 EDT, akotov
- https://bugzilla.redhat.com/attachment.cgi?id=1130968
  Repoplot of vdsm.log vdsm logs from 2016-02-06 01:36 EST, Pavel Zhukov
- https://bugzilla.redhat.com/attachment.cgi?id=1130983
  Repoplot of vdsm.log vdsm logs from 2016-02-09 12:51 EST, Pavel Zhukov

The root cause of this issue was mixing storage monitoring with other (possibly
very slow) operations such as refreshing the lvm cache and doing scsi rescans.

Even worse, the vdsm lvm module had bad locking that was designed to coordinate
reload and cache invalidation operations, but in practice it was serializing all
operations, so monitoring one domain could be delayed by unrelated delays
on other storage domains, or by a bad LUN causing a scsi scan to block for 30
seconds.
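
A minimal sketch of that locking problem (my own illustration, not the actual
vdsm lvm module): with a single module-wide lock, a monitor that only needs a
cheap lvm reload of its own domain still waits behind an unrelated slow
operation.

    import threading
    import time

    # Illustration only: a single module-wide lock, like the old vdsm lvm
    # locking, serializes *all* callers, related or not.
    _lvm_lock = threading.Lock()

    def invalidate_cache_and_rescan():
        with _lvm_lock:
            time.sleep(5)       # stands in for a scsi rescan blocked on a bad
                                # LUN (in vdsm this could be ~30 seconds)

    def reload_vg(sd_uuid):
        with _lvm_lock:
            time.sleep(0.1)     # the lvm command for a healthy domain is cheap
            return "vg info for %s" % sd_uuid

    def monitor(sd_uuid):
        start = time.monotonic()
        reload_vg(sd_uuid)
        # With the serialized lock this prints roughly the rescan time, even
        # though the monitored domain itself is perfectly healthy.
        print("lastCheck for %s: %.1fs" % (sd_uuid, time.monotonic() - start))

    threading.Thread(target=invalidate_cache_and_rescan).start()
    time.sleep(0.5)             # let the slow operation grab the lock first
    monitor("healthy-domain-uuid")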

In 4.0 we made these changes, fixing this issue:
- Storage monitoring was moved to a separate thread doing *only* storage monitoring.
  This thread uses an event loop to run multiple dd processes, so the thread itself
  can never block (see the sketch after this list). Now a high lastCheck value means
  that a dd process was blocked on storage.
- vdsm lvm locking was simplified, allowing concurrent lvm commands and
  shortening delays in monitoring threads.
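
The actual vdsm checker has its own event loop in the storage check module;
the following is only a rough sketch of the idea, using asyncio for
illustration, with made-up paths and intervals:

    import asyncio
    import time

    CHECK_INTERVAL = 10  # seconds between checks, per domain

    async def check_path(path):
        # Read one small block directly from the domain's metadata volume.
        # A blocked path only stalls its own dd process, never the checker.
        start = time.monotonic()
        proc = await asyncio.create_subprocess_exec(
            "dd", "if=%s" % path, "of=/dev/null", "bs=4096", "count=1",
            "iflag=direct",
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.PIPE)
        await proc.communicate()
        delay = time.monotonic() - start
        print("%s: rc=%s delay=%.3fs" % (path, proc.returncode, delay))

    async def monitor_path(path):
        # Each domain is checked independently, so a stuck dd on one domain
        # (high lastCheck for that domain) does not delay the others.
        while True:
            await check_path(path)
            await asyncio.sleep(CHECK_INTERVAL)

    async def main(paths):
        await asyncio.gather(*(monitor_path(p) for p in paths))

    # Example (hypothetical paths):
    # asyncio.run(main(["/dev/sd-uuid-1/metadata", "/dev/sd-uuid-2/metadata"]))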

How to test this bug

First, I think we *must* test this bug both on 3.6 and 4.0, so we can compare
apples to apples. Users running 3.6 (or older) need this information when
considering an upgrade to 4.0.

Setup:
- 50 active iscsi storage domains
- Create several big disks, big enough that copying each disk takes about
  10 minutes (see the rough sizing note after this list).
- Make sure the "Wipe after delete" option is checked for these disks, so after the
  disks are moved, we fill the source disks with zeros (this creates a lot of io).
- You can use a 3.6 cluster version; there is no need to test this with a 4.0 cluster.
  This way we can compare the same setup on both 3.6 and 4.0.
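
Rough sizing note (my assumption, not a measured number): with an effective copy
throughput of around 300 MB/s on a 10GbE storage network, a 200 GB disk takes
about 200,000 MB / 300 MB/s ≈ 670 seconds, roughly 11 minutes, so disks in the
150-200 GB range should fit the "about 10 minutes" target.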

Test:
- Move several disks concurrently from one storage domain to another
  Note: move and not copy, to make sure we wipe the source disks after
  the copy, maximizing the amount of io during the test.

Expected results (3.6):
- Copy operation fails after a timeout in engine
- Or host becomes non-operational because of high lastCheck values

Expected results (4.0):
- Operation should succeed on engine
- Low last check values during test (0-10s)
- Slower lvm commands and storage mailbox commands during test

Additional testing

Because of the improved lvm locking, operations like attaching and activating
a storage domain should be faster. You may want to time these operations on
a setup with 50 storage domains.

It will be useful to compare the results to 3.6, to see how much the user
experience is improved.

To test this, I would run the operation several times, once every 5 minutes;
a sketch of such a timing loop is below. There is no need to test concurrent
operations.
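
A minimal sketch of such a timing loop (attach_and_activate() is a hypothetical
placeholder for however you trigger the operation - REST API, SDK, or UI - not
a real vdsm or engine function):

    import time

    def attach_and_activate():
        # Hypothetical placeholder: trigger the attach/activate of the test
        # storage domain and return only after the domain is active.
        raise NotImplementedError

    RUNS = 10
    PAUSE = 5 * 60   # 5 minutes between runs, as suggested above

    timings = []
    for run in range(1, RUNS + 1):
        start = time.monotonic()
        attach_and_activate()
        elapsed = time.monotonic() - start
        timings.append(elapsed)
        print("run %d: %.1fs" % (run, elapsed))
        time.sleep(PAUSE)

    print("min=%.1fs max=%.1fs avg=%.1fs"
          % (min(timings), max(timings), sum(timings) / len(timings)))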

We can test this operation during disk copying - it may be much slower since
lvm operations are slowed down, but it should not time out in engine.

Notes:

- When attaching and activating a storage domain, we must wait both
for the storage monitor (which typically finishes in under a second) and for
lvm refresh operations that may take many seconds. Seeing high lastCheck values
during these operations is expected.

- vdsm reports actual=False for a storage domain until the first monitoring
cycle is completed. High lastCheck values when actual=False are ok; engine
acts on high lastCheck values only when the domain has completed the
first monitoring cycle.

- We must have vdsm logs and engine logs for all tests.

Moving back to ON_QA, since this bug has not been verified yet.

Comment 202 mlehrer 2016-09-05 15:11:01 UTC
Created attachment 1197972 [details]
Repoplot showing differences between vdsm during concurrent disk move test

Comment 203 mlehrer 2016-09-05 16:38:09 UTC
Results: Verified on vdsm-4.18.11 and RHEVM: 4.0.2-0.2

vdsm-4.18.11   7 Disk Concurrent Move Successful
vdsm-4.18.2-0  7 Disk Concurrent Move Failed - each time

Setup
RHEVM: 4.0.2-0.2.rc1.el7ev
50 ISCSI domains were created.
1 VM containing 7 disks of 200GB each, filled with data, with the "Wipe after delete" option set to true.
7 disks were moved concurrently.
Dedicated 10GbE storage network to both hosts.

What was tested:
Moving 7 disks concurrently between iSCSI storage domains with "Wipe after delete" set, on vdsm-4.18.11 and vdsm-4.18.2-0.

Results:
Successful concurrent moving of 7 disks.
See the attached repoplots of VDSM logs during the move activity.

vdsm-3.6.pdf (repoplot) shows:
 Failed disk moves.
 High lastCheck values.
 The disk moves failed, so no disks needed to be deleted afterwards.

vdsm-4.0.pdf (repoplot) shows:
 Successful disk moves.
 Acceptable lastCheck values.
 Two sets of disk activity, corresponding to the move and the delete.

