Description of problem:
The monitoring thread is triggered every 10 seconds by default. As soon as a storage operation is triggered (copy image, for instance), the monitoring thread slows down.

Version-Release number of selected component (if applicable):
vdsm-4.13.2-0.11.el6ev.x86_64

How reproducible:
Almost always when a storage operation is triggered in certain environments

Steps to Reproduce:
Not clear yet

Actual results:
The storage domains are marked as in problem and the hypervisor is marked as non-operational

Expected results:
The monitoring thread is not slowed down.
(In reply to Roman Hodain from comment #0)
> Description of problem:
> The monitoring thread is triggered every 10 seconds by default. As soon
> as a storage operation is triggered (copy image for instance), the monitoring
> thread slows down.

Your assumption that the monitor thread checks storage every 10 seconds is not accurate. The implementation *delays* at least 10 seconds between check operations. If a check operation takes 60 seconds, the next check would start about 10 seconds after the last operation finished.

Every 5 minutes, a monitor performs a refresh operation on the domain. This operation takes only a few seconds on an idle system with a few tiny domains, but may take much longer, depending on the load on the machine, the number of storage domains, the number of lvs, and the storage.

Do you know of any documentation that promised that the system behaves differently?

> Version-Release number of selected component (if applicable):
> vdsm-4.13.2-0.11.el6ev.x86_64
>
> How reproducible:
> Almost always when a storage operation is triggered in certain
> environments

What is a "certain environment"? We need much more detail.

> Steps to Reproduce:
> Not clear yet
>
> Actual results:
> The storage domains are marked as in Problems

True, if the lastCheck value is above 30 seconds (the default), the domain is marked as in problem.

> and the hypervisor is marked as non-operational

But the hypervisor is *not* marked as non-operational because one check took a long time. When a domain is marked as a problem domain, we start a timer on the engine, and if the domain is still in trouble after about 5 minutes, the host becomes non-operational. So delays while monitoring a domain should not lead to the host becoming non-operational, unless they are long and repeated, which means the host has trouble accessing the storage.

Can you point us to the events where a host became non-operational in this case?

> Expected results:
> The monitoring thread is not slowed down.

No, in the current implementation we cannot guarantee that. A monitor may slow down from time to time, and this is normal behavior, although it is usually a warning signal that the host or the storage is in trouble.

I will explain the example logs we see here in more detail later.
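For readers unfamiliar with the scheduling described above, here is a minimal sketch of the pattern (a fixed delay *between* checks plus a periodic refresh). It is only an illustration of the behavior described in this comment, with made-up names, not the actual vdsm DomainMonitorThread code:

    import time

    CHECK_DELAY = 10        # seconds to sleep *between* checks (not a fixed period)
    REFRESH_INTERVAL = 300  # refresh the domain roughly every 5 minutes

    def monitor_domain(check, refresh, should_stop):
        """Illustrative loop: if check()/refresh() block, the next check is pushed out."""
        last_refresh = 0
        while not should_stop():
            if time.time() - last_refresh >= REFRESH_INTERVAL:
                refresh()                 # may take seconds to minutes under load
                last_refresh = time.time()
            check()                       # e.g. a small direct-I/O read of the metadata LV
            time.sleep(CHECK_DELAY)       # the delay starts only after the check finishes

If a check or a refresh takes 60 seconds, the next check therefore starts roughly 70 seconds after the previous one, which is exactly the "slow down" visible in the lastCheck values below.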
Liron, can you explain more about the logic for marking a domain as non-operational, and check the engine log to see if we have something abnormal here?
engine.log showing the domain 'in problem'... 2014-03-25 00:52:01,985 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-44) domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA in problem. vds: ecp08.thw.itio.com ... 5 mins later, put host into non-operational state ... 2014-03-25 00:57:01,987 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-48) starting ProcessDomainRecovery for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA 2014-03-25 00:57:02,031 INFO [org.ovirt.engine.core.bll.RunVmCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] Lock Acquired to object EngineLock [exclusiveLocks= key: d1f92602-0536-4e8e-bb6a-7af1a0afd4c1 value: VM , sharedLocks= ] 2014-03-25 00:57:02,037 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] START, IsVmDuringInitiatingVDSCommand( vmId = d1f92602-0536-4e8e-bb6a-7af1a0afd4c1), log id: 61b0176a 2014-03-25 00:57:02,037 INFO [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (ajp-/127.0.0.1:8702-9) [4d66328d] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 61b0176a 2014-03-25 00:57:02,067 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-48) vds ecp08.thw.itio.com reported domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA as in problem, attempting to move the vds to status NonOperational =================================================================================================================== The RHEVM decides the domain is 'in problem' from the results of repoStats, which shows the lastCheck to be > 30 seconds. Thread-988992::INFO::2014-03-25 00:51:17,194::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None) Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}... ... however for some reason, repoStats is run for 5 mins here and then unfortunately on the next run the domain had a lastCheck > 30 seconds again ... Thread-989035::INFO::2014-03-25 00:56:11,641::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None) Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}... ... and again, there is another delay in running repoStats of about 100 seconds and again the domain has a lastCheck > 30 seconds ... Thread-989059::INFO::2014-03-25 00:57:55,527::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None) Thread-989059::INFO::2014-03-25 00:57:55,553::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response:... {u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000213322', 'lastCheck': '52.5', 'valid': True}... ... but you can see from these there was no problem in reading the actual storage. It responds quite quickly. The problem is because the monitoring thread isn't checking very often as Roman showed in his update ... 
=================================================================================================================== As Roman showed, the lastCheck > 30 was caused by the monitoring thread not running regularly: When the storage domain was reported as 'in problem' the monitoring thread was only able to perform the checks sporadically. Here we can see a > 90 gap between performing the getReadDelay Thread-21::DEBUG::2014-03-25 00:49:36,938::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:49:43,177::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000255575 s, 16.0 MB/s\n'; <rc> = 0 ... > 90 seconds before the next getReadDelay ... Thread-21::DEBUG::2014-03-25 00:51:25,000::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:51:33,854::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000181862 s, 22.5 MB/s\n'; <rc> = 0 And this delay then shows up in the large repoStats lastCheck value which is reported back to RHEVM (in the first repoStats response shown above), here is it 94 seconds, which is about what it was between those 2 successive getReadDelay runs above ... Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {... u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}... But as we can see, there is no read delay in reading the storage domain disks! The only delay here is with vdsm not running the monitoring checks and then telling RHEVM the storage domain is 'in problem', when actually it is vdsm that is 'in problem'! --------------------------------------------------------------------------------------------------------------------------- And here is another example where there is a 60 second gap between running successive getReadDelay (which is reported back to the RHEVM in the second repoStats response shown above) ... Thread-21::DEBUG::2014-03-25 00:55:09,178::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:55:09,434::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000190666 s, 21.5 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:56:11,646::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:56:11,876::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00028596 s, 14.3 MB/s\n'; <rc> = 0 And then repoStats reports a lastCheck of 60 seconds back to the RHEVM ... Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {... u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}... ... 
and so the RHEVM continues to consider the domain as 'in problem', but really there is no problem reading the storage. It is the result of large intervals between the domain checks, which are reported back to the RHEVM, which then falsely assumes there is a problem with the domain.

===================================================================================================================

The workaround here for the customer is probably to adjust MaxStorageVdsTimeoutCheckSec on the RHEVM to be > 60 seconds, to minimize the chances of a large lastCheck making RHEVM think the domain is 'in problem'. But perhaps a longer-term solution is to have RHEVM collect an average of readDelay values, and if the lastCheck is large *and* the associated readDelay is significantly larger than that average, *then* consider the domain as 'in problem'. What do you think?
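A rough sketch of the heuristic proposed above, purely as an illustration; the class, the sample window and the "significantly larger" factor are assumptions, not anything that exists in RHEVM (only MaxStorageVdsTimeoutCheckSec is a real config value mentioned in this bug):

    from collections import deque

    class DomainHealth:
        """Flag the domain only when lastCheck is stale *and* the read delay
        itself has degraded relative to its recent average."""
        def __init__(self, last_check_limit=30.0, delay_factor=5.0, window=20):
            self.last_check_limit = last_check_limit  # like MaxStorageVdsTimeoutCheckSec
            self.delay_factor = delay_factor          # "significantly larger" (assumed)
            self.delays = deque(maxlen=window)

        def in_problem(self, last_check, read_delay):
            self.delays.append(read_delay)
            avg = sum(self.delays) / len(self.delays)
            return (last_check > self.last_check_limit
                    and read_delay > self.delay_factor * avg)

With the values seen in this bug (lastCheck of 94.0 but a delay of ~0.00025s against a similar average), such a check would not report the domain as 'in problem'.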
(In reply to Mark Huth from comment #11) Thanks Mark! We will check the events you showed - hopefully they are all in the logs attached to this bug? If not, please attach the relevant engine and vdsm logs here. I think you are right in that the storage is not in problem. Looking in vdsm log, it looks like the host is in problem, taking multiple seconds to run a dd or lvm process, and not the storage. Do we have any information about the load on the machine while these monitoring issues happened?
Hi Nir,

Yes, the information in comment #12 (and all the comments) is in the attached engine.log and vdsm.log.1.xz files.

As yet we don't have any data about the load on the host (interestingly, the customer doesn't like to run sosreport on their hosts because it seems to place extra load on them!). Anyway, I will request load information from the customer.

Thanks,
Mark
This is a more useful view of repoStats for attachment 880585 [details].

domain: f737ce8c-d2c1-4cdf-8340-63fd449d2542  delay avg: 0.002497 min: 0.000293 max: 0.030579  last check avg: 18.965657 min: 1.100000 max: 76.500000
domain: 07c1f411-8c34-4c3e-a096-48a133216016  delay avg: 0.001886 min: 0.000341 max: 0.015333  last check avg: 21.066667 min: 1.200000 max: 109.300000
domain: 7f591cbf-1787-435d-881a-d55fb77ee636  delay avg: 0.002125 min: 0.000327 max: 0.031766  last check avg: 18.268687 min: 0.100000 max: 50.600000
domain: 891c4f53-bee3-4054-9fe8-48b54fe38d65  delay avg: 0.000669 min: 0.000281 max: 0.009746  last check avg: 32.796970 min: 0.300000 max: 84.500000
domain: 8b7ef24a-57c0-43e6-9f9e-b71a8701eacc  delay avg: 0.002053 min: 0.000000 max: 0.013383  last check avg: 16.969792 min: 0.200000 max: 75.000000
domain: 99bd47ea-2c85-4988-ad28-ae925a0ea3f6  delay avg: 0.001679 min: 0.000342 max: 0.010271  last check avg: 17.405051 min: -0.100000 max: 56.700000
domain: 58953ce4-c34d-49e1-85a8-1d0033978059  delay avg: 0.001879 min: 0.000324 max: 0.015990  last check avg: 18.948485 min: 0.200000 max: 64.200000
domain: 145656b4-c764-4b4f-8932-f0cc790ac7f5  delay avg: 0.001933 min: 0.000298 max: 0.014536  last check avg: 18.145455 min: -0.200000 max: 69.300000
domain: 1851daa3-660f-44b2-8949-3e1a821b9b87  delay avg: 0.001924 min: 0.000296 max: 0.008656  last check avg: 18.884848 min: 0.100000 max: 83.500000
domain: beb7a15c-2041-4298-95b9-7c73657a140e  delay avg: 0.001827 min: 0.000293 max: 0.020137  last check avg: 16.979798 min: -0.100000 max: 53.600000
domain: 216ae8e3-888f-41f0-9770-82af114a845f  delay avg: 0.002230 min: 0.000315 max: 0.027663  last check avg: 15.493939 min: -0.200000 max: 50.100000
domain: fe1c8078-1bf1-4cab-bea5-e0d9f665d082  delay avg: 0.001957 min: 0.000329 max: 0.019197  last check avg: 17.994949 min: 0.100000 max: 60.800000
domain: 80767ba5-7c87-45dd-9a32-7330ed889af5  delay avg: 0.001659 min: 0.000321 max: 0.006276  last check avg: 29.701010 min: 0.000000 max: 102.600000
domain: 9b66b8d0-1a39-444a-bef5-c3b9b432cb49  delay avg: 0.001557 min: 0.000309 max: 0.009697  last check avg: 19.275758 min: -0.700000 max: 98.700000
domain: ea38354b-1638-427f-b07a-fb7d68965573  delay avg: 0.002269 min: 0.000328 max: 0.020631  last check avg: 15.559596 min: -0.300000 max: 54.200000
domain: 47e996a5-7031-4a67-bbd3-a982ddbaa385  delay avg: 0.001853 min: 0.000315 max: 0.010552  last check avg: 19.190909 min: -0.100000 max: 68.300000
domain: 1f8ab569-e10c-4db2-89ea-1299c33657dc  delay avg: 0.002147 min: 0.000286 max: 0.019115  last check avg: 17.875758 min: -0.100000 max: 67.300000
domain: fadcbebe-e7e9-41d7-acc2-c6f7c88ddc21  delay avg: 0.001892 min: 0.000355 max: 0.017745  last check avg: 19.050505 min: -0.000000 max: 75.000000
domain: 16ff2d50-2d03-44e7-aee4-bbde70bbc866  delay avg: 0.001890 min: 0.000298 max: 0.019597  last check avg: 18.421212 min: -0.600000 max: 67.800000
domain: 8fa43837-ac15-48f4-ad00-c928899709a1  delay avg: 0.001713 min: 0.000287 max: 0.013944  last check avg: 19.128283 min: -0.100000 max: 55.000000
domain: 582f520c-cf40-498a-b999-6c5e0056509c  delay avg: 0.001845 min: 0.000329 max: 0.012668  last check avg: 17.516162 min: -0.500000 max: 74.900000
domain: f60fde72-53e0-4ead-b998-a5d985112707  delay avg: 0.001730 min: 0.000285 max: 0.021188  last check avg: 19.022222 min: 1.200000 max: 88.100000
domain: 15552cb3-d62d-4868-bc1c-64dcf8542d12  delay avg: 0.005119 min: 0.000325 max: 0.136264  last check avg: 18.754545 min: 0.400000 max: 56.600000
domain: 50885a30-5bec-4b34-98ed-ed3c3306bad0  delay avg: 0.000447 min: 0.000297 max: 0.000926  last check avg: 29.818182 min: -0.600000 max: 82.500000
domain: c14a4d42-7e18-493c-96ed-8549bdd4b0bb  delay avg: 0.001669 min: 0.000306 max: 0.011661  last check avg: 19.109091 min: 0.200000 max: 103.300000
domain: 8ea8a6d5-b5e6-4120-bed0-539dd45a2457  delay avg: 0.001729 min: 0.000318 max: 0.012942  last check avg: 15.146465 min: -0.600000 max: 65.600000
domain: 57e4de3a-a5dd-4b94-8062-d2665d294268  delay avg: 0.001642 min: 0.000305 max: 0.014575  last check avg: 18.091919 min: -0.400000 max: 114.100000
domain: c5084f93-aa00-4f65-8532-65b3ca9dea7e  delay avg: 0.002153 min: 0.000311 max: 0.023065  last check avg: 18.705051 min: 0.000000 max: 63.300000
Created attachment 880688 [details] repostats tool - generate repostats statistics from vdsm logs
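Statistics like the ones above can be produced by a script roughly along these lines; this is a hypothetical sketch based on the repoStats log lines quoted in this bug, not the attached repostats tool itself:

    import re
    import sys
    from collections import defaultdict

    # Matches entries like:
    # u'2c5f2291-...': {'code': 0, ..., 'delay': '0.000255575', 'lastCheck': '94.0', ...}
    ENTRY = re.compile(
        r"u?'(?P<sd>[0-9a-f-]{36})': \{[^}]*'delay': '(?P<delay>[0-9.]+)'"
        r"[^}]*'lastCheck': '(?P<last>-?[0-9.]+)'")

    def main(path):
        stats = defaultdict(lambda: {'delay': [], 'last': []})
        with open(path) as f:
            for line in f:
                if 'repoStats, Return response' not in line:
                    continue
                for m in ENTRY.finditer(line):
                    stats[m.group('sd')]['delay'].append(float(m.group('delay')))
                    stats[m.group('sd')]['last'].append(float(m.group('last')))
        for sd, s in stats.items():
            d, l = s['delay'], s['last']
            print("domain: %s  delay avg: %.6f min: %.6f max: %.6f  "
                  "last check avg: %.6f min: %.6f max: %.6f"
                  % (sd, sum(d)/len(d), min(d), max(d), sum(l)/len(l), min(l), max(l)))

    if __name__ == '__main__':
        main(sys.argv[1])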
Here what we can see on the vdsm side. To make it more clear, I include only Thread-21, the monitor thread for the domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62, and some repoStats calls. At 00:36, one minute before the copyImage task started: Thread-21::DEBUG::2014-03-25 00:36:04,687::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:36:04,704::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0246864 s, 16.6 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:36:14,744::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:36:14,794::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0511668 s, 8.0 MB/s\n'; <rc> = 0 We can see that running dd takes 30-50 milliseconds. The copyImage task starts: Thread-988661::INFO::2014-03-25 00:37:41,432::logUtils::44::dispatcher::(wrapper) Run and protect: copyImage(sdUUID='b45645cf-072b-43e6-95f3-0387caa14847', spUUID='97b195c8-23ba-49e9-aea3-22080512ac66', vmUUID='', srcImgUUID='f11f2572-54e4-4a1c-af35-18c1b8d416f9', srcVolUUID='cdfa0040-4885-4340-94ef-2d11590c1911', dstImgUUID='467c8de5-3260-4f84-9f7a-a9d097b67107', dstVolUUID='e2dc71b8-3ddd-4e02-a720-c69270c21408', description='Active VM', dstSdUUID='b45645cf-072b-43e6-95f3-0387caa14847', volType=6, volFormat=4, preallocate=2, postZero='false', force='false') One minute later - we see no effect on the monitor thread: Thread-21::DEBUG::2014-03-25 00:38:35,426::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:38:35,441::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0404696 s, 10.1 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:38:45,485::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:38:45,527::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000206213 s, 19.9 MB/s\n'; <rc> = 0 At 00:39:25, the monitor refresh the domain (this happens every 5 minutes): Thread-21::DEBUG::2014-03-25 00:39:35,676::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::ERROR::2014-03-25 00:39:35,677::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::ERROR::2014-03-25 00:39:35,677::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::DEBUG::2014-03-25 00:39:35,677::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend Thread-21::DEBUG::2014-03-25 00:39:35,682::blockSD::326::Storage.Misc.excCmd::(readlines) '/bin/dd iflag=direct skip=0 bs=2048 if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata count=1' (cwd None) Thread-21::DEBUG::2014-03-25 
00:39:35,696::blockSD::326::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.000306704 s, 6.7 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:39:35,697::misc::289::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.000306704 s, 6.7 MB/s'], size: 2048 Thread-21::DEBUG::2014-03-25 00:39:35,697::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[] Thread-21::WARNING::2014-03-25 00:39:35,697::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is Thread-21::DEBUG::2014-03-25 00:39:35,698::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend Thread-21::DEBUG::2014-03-25 00:39:35,698::lvm::516::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::518::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::526::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,699::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,700::lvm::386::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,701::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_01|/dev/mapper/ECPB_02|/dev/mapper/ECPB_03|/dev/mapper/ECPB_04|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None) Thread-21::DEBUG::2014-03-25 00:39:35,979::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:39:35,981::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:39:35,981::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['LEASETIMESEC=60', 'SDUUID=2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62', 'LOCKRENEWALINTERVALSEC=5', 'LOGBLKSIZE=512', u'PV0=pv:ECPB_SATA_01,uuid:gr8Rdi-tJNf-iF8K-WW3m-rjSe-DJsa-HAV90O,pestart:0,pecount:8189,mapoffset:0', 'LEASERETRIES=3', 'DESCRIPTION=THW_B_SATA_DATA', 'IOOPTIMEOUTSEC=10', 'VGUUID=f0bA1D-h1SV-UuBs-YIan-e5nM-d5AX-meaq8z', u'PV1=pv:ECPB_SATA_02,uuid:FegIWm-G3Sw-Hf32-4gpl-GTuA-nspj-gDb2Eg,pestart:0,pecount:8189,mapoffset:8189', 'PHYBLKSIZE=512', 'CLASS=Data', 'TYPE=FCP', 'LOCKPOLICY=', 'VERSION=3', 'POOL_UUID=97b195c8-23ba-49e9-aea3-22080512ac66', 'POOL_DESCRIPTION=THW', 'ROLE=Master', 'MASTER_VERSION=2', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=12', u'POOL_DOMAINS=b45645cf-072b-43e6-95f3-0387caa14847:Active,2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:Active,f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4:Active', 
'_SHA_CKSUM=c6cde602c4f6b7662e8044a091dfe3a8037c819b'] Thread-21::INFO::2014-03-25 00:39:35,984::sd::375::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_imageNS already registered Thread-21::INFO::2014-03-25 00:39:35,984::sd::383::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_volumeNS already registered Thread-21::INFO::2014-03-25 00:39:35,985::blockSD::446::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_lvmActivationNS already registered Thread-21::DEBUG::2014-03-25 00:39:35,985::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None) Thread-21::DEBUG::2014-03-25 00:39:36,199::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:39:36,205::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:39:36,222::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 The domain refresh took about 1 second - the task has no effect on the monitor thread so far. We start to see trouble here about 10 minutes after starting the copyImage task - so it seems that the copyImage task is not related to this issue. 
Running dd takes now 3-10 seconds: Thread-21::DEBUG::2014-03-25 00:47:21,183::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:47:24,227::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0293797 s, 13.9 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:47:43,756::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:47:53,835::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0245865 s, 16.7 MB/s\n'; <rc> = 0 And now the monitor thread try to refresh the domain again, and it takes about 94 seconds: Thread-21::DEBUG::2014-03-25 00:49:53,611::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::ERROR::2014-03-25 00:50:04,486::sdc::137::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::ERROR::2014-03-25 00:50:04,494::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 Thread-21::DEBUG::2014-03-25 00:50:04,559::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend Thread-21::DEBUG::2014-03-25 00:50:04,731::blockSD::326::Storage.Misc.excCmd::(readlines) '/bin/dd iflag=direct skip=0 bs=2048 if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/meta data count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:50:08,419::blockSD::326::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0.00018 7592 s, 10.9 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:50:10,066::misc::289::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.000187592 s, 10. 
9 MB/s'], size: 2048 Thread-21::DEBUG::2014-03-25 00:50:15,917::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[] Thread-21::WARNING::2014-03-25 00:50:15,930::persistentDict::256::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is Thread-21::DEBUG::2014-03-25 00:50:15,945::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend Thread-21::DEBUG::2014-03-25 00:50:15,985::lvm::516::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:50:16,016::lvm::518::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:50:16,042::lvm::526::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:50:16,066::lvm::538::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:50:16,087::lvm::386::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-21::DEBUG::2014-03-25 00:50:18,907::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ign ore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_01|/dev/mapper/ECPB_02|/dev/mapper/ECPB_0 3|/dev/mapper/ECPB_04|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { ret ain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg _mda_size,vg_mda_free,lv_count,pv_count,pv_name 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None) Thread-21::DEBUG::2014-03-25 00:50:36,728::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:50:36,772::lvm::428::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-21::DEBUG::2014-03-25 00:50:36,781::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['LEASETIMESEC=60', 'SDUUID=2c5f2291-ac31-4c9e-8f 0a-3d317f3f0d62', 'LOCKRENEWALINTERVALSEC=5', 'LOGBLKSIZE=512', u'PV0=pv:ECPB_SATA_01,uuid:gr8Rdi-tJNf-iF8K-WW3m-rjSe-DJsa-HAV90O,pestart:0,pecount:8189,mapoffset:0', 'LEASERET RIES=3', 'DESCRIPTION=THW_B_SATA_DATA', 'IOOPTIMEOUTSEC=10', 'VGUUID=f0bA1D-h1SV-UuBs-YIan-e5nM-d5AX-meaq8z', u'PV1=pv:ECPB_SATA_02,uuid:FegIWm-G3Sw-Hf32-4gpl-GTuA-nspj-gDb2Eg, pestart:0,pecount:8189,mapoffset:8189', 'PHYBLKSIZE=512', 'CLASS=Data', 'TYPE=FCP', 'LOCKPOLICY=', 'VERSION=3', 'POOL_UUID=97b195c8-23ba-49e9-aea3-22080512ac66', 'POOL_DESCRIPT ION=THW', 'ROLE=Master', 'MASTER_VERSION=2', 'POOL_SPM_ID=1', 'POOL_SPM_LVER=12', u'POOL_DOMAINS=b45645cf-072b-43e6-95f3-0387caa14847:Active,2c5f2291-ac31-4c9e-8f0a-3d317f3f0d6 2:Active,f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4:Active', '_SHA_CKSUM=c6cde602c4f6b7662e8044a091dfe3a8037c819b'] Thread-21::INFO::2014-03-25 00:50:58,323::sd::375::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_imageNS already registered Thread-21::INFO::2014-03-25 00:50:58,336::sd::383::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 
2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_volumeNS already registered Thread-21::INFO::2014-03-25 00:50:58,358::blockSD::446::Storage.StorageDomain::(_registerResourceNamespaces) Resource namespace 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62_lvmActivationNS already registered Thread-21::DEBUG::2014-03-25 00:50:58,361::lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ig nore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ \'a|/dev/mapper/ECPB_SATA_01|/dev/mapper/ECPB_SATA_02|\', \'r|.*|\' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62' (cwd None) Thread-21::DEBUG::2014-03-25 00:51:17,182::lvm::309::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ''; <rc> = 0 Thread-988992::INFO::2014-03-25 00:51:17,200::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'b45645cf-072b-43e6-95f3-0387caa14847': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000187753', 'lastCheck': '22.1', 'valid': True}, u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000255575', 'lastCheck': '94.0', 'valid': True}, u'f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00036752', 'lastCheck': '18.8', 'valid': True}} Thread-21::DEBUG::2014-03-25 00:51:25,000::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:51:33,854::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.000181862 s, 22.5 MB/s\n'; <rc> = 0 Noteable delays while refreshing the domain: - persistentDict::234::Storage.PersistentDict::(refresh) - 22 seconds - lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs - 18 seconds - lvm::309::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgck - 18 seconds Seems that the host is under extreme load, or swaping memory. We know that the storage is ok becasue the last readDelay show tiny delay and high rate. 
Running dd continue to be randomally slow from time to time: Thread-21::DEBUG::2014-03-25 00:52:03,742::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:52:04,043::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0264309 s, 15.5 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:52:36,920::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:52:58,171::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00 0229393 s, 17.9 MB/s\n'; <rc> = 0 Thread-21::DEBUG::2014-03-25 00:53:10,253::blockSD::595::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/dev/2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62/metadata bs=4096 count=1' (cwd None) Thread-21::DEBUG::2014-03-25 00:53:10,440::blockSD::595::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00021057 s, 19.5 MB/s\n'; <rc> = 0 At 00:55:21,879, the next domain refresh starts: Thread-21::DEBUG::2014-03-25 00:55:21,879::domainMonitor::192::Storage.DomainMonitorThread::(_monitorDomain) Refreshing domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62 This time the refresh took 50 seconds. And now come the next repoStat call: Thread-989035::INFO::2014-03-25 00:56:11,713::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'b45645cf-072b-43e6-95f3-0387caa14847': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000218881', 'lastCheck': '20.3', 'valid': True}, u'2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000190666', 'lastCheck': '60.1', 'valid': True}, u'f7e3876e-cb2d-4b8a-af4b-1f197c50c2f4': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00377639', 'lastCheck': '0.6', 'valid': True}} The last repoStats call was at 00:51:17,200 - seems that engine stopped monitoring the host for 5 minutes. It seems that the engine 5 minutes timeout and dmain monitor 5 mintues timeout between refreshes work together to ensure that a host becomes non-operational if the host is under high load.
I'll provide the RCA here and what needs to be done.

We have a successful vds refresh on which the domain is being reported as problematic:
-----------------------------------------------------------------------------
2014-03-25 00:52:01,985 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-4-thread-44) domain 2c5f2291-ac31-4c9e-8f0a-3d317f3f0d62:THW_B_SATA_DATA in problem. vds: ecp08.thw.itio.com
-----------------------------------------------------------------------------

Then an attempt is made to stop a vm - this execution takes an exclusive lock (synchronized block) on _vdsManager.getLockObj(), which is also the lock used for performing the vds refresh:
-----------------------------------------------------------------------------
[org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] START, DestroyVmVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 1d9ba1b7
2014-03-25 00:52:48,132 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (pool-4-thread-49) [77dded4d] START, DestroyVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 103c2234
-----------------------------------------------------------------------------

The duration of the vm stop operation is long (more than 3 minutes) due to various reasons on the vdsm side (but that is unrelated to this bug):
-----------------------------------------------------------------------------
Thread-989013::DEBUG::2014-03-25 00:55:51,385::BindingXMLRPC::982::vds::(wrapper) return vmDestroy with {'status': {'message': 'Machine destroyed', 'code': 0}}
-----------------------------------------------------------------------------

In the meanwhile (during those 3 minutes) there were multiple attempts to stop the vm, leading multiple threads to pile up waiting for the exclusive lock:
-----------------------------------------------------------------------------
1: 2014-03-25 00:53:25,168 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-44) [582e54a6] START, DestroyVmVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 19c8e61b
2: 2014-03-25 00:54:13,921 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-48) [73933d5d] START, DestroyVmVDSCommand(HostName = ecp08.thw.itio.com, HostId = 26c25a9a-7af7-4e62-8289-4e04d2f5767f, vmId=d1f92602-0536-4e8e-bb6a-7af1a0afd4c1, force=false, secondsToWait=0, gracefully=false), log id: 1ad87aca
-----------------------------------------------------------------------------

Only now does the first stop attempt end:
-----------------------------------------------------------------------------
2014-03-25 00:55:48,133 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] VDS::destroy Failed destroying vm d1f92602-0536-4e8e-bb6a-7af1a0afd4c1 in vds = 26c25a9a-7af7-4e62-8289-4e04d2f5767f : ecp08.thw.itio.com, error = org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException
2014-03-25 00:55:48,133 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (pool-4-thread-49) [77dded4d] FINISH, DestroyVmVDSCommand, log id: 1d9ba1b7
-----------------------------------------------------------------------------

Because several threads were waiting for the exclusive lock, including the VdsManager.OnTimer() method, the refresh does not occur (thus there is no call to repoStats in the meantime). Since the host did report the domain as problematic and no "opposite" report was collected (the vds manager cannot collect stats while waiting for the lock), the host moves to non-operational.

It seems there is no reason that at least "regular" stats (or let's call them stats "unrelated" to the vm's status) should not be collected from the host while a vm stop is being attempted (vm stats may be collected as well if it is safe).

Allon, how do you want to handle it?
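To make the RCA above easier to follow, here is a minimal sketch of the contention pattern, in Python rather than the engine's Java, with made-up names standing in for _vdsManager.getLockObj() and VdsManager.OnTimer(); it is an illustration, not the actual engine code:

    import threading
    import time

    host_lock = threading.Lock()   # stands in for _vdsManager.getLockObj()

    def destroy_vm():
        with host_lock:            # a long VM lifecycle operation holds the host lock
            time.sleep(180)        # e.g. a vmDestroy that takes ~3 minutes on a loaded host

    def refresh_host_stats():
        # The VdsManager.OnTimer() equivalent: it cannot run until destroy_vm()
        # releases the lock, so no repoStats is collected and the earlier
        # "domain in problem" report is never refuted.
        with host_lock:
            print("getVdsStats / repoStats collected")

    threading.Thread(target=destroy_vm).start()
    time.sleep(1)
    refresh_host_stats()           # blocks here for ~3 minutes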
Regardless, we shouldn't allow multiple executions of DestroyVmVDSCommand, which will hide this problem for "normal" config values; there seems to be no reason to allow multiple concurrent executions.
(In reply to Liron Aravot from comment #19) > I'll provide RCA here and what needs to be done - > > as there were few threads waiting for the exclusive lock including the > VdsManager.OnTimer() method - refresh won't occur (thus there's no call to > repoStats meanwhile) - as the host did report the domain as problematic and > no "opposite" report was collected (as vds manager won't collect stats > waiting for the lock), the host will move to non operational. > > Seems like there's no reason that at least "regular" (or let's call it - > "unrelated" to vm's status) stats won't be collected from the host when it's > attempted to stop a vm (stats like vm stats may be collected as well if it's > safe). Hi Barak, Please see Liron's analysis above (comment 19 has the full details). In a nutshell, host monitoring and some of the VM lifecycle commands (starting and stopping) take the same lock. Since these may be long operations (in this case - stopping a VM), a host may MISTAKENLY be perceived as though it's not updating its stats, and then be moved to non-op. Can one of your guys please take a look at this? If this is true, the issue is embedded in the host/vm lifecycle, and should be handled by the appropriate experts.
Regardless, we can also take a look into why we take an exclusive lock that prevents simultaneous operations on different vms on the same host.
(In reply to Allon Mureinik from comment #21) > (In reply to Liron Aravot from comment #19) > > I'll provide RCA here and what needs to be done - > > > > as there were few threads waiting for the exclusive lock including the > > VdsManager.OnTimer() method - refresh won't occur (thus there's no call to > > repoStats meanwhile) - as the host did report the domain as problematic and > > no "opposite" report was collected (as vds manager won't collect stats > > waiting for the lock), the host will move to non operational. > > > > Seems like there's no reason that at least "regular" (or let's call it - > > "unrelated" to vm's status) stats won't be collected from the host when it's > > attempted to stop a vm (stats like vm stats may be collected as well if it's > > safe). > > Hi Barak, > > Please see Liron's analysis above (comment 19 has the full details). > In a nutshell, host monitoring and some of the VM lifecycle commands > (starting and stopping) take the same lock. Since these may be long > operations (in this case - stopping a VM), a host may MISTAKENLY be > perceived as though it's not updating its stats, and then be moved to non-op. > > Can one of your guys please take a look at this? > If this is true, the issue is embedded in the host/vm lifecycle, and should > be handled by the appropriate experts. Hi Omer, Barak suggested you may be the right contact point for this, as the problem seems to be initiated by a VM stop operation taking too long. Can you take a look please? Thanks!
Continuing from comment #19, on the vdsm side, Thread-989013:

1) The system seems to be heavily overloaded; all the operations seem to take a very long time.
2) a) The destroy operation got stuck in libvirt for about 16s.
   b) This is followed by a 30s wait in vdsm before we retry to kill it.
   c) The second attempt to destroy the domain failed in libvirt after another 16s, followed by a bogus WARN message that the VM is not running when it actually is.
   d) We continue with freeing up the resources regardless.
   e) teardownVolumePath is called at ~00:54:01, followed by a flood of storage-related delays and errors.
   f) Finally "destroyed" at 00:55:51, returning Machine destroyed (even when it's not).

Assuming host time is in sync with the engine (looks ok as per the end time), the overall performance is really bad. The destroy command got logged in vdsm at 00:52:56 whereas the engine sent it at 00:52:48.

We can consider spawning a separate thread for the destroy flow in the future. (The retry sequence in 2) is sketched below.)
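A hedged sketch of the destroy sequence described in 2) above; the helper names are hypothetical stubs and this is only an illustration of the reported flow, not vdsm's actual destroy code:

    import time

    RETRY_WAIT = 30  # the ~30s wait between destroy attempts described above

    # Stubs standing in for libvirt and storage calls (made-up names, not vdsm's API).
    def libvirt_destroy(vm, graceful):
        print("destroying %s (graceful=%s)" % (vm, graceful))
        return not graceful          # pretend the graceful attempt fails, the forced one works

    def teardown_volume_paths(vm):
        print("tearing down volumes of %s" % vm)

    def destroy_vm(vm):
        """Illustrative flow: graceful destroy, wait, forced retry, then resource teardown."""
        if not libvirt_destroy(vm, graceful=True):       # may block ~16s on a loaded host
            time.sleep(RETRY_WAIT)
            if not libvirt_destroy(vm, graceful=False):  # a second failure is only logged
                print("WARN: VM %s is not running" % vm)
        teardown_volume_paths(vm)    # storage teardown; slow when LVM/storage calls lag
        return "Machine destroyed"   # reported even if the process actually survived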
From my perspective, the NonOperational state here is the best resolution. The state of the VM in question is serious: it cannot be killed and it is reported to the engine as down when it is not. We do not want to make the change suggested in comment #22, as we don't want to get stats during state-transition commands.

IMO, with this kind of unresponsive system we can't do anything else but tweak (increase) the timeouts all around the place. Perhaps some process/IO priorities should be set differently.

@Roman, a libvirt log would be handy
@Barak - infra insights regarding the responsiveness?
@Sean - I think it's back on storage. The induced load and delay in lvm operations seem to be the most substantial part of the delays
@Francesco - let's try to make "destroy" more reliable and tweak the wait times
Michal, I don't think that this is the issue here. The destroy operation, or other operations that take that lock, can always take time for whatever reason. I'm not sure why we don't want to collect vm stats, or why we have an exclusive lock between various unrelated operations (I think we should change that anyway). Specifically, other stats (let's leave vm stats "out" of the discussion) can and should be collected/processed, as they aren't related to vms and shouldn't be using the same exclusive lock.
Liron, it's not designed to take time. Operations which are supposed to take time spawn their own thread. The fact that it takes 62s is a mistake we should address; however, in the overall duration it's not that significant. There is something seriously wrong with that VM, since it can't be killed.

VM stats:
It's true we don't need the lock for unrelated operations, but this one is related - destroy is a state transition and we don't want to get stats in the middle.

Host stats:
These, on the other hand, should be collected/called regardless of VM-related operations. The trouble is the code is all in one place; there is only a single timer, scheduled after the previous call finishes. This needs infra changes. I'll follow up with Roy Golan on this.

As noted before, the host is under extreme load, to the point where it is almost unusable, and there's no silver bullet for that. I believe that needs to be addressed (or at least understood) first.
(In reply to Michal Skrivanek from comment #28) > Liron, it's not designed to take time. Operations which are supposed to take > time are spawning its own thread. The fact that it takes 62s is a mistake we > should address, however in the overall duration it's not that significant. > There is something seriously wrong with that VM since it can't be killed. > VM stats: > It's true we don't need to lock for unrelated operations, but this one is > related - destroy is a state transition and we don't want to get stats in > the middle. > Host stats: > these, on the other hand, should be collected/called regardless VM-related > operations. The trouble is the code is all in one place, there is only a > single timer scheduled after the previous call finishes. This needs infra > changes. > I'll follow up with Roy Golan on this > > > As noted before, the host is under extreme load, to the state when it is > almost unusable, there's not silver bullet for that. I believe that needs to > be addressed (or at least understood) first. agree - from the system point of view, this host, being under stress and slow to react to the engine command is INDEED Not Operational. all the lock and threads contention on the shared lock is a side effect and merely noise for that matter.
(In reply to Michal Skrivanek from comment #28)
> Liron, it's not designed to take time

Then the design is wrong and should be fixed.

> Host stats:
> these, on the other hand, should be collected/called regardless VM-related
> operations.

These *must* be collected regardless of other operations.

> As noted before, the host is under extreme load, to the state when it is
> almost unusable, there's not silver bullet for that. I believe that needs to
> be addressed (or at least understood) first.

A host being overloaded is not a bug - this is a normal situation that the engine must handle.
(In reply to Roy Golan from comment #29)
> (In reply to Michal Skrivanek from comment #28)
> agree - from the system point of view, this host, being under stress and
> slow to react to the engine command is INDEED Not Operational.

It is operational, but slow. It is possible that vdsm is very slow because of memory swapping, while other processes are in a good enough state. We don't have enough information about host health.

> all the lock and threads contention on the shared lock is a side effect and
> merely noise for that matter.

Taking an exclusive lock and blocking unrelated operations is not noise.
Doron: do we collect and analyze any info about host health in the engine that could help avoid such issues on the host?
(In reply to Nir Soffer from comment #30)
A host being overloaded is not a bug, but it's not a normal situation either. If the vdsm service is not operational, then from the RHEV point of view the host is not operational. I don't care about other services for this resolution.

(In reply to Nir Soffer from comment #32)
Yes, the host statistics. Though AFAIK we don't collect "load", just memory-related info, which is not really enough. We probably should improve host monitoring; on the other hand, it's not the core purpose of RHEV, so there needs to be a line.
(In reply to Nir Soffer from comment #32)
> Doron: do we collect and analyze some info about host health in engine, that
> could help avoid such issues on the host?

Nir, the engine gets the results of getVdsStats in the form of:

cpuIdle = '99.72'
cpuLoad = '0.14'
cpuSys = '0.11'
cpuSysVdsmd = '0.50'
cpuUser = '0.17'
cpuUserVdsmd = '0.62'

But we cannot really tell which process goes wild. In 3.5 we plan to use cgroups limitations for VMs, to make sure a VM will not go wild.
Bug 1091389 may help in the specific case of a problematic VM (as happened here), bug 1098791 should address the excessive duration of VM statistics processing (blocking the host stats), and rgolan will open another one for the actual separation of VM stats from host-related stats (including storage domain monitoring) - Roy, please fill it in.
Opened Bug 1099068
*** Bug 1214584 has been marked as a duplicate of this bug. ***
Created attachment 1061456 [details] lastCheck degradation
May be related to bug 1247075. I suggest testing with https://gerrit.ovirt.org/45738 when the patch is ready.
The bug was verified on:
rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch
ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch
vdsm-python-4.17.19-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch
vdsm-jsonrpc-4.17.19-0.el7ev.noarch
vdsm-yajsonrpc-4.17.19-0.el7ev.noarch
vdsm-xmlrpc-4.17.19-0.el7ev.noarch
vdsm-cli-4.17.19-0.el7ev.noarch
vdsm-4.17.19-0.el7ev.noarch
vdsm-infra-4.17.19-0.el7ev.noarch

Since the reproducing flow is not clear, we decided to verify with the following scenario:
- RHEVM has 20 NFS Storage Domains and 100 VMs
- I ran an automatic load test of 100 concurrent threads which perform different operations on VMs, such as getinfo/stop/start/etc, using JMeter
- at the same time I manually performed actions such as deactivate/activate/detach/attach of several of the 20 SDs
- during the test I collected lastCheck statistics per SD and saved them to a file (see attached lastcheck_sd2.txt)

I don't see any slowdown in storage domain monitoring operations.

We can close the bug if the developers are satisfied.
Created attachment 1120440 [details] last check statistics - bug verification
Yuri, I am curious whether you ran the same tests before the fix and saw a significant difference, so that we can say that this is definitely the fix.
Unfortunately, I didn't run it before, only with the latest version. So you must decide, and take into consideration that we don't have the time/resources to reproduce it with an earlier version.
(In reply to Yuri Obshansky from comment #105) > The bug was verified on > rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch > ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch > rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch > vdsm-python-4.17.19-0.el7ev.noarch > vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch > vdsm-jsonrpc-4.17.19-0.el7ev.noarch > vdsm-yajsonrpc-4.17.19-0.el7ev.noarch > vdsm-xmlrpc-4.17.19-0.el7ev.noarch > vdsm-cli-4.17.19-0.el7ev.noarch > vdsm-4.17.19-0.el7ev.noarch > vdsm-infra-4.17.19-0.el7ev.noarch > > Since reproducing flow is not clear we decided to verify in following > scenario: > - RHEVM has 20 NFS Storage Domains and 100 VMs > - I ran automatic load test of 100 concurrent threads which perform > different operations on VMs such as getinfo/stop/start/etc using JMeter > - in the same time I manually performed actions such as > deactivate/activate/dettach/attach of several SDs from 20 > - during test I collected statistics data of lastCheck per each SD and saved > to file (see attached lastcheck_sd2.txt) > > I don't see any slow down in Storage domain monitoring operations. > > We can close bug if it is satisfied for developers. You have collected 167 stat info data and I don't think it's enough to verify the bug at all. Anyway you can check first few lines and last few lines. Looks like degradation is still here. At least there are values >5 secs (and even >10 secs) which is not normal.
(In reply to Pavel Zhukov from comment #109) > (In reply to Yuri Obshansky from comment #105) > > The bug was verified on > > rhevm-setup-plugin-ovirt-engine-common-3.6.3-0.1.el6.noarch > > ovirt-engine-extension-aaa-jdbc-1.0.5-1.el6ev.noarch > > rhevm-setup-plugin-ovirt-engine-3.6.3-0.1.el6.noarch > > vdsm-python-4.17.19-0.el7ev.noarch > > vdsm-hook-vmfex-dev-4.17.19-0.el7ev.noarch > > vdsm-jsonrpc-4.17.19-0.el7ev.noarch > > vdsm-yajsonrpc-4.17.19-0.el7ev.noarch > > vdsm-xmlrpc-4.17.19-0.el7ev.noarch > > vdsm-cli-4.17.19-0.el7ev.noarch > > vdsm-4.17.19-0.el7ev.noarch > > vdsm-infra-4.17.19-0.el7ev.noarch > > > > Since reproducing flow is not clear we decided to verify in following > > scenario: > > - RHEVM has 20 NFS Storage Domains and 100 VMs > > - I ran automatic load test of 100 concurrent threads which perform > > different operations on VMs such as getinfo/stop/start/etc using JMeter > > - in the same time I manually performed actions such as > > deactivate/activate/dettach/attach of several SDs from 20 > > - during test I collected statistics data of lastCheck per each SD and saved > > to file (see attached lastcheck_sd2.txt) > > > > I don't see any slow down in Storage domain monitoring operations. > > > > We can close bug if it is satisfied for developers. > > You have collected 167 stat info data and I don't think it's enough to > verify the bug at all. Anyway you can check first few lines and last few > lines. Looks like degradation is still here. At least there are values >5 > secs (and even >10 secs) which is not normal. Actually, that stat is second part of test. First part was only load and second load and operations with SD If you say it is not normal, so I'll reopen bug.
(In reply to Pavel Zhukov from comment #109)
> You have collected 167 stat info data and I don't think it's enough to
> verify the bug at all. Anyway you can check first few lines and last few
> lines. Looks like degradation is still here. At least there are values >5
> secs (and even >10 secs) which is not normal.

lastCheck values around 10 seconds are normal; there is no difference between 0-10 seconds (it just shows the difference between the time the engine did the query and the time the domain monitor thread woke up).

Values bigger than 20 seconds mean that the domain monitor is stuck on storage; this is not a bug in itself.

Yuri, can you explain what results you expected, and why you think that the behavior is incorrect?
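To illustrate what lastCheck does and does not measure, a tiny sketch (names and structure are assumptions based on the repoStats output quoted in this bug, not vdsm's actual code):

    import time

    class DomainMonitorStatus:
        def __init__(self):
            self.last_check_time = time.time()  # updated each time a check completes
            self.read_delay = 0.0               # the dd read time measured by the check

        def check_completed(self, read_delay):
            self.read_delay = read_delay
            self.last_check_time = time.time()

        def repo_stats_entry(self):
            # lastCheck is computed at query time: seconds since the monitor last
            # finished a check. It measures how recently the monitor ran, not how
            # responsive the storage is (that is what 'delay' reports).
            return {'delay': '%g' % self.read_delay,
                    'lastCheck': '%.1f' % (time.time() - self.last_check_time)}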
Yuri, the description of the bug says:

Description of problem:
The monitoring thread is triggered every 10 seconds by default. As soon as a storage operation is triggered (copy image for instance), the monitoring thread slows down.

Did you do any copy image operations? Can you show that the monitor threads are slowing down significantly while storage operations are running?
Nir, this is a weird bug. The description of the bug says:

Steps to Reproduce:
Not clear yet

So, I don't have any idea how to reproduce it. I did what I did. PMs and developers must decide whether to close it or not. If not, Roman or somebody else must provide a clear scenario and expected result for reproducing this weird bug. I have 20 scale bugs in the queue for reproducing, so I'm not able to invest more time in it. I hope you all understand the situation.
(In reply to Nir Soffer from comment #111) > (In reply to Pavel Zhukov from comment #109) > > You have collected 167 stat info data and I don't think it's enough to > > verify the bug at all. Anyway you can check first few lines and last few > > lines. Looks like degradation is still here. At least there are values >5 > > secs (and even >10 secs) which is not normal. > Hi Nir, Even in this small test we can see values 10.1 . I think if disks operation will be triggered as you suggested and test will take at least few hours we'll see bigger values. So as suggested It's better to test it under load and for some reasonable amount of time... 1) Attach 40 SDs 2) Create VMs (XX per domain?) 3) Create templates from let's say 2 VMs wait for 15 minutes, create templates etc LSM can be used instead of templates or in addition to 4) Check vdsm.log Does it make sense? > lastCheck values around 10 seconds are normal, there is no difference > between 0-10 > seconds (it just show the difference between the time engine did the query > and the > time the domain monitor thread wakes up. Agree but if domain monitor wakes up in 20 secs (for example) it means it slows down as bug description says. Please correct me if I'm mistaken. > > Values bigger then 20 seconds mean that domain monitor is stuck on storage, > this > is not a bug in itself. Not sure if I understood this. We have "delay" value there which is pretty stable usually but lastCheck values grows constantly. I can be mistaken here but it doesn't look as monitoring thread is stuck on storage. And the problem is this bug can overlap with general vdsm performance bug we have... > > Yuri, can you explain what are the result you expected, and why you think > that > the behavior is incorrect?
(In reply to Pavel Zhukov from comment #114) > (In reply to Nir Soffer from comment #111) > > (In reply to Pavel Zhukov from comment #109) > > > You have collected 167 stat info data and I don't think it's enough to > > > verify the bug at all. Anyway you can check first few lines and last few > > > lines. Looks like degradation is still here. At least there are values >5 > > > secs (and even >10 secs) which is not normal. > > > Hi Nir, > Even in this small test we can see values 10.1 . 10.1 does not mean anything; it is as good as 0.1. > I think if disks operation > will be triggered as you suggested and test will take at least few hours > we'll see bigger values. So as suggested It's better to test it under load > and for some reasonable amount of time... > 1) Attach 40 SDs > 2) Create VMs (XX per domain?) > 3) Create templates from let's say 2 VMs wait for 15 minutes, create > templates etc LSM can be used instead of templates or in addition to > 4) Check vdsm.log > > Does it make sense? Yes. I would do this: 1. baseline - 50 storage domains - run one vm per core - no storage operations 2. with storage operations - 50 storage domains - run one vm per core - perform LSM on each vm disk, keeping 10 concurrent operations (vdsm will not run more than 10 at the same time) Compare the logs of both runs. > > lastCheck values around 10 seconds are normal, there is no difference > > between 0-10 > > seconds (it just show the difference between the time engine did the query > > and the > > time the domain monitor thread wakes up. > Agree but if domain monitor wakes up in 20 secs (for example) it means it > slows down as bug description says. Please correct me if I'm mistaken. 20 seconds means that the domain monitor was blocked during monitoring for about 10 seconds. If during this time the storage is accessible, it means that the monitor thread was not blocked on storage but on some lock in vdsm. Due to the way vdsm is designed, we cannot guarantee the interval of the monitor threads; the thread sleeps 10 seconds, then performs operations that usually take less than a second. But every 5 minutes, the domain monitor performs a manual refresh of the storage domain, and this may involve a scsi/hba/multipath scan, which can block for 30 seconds, or even more (in abnormal conditions, we have seen multipath calls blocked forever). Refreshing storage and rescanning multipath are protected by locks, so only one thread can perform them. If multiple threads are waiting on these locks, only one of them will perform the operation, and the rest use the return value of the thread that got the lock. However, if the operation takes a lot of time, all the monitor threads will be blocked, causing high and increasing lastCheck values until the thread that took the lock finishes and all threads return. I think this part of vdsm requires a redesign, but this can be done only for 4.x. So what can we expect from the current code? - lastCheck values mostly around 10 seconds when storage is accessible - Occasional high values (e.g. 10-30) are possible due to the issue described above lastCheck values up to 30 seconds do not affect the normal operation of the system, since the engine waits about 5 minutes before deciding that a domain is in trouble. If the engine decides that a storage domain is in trouble while the domain is accessible, the monitoring mechanism has failed and should be improved. > > Values bigger then 20 seconds mean that domain monitor is stuck on storage, > > this > > is not a bug in itself. > Not sure if I understood this.
We have "delay" value there which is pretty > stable usually but lastCheck values grows constantly. Can you show an example of lastCheck values growing constantly in this test? This can be caused if one monitor thread is block forever calling multipath, and other threads are blocked waiting for this thread. We have seen this in rhel 7 (also in our lab), and it should be fixed in 7.2. Please point me to a bug or customer case with such issue. > I can be mistaken here > but it doesn't look as monitoring thread is stuck on storage. And the > problem is this bug can overlap with general vdsm performance bug we have... The performance issue when running many vms on many cores should be solved in 3.6 (and in 3.5.6 with manual configuration) using cpu affinity.
No info; I'm in the middle of other bug verifications. Reassigning to Roman.
So: rhevm-3.6.2.6-0.1.el6.noarch vdsm-4.17.18-0.el7ev.noarch I tried to add 50 iSCSI SDs with a script. After adding the 30th SD the host became unresponsive and the lastCheck value reached 70+ secs. That was the first attempt, and nested virtualization was used (I know it's unsupported). The second attempt used a more powerful bare-metal host: 12 cores / 32GB RAM, and exactly after adding the 30th SD the host started flipping between Active and NonResponsive with a "TimeOut exception". The host was accessible via ping/ssh. All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter (rdu). VMs/images were not created yet.
Created attachment 1121610 [details] vdsm logs
Created attachment 1121611 [details] engine logs
(In reply to Pavel Zhukov from comment #117) > So. > rhevm-3.6.2.6-0.1.el6.noarch > vdsm-4.17.18-0.el7ev.noarch > > I tried to add 50 iscsi SDs with a script. After adding 30th SD host became > unresponsive and lastCheck value reached 70+ secs. It was first attempt and > nested virtualization was used (I know it's unsupported). > Second attempt was using more powerful bare metal host: 12 cores / 32GB RAM > and the exactly after adding 30th SD host started flipped between active and > NonResponsive with "TimeOut exception". Host was accessible via ping/ssh > All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter > (rdu). VMs/Images were not created yet These are interesting results, but I'm not sure they are related to this bug, since adding 50 domains using a script is not a common use case, and this is not what this bug is about. Please open a new bug for this issue, and we will continue there.
(In reply to Nir Soffer from comment #124) > (In reply to Pavel Zhukov from comment #117) > > So. > > rhevm-3.6.2.6-0.1.el6.noarch > > vdsm-4.17.18-0.el7ev.noarch > > > > I tried to add 50 iscsi SDs with a script. After adding 30th SD host became > > unresponsive and lastCheck value reached 70+ secs. It was first attempt and > > nested virtualization was used (I know it's unsupported). > > Second attempt was using more powerful bare metal host: 12 cores / 32GB RAM > > and the exactly after adding 30th SD host started flipped between active and > > NonResponsive with "TimeOut exception". Host was accessible via ping/ssh > > All three hosts (rhevm, rhelh and LIO) are in the same physical datacenter > > (rdu). VMs/Images were not created yet > > This is interesting results, but I'm not sure it is related to this bug, > since > adding 50 domains using a script is not a common use case, and this is not > what > this bug is about. > > Please open a new bug for this issue, and we will continue there. I will investigate it but for me it looks like the same issue (storage operation slows vdsm thread down). Anyway bug was reproduced many times with blockSD but QA try to "verify" it using NFS.
Installation as per #117 (physical host). 1) DC with 28 SDs was stable for the last two days. 2) Created 4 disks of 1GB each (no VMs, so VM monitoring is not involved here either). No scripts were involved. lastCheck grew a bit during the creation (up to 40 secs). See logs. 3) Copied 4 disks to another SD. 4) After some time the lastCheck values started growing and reached more than 200 secs (it seems a few SDs (more than the 2 affected by the copying) are not updating at all). 5) Host marked as non-responsive.
Created attachment 1122471 [details] vdsm log
Created attachment 1122472 [details] engine log
(In reply to Pavel Zhukov from comment #126) > Installation as per #117 (physical host). > > 1) DC with 28 SDs was stable last two days. > 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as > well). No script were involved. lastCheck grew a bit during the creation (up > to 40 secs). See logs > 3) Copy 4 disks to another SD > 4) After some time lastCheck value start growing and reached values more > than 200 secs (seems like few SDs (more than 2 affected by coping) are not > updating at all) > 5) Host marked as non-responsive. That looks like a good reproduction, thanks! Can you give more details: - engine and vdsm versions - OS version - storage domain setup - maybe you can dump the relevant parts from the engine db so we can reproduce your setup here?
I *think* the problem is in LVM locking mechanism. vdsm locks __all__ lvm calls at once and holds it. So in my scenario I can see vgs command takes ~5-10 secs to execute. Because of global lock 28*10 = 280 secs (worst case) + <time to acquire/release lock>. Sequential execution is not good in any case. I *think* vgs calls can be groupped somehow. Vgs allows to specify several VGs at once [1]. So at the moment of first vgs releases the lock second command can be called with 27 VGs as agrument and it'll save 270 secs of doing nothing. Anyway I don't know the reason of so aggressive "global" lock as far as parallel calls of vgs with different vg names should not be a problem. It's read-only operation, nothing more. Nir, Does it make sense? -- Pavel [1] [root@]# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o 
uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114 real 0m12.203s user 0m0.091s sys 0m0.381s [root@ ~]# time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'
\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 afcc0f16-817f-42ea-b6ea-e8c2db00db6b WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! tOdMfQ-su1f-M2Jn-VJ1i-QL1m-QneI-mRtWio|afcc0f16-817f-42ea-b6ea-e8c2db00db6b|wz--n-|15703474176|6979321856|134217728|117|52|MDT_CLASS=Data,MDT_DESCRIPTION=lunx2,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:36001405cd364bdaf252491b8bd1edbb8&44&uuid:EEoLFE-a2pP-K4xp-PrvE-9e3q-lewh-TrFbuM&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=afcc0f16-817f-42ea-b6ea-e8c2db00db6b,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tOdMfQ-su1f-M2Jn-VJ1i-QL1m-QneI-mRtWio,MDT__SHA_CKSUM=63dad6e7206036bae63eb3476489c59d3a597438,RHAT_storage_domain|134217728|67102208|12|1|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8 tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114 real 0m9.989s user 0m0.068s sys 0m0.166s
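To make the grouping idea from comment #130 concrete, here is a hypothetical sketch (not vdsm code; the helper and its arguments are made up for illustration): all VG names that queued up while the lock was held are passed to a single vgs invocation, since vgs accepts several VG names at once. The two timed runs above already hint at the benefit: querying two VGs in one call took no longer than querying one, so the per-invocation scan cost dominates.

import subprocess

def reload_vgs(vg_names, lvm_config):
    # One call for the whole batch: the scan overhead (around 10 seconds in
    # the timings above) is paid once instead of once per storage domain.
    cmd = ["/usr/bin/sudo", "-n", "/usr/sbin/lvm", "vgs",
           "--config", lvm_config,
           "--noheadings", "--units", "b", "--nosuffix",
           "--separator", "|", "--ignoreskippedcluster",
           "-o", "uuid,name,attr,size,free,tags"]
    cmd.extend(vg_names)
    return subprocess.check_output(cmd).splitlines()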
(In reply to Nir Soffer from comment #129) > Can give more details: > - engine, vdsm versions It's in comment #117 rhevm-3.6.2.6-0.1.el6.noarch vdsm-4.17.18-0.el7ev.noarch > - os version RHEL6 engine with latest updates RHEL7 hypervisor with latest updates > - storage domain setup - maybe you can dump relevant parts from engine db so > we > can reproduce your setup here? I'm attaching db dump. Nothing special. LIO target with 50 LUNs. Underlying backstores are partitions 15GB each. I don't have many options here because of lack of hardware. Set as: for index in {1..50}; do targetcli "/iscsi/iqn.2003-01.org.linux-iscsi.dell-per420-1.x8664:sn.d25a24ca3cf2/tpg1/luns create /backstores/block/lunx$index"; done
Created attachment 1122476 [details] engine db dump
(In reply to Pavel Zhukov from comment #130) > I *think* the problem is in LVM locking mechanism. vdsm locks __all__ lvm > calls at once and holds it. There is no such global lock. We have an operation mutex that lets many concurrent calls of one type run, blocking other call types. This was added because of some issue in lvm with mixing several types of calls. I guess we need to revisit this lock again; maybe it is not needed with the current lvm version. > So in my scenario I can see vgs command takes > ~5-10 secs to execute. Because of global lock 28*10 = 280 secs (worst case) > + <time to acquire/release lock>. Sequential execution is not good in any > case. > > I *think* vgs calls can be groupped somehow. Vgs allows to specify several > VGs at once [1]. So at the moment of first vgs releases the lock second > command can be called with 27 VGs as agrument and it'll save 270 secs of > doing nothing. > > Anyway I don't know the reason of so aggressive "global" lock as far as > parallel calls of vgs with different vg names should not be a problem. It's > read-only operation, nothing more. > > Nir, Does it make sense? Many threads waiting on the lvm lock can explain this, especially if accessing storage becomes slow because of the other storage operations. I did not look at the logs yet. Thanks for this research, I think we have a good direction here.
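For readers who are not familiar with this code, a rough sketch of what such an operation mutex looks like (an illustration under assumptions, not the vdsm implementation): any number of operations of the same type may run concurrently, while an operation of a different type has to wait until the active count drops to zero.

import threading

class OperationMutex(object):
    def __init__(self):
        self._cond = threading.Condition()
        self._active_type = None
        self._active = 0

    def acquire(self, op_type):
        with self._cond:
            # Wait while a *different* operation type is running.
            while self._active_type not in (None, op_type):
                self._cond.wait()
            self._active_type = op_type
            self._active += 1

    def release(self):
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._active_type = None
                self._cond.notify_all()

With this shape, a steady stream of overlapping 'reload' holders keeps the active type set to 'reload', so waiting 'invalidate' callers can be postponed for a long time; that is the starvation pattern analyzed in the log excerpts below.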
(In reply to Nir Soffer from comment #129) > (In reply to Pavel Zhukov from comment #126) > > Installation as per #117 (physical host). > > > > 1) DC with 28 SDs was stable last two days. > > 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as > > well). No script were involved. lastCheck grew a bit during the creation (up > > to 40 secs). See logs > > 3) Copy 4 disks to another SD > > 4) After some time lastCheck value start growing and reached values more > > than 200 secs (seems like few SDs (more than 2 affected by coping) are not > > updating at all) > > 5) Host marked as non-responsive. > > Thats looks like good reproduction, thanks! > > Can give more details: > - engine, vdsm versions > - os version > - storage domain setup - maybe you can dump relevant parts from engine db so > we > can reproduce your setup here? Pavel, I tried to reproduce your case in our lab, but unfortunately it does not reproduce. Our lab distribution: rhevm-3.6.3 vdsm-4.17.19-0.el7ev.noarch + enabling CPU affinity. Storage: nfs on top of netapp gear. Network: 10gig. Bare-metal hosts (engine and vdsm) with 24 cores 64GB RAM. vdsm: running 50 SDs and 24 VMs. 4 new disks were created and copied nice & fast.
(In reply to Eldad Marciano from comment #134) > (In reply to Nir Soffer from comment #129) > > (In reply to Pavel Zhukov from comment #126) > > > Installation as per #117 (physical host). > > > > > > 1) DC with 28 SDs was stable last two days. > > > 2) Create 4 disks 1GB each (no VMs so VM monitoring is not involved here as > > > well). No script were involved. lastCheck grew a bit during the creation (up > > > to 40 secs). See logs > > > 3) Copy 4 disks to another SD > > > 4) After some time lastCheck value start growing and reached values more > > > than 200 secs (seems like few SDs (more than 2 affected by coping) are not > > > updating at all) > > > 5) Host marked as non-responsive. > > > > Thats looks like good reproduction, thanks! > > > > Can give more details: > > - engine, vdsm versions > > - os version > > - storage domain setup - maybe you can dump relevant parts from engine db so > > we > > can reproduce your setup here? > > Pavel, > I try to reproduce your case in our lab, but unfortunately it is not > reproduce. > > our lab distribution: > rhevm-3.6.3 > vdsm-4.17.19-0.el7ev.noarch + enabling CPU affinity. > > storage: > nfs on top of netapp gear Eldad, Please do NOT use nfs storage to reproduce the issue because it's (most probably) in LVM locking. Use iscsi or FC (See comments #122, #130, #131) > > network: > 10gig > > bare-metal hosts (engine and vdsm) with 24 cores 64GB RAM. > > vdsm: > running 50SD's 24VM's. > > 4 new disks were created and copied nice & fast.
I tried to disable locking completely by commenting out the acquire call, and it made the environment completely unstable (vgs took 50-60 secs) even when no storage operation was performed. So running 20+ vgs calls in parallel is not a good idea for sure.
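One possible middle ground between the current serialization and the fully parallel run tried in comment #136 would be to cap the concurrency instead of removing the lock; a hedged sketch only (the limit of 4 is an arbitrary illustration, not a tested value, and run_vgs is a stand-in for the real reload call):

import threading

LVM_SLOTS = threading.BoundedSemaphore(4)   # arbitrary cap, illustration only

def reload_vg(run_vgs, vg_name):
    # At most 4 reloads run at once: less waiting than a single big lock,
    # without the 20+ parallel scans that slowed vgs down to 50-60 secs.
    with LVM_SLOTS:
        return run_vgs(vg_name)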
-- SRM update The customer affected by this bug is willing to test snapshot 3 of the beta and report results. Do you think it would help to work on this bz, or should they wait? If you agree it would be helpful if they tested: 1. What test scenario should we agree on to be tested so it is compatible with our tests? 2. What type of output / results format should they provide? 3. Should they attach it directly to this BZ? kind regards Ania
(In reply to ania from comment #137) > -- srm upate > > The customer affected by this bug is willing to test the snapshot 3 of beta > and report results. > > Do you think it would help to work on this bz or should they wait? If you > agree it would be helpful if they tested Looks like the issue with a lot of block storage domains is not fixed yet, so the only reason to test is to check whether they are affected by this scale issue. > 1. What test scenario should we agree on to be tested so it is compatible > with out tests > 2. what type of output / results format should they provide? > 3. should they attach it directly to this BZ? They should test their normal setup and flows, and provide engine and vdsm logs. If their setup includes file-based storage domains, or only a few block storage domains, they can test that the system is stable when performing many storage operations (e.g. copying disks from one domain to another, creating vms by cloning templates). Otherwise, if their setup includes many block storage domains, they should wait until we improve the situation. Note that this is not a new issue (it should be the same in 3.3, 3.4, 3.5), so it should not delay 3.6 upgrade plans.
(In reply to Pavel Zhukov from comment #136) > I tried to disable locking completely by commenting out acquire call and it > made the env. completely unstable (vgs took 50-60 secs) even if not storage > operation performed. So running 20+ vgs calls in parallel is not good idea > for sure. Pavel, the operation mutex does not prevent parallel vgs/lvs/pvs commands, it only prevents running them in the same time the lvm cache is invalidated, or the other way around, invaliding the cache while any of these command is running. Invalidate ops are very fast, these only clear lvs/vgs/pvs cache, so they never delay reload operations. Reload operation are slow, since they must go to storage. If we perform lot of io in the same time, they become much slower. So the result of this locking mechanism, is starvation of invalidate cache operations, waiting on reload operations. Given enough reload operations, invalidate operations take lot of time to get the lock. It seems that the common flow is: 1. perform invalidate operation 2. perform reload operation In the log we can see many threads waiting for invalidate operation, then the first thread gets in, finish the invalidate operation, and taking the lock again for reload operation, before the other invalidate operation gets in. This effectively perform as a big lock, letting only one thread run at a time, instead running multiple operation concurrently. So the issues are: 1. Bad locking limiting concurrency and causing unneeded delays 2. Need to check why we need so many invalidate and refresh operations, and if we can separate these operations from the monitoring threads. Here is an example of such flow: 1. Task thread try to do lvm reload, taking the operation mutex d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:38:48,140::lvm::427::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' got the operation mutex 2. Task starts lvs command... d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:38:48,140::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags afcc0f16-817f-42ea-b6ea-e8c2db00db6b (cwd None) 3. While this command is running, other threads are waiting for the lock Thread-377::DEBUG::2016-02-09 12:38:54,428::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-270::DEBUG::2016-02-09 12:38:54,609::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-254::DEBUG::2016-02-09 12:38:56,831::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-105::DEBUG::2016-02-09 12:38:56,998::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
Thread-161::DEBUG::2016-02-09 12:39:00,019::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-692::DEBUG::2016-02-09 12:39:01,075::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 4. Task finish lvs command and release operation mutex d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:01,145::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0 d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:01,146::lvm::462::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' released the operation mutex 5. Thread-377 get the operation mutex for invalidate Thread-377::DEBUG::2016-02-09 12:39:01,160::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-377::DEBUG::2016-02-09 12:39:01,161::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-377::DEBUG::2016-02-09 12:39:01,161::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-377::DEBUG::2016-02-09 12:39:01,162::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex 6. But before other invalidation thread get in the mutex, it takes the mutex for reload operation Thread-377::DEBUG::2016-02-09 12:39:01,162::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex 7. And run vgs command Thread-377::DEBUG::2016-02-09 12:39:01,163::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ 
'\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name e7c912df-a24e-447e-8d3a-8d1f484004df (cwd None) 8. Other invalidation threads are waiting again... Thread-270::DEBUG::2016-02-09 12:39:01,176::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-254::DEBUG::2016-02-09 12:39:01,191::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-105::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-161::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
Thread-692::DEBUG::2016-02-09 12:39:01,192::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-563::DEBUG::2016-02-09 12:39:01,775::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1122::DEBUG::2016-02-09 12:39:01,925::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1225::DEBUG::2016-02-09 12:39:06,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-240::DEBUG::2016-02-09 12:39:11,163::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-845::DEBUG::2016-02-09 12:39:11,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 9. Thread-377 finish reload operation and release the mutex Thread-377::DEBUG::2016-02-09 12:39:12,059::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0 Thread-377::DEBUG::2016-02-09 12:39:12,059::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex 10. And now it take the operation mutex for reload Thread-270::DEBUG::2016-02-09 12:39:12,076::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-270::DEBUG::2016-02-09 12:39:12,077::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex Thread-270::DEBUG::2016-02-09 12:39:12,077::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-270::DEBUG::2016-02-09 12:39:12,078::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex Thread-270::DEBUG::2016-02-09 12:39:12,078::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex 11. Other threads still waiting Thread-254::DEBUG::2016-02-09 12:39:12,092::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-105::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-161::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-692::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-563::DEBUG::2016-02-09 12:39:12,093::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1122::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1225::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
Thread-240::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-845::DEBUG::2016-02-09 12:39:12,094::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 12. New task thread join and wait for reload d742dc20-58e2-4dc8-b3fd-0ed955893b00::DEBUG::2016-02-09 12:39:12,668::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 13. More waiting threads Thread-223::DEBUG::2016-02-09 12:39:14,679::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-726::DEBUG::2016-02-09 12:39:16,732::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-359::DEBUG::2016-02-09 12:39:21,238::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-488::DEBUG::2016-02-09 12:39:23,232::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-322::DEBUG::2016-02-09 12:39:25,733::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-339::DEBUG::2016-02-09 12:39:28,718::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-419::DEBUG::2016-02-09 12:39:31,299::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 14. Thread-270 finish the reload and release the operation mutex Thread-270::DEBUG::2016-02-09 12:39:31,375::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enablin g it!\n'; <rc> = 0 Thread-270::DEBUG::2016-02-09 12:39:31,375::lvm::415::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex And so on...
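Since the interesting number here is how long each thread actually waits for the operation mutex, a small helper like this can summarize the waits per thread from vdsm.log (hedged: the regexes are assumptions based on the DEBUG lines quoted above, not on a documented log format):

import re
from collections import defaultdict
from datetime import datetime

WAITING = re.compile(r"^(Thread-\d+)::DEBUG::(\S+ \d\d:\d\d:\d\d),\d+::lvm::"
                     r"\d+::Storage\.OperationMutex::.*waiting\.\.\.")
GOT = re.compile(r"^(Thread-\d+)::DEBUG::(\S+ \d\d:\d\d:\d\d),\d+::lvm::"
                 r"\d+::Storage\.OperationMutex::.*got the operation mutex")

def mutex_waits(log_lines):
    started = {}               # thread -> time of its first "waiting..." line
    waits = defaultdict(list)  # thread -> wait durations in seconds
    fmt = "%Y-%m-%d %H:%M:%S"
    for line in log_lines:
        m = WAITING.match(line)
        if m:
            started.setdefault(m.group(1), m.group(2))
            continue
        m = GOT.match(line)
        if m and m.group(1) in started:
            begin = datetime.strptime(started.pop(m.group(1)), fmt)
            end = datetime.strptime(m.group(2), fmt)
            waits[m.group(1)].append((end - begin).total_seconds())
    return waits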
Following monitor thread for domain 338cd3ef-732b-4408-8882-028b7c47c791 since it had high average last check values: domain: 338cd3ef-732b-4408-8882-028b7c47c791 delay avg: 0.003690 min: 0.000478 max: 0.104589 last check avg: 14.743452 min: 0.000000 max: 202.500000 1. Starting domain refresh (done every 5 minutes) Thread-1172::DEBUG::2016-02-09 12:40:06,265::monitor::314::Storage.Monitor::(_refreshDomain) Refreshing domain 338cd3ef-732b-4408-8882-028b7c47c791 2. Recreating domain object, reading domain metadata from storage Thread-1172::DEBUG::2016-02-09 12:40:06,266::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with LvMetadataRW backend Thread-1172::DEBUG::2016-02-09 12:40:06,266::blockSD::337::Storage.Misc.excCmd::(readlines) /usr/bin/taskset --cpu-list 0-23 /usr/bin/dd iflag=direct skip=0 bs=2048 if=/dev/ 338cd3ef-732b-4408-8882-028b7c47c791/metadata count=1 (cwd None) Thread-1172::DEBUG::2016-02-09 12:40:06,308::blockSD::337::Storage.Misc.excCmd::(readlines) SUCCESS: <err> = '1+0 records in\n1+0 records out\n2048 bytes (2.0 kB) copied, 0. 0270738 s, 75.6 kB/s\n'; <rc> = 0 Thread-1172::DEBUG::2016-02-09 12:40:06,308::misc::261::Storage.Misc::(validateDDBytes) err: ['1+0 records in', '1+0 records out', '2048 bytes (2.0 kB) copied, 0.0270738 s, 75.6 kB/s'], size: 2048 Thread-1172::DEBUG::2016-02-09 12:40:06,308::persistentDict::234::Storage.PersistentDict::(refresh) read lines (LvMetadataRW)=[] Thread-1172::DEBUG::2016-02-09 12:40:06,309::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata Thread-1172::DEBUG::2016-02-09 12:40:06,309::persistentDict::192::Storage.PersistentDict::(__init__) Created a persistent dict with VGTagMetadataRW backend 3. Waiting to invalidate vg (done before reading vg metadata) Thread-1172::DEBUG::2016-02-09 12:40:06,309::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:12,514::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:20,636::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:29,500::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:37,923::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:45,764::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:40:53,869::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:02,184::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:11,778::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:20,863::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
Thread-1172::DEBUG::2016-02-09 12:41:30,624::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:37,700::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:44,187::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:50,419::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:41:56,994::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:03,425::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:09,877::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:16,233::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:22,733::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:29,298::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:35,757::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:42,313::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:48,643::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:42:55,124::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1172::DEBUG::2016-02-09 12:43:01,609::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 4. Got the lock after 3 minutes, invalidating the vg Thread-1172::DEBUG::2016-02-09 12:43:07,896::lvm::503::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex Thread-1172::DEBUG::2016-02-09 12:43:07,896::lvm::505::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex 5. Invalidating vg also invalidates the vg lvs Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::513::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::525::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex 6. 
Finally reloading vg Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None) 7. vg reload finished in 7 seconds Thread-1172::DEBUG::2016-02-09 12:43:14,318::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0 8. 
Performing vgck (part of refresh domain flow) Thread-1172::DEBUG::2016-02-09 12:43:14,321::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405716af920e2394b14a09d4c7be|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None) 9. vgck finished in 7 seconds Thread-1172::DEBUG::2016-02-09 12:43:20,923::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
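Reading the timestamps of this refresh flow end to end: the refresh started at 12:40:06, the thread only got the operation mutex at 12:43:07, the vgs reload finished at 12:43:14 and vgck at 12:43:20. That is roughly 194 seconds in total, of which about 181 seconds were spent waiting for the mutex and only about 13 seconds doing actual LVM work, which is consistent with the 202.5 second maximum lastCheck reported for this domain above.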
Secondary issue, that may cause reload vg commands to become much slower is not using the short filter. This command: Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::370::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex Thread-1172::DEBUG::2016-02-09 12:43:07,897::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 338cd3ef-732b-4408-8882-028b7c47c791 (cwd None) Will look for metadata in all the devices, instead of the devices belonging to this vg. We don't have a choice when reading the vg info when vdsm starts, but once got the vg details, we should continue to use only the vg pvs in the filter. 
The vg filter should be invalidated only during the extend-vg flow, where the engine notifies hosts about the event. Pavel, can you check the difference on your environment when using only the devices which are part of this vg?
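A rough sketch of that idea (illustrative names only, not vdsm's API): once the VG's PV list is known, build the filter from those devices, and fall back to the full device list only when the VG is unknown or its PVs may have changed (e.g. after an extend).

def lvm_filter(devices):
    # Same shape as the filter strings in the commands above: accept the
    # listed multipath devices, reject everything else.
    accept = "|".join("/dev/mapper/%s" % d for d in devices)
    return '[ "a|%s|", "r|.*|" ]' % accept

def filter_for_vg(vg_name, vg_pvs, all_devices):
    # vg_pvs: mapping of VG name -> device names of its PVs, assumed to be
    # kept up to date when the VG is created, refreshed or extended.
    devices = vg_pvs.get(vg_name)
    return lvm_filter(devices if devices else all_devices)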
It's good idea because the difference is more than huge: # time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! 
tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114 real 0m10.811s user 0m0.085s sys 0m0.146s # time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014050775ed117965422bab1d61114'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! Invalid filter pattern "a|/dev/mapper/360014050775ed117965422bab1d61114". real 0m0.014s user 0m0.005s sys 0m0.009s
Sorry. Please ignore last result Not so good: # time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36001405006e2dd151644bffa8bacc104|/dev/mapper/360014050775ed117965422bab1d61114|/dev/mapper/3600140507d8651ca296474c8e3dd0ae8|/dev/mapper/360014050ea6c2fcc8514f7393344c753|/dev/mapper/3600140514c9cd7f29f34278866d52e49|/dev/mapper/360014051901d284eb5b4f4580eb17508|/dev/mapper/360014051bae82f9df404a61a741441d9|/dev/mapper/360014051cdfb8827e1f4392a0028fd7d|/dev/mapper/3600140526f27657071741a397364ac4b|/dev/mapper/360014052a8a0cd586aa46c2bb971da75|/dev/mapper/360014052ad4a1f5da364feebaaaad6be|/dev/mapper/360014052d172039894d4fe79a6ffd320|/dev/mapper/360014052e74c3621e7c4d5ab8eeb7ecd|/dev/mapper/360014052f89aa81903b45c5bce3e5804|/dev/mapper/360014053858afd998fb48d58f1219a6c|/dev/mapper/360014053a3f6dc56e2e4e6d844b89a48|/dev/mapper/360014053a8a8499f50b41bebd69e2d66|/dev/mapper/360014053dfd3c6559414d1e9bb0b6a30|/dev/mapper/360014055c9314a32f3b49048e73fb345|/dev/mapper/3600140566dbebbc21d848c9a9523c0c5|/dev/mapper/360014056ab6f96b18bf494d956b3f1c7|/dev/mapper/360014056c5383d52447405caba32ea62|/dev/mapper/360014056c56d162fcb549c5b081b06bb|/dev/mapper/360014056e117ce4930441f08b19f932d|/dev/mapper/36001405710c8172101f49008506a0577|/dev/mapper/36001405716af920e2394b14a09d4c7be|/dev/mapper/3600140574cbe0e46bc64d39a17e24f47|/dev/mapper/36001405763d71879829411e963e16b4a|/dev/mapper/3600140578571d02f3494052985d05704|/dev/mapper/360014057c7a43aaacdb4139a5ff8c43d|/dev/mapper/360014058454caec4f274f66ab8279de8|/dev/mapper/360014059fbafb60b4ed434991733eebe|/dev/mapper/36001405a4d31e809dee47788afd29fc0|/dev/mapper/36001405ac1e0edc652c48938180dbc2b|/dev/mapper/36001405ae85ec2a3c204a55b0ce85d87|/dev/mapper/36001405b15a6d7ab9b147d2a95881875|/dev/mapper/36001405b3d1432066bf4e5c8f870ffe7|/dev/mapper/36001405b69fbf8643774e0f8b2e64c92|/dev/mapper/36001405ba299a3fb8524452890938e29|/dev/mapper/36001405c3c6d2863f0f46eda143bc4e4|/dev/mapper/36001405c40f4cc99bfd4676978ff4959|/dev/mapper/36001405c82c2d887e25435da68806a4a|/dev/mapper/36001405cd364bdaf252491b8bd1edbb8|/dev/mapper/36001405d67805a703b04d9bb9cdcbce5|/dev/mapper/36001405dc369c7383a64b49b06612fa0|/dev/mapper/36001405e0222d1042294e2d88c439100|/dev/mapper/36001405e554db1a20a84fb1848a0f6fe|/dev/mapper/36001405e79b7c05532940709041de1d6|/dev/mapper/36001405f5f9cc51826743038f33183f4|/dev/mapper/36001405ffa7d8bf907e4a2092dc8a2e2|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! 
tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114 real 0m9.370s user 0m0.084s sys 0m0.171s # time /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360014050775ed117965422bab1d61114|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762 WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it! tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3|bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762|wz--n-|15703474176|11274289152|134217728|117|84|MDT_CLASS=Data,MDT_DESCRIPTION=lunx24,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=00000001-0001-0001-0001-0000000000ea,MDT_PV0=pv:360014050775ed117965422bab1d61114&44&uuid:R7HTan-lMBc-cOEI-ZiAc-nE2r-1xoW-hoBZBk&44&pestart:0&44&pecount:117&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=bbeb6e56-9ad7-4bfb-b2f5-5e8d7ef93762,MDT_TYPE=ISCSI,MDT_VERSION=3,MDT_VGUUID=tQ9tUw-a2qW-DgL3-J8s5-ojVg-6OQk-19OzQ3,MDT__SHA_CKSUM=4e2014469f3a1c011be82df22040952130de622e,RHAT_storage_domain|134217728|67103744|8|1|/dev/mapper/360014050775ed117965422bab1d61114 real 0m6.255s user 0m0.065s sys 0m0.161s
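To make the comparison above easier to repeat, here is a rough timing sketch. It mirrors the `lvm vgs --config` invocations shown in the transcripts, but the `--config` string is simplified relative to the full one vdsm generates, and the output fields, device list and VG name in the usage comment are placeholders, not values taken from this bug:

```python
import subprocess
import time


def time_vgs(devices, vg_name):
    """Time 'lvm vgs' restricted by an LVM filter to the given
    /dev/mapper devices. Illustrative sketch only; the --config string
    is a simplified version of the one shown in the commands above."""
    accept = "a|" + "|".join("/dev/mapper/%s" % d for d in devices) + "|"
    config = (
        'devices { preferred_names = ["^/dev/mapper/"] '
        'filter = [ "%s", "r|.*|" ] } '
        'global { use_lvmetad=0 }'
    ) % accept
    cmd = ["sudo", "-n", "/usr/sbin/lvm", "vgs", "--config", config,
           "--noheadings", "--units", "b", "--nosuffix",
           "-o", "uuid,name,size,free", vg_name]
    start = time.time()
    subprocess.check_output(cmd)
    return time.time() - start


# Hypothetical usage: all_devices holds every multipath WWID visible on the
# host, single_device only the WWID of the PV backing the queried VG.
# print(time_vgs(all_devices, "bbeb6e56-..."))
# print(time_vgs(single_device, "bbeb6e56-..."))
```

As the measurements above show, narrowing the filter helps, but the difference (roughly 9.4s vs 6.3s here) is not dramatic on its own.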
Pavel, would you like to test this patch? http://gerrit.ovirt.org/53395 It cannot be applied to 3.6, so you have to build vdsm from master.
Nir, I'm sorry but I can't do it right now. The reservation time of my machines in Beaker expired. I will test it later.
(In reply to Nir Soffer from comment #146) > Pavel, would you like to test this patch? > http://gerrit.ovirt.org/53395 > > It cannot be applied to 3.6, so you have to build vdsm from master. Nir, I've tested the patch. Everything works fine. Added 50SDs with script. Copying raw disks between storage domains. Host is not flipped to Unresponsive.No error messages in engine log. lastCheck grew up to 50 secs during the most critical period (adding SD AND copying of the images) but host was OK. Do you need logs? Thank you!
We will probably need full automation testing on the patch sometime before GA.
I did more tests. Once host was restarted (SPM) the locking hell came back. I think it's because all vgs should be refreshed during "bootstrap". The lastCheck value doesn't grow constantly as before but it can reach 110+ secs and host is marked as NonReponsive again. After some time host is back and contending starts but it happens again in the loop: jsonrpc.Executor/6::DEBUG::2016-02-15 08:28:11,003::task::1192::Storage.TaskManager.Task::(prepare) Task=`320babd8-5707-4486-9d5d-313895c4346f`::finished: {u'21e029a3-0b54-4f8b-8ad0-b701895671a2': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0092922', 'lastCheck': '0.0', 'valid': True}, u'0f8241ba-76d7-402b-85c3-8e4eff0d443a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.034504', 'lastCheck': '189.4', 'valid': True}, u'6d78d86f-53c1-4b10-a5f0-ad62805b1363': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000952666', 'lastCheck': '194.2', 'valid': True}, u'aaf40b64-9dd3-414a-9911-f67bf33b8d6a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0153762', 'lastCheck': '189.5', 'valid': True}, u'd80a8920-deac-4b05-9301-6d61bbb53575': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00751565', 'lastCheck': '163.8', 'valid': True}, u'6944d637-f2d0-43ba-8a72-289bec00cc05': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0104843', 'lastCheck': '8.4', 'valid': True}, u'f69d5159-ee0e-4f08-af44-f7a9b34830b7': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0119308', 'lastCheck': '164.4', 'valid': True}, u'399899db-daad-4672-a4cf-fe5160ce3330': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00639321', 'lastCheck': '0.8', 'valid': True}, u'f9b91801-0aa5-437c-a5d4-418ada236e51': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0071744', 'lastCheck': '2.3', 'valid': True}, u'74d8f43a-2d88-4493-8d0c-a218c63fe46a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00921745', 'lastCheck': '8.0', 'valid': True}, u'552beec7-39cf-4ad0-b86f-9b2bc6985971': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000655045', 'lastCheck': '186.4', 'valid': True}, u'52895b61-44d8-4115-9113-240bbce232d3': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00884128', 'lastCheck': '9.2', 'valid': True}, u'34a397b1-33f4-4048-b663-471edd729d93': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0066063', 'lastCheck': '11.7', 'valid': True}, u'cb95b818-a8c3-41e4-8479-c5d7430e251f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00130233', 'lastCheck': '192.1', 'valid': True}, u'8f468a9f-ce70-408d-8043-07c29aea55ec': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00738504', 'lastCheck': '5.8', 'valid': True}, u'0a8eeeb5-0b6f-42a6-9322-c3ba95e0276c': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0146081', 'lastCheck': '192.7', 'valid': True}, u'38ef7afc-9d9e-4abe-93c2-dc98f86a31f8': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.012955', 'lastCheck': '187.7', 'valid': True}, u'2112c800-e008-4d7b-9ead-ad35391632f8': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00912894', 'lastCheck': '187.9', 'valid': True}, u'c0c0d03c-a8c0-43d9-b2ad-7b6eb3a78163': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000700416', 'lastCheck': '188.6', 
'valid': True}, u'05458af5-a8d3-4976-913b-baaede6e4dac': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0140394', 'lastCheck': '163.8', 'valid': True}, u'1a681425-4472-41b4-a87d-066c31fca4a9': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0107165', 'lastCheck': '9.8', 'valid': True}, u'f47faf12-8ac5-4f30-8f7d-d9d7658c54c2': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00296497', 'lastCheck': '164.1', 'valid': True}, u'1bd88c11-b163-4c78-b8d4-045e38e7bb66': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00836018', 'lastCheck': '4.2', 'valid': True}, u'9c7ab86d-5928-4c45-babe-7e3d24ceed7d': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00122086', 'lastCheck': '164.5', 'valid': True}, u'32710ccd-22d3-4585-809c-56bcecd83aa6': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000782966', 'lastCheck': '190.3', 'valid': True}, u'af17670b-1b1f-4174-8fd1-0dbcd8a0779f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000632146', 'lastCheck': '164.4', 'valid': True}, u'6d1d310e-ef0e-4791-a1ce-e20e6811be0a': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00889699', 'lastCheck': '190.8', 'valid': True}, u'e3a34534-f66a-46f4-80b3-3e225b553c39': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00923788', 'lastCheck': '186.0', 'valid': True}, u'de55fd4d-ae83-4749-9415-ac004a66e652': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0108847', 'lastCheck': '190.4', 'valid': True}, u'ef5378b5-2296-411a-a5eb-5b14123a1c29': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0110197', 'lastCheck': '169.7', 'valid': True}, u'790da84a-ea36-4d63-82ea-e5fd18f25a86': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00920705', 'lastCheck': '3.6', 'valid': True}, u'9e1f121c-20d6-4982-837c-70ca226cf21f': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.0527725', 'lastCheck': '164.2', 'valid': True}, u'05b1eaf7-e7d9-4985-90bb-fdbb21f8fdd3': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.00540501', 'lastCheck': '7.1', 'valid': True}, u'7608c0aa-383d-4f68-8c94-82f002749007': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000696937', 'lastCheck': '7.3', 'valid': True}}
(In reply to Pavel Zhukov from comment #148) > (In reply to Nir Soffer from comment #146) > > Pavel, would you like to test this patch? > > http://gerrit.ovirt.org/53395 > > > > It cannot be applied to 3.6, so you have to build vdsm from master. > > Nir, > I've tested the patch. > Everything works fine. Added 50SDs with script. Copying raw disks between > storage domains. > Host is not flipped to Unresponsive.No error messages in engine log. > lastCheck grew up to 50 secs during the most critical period (adding SD AND > copying of the images) but host was OK. > Do you need logs? Yes, please attach the logs of this run. It would be useful to see your test script.
Automation is running on every build; in case you need to execute it on a specific patch, please ping me directly.
(In reply to Nir Soffer from comment #152) > (In reply to Pavel Zhukov from comment #148) > > (In reply to Nir Soffer from comment #146) > > > Pavel, would you like to test this patch? > > > http://gerrit.ovirt.org/53395 > > > > > > It cannot be applied to 3.6, so you have to build vdsm from master. > > > > Nir, > > I've tested the patch. > > Everything works fine. Added 50SDs with script. Copying raw disks between > > storage domains. > > Host is not flipped to Unresponsive.No error messages in engine log. > > lastCheck grew up to 50 secs during the most critical period (adding SD AND > > copying of the images) but host was OK. > > Do you need logs? > > Yes, please attach the logs of this run. > > It would be useful to see your test script. I don't use a test script. I use a script to attach many SDs (it's quite annoying to attach 30+ iSCSI SDs via the UI). Script to attach an SD: http://www.ovirt.org/Testing/PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any modifications.
Created attachment 1127299 [details] vdsm log
Created attachment 1127303 [details] targetcli configuration for testing this issue
Created attachment 1127304 [details] engine log
(In reply to Pavel Zhukov from comment #151) > I did more tests. > > Once host was restarted (SPM) the locking hell came back. I think it's > because all vgs should be refreshed during "bootstrap". > The lastCheck value doesn't grow constantly as before but it can reach 110+ > secs and host is marked as NonReponsive again. After some time host is back > and contending starts but it happens again in the loop: I will need to see the logs for this failure. I have a setup with 50 iscsi storage domains, and I can deactivate, activate and reboot a host without failures. I'm using targetcli on a real machine, using a fileio backend (see attachment 1127303 [details]). Each storage domain uses one lun from target3. The storage server is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using a local disk for storage, and a 1Gbit nic. The host consuming this storage is a vm (using nested kvm). We do enter an lvm invalidate/reload storm each time you start vdsm, connect to the storage server, connect the pool and get the device list, but on my setup a single lvm invalidate/reload storm is quick enough. I see lastCheck values up to 150 seconds. We will have to fix the invalidate/reload issue, as currently our monitoring is harmful when using a lot of block storage domains. The first step would be to reproduce and test the fix with real storage.
(In reply to Pavel Zhukov from comment #154) > Script to attach SD: > http://www.ovirt.org/Testing/ > PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any > modifications This script is creating storage domains v2, you should use v3.
(In reply to Nir Soffer from comment #159) > (In reply to Pavel Zhukov from comment #154) > > Script to attach SD: > > http://www.ovirt.org/Testing/ > > PythonApi#Create_iSCSI_Storage_Domain_on_Data_Center without any > > modifications > > This script is creating storage domains v2, you should use v3. Nir, please check the logs and DB I provided many times. I changed to V3 but forgot to mention it.
(In reply to Nir Soffer from comment #158) > (In reply to Pavel Zhukov from comment #151) > > I did more tests. > > > > Once host was restarted (SPM) the locking hell came back. I think it's > > because all vgs should be refreshed during "bootstrap". > > The lastCheck value doesn't grow constantly as before but it can reach 110+ > > secs and host is marked as NonReponsive again. After some time host is back > > and contending starts but it happens again in the loop: > > I will need to see the logs for this failure. > > I have a setup with 50 iscsi storage domains, and I can deactivate and > activate > and reboot a host without failures. > > I'm using targetcli on real machine, using fileio backend (see attachment > 1127303 [details]) > Each storage domain uses one lun from target3. > > Storage server is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using > local disk for storage, and 1Gbit nic. > > Host consuming this storage is a vm (using nested kvm). With nested kvm you win some time because no network latency is involved. And in fact the host machine and the VM use the same storage. Not that much, but anyway. > > We do enter lvm invlidate/reload storm on each time you start vdsm, connect > to storage server, connect the pool and get the device list, but on my setup, > single lvm invalidate/reload storm is quick enough. I see lastCheck values > up to 150 seconds. I had similar results (lastCheck up to ...). Was it with https://gerrit.ovirt.org/#/c/53540 applied? It should improve concurrent lvm operations a lot. I was trying to get rid of this fake "ContextManager" but didn't have time. Anyway, I don't have the test environment anymore. Thank you for fixing this and for all the explanations!
(In reply to Pavel Zhukov from comment #162) > (In reply to Nir Soffer from comment #158) > > (In reply to Pavel Zhukov from comment #151) > > > I did more tests. > > > > > > Once host was restarted (SPM) the locking hell came back. I think it's > > > because all vgs should be refreshed during "bootstrap". > > > The lastCheck value doesn't grow constantly as before but it can reach 110+ > > > secs and host is marked as NonReponsive again. After some time host is back > > > and contending starts but it happens again in the loop: > > > > I will need to see the logs for this failure. > > > > I have a setup with 50 iscsi storage domains, and I can deactivate and > > activate > > and reboot a host without failures. > > > > I'm using targetcli on real machine, using fileio backend (see attachment > > 1127303 [details]) > > Each storage domain uses one lun from target3. > > > > Storage server is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz machine, using > > local disk for storage, and 1Gbit nic. > > > > Host consuming this storage is a vm (using nested kvm). > With nested kvm you win some time because of no network latency is involved. > And in fact host machine and VM use same storage. Not so much but anyway. The vm consuming this storage is running on another machine. > > We do enter lvm invlidate/reload storm on each time you start vdsm, connect > > to storage server, connect the pool and get the device list, but on my setup, > > single lvm invalidate/reload storm is quick enough. I see lastCheck values > > up to 150 seconds. > I had similar results (lastCheck up to > Was it with https://gerrit.ovirt.org/#/c/53540 applied? It should improve > concurrent lvm operations a lot I was trying to get rid of this fake > "ContextManager" but didn't have time. This was without https://gerrit.ovirt.org/53540. This patch improves the situation further, although connecting 50 sds concurrently still takes a lot of time. Most monitors have a lastCheck of 100 seconds, some 120, 140, and one 155 seconds. We will need much more work to optimize this, probably changing the way we invalidate and refresh the entire cache. Thanks for the research.
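For readers following the locking discussion above: the gist of the change referenced here (the "fake ContextManager" and https://gerrit.ovirt.org/53540) is to let read-only lvm reloads run concurrently while cache invalidation takes an exclusive lock, instead of serializing every command behind a single mutex. The following is only a conceptual sketch of that idea in plain Python, not the code from the patch:

```python
import threading
from contextlib import contextmanager


class ReadersWriterLock:
    """Conceptual sketch only: many concurrent 'readers' (e.g. vgs/lvs
    reload commands) may run at once, while a 'writer' (e.g. cache
    invalidation) excludes everyone. Writers can starve under constant
    read load; a real implementation would need fairness."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        # Wrap read-only reload operations with this.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        # Wrap cache invalidation with this.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

With this scheme, monitoring one domain is no longer delayed behind unrelated reloads of other domains; only an actual invalidation briefly blocks everything.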
Created attachment 1129733 [details] Monitor delays for vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain This graph shows monitor delays in vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain
Created attachment 1129734 [details] Monitor delays for vdsm log 2014-03-31 03:46 EDT, akotov This graph shows monitor delays for vdsm log 2014-03-31 03:46 EDT, akotov
Created attachment 1129737 [details] Monitor delays for vdsm logs 2016-02-06 01:36 EST, Pavel Zhukov This graph shows monitor delays for vdsm logs 2016-02-06 01:36 EST, Pavel Zhukov
Created attachment 1129751 [details] Monitor delays for vdsm log 2016-02-09 12:51 EST, Pavel Zhukov This graph shows monitor delays for vdsm log 2016-02-09 12:51 EST, Pavel Zhukov
Created attachment 1129754 [details] Monitor delays for vdsm log 2016-02-15 09:13 EST, Pavel Zhukov This graph shows monitor delays for vdsm log 2016-02-15 09:13 EST, Pavel Zhukov Vdsm was using patch https://gerrit.ovirt.org/53395.
Created attachment 1130967 [details] Repoplot of vdsm.log vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain
Created attachment 1130968 [details] Repoplot of vdsm.log vdsm logs from 2014-03-31 03:46 EDT, akotov
Created attachment 1130982 [details] Repoplot of vdsm.log vdsm logs from 2016-02-06 01:36 EST, Pavel Zhukov
Created attachment 1130983 [details] Repoplot of vdsm.log vdsm logs from 2016-02-09 12:51 EST, Pavel Zhukov
Created attachment 1130984 [details] Repoplot of vdsm.log vdsm logs from 2016-02-15 09:13 EST, Pavel Zhukov
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
Nir, is there anything else we need to merge, or can this bug be moved to MODIFIED?
Patches are merged in master and ovirt-4.0.
Bug verified on RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository) VDSM Version:vdsm-4.18.9-1.el7ev LIBVIRT Version:libvirt-1.2.17-13.el7_2.5 Environment the same: - 1 DC, 1 Cluster, - 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM) - 1 NFS Master SD - 10 iSCSI SDs - 50 VM (1 CPU, 1 G RAM, 5 G Disk) Since that scenario is not clear I ran JMeter automatic script which executes operations on iSCSI SDs during almost 2 hours: - 01.Find.ISCSI.SD.Trx - 02.Deactivate.ISCSI,SD.Trx - 03.Detach.ISCSI,SD.Trx - 04.Attach.ISCSI,SD.Trx and collected statistics from vdsm log files using vdsm-tools/stats The vdsm-tools/stats/repoplot still failed, so I cannot represent the graph But, vdsm-tools/stats/repostat show us that max lastCheck over the 10 sec even with 10 iSCSC SDs $ /home/yobshans/vdsm-tools/stats/repostat vdsm.log vdsm.log.1 vdsm.log.2 vdsm.log.3 domain: deadbeab-dc37-4224-95ed-e64420404861 delay avg: 0.001470 min: 0.000000 max: 0.062856 last check avg: 3.363483 min: 0.000000 max: 13.200000 domain: 67f146ee-e410-4987-97b5-d74abaa6085f delay avg: 0.001364 min: 0.000000 max: 0.077210 last check avg: 3.708943 min: 0.000000 max: 13.100000 domain: b42b847d-8be3-430d-bcb0-ccb08df59233 delay avg: 0.001603 min: 0.000000 max: 0.055443 last check avg: 3.851433 min: 0.000000 max: 13.200000 domain: f59f9d17-8373-441f-9924-67110a90755c delay avg: 0.001398 min: 0.000000 max: 0.078949 last check avg: 3.616427 min: 0.000000 max: 11.000000 domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe delay avg: 0.001744 min: 0.000000 max: 0.057214 last check avg: 3.778623 min: 0.000000 max: 14.300000 domain: f04c71c0-9604-46b6-adb5-78660d11bf20 delay avg: 0.001873 min: 0.000000 max: 0.077009 last check avg: 3.681272 min: 0.000000 max: 12.700000 domain: bec0d23b-2a94-4265-ad71-40c059c456d8 delay avg: 0.001387 min: 0.000000 max: 0.077029 last check avg: 3.508897 min: 0.000000 max: 12.200000 domain: 12e87c9e-2526-4107-97cd-80e6298e3da5 delay avg: 0.001267 min: 0.000000 max: 0.070465 last check avg: 3.484298 min: 0.000000 max: 10.300000 domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35 delay avg: 0.001536 min: 0.000000 max: 0.077022 last check avg: 3.465584 min: 0.000000 max: 12.200000 domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c delay avg: 0.001326 min: 0.000000 max: 0.067024 last check avg: 3.635219 min: 0.000000 max: 16.300000 Transactions elapsed time (msec) : Transactions name Average 90% Line Min Max 01.Find.ISCSI.SD.Trx 810 864 458 2196 02.Deactivate.ISCSI,SD.Trx 142677 223249 19767 333741 03.Detach.ISCSI,SD.Trx 56427 78512 29854 107366 04.Attach.ISCSI,SD.Trx 232918 321849 90015 344879 I upload vdsm.log files I think, the issue is still existed.
Created attachment 1187507 [details] August 4, vdsm.log
Created attachment 1187508 [details] August 4, vdsm.log.1
Created attachment 1187510 [details] August 4, vdsm.log.2
Created attachment 1187511 [details] August 4, vdsm.log.3
Performed additional 2 tests on RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository) VDSM Version:vdsm-4.18.9-1.el7ev LIBVIRT Version:libvirt-1.2.17-13.el7_2.5 But, this time with 50 iSCSI SDs Environment the same: - 1 DC, 1 Cluster, - 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM) - 1 NFS Master SD - 50 iSCSI SDs - 50 VM (1 CPU, 1 G RAM, 5 G Disk) Test 1 - idle mode (no activities) Test 2 - load test with 50 concurrent threads which simulate operations on iSCSI SDs. See below statistics for both tests: Test 1: $ /home/yobshans/vdsm-tools/stats/repostat vdsm.log domain: be9f85fe-3d24-403b-bc78-3f91cefcf246 delay avg: 0.000828 min: 0.000451 max: 0.032423 last check avg: 3.360823 min: 0.000000 max: 9.600000 domain: fa744f59-5255-434f-a0f4-96f518f49de6 delay avg: 0.000833 min: 0.000439 max: 0.076603 last check avg: 3.426840 min: 0.000000 max: 9.400000 domain: aa859a83-ebc1-4ea0-9277-8af413deeaad delay avg: 0.000905 min: 0.000427 max: 0.075148 last check avg: 3.304978 min: 0.100000 max: 9.100000 domain: 2398d30a-43f3-4b26-a0ed-a212b1b8881e delay avg: 0.000708 min: 0.000438 max: 0.006968 last check avg: 3.462338 min: 0.000000 max: 9.300000 domain: f59f9d17-8373-441f-9924-67110a90755c delay avg: 0.000723 min: 0.000433 max: 0.036150 last check avg: 3.186580 min: 0.000000 max: 9.800000 domain: 923ce1a6-cf90-48ab-a02b-8545f649dc9b delay avg: 0.000962 min: 0.000420 max: 0.083579 last check avg: 3.042208 min: 0.000000 max: 9.900000 domain: 12e3c939-2837-46af-b994-46b2901c0883 delay avg: 0.000709 min: 0.000429 max: 0.017996 last check avg: 3.111039 min: 0.000000 max: 9.900000 domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe delay avg: 0.000984 min: 0.000417 max: 0.040286 last check avg: 3.157576 min: 0.000000 max: 9.700000 domain: c3d6fe21-fbf9-4d80-bb92-1939877a5f82 delay avg: 0.000856 min: 0.000418 max: 0.044383 last check avg: 3.268615 min: 0.000000 max: 9.600000 domain: f7b789fd-f9e2-4f09-8d1e-8b87f451f6b9 delay avg: 0.001091 min: 0.000475 max: 0.046067 last check avg: 3.129870 min: 0.000000 max: 9.500000 domain: f04c71c0-9604-46b6-adb5-78660d11bf20 delay avg: 0.000973 min: 0.000413 max: 0.034026 last check avg: 3.271212 min: 0.000000 max: 9.300000 domain: 307fd25f-bf3f-40b8-ae19-f87cf105c099 delay avg: 0.000688 min: 0.000411 max: 0.026836 last check avg: 3.250866 min: 0.000000 max: 9.600000 domain: bec0d23b-2a94-4265-ad71-40c059c456d8 delay avg: 0.001107 min: 0.000398 max: 0.036731 last check avg: 3.513203 min: 0.000000 max: 10.000000 domain: d2498977-1b74-4875-91ba-08cc74c99629 delay avg: 0.000834 min: 0.000459 max: 0.045234 last check avg: 3.354762 min: 0.000000 max: 9.800000 domain: f5f5b3fc-1495-4423-934f-89ea932fa333 delay avg: 0.000752 min: 0.000422 max: 0.038126 last check avg: 3.185498 min: 0.000000 max: 9.900000 domain: 45c73813-8c0c-4d4f-badd-7c00f3a330a8 delay avg: 0.000761 min: 0.000426 max: 0.048208 last check avg: 3.369913 min: 0.000000 max: 9.700000 domain: eba6461e-1771-4f8d-8e09-7d702c2dd028 delay avg: 0.001303 min: 0.000436 max: 0.064898 last check avg: 3.383117 min: 0.000000 max: 9.500000 domain: e1d16f8d-013c-4be5-9a45-f7a12475636d delay avg: 0.001317 min: 0.000473 max: 0.072406 last check avg: 3.737662 min: 0.000000 max: 9.800000 domain: 1003d3d7-139e-486a-a22e-20f6467dc8c4 delay avg: 0.000885 min: 0.000424 max: 0.038167 last check avg: 3.478139 min: 0.000000 max: 9.700000 domain: 30ed7ebd-439e-483a-8a03-9a7a1e3d049d delay avg: 0.000939 min: 0.000420 max: 0.031377 last check avg: 3.370996 min: 0.000000 max: 9.700000 domain: e3150f16-76a1-46cd-bf53-c321ee4d2c9a 
delay avg: 0.000682 min: 0.000435 max: 0.005944 last check avg: 3.262338 min: 0.000000 max: 9.500000 domain: 2caf8e26-5938-4cb2-9d98-d8cfdaa444a4 delay avg: 0.000987 min: 0.000456 max: 0.064760 last check avg: 3.201948 min: 0.000000 max: 9.600000 domain: db48b33a-6856-40ff-badd-933ec20619ee delay avg: 0.000732 min: 0.000434 max: 0.016702 last check avg: 3.083983 min: 0.000000 max: 9.700000 domain: 6e165862-4a19-477f-98be-e078e699f2ae delay avg: 0.001411 min: 0.000425 max: 0.079374 last check avg: 3.311688 min: 0.000000 max: 9.700000 domain: 9884db84-f4d0-4402-a952-36c685ab0d33 delay avg: 0.001438 min: 0.000442 max: 0.076374 last check avg: 3.645887 min: 0.000000 max: 10.000000 domain: ecb0dba1-3c1b-4ac8-9699-069431e2de03 delay avg: 0.000741 min: 0.000424 max: 0.032231 last check avg: 3.188528 min: 0.000000 max: 9.100000 domain: 67f146ee-e410-4987-97b5-d74abaa6085f delay avg: 0.000809 min: 0.000384 max: 0.047425 last check avg: 3.460173 min: 0.000000 max: 10.000000 domain: b42b847d-8be3-430d-bcb0-ccb08df59233 delay avg: 0.000959 min: 0.000468 max: 0.035521 last check avg: 3.261472 min: 0.000000 max: 9.600000 domain: ee163499-26c7-43fb-bd04-b0cf5e7d965e delay avg: 0.000676 min: 0.000413 max: 0.003722 last check avg: 3.503247 min: 0.000000 max: 9.600000 domain: a42cf644-6820-4dd6-ac19-5af69be4cf94 delay avg: 0.000642 min: 0.000402 max: 0.015142 last check avg: 3.456710 min: 0.000000 max: 9.200000 domain: d5f72e36-c4f1-4e45-acab-01d15c4ccb73 delay avg: 0.000915 min: 0.000438 max: 0.032101 last check avg: 3.201082 min: 0.000000 max: 9.900000 domain: f4b2db57-4b3d-4e52-8b3f-d4347beb32bd delay avg: 0.000735 min: 0.000483 max: 0.025288 last check avg: 3.262554 min: 0.000000 max: 9.500000 domain: b697ceb5-cb4b-4bc6-9fa7-527945aedb92 delay avg: 0.000732 min: 0.000409 max: 0.038098 last check avg: 3.552597 min: 0.000000 max: 9.800000 domain: cad3037e-0b2d-493b-a81c-37e9244098cc delay avg: 0.001552 min: 0.000460 max: 0.190254 last check avg: 2.877922 min: 0.000000 max: 9.600000 domain: 12e87c9e-2526-4107-97cd-80e6298e3da5 delay avg: 0.001328 min: 0.000442 max: 0.063413 last check avg: 3.109957 min: 0.000000 max: 9.500000 domain: 1ef035ec-7dbf-4bf2-80b0-cadf63dca4ed delay avg: 0.000656 min: 0.000441 max: 0.004900 last check avg: 3.332468 min: 0.000000 max: 9.300000 domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35 delay avg: 0.000888 min: 0.000406 max: 0.037597 last check avg: 3.508225 min: 0.000000 max: 9.700000 domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c delay avg: 0.000804 min: 0.000443 max: 0.028700 last check avg: 3.320346 min: 0.000000 max: 9.900000 domain: eb6184cc-89be-47ab-afd3-23ebbcad55d9 delay avg: 0.000665 min: 0.000463 max: 0.004340 last check avg: 3.126407 min: 0.000000 max: 10.000000 domain: deadbeab-dc37-4224-95ed-e64420404861 delay avg: 0.000843 min: 0.000418 max: 0.032031 last check avg: 3.337013 min: 0.000000 max: 10.000000 domain: a48a1cc9-a6ff-412c-99ab-3d211fc24122 delay avg: 0.000844 min: 0.000415 max: 0.025883 last check avg: 3.372727 min: 0.000000 max: 9.300000 domain: 8bde8676-76e6-4ad4-862e-7461e710f339 delay avg: 0.001316 min: 0.000475 max: 0.078356 last check avg: 3.003247 min: 0.000000 max: 9.700000 domain: 032648cd-8f70-406f-9f19-a94dcb5d6e26 delay avg: 0.000977 min: 0.000438 max: 0.074872 last check avg: 3.492424 min: 0.000000 max: 9.500000 domain: 97d7259e-f300-4862-897b-a62af647c851 delay avg: 0.000710 min: 0.000412 max: 0.035667 last check avg: 3.345238 min: 0.000000 max: 9.200000 domain: d59c3d5c-8ff2-4488-af44-e2eecd838b82 delay avg: 0.000768 min: 0.000414 max: 
0.021306 last check avg: 3.590043 min: 0.000000 max: 10.000000 domain: 1d596506-6157-4d72-87c1-3c4c10aaa980 delay avg: 0.000905 min: 0.000430 max: 0.038977 last check avg: 3.300000 min: 0.000000 max: 9.600000 Test 2: $ /home/yobshans/vdsm-tools/stats/repostat vdsm.log.2 vdsm.log.1 vdsm.log domain: be9f85fe-3d24-403b-bc78-3f91cefcf246 delay avg: 0.000303 min: 0.000000 max: 0.001830 last check avg: 7.549147 min: 0.000000 max: 32.000000 domain: fa744f59-5255-434f-a0f4-96f518f49de6 delay avg: 0.000389 min: 0.000000 max: 0.008781 last check avg: 7.488842 min: 0.000000 max: 34.000000 domain: aa859a83-ebc1-4ea0-9277-8af413deeaad delay avg: 0.000229 min: 0.000000 max: 0.019545 last check avg: 8.670147 min: 0.000000 max: 39.600000 domain: 2398d30a-43f3-4b26-a0ed-a212b1b8881e delay avg: 0.000391 min: 0.000000 max: 0.046709 last check avg: 6.963296 min: 0.000000 max: 28.000000 domain: f59f9d17-8373-441f-9924-67110a90755c delay avg: 0.000438 min: 0.000000 max: 0.022101 last check avg: 7.589327 min: 0.000000 max: 32.000000 domain: 923ce1a6-cf90-48ab-a02b-8545f649dc9b delay avg: 0.000396 min: 0.000000 max: 0.001464 last check avg: 8.071979 min: 0.000000 max: 49.800000 domain: 12e3c939-2837-46af-b994-46b2901c0883 delay avg: 0.000204 min: 0.000000 max: 0.009425 last check avg: 7.431481 min: 0.000000 max: 48.100000 domain: 77f694bb-cc3f-40b1-b120-cad9a5ea34fe delay avg: 0.000502 min: 0.000000 max: 0.000911 last check avg: 7.413652 min: 0.000000 max: 24.200000 domain: c3d6fe21-fbf9-4d80-bb92-1939877a5f82 delay avg: 0.000385 min: 0.000000 max: 0.055027 last check avg: 7.689199 min: 0.000000 max: 43.900000 domain: f7b789fd-f9e2-4f09-8d1e-8b87f451f6b9 delay avg: 0.000665 min: 0.000410 max: 0.003727 last check avg: 4.909091 min: 0.000000 max: 17.600000 domain: f04c71c0-9604-46b6-adb5-78660d11bf20 delay avg: 0.000273 min: 0.000000 max: 0.036084 last check avg: 7.797574 min: 0.000000 max: 29.000000 domain: 307fd25f-bf3f-40b8-ae19-f87cf105c099 delay avg: 0.000335 min: 0.000000 max: 0.018856 last check avg: 7.786441 min: 0.000000 max: 31.900000 domain: bec0d23b-2a94-4265-ad71-40c059c456d8 delay avg: 0.000627 min: 0.000000 max: 0.014717 last check avg: 8.271315 min: 0.000000 max: 33.400000 domain: d2498977-1b74-4875-91ba-08cc74c99629 delay avg: 0.001186 min: 0.000406 max: 0.017322 last check avg: 4.835821 min: 0.000000 max: 22.000000 domain: f5f5b3fc-1495-4423-934f-89ea932fa333 delay avg: 0.000406 min: 0.000000 max: 0.008962 last check avg: 8.109375 min: 0.000000 max: 41.900000 domain: 45c73813-8c0c-4d4f-badd-7c00f3a330a8 delay avg: 0.000505 min: 0.000000 max: 0.000748 last check avg: 7.674003 min: 0.000000 max: 31.100000 domain: eba6461e-1771-4f8d-8e09-7d702c2dd028 delay avg: 0.000423 min: 0.000000 max: 0.007291 last check avg: 8.333729 min: 0.000000 max: 38.800000 domain: e1d16f8d-013c-4be5-9a45-f7a12475636d delay avg: 0.000536 min: 0.000000 max: 0.028912 last check avg: 8.158541 min: 0.000000 max: 55.500000 domain: 1003d3d7-139e-486a-a22e-20f6467dc8c4 delay avg: 0.000577 min: 0.000425 max: 0.002126 last check avg: 7.675868 min: 0.000000 max: 35.500000 domain: 30ed7ebd-439e-483a-8a03-9a7a1e3d049d delay avg: 0.000611 min: 0.000394 max: 0.004189 last check avg: 5.300000 min: 0.000000 max: 23.100000 domain: e3150f16-76a1-46cd-bf53-c321ee4d2c9a delay avg: 0.001409 min: 0.000399 max: 0.029304 last check avg: 5.792965 min: 0.000000 max: 28.200000 domain: 2caf8e26-5938-4cb2-9d98-d8cfdaa444a4 delay avg: 0.000368 min: 0.000000 max: 0.004120 last check avg: 7.398485 min: 0.000000 max: 35.100000 domain: 
db48b33a-6856-40ff-badd-933ec20619ee delay avg: 0.000546 min: 0.000394 max: 0.002481 last check avg: 6.473643 min: 0.000000 max: 26.400000 domain: 6e165862-4a19-477f-98be-e078e699f2ae delay avg: 0.000671 min: 0.000000 max: 0.039189 last check avg: 8.502239 min: 0.000000 max: 31.000000 domain: 9884db84-f4d0-4402-a952-36c685ab0d33 delay avg: 0.000385 min: 0.000000 max: 0.006285 last check avg: 7.591445 min: 0.000000 max: 30.000000 domain: ecb0dba1-3c1b-4ac8-9699-069431e2de03 delay avg: 0.001009 min: 0.000407 max: 0.024016 last check avg: 5.280000 min: 0.000000 max: 19.900000 domain: b42b847d-8be3-430d-bcb0-ccb08df59233 delay avg: 0.000288 min: 0.000000 max: 0.026870 last check avg: 7.901077 min: 0.000000 max: 34.600000 domain: ee163499-26c7-43fb-bd04-b0cf5e7d965e delay avg: 0.000558 min: 0.000431 max: 0.004720 last check avg: 7.646213 min: 0.000000 max: 33.400000 domain: deadbeab-dc37-4224-95ed-e64420404861 delay avg: 0.000361 min: 0.000000 max: 0.000755 last check avg: 7.569044 min: 0.000000 max: 33.600000 domain: a42cf644-6820-4dd6-ac19-5af69be4cf94 delay avg: 0.000340 min: 0.000000 max: 0.040448 last check avg: 7.609359 min: 0.000000 max: 32.700000 domain: d5f72e36-c4f1-4e45-acab-01d15c4ccb73 delay avg: 0.000628 min: 0.000413 max: 0.013137 last check avg: 7.016000 min: 0.000000 max: 33.600000 domain: f4b2db57-4b3d-4e52-8b3f-d4347beb32bd delay avg: 0.000793 min: 0.000380 max: 0.013701 last check avg: 4.703937 min: 0.000000 max: 18.000000 domain: b697ceb5-cb4b-4bc6-9fa7-527945aedb92 delay avg: 0.000392 min: 0.000000 max: 0.001502 last check avg: 7.884919 min: 0.000000 max: 32.800000 domain: cad3037e-0b2d-493b-a81c-37e9244098cc delay avg: 0.000355 min: 0.000000 max: 0.040631 last check avg: 7.352632 min: 0.000000 max: 27.800000 domain: 12e87c9e-2526-4107-97cd-80e6298e3da5 delay avg: 0.000611 min: 0.000000 max: 0.012651 last check avg: 8.134072 min: 0.000000 max: 30.100000 domain: 1ef035ec-7dbf-4bf2-80b0-cadf63dca4ed delay avg: 0.000477 min: 0.000000 max: 0.040202 last check avg: 7.723826 min: 0.000000 max: 30.600000 domain: bbfe3fcc-33eb-4351-8a58-f2fe007daa35 delay avg: 0.000559 min: 0.000000 max: 0.038589 last check avg: 7.688473 min: 0.000000 max: 38.400000 domain: 5225ceb7-5f01-46f8-8415-a2efbc01de2c delay avg: 0.000619 min: 0.000442 max: 0.022269 last check avg: 9.100179 min: 0.000000 max: 35.100000 domain: eb6184cc-89be-47ab-afd3-23ebbcad55d9 delay avg: 0.000530 min: 0.000417 max: 0.004869 last check avg: 7.707956 min: 0.000000 max: 36.700000 domain: 67f146ee-e410-4987-97b5-d74abaa6085f delay avg: 0.000143 min: 0.000000 max: 0.007179 last check avg: 8.189730 min: 0.000000 max: 36.600000 domain: a48a1cc9-a6ff-412c-99ab-3d211fc24122 delay avg: 0.000335 min: 0.000000 max: 0.009734 last check avg: 7.850738 min: 0.000000 max: 31.900000 domain: 8bde8676-76e6-4ad4-862e-7461e710f339 delay avg: 0.000308 min: 0.000000 max: 0.012916 last check avg: 7.707309 min: 0.000000 max: 53.600000 domain: 032648cd-8f70-406f-9f19-a94dcb5d6e26 delay avg: 0.000163 min: 0.000000 max: 0.004134 last check avg: 7.669752 min: 0.000000 max: 34.700000 domain: 97d7259e-f300-4862-897b-a62af647c851 delay avg: 0.000577 min: 0.000405 max: 0.005700 last check avg: 6.459859 min: 0.000000 max: 23.800000 domain: d59c3d5c-8ff2-4488-af44-e2eecd838b82 delay avg: 0.000298 min: 0.000000 max: 0.000843 last check avg: 7.821731 min: 0.000000 max: 34.300000 domain: 1d596506-6157-4d72-87c1-3c4c10aaa980 delay avg: 0.000298 min: 0.000000 max: 0.037302 last check avg: 7.771409 min: 0.000000 max: 45.300000 domain: 
61de827a-a78b-4d6c-94ec-292fb2d8fb2d delay avg: 0.000511 min: 0.000417 max: 0.004799 last check avg: 7.392241 min: 0.000000 max: 31.200000 domain: 6e6f8bd4-c10a-41b8-8b1a-b20d70ee132f delay avg: 0.000600 min: 0.000437 max: 0.004435 last check avg: 7.838505 min: 0.000000 max: 31.500000 domain: 959c90bd-c25d-4fa4-bf0c-5bc7e6494e5d delay avg: 0.000355 min: 0.000210 max: 0.000701 last check avg: 5.872021 min: 0.000000 max: 21.500000 domain: 56324a25-9789-4bdf-bf23-50bffa0577f0 delay avg: 0.000516 min: 0.000000 max: 0.000851 last check avg: 7.819863 min: 0.000000 max: 35.800000 domain: 7604eebf-b55b-4ac3-b02a-1176b3519687 delay avg: 0.000552 min: 0.000439 max: 0.003322 last check avg: 6.652416 min: 0.000000 max: 31.700000 I don't vdsm.log file But, id needed I can do it later up to request Yuri
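For readers without access to the vdsm-tools scripts used above: statistics like these can be approximated by extracting the delay and lastCheck values from the repoStats responses in vdsm.log. This is a rough, hypothetical sketch based on the log format quoted earlier in this bug; the actual repostat tool may compute its numbers differently:

```python
import re
import sys
from collections import defaultdict

# Matches repoStats entries like those quoted earlier in this bug, e.g.
#   u'2c5f2291-...': {'code': 0, ..., 'delay': '0.000255575', 'lastCheck': '94.0', ...}
ENTRY = re.compile(
    r"u?'(?P<domain>[0-9a-f-]{36})': \{[^}]*"
    r"'delay': '(?P<delay>[0-9.]+)'[^}]*"
    r"'lastCheck': '(?P<lastcheck>[0-9.]+)'")


def stats(paths):
    checks = defaultdict(list)
    delays = defaultdict(list)
    for path in paths:
        with open(path) as f:
            for line in f:
                for m in ENTRY.finditer(line):
                    checks[m.group("domain")].append(float(m.group("lastcheck")))
                    delays[m.group("domain")].append(float(m.group("delay")))
    for domain in sorted(checks):
        c, d = checks[domain], delays[domain]
        print("domain: %s" % domain)
        print("  delay      avg: %f min: %f max: %f" % (sum(d) / len(d), min(d), max(d)))
        print("  last check avg: %f min: %f max: %f" % (sum(c) / len(c), min(c), max(c)))


if __name__ == "__main__":
    stats(sys.argv[1:])  # e.g. python repostat_sketch.py vdsm.log.1 vdsm.log
```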
Created attachment 1192516 [details] repoplot from Yuri's logs uploaded on August 4 This plot shows very low (<10) lastCheck values, but also no long-running lvm commands and no disk copy commands during this test. I assume that no storage operations (such as copying disks) were done during this test, so this test run is not helpful for verifying this bug.
(In reply to Yuri Obshansky from comment #191) > Bug verified on > RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository) > VDSM Version:vdsm-4.18.9-1.el7ev > LIBVIRT Version:libvirt-1.2.17-13.el7_2.5 > > Environment the same: > - 1 DC, 1 Cluster, > - 1 Host ( 32 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64 G RAM) > - 1 NFS Master SD > - 10 iSCSI SDs To reproduce this issue, you need to test a much higher number of storage domains. I would recommend running with 50 storage domains, which is our target for now. > - 50 VM (1 CPU, 1 G RAM, 5 G Disk) We don't need to run any vm for this test, as idle vms do not generate any io that may affect storage domain monitoring. > Since that scenario is not clear > I ran JMeter automatic script which executes operations on iSCSI SDs during > almost 2 hours: > - 01.Find.ISCSI.SD.Trx > - 02.Deactivate.ISCSI,SD.Trx > - 03.Detach.ISCSI,SD.Trx > - 04.Attach.ISCSI,SD.Trx I'm not sure what you tested, but based on the names of the files and the test results seen in the logs, the flows tested are not helpful for verifying this bug. > and collected statistics from vdsm log files using vdsm-tools/stats > The vdsm-tools/stats/repoplot still failed, It works for me - I download the logs, un-gzip them, and run repoplot like this: repoplot -n out vdsm.log.3 vdsm.log.2 vdsm.log.1 vdsm.log Note that the multiple files must be given in the correct order, the same order you would use with cat if you wanted to combine them into one big log file. > so I cannot represent the graph > But, vdsm-tools/stats/repostat show us that max lastCheck over the 10 sec > even with 10 iSCSC SDs Values higher than 10 seconds are ok and expected when a storage domain is activated. One of the steps during activation is acquiring the host id, which can be very slow. > Transactions elapsed time (msec) : > > Transactions name Average 90% Line Min Max > 01.Find.ISCSI.SD.Trx 810 864 458 2196 > 02.Deactivate.ISCSI,SD.Trx 142677 223249 19767 333741 > 03.Detach.ISCSI,SD.Trx 56427 78512 29854 107366 > 04.Attach.ISCSI,SD.Trx 232918 321849 90015 344879 I don't know what these numbers mean, but timing storage domain activation, deactivation and attachment is not related to this bug. > I think, the issue is still existed. I think the tests done are not helpful for verifying this bug, so we cannot reach any conclusion yet.
(In reply to Yuri Obshansky from comment #196) > Performed additional 2 tests on > RHEVM: 4.0.2.3-0.1.el7ev (rhev-4.0.2-5 repository) > VDSM Version:vdsm-4.18.9-1.el7ev > LIBVIRT Version:libvirt-1.2.17-13.el7_2.5 > > But, this time with 50 iSCSI SDs This is better... > Test 1 - idle mode (no activities) This does not test much... > Test 2 - load test with 50 concurrent threads which simulate operations on > iSCSI SDs. The interesting operations are copying disks, not activating and deactivating storage domains. Running 50 threads activating and deactivating storage domains is also not an interesting use case for any bug. Can you think of a real-world use case including such a workload? > I don't vdsm.log file > But, id needed I can do it later up to request Since we did not test any disk copies, I don't think the log would be helpful.
This bug has a huge number of comments and a lot of noise, so I'll try to make it clear what the issue is and how it should be tested.

When performing storage operations such as creating templates, copying disks, or deleting vms or disks, the storage monitoring threads would slow down, and vdsm would report high lastCheck values. When the engine finds a lastCheck time bigger than 30 seconds, it starts a 5 minute timer. If vdsm did not report a normal lastCheck value (<30) when this timer expired, the domain is treated as faulty. If this domain is ok on other hosts, this host is considered non-operational.

Looking at the repoplots we can see that while the lastCheck value may reach high values (>300s), the read delay values are typically tiny (<0.01s) - the monitoring thread was delayed, not the storage reads themselves. We can also see very slow lvm operations and sometimes also very slow mailbox commands.

Please check these attachments to understand this issue:
- https://bugzilla.redhat.com/attachment.cgi?id=1130967 Repoplot of vdsm.log - vdsm logs from 2014-03-28 06:27 EDT, Roman Hodain
- https://bugzilla.redhat.com/attachment.cgi?id=1130968 Repoplot of vdsm.log - vdsm logs from 2014-03-31 03:46 EDT, akotov
- https://bugzilla.redhat.com/attachment.cgi?id=1130982 Repoplot of vdsm.log - vdsm logs from 2016-02-06 01:36 EST, Pavel Zhukov
- https://bugzilla.redhat.com/attachment.cgi?id=1130983 Repoplot of vdsm.log - vdsm logs from 2016-02-09 12:51 EST, Pavel Zhukov

The root cause of this issue was mixing storage monitoring with other (possibly very slow) operations such as refreshing the lvm cache and doing scsi rescans. Even worse, the vdsm lvm module had bad locking that was designed to let reload and cache invalidation operations run together, but in practice it serialized all operations, so monitoring one domain could be delayed by unrelated delays on other storage domains, or by a bad LUN causing a scsi scan to block for 30 seconds.

In 4.0 we did these changes, fixing this issue:
- Storage monitoring was moved to a separate thread, doing *only* storage monitoring. This thread uses an event loop to run multiple dd processes, so it can never block (a minimal conceptual sketch of this idea is included after the test plan below). Now a high lastCheck value means that a dd process was blocked on storage.
- vdsm lvm locking was simplified, allowing concurrent lvm commands and shortening delays in monitoring threads.

How to test this bug

First, I think we *must* test this bug both on 3.6 and 4.0, so we can compare apples to apples. Users running 3.6 (or older) need this information when considering an upgrade to 4.0.

Setup:
- 50 active iscsi storage domains
- Create several big disks, big enough so that copying a disk would take about 10 minutes.
- Make sure the "Wipe after delete" option is checked for these disks, so after the disks are moved, we fill them with zeros (this creates a lot of io).
- You can use 3.6 cluster version; there is no need to test this with a 4.0 cluster. This way we can compare the same setup on both 3.6 and 4.0.

Test:
- Move several disks concurrently from one storage domain to another. Note: move and not copy, to make sure we wipe the source disks after the copy, maximizing the amount of io during the test.

Expected results (3.6):
- The copy operation fails after a timeout in the engine
- Or the host becomes non-operational because of high lastCheck values

Expected results (4.0):
- The operation should succeed on the engine
- Low lastCheck values during the test (0-10s)
- Slower lvm commands and storage mailbox commands during the test

Additional testing

Because of the improved lvm locking, operations like attaching and activating a storage domain should be faster.
You may like to time these operations on a setup with 50 storage domains. It will be useful to compare the results to 3.6, to see how much the user experience is improved. To test this I would run an operation several times, every 5 minutes. There is no need to test concurrent operations. We can test this operation during disk copying - it may be much slower since lvm operations are slowed down, but it should not time out in the engine.

Notes:
- When attaching and activating a storage domain, we must wait both for the storage monitor (which typically finishes in under a second) and for lvm refresh operations that may take many seconds. Seeing high lastCheck values during these operations is expected.
- vdsm reports actual=False for a storage domain until the first monitoring cycle is completed. High lastCheck values when actual=False are ok; the engine acts on high lastCheck values only when the domain has completed the first monitoring cycle.
- We must have vdsm logs and engine logs for all tests.

Moving back to ON_QA, since this bug was not verified yet.
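To make the monitoring redesign described in the previous comment more concrete, here is a minimal, self-contained sketch of the event-loop idea: each check is a small direct-I/O read performed by a dd child process, with one loop per domain, so a stuck read blocks only that domain's dd and never the checks of the other domains. This is an illustration of the approach, not vdsm's actual monitor code; the metadata paths and helper names are placeholders:

```python
import asyncio
import time

CHECK_INTERVAL = 10   # seconds to wait between checks of a single domain
PROBLEM_AFTER = 30    # engine treats lastCheck above this as a problem


async def check_domain(path, state):
    """Read one 4k block from the domain with dd and record the read
    delay and the time the check completed."""
    started = time.monotonic()
    proc = await asyncio.create_subprocess_exec(
        "dd", "if=" + path, "of=/dev/null", "bs=4096", "count=1",
        "iflag=direct",
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL)
    await proc.wait()
    now = time.monotonic()
    state[path] = {"delay": now - started, "checked": now,
                   "code": proc.returncode}


async def monitor_domain(path, state):
    # Each domain has its own loop; a blocked dd delays only this domain.
    while True:
        await check_domain(path, state)
        await asyncio.sleep(CHECK_INTERVAL)


def last_check(path, state):
    """Seconds since the last completed check - the 'lastCheck' value."""
    info = state.get(path)
    return None if info is None else time.monotonic() - info["checked"]


def in_problem(path, state):
    """Rough engine-side rule described above: a domain is 'in problem'
    when its lastCheck exceeds ~30 seconds."""
    lc = last_check(path, state)
    return lc is None or lc > PROBLEM_AFTER


async def main(paths):
    state = {}
    await asyncio.gather(*(monitor_domain(p, state) for p in paths))


# Hypothetical usage with placeholder metadata paths:
# asyncio.run(main(["/dev/<sd-uuid>/metadata", "/rhev/data-center/mnt/.../metadata"]))
```

In this model, "delay" measures how long the read itself took (storage health), while "lastCheck" measures how stale the last completed check is (monitor health) - the distinction used throughout the analysis above.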
Created attachment 1197972 [details] Repoplot showing differences between vdsm versions during the concurrent disk move test
Results: verified on vdsm-4.18.11 and RHEVM 4.0.2-0.2.
- vdsm-4.18.11: 7-disk concurrent move successful
- vdsm-4.18.2-0: 7-disk concurrent move failed - each time

Setup:
- RHEVM: 4.0.2-0.2.rc1.el7ev
- 50 iSCSI domains were created.
- 1 VM containing 7 disks of 200GB each, filled, with the "Wipe after delete" option set to true.
- 7 disks were moved concurrently.
- Dedicated 10GbE storage network to both hosts.

What was tested: moving 7 disks concurrently between iSCSI storage domains with "wipe after delete" set, on vdsm-4.18.11 and vdsm-4.18.2-0.

Results: successful concurrent move of 7 disks. See the attached repoplots of the vdsm logs during the move activity.
- vdsm-3.6.pdf (repoplot) shows the failed disk moves: high lastCheck values; the disk moves failed, and no disks needed to be deleted afterwards.
- vdsm-4.0.pdf (repoplot) shows the successful disk moves: acceptable lastCheck values, with 2 sets of disk activity corresponding to the move and the delete.