Bug 1479768 - [downstream clone - 4.1.5] Hosted engine issues too many lvm operations
Summary: [downstream clone - 4.1.5] Hosted engine issues too many lvm operations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.1.5
Assignee: Denis Chaplygin
QA Contact: Artyom
URL:
Whiteboard:
Depends On: 1337914
Blocks: 1399766 1479855
 
Reported: 2017-08-09 11:16 UTC by rhev-integ
Modified: 2020-12-14 10:59 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: HE used an incorrect and expensive approach to check the HE storage domain (SD) status. Fix: HE now uses a cheap validation call to check the HE SD status and falls back to the expensive recovery call on failure. Result: The HE monitoring loop is much faster and no longer initiates too many LVM operations.
Clone Of: 1337914
Environment:
Last Closed: 2017-08-22 17:45:24 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2526 0 normal SHIPPED_LIVE ovirt-hosted-engine-ha bug fix update for 4.1.5 2017-08-22 21:41:10 UTC
oVirt gerrit 80454 0 master MERGED storage: Added call to repoStats before storage recovery 2020-02-21 08:01:50 UTC
oVirt gerrit 80461 0 v2.1.z MERGED Made default loop delay value a constant 2020-02-21 08:01:51 UTC
oVirt gerrit 80462 0 v2.1.z MERGED storage: Added call to repoStats before storage recovery 2020-02-21 08:01:51 UTC

Description rhev-integ 2017-08-09 11:16:20 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1337914 +++
======================================================================

Description of problem:
Hosted engine monitoring issues too many lvm operations, which makes it
vulnerable to delays on the storage side. Especially in environments with a
high number of LUNs connected and a higher number of LVM volume groups, the
monitoring can suffer from lvm performance.

Version-Release number of selected component (if applicable):
    ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch

How reproducible:
    100%

Steps to Reproduce:
1. See what hosted engine is doing during one monitoring cycle.

Actual results:
The monitoring uses too many LVM operations for every monitoring cycle, which
can cause delays in the monitoring.

Expected results:
hosted engine monitoring is more efficient

(Originally by Roman Hodain)

Comment 5 rhev-integ 2017-08-09 11:16:47 UTC
Here is an example of an environment suffering from lvm performance.

The monitoring cycle takes ~ one minute, which often causes VM start failures,
as other hosts assume that the host which is running the HE is down. The reason
is that it does not refresh the status fast enough.

$ grep 'Sleeping 10 seconds' prvh2.ldipbm.int/var/log/ovirt-hosted-engine-ha/agent.log  | tail 
MainThread::DEBUG::2016-04-19 12:39:30,636::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:40:44,002::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:41:52,113::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:42:57,744::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:44:04,363::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:45:15,459::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:46:16,765::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:47:19,371::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:48:28,572::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds
MainThread::DEBUG::2016-04-19 12:49:50,712::hosted_engine::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 10 seconds

The sar data:
Storage (average for the LUNs connected to the system; 253-2 is the HE SD):
                  DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
Average:     dev253-2     40.74    640.08      0.25     15.72      0.13      3.20      3.09     12.59
Average:     dev253-3    159.84   5474.46   2426.18     49.43      0.51      3.20      1.96     31.31
Average:     dev253-4    124.19  61754.03    330.37    499.91      0.52      4.22      1.82     22.60
Average:     dev253-5    171.48  11165.72   2135.06     77.56      3.62     21.13      3.80     65.17
Average:     dev253-6      7.05     56.38      0.00      8.00      0.02      3.15      3.08      2.17
Average:     dev253-7      7.05     56.38      0.00      8.00      0.02      2.64      2.58      1.82
Average:     dev253-8      7.05     56.38      0.00      8.00      0.02      2.34      2.30      1.62
Average:     dev253-9      7.05     56.38      0.00      8.00      0.04      6.18      6.05      4.26
Average:    dev253-10      7.05     56.38      0.00      8.00      0.02      2.25      2.20      1.55
Average:    dev253-13      3.23     25.84      0.00      8.00      0.02      5.73      5.64      1.82
Average:    dev253-16      3.23     25.84      0.00      8.00      0.01      2.16      2.14      0.69
Average:    dev253-23     13.27    468.14      0.00     35.27      0.03      2.52      2.40      3.19

CPU:
                CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
Average:        all      0.74      0.00      1.99      2.52      0.00      0.00      0.01      4.18      0.00     90.55
Average:          0      0.53      0.00      2.96      0.02      0.00      0.00      0.01     13.20      0.00     83.29
Average:          1     10.52      0.00      7.46      4.65      0.00      0.00      0.20      2.40      0.00     74.77
Average:          2      0.89      0.00      2.86      0.20      0.00      0.00      0.02      9.63      0.00     86.42
Average:          3      0.86      0.00      2.30      0.48      0.00      0.00      0.00      8.24      0.00     88.11
Average:          4      0.85      0.00      2.55      0.03      0.00      0.00      0.01      8.50      0.00     88.06
Average:          5      0.72      0.00      2.54      0.40      0.00      0.00      0.08     10.73      0.00     85.54
Average:          6      0.66      0.00      2.62      0.01      0.00      0.00      0.02     10.51      0.00     86.19
Average:          7      0.70      0.00      2.71      0.30      0.00      0.00      0.00      9.00      0.00     87.29
Average:          8      0.65      0.00      2.45      2.59      0.00      0.00      0.03      9.11      0.00     85.17
Average:          9      0.59      0.00      2.43      0.07      0.00      0.00      0.01      9.50      0.00     87.39
Average:         10      0.84      0.00      3.21      3.40      0.00      0.00      0.01      5.54      0.00     87.00
Average:         11      0.83      0.00      3.16      3.06      0.00      0.00      0.01      5.59      0.00     87.35
Average:         12      0.84      0.00      3.13      4.83      0.00      0.00      0.01      5.52      0.00     85.67
Average:         13      0.87      0.00      3.14      4.24      0.00      0.00      0.01      5.39      0.00     86.36
Average:         14      0.85      0.00      3.15      4.36      0.00      0.00      0.01      5.29      0.00     86.34
Average:         15      0.86      0.00      3.13      4.25      0.00      0.00      0.01      5.36      0.00     86.40
Average:         16      0.86      0.00      3.13      4.04      0.00      0.00      0.01      5.18      0.00     86.79
Average:         17      0.85      0.00      3.12      4.14      0.00      0.00      0.01      5.06      0.00     86.84
Average:         18      0.85      0.00      3.15      3.92      0.00      0.00      0.01      5.04      0.00     87.04
Average:         19      0.85      0.00      3.13      3.55      0.00      0.00      0.01      5.06      0.00     87.41
Average:         20      0.07      0.00      0.87      0.00      0.00      0.00      0.00      0.95      0.00     98.11
Average:         21      0.70      0.00      1.15      0.09      0.00      0.00      0.00      5.61      0.00     92.44
Average:         22      0.20      0.00      1.32      2.10      0.00      0.00      0.01      0.69      0.00     95.68
Average:         23      0.16      0.00      1.59      0.00      0.00      0.00      0.00      1.31      0.00     96.94
Average:         24      0.12      0.00      1.05      0.00      0.00      0.00      0.02      0.45      0.00     98.35
Average:         25      0.10      0.00      0.98      0.00      0.00      0.00      0.00      0.86      0.00     98.06
Average:         26      0.48      0.00      1.40      6.18      0.00      0.00      0.09      0.53      0.00     91.33
Average:         27      0.06      0.00      0.93      0.00      0.00      0.00      0.00      0.77      0.00     98.24
Average:         28      0.04      0.00      0.90      0.00      0.00      0.00      0.00      0.93      0.00     98.12
Average:         29      0.06      0.00      0.87      0.00      0.00      0.00      0.00      2.27      0.00     96.80
Average:         30      0.07      0.00      0.38      0.08      0.00      0.00      0.00      0.89      0.00     98.57
Average:         31      0.73      0.00      1.60     19.02      0.00      0.00      0.01      0.45      0.00     78.19
Average:         32      0.21      0.00      0.53      6.17      0.00      0.00      0.00      0.46      0.00     92.62
Average:         33      0.24      0.00      0.60     11.11      0.00      0.00      0.01      0.58      0.00     87.46
Average:         34      0.21      0.00      0.44      4.83      0.00      0.00      0.01      0.65      0.00     93.86
Average:         35      0.14      0.00      0.42      1.58      0.00      0.00      0.00      0.83      0.00     97.04
Average:         36      0.09      0.00      0.38      0.83      0.00      0.00      0.00      0.69      0.00     98.01
Average:         37      0.11      0.00      0.51      0.23      0.00      0.00      0.00      0.96      0.00     98.19
Average:         38      0.10      0.00      0.52      0.14      0.00      0.00      0.00      1.22      0.00     98.01
Average:         39      0.18      0.00      0.98      0.07      0.00      0.00      0.00      2.61      0.00     96.16

Memory:

            kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
Average:     47021926  84769194     64.32     11051    803221 103119214     27.82  80827514    284451       169


Here is basically what hosted-engine-ha does with the storage, plus the LVM
operations which are part of those calls:

prvh2.ldipbm.int/var/log/vdsm/vdsm.log


Thread-890729::INFO::2016-04-19 12:52:40,164::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='07a8ed20-7b52-42f7-ad13-b3bc93ac3a3f', conList=[{'id': 'd2d3b2a6-d3b1-4e10-ad13-a1d4752d6eb6', 'connection': '10.2.2.13', 'iqn': 'iqn.1992-04.com.emc:cx.apm00143854803.a4', 'portal': '1', 'user': '', 'password': '********', 'port': '3260'}], options=None)

Thread-890729::DEBUG::2016-04-19 12:52:41,634::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3600601604110370001e6f6eeb9f5e411|/dev/mapper/36006016041103700437eb2849f79e511|/dev/mapper/3600601604110370073c8a8a80779e511|/dev/mapper/360060160411037008194fb521ff4e411|/dev/mapper/3600601604110370084d7340be6f4e411|/dev/mapper/360060160411037009231d9677873e511|/dev/mapper/36006016041103700994e4d130879e511|/dev/mapper/360060160411037009ba65f6a9f79e511|/dev/mapper/36006016041103700d21c652ae7f4e411|/dev/mapper/36006016041103700d3fdba270879e511|/dev/mapper/36006016041103700e221e8912671e511|/dev/mapper/36006016041103700e41717cd0779e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None)
Thread-890729::DEBUG::2016-04-19 12:52:46,046::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0



Thread-890733::INFO::2016-04-19 12:52:46,063::logUtils::48::dispatcher::(wrapper) Run and protect: getStorageDomainStats(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', options=None)
Thread-890733::DEBUG::2016-04-19 12:52:48,351::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3600601604110370001e6f6eeb9f5e411|/dev/mapper/36006016041103700437eb2849f79e511|/dev/mapper/3600601604110370073c8a8a80779e511|/dev/mapper/360060160411037008194fb521ff4e411|/dev/mapper/3600601604110370084d7340be6f4e411|/dev/mapper/360060160411037009231d9677873e511|/dev/mapper/36006016041103700994e4d130879e511|/dev/mapper/360060160411037009ba65f6a9f79e511|/dev/mapper/36006016041103700d21c652ae7f4e411|/dev/mapper/36006016041103700d3fdba270879e511|/dev/mapper/36006016041103700e221e8912671e511|/dev/mapper/36006016041103700e41717cd0779e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890733::DEBUG::2016-04-19 12:52:54,697::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Thread-890733::DEBUG::2016-04-19 12:52:56,759::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3600601604110370001e6f6eeb9f5e411|/dev/mapper/36006016041103700437eb2849f79e511|/dev/mapper/3600601604110370073c8a8a80779e511|/dev/mapper/360060160411037008194fb521ff4e411|/dev/mapper/3600601604110370084d7340be6f4e411|/dev/mapper/360060160411037009231d9677873e511|/dev/mapper/36006016041103700994e4d130879e511|/dev/mapper/360060160411037009ba65f6a9f79e511|/dev/mapper/36006016041103700d21c652ae7f4e411|/dev/mapper/36006016041103700d3fdba270879e511|/dev/mapper/36006016041103700e221e8912671e511|/dev/mapper/36006016041103700e41717cd0779e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890733::DEBUG::2016-04-19 12:53:01,532::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Thread-890733::DEBUG::2016-04-19 12:53:01,584::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3600601604110370001e6f6eeb9f5e411|/dev/mapper/36006016041103700437eb2849f79e511|/dev/mapper/3600601604110370073c8a8a80779e511|/dev/mapper/360060160411037008194fb521ff4e411|/dev/mapper/3600601604110370084d7340be6f4e411|/dev/mapper/360060160411037009231d9677873e511|/dev/mapper/36006016041103700994e4d130879e511|/dev/mapper/360060160411037009ba65f6a9f79e511|/dev/mapper/36006016041103700d21c652ae7f4e411|/dev/mapper/36006016041103700d3fdba270879e511|/dev/mapper/36006016041103700e221e8912671e511|/dev/mapper/36006016041103700e41717cd0779e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890733::DEBUG::2016-04-19 12:53:03,131::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0



Thread-890740::INFO::2016-04-19 12:53:03,232::logUtils::48::dispatcher::(wrapper) Run and protect: getImagesList(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', options=None)
Thread-890740::DEBUG::2016-04-19 12:53:04,008::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360060160411037008194fb521ff4e411|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890740::DEBUG::2016-04-19 12:53:04,388::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0


Thread-890742::INFO::2016-04-19 12:53:04,404::logUtils::48::dispatcher::(wrapper) Run and protect: getVolumesList(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='0f3378bb-d5c2-4915-b966-c6ec70327ac0', options=None)
Thread-890742::DEBUG::2016-04-19 12:53:05,436::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3600601604110370001e6f6eeb9f5e411|/dev/mapper/36006016041103700437eb2849f79e511|/dev/mapper/3600601604110370073c8a8a80779e511|/dev/mapper/360060160411037008194fb521ff4e411|/dev/mapper/3600601604110370084d7340be6f4e411|/dev/mapper/360060160411037009231d9677873e511|/dev/mapper/36006016041103700994e4d130879e511|/dev/mapper/360060160411037009ba65f6a9f79e511|/dev/mapper/36006016041103700d21c652ae7f4e411|/dev/mapper/36006016041103700d3fdba270879e511|/dev/mapper/36006016041103700e221e8912671e511|/dev/mapper/36006016041103700e41717cd0779e511|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890742::DEBUG::2016-04-19 12:53:06,623::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0



Thread-890743::INFO::2016-04-19 12:53:06,661::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='0f3378bb-d5c2-4915-b966-c6ec70327ac0', leafUUID='18e41e18-1e7f-45ac-806f-a3a58a450bcf')
Thread-890743::DEBUG::2016-04-19 12:53:06,663::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360060160411037008194fb521ff4e411|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890743::DEBUG::2016-04-19 12:53:08,013::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Thread-890745::INFO::2016-04-19 12:53:08,031::logUtils::48::dispatcher::(wrapper) Run and protect: getVolumesList(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='0d34df52-84ab-4251-bfac-3297850402a5', options=None)
Thread-890745::DEBUG::2016-04-19 12:53:08,033::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-39 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360060160411037008194fb521ff4e411|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 69837360-ab4b-48be-9285-c3882b3b1573 (cwd None)
Thread-890745::DEBUG::2016-04-19 12:53:09,575::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Repeating prepare images for all of the images:

Thread-890746::INFO::2016-04-19 12:53:09,591::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='0d34df52-84ab-4251-bfac-3297850402a5', leafUUID='4c250af8-6bcf-4bdf-9590-72dd54fcbebd')
Thread-890751::INFO::2016-04-19 12:53:16,726::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='392c2dc0-19be-44e9-8997-aed1baeeeeff', leafUUID='af5fcd22-e481-4394-b114-d35a4ce66339')
Thread-890753::INFO::2016-04-19 12:53:22,544::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='6bfc9d3d-eac0-4d98-b43a-c799bebebfc0', leafUUID='d74512e3-014c-4230-a23a-b4ade9e72ffd')
Thread-890759::INFO::2016-04-19 12:53:29,465::logUtils::48::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='69837360-ab4b-48be-9285-c3882b3b1573', spUUID='00000000-0000-0000-0000-000000000000', imgUUID='b0bf6072-d049-4d57-ab0a-3098015118a2', leafUUID='0e25b559-f986-4561-b4ed-f3af77690652')

Now, do we have to run all of these every time we run the monitoring? To me it
seems all of this should be run only conditionally, for example during the
first start and if there is a storage failure. This is the same as for any
other VM images: we do not periodically prepare the images or connect to the
storage.

(Originally by Roman Hodain)
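
A minimal sketch of the conditional flow proposed above, assuming a
hypothetical vdsm wrapper object; none of these names come from the actual
agent code:

class ConditionalStorageSetup:
    """Run the expensive setup path only on first start or after a failure."""

    def __init__(self, backend):
        self.backend = backend      # hypothetical vdsm wrapper object
        self.connected = False

    def ensure_storage(self):
        if self.connected and self.backend.storage_healthy():
            return                  # cheap path on every normal cycle
        # expensive path: first start, or recovery after a failure
        self.backend.connect_storage_server()   # connectStorageServer
        self.backend.prepare_images()           # prepareImage per image
        self.connected = True

    def monitoring_cycle(self):
        self.ensure_storage()
        return self.backend.read_stats()        # read-only monitoring work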

Comment 7 rhev-integ 2017-08-09 11:17:02 UTC
Nir, IIRC your storage monitoring enhancements are not yet in 4.0? Will they affect this flow as well?

(Originally by Yaniv Kaul)

Comment 10 rhev-integ 2017-08-09 11:17:20 UTC
Nir can you refer to comment 7?

(Originally by Roy Golan)

Comment 13 rhev-integ 2017-08-09 11:17:39 UTC
We call connectStorageServer and prepareImages every time to make sure we always have access to the images and symlinks we use for sanlock and the synchronization whiteboard.

It was assumed in the past that prepareImages would do nothing when the storage is already connected, and we wanted that to be a single atomic call.

(Originally by Martin Sivak)

Comment 14 rhev-integ 2017-08-09 11:17:46 UTC
(In reply to Martin Sivák from comment #12)
> We call connectStorageServer and prepareImages every time to make sure we
> always have access to the images and symlinks we use for sanlock and the
> synchronization whiteboard.
> 
> It was assumed in the past that prepareImages would do nothing when the
> storage is already connected, and we wanted that to be a single atomic call.

connectStorageServer is not something you want to call regularly.

Every call does:
- drop the entire lvm cache, forcing lvm commands to run again for each
  lvm-related call
- perform a scsi and fc scan (each can block for up to 30 seconds)
- run multipath to reload devices
- search all devices and resize multipath maps for resized devices
- finally, mark the storage cache as stale, possibly triggering another
  refresh the next time a storage domain is accessed

You should avoid calling connectStorageServer unless you think the storage
needs to be connected; this can be used in a recovery process, not for monitoring.

prepareImage will, on every call:
- list all lvs in the vg, possibly running lvm commands if
  the cache was invalidated. If you called connectStorageServer before, then
  it will probably need to refresh the lvm cache.
- recreate links in the repository
- activate lvs if needed, but there is never a need to while the engine VM
  is running

prepareImage is not related to monitoring and should not be called more than
once before you start the engine vm.

You can also use it as part of a recovery process, but not for monitoring.

For monitoring, you should use repoStats and the verbs for getting information
about storage domains and volumes. Please check how the engine monitors
storage.

(Originally by Nir Soffer)
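
A minimal sketch of the repoStats-first check recommended here, which is also
what the fix eventually does (the gerrit patches above add a repoStats call
before storage recovery). The cli.repo_stats wrapper below is hypothetical;
the per-domain 'valid' and 'code' fields follow the repoStats response format:

def sd_is_valid(cli, sd_uuid):
    """Cheap check: ask the domain monitor instead of rescanning storage."""
    stats = cli.repo_stats()        # hypothetical wrapper around repoStats
    dom = stats.get(sd_uuid)        # response maps domain UUID -> status dict
    if dom is None:
        return False                # domain is not being monitored at all
    return dom.get('valid', False) and dom.get('code') == 0

def ensure_storage(cli, sd_uuid, recover):
    if sd_is_valid(cli, sd_uuid):
        return      # healthy: skip the LVM-heavy connect/prepare path
    recover()       # connectStorageServer + prepareImage, as done before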

Comment 17 rhev-integ 2017-08-09 11:18:04 UTC
Nir, we can call it during recovery only, but first we need to detect that a recovery is needed. We would actually like to hear from the storage team how the VDSM API is supposed to be used. Trying to tie up all the loose ends from the engine code is error prone, as we have repeatedly seen...

Can you please give us a proper call sequence to check that the storage is still properly connected, all images are available and symlinks are ready?

(Originally by Martin Sivak)

Comment 18 rhev-integ 2017-08-09 11:18:10 UTC
Nir?

(Originally by Roy Golan)

Comment 19 rhev-integ 2017-08-09 11:18:16 UTC
(In reply to Martin Sivák from comment #16)
> Nir, we can call it during recovery only, but first we need to detect that a
> recovery is needed. We would actually like to hear from the storage team how
> the VDSM API is supposed to be used. Trying to tie up all the loose ends
> from the engine code is error prone, as we have repeatedly seen...

There is no documentation about how vdsm was supposed to be used, and the
original authors of vdsm have not been available for years.

When writing new flows with vdsm, the best approach is to check how the engine
uses vdsm, since vdsm was designed to provide services for the engine.

> Can you please give us a proper call sequence to check that the storage is
> still properly connected, all images are available and symlinks are ready?

For this I will have to check how the engine does it; I don't have any
magic answers.

General rules:

- storage domains - repoStats should be your main source. If you need more
  info, you can use other verbs returning info on storage domains.

- images - you don't need to check that they are available; once you prepared
  them, nobody is tearing them down, so there is nothing to check.

- symlinks - these are created when you connect storage domains and prepare
  images; you don't need to check them. This is an implementation detail of
  vdsm.

(Originally by Nir Soffer)

Comment 20 rhev-integ 2017-08-09 11:18:22 UTC
Do we have, in 4.0, such an issue?

(Originally by Yaniv Kaul)

Comment 21 rhev-integ 2017-08-09 11:18:28 UTC
Moving to 4.2 as we need to cooperate with the storage team on this.

(Originally by Martin Sivak)

Comment 28 Martin Sivák 2017-08-16 08:35:34 UTC
Reproducer steps are simple:
- Install hosted engine and watch what LVM does
- Then upgrade to the fixed version and watch again
- The number of LVM operations performed should be reduced, both at the system level and in the hosted-engine logs
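
To make "watch what LVM does" concrete, a small counting sketch over vdsm.log
(log path and command patterns taken from the excerpts above; adjust to the
environment):

import re

# Match the "/usr/sbin/lvm vgs ..." / "lvs ..." command lines that vdsm logs.
LVM_CMD = re.compile(r'/usr/sbin/lvm (vgs|lvs|pvs)\b')

def count_lvm_ops(log_path='/var/log/vdsm/vdsm.log'):
    with open(log_path, errors='replace') as f:
        return sum(1 for line in f if LVM_CMD.search(line))

if __name__ == '__main__':
    # Run once before and once after the upgrade and compare the counts.
    print('lvm invocations logged:', count_lvm_ops())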

Comment 29 Artyom 2017-08-17 13:03:33 UTC
Verified on ovirt-hosted-engine-ha-2.1.5-1.el7ev.noarch
MainThread::INFO::2017-08-17 16:00:54,374::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2017-08-17 16:00:54,375::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) Validating storage server
MainThread::INFO::2017-08-17 16:00:57,650::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Storage domain reported as valid and reconnect is not forced

Comment 31 errata-xmlrpc 2017-08-22 17:45:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2526

