Bug 1304844

Summary: [scale] Long delays in updating of web admin "events" pane after many long running storage operations
Product: [oVirt] ovirt-engine Reporter: mlehrer
Component: Frontend.WebAdmin Assignee: Martin Perina <mperina>
Status: CLOSED WONTFIX QA Contact: Pavel Stehlik <pstehlik>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.6.2.5 CC: bugs, mlehrer, oourfali
Target Milestone: --- Flags: sbonazzo: ovirt-4.2-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-04 12:17:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Example of lag between events messages after long running operations none

Description mlehrer 2016-02-04 18:41:59 UTC
Created attachment 1121178 [details]
Example of lag between events messages after long running operations

Description of problem:

After multiple long-running storage domain operations, the Events pane in the web admin shows delayed event messages.  The lag between what the "last message" pane shows and the actual event occurrence or status can last for several minutes.

E.g., see the attachment for a picture of a host whose status differs from what the Events pane shows, for up to several minutes.



Version-Release number of selected component (if applicable):

vdsm-hook-vmfex-dev-4.17.17-0.el7ev.noarch
vdsm-python-4.17.17-0.el7ev.noarch
vdsm-yajsonrpc-4.17.17-0.el7ev.noarch
vdsm-4.17.17-0.el7ev.noarch
vdsm-xmlrpc-4.17.17-0.el7ev.noarch
vdsm-jsonrpc-4.17.17-0.el7ev.noarch
vdsm-cli-4.17.17-0.el7ev.noarch
vdsm-infra-4.17.17-0.el7ev.noarch
rhevm-*-3.6.2.5-0.1

Env details:
-------------------
in 1 Cluster:
50 total SDs of which 
   21 are iSCSI
   30 are NFS

15 running VMs
2  Hosts



How reproducible:

Requires long running operations, and happens over time.
Not easily reproducible within a few clicks.

Steps to Reproduce:

1. Perform some long-running storage operations, such as domain attachment, domain creation, creation of VMs from pools, or disk migration of large disks.

2. After several long-running operations, notice the Events pane in the UI lagging several minutes behind the actual events.  The difference between the pane's updates and the actual changes should become noticeable at that point.

Actual results:

The latest messages in the Events pane aren't current and lag by several minutes.

Expected results:

The Events pane shows up-to-date status within 15 seconds of a change.
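
To make the 15-second expectation checkable, here is a minimal sketch of how the lag could be quantified. It assumes you already have, for each event, the time it occurred and the time the Events pane first displayed it; how those timestamps are collected is not specified in this report, and the names `lagging_events` and `MAX_LAG` are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold taken from the expected result: updates within 15 seconds.
MAX_LAG = timedelta(seconds=15)

def lagging_events(samples, max_lag=MAX_LAG):
    """Return (event_id, lag) pairs whose display lag exceeds max_lag.

    samples: iterable of (event_id, occurred_at, shown_at) tuples, where
    occurred_at is when the event happened and shown_at is when the
    Events pane first displayed it (both datetime objects).
    """
    late = []
    for event_id, occurred_at, shown_at in samples:
        lag = shown_at - occurred_at
        if lag > max_lag:
            late.append((event_id, lag))
    return late

if __name__ == "__main__":
    # Made-up sample data for illustration only, not measurements
    # from the environment described in this bug.
    t0 = datetime(2016, 2, 4, 18, 0, 0)
    demo = [
        (1, t0, t0 + timedelta(seconds=5)),
        (2, t0, t0 + timedelta(minutes=3)),
    ]
    for event_id, lag in lagging_events(demo):
        print(f"event {event_id} displayed {lag.total_seconds():.0f}s late")
```

An empty result would mean every sampled event met the 15-second expectation.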

Additional info:

Noticed this while focusing on scale storage scenarios.

Comment 1 Oved Ourfali 2016-02-29 10:02:50 UTC
Mordechai - not sure the use case of using half FC half iSCSI is reflecting a real use-case.
Do you see the same when there are a lot of storage operations in general?

Comment 2 mlehrer 2016-02-29 13:24:44 UTC
(In reply to Oved Ourfali from comment #1)
> Mordechai - not sure the use case of using half FC half iSCSI is reflecting
> a real use-case.

The half *NFS* / half iSCSI mix was suggested to cover domain scale testing built from existing customer issues with a small growth factor for simulated customer datasets. I agree the number of domains is high, but this was intended.

> Do you see the same when there are a lot of storage operations in general?

Currently I don't have data from an environment that has only a few SDs but is also heavy in storage operations.  The environments we checked under heavy storage operations also contained multiple domains, as described above.

It seems that simply having many (50) domains won't reproduce this delayed event behavior; it's necessary to have executed some long-running storage operations in addition to having many domains.  Further investigation would be necessary to see the effect of many long-running operations in an environment with fewer SDs.

Comment 3 Sandro Bonazzola 2016-05-02 09:56:36 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 4 Yaniv Lavi 2016-05-23 13:17:49 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 5 Yaniv Lavi 2016-05-23 13:24:30 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 7 Oved Ourfali 2017-07-04 12:17:49 UTC
I don't see us prioritizing this at the moment.
Closing as wontfix.