Created attachment 922462 [details]
Engine log

Description of problem:
Got into a state where two hosts were up but neither of them was the SPM.

Version-Release number of selected component (if applicable):
oVirt Engine Version: 3.5.0-0.0.master.20140722232056.git8e1babc.fc19

How reproducible:
Unknown

Steps to Reproduce:
1. Set up an iSCSI data center with 30 storage domains and 2 hosts running Fedora 19.
2. Remove the two hosts and add them back, one host using jsonrpc and the other using xmlrpc.
3. Activate both hosts.
4. Switch the SPM role between the hosts several times.
   On one switch, the host that should have become the new SPM became the SPM, immediately lost the role, and the other host became the SPM.
5. Put the hosts into maintenance and wait until all 30 domain monitors are stopped.
6. Remove both hosts and add them back using jsonrpc.

Actual results:
One host came up but did not become the SPM, although all domains were up. The other host remained in "Initializing" state. After a few minutes, the host became non-operational. Activating the host made it "Unassigned". After a few minutes the host became non-operational again and I moved it into maintenance. After activating the host, it was finally up, but still no host became the SPM.

Expected results:
Both hosts in up state, one of them the SPM.

Workaround:
Restart ovirt-engine.
Created attachment 922463 [details] vdsm log from host that was up
Created attachment 922464 [details] vdsm log from host that had trouble becoming up
Additional info: After putting both hosts into maintenance, all storage domains in the data center remained up(!) - without any active host in the data center.
Nir, did you try the same scenario with XMLRPC?
(In reply to Allon Mureinik from comment #4)
> Nir, did you try the same scenario with XMLRPC?

In step 1 there was one host using xmlrpc. After the problem state was reached, I switched both hosts to xmlrpc and it did not change the engine state. I did not try to repeat the whole test using xmlrpc on both hosts.
Nir, next time it would be great if you could attach only the relevant timeframe of the log, or specify which part is relevant, and reproduce the issue with the minimal steps required. It's hard to track the provided log as it spans a long time and contains many different operations.

Let's handle the issues separately here - this bug describes multiple issues in one report.

1. Host being activated and moving to non-operational:
The host moves to non-operational because the connect to storage pool operation takes more than 3 minutes on VDSM, which causes it to be considered timed out in the engine.
-----------------------------------------------------
Thread-21::INFO::2014-07-30 00:53:32,519::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'2440ff3d-275f-42e6-b204-7d055b26b174', hostID=2, msdUUID=u'983111eb-5fea-4899-833b-305e6fb91b47', masterVersion=740, domainsMap=None, options=None)
Thread-21::INFO::2014-07-30 00:57:54,293::logUtils::47::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True
-----------------------------------------------------

2. Host doesn't become the SPM:
From what I've seen in the log (and if that's not the correct timeframe please let me know), voodoo3 does become the SPM; the issue might be a UI refresh issue.
-----------------------------------------------------
2014-07-30 00:46:27,614 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-30) [6a4da444] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@7a013213, log id: 415fe407
2014-07-30 00:46:27,622 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-30) [6a4da444] Initialize Irs proxy from vds: voodoo3.tlv.redhat.com
2014-07-30 00:46:27,638 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-30) [6a4da444] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Storage Pool Manager runs on Host voodoo3 (Address: voodoo3.tlv.redhat.com).
-----------------------------------------------------
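For reference, the connectStoragePool duration implied by the two dispatcher lines in point 1 can be checked with a quick sketch. The 180-second figure below is an assumption based on the "more than 3 minutes" statement above; the exact engine config value may differ:

```python
from datetime import datetime

# Timestamps copied from the two dispatcher lines quoted in point 1.
FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2014-07-30 00:53:32,519", FMT)
end = datetime.strptime("2014-07-30 00:57:54,293", FMT)

duration = (end - start).total_seconds()
# 261.774 seconds - well over an assumed 180-second engine timeout.
print("connectStoragePool took %.3f seconds" % duration)
```

Any vdsm.log pair of "Run and protect" entry/exit lines for the same thread can be measured the same way.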
(In reply to Liron Aravot from comment #6)
> Let's handle the issues separately here - this bug describes multiple issues
> in one bug.

I mentioned two issues (no SPM, storage domains up when all hosts are down), but they are probably related, both caused by the same issue. If not, you can open another bug for the separate issue.

> 1.
> Host being activated and moves to non-operational:
> The host moves to non-operational because the connect to storage pool
> operation takes more than 3 minutes on VDSM, which causes it to be
> considered timed out in the engine.
>
> -----------------------------------------------------
> Thread-21::INFO::2014-07-30 00:53:32,519::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID=u'2440ff3d-275f-42e6-b204-7d055b26b174', hostID=2, msdUUID=u'983111eb-5fea-4899-833b-305e6fb91b47', masterVersion=740, domainsMap=None, options=None)
> Thread-21::INFO::2014-07-30 00:57:54,293::logUtils::47::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True
> -----------------------------------------------------

Sure, but why did the connect to storage pool take so long?

> 2.
> Host doesn't become the SPM:
> From what I've seen in the log (and if that's not the correct timeframe
> please let me know), voodoo3 does become the SPM; the issue might be a UI
> refresh issue.
> -----------------------------------------------------
> 2014-07-30 00:46:27,614 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (DefaultQuartzScheduler_Worker-30) [6a4da444] FINISH, SpmStartVDSCommand, return: org.ovirt.engine.core.common.businessentities.SpmStatusResult@7a013213, log id: 415fe407
> 2014-07-30 00:46:27,622 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData] (DefaultQuartzScheduler_Worker-30) [6a4da444] Initialize Irs proxy from vds: voodoo3.tlv.redhat.com
> 2014-07-30 00:46:27,638 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-30) [6a4da444] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Storage Pool Manager runs on Host voodoo3 (Address: voodoo3.tlv.redhat.com).
> -----------------------------------------------------

Hopefully this is the case - we can also see in the vdsm logs if one of the hosts became the SPM.

The relevant engine log starts at Jul 29, about 19:00.

The vdsm logs start some time before this issue was seen.

I don't have more information; all the info is in the logs.
(In reply to Nir Soffer from comment #7)
> Sure, but why did the connect to storage pool take so long?

We should inspect why the connect takes that long in that scenario, but that's not an engine issue.
---------------------------------------------
> Hopefully this is the case - we can also see in the vdsm logs if one of the
> hosts became the SPM.
>
> The relevant engine log starts at Jul 29, about 19:00.
>
> The vdsm logs start some time before this issue was seen.
>
> I don't have more information; all the info is in the logs.

From what I've seen in the log (and if that's not the correct timeframe please let me know), voodoo3 does become the SPM; the issue might be a UI refresh issue.
Ori, this seems like the UI refresh issue that you encountered last week, please close this as a duplicate of the opened bug on that issue. thanks.
I haven't opened that one, someone beat me to it.
(In reply to Ori from comment #10) > I haven't opened that one,someone beat me to it Ori, do we have a BZ number for it?
Allon, by someone I meant Nir :)
So, do we have a bug on that already? Nir?
Talked offline with Liron. This is the bug. Removing the needinfo.

Eli - can you check if this reproduces? Play a bit with the hosts and see if the tab is being refreshed? I'm afraid this one will be hard to reproduce, but let's try.
(In reply to Oved Ourfali from comment #14)
> Talked offline with Liron. This is the bug. Removing the needinfo.
>
> Eli - can you check if this reproduces?
> Play a bit with the hosts and see if the tab is being refreshed?
> I'm afraid this one will be hard to reproduce, but let's try.

Indeed, I see no issue with the UI auto refresh, but I am still missing info.

Nir, if you can reproduce, the following info is mandatory:
1) You wrote in the BZ description that the workaround is an engine restart. If this BZ claims that "Hosts tab isn't being refreshed automatically", we can refresh the information on the Hosts tab manually - does this work?
2) If 1) does not work, please issue a REST GET on api/hosts - do you get the correct information?
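As a sketch of step 2, such a GET can be built with only the Python standard library. The engine address and credentials below are placeholders, not values from this bug:

```python
import base64
import urllib.request

# Hypothetical engine address and credentials - replace with real values.
ENGINE = "https://engine.example.com"
USER, PASSWORD = "admin@internal", "secret"

# Build an authenticated GET request for the hosts collection.
req = urllib.request.Request(ENGINE + "/api/hosts")
token = base64.b64encode(("%s:%s" % (USER, PASSWORD)).encode()).decode()
req.add_header("Authorization", "Basic " + token)
req.add_header("Accept", "application/xml")

# Uncomment to run against a live engine and compare the host <status>
# elements with what the UI shows:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

If the XML reports the correct SPM host while the UI does not, that would support the UI refresh theory.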