1068926 – engine: host stuck on Unassigned when moving from status Maintenance when storage is not available from the host

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.2.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.3.2
Assignee:	Liran Zelkha
QA Contact:	sefi litmanovich
Docs Contact:
URL:
Whiteboard:	infra
Depends On:	962180
Blocks:

Reported:	2014-02-23 09:28 UTC by rhev-integ
Modified:	2019-04-28 10:03 UTC
CC List:	21 users
Fixed In Version:	is35.1
Doc Type:	Bug Fix
Doc Text:
Clone Of:	962180
Environment:
Last Closed:	2014-04-13 07:49:41 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Attachments
screenshot (2.36 MB, image/png) 2014-03-16 12:39 UTC, Tareq Alayan	no flags
engine.log (74.99 KB, application/x-tar-gz) 2014-03-16 12:40 UTC, Tareq Alayan	no flags
engine log - 02.04.14 (277.99 KB, application/x-tar-gz) 2014-04-02 15:17 UTC, sefi litmanovich	no flags
screenshot - 02.04.14 (2.32 MB, image/png) 2014-04-02 15:17 UTC, sefi litmanovich	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	25303	0	None	MERGED	core: Ensure NonOperational state is saved	2020-07-03 14:22:09 UTC
oVirt gerrit	25865	0	None	MERGED	core: host stuck on Unassigned when moving from status Maintenance	2020-07-03 14:22:08 UTC

Comment 2 Tareq Alayan 2014-03-16 12:39:20 UTC

Created attachment 875129 [details]
screenshot

1. Have 2 hosts in an iSCSI data centre, both in Up state.
2. Put the HSM host into maintenance and block its traffic to the storage domain.
3. Put the host back to Up.

Result:
The host goes to Unassigned for ~2 minutes and then goes to Non Operational.

However, the event log tab in the management interface shows an event saying:
State was set to Up for host monique-vds01.tlv.redhat.com

right after the event: Host monique-vds01.tlv.redhat.com was activated by admin@internal

Comment 3 Tareq Alayan 2014-03-16 12:40:30 UTC

Created attachment 875130 [details]
engine.log

Comment 4 Tareq Alayan 2014-03-16 12:41:30 UTC

Failed QA, per comment 2.

Comment 5 Liran Zelkha 2014-03-16 13:30:29 UTC

Hi Tareq,

I can see in the log that:
1. Host was switched to unassigned
2. Host was switched to up
3. Host was switched to non operational

Is that what you see in the audit log too? I think this is the expected behaviour, no?

Comment 6 Tareq Alayan 2014-03-16 14:20:15 UTC

No. In the audit log I can see "State was set to up for Host monique-vds01.tlv.redhat.com".
This message is wrong. What really happens is that the host goes into Unassigned and then into Non Operational.

Please remove this message from the event log, because it is wrong and confusing.

Comment 8 sefi litmanovich 2014-04-02 15:16:22 UTC

I got strange behaviour and am not sure whether it is the expected one:

1. Added two hosts to iscsi DC-CLUSTER_SD.
2. Both hosts are connected to storage and up.
3. Put the HSM into maintenance state.
4. On the HSM: iptables -A OUTPUT -p tcp --dport 3260 -j DROP
5. At this point the HSM became 'unassigned' for 2 minutes, then non-operational, and after 3 more minutes it went back to 'up' state for a few seconds, then started another "loop" of the same unassigned, non-operational, up, and so on.
Only after I set iptables to allow connections to storage again was I able to put the HSM into 'up' state.

I'll attach my engine.log and a screenshot of the audit log.

Comment 9 sefi litmanovich 2014-04-02 15:17:05 UTC

Created attachment 881863 [details]
engine log - 02.04.14

Comment 10 sefi litmanovich 2014-04-02 15:17:56 UTC

Created attachment 881864 [details]
screenshot - 02.04.14

Comment 11 Zac Dover 2014-04-03 02:26:16 UTC

This bug is currently attached to errata RHBA-2014:17286. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Comment 12 Eli Mesika 2014-04-07 09:22:22 UTC

(In reply to sefi litmanovich from comment #8)
> Got a strange behaviour, not sure if it's the expected one:
> 
> 1. added two hosts to iscsi DC-CLUSTER_SD
> 2. both hosts are connected to storage and up
> 3. put hsm to maintenance state
> 4. on hsm: iptables -A OUTPUT -p tcp --dport 3260 -j DROP
> 5. at this point hsm became 'unassigned' for 2 minutes, then
> non-operational, and after 3 more minutes hsm went back to 'up' state for
> few seconds then started another "loop" of the same
> unassigned-non-operational-up-etc....
> only after I set iptables to allow connection to storage again I was able to
> put hsm to 'up' state.
> 
> I'll attach my engine.log and a snapshot of the auditlog.

Please retry this test with REJECT rather than DROP
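
For reference, the DROP and REJECT variants of step 4, and the matching cleanup, can be generated as in the sketch below. It only builds the command lines and does not run them. The helper name iscsi_block_rule is hypothetical; the iptables options are the ones quoted in step 4, plus the standard -D (delete rule) option for undoing it. DROP silently discards the packets, so connection attempts hang until they time out, while REJECT sends an error back so they fail quickly.

```python
import shlex
from typing import List

ISCSI_PORT = 3260  # iSCSI port used in step 4 of the reproduction


def iscsi_block_rule(action: str, target: str = "DROP",
                     port: int = ISCSI_PORT) -> List[str]:
    """Build the iptables command that adds or deletes the blocking rule.

    action: "-A" appends the rule, "-D" deletes the same rule again.
    target: "DROP" (silently discard) or "REJECT" (refuse with an error).
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be -A (append) or -D (delete)")
    if target not in ("DROP", "REJECT"):
        raise ValueError("target must be DROP or REJECT")
    return ["iptables", action, "OUTPUT", "-p", "tcp",
            "--dport", str(port), "-j", target]


# Render the rule from step 4 and its cleanup as shell command lines.
block_cmd = shlex.join(iscsi_block_rule("-A", "REJECT"))
unblock_cmd = shlex.join(iscsi_block_rule("-D", "REJECT"))
```

The resulting strings are meant to be run as root on the HSM host; deleting with -D requires the same target that was used when the rule was added.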

Comment 13 sefi litmanovich 2014-04-07 13:27:43 UTC

Retried just now with REJECT instead of DROP.

The result is pretty much the same.
The host becomes unassigned for about 2 minutes, then the engine tries to set the host to up state (the event log says: "State was set to up on host"). It remains unassigned until the engine fails to connect the host to the storage, gives the expected message, and moves the host to non-operational. After a few minutes the engine tries to set the host to up again and it becomes unassigned.
This loop can last forever, it seems; this cannot be the expected result.

Comment 14 Liran Zelkha 2014-04-08 05:39:25 UTC

Sefi, can you send me an environment where you reproduced this bug? I am kind of shooting in the dark here...

Comment 16 Barak 2014-04-08 14:27:51 UTC

(In reply to sefi litmanovich from comment #13)
> Retried just now with REJECT instead of DROP.
> 
> result is pretty much the same.
> host becomes unassigned for about 2 minutes then engine tries to set host to
> up state (event log says:"State was set to up on host") it remains
> unassigned until it fails to connect the host to the storage, gives the
> expected message and moves host to non-operational. after few minutes engine
> tries to set host to up again and it becomes unassigned.
> this loop can last forever it seems, this cannot be the expected result.

This is exactly the expected behaviour in these situations.

The host recovery mechanism wakes up every couple of minutes and tries to reconnect and bring everything to up.

So moving this bug back to ON_QA so you guys can move it to VERIFIED.
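
The retry loop described in this thread can be modelled roughly as below. This is an illustrative sketch only, not the engine's code: the names HostStatus, activation_attempt and simulate are hypothetical, each list entry stands for one activation or recovery attempt, and the short-lived "Up" audit event reported in earlier comments is deliberately left out.

```python
from enum import Enum
from typing import List


class HostStatus(Enum):
    MAINTENANCE = "Maintenance"
    UNASSIGNED = "Unassigned"
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


def activation_attempt(storage_reachable: bool) -> List[HostStatus]:
    """Statuses the host passes through during one attempt."""
    # The host is Unassigned while the attempt is in progress.
    seen = [HostStatus.UNASSIGNED]
    if storage_reachable:
        seen.append(HostStatus.UP)
    else:
        # Connecting the host to storage failed.
        seen.append(HostStatus.NON_OPERATIONAL)
    return seen


def simulate(storage_by_attempt: List[bool]) -> List[HostStatus]:
    """Replay attempts until the host is Up or the attempts run out."""
    history = [HostStatus.MAINTENANCE]
    for reachable in storage_by_attempt:
        history.extend(activation_attempt(reachable))
        if history[-1] is HostStatus.UP:
            break  # recovery succeeded, no further retries needed
    return history


# Storage blocked for two attempts, then reachable again.
history = simulate([False, False, True])
```

With storage blocked the model alternates between Unassigned and NonOperational for as long as attempts continue, which matches the loop reported in comments 8 and 13; the host settles on Up only once storage is reachable again.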

Comment 18 sefi litmanovich 2014-04-09 08:35:30 UTC

Verified according to comments 8, 13 and 16,
using rhevm-3.3.2-0.50.el6ev.noarch

Comment 19 Kiril Nesenko 2014-04-13 07:49:41 UTC

Was released with 3.3.2
