Bug 1068926

Summary: engine: host stuck on Unassigned when moving from status Maintenance when storage is not available from the host
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: ovirt-engine
Assignee: Liran Zelkha <lzelkha>
Status: CLOSED CURRENTRELEASE
QA Contact: sefi litmanovich <slitmano>
Severity: high
Priority: unspecified
Version: 3.2.0
CC: acanan, acathrow, bazulay, dron, emesika, iheim, jkt, juan.hernandez, knesenko, lpeer, lzelkha, pep, perobins, pstehlik, pzhukov, Rhev-m-bugs, rhodain, slitmano, talayan, yeylon, yzaslavs
Keywords: Reopened, ZStream
Target Release: 3.3.2
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version: is35.1
Doc Type: Bug Fix
Clone Of: 962180
Last Closed: 2014-04-13 07:49:41 UTC
oVirt Team: Infra
Bug Depends On: 962180
Attachments:
Description              Flags
screenshot               none
engine.log               none
engine log - 02.04.14    none
screenshot - 02.04.14    none

Comment 2 Tareq Alayan 2014-03-16 12:39:20 UTC
Created attachment 875129 [details]
screenshot

1. Have two hosts in an iSCSI data centre, both in the Up state.
2. Put the HSM host into maintenance and block its traffic to the storage domain.
3. Activate the host again (move it back to Up).

Result:
The host goes to the Unassigned state for ~2 minutes and then goes to the Non Operational state.

However, in the event log tab of the management UI you can see an event saying:
State was set to Up for host monique-vds01.tlv.redhat.com

right after the event: Host monique-vds01.tlv.redhat.com was activated by admin@internal

Comment 3 Tareq Alayan 2014-03-16 12:40:30 UTC
Created attachment 875130 [details]
engine.log

Comment 4 Tareq Alayan 2014-03-16 12:41:30 UTC
failqa per comment 2

Comment 5 Liran Zelkha 2014-03-16 13:30:29 UTC
Hi Tareq,

I can see in the log that:
1. Host was switched to unassigned
2. Host was switched to up
3. Host was switched to non operational

Is that what you see in the audit log too? I think this is the expected behaviour, no?

Comment 6 Tareq Alayan 2014-03-16 14:20:15 UTC
No. In the audit log I can see "State was set to up for Host monique-vds01.tlv.redhat.com".
This message is wrong. What really happens is that the host goes into Unassigned and then into Non Operational.

This message should be removed from the event log because it is wrong and confusing.

Comment 8 sefi litmanovich 2014-04-02 15:16:22 UTC
I got a strange behaviour; I'm not sure if it's the expected one:

1. Added two hosts to iSCSI DC-CLUSTER_SD.
2. Both hosts are connected to storage and Up.
3. Put the HSM host into maintenance.
4. On the HSM host: iptables -A OUTPUT -p tcp --dport 3260 -j DROP
5. At this point the HSM host became Unassigned for 2 minutes, then Non Operational; after 3 more minutes it went back to Up for a few seconds and then started another loop of the same Unassigned / Non Operational / Up cycle.
Only after I changed iptables to allow the connection to the storage again was I able to bring the HSM host to the Up state.

I'll attach my engine.log and a screenshot of the audit log.

Comment 9 sefi litmanovich 2014-04-02 15:17:05 UTC
Created attachment 881863 [details]
engine log - 02.04.14

Comment 10 sefi litmanovich 2014-04-02 15:17:56 UTC
Created attachment 881864 [details]
screenshot - 02.04.14

Comment 11 Zac Dover 2014-04-03 02:26:16 UTC
This bug is currently attached to errata RHBA-2014:17286. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Comment 12 Eli Mesika 2014-04-07 09:22:22 UTC
(In reply to sefi litmanovich from comment #8)
> Got a strange behaviour, not sure if it's the expected one:
> 
> 1. added two hosts to iscsi DC-CLUSTER_SD
> 2. both hosts are connected to storage and up
> 3. put hsm to maintenance state
> 4. on hsm: iptables -A OUTPUT -p tcp --dport 3260 -j DROP
> 5. at this point hsm became 'unassigned' for 2 minutes, then
> non-operational, and after 3 more minutes hsm went back to 'up' state for
> few seconds then started another "loop" of the same
> unassigned-non-operational-up-etc....
> only after I set iptables to allow connection to storage again I was able to
> put hsm to 'up' state.
> 
> I'll attach my engine.log and a snapshot of the auditlog.

Please retry this test with REJECT rather than DROP
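The blocking and unblocking steps used in this reproduction can be scripted. The sketch below is hypothetical (the helper names `block_rule` and `unblock_rule` are illustrative, not part of any RHEV tooling); it only builds the iptables argument lists for the HSM host, using port 3260 as in comment 8. The practical difference between the two targets: with DROP the packets are silently discarded, so connection attempts hang until they time out, while with REJECT the connection attempt fails immediately with an error. Restoring access uses `-D` with the same rule specification that `-A` added.

```python
"""Hypothetical helper: builds (but does not run) the iptables commands
used to block outgoing iSCSI traffic on the HSM host and to restore it."""

ISCSI_PORT = 3260  # iSCSI target port blocked in the reproduction steps


def block_rule(target: str = "REJECT") -> list[str]:
    """Return argv that appends an OUTPUT rule blocking TCP to the iSCSI port.

    "DROP": packets are silently discarded, connections hang until timeout.
    "REJECT": the connection attempt fails immediately with an error.
    """
    if target not in ("DROP", "REJECT"):
        raise ValueError(f"unsupported target: {target}")
    return ["iptables", "-A", "OUTPUT", "-p", "tcp",
            "--dport", str(ISCSI_PORT), "-j", target]


def unblock_rule(target: str = "REJECT") -> list[str]:
    """Return argv that deletes the same rule (-D with an identical spec)."""
    cmd = block_rule(target)
    cmd[1] = "-D"  # swap append (-A) for delete (-D)
    return cmd


if __name__ == "__main__":
    # Only prints the commands; run them as root on the HSM host
    # (e.g. via subprocess.run(cmd, check=True)) to actually apply them.
    for tgt in ("DROP", "REJECT"):
        print(" ".join(block_rule(tgt)))
        print(" ".join(unblock_rule(tgt)))
```

Building argument lists instead of shell strings keeps the block and unblock rules guaranteed identical apart from `-A`/`-D`, which is what iptables needs in order to delete the right rule.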

Comment 13 sefi litmanovich 2014-04-07 13:27:43 UTC
Retried just now with REJECT instead of DROP.

The result is pretty much the same.
The host becomes Unassigned for about 2 minutes, then the engine tries to set it to Up (the event log says: "State was set to up on host"). It remains Unassigned until the engine fails to connect the host to the storage, reports the expected message and moves the host to Non Operational. After a few minutes the engine tries to set the host to Up again and it becomes Unassigned.
This loop seems to last forever; this cannot be the expected result.

Comment 14 Liran Zelkha 2014-04-08 05:39:25 UTC
Sefi, can you send me an environment where you reproduced this bug? I am kind of shooting in the dark here...

Comment 16 Barak 2014-04-08 14:27:51 UTC
(In reply to sefi litmanovich from comment #13)
> Retried just now with REJECT instead of DROP.
> 
> result is pretty much the same.
> host becomes unassigned for about 2 minutes then engine tries to set host to
> up state (event log says:"State was set to up on host") it remains
> unassigned until it fails to connect the host to the storage, gives the
> expected message and moves host to non-operational. after few minutes engine
> tries to set host to up again and it becomes unassigned.
> this loop can last forever it seems, this cannot be the expected result.

This is exactly the expected behaviour in these situations.

The host recovery mechanism wakes up every couple of minutes and tries to reconnect and bring everything to up.

Moving this bug back to ON_QA so you guys can move it to VERIFIED.
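The loop reported in comments 8 and 13, driven by the periodic host recovery mechanism, can be pictured as a small state machine. This is a minimal model under stated assumptions, not the ovirt-engine implementation: the state names mirror the audit log, `storage_reachable` is a hypothetical callback standing in for the storage connection check, the "every couple of minutes" wake-up is abstracted into discrete attempts, and `max_attempts` exists only so the model terminates (per comment 13, the real retries keep going while storage stays unreachable). The misleading "State was set to Up" event is not modelled.

```python
from enum import Enum
from typing import Callable, List


class HostStatus(Enum):
    MAINTENANCE = "Maintenance"
    UNASSIGNED = "Unassigned"
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


def activate(storage_reachable: Callable[[], bool]) -> List[HostStatus]:
    """One activation attempt (hypothetical model).

    The host sits in Unassigned while the engine tries to connect it to
    storage; it ends Up if storage is reachable, otherwise Non Operational.
    """
    if storage_reachable():
        return [HostStatus.UNASSIGNED, HostStatus.UP]
    return [HostStatus.UNASSIGNED, HostStatus.NON_OPERATIONAL]


def run(storage_reachable: Callable[[], bool], max_attempts: int) -> List[HostStatus]:
    """Admin activates the host from Maintenance; the recovery job retries
    each time the host is left Non Operational, up to max_attempts."""
    history = [HostStatus.MAINTENANCE]
    for _ in range(max_attempts):
        history += activate(storage_reachable)
        if history[-1] is HostStatus.UP:
            break  # host recovered, nothing left to retry
    return history


if __name__ == "__main__":
    # Storage blocked for the first two attempts, restored before the third.
    attempts = iter([False, False, True])
    print([s.value for s in run(lambda: next(attempts), max_attempts=5)])
    # -> ['Maintenance', 'Unassigned', 'NonOperational',
    #     'Unassigned', 'NonOperational', 'Unassigned', 'Up']
```

With `storage_reachable` always returning `False`, the history is Maintenance followed by repeated Unassigned / NonOperational pairs, which matches the cycle described in comment 13 and ends only when storage access is restored.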

Comment 18 sefi litmanovich 2014-04-09 08:35:30 UTC
Verified according to comments 8, 13 and 16,
using rhevm-3.3.2-0.50.el6ev.noarch.

Comment 19 Kiril Nesenko 2014-04-13 07:49:41 UTC
Was released with 3.3.2