Bug 988339

Summary: [scale] race - sometimes VM and VDS statuses is not being updated (host stuck in unassigned)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Pavel Zhukov <pzhukov>
Component: ovirt-engine
Assignee: Roy Golan <rgolan>
Status: CLOSED CURRENTRELEASE
QA Contact: Yuri Obshansky <yobshans>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.2.0
CC: acathrow, bazulay, emarcian, ernest.beinrohr, hchiramm, iheim, lpeer, lyarwood, mkalinin, nobody, pep, perobins, pstehlik, pzhukov, rbinkhor, rgolan, Rhev-m-bugs, srevivo, talayan, yeylon, yzaslavs
Target Milestone: ---
Keywords: ZStream
Target Release: 3.4.0
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when a host was stuck in an unassigned state, it could also cause virtual machines on other hosts to stop updating their status. This update adds a concurrent hash map for the internal event queue, which fixes this issue.
Story Points: ---
Clone Of:
: 1060700
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1008634, 1060700, 1078909, 1142926
Attachments:
Description Flags
eventq.btm none
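
Illustration of the Doc Text fix:
The Doc Text field says the fix adds a concurrent hash map for the internal event queue. The following is only a minimal sketch of that general technique, not the actual ovirt-engine patch: the class name HostEventQueue, its methods, and the host and event strings are all hypothetical. It shows a per-host queue map that many threads can write to at once without losing events. The host count of 37 in main is borrowed from the reproduction attempt in comment 34.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical per-host event queue; NOT the real ovirt-engine class. */
public class HostEventQueue {
    // One pending-event queue per host id. ConcurrentHashMap keeps lookups and
    // inserts safe when many threads submit events for different hosts at once.
    private final ConcurrentMap<String, Queue<String>> pending = new ConcurrentHashMap<>();

    /** Enqueue an event; the per-host queue is created atomically on first use. */
    public void submit(String hostId, String event) {
        pending.computeIfAbsent(hostId, id -> new ConcurrentLinkedQueue<>()).add(event);
    }

    /** Number of events still waiting for the given host. */
    public int pendingCount(String hostId) {
        Queue<String> q = pending.get(hostId);
        return q == null ? 0 : q.size();
    }

    /** Submits events from several threads and returns the total number queued. */
    public static int demo(int hosts, int eventsPerHost) {
        HostEventQueue queue = new HostEventQueue();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch done = new CountDownLatch(hosts * eventsPerHost);
        for (int h = 0; h < hosts; h++) {
            String hostId = "host-" + h;
            for (int e = 0; e < eventsPerHost; e++) {
                String event = "status-update-" + e;
                pool.execute(() -> {
                    queue.submit(hostId, event);
                    done.countDown();
                });
            }
        }
        try {
            done.await(); // wait until every submission has completed
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", ex);
        } finally {
            pool.shutdown();
        }
        int total = 0;
        for (int h = 0; h < hosts; h++) {
            total += queue.pendingCount("host-" + h);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("queued events: " + demo(37, 100));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, concurrent first-time inserts for different hosts could corrupt the map or drop a queue, which is the general kind of failure that would leave some hosts without status updates. Whether the original defect had exactly this shape is an assumption.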

Description Pavel Zhukov 2013-07-25 11:15:43 UTC
Description of problem:
After some manipulation of the hosts, one of them went into the "Unassigned" state for a long time (more than 20 hours). Statuses of the VMs on all _other_ hosts are not being updated: a VM can be launched without errors from the host/engine, but its status stays at 9 ("waiting for launch"). VMs can be powered off and launched again (status changes from 0 to 9 and vice versa, and run_on_vds changes as well). Free memory of the host is not being updated.

Version-Release number of selected component (if applicable):
rhevm-3.2.1-0.39.el6ev.noarch

How reproducible:
Unknown. Two systems are affected.


Actual results:
One host is in Unassigned mode.
Newly started VMs are in "Waiting for launch" status but are actually up and running.

Comment 10 Roy Golan 2013-07-28 12:12:09 UTC
Created attachment 779325 [details]
eventq.btm

Comment 23 Yair Zaslavsky 2013-08-20 12:05:13 UTC
Still needs to be investigated; postponing to 3.2.4.

Comment 24 Barak 2013-09-16 11:41:43 UTC
This bug is about patch

Comment 26 Barak 2013-09-17 12:12:16 UTC
The patch was accepted upstream a long time ago and is already in 3.3.
I would like to test this scenario as part of the scale testing for 3.3,
hence moving to ON_QA.

Comment 27 Charlie 2013-11-28 00:13:59 UTC
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 28 Shai Revivo 2014-01-15 12:14:22 UTC
QE is unable to verify this scale bug for 3.3.
Will verify in 3.4.

Comment 30 Barak 2014-02-03 11:58:28 UTC
Added a 3.3.z flag to test it for 3.3.zstream

Comment 33 Eldad Marciano 2014-05-13 14:19:00 UTC
How can the bug be reproduced?

Comment 34 Eldad Marciano 2014-06-05 13:18:27 UTC
Tested on 3.4 (latest), 3.4.0-0.21.el6ev

- Created 37 hosts.
- Ran deactivate and activate at high frequency.
- Hosts stayed unassigned for 2-3 min and then went back to status Ok.

Comment 35 Itamar Heim 2014-06-12 14:06:42 UTC
Closing as part of 3.4.0