988339 – [scale] race - sometimes VM and VDS statuses is not being updated (host stuck in unassigned)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.2.0
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.4.0
Assignee:	Roy Golan
QA Contact:	Yuri Obshansky
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:	1008634 1060700 rhev3.4beta 1142926

Reported:	2013-07-25 11:15 UTC by Pavel Zhukov
Modified:	2020-08-13 08:07 UTC
CC List:	21 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, when a host was stuck in an unassigned state, it could also cause virtual machines on other hosts to stop updating their status. This update adds a concurrent hash map for the internal event queue, which fixes this issue.
Clone Of:
Clones:	1060700
Environment:
Last Closed:
oVirt Team:	Infra
Target Upstream Version:
Embargoed:
Flags:	lsvaty: testing_plan_complete-

Attachments
eventq.btm (606 bytes, text/plain) 2013-07-28 12:12 UTC, Roy Golan	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	438093	0	None	None	None	Never

Description Pavel Zhukov 2013-07-25 11:15:43 UTC

Description of problem:
After some manipulation of the hosts, one of them went into the "Unassigned" state and stayed there for a long time (more than 20 hours). The statuses of the VMs on all _other_ hosts are not being updated: a VM can be launched without errors from the host/engine, but its status remains 9 ("Waiting for launch"). VMs can be powered off and launched again (the status changes from 0 to 9 and back, and run_on_vds changes as well). The free memory of the host is not being updated either.

Version-Release number of selected component (if applicable):
rhevm-3.2.1-0.39.el6ev.noarch

How reproducible:
Unknown. Two systems are affected.


Actual results:
One host is in the Unassigned state.
Newly started VMs are shown in "Waiting for launch" status but are actually up and running.

Comment 10 Roy Golan 2013-07-28 12:12:09 UTC

Created attachment 779325
eventq.btm

Comment 23 Yair Zaslavsky 2013-08-20 12:05:13 UTC

Still needs to be investigated; postponing to 3.2.4.

Comment 24 Barak 2013-09-16 11:41:43 UTC

This bug is about patch

Comment 26 Barak 2013-09-17 12:12:16 UTC

The patch was accepted upstream a long time ago and is already in 3.3.
I would like to test this scenario as part of the scale testing for 3.3,
hence moving to ON_QA.
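
For context, the Doc Text field of this bug describes the fix as adding a concurrent hash map for the internal event queue. The following is only a minimal Java sketch of that general idea, not the actual ovirt-engine patch. The class name EventQueueSketch, the string pool ids used as keys, and the string events are all hypothetical and chosen purely for illustration.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from ovirt-engine.
public class EventQueueSketch {
    // One queue per (hypothetical) pool id. ConcurrentHashMap lets many
    // threads look up and create queues without corrupting the map.
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    // Atomically creates the queue on first use, then appends the event.
    public void submit(String poolId, String event) {
        queues.computeIfAbsent(poolId, id -> new ConcurrentLinkedQueue<>()).add(event);
    }

    // Number of events currently queued for the given pool id.
    public int pending(String poolId) {
        Queue<String> q = queues.get(poolId);
        return q == null ? 0 : q.size();
    }

    // Starts `threads` workers that all submit events at the same moment,
    // then returns the total number of events that ended up queued.
    public static int runDemo(int threads, int eventsPerThread) {
        EventQueueSketch sketch = new EventQueueSketch();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers.execute(() -> {
                try {
                    start.await(); // release all workers together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < eventsPerThread; i++) {
                    sketch.submit("pool-" + (i % 4), "event-" + id + "-" + i);
                }
            });
        }
        start.countDown();
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int total = 0;
        for (int p = 0; p < 4; p++) {
            total += sketch.pending("pool-" + p);
        }
        return total;
    }

    public static void main(String[] args) {
        // With thread-safe collections no submitted event is lost.
        System.out.println(runDemo(8, 1000)); // prints 8000
    }
}
```

If the map were a plain HashMap and the check-then-create step were not atomic, two threads could each create a queue for the same key and one queue, together with its events, could be silently dropped. That is the general kind of race this sketch is meant to illustrate; whether it matches the exact failure in this bug is an assumption.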

Comment 27 Charlie 2013-11-28 00:13:59 UTC

This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 28 Shai Revivo 2014-01-15 12:14:22 UTC

QE is unable to verify this scale bug for 3.3.
Will verify in 3.4.

Comment 30 Barak 2014-02-03 11:58:28 UTC

Added a 3.3.z flag to test it for 3.3.zstream

Comment 33 Eldad Marciano 2014-05-13 14:19:00 UTC

How can the bug be reproduced?

Comment 34 Eldad Marciano 2014-06-05 13:18:27 UTC

Tested on 3.4 (latest), 3.4.0-0.21.el6ev.

- Created 37 hosts.
- Ran deactivate and activate at high frequency.
- Hosts stay Unassigned for 2-3 minutes, and then the status returns to OK.

Comment 35 Itamar Heim 2014-06-12 14:06:42 UTC

Closing as part of 3.4.0

acathrow
bazulay
emarcian
ernest.beinrohr
hchiramm
iheim
lpeer
lyarwood
mkalinin
nobody
pep
perobins
pstehlik
pzhukov
rbinkhor
rgolan
Rhev-m-bugs
srevivo
talayan
yeylon
yzaslavs