833795 – Competing non-responsive and non-operational flows can result in guests being marked in a non-responsive state instead of down.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.0.3
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Roy Golan
QA Contact:	Ido Begun
Docs Contact:
URL:
Whiteboard:	virt
Depends On:
Blocks:

Reported:	2012-06-20 11:27 UTC by Lee Yarwood
Modified:	2018-11-28 20:32 UTC
CC List:	10 users
Fixed In Version:	si20
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-01-30 21:35:45 UTC
oVirt Team:	---
Target Upstream Version:
Embargoed:
Flags:	ykaul: needinfo+

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	147023	0	None	None	None	2012-06-20 11:27:27 UTC

Description Lee Yarwood 2012-06-20 11:27:27 UTC

Description of problem:

If a host becomes non-operational after previously becoming non-responsive and being fenced, the
two flows (vdsNotResponding and SetNonOperationalVdsCommand) can leave one or more guests marked
in a non-responsive state instead of down.
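
The interleaving can be pictured as a last-writer-wins problem. The following is a toy Python model, not engine code: the Vm class, the status strings, and the assumption about which flow writes which status are all hypothetical and only serve to show that the final guest state depends on the order in which the two flows finish.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    """Hypothetical guest record; not the engine's VM entity."""
    name: str
    status: str = "Up"
    history: list = field(default_factory=list)

    def set_status(self, status, by):
        # Record who wrote what, then overwrite unconditionally.
        self.history.append((by, status))
        self.status = status


def non_responsive_flow(vm):
    # Assumed stand-in for the fencing path: the host was fenced,
    # so the guest is treated as gone.
    vm.set_status("Down", by="vdsNotResponding")


def non_operational_flow(vm):
    # Assumed stand-in for the non-operational path: it is modelled
    # here as writing NotResponding when it cannot handle the guest.
    vm.set_status("NotResponding", by="SetNonOperationalVdsCommand")


def replay(order):
    """Run the flows in the given order and return the final status."""
    vm = Vm("guest-1")
    for flow in order:
        flow(vm)
    return vm.status
```

In this model, replay([non_responsive_flow, non_operational_flow]) ends in NotResponding, while the opposite order ends in Down. Neither flow checks what the other has already written.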

Version-Release number of selected component (if applicable):
3.0.3

How reproducible:
Unable to reproduce internally as yet.

Steps to Reproduce:
[ Work In Progress ]
1. Use a locally shared NFS mount as a SD.
2. Disable the NFS services at reboot.
3. Have a number of running guests on the host at the time.
4. Block vdsmd to force a non-responsive treatment to be started.
5. Host should be fenced and should become non-operational shortly after booting.
6. One or more guests should be marked as non-responsive instead of down.
  
Actual results:
Guests marked as non-responsive.

Expected results:
Guests should be marked as down.

Additional info:

Customer logs with an example of this will follow in a private comment.

Given the need for a host to become non-operational after fencing and the fact that the guests can easily be corrected, I am only assigning a medium priority to this bug.

My recommendation at this time would be that InitVdsOnUpCommand destroy any competing threads for the same host to avoid situations like this, but I am not sure whether this is appropriate for every use case.
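
One way to picture this recommendation is a per-host registry in which starting a new flow cancels whichever flow was previously active for that host. This is a minimal Python sketch under stated assumptions: HostFlowRegistry, set_vm_status, and the host and guest names are hypothetical and do not correspond to anything in the engine.

```python
import threading


class HostFlowRegistry:
    """Tracks at most one active flow per host (hypothetical helper).

    Beginning a new flow for a host cancels the previous one by
    setting its cancel token.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # host_id -> threading.Event used as cancel token

    def begin(self, host_id):
        token = threading.Event()
        with self._lock:
            previous = self._active.get(host_id)
            if previous is not None:
                previous.set()  # tell the older flow to stop writing
            self._active[host_id] = token
        return token

    def end(self, host_id, token):
        with self._lock:
            # Only remove the entry if this flow is still the active one.
            if self._active.get(host_id) is token:
                del self._active[host_id]


def set_vm_status(vms, name, status, token):
    """Apply a status write only if the owning flow was not cancelled."""
    if token.is_set():
        return False
    vms[name] = status
    return True


registry = HostFlowRegistry()
vms = {"guest-1": "Up"}
old = registry.begin("host-1")  # an earlier flow for the host
new = registry.begin("host-1")  # a later flow; cancels the earlier one
set_vm_status(vms, "guest-1", "Down", new)
set_vm_status(vms, "guest-1", "NotResponding", old)  # ignored, flow cancelled
```

The check in set_vm_status is cooperative and not atomic with the write, so a real implementation would need the check and the update to happen under the same lock or transaction. Whether cancelling is safe for every flow is exactly the open question raised above.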

Comment 4 Roy Golan 2012-08-08 12:09:38 UTC

It looks quite complicated to try to prevent the interleaving. I feel we really need some kind of framework for that kind of task.

I am sending a fix to make the migrateVm command return VMs whose migration was unsuccessful to their former status instead of NotResponding. It won't solve the interleaving, but it will keep the outcome sane, I guess.
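
The idea behind the fix can be sketched as follows. This is an illustrative Python model only, not the actual change in the gerrit patch: migrate_vm, MigrationFailed, the dict-based guest record, and the status strings are assumptions made for the example.

```python
class MigrationFailed(Exception):
    """Raised by the (hypothetical) migration callable on failure."""


def migrate_vm(vm, do_migrate):
    """Attempt a migration and restore the former status on failure.

    vm is a dict with a "status" key; do_migrate is any callable that
    raises MigrationFailed when the guest cannot be moved.
    """
    former_status = vm["status"]  # remember what the guest was before
    vm["status"] = "MigratingFrom"
    try:
        do_migrate(vm)
    except MigrationFailed:
        # Do not write NotResponding; put back whatever was there.
        vm["status"] = former_status
        return False
    vm["status"] = "Up"
    return True


def always_fail(vm):
    # Stand-in for a migration with no usable destination host.
    raise MigrationFailed("no destination host")
```

With this shape, a guest that was already Down when the migration attempt started is Down again after the failure, and a guest that was Up goes back to Up, so a failed migration can no longer overwrite the result of the other flow with NotResponding.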

Comment 5 Roy Golan 2012-08-08 12:32:29 UTC

http://gerrit.ovirt.org/#/c/6998/

Comment 8 RHEL Program Management 2012-08-16 12:15:52 UTC

Quality Engineering Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 11 Michal Skrivanek 2012-08-21 12:15:44 UTC

not in downstream

Comment 18 Roy Golan 2012-09-09 08:51:12 UTC

Merged downstream:
https://gerrit.eng.lab.tlv.redhat.com/#/c/1433/

Comment 23 Ido Begun 2012-12-25 07:44:56 UTC

OK - SI25.1

Following the steps on https://bugzilla.redhat.com/show_bug.cgi?id=833795#c12, VM state is UP after migration fails.
