Bug 1014241 - Sometimes OS will cache the data from nfs mounts which will cause to host take not updated meta data.
Summary: Sometimes OS will cache the data from nfs mounts which will cause to host tak...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Greg Padgett
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks:
Reported: 2013-10-01 14:47 UTC by Leonid Natapov
Modified: 2016-06-12 23:16 UTC (History)

Fixed In Version: ovirt-hosted-engine-ha-0.1.0-0.3.1.beta1.el6ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-21 16:50:29 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2014:0080 | 0 | normal | SHIPPED_LIVE | new package: ovirt-hosted-engine-ha | 2014-01-21 21:00:07 UTC
oVirt gerrit | 19760 | 0 | None | None | None | Never

Description Leonid Natapov 2013-10-01 14:47:43 UTC
Sometimes the OS caches data from NFS mounts, which causes a host to read metadata that is not up to date.

Scenario:
Two hosts; the VM is running on host1.
Blocking the connection to the storage on host1 causes a failover, and the VM starts on host2. After the connection is restored, the HA agent on the second host reports its own score as 2000, while the first host still reads a score of 1000 for the second host and tries to start the VM.

Here is the agent.log output from the second host:
--------------------------------------------------
MainThread::INFO::2013-10-01 17:41:41,888::hosted_engine::608::HostedEngine::(_collect_all_host_stats) Host purple-vds2.qa.lab.tlv.redhat.com (id 1) metadata updated
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::613::HostedEngine::(_collect_all_host_stats) Host purple-vds2.qa.lab.tlv.redhat.com (id 1): {'last-update-host-ts': 1380638497, 'last-update-local-ts': 1380638501.888829, 'hostname': 'purple-vds2.qa.lab.tlv.redhat.com', 'alive': True, 'engine-status': 'vm-down', 'score': 2400, 'first-update': False}
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::608::HostedEngine::(_collect_all_host_stats) Host purple-vds3.qa.lab.tlv.redhat.com (id 2) metadata updated
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::613::HostedEngine::(_collect_all_host_stats) Host purple-vds3.qa.lab.tlv.redhat.com (id 2): {'last-update-host-ts': 1380638501, 'last-update-local-ts': 1380638501.888829, 'hostname': 'purple-vds3.qa.lab.tlv.redhat.com', 'alive': True, 'engine-status': 'vm-up good-health-status', 'score': 2000, 'first-update': False}

And here is the output of vm-status from the first host:
---------------------------------------------------------
--== Host 1 status ==--

Hostname                           : purple-vds2.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : vm-down
Score                              : 2400
Host timestamp                     : 1380638671
Extra metadata                     :
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1380638671 (Tue Oct  1 17:44:31 2013)
	host-id=1
	score=2400
	bridge=True
	cpu-load=0.035
	engine-health=vm-down
	gateway=True
	mem-free=9408
	mem-load=0.000506072874494


--== Host 2 status ==--

Hostname                           : purple-vds3.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : vm-up good-health-status
Score                              : 1000
Host timestamp                     : 1380633444
Extra metadata                     :
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=1380633444 (Tue Oct  1 16:17:24 2013)
	host-id=2
	score=1000
	bridge=True
	cpu-load=0.19
	engine-health=vm-up good-health-status
	gateway=False
	mem-free=1432
	mem-load=0.0164405010438

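As you can see, the second host reports its own score as 2000, while the first host still shows a score of 1000 for it, read from metadata last updated at 16:17:24, well over an hour before the first host's own 17:44:31 entry.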
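
The stale entry can be spotted from the host timestamps alone. A minimal sketch, using the timestamps printed by vm-status above; the function name and the 60-second threshold are assumptions for illustration, not anything taken from ovirt-hosted-engine-ha:

```python
def stale_hosts(host_timestamps, now, max_age=60):
    """Return the ids of hosts whose last timestamp is more than max_age seconds old.

    host_timestamps maps host id -> last "Host timestamp" seen for that host.
    max_age is a hypothetical threshold chosen for this sketch.
    """
    return sorted(h for h, ts in host_timestamps.items() if now - ts > max_age)


# Host timestamps as printed by vm-status on the first host.
seen_by_host1 = {1: 1380638671, 2: 1380633444}
# Use host 1's own entry as the time of the vm-status call.
now = seen_by_host1[1]

print(stale_hosts(seen_by_host1, now))  # [2]
print(now - seen_by_host1[2])           # 5227 seconds, about 87 minutes
```

The first host's view of host 2 is about 87 minutes old, even though the second host's agent.log shows host 2 updating its metadata within seconds.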

Comment 1 Itamar Heim 2013-10-01 20:34:58 UTC
we should only use directio?

Comment 2 Greg Padgett 2013-10-01 22:51:01 UTC
(In reply to Itamar Heim from comment #1)
> we should only use directio?

Indeed, I have a patch to do this which fixes the inconsistency.

We should be sure to test it with GlusterFS, to ensure the driver behaves the same as NFS. (NFS doesn't have any alignment restrictions, which is why we can do an O_DIRECT read in Python. I hope GlusterFS is the same.)
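
For reference, a minimal sketch of an O_DIRECT read in Python. This is not the merged patch; the function names and the 4096-byte block size are assumptions for illustration. It reads into a page-aligned buffer so that it also works on filesystems that do enforce alignment:

```python
import mmap
import os


def round_up(n, block):
    """Round n up to the next multiple of block."""
    return -(-n // block) * block


def read_direct(path, length, block=4096):
    """Read up to `length` bytes from the start of `path`, bypassing the page cache.

    Linux only: os.O_DIRECT is not defined on every platform, and some
    filesystems reject O_DIRECT with EINVAL.
    """
    size = round_up(length, block)
    buf = mmap.mmap(-1, size)  # anonymous mapping, so the memory is page-aligned
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        n = os.readv(fd, [buf])  # reads straight into the aligned buffer
        return bytes(buf[:min(n, length)])
    finally:
        os.close(fd)
        buf.close()
```

Rounding the request up to a whole block and slicing afterwards keeps the read length aligned as well as the buffer address.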

Comment 3 Greg Padgett 2013-10-09 13:37:10 UTC
Merged Change-Id: I296977000ffa3fff3f9391f93a8b4f3f519eae4e

Comment 5 Artyom 2013-11-11 14:16:33 UTC
Migration works correctly on one of the hosts, so the bug was verified on ovirt-hosted-engine-ha-0.1.0-0.5.1.beta1.el6ev.noarch.
After restoring the connection to the first host, I see that:
hosted-engine --vm-status shows the same information on both hosts
score of the first host: 0
score of the second host: 2400

Comment 6 Charlie 2013-11-28 01:41:49 UTC
This bug is currently attached to errata RHEA-2013:15591. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 7 Greg Padgett 2013-12-06 17:58:38 UTC
ovirt-hosted-engine-ha is a new package; does not need errata for bugs during its development.

Comment 8 errata-xmlrpc 2014-01-21 16:50:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0080.html

