Sometimes the OS caches data from NFS mounts, which causes a host to read metadata that is not up to date.

Scenario:
Two hosts; the VM is running on host1.
Blocking the connection to the storage on host1 causes a failover, and the VM starts on host2.
After the connection is restored, the HA agent on the second host reports a score of 2000 for the second host, while the first host thinks that score is 1000 and tries to start the VM.

Here is the agent.log output on the second host:
-----------------------------------------------------
MainThread::INFO::2013-10-01 17:41:41,888::hosted_engine::608::HostedEngine::(_collect_all_host_stats) Host purple-vds2.qa.lab.tlv.redhat.com (id 1) metadata updated
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::613::HostedEngine::(_collect_all_host_stats) Host purple-vds2.qa.lab.tlv.redhat.com (id 1): {'last-update-host-ts': 1380638497, 'last-update-local-ts': 1380638501.888829, 'hostname': 'purple-vds2.qa.lab.tlv.redhat.com', 'alive': True, 'engine-status': 'vm-down', 'score': 2400, 'first-update': False}
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::608::HostedEngine::(_collect_all_host_stats) Host purple-vds3.qa.lab.tlv.redhat.com (id 2) metadata updated
MainThread::INFO::2013-10-01 17:41:41,889::hosted_engine::613::HostedEngine::(_collect_all_host_stats) Host purple-vds3.qa.lab.tlv.redhat.com (id 2): {'last-update-host-ts': 1380638501, 'last-update-local-ts': 1380638501.888829, 'hostname': 'purple-vds3.qa.lab.tlv.redhat.com', 'alive': True, 'engine-status': 'vm-up good-health-status', 'score': 2000, 'first-update': False}

And here is the output of vm-status from the first host:
---------------------------------------------------------
--== Host 1 status ==--

Hostname       : purple-vds2.qa.lab.tlv.redhat.com
Host ID        : 1
Engine status  : vm-down
Score          : 2400
Host timestamp : 1380638671
Extra metadata :
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1380638671 (Tue Oct 1 17:44:31 2013)
    host-id=1
    score=2400
    bridge=True
    cpu-load=0.035
    engine-health=vm-down
    gateway=True
    mem-free=9408
    mem-load=0.000506072874494

--== Host 2 status ==--

Hostname       : purple-vds3.qa.lab.tlv.redhat.com
Host ID        : 2
Engine status  : vm-up good-health-status
Score          : 1000
Host timestamp : 1380633444
Extra metadata :
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1380633444 (Tue Oct 1 16:17:24 2013)
    host-id=2
    score=1000
    bridge=True
    cpu-load=0.19
    engine-health=vm-up good-health-status
    gateway=False
    mem-free=1432
    mem-load=0.0164405010438

As you can see, the second host reports its own score as 2000, while the first host still sees a score of 1000 for it. The host 2 timestamp seen by the first host (16:17:24) is about 84 minutes older than the one in the second host's agent.log.
we should only use directio?
(In reply to Itamar Heim from comment #1)
> we should only use directio?

Indeed, I have a patch to do this, and it fixes the inconsistency. We should be sure to test it with glusterfs to ensure that its driver behaves the same as nfs (i.e. nfs doesn't have any alignment restrictions, which is why we can do an O_DIRECT read in Python; I hope glusterfs is the same).
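For illustration only, here is a rough sketch of what an O_DIRECT read from Python can look like. It is not the merged patch: the function name read_uncached and the 4096-byte block_size are made up for this example. It reads into a page-aligned anonymous mmap buffer so that it also works on filesystems that do enforce alignment, and, as a further assumption, it falls back to an ordinary buffered read where O_DIRECT is rejected.

```python
import mmap
import os


def read_uncached(path, length, block_size=4096):
    """Read up to `length` bytes from the start of `path`, trying O_DIRECT first.

    Hypothetical helper for illustration; a single read call is issued, which
    is enough for a small metadata block but may return fewer bytes.
    """
    # Round the request up to whole blocks; O_DIRECT often requires aligned sizes.
    size = max(1, -(-length // block_size)) * block_size
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    last_error = None
    # First attempt bypasses the page cache; second is a plain buffered fallback.
    for flags in (os.O_RDONLY | direct, os.O_RDONLY):
        try:
            fd = os.open(path, flags)
        except OSError as exc:
            last_error = exc
            continue
        try:
            buf = mmap.mmap(-1, size)  # anonymous mappings are page-aligned
            try:
                n = os.readv(fd, [buf])
                return buf[:min(n, length)]
            except OSError as exc:
                last_error = exc
                continue
            finally:
                buf.close()
        finally:
            os.close(fd)
    raise last_error
```

A caller would use it as read_uncached(metadata_path, 4096), where metadata_path stands for whatever file holds the shared host metadata. Note that the buffered fallback can return cached data again, so real code would probably want to log or fail rather than fall back silently.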
Merged Change-Id: I296977000ffa3fff3f9391f93a8b4f3f519eae4e
Migration works correctly on one of the hosts, so the bug was verified on ovirt-hosted-engine-ha-0.1.0-0.5.1.beta1.el6ev.noarch.

After restoring the connection on the first host, hosted-engine --vm-status shows the same information on both hosts:
score of first host: 0
score of second host: 2400
This bug is currently attached to errata RHEA-2013:15591. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
ovirt-hosted-engine-ha is a new package; does not need errata for bugs during its development.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0080.html