Created attachment 891747
agent, broker and vdsm logs from two hosts

Description of problem:
I have two hosts with hosted-engine installed, and the engine VM runs on one of them. If I set global maintenance and restart the hosts, the hosts do not automatically connect the hosted engine storage domain after the restart, and the hosted-engine HA agent subsequently crashes.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.1.2-2.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a hosted engine environment with two hosts and a running engine VM
2. Set global maintenance mode (from one of the hosts: hosted-engine --set-maintenance --mode=global) and wait until hosted-engine --vm-status shows the update
3. Restart both hosts

Actual results:
After the hosts restart, the hosted engine storage domain is not mounted automatically, which leads to an HA agent crash.

Expected results:
The hosted engine storage domain is mounted automatically on both hosts, the HA agent starts normally, and the engine VM starts as well.

Additional info:
It is also possible to mount the storage domain manually after the restart via hosted-engine --connect-storage and then start the HA agent (service ovirt-ha-agent start); after that, hosted engine continues to work fine.
Before this, I tried different cases on this environment in which the connection between the host and the storage domain was blocked, but after restoring iptables everything worked fine.
I just reproduced this crash, but with a slightly different result. It seems that the agent tries to communicate with vdsmd too soon (while it is not yet initialized) and therefore fails to connect the storage. Can you please try to reproduce it again, and this time wait a while after the machine boots before starting the agent with service ovirt-ha-agent start? It should start normally.
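To illustrate the kind of race described above, here is a minimal Python sketch of polling a dependency until it is ready before going on to connect storage. It is only an illustration under stated assumptions, not the actual ovirt-hosted-engine-ha patch: `wait_until_ready`, `NotReadyError` and the `probe` callable are hypothetical names, and how one would really check that vdsmd is initialized is left to the caller.

```python
import time


class NotReadyError(Exception):
    """Raised when the dependency did not become ready in time."""


def wait_until_ready(probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out.

    probe    -- hypothetical callable supplied by the caller; it should
                return True once the service answers, and return False or
                raise OSError while the service is still starting.
    attempts -- maximum number of probe calls.
    delay    -- seconds to wait between two calls.
    sleep    -- sleep function, injectable so tests do not really wait.

    Returns the number of probe calls that were needed.
    """
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return attempt
        except OSError:
            # A refused connection is treated as "not ready yet".
            pass
        if attempt < attempts:
            sleep(delay)
    raise NotReadyError("service not ready after %d attempts" % attempts)


if __name__ == "__main__":
    state = {"calls": 0}

    def fake_probe():
        # Stand-in for a real readiness check: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionRefusedError("still starting")
        return True

    print(wait_until_ready(fake_probe, attempts=5, delay=0.0))  # prints 3
```

With a bounded number of attempts the caller still gets a clear error when the dependency never comes up, instead of crashing on the first failed call.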
Sorry for the late answer, I had some exams this week. I checked your proposal:
1) Installed a hosted-engine environment with two hosts
2) Set global maintenance mode
3) Disabled autostart for the agent and broker services on both hosts:
chkconfig ovirt-ha-agent off && chkconfig ovirt-ha-agent --del
chkconfig ovirt-ha-broker off && chkconfig ovirt-ha-broker --del
4) Restarted both hosts
5) Waited some time after the hosts booted
6) Started the services manually on both hosts:
service ovirt-ha-broker start && service ovirt-ha-agent start

After some time it looks like the storage was mounted successfully, but now the agent starts playing ping-pong with the engine VM because of a score problem; that is related to another bug.
(In reply to Artyom from comment #2)
> Sorry for late answer, just had some exams on this week, I checked you
> proposal:
> 1) Installed hosted-engine environment with two hosts
> 2) Set global maintenance mode
> 3) Disable autorun for agent and broker services on both hosts:
> chkconfig ovirt-ha-agent off && chkconfig ovirt-ha-agent --del
> chkconfig ovirt-ha-broker off && chkconfig ovirt-ha-broker --del
> 4) Restarted both hosts
> 5) wait sometime after host booting
> 6) Start manually service on both hosts:
> service ovirt-ha-broker start && service ovirt-ha-agent start
>
> After sometime it looks like storage mounted successfully, but now engine
> agent start playing ping pong with vm, because score problem, but it related
> to other bug.

Thanks, Artyom, for confirming this. The proposed patch should fix the problem.
Verified on ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0194.html