Created attachment 1702924 [details]
vm boot issue

Description of problem:
I deployed oVirt 4.4.1 with a hosted engine on Gluster storage. The Engine often starts with the error shown in the attachment, and only comes up after many stop-start iterations. The same error occurs in other cases:
- following Hosted Engine installation: with maintenance mode disabled, the Engine started only after a few attempts on different hosts
- for regular virtual machines on Gluster storage
If the disk is moved to iSCSI storage, the error is not present.

Version-Release number of selected component (if applicable):
ovirt-4.4.1.4
vdsm-4.40.22-1.el8.x86_64
glusterfs-7.6-1.el8.x86_64

How reproducible:

Steps to Reproduce:
1. Enable global maintenance: 'hosted-engine --set-maintenance --mode=global'
2. Stop the Hosted Engine: 'hosted-engine --vm-shutdown'
3. Start the Hosted Engine: 'hosted-engine --vm-start'
4. Set a console password: 'hosted-engine --add-console-password'
5. Connect to the Engine's VNC console and check 'hosted-engine --vm-status'

Actual results:
1. Engine status: {"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}
2. Screenshot in attachment

Expected results:
Engine status: {"vm": "up", "health": "good", "detail": "Up"}

Additional info:
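The failing state in "Actual results" can be detected mechanically. A minimal sketch, using the exact status dict quoted above (on a host this JSON would come from `hosted-engine --vm-status --json`; the `--json` flag is assumed available in this release):

```python
import json

# Status dict copied verbatim from "Actual results" above; on a host, capture
# the output of `hosted-engine --vm-status --json` instead of this literal.
status = json.loads(
    '{"vm": "up", "health": "bad", "detail": "Up", "reason": "failed liveliness check"}'
)

# The VM process is "up", but the engine inside it is unreachable:
# health must be "good" for the liveliness check to have passed.
engine_healthy = status["vm"] == "up" and status["health"] == "good"
print(engine_healthy)
print(status.get("reason", ""))
```

This is the condition the HA broker's liveliness check reports on: the qemu process is running, but the Manager's web service inside the VM never answers.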
Can you attach relevant logs? Specifically, Gluster, VDSM and hosted engine logs?
Created attachment 1702936 [details] vdsmd logs from start-stop interval
glusterd.log contains only this line from the stop-start interval:
[2020-07-30 11:18:15.525364] I [MSGID: 106488] [glusterd-handler.c:1400:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
Additionally, I think this may be related: https://bugzilla.redhat.com/show_bug.cgi?id=1859403
Created attachment 1702942 [details] engine logs
Can you please provide engine mount log?
Created attachment 1702946 [details] supervdsm
Created attachment 1702948 [details] gluster mnt log
Could not find anything in the attached logs.
If the health is bad and the VM is up, the HA services will try to restart the Manager virtual machine to bring the Manager back.
Also, please provide the HA agent and broker logs.

Please check "gluster volume status". If it looks fine, then I will need some logs from the engine (you need to connect to the engine via console):
/var/log/messages, /var/log/ovirt-engine/engine.log and /var/log/ovirt-engine/server.log.

You can also check the engine status by running:
systemctl status -l ovirt-engine
If anything is wrong there, then check:
journalctl -u ovirt-engine
Created attachment 1702950 [details] agent logs
Created attachment 1702951 [details] agent logs
Created attachment 1702952 [details] broker
(In reply to Gobinda Das from comment #9)
> Could not find anything from attached logs.
> If the health is bad and the vm is up, the HA services will try to restart
> the Manager virtual machine to get the Manager back.
> Also please provide HA agent and broker log.
>
> Please check "gluster volume status" ?

Brick store-01:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-02:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-03:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

> If yes then I will need some logs from engine (need to connect to engine via
> console):
> /var/log/messages, /var/log/ovirt-engine/engine.log and
> /var/log/ovirt-engine/server.log.
>
> You can also check engine status by running cmd:
> systemctl status -l ovirt-engine
> If anything wrong then check:
> journalctl -u ovirt-engine

I can't check the ovirt-engine service: the VM does not actually boot. Please see the first attachment.
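The per-brick output pasted above matches the format of `gluster volume heal engine info`, and can be checked mechanically. A small sketch using the sample from this comment (on a real host, pipe the command output instead of the here-string):

```shell
# Sample copied from this comment; on a host, replace the variable with:
#   heal_output=$(gluster volume heal engine info)
heal_output='Brick store-01:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-02:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick store-03:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0'

# Sum the pending-entry counts across all bricks;
# 0 means no files are waiting for self-heal on the engine volume.
pending=$(printf '%s\n' "$heal_output" \
    | awk -F': ' '/^Number of entries:/ {sum += $2} END {print sum + 0}')
echo "pending heal entries: $pending"
```

A nonzero total here would point at unhealed replicas as the cause of bad reads; in this report the volume is clean, which is consistent with the problem being elsewhere (sharding, see the next comment).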
We hit this issue a while back with Gluster sharding, and it is fixed in the glusterfs release-7 and release-8 branches.
Ref BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1823423
This is the fix -> https://review.gluster.org/#/c/glusterfs/+/24480/
Please upgrade your glusterfs to v7.7 and try again.
I upgraded glusterfs to 7.7 from the centos-gluster7-test repository and the issue is fixed. Unfortunately, v7.7 is not present in the main repo yet.
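For anyone following along, the upgrade path can be sketched as below. The repo name comes from this comment; the dnf line is left commented out since it must run as root on each host, and the version check here uses a stubbed value instead of querying the host:

```shell
# On each host (repo name taken from this comment; requires root):
#   dnf --enablerepo=centos-gluster7-test upgrade -y 'glusterfs*'

# Afterwards, verify the installed version is at least 7.7 before retrying
# the engine VM. Stubbed here; on a host use:
#   installed=$(glusterfs --version | awk 'NR==1 {print $2}')
installed="7.7"
required="7.7"

# sort -V does a version-aware comparison; if the newest of the two
# is the installed one, the fix from release-7.7 is present.
newest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | tail -n 1)
if [ "$newest" = "$installed" ]; then
    echo "glusterfs $installed is new enough"
else
    echo "glusterfs $installed is older than $required" >&2
fi
```

Until 7.7 lands in the main CentOS Gluster repo, enabling the test repo as above is the workaround this comment describes.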