Description of problem:
In 3.5, the /etc/ovirt-hosted-engine/answers.conf permissions right after install are as below:

Initial host:
-rw-rw----. 1 root root 2585 Dec 5 00:08 /etc/ovirt-hosted-engine/answers.conf

Additional hosts:
-rw-r--r--. 1 root root 2575 Dec 5 00:18 /etc/ovirt-hosted-engine/answers.conf

When upgrading to 3.6, if the host chosen to upgrade first (to ha 1.3.x) is the initial one used for the initial deployment of HE (-rw-rw----), the HE SD upgrade fails with EACCES on the answers.conf file.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160920.1.el7ev)
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a fresh Hosted Engine on 3.5 using 20160219.0.el7ev
2. Upgrade the initial HE host to 20160920.1.el7ev
3. Trigger the HE SD upgrade (host in maintenance, restart ha-agent)

Actual results:
If one chooses the initial HE host to do the upgrade, the HE SD is not upgraded and ha-agent keeps restarting.
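The mode mismatch between the two hosts can be shown mechanically. A minimal sketch (on a scratch file; the 0o660/0o644 modes mirror the listings above, the check itself is illustrative and not part of ovirt-hosted-engine-ha):

```python
import os
import stat
import tempfile

# Recreate the two modes from the report on a throwaway file:
# the initial host ends up with 0660 (rw-rw----),
# additional hosts with 0644 (rw-r--r--).
fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o660)  # initial-host mode
mode = stat.S_IMODE(os.stat(path).st_mode)
initial_world_readable = bool(mode & stat.S_IROTH)

os.chmod(path, 0o644)  # additional-host mode
mode = stat.S_IMODE(os.stat(path).st_mode)
additional_world_readable = bool(mode & stat.S_IROTH)

os.unlink(path)
print(initial_world_readable, additional_world_readable)  # False True
```

With root:root ownership, the 0660 file is readable only by root and the root group, which is consistent with a non-root agent process hitting Errno 13 on the initial host while the 0644 additional hosts read it fine.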
See:
MainThread::INFO::2016-12-05 01:24:38,917::upgrade::1010::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Upgrading to current version
MainThread::INFO::2016-12-05 01:24:39,004::upgrade::736::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_stopMonitoringDomain) Stop monitoring domain
MainThread::INFO::2016-12-05 01:24:39,059::upgrade::151::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Looking for conf volume
MainThread::ERROR::2016-12-05 01:24:39,112::upgrade::207::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Unable to find HE conf volume
MainThread::INFO::2016-12-05 01:24:39,112::upgrade::953::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_move_to_shared_conf) _move_to_shared_conf
MainThread::INFO::2016-12-05 01:24:39,112::upgrade::375::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Reading conf file: fhanswers.conf
MainThread::ERROR::2016-12-05 01:24:39,112::upgrade::399::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'
MainThread::ERROR::2016-12-05 01:24:39,113::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'' - trying to restart agent

Expected results:
HE SD upgraded.

Additional info:
The permission issue seems to have been there for some time. Previous hosted-engine-ha builds in 3.6 apparently used to upgrade the HE SD even when hitting this "Permission denied" error. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1292652#c1
Apparently we missed that error in that BZ, and now the behavior is slightly different: we restart the agent and the HE SD is NOT upgraded at the step where it should be.
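The failure mode in the log (the agent dies on Errno 13 while reading the conf file and then restarts in a loop) can be sketched as follows. `read_conf` is a hypothetical helper, not the actual ovirt-hosted-engine-ha code; the point is that an EACCES on the local conf file is a fatal configuration problem that a restart cannot fix, so surfacing it explicitly would be preferable:

```python
import errno


def read_conf(path):
    # Hypothetical defensive read: turn EACCES into an actionable error
    # instead of letting the caller crash and restart indefinitely.
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError) as e:
        if e.errno == errno.EACCES:
            raise RuntimeError(
                "cannot read %s (permission denied); "
                "check its mode and ownership" % path
            )
        raise  # other errors (e.g. ENOENT) propagate unchanged
```

Under the conditions in this report (answers.conf mode 0660, owner root:root, agent running unprivileged), such a helper would raise the RuntimeError above rather than a bare [Errno 13] traceback, and restarting the agent would not change the outcome.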
This causes trouble for the 3.5 to 3.6 upgrade.
1) Deployed a clean environment on the puma18 and puma19 hosts, over an NFS storage domain, and added two NFS data storage domains.

Components on hosts:
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
rhevm-sdk-python-3.5.6.0-1.el7ev.noarch
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.38-1.el7ev.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.6.x86_64
ovirt-host-deploy-1.3.2-1.el7ev.noarch
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

On engine:
rhevm-lib-3.5.8-0.1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.5.8-0.1.el6ev.noarch
rhevm-dwh-3.5.5-1.el6ev.noarch
rhevm-setup-plugins-3.5.4-1.el6ev.noarch
rhevm-iso-uploader-3.5.1-1.el6ev.noarch
rhevm-extensions-api-impl-3.5.8-0.1.el6ev.noarch
rhevm-spice-client-x64-msi-3.5-3.el6.noarch
rhevm-backend-3.5.8-0.1.el6ev.noarch
rhevm-sdk-python-3.5.6.0-1.el6ev.noarch
ovirt-host-deploy-1.3.2-1.el6ev.noarch
rhevm-spice-client-x64-cab-3.5-3.el6.noarch
rhevm-webadmin-portal-3.5.8-0.1.el6ev.noarch
rhevm-setup-base-3.5.8-0.1.el6ev.noarch
rhevm-reports-3.5.8-1.el6ev.noarch
rhevm-image-uploader-3.5.0-4.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.5.8-0.1.el6ev.noarch
ovirt-host-deploy-java-1.3.2-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.5-3.el6.noarch
rhevm-setup-3.5.8-0.1.el6ev.noarch
rhevm-tools-3.5.8-0.1.el6ev.noarch
rhevm-3.5.8-0.1.el6ev.noarch
rhevm-log-collector-3.5.4-2.el6ev.noarch
rhev-guest-tools-iso-3.5-15.el6ev.noarch
rhevm-doc-3.5.3-1.el6eng.noarch
rhevm-spice-client-x86-msi-3.5-3.el6.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-dwh-setup-3.5.5-1.el6ev.noarch
rhevm-branding-rhev-3.5.0-4.el6ev.noarch
rhevm-cli-3.5.0.6-1.el6ev.noarch
rhevm-dbscripts-3.5.8-0.1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.5.8-0.1.el6ev.noarch
rhevm-reports-setup-3.5.8-1.el6ev.noarch
rhevm-dependencies-3.5.1-1.el6ev.noarch
rhevm-websocket-proxy-3.5.8-0.1.el6ev.noarch
rhevm-userportal-3.5.8-0.1.el6ev.noarch
rhevm-restapi-3.5.8-0.1.el6ev.noarch
Linux version 2.6.32-573.41.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Thu Mar 2 11:08:17 EST 2017
Linux 2.6.32-573.41.1.el6.x86_64 #1 SMP Thu Mar 2 11:08:17 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.7 (Santiago)

puma18 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date              : True
Hostname                       : puma18
Host ID                        : 1
Engine status                  : {"health": "good", "vm": "up", "detail": "up"}
Score                          : 2400
Local maintenance              : False
Host timestamp                 : 21579
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=21579 (Wed Apr 12 19:11:01 2017)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date              : True
Hostname                       : puma19
Host ID                        : 2
Engine status                  : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                          : 2400
Local maintenance              : False
Host timestamp                 : 21438
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=21438 (Wed Apr 12 19:11:02 2017)
    host-id=2
    score=2400
    maintenance=False
    state=EngineDown

2) Permissions right after deployment on both hosts, as shown below:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 12 17:52 /etc/ovirt-hosted-engine/answers.conf
puma19 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-r--r--. 1 root root 2.6K Apr 12 18:01 /etc/ovirt-hosted-engine/answers.conf

3) Created one guest VM with a RHEL 7.3 OS.
4) Upgraded HE from 3.5 to 3.6 following https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Upgrading_the_Self-Hosted_Engine.html

5) I was able to reproduce this issue in my environment:

MainThread::INFO::2017-04-12 22:24:36,343::upgrade::1012::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Upgrading to current version
MainThread::INFO::2017-04-12 22:24:36,368::upgrade::738::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_stopMonitoringDomain) Stop monitoring domain
MainThread::INFO::2017-04-12 22:24:36,383::upgrade::153::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Looking for conf volume
MainThread::ERROR::2017-04-12 22:24:36,399::upgrade::209::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Unable to find HE conf volume
MainThread::INFO::2017-04-12 22:24:36,399::upgrade::955::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_move_to_shared_conf) _move_to_shared_conf
MainThread::INFO::2017-04-12 22:24:36,400::upgrade::377::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Reading conf file: fhanswers.conf
MainThread::ERROR::2017-04-12 22:24:36,400::upgrade::401::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'
MainThread::ERROR::2017-04-12 22:24:36,400::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'' - trying to restart agent

Moving back to assigned. I was not able to move the second host to maintenance, as the HE VM was running on it and was failing to migrate to 3.6's RHEL 7.3 puma18.
Created attachment 1271242 [details] sosreport-puma18.scl.lab.tlv.redhat.com-20170412222758.tar.xz
Created attachment 1271243 [details] sosreport-puma19.scl.lab.tlv.redhat.com-20170412222804.tar.xz
Created attachment 1271244 [details] engine
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 12 17:52 /etc/ovirt-hosted-engine/answers.conf

ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.17.39-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux puma18 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev didn't include the fix; I built ovirt-hosted-engine-ha-2.1.0.5-2.el7ev with it.
(In reply to Simone Tiraboschi from comment #11)
> ovirt-hosted-engine-ha-2.1.0.5-1.el7ev didn't include the fix; I built
> ovirt-hosted-engine-ha-2.1.0.5-2.el7ev with it.

Was it consumed by QE? Is it in any errata?
1) Initial host running on 3.5 components with ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf

2) After upgrading the initial host to the latest 4.1:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root kvm 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf
puma18 ~]# rpm -qa | grep ovirt-hosted-engine-ha
ovirt-hosted-engine-ha-2.1.0.5-2.el7ev.noarch

3) The vdsm service was not running due to a known bug opened by Simone:
puma18 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Mon 2017-04-24 15:05:06 IDT; 25s ago
  Process: 15302 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)
 Main PID: 3042 (code=exited, status=0/SUCCESS)

Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.

I've fixed it with "vdsm-tool configure --force" and then with "systemctl restart vdsmd && systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent".
puma18 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-04-24 15:07:44 IDT; 25s ago
  Process: 16228 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 16309 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─16309 /usr/bin/python2 /usr/share/vdsm/vdsm
           ├─16438 /usr/libexec/ioprocess --read-pipe-fd 40 --write-pipe-fd 39 --max-threads 10 --max-queued-requests 10
           ├─16445 /usr/libexec/ioprocess --read-pipe-fd 48 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
           └─16457 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --max-queued-requests 10

Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 1
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 ask_user_info()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 make_client_response()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 2
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 parse_server_challenge()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 ask_user_info()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 make_client_response()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 3
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com vdsm[16309]: vdsm MOM WARN MOM not available.
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com vdsm[16309]: vdsm MOM WARN MOM not available, KSM stats will be missing.

4) After step 3 I checked again:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root kvm 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf
Permissions were not changed on the host, although hosted-storage was successfully upgraded on it:
MainThread::INFO::2017-04-24 15:10:16,090::upgrade::1035::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded

5) Moving to verified, as aside from the known issues and their workarounds, the hosted-storage upgrade succeeded on the initial host.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1195