Bug 1401359 — [Hosted-Engine] 3.5 HE SD upgrade fails if done on initial host

| Field | Value |
|---|---|
| Summary | [Hosted-Engine] 3.5 HE SD upgrade fails if done on initial host |
| Product | Red Hat Enterprise Virtualization Manager |
| Component | ovirt-hosted-engine-ha |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 3.6.9 |
| Reporter | Germano Veit Michel <gveitmic> |
| Assignee | Simone Tiraboschi <stirabos> |
| QA Contact | Nikolai Sednev <nsednev> |
| CC | eedri, lsurette, mavital, melewis, mkalinin, stirabos, ykaul, ylavi |
| Target Milestone | ovirt-4.1.1-2 |
| Target Release | --- |
| Keywords | Triaged, ZStream |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | ovirt-hosted-engine-ha-2.1.0.5-2.el7ev |
| Doc Type | Bug Fix |
| Doc Text | Previously, the Red Hat Enterprise Virtualization 3.5 self-hosted engine storage domain upgrade failed on the initial host due to permissions errors. This has been corrected. |
| Clones | 1422864 (view as bug list) |
| Last Closed | 2017-05-03 07:51:23 UTC |
| Type | Bug |
| Regression | --- |
| oVirt Team | Integration |
| Bug Blocks | 1400800, 1422864, 1430513, 1444021 |
Description — Germano Veit Michel, 2016-12-05 01:43:16 UTC
1) Deployed a clean environment on hosts puma18 and puma19 over an NFS storage domain, and added two NFS data storage domains.
Components on hosts:
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
rhevm-sdk-python-3.5.6.0-1.el7ev.noarch
mom-0.4.1-4.el7ev.noarch
vdsm-4.16.38-1.el7ev.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.25.x86_64
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.6.x86_64
ovirt-host-deploy-1.3.2-1.el7ev.noarch
Linux version 3.10.0-327.53.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Tue Mar 14 10:49:09 EDT 2017
Linux 3.10.0-327.53.1.el7.x86_64 #1 SMP Tue Mar 14 10:49:09 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
On engine:
rhevm-lib-3.5.8-0.1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.5.8-0.1.el6ev.noarch
rhevm-dwh-3.5.5-1.el6ev.noarch
rhevm-setup-plugins-3.5.4-1.el6ev.noarch
rhevm-iso-uploader-3.5.1-1.el6ev.noarch
rhevm-extensions-api-impl-3.5.8-0.1.el6ev.noarch
rhevm-spice-client-x64-msi-3.5-3.el6.noarch
rhevm-backend-3.5.8-0.1.el6ev.noarch
rhevm-sdk-python-3.5.6.0-1.el6ev.noarch
ovirt-host-deploy-1.3.2-1.el6ev.noarch
rhevm-spice-client-x64-cab-3.5-3.el6.noarch
rhevm-webadmin-portal-3.5.8-0.1.el6ev.noarch
rhevm-setup-base-3.5.8-0.1.el6ev.noarch
rhevm-reports-3.5.8-1.el6ev.noarch
rhevm-image-uploader-3.5.0-4.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.5.8-0.1.el6ev.noarch
ovirt-host-deploy-java-1.3.2-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.5-3.el6.noarch
rhevm-setup-3.5.8-0.1.el6ev.noarch
rhevm-tools-3.5.8-0.1.el6ev.noarch
rhevm-3.5.8-0.1.el6ev.noarch
rhevm-log-collector-3.5.4-2.el6ev.noarch
rhev-guest-tools-iso-3.5-15.el6ev.noarch
rhevm-doc-3.5.3-1.el6eng.noarch
rhevm-spice-client-x86-msi-3.5-3.el6.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-dwh-setup-3.5.5-1.el6ev.noarch
rhevm-branding-rhev-3.5.0-4.el6ev.noarch
rhevm-cli-3.5.0.6-1.el6ev.noarch
rhevm-dbscripts-3.5.8-0.1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.5.8-0.1.el6ev.noarch
rhevm-reports-setup-3.5.8-1.el6ev.noarch
rhevm-dependencies-3.5.1-1.el6ev.noarch
rhevm-websocket-proxy-3.5.8-0.1.el6ev.noarch
rhevm-userportal-3.5.8-0.1.el6ev.noarch
rhevm-restapi-3.5.8-0.1.el6ev.noarch
Linux version 2.6.32-573.41.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Thu Mar 2 11:08:17 EST 2017
Linux 2.6.32-573.41.1.el6.x86_64 #1 SMP Thu Mar 2 11:08:17 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.7 (Santiago)
puma18 ~]# hosted-engine --vm-status
--== Host 1 status ==--
Status up-to-date : True
Hostname : puma18
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "up"}
Score : 2400
Local maintenance : False
Host timestamp : 21579
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=21579 (Wed Apr 12 19:11:01 2017)
host-id=1
score=2400
maintenance=False
state=EngineUp
--== Host 2 status ==--
Status up-to-date : True
Hostname : puma19
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 2400
Local maintenance : False
Host timestamp : 21438
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=21438 (Wed Apr 12 19:11:02 2017)
host-id=2
score=2400
maintenance=False
state=EngineDown
2) Permissions right after deployment on both hosts, as shown below:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 12 17:52 /etc/ovirt-hosted-engine/answers.conf
puma19 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-r--r--. 1 root root 2.6K Apr 12 18:01 /etc/ovirt-hosted-engine/answers.conf
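As an aside, the mode difference between the two listings above can be expressed programmatically. A minimal Python sketch (the helper names are illustrative, not part of any oVirt tool):

```python
import stat

def mode_string(mode: int) -> str:
    """Render a numeric st_mode the way `ls -l` does, e.g. '-rw-rw----'."""
    return stat.filemode(mode)

def world_readable(mode: int) -> bool:
    """True if a process that is neither the owner nor in the group can read."""
    return bool(mode & stat.S_IROTH)

# The two modes observed above
puma18_mode = 0o100660  # -rw-rw----. root root
puma19_mode = 0o100644  # -rw-r--r--. root root

print(mode_string(puma18_mode))     # -> -rw-rw----
print(world_readable(puma18_mode))  # -> False: other users get EACCES on read
print(world_readable(puma19_mode))  # -> True
```

On puma18 only root and the owning group can read the file, which matters once a non-root service tries to open it.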
3) Created one guest VM with a RHEL 7.3 OS.
4) Upgraded the hosted engine from 3.5 to 3.6 following https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Upgrading_the_Self-Hosted_Engine.html
5) I was able to reproduce this issue in my environment.
MainThread::INFO::2017-04-12 22:24:36,343::upgrade::1012::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Upgrading to current version
MainThread::INFO::2017-04-12 22:24:36,368::upgrade::738::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_stopMonitoringDomain) Stop monitoring domain
MainThread::INFO::2017-04-12 22:24:36,383::upgrade::153::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Looking for conf volume
MainThread::ERROR::2017-04-12 22:24:36,399::upgrade::209::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Unable to find HE conf volume
MainThread::INFO::2017-04-12 22:24:36,399::upgrade::955::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_move_to_shared_conf) _move_to_shared_conf
MainThread::INFO::2017-04-12 22:24:36,400::upgrade::377::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Reading conf file: fhanswers.conf
MainThread::ERROR::2017-04-12 22:24:36,400::upgrade::401::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'
MainThread::ERROR::2017-04-12 22:24:36,400::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Failed to read configuration file '/etc/ovirt-hosted-engine/answers.conf': [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'' - trying to restart agent
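The "[Errno 13]" above is EACCES: a process without owner or group access to the 0660, root-owned answers.conf gets Permission denied from open(). A minimal sketch of that failure shape and of the log's error formatting (`describe` and `read_conf` are illustrative helpers, not the actual agent code):

```python
import errno
import os

def describe(err: int, path: str) -> str:
    """Format an errno the way the log line above renders it."""
    return "[Errno %d] %s: '%s'" % (err, os.strerror(err), path)

def read_conf(path: str) -> str:
    """Read a config file, wrapping EACCES in a clearer error for the caller."""
    try:
        with open(path) as f:
            return f.read()
    except PermissionError as e:
        raise RuntimeError("Failed to read configuration file '%s': %s"
                           % (path, describe(e.errno, path))) from e

print(describe(errno.EACCES, '/etc/ovirt-hosted-engine/answers.conf'))
# -> [Errno 13] Permission denied: '/etc/ovirt-hosted-engine/answers.conf'
```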
Moving back to assigned.
I was not able to move the second host to maintenance because the HE VM was running on it and was failing to migrate to puma18, which was already on 3.6 with RHEL 7.3.
Created attachment 1271242 [details]
sosreport-puma18.scl.lab.tlv.redhat.com-20170412222758.tar.xz
Created attachment 1271243 [details]
sosreport-puma19.scl.lab.tlv.redhat.com-20170412222804.tar.xz
Created attachment 1271244 [details]
engine
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 12 17:52 /etc/ovirt-hosted-engine/answers.conf
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.17.39-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux puma18 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

ovirt-hosted-engine-ha-2.1.0.5-1.el7ev didn't include the fix; I built ovirt-hosted-engine-ha-2.1.0.5-2.el7ev with it.

(In reply to Simone Tiraboschi from comment #11)
> ovirt-hosted-engine-ha-2.1.0.5-1.el7ev didn't include the fix; I built
> ovirt-hosted-engine-ha-2.1.0.5-2.el7ev with it

Was it consumed by QE? Is it in any errata?

1) Initial host running on 3.5 components with ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root root 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf
2) After upgrading the initial host to the latest 4.1:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root kvm 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf
puma18 ~]# rpm -qa | grep ovirt-hosted-engine-ha
ovirt-hosted-engine-ha-2.1.0.5-2.el7ev.noarch
3) The vdsm service was not running due to a known bug opened by Simone:
puma18 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit) since Mon 2017-04-24 15:05:06 IDT; 25s ago
Process: 15302 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=1/FAILURE)
Main PID: 3042 (code=exited, status=0/SUCCESS)
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: start request repeated too quickly for vdsmd.service
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Failed to start Virtual Desktop Server Manager.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: Unit vdsmd.service entered failed state.
Apr 24 15:05:06 puma18.scl.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
I've fixed it with "vdsm-tool configure --force" and then with "systemctl restart vdsmd && systemctl restart ovirt-ha-broker && systemctl restart ovirt-ha-agent".
puma18 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2017-04-24 15:07:44 IDT; 25s ago
Process: 16228 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
Main PID: 16309 (vdsm)
CGroup: /system.slice/vdsmd.service
├─16309 /usr/bin/python2 /usr/share/vdsm/vdsm
├─16438 /usr/libexec/ioprocess --read-pipe-fd 40 --write-pipe-fd 39 --max-threads 10 --max-queued-requests 10
├─16445 /usr/libexec/ioprocess --read-pipe-fd 48 --write-pipe-fd 47 --max-threads 10 --max-queued-requests 10
└─16457 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --max-queued-requests 10
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 1
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 ask_user_info()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 make_client_response()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 2
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 parse_server_challenge()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 ask_user_info()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 make_client_response()
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com python2[16309]: DIGEST-MD5 client step 3
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com vdsm[16309]: vdsm MOM WARN MOM not available.
Apr 24 15:07:45 puma18.scl.lab.tlv.redhat.com vdsm[16309]: vdsm MOM WARN MOM not available, KSM stats will be missing.
4) After step 3, I checked the permissions again:
puma18 ~]# ll -lsha /etc/ovirt-hosted-engine/answers.conf
4.0K -rw-rw----. 1 root kvm 2.6K Apr 18 18:44 /etc/ovirt-hosted-engine/answers.conf
Permissions were not changed on the host, although the hosted-storage domain was successfully upgraded:
MainThread::INFO::2017-04-24 15:10:16,090::upgrade::1035::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade_35_36) Successfully upgraded
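The manual `ll` check in step 4 can be scripted. A hedged sketch of a post-upgrade verification helper (`check_conf` is illustrative and not part of ovirt-hosted-engine-ha; the expected root:kvm ownership and 0660 mode come from the listing above):

```python
import grp
import os
import stat

def check_conf(path: str, want_mode: int = 0o660, want_group: str = "kvm") -> list:
    """Return a list of problems with the file's mode and group; empty if OK.
    Illustrative verification helper, not part of ovirt-hosted-engine-ha."""
    st = os.stat(path)
    problems = []
    actual_mode = stat.S_IMODE(st.st_mode)
    if actual_mode != want_mode:
        problems.append("mode is %o, expected %o" % (actual_mode, want_mode))
    actual_group = grp.getgrgid(st.st_gid).gr_name
    if actual_group != want_group:
        problems.append("group is %s, expected %s" % (actual_group, want_group))
    return problems
```

For the listing shown in step 4, `check_conf('/etc/ovirt-hosted-engine/answers.conf')` would return an empty list.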
5) Moving to VERIFIED: aside from the known issues and their workarounds, the hosted-storage upgrade succeeded on the initial host.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1195