Description of problem:
On a freshly installed node I was able to set up hosted-engine without any issues. My setup had issues, so I had to reinstall hosted-engine. I followed the steps below to clean up the existing hosted-engine installation, and when I try to install again it gives me the error "Failed to execute stage 'Misc configuration': Failed to start service 'vdsmd'".

Cleanup steps:
===============
Hosts:
1) Check that the HE VM does not exist on the host:
   vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 destroy
2) Remove all relevant configuration directories:
   rm -rf /etc/vdsm/ /etc/ovirt-hosted-engine*
3) Try to umount all storages from /rhev/data-center/mnt/:
   umount -f /rhev/data-center/mnt/
   (if this fails with "device is busy", reboot the host)
   Remove all dirs from /rhev/data-center/mnt/:
   rm -rf /rhev/data-center/mnt/*

Storage:
1) Stop the relevant gluster volume:
   gluster volume stop your_storage
2) Clean everything from your gluster storage:
   rm -rf /gluster_volume/your_storage/*
   (you need to run it for all replicas)
3) Start the gluster volume:
   gluster volume start your_storage

Version-Release number of selected component (if applicable):
vdsm-4.18.13-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install hosted-engine using the cockpit UI.
2. Reboot the node.
3. Use the cleanup steps above to clean up the hosted-engine setup.
4. Start deploying hosted-engine again via the cockpit UI.

Actual results:
hosted-engine installation fails with the error "failed to start vdsmd service"

Expected results:
User should be able to successfully deploy hosted-engine.

Additional info:
sos report and ovirt-hosted-engine-setup log can be found in the link below: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1386581/
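For reference, the cleanup steps above boil down to roughly the following sketch (the volume name "your_storage" and the brick path "/gluster_volume/your_storage" are placeholders from my environment, not fixed paths):

# on every hypervisor
vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 destroy   # destroy any leftover HE VM
rm -rf /etc/vdsm/ /etc/ovirt-hosted-engine*                                   # drop vdsm / hosted-engine configuration
umount -f /rhev/data-center/mnt/*    # reboot the host if this fails with "device is busy"
rm -rf /rhev/data-center/mnt/*

# on the gluster storage nodes
gluster volume stop your_storage
rm -rf /gluster_volume/your_storage/*    # repeat on every replica brick
gluster volume start your_storage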
Created attachment 1212094 [details] screenshot for vdsm failure
Lowering the priority, because according to comment 0 this bug appears on the second attempt at installing hosted-engine, after cleanup of a previous attempt. This is an unusual flow.

Rama, I'd suggest starting completely fresh if you need to re-deploy hosted-engine. AFAIK there is no defined way of cleaning up hosted-engine on either RHEL or RHVH, but Simone is probably able to clarify this.

Simone, do the cleanup steps look correct to you, and is there a defined way of cleaning up hosted-engine?
(In reply to Fabian Deutsch from comment #3)
> Simone, do the cleanup steps look correct to you, and is there a defined way
> of cleaning up hosted-engine?

We have manual recovery instructions here:
https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install
Here the issue is that supervdsm fails due to the lack of /dev/stdout. Pretty strange...

Oct 19 13:06:55 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
Oct 19 13:06:55 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service failed.
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com polkitd[2322]: Registered Authentication Agent for unix-process:30433:595554 (system bus name :1.75 [/usr/bin/pkttyagent --notify-fd 6 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Starting Virtual Desktop Server Manager...
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com vdsmd_init_common.sh[30439]: vdsm: Running mkdirs
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com vdsmd_init_common.sh[30439]: vdsm: Running configure_coredump
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com vdsmd_init_common.sh[30439]: vdsm: Running configure_vdsm_logs
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com vdsmd_init_common.sh[30439]: vdsm: Running wait_for_network
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: Traceback (most recent call last):
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: File "/usr/share/vdsm/supervdsmServer", line 45, in <module>
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: level=logging.DEBUG)
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: File "/usr/lib64/python2.7/logging/__init__.py", line 1529, in basicConfig
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: hdlr = FileHandler(filename, mode)
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: File "/usr/lib64/python2.7/logging/__init__.py", line 902, in __init__
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: StreamHandler.__init__(self, self._open())
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: stream = open(self.baseFilename, self.mode)
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com daemonAdapter[30438]: IOError: [Errno 6] No such device or address: '/dev/stdout'
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service: main process exited, code=exited, status=1/FAILURE
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: Unit supervdsmd.service entered failed state.
Oct 19 14:28:46 rhsqa-grafton1.lab.eng.blr.redhat.com systemd[1]: supervdsmd.service failed.
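Just as a sanity check (not something taken from the sos report): /dev/stdout is normally a symlink into /proc/self/fd, so something along these lines should show whether it is still present and usable on the host:

ls -l /dev/stdout /dev/fd     # expect /dev/stdout -> /proc/self/fd/1
stat -L /dev/stdout           # fails if the symlink target is gone
echo test > /dev/stdout       # quick write test from a login shell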
Wed 2016-10-19 13:06:52.780372 IST [s=4f085ef25a1347e78a534b14bf0387d5;i=cfd;b=1d7c4901b75a47799f6b8e2f9f88963b;m=3e192624;t=53f32df6ab554;x=8a1b572c8de9af07]
    PRIORITY=6
    _UID=0
    _GID=0
    _SYSTEMD_SLICE=system.slice
    _BOOT_ID=1d7c4901b75a47799f6b8e2f9f88963b
    _MACHINE_ID=f4cd856b7f414b1f9da665403f07b0ae
    _HOSTNAME=rhsqa-grafton1.lab.eng.blr.redhat.com
    SYSLOG_FACILITY=3
    _CAP_EFFECTIVE=1fffffffff
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    _TRANSPORT=stdout
    _EXE=/usr/bin/python2.7
    SYSLOG_IDENTIFIER=daemonAdapter
    _COMM=supervdsmServer
    _CMDLINE=/usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
    _SYSTEMD_CGROUP=/system.slice/supervdsmd.service
    _SYSTEMD_UNIT=supervdsmd.service
    _PID=30283
    MESSAGE=IOError: [Errno 6] No such device or address: '/dev/stdout'

Rama, any idea why /dev/stdout could have disappeared?
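In case anyone wants to pull the same record again, the verbose journal entry above can be retrieved with a plain journalctl query along these lines (the time range is just illustrative):

journalctl -u supervdsmd.service -o verbose --since "2016-10-19 13:00" --until "2016-10-19 13:10"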
(In reply to Simone Tiraboschi from comment #6)
> Wed 2016-10-19 13:06:52.780372 IST
> [s=4f085ef25a1347e78a534b14bf0387d5;i=cfd;b=1d7c4901b75a47799f6b8e2f9f88963b;
> m=3e192624;t=53f32df6ab554;x=8a1b572c8de9af07]
> PRIORITY=6
> _UID=0
> _GID=0
> _SYSTEMD_SLICE=system.slice
> _BOOT_ID=1d7c4901b75a47799f6b8e2f9f88963b
> _MACHINE_ID=f4cd856b7f414b1f9da665403f07b0ae
> _HOSTNAME=rhsqa-grafton1.lab.eng.blr.redhat.com
> SYSLOG_FACILITY=3
> _CAP_EFFECTIVE=1fffffffff
> _SELINUX_CONTEXT=system_u:system_r:init_t:s0
> _TRANSPORT=stdout
> _EXE=/usr/bin/python2.7
> SYSLOG_IDENTIFIER=daemonAdapter
> _COMM=supervdsmServer
> _CMDLINE=/usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile
> /var/run/vdsm/svdsm.sock
> _SYSTEMD_CGROUP=/system.slice/supervdsmd.service
> _SYSTEMD_UNIT=supervdsmd.service
> _PID=30283
> MESSAGE=IOError: [Errno 6] No such device or address: '/dev/stdout'
>
> Rama, any idea why /dev/stdout could have disappeared?

Simone, not sure. I performed the steps as listed above and see that supervdsm and vdsm fail to start.
I'll try to reproduce.
Hi all,

Thanks to RamaKasturi for the details. I followed those steps to reproduce.

Steps:
1. Deploy hosted-engine via the cockpit UI successfully.
2. Reboot the node.
3. Use the cleanup steps to clean up the hosted-engine setup (comment 0).
4. Start deploying hosted-engine again via the cockpit UI.

Error displayed in cockpit:
Failed to execute stage 'Environment setup': Failed to start service 'vdsmd'
Hosted Engine deployment failed

Version-Release number of selected component (if applicable):
vdsm-4.18.13-1.el7ev.x86_64

Thanks,
Yihui
Yihui Zhao, can you also reproduce this on RHEL-H?
Hi Fabian,

As suggested by you in comment 3, I reinstalled the problematic RHV-H node and redeployed hosted-engine. Now when I try adding an additional host, host addition fails with the error "failed to start vdsmd service".

I copied the vdsm and host-deploy logs to the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1386581/

Do we need to reinstall all the hosts? Any idea why this happens?

Thanks,
kasturi
From vdsm.log:

Thread-12::ERROR::2016-10-19 12:59:48,658::monitor::484::Storage.Monitor::(_pathChecked) Error checking path /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/674c131b-01d2-4912-bd64-e4984e1e0d13/dom_md/metadata
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 482, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 367, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/674c131b-01d2-4912-bd64-e4984e1e0d13/dom_md/metadata', 1, bytearray(b"/usr/bin/dd: failed to open \'/rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/674c131b-01d2-4912-bd64-e4984e1e0d13/dom_md/metadata\': No such file or directory\n"))

Thread-110::ERROR::2016-10-19 12:59:48,743::monitor::425::Storage.Monitor::(_checkDomainStatus) Error checking domain 674c131b-01d2-4912-bd64-e4984e1e0d13
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/share/vdsm/storage/fileSD.py", line 656, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/outOfProcess.py", line 260, in statvfs
    return self._iop.statvfs(path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 488, in statvfs
    resdict = self._sendCommand("statvfs", {"path": path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 461, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 2] No such file or directory

…

Thread-110::ERROR::2016-10-19 13:06:48,785::monitor::425::Storage.Monitor::(_checkDomainStatus) Error checking domain 674c131b-01d2-4912-bd64-e4984e1e0d13
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/share/vdsm/storage/sdc.py", line 50, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 53, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 125, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 144, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/glusterSD.py", line 55, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/glusterSD.py", line 51, in findDomainPath
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('674c131b-01d2-4912-bd64-e4984e1e0d13',)

MainThread::DEBUG::2016-10-19 13:06:50,417::vdsm::73::vds::(sigtermHandler) Received signal 15

Thus it looks like there is something up with your storage domain. Please verify your storage domains.
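To check the domain from the host side, something along these lines should show whether the gluster mount is still there and whether the metadata file vdsm complains about is actually readable (the mount point and UUID are the ones from the traceback above):

mount | grep glusterSD
ls -l /rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/674c131b-01d2-4912-bd64-e4984e1e0d13/dom_md/
dd if=/rhev/data-center/mnt/glusterSD/10.70.36.79:_engine/674c131b-01d2-4912-bd64-e4984e1e0d13/dom_md/metadata bs=4096 count=1 of=/dev/null   # roughly the read that fails in the log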
Fabian, I see these errors in the vdsm logs without there being any issues with my storage domains. I have already logged a bug for the StorageDomainDoesNotExist error in the vdsm log.
Ying, can you reproduce this issue?
Created attachment 1228902 [details] vdsm.log
I followed the steps to test on rhvh-4.0-0.20161206.0+1 and can reproduce this problem. The vdsm.log is attached at https://bugzilla.redhat.com/attachment.cgi?id=1228902

PS: Reinstalling HE failed because vdsm is down. I tried to start vdsm manually with the command "systemctl start vdsmd", but it failed.

[root@dell-per510-01 vdsm]# systemctl start vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
[root@dell-per510-01 vdsm]# journalctl -xe
-- Subject: Unit ovirt-imageio-daemon.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ovirt-imageio-daemon.service has failed.
--
-- The result is failed.
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: ovirt-imageio-daemon.service failed.
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: start request repeated too quickly for ovirt-imageio-daemon.service
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: Failed to start oVirt ImageIO Daemon.
-- Subject: Unit ovirt-imageio-daemon.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ovirt-imageio-daemon.service has failed.
--
-- The result is failed.
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: ovirt-imageio-daemon.service failed.
Dec 07 02:21:17 dell-per510-01.lab.eng.pek2.redhat.com systemd[1]: Stopping MOM instance configured for VDSM purposes...
-- Subject: Unit mom-vdsm.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mom-vdsm.service has begun shutting down.
Dec 07 02:21:19 dell-per510-01.lab.eng.pek2.redhat.com momd[6529]: 2016-12-07 02:21:19,506 - mom.HostMonitor - INFO - Host Monitor ending
Dec 07 02:21:20 dell-per510-01.lab.eng.pek2.redhat.com momd[6529]: 2016-12-07 02:21:20,472 - mom.GuestManager - INFO - Guest Manager ending

Thanks,
Yihui
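Since vdsmd itself only fails through a dependency here, it may help to look at the dependent units directly; these are generic systemd commands, nothing specific to this build:

systemctl list-dependencies vdsmd.service
systemctl status -l ovirt-imageio-daemon.service
journalctl -b -u ovirt-imageio-daemon.service
systemctl reset-failed ovirt-imageio-daemon.service   # should clear the "start request repeated too quickly" state before retrying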
Thanks for Yihui's detailed testing steps. Cancelling the needinfo now.
Thanks. To me this sounds as if the cleanup steps are removing too much.

Moving this to HE, as it looks like this should happen on RHEL-H as well.
Yihui, can you please check the yum logs for:

warning: user vdsm does not exist - using root
warning: group kvm does not exist - using root
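Purely as a sketch of how I'd look for those warnings (nothing build-specific assumed, just a recursive grep over the logs):

grep -riE 'user vdsm does not exist|group kvm does not exist' /var/log/ 2>/dev/null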
(In reply to Simone Tiraboschi from comment #19)
> Yihui, can you please check the yum logs for:
> warning: user vdsm does not exist - using root
> warning: group kvm does not exist - using root

Hi Simone,

I checked the yum logs, but there is no content in them.

Thanks,
Yihui
This should be fixed by BZ #1001181. Please test it and report any bugs if needed.

*** This bug has been marked as a duplicate of bug 1001181 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days