Created attachment 1449699 [details]
zip with host log files

Description of problem:
[HE] Failed to deploy hosted engine on RHVH with error: local_vm_disk_path is undefined

Version-Release number of selected component (if applicable):
rhvh-4.2.4.2-0.20180608.0
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the hosted engine on RHVH with the automation deploy job.

Actual results:
Failed to deploy the hosted engine on the new RHVH with the following error in ovirt-hosted-engine-setup-ansible-final_clean-20180610172659-9lwtac.log:

2018-06-10 17:27:25,095+0300 DEBUG ansible on_any args TASK: Copy engine logs kwargs is_conditional:False
2018-06-10 17:27:25,679+0300 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Copy engine logs', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"The task includes an option with an undefined variable. The error was: \'local_vm_disk_path\' is undefined\\n\\nThe error appears to have been in \'/usr/share/ovirt-hosted-engine-setup/ansible/fetch_engine_logs.yml\': line 16, column 3, but may\\nbe elsewhere in the file depending on the exact sy [...]', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/final_clean.yml'}

Expected results:
Complete the deploy successfully.
Additional info:
Maybe it's related to the following patches:
https://gerrit.ovirt.org/#/c/90404/
https://gerrit.ovirt.org/#/c/90405/

The environment is still up and running, host: lynx01.lab.eng.tlv2.redhat.com
For more info you can also see the failed job:
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-ge-deploy-4.2/1566/consoleFull
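For context: the failing "Copy engine logs" task in fetch_engine_logs.yml references local_vm_disk_path, which is presumably only set after the local bootstrap VM disk has been created, so when the deployment dies earlier the cleanup playbook trips over the undefined variable instead of reporting the real failure. A minimal, hypothetical sketch of the kind of guard that avoids this secondary error (the task body and paths below are illustrative, not the actual playbook content):

  - name: Copy engine logs
    # hypothetical example: virt-copy-out (libguestfs) pulls logs out of the local VM disk image
    command: virt-copy-out -a {{ local_vm_disk_path }} /var/log/ovirt-engine /tmp/he-engine-logs
    when: local_vm_disk_path is defined   # skip cleanly instead of failing on an undefined variable
    ignore_errors: yes

With a guard like this, the final cleanup would skip the log copy and the original host-deploy failure would surface as the actual error.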
The real issue happened before that and it's here:

[ INFO ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "lynx01.lab.eng.tlv2.redhat.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "localdomain", "subject": "O=localdomain,CN=lynx01.lab.eng.tlv2.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/2471d47c-6cb8-11e8-9660-00163e7be006", "id": "2471d47c-6cb8-11e8-9660-00163e7be006"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/1c641819-6f86-4a0a-a6c6-5266d4daa917", "id": "1c641819-6f86-4a0a-a6c6-5266d4daa917", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "host_mixed_1", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:WaskLez/Yfv0OLnR+Lz1rxwzeajKUCmLbQD5Vy1JgZM", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "ovirt_node", "unmanaged_networks": [], "update_available": false}]}, "attempts": 120, "changed": false}

The task failed due to "status": "install_failed". We have to double check the host-deploy logs to see why host installation failed.
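For anyone triaging the same symptom: the host-deploy logs live on the engine VM, and the services host-deploy failed to start can be inspected directly on the host. A couple of hedged commands, assuming the default log locations (they may differ on your setup):

  # on the engine VM: the host-deploy log for the failed run
  grep -iE "error|fail" /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-*.log | tail -n 30

  # on the host: what happened to the services host-deploy tried to start
  journalctl -u vdsmd -u supervdsmd --no-pager | tail -n 50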
where we see:

2018-06-10 17:14:42,892+0300 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service vdsmd
2018-06-10 17:14:42,892+0300 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/usr/bin/systemctl', 'start', 'vdsmd.service'), executable='None', cwd='None', env=None
2018-06-10 17:14:46,685+0300 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'vdsmd.service'), rc=1
2018-06-10 17:14:46,686+0300 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/usr/bin/systemctl', 'start', 'vdsmd.service') stdout:
2018-06-10 17:14:46,687+0300 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'vdsmd.service') stderr:
Job for vdsmd.service canceled.
2018-06-10 17:14:46,687+0300 DEBUG otopi.context context._executeMethod:143 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-hk028Vsxow/pythonlib/otopi/context.py", line 133, in _executeMethod
    method['method']()
  File "/tmp/ovirt-hk028Vsxow/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 216, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-hk028Vsxow/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'vdsmd'
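"Job for vdsmd.service canceled" normally means systemd aborted the start job (for example because a conflicting or queued stop job superseded it), rather than the daemon crashing on its own. Some hedged, plain systemd commands that help confirm what canceled it on the host:

  # current state of both vdsm services, with full (non-truncated) messages
  systemctl status vdsmd supervdsmd --no-pager -l

  # units that vdsmd depends on (a failing or stopping dependency can cancel the start job)
  systemctl list-dependencies vdsmd.service

  # journal around the failed start, including ExecStartPre output
  journalctl -u vdsmd.service -u supervdsmd.service --since "2018-06-10 17:14" --no-pager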
and vdsm is down and nothing in its log file:

[root@lynx01 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since dom 2018-06-10 17:14:46 IDT; 18h ago
  Process: 48985 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 48829 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=killed, signal=TERM)

giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Starting Virtual Desktop Server Manager...
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running mkdirs
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running configure_coredump
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running configure_vdsm_logs
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running wait_for_network
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running run_init_hooks
giu 10 17:14:44 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48829]: vdsm: Running ngn_configure
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com vdsmd_init_common.sh[48985]: vdsm: Running run_final_hooks
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager.

[root@lynx01 ~]# ls -l /var/log/vdsm/vdsm.log
-rw-r--r--. 1 vdsm kvm 0  8 giu 23.19 /var/log/vdsm/vdsm.log
supervdsm failed as well.

[root@lynx01 ~]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static; vendor preset: enabled)
   Active: inactive (dead) since dom 2018-06-10 17:14:46 IDT; 19h ago
  Process: 48737 ExecStart=/usr/share/vdsm/daemonAdapter /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock (code=exited, status=0/SUCCESS)
 Main PID: 48737 (code=exited, status=0/SUCCESS)

giu 10 17:14:42 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
giu 10 17:14:42 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Starting Auxiliary vdsm service for running helper functions as root...
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Stopping Auxiliary vdsm service for running helper functions as root...
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]: Traceback (most recent call last):
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]:   File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]:     finalizer()
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]:   File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]:     res = self._callback(*self._args, **self._kwargs)
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com daemonAdapter[48737]: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'
giu 10 17:14:46 lynx01.lab.eng.tlv2.redhat.com systemd[1]: Stopped Auxiliary vdsm service for running helper functions as root.
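The OSError above is only supervdsm's shutdown finalizer complaining that its socket is already gone; the more interesting question is why the service was being stopped at all. Two quick hedged checks on the host (standard vdsm paths and tooling; details may vary by version):

  # is the runtime directory / socket present?
  ls -ld /var/run/vdsm /var/run/vdsm/svdsm.sock

  # does vdsm consider itself fully configured on this node?
  vdsm-tool is-configured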
Severity?
(In reply to Yaniv Kaul from comment #5)
> Severity?

This has a high impact on RHHI deployment. For RHHI this issue is 'urgent', as the HE deployment can't be done.
Can you attach {super,}vdsm.log and the host's /var/log/messages with hints on why vdsmd failed to start?
Just open the attached zip file and you have all the logs :) Also, I have reproduced this issue again and the machine is still up and running if you want to take a look.
Piotr, could you please take a look? According to the logs it seems to me that vdsm-tool configure --force was executed, but afterwards supervdsm was not able to start:

Jun 10 16:23:31 lynx01 vdsmd_init_common.sh: vdsm: Running mkdirs
Jun 10 16:23:31 lynx01 vdsmd_init_common.sh: vdsm: Running configure_coredump
Jun 10 16:23:31 lynx01 vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
Jun 10 16:23:31 lynx01 vdsmd_init_common.sh: vdsm: Running wait_for_network
Jun 10 16:23:32 lynx01 vdsmd_init_common.sh: vdsm: Running run_init_hooks
Jun 10 16:23:32 lynx01 systemd: Started GlusterFS, a clustered file-system server.
Jun 10 16:23:32 lynx01 systemd: Reached target Network is Online.
Jun 10 16:23:32 lynx01 systemd: Starting Network is Online.
Jun 10 16:23:32 lynx01 systemd: Starting Crash recovery kernel arming...
Jun 10 16:23:32 lynx01 systemd: Starting System Logging Service...
Jun 10 16:23:32 lynx01 systemd: Starting Notify NFS peers of a restart...
Jun 10 16:23:32 lynx01 systemd: Starting LSB: Starts the Spacewalk Daemon...
Jun 10 16:23:32 lynx01 sm-notify[2501]: Version 1.3.0 starting
Jun 10 16:23:32 lynx01 vdsmd_init_common.sh: vdsm: Running ngn_configure
^^^ Here configure should be executed
Jun 10 16:23:32 lynx01 systemd: Started Notify NFS peers of a restart.
Jun 10 16:23:32 lynx01 rhnsd: Starting Spacewalk Daemon: [ OK ]
Jun 10 16:23:32 lynx01 rhnsd[2530]: Spacewalk Services Daemon starting up, check in interval 240 minutes.
Jun 10 16:23:32 lynx01 systemd: Started LSB: Starts the Spacewalk Daemon.
Jun 10 16:23:32 lynx01 rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="2500" x-info="http://www.rsyslog.com"] start
Jun 10 16:23:32 lynx01 systemd: Started System Logging Service.
Jun 10 16:23:32 lynx01 kdumpctl: No kdump initial ramdisk found.
Jun 10 16:23:32 lynx01 kdumpctl: Rebuilding /boot/initramfs-3.10.0-862.3.2.el7.x86_64kdump.img
Jun 10 16:23:33 lynx01 sasldblistusers2: _sasldb_getkeyhandle has failed
Jun 10 16:23:34 lynx01 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Jun 10 16:23:35 lynx01 systemd: Stopped Virtual Desktop Server Manager.
Jun 10 16:23:35 lynx01 systemd: Starting Suspend/Resume Running libvirt Guests...
Jun 10 16:23:35 lynx01 systemd: Reached target Multi-User System.
Jun 10 16:23:35 lynx01 systemd: Starting Multi-User System.
Jun 10 16:23:35 lynx01 systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Jun 10 16:23:35 lynx01 systemd: Starting Stop Read-Ahead Data Collection 10s After Completed Startup.
Jun 10 16:23:35 lynx01 systemd: Starting Update UTMP about System Runlevel Changes...
Jun 10 16:23:35 lynx01 systemd: Stopping Auxiliary vdsm service for running helper functions as root...
Jun 10 16:23:35 lynx01 daemonAdapter: Traceback (most recent call last):
Jun 10 16:23:35 lynx01 daemonAdapter: File "/usr/lib64/python2.7/multiprocessing/util.py", line 268, in _run_finalizers
Jun 10 16:23:35 lynx01 daemonAdapter: finalizer()
Jun 10 16:23:35 lynx01 daemonAdapter: File "/usr/lib64/python2.7/multiprocessing/util.py", line 201, in __call__
Jun 10 16:23:35 lynx01 systemd: Started Update UTMP about System Runlevel Changes.
Jun 10 16:23:35 lynx01 daemonAdapter: res = self._callback(*self._args, **self._kwargs)
Jun 10 16:23:35 lynx01 daemonAdapter: OSError: [Errno 2] No such file or directory: '/var/run/vdsm/svdsm.sock'
Jun 10 16:23:35 lynx01 systemd: Started Suspend/Resume Running libvirt Guests.
Jun 10 16:23:35 lynx01 systemd: Stopped Auxiliary vdsm service for running helper functions as root.
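Until a fixed RHVH build is available, a hedged manual workaround on the affected host would be to redo the vdsm configuration and restart the services by hand (these are standard vdsm-tool/systemctl commands; whether this alone unblocks the automated host-deploy flow is not confirmed here):

  # see which configuration modules vdsm-tool thinks are missing
  vdsm-tool is-configured

  # force reconfiguration, then bring the services back up
  vdsm-tool configure --force
  systemctl restart supervdsmd vdsmd
  systemctl status vdsmd --no-pager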
Martin, it looks like Ryan already pushed a patch to fix the issue.
Ryan, can we build and deliver a fixed RHVH? It blocks RHHI testing, for example.
The workaround for RHVH was posted on Monday and merged this morning (it passed code review late yesterday). I'll put out an RHVH build for RHHI QE to use today, though they should expect another build later this week (Friday/Sunday) after the normal RHV compose.
Version for verification:

# rpm -qa | grep ovirt-hosted-engine
ovirt-hosted-engine-ha-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch

[root@lynx21 ~]# imgbase w
You are on rhvh-4.2.4.3-0.20180615.0+1
This bugzilla is included in the oVirt 4.2.4 release, published on June 26th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.