Bug 1946204
Summary: | Hosted-engine fail to add first host | ||
---|---|---|---|
Product: | [oVirt] vdsm | Reporter: | Roni <reliezer> |
Component: | General | Assignee: | Michal Skrivanek <michal.skrivanek> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.40.60.1 | CC: | aefrat, ahadas, bugs, jmacku, khakimi, michal.skrivanek, msobczyk |
Target Milestone: | ovirt-4.4.6 | Keywords: | Automation, AutomationBlocker, Regression, Triaged |
Target Release: | 4.40.60.4 | Flags: | pm-rhel: ovirt-4.4+, ahadas: blocker+ |
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | vdsm-4.40.60.4 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-05-05 05:36:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1946697 | ||
Bug Blocks: |
Description
Roni
2021-04-05 11:14:00 UTC
Does it always reproduce at the same exact point? How many times did it fail like that?
can you please get more logs (journal) from the engine being deployed?
also, does it fail or succeed when deployed using the documented procedure?

(In reply to Michal Skrivanek from comment #2)
> Does it always reproduce at the same exact point? How many times did it fail
> like that?
> can you please get more logs (journal) from the engine being deployed?

Yes it always reproduces, please see attached complete logs from the deployed host,
note that the Engine VM was not created, due to this failure

(In reply to Roni from comment #4)
> (In reply to Michal Skrivanek from comment #2)
> > Does it always reproduce at the same exact point? How many times did it fail
> > like that?
> > can you please get more logs (journal) from the engine being deployed?
>
> Yes it always reproduces, please see attached complete logs from the
> deployed host,
> note that the Engine VM was not created, due to this failure

thanks, but can you please answer my other questions - Does it fail or succeed when deployed using the documented procedure? Does it always reproduce at the same exact point (during host deploy, start of ovirt-vmconsole-host-sshd being the last message received in engine)?
There is an engine - the one deploying that host that fails - and the logs included in the report are not sufficient, hence I'd like to ask for a journal - or a sosreport of that environment. It could be that it's removed at the end after the timeout of 120 retries...

(In reply to Michal Skrivanek from comment #5)
> (In reply to Roni from comment #4)
> > (In reply to Michal Skrivanek from comment #2)
> > > Does it always reproduce at the same exact point? How many times did it fail
> > > like that?
> > > can you please get more logs (journal) from the engine being deployed?
> >
> > Yes it always reproduces, please see attached complete logs from the
> > deployed host,
> > note that the Engine VM was not created, due to this failure
>
> thanks, but can you please answer my other questions - Does it fail or
> succeed when deployed using the documented procedure? Does it always
> reproduce at the same exact point (during host deploy, start of
> ovirt-vmconsole-host-sshd being the last message received in engine)?
> There is an engine - the one deploying that host that fails - and the logs
> included in the report are not sufficient, hence I'd like to ask for a
> journal - or a sosreport of that environment. It could be that it's removed
> at the end after the timeout of 120 retries...

Here is another run example, it fails on the same place:

20:51:14 TASK [ovirt.ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials] ***
20:51:15 ok: [lynx12.lab.eng.tlv2.redhat.com]
20:51:15
20:51:15 TASK [ovirt.ovirt.hosted_engine_setup : Wait for the host to be up] ************
20:51:16 FAILED - RETRYING: Wait for the host to be up (120 retries left).
20:51:27 FAILED - RETRYING: Wait for the host to be up (119 retries left).
20:51:38 FAILED - RETRYING: Wait for the host to be up (118 retries left).
20:51:49 FAILED - RETRYING: Wait for the host to be up (117 retries left).
20:51:59 FAILED - RETRYING: Wait for the host to be up (116 retries left).
20:52:10 FAILED - RETRYING: Wait for the host to be up (115 retries left).
20:52:21 FAILED - RETRYING: Wait for the host to be up (114 retries left).
20:52:32 FAILED - RETRYING: Wait for the host to be up (113 retries left).
20:52:43 FAILED - RETRYING: Wait for the host to be up (112 retries left).
20:52:53 FAILED - RETRYING: Wait for the host to be up (111 retries left).
20:53:04 FAILED - RETRYING: Wait for the host to be up (110 retries left).
20:53:15 FAILED - RETRYING: Wait for the host to be up (109 retries left).
20:53:26 FAILED - RETRYING: Wait for the host to be up (108 retries left).
20:53:36 FAILED - RETRYING: Wait for the host to be up (107 retries left).
20:53:47 FAILED - RETRYING: Wait for the host to be up (106 retries left).
20:53:58 FAILED - RETRYING: Wait for the host to be up (105 retries left).
20:54:12 FAILED - RETRYING: Wait for the host to be up (104 retries left).
20:54:26 FAILED - RETRYING: Wait for the host to be up (103 retries left).
20:54:39 FAILED - RETRYING: Wait for the host to be up (102 retries left).
20:54:53 FAILED - RETRYING: Wait for the host to be up (101 retries left).
20:55:07 FAILED - RETRYING: Wait for the host to be up (100 retries left).
20:55:21 FAILED - RETRYING: Wait for the host to be up (99 retries left).
20:55:34 FAILED - RETRYING: Wait for the host to be up (98 retries left).
20:55:48 FAILED - RETRYING: Wait for the host to be up (97 retries left).
20:56:02 FAILED - RETRYING: Wait for the host to be up (96 retries left).
20:56:16 FAILED - RETRYING: Wait for the host to be up (95 retries left).
20:56:30 FAILED - RETRYING: Wait for the host to be up (94 retries left).
20:56:43 FAILED - RETRYING: Wait for the host to be up (93 retries left).
20:56:57 FAILED - RETRYING: Wait for the host to be up (92 retries left).
20:57:11 FAILED - RETRYING: Wait for the host to be up (91 retries left).
20:57:25 FAILED - RETRYING: Wait for the host to be up (90 retries left).
20:57:38 FAILED - RETRYING: Wait for the host to be up (89 retries left).
20:57:52 FAILED - RETRYING: Wait for the host to be up (88 retries left).
20:58:06 FAILED - RETRYING: Wait for the host to be up (87 retries left).
20:58:20 FAILED - RETRYING: Wait for the host to be up (86 retries left).
20:58:33 FAILED - RETRYING: Wait for the host to be up (85 retries left).
20:58:47 FAILED - RETRYING: Wait for the host to be up (84 retries left).
20:59:01 FAILED - RETRYING: Wait for the host to be up (83 retries left).
20:59:15 FAILED - RETRYING: Wait for the host to be up (82 retries left).
20:59:29 FAILED - RETRYING: Wait for the host to be up (81 retries left).
20:59:42 FAILED - RETRYING: Wait for the host to be up (80 retries left).
20:59:56 FAILED - RETRYING: Wait for the host to be up (79 retries left).
21:00:10 FAILED - RETRYING: Wait for the host to be up (78 retries left).
21:00:24 FAILED - RETRYING: Wait for the host to be up (77 retries left).
21:00:37 FAILED - RETRYING: Wait for the host to be up (76 retries left).
21:00:51 FAILED - RETRYING: Wait for the host to be up (75 retries left).
21:01:05 FAILED - RETRYING: Wait for the host to be up (74 retries left).
21:01:19 FAILED - RETRYING: Wait for the host to be up (73 retries left).
21:01:33 FAILED - RETRYING: Wait for the host to be up (72 retries left).
21:01:46 FAILED - RETRYING: Wait for the host to be up (71 retries left).
21:02:00 FAILED - RETRYING: Wait for the host to be up (70 retries left).
21:02:14 FAILED - RETRYING: Wait for the host to be up (69 retries left).
21:02:28 FAILED - RETRYING: Wait for the host to be up (68 retries left).
21:02:42 FAILED - RETRYING: Wait for the host to be up (67 retries left).
21:02:55 FAILED - RETRYING: Wait for the host to be up (66 retries left).
21:03:09 FAILED - RETRYING: Wait for the host to be up (65 retries left).
21:03:23 FAILED - RETRYING: Wait for the host to be up (64 retries left).
21:03:37 FAILED - RETRYING: Wait for the host to be up (63 retries left).
21:03:50 FAILED - RETRYING: Wait for the host to be up (62 retries left).
21:04:04 FAILED - RETRYING: Wait for the host to be up (61 retries left).
21:04:18 FAILED - RETRYING: Wait for the host to be up (60 retries left).
21:04:32 FAILED - RETRYING: Wait for the host to be up (59 retries left).
21:04:45 FAILED - RETRYING: Wait for the host to be up (58 retries left).
21:04:59 FAILED - RETRYING: Wait for the host to be up (57 retries left).
21:05:13 FAILED - RETRYING: Wait for the host to be up (56 retries left).
21:05:27 FAILED - RETRYING: Wait for the host to be up (55 retries left).
21:05:41 FAILED - RETRYING: Wait for the host to be up (54 retries left).
21:05:54 FAILED - RETRYING: Wait for the host to be up (53 retries left).
21:06:08 FAILED - RETRYING: Wait for the host to be up (52 retries left).
21:06:22 FAILED - RETRYING: Wait for the host to be up (51 retries left).
21:06:36 FAILED - RETRYING: Wait for the host to be up (50 retries left).
21:06:49 FAILED - RETRYING: Wait for the host to be up (49 retries left).
21:07:03 FAILED - RETRYING: Wait for the host to be up (48 retries left).
21:07:17 FAILED - RETRYING: Wait for the host to be up (47 retries left).
21:07:31 FAILED - RETRYING: Wait for the host to be up (46 retries left).
21:07:44 FAILED - RETRYING: Wait for the host to be up (45 retries left).
21:07:58 FAILED - RETRYING: Wait for the host to be up (44 retries left).
21:08:12 FAILED - RETRYING: Wait for the host to be up (43 retries left).
21:08:26 FAILED - RETRYING: Wait for the host to be up (42 retries left).
21:08:39 FAILED - RETRYING: Wait for the host to be up (41 retries left).
21:08:53 FAILED - RETRYING: Wait for the host to be up (40 retries left).
21:09:07 FAILED - RETRYING: Wait for the host to be up (39 retries left).
21:09:21 FAILED - RETRYING: Wait for the host to be up (38 retries left).
21:09:34 FAILED - RETRYING: Wait for the host to be up (37 retries left).
21:09:48 FAILED - RETRYING: Wait for the host to be up (36 retries left).
21:10:02 FAILED - RETRYING: Wait for the host to be up (35 retries left).
21:10:16 FAILED - RETRYING: Wait for the host to be up (34 retries left).
21:10:30 FAILED - RETRYING: Wait for the host to be up (33 retries left).
21:10:43 FAILED - RETRYING: Wait for the host to be up (32 retries left).
21:10:57 FAILED - RETRYING: Wait for the host to be up (31 retries left).
21:11:11 FAILED - RETRYING: Wait for the host to be up (30 retries left).
21:11:25 FAILED - RETRYING: Wait for the host to be up (29 retries left).
21:11:38 FAILED - RETRYING: Wait for the host to be up (28 retries left).
21:11:52 FAILED - RETRYING: Wait for the host to be up (27 retries left).
21:12:06 FAILED - RETRYING: Wait for the host to be up (26 retries left).
21:12:20 FAILED - RETRYING: Wait for the host to be up (25 retries left).
21:12:33 FAILED - RETRYING: Wait for the host to be up (24 retries left).
21:12:47 FAILED - RETRYING: Wait for the host to be up (23 retries left).
21:13:01 FAILED - RETRYING: Wait for the host to be up (22 retries left).
21:13:15 FAILED - RETRYING: Wait for the host to be up (21 retries left).
21:13:28 FAILED - RETRYING: Wait for the host to be up (20 retries left).
21:13:42 FAILED - RETRYING: Wait for the host to be up (19 retries left).
21:13:56 FAILED - RETRYING: Wait for the host to be up (18 retries left).
21:14:10 FAILED - RETRYING: Wait for the host to be up (17 retries left).
21:14:24 FAILED - RETRYING: Wait for the host to be up (16 retries left).
21:14:37 FAILED - RETRYING: Wait for the host to be up (15 retries left).
21:14:51 FAILED - RETRYING: Wait for the host to be up (14 retries left).
21:15:05 FAILED - RETRYING: Wait for the host to be up (13 retries left).
21:15:19 FAILED - RETRYING: Wait for the host to be up (12 retries left).
21:15:32 FAILED - RETRYING: Wait for the host to be up (11 retries left).
21:15:46 FAILED - RETRYING: Wait for the host to be up (10 retries left).
21:16:00 FAILED - RETRYING: Wait for the host to be up (9 retries left).
21:16:14 FAILED - RETRYING: Wait for the host to be up (8 retries left).
21:16:27 FAILED - RETRYING: Wait for the host to be up (7 retries left).
21:16:41 FAILED - RETRYING: Wait for the host to be up (6 retries left).
21:16:55 FAILED - RETRYING: Wait for the host to be up (5 retries left).
21:17:09 FAILED - RETRYING: Wait for the host to be up (4 retries left).
21:17:23 FAILED - RETRYING: Wait for the host to be up (3 retries left).
21:17:36 FAILED - RETRYING: Wait for the host to be up (2 retries left).
21:17:50 FAILED - RETRYING: Wait for the host to be up (1 retries left).
21:18:04 An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ovirtsdk4.Error: Failed to read response: [(<pycurl.Curl object at 0x55db31d56b38>, 7, 'Failed to connect to hosted-engine-02.lab.eng.tlv2.redhat.com port 443: No route to host')]
21:18:04 fatal: [lynx12.lab.eng.tlv2.redhat.com]: FAILED! => {"attempts": 120, "changed": false, "msg": "Failed to read response: [(<pycurl.Curl object at 0x55db31d56b38>, 7, 'Failed to connect to hosted-engine-02.lab.eng.tlv2.redhat.com port 443: No route to host')]"}
21:18:04 ...ignoring
21:18:04
21:18:04 TASK [ovirt.ovirt.hosted_engine_setup : debug] *********************************
21:18:04 ok: [lynx12.lab.eng.tlv2.redhat.com] => {
21:18:04     "host_result_up_check": {
21:18:04         "attempts": 120,
21:18:04         "changed": false,
21:18:04         "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_host_info_payload_04wx5866/ansible_ovirt_host_info_payload.zip/ansible_collections/ovirt/ovirt/plugins/modules/ovirt_host_info.py\", line 112, in main\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py\", line 13222, in list\n return self._internal_get(headers, query, wait)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 211, in _internal_get\n return future.wait() if wait else future\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py\", line 54, in wait\n response = self._connection.wait(self._context)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/__init__.py\", line 497, in wait\n return self.__wait(context, failed_auth)\n File \"/usr/lib64/python3.6/site-packages/ovirtsdk4/__init__.py\", line 511, in __wait\n raise Error(\"Failed to read response: {}\".format(err_list))\novirtsdk4.Error: Failed to read response: [(<pycurl.Curl object at 0x55db31d56b38>, 7, 'Failed to connect to hosted-engine-02.lab.eng.tlv2.redhat.com port 443: No route to host')]\n",
21:18:04         "failed": true,
21:18:04         "msg": "Failed to read response: [(<pycurl.Curl object at 0x55db31d56b38>, 7, 'Failed to connect to hosted-engine-02.lab.eng.tlv2.redhat.com port 443: No route to host')]"
21:18:04     }
21:18:04 }
21:18:04
21:18:04 TASK [ovirt.ovirt.hosted_engine_setup : Notify the user about a failure] *******
21:18:04 fatal: [lynx12.lab.eng.tlv2.redhat.com]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}
21:18:04
21:18:04 TASK [ovirt.ovirt.hosted_engine_setup : Sync on engine machine] ****************
21:18:05 changed: [lynx12.lab.eng.tlv2.redhat.com]
21:18:05
21:18:05 TASK [ovirt.ovirt.hosted_engine_setup : Fetch logs from the engine VM] *********

The HE VM appears to be shut down during its host's deployment. Seems like a change in behavior in RHEL 8.4.

Doesn't reproduce on CentOS, not even on CentOS Stream with oVirt.

Actually, it seems libvirt services-related, rather than integration. It happens during libvirtd restart (which is what the host deploy role does), and not vdsmd restart - where we depend on libvirt-guests, so I was assuming it happens there too, but it doesn't.
Marcin, maybe you can take a look, being the last one who "enjoyed" dealing with libvirt systemd changes in RHEL 8 :)

Actually, the root cause is https://github.com/libvirt/libvirt/commit/f035f53baa2e5dc00b8e866e594672a90b4cea78 (many thanks to danpb for the quick response)

...and reverted in https://github.com/libvirt/libvirt/commit/32c5e432044689b6679cdedeb1026f27653449d8 after the problem reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955216

Keeping the bug for the spec bump (not strictly required, but useful for tracking).

Regular HE deployment works just fine for me on these components:
ovirt-engine-setup-4.4.6.6-0.10.el8ev.noarch
ovirt-hosted-engine-ha-2.4.6-1.el8ev.noarch
ovirt-hosted-engine-setup-2.5.0-2.el8ev.noarch
vdsm-4.40.60.5-1.el8ev.x86_64
Linux 4.18.0-304.el8.x86_64 #1 SMP Tue Apr 6 05:19:59 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

This bugzilla is included in the oVirt 4.4.6 release, published on May 4th 2021.

Since the problem described in this bug report should be resolved in the oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
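
For anyone reading the deployment log in this report: the "Wait for the host to be up" task behaves like a bounded polling loop that gives up after 120 attempts. The following is only a minimal Python sketch of that pattern, not the code of the Ansible role. The names `wait_until` and `host_is_up` are hypothetical, and the 10-second default delay is an assumption loosely based on the 10 to 14 second spacing of the retry messages.

```python
import time
from typing import Callable, Tuple


def wait_until(check: Callable[[], bool],
               retries: int = 120,      # matches "attempts": 120 in the log
               delay: float = 10.0,     # assumed value, not taken from the role
               sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, int]:
    """Call check() up to `retries` times, pausing `delay` seconds between tries.

    Returns (succeeded, attempts_made). An exception raised by check() is
    counted as a failed attempt, similar to the connection error in the log.
    """
    attempts = 0
    for attempts in range(1, retries + 1):
        try:
            if check():
                return True, attempts
        except Exception as exc:  # e.g. "No route to host"
            print(f"attempt {attempts} failed: {exc}")
        if attempts < retries:
            sleep(delay)
    return False, attempts


if __name__ == "__main__":
    def host_is_up() -> bool:
        # Hypothetical placeholder: a real check would query the engine API.
        raise ConnectionError("No route to host")

    ok, attempts = wait_until(host_is_up, retries=3, delay=0)
    print(ok, attempts)
```

With a check that never succeeds, the loop exhausts all attempts and reports failure, which is the situation in this report once the engine VM became unreachable.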