Bug 1858234
Summary: | ovirt 4.4.1.1 hci and problems with ansible 2.9.10 and/or missing python2
---|---
Product: | [oVirt] ovirt-hosted-engine-setup
Component: | General
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | high
Version: | 2.4.5
Target Milestone: | ovirt-4.4.1-1
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | ovirt-node-ng-installer-4.4.1-2020072220.el8.iso
Reporter: | Gianluca Cecchi <gianluca.cecchi>
Assignee: | Lev Veyde <lveyde>
QA Contact: | SATHEESARAN <sasundar>
CC: | aoconnor, arachman, aturgema, bugs, cshao, delfassy, dmitriy.prg, frankblon, holger.petrick, hunter86_bg, jstrauss501, lveyde, michal.skrivanek, mnecas, mperina, nlevy, sasundar, sbonazzo, stirabos, weiwang, zc640618
Flags: | mperina: ovirt-4.4?, aoconnor: blocker-
Doc Type: | If docs needed, set a value
Story Points: | ---
Last Closed: | 2020-08-04 06:09:44 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | Node
Cloudforms Team: | ---
Description
Gianluca Cecchi
2020-07-17 10:23:36 UTC
To get the first stage to work (and then continue with the "Continue to Hosted Engine Deployment" step) I have to modify:

- /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/main.yml
  Force dnf as the package manager. For some reason it is not automatically detected by the "package" ansible module.

  from

    - name: Change to Install lvm tools for RHEL systems.
      package:
        name: device-mapper-persistent-data
        state: present
      when: ansible_os_family == 'RedHat'

  to

    - name: Change to Install lvm tools for RHEL systems.
      package:
        name: device-mapper-persistent-data
        state: present
        use: dnf
      when: ansible_os_family == 'RedHat'

- /etc/ansible/roles/gluster.infra/roles/backend_setup/tasks/vdo_create.yml
  Change from yum to package and specify dnf.

  from:

    - name: Install VDO dependencies
      #maybe use package module?
      yum:
        name: "{{ packages }}"
      register: vdo_deps
    ...

  to:

    - name: Install VDO dependencies
      #maybe use package module?
      package:
        name: "{{ packages }}"
        use: dnf
      register: vdo_deps
    ...

For the engine deployment I also have to apply the same strategy to:

    ovirt.engine-setup/tasks/engine_setup.yml
    ovirt.engine-setup/tasks/install_packages.yml
    ovirt.hosted_engine_setup/tasks/install_packages.yml
    ovirt.hosted_engine_setup/tasks/create_target_vm/03_hosted_engine_final_tasks.yml
    ovirt.hosted_engine_setup/tasks/install_appliance.yml

Even so, the engine deployment failed in the phase where it tries to add the host and waits for the host to be up. In the logs under /var/log/ovirt-hosted-engine-setup/engine-logs-2020-07-17T08:30:48Z/ovirt-engine/host-deploy/ the file ovirt-host-deploy-ansible-20200717104103-novirt2.example.net-3a710f0c.log contains:

    2020-07-17 10:41:17 CEST - fatal: [novirt2.example.net]: FAILED! => {"changed": false, "module_stderr": "/bin/sh: /usr/bin/python2: No such file or directory\n", "module_stdout": "", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}
    2020-07-17 10:41:17 CEST - { "status" : "OK", "msg" : "", "data" : { "uuid" : "00f4c6a8-8423-4a2a-bfd5-f38c34f56ecf", "counter" : 53, "stdout" : "fatal: [novirt2.example.net]: FAILED! => {\"changed\": false, \"module_stderr\": \"/bin/sh: /usr/bin/python2: No such file or directory\\n\", \"module_stdout\": \"\", \"msg\": \"The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error\", \"rc\": 127}",

At the beginning of ovirt-host-deploy-ansible-20200717104103-novirt2.example.net-3a710f0c.log the gathering of facts seems all ok and uses python3, but then I see this:

    2020-07-17 10:41:14 CEST - TASK [ovirt-host-deploy-facts : Reset configuration of advanced virtualization module] ***
    2020-07-17 10:41:14 CEST - TASK [ovirt-host-deploy-facts : Enable advanced virtualization module for relevant OS version] ***
    2020-07-17 10:41:17 CEST - TASK [ovirt-host-deploy-facts : Install Python3 for CentOS/RHEL8 hosts] ********
    2020-07-17 10:41:17 CEST -
    2020-07-17 10:41:17 CEST - skipping: [novirt2.example.net]
    2020-07-17 10:41:17 CEST - TASK [ovirt-host-deploy-facts : Set facts] *************************************
    2020-07-17 10:41:17 CEST -
    2020-07-17 10:41:17 CEST - ok: [novirt2.example.net]
    2020-07-17 10:41:17 CEST - { "status" : "OK", "msg" : "", "data" : { "uuid" : "c6ab4add-cf41-41b0-b7c8-2a3d156542fa", "counter" : 39, "stdout" : "ok: [novirt2.example.net]", "start_line" : 37, "end_line" : 38, "runner_ident" : "402b6968-c809-11ea-9d36-00163e0765c1", "event" : "runner_on_ok", "pid" : 28793, "created" : "2020-07-17T08:41:14.380339", "parent_uuid" : "00163e07-65c1-0ebc-a725-000000000021", "event_data" : { "playbook" : "ovirt-host-deploy.yml", "playbook_uuid" : "eb09419a-dc4a-4ea7-9d94-3f666d9d17e0", "play" : "all", "play_uuid" : "00163e07-65c1-0ebc-a725-000000000006", "play_pattern" : "all", "task" : "Set facts", "task_uuid" : "00163e07-65c1-0ebc-a725-000000000021", "task_action" : "set_fact", "task_args" : "", "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-facts/tasks/main.yml:26", "role" : "ovirt-host-deploy-facts", "host" : "novirt2.example.net", "remote_addr" : "novirt2.example.net", "res" : { "changed" : false, "ansible_facts" : { "ansible_python_interpreter" : "/usr/bin/python2" }, "_ansible_no_log" : false }, "start" : "2020-07-17T08:41:14.242287", "end" : "2020-07-17T08:41:14.379769", "duration" : 0.137482, "event_loop" : null, "uuid" : "c6ab4add-cf41-41b0-b7c8-2a3d156542fa" } } }

and then the failure follows, because ansible_python_interpreter has been set to python2.

I'm going to attach logs.

Created attachment 1701532
content of /var/log/ovirt-hosted-engine-setup on host
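As a rough aid for triaging logs like the one quoted above, the following is a minimal sketch, not part of oVirt or Ansible, that searches log text for the interpreter set through facts and for interpreters the shell reports as missing. The function name `scan_host_deploy_log` and both regular expressions are hypothetical and assume the log keeps the shape shown in this report; the sample text is abbreviated from the quoted log.

```python
import re

# Assumed shape of the fact inside the runner event JSON quoted in this report.
INTERPRETER_RE = re.compile(r'"ansible_python_interpreter"\s*:\s*"([^"]+)"')
# Assumed shape of the shell error printed when the interpreter does not exist.
MISSING_RE = re.compile(r'/bin/sh: (\S+): No such file or directory')


def scan_host_deploy_log(text):
    """Return (interpreters set via facts, interpreters reported missing)."""
    chosen = INTERPRETER_RE.findall(text)
    missing = sorted(set(MISSING_RE.findall(text)))
    return chosen, missing


# Abbreviated sample built from the log lines in the description.
sample = (
    '"res" : { "changed" : false, "ansible_facts" : '
    '{ "ansible_python_interpreter" : "/usr/bin/python2" } }\n'
    'fatal: [novirt2.example.net]: FAILED! => {"changed": false, '
    '"module_stderr": "/bin/sh: /usr/bin/python2: No such file or directory\\n"}\n'
)

chosen, missing = scan_host_deploy_log(sample)
print("set via facts:", chosen)
print("reported missing:", missing)
```

If the same path shows up in both lists, the failure is most likely the interpreter fact itself rather than the package task that happened to run first.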
Can you provide the target's facts via '-m setup'?

Right now I have no access to the server, but inside the logs there should be the output of the fact gathering action.

I had the same problem while trying to install a new host/node image. The issue seems to be in the release version file in /usr/lib/os.release.d. The version numbers that ansible facts produce before and after the update of the file are incorrect. From the looks of it someone did some refactoring there.

To reproduce:
1. Install a fresh node from the iso (any since yesterday coming from master)
2. Boot and run dnf update (the node image release needs to be among the updated packages)
3. Run any ovirt script that needs to perform sys mods

I spent a couple of cycles reinstalling/updating/trying and was able to reproduce it 100% of the time. The way I temporarily solved it was to use the previous version file in that folder, and all ansible scripts are running fine for now. My testing is limited, so this may have caused other issues. I was about to submit a bug report when I saw this. I can file another one if needed.

before the update

    'ansible_distribution_file_parsed': True,
    'ansible_distribution_file_path': '/etc/redhat-release',
    'ansible_distribution_file_variety': 'RedHat',
    'ansible_distribution_major_version': '8',
    'ansible_distribution_release': 'Core',
    'ansible_distribution_version': '8.2',

after the update

    "ansible_distribution": "CentOS",
    "ansible_distribution_file_parsed": true,
    "ansible_distribution_file_path": "/etc/redhat-release",
    "ansible_distribution_file_variety": "RedHat",
    "ansible_distribution_major_version": "4",
    "ansible_distribution_release": "Core",
    "ansible_distribution_version": "4.4",

Writing again: this is not gluster related, it affects all Ansible flows where OS detection is used (for example host deploy, host upgrade, OVA management, ...).
This seems like an issue with node, where /etc/os-release or /etc/redhat-release suddenly changed its content, which made OS version detection unreliable.

More information can also be found at https://lists.ovirt.org/archives/list/users@ovirt.org/thread/ROTLHVLIXQENVPSB4QBUX3QBKDFPQ5KJ/

The same error occurs when trying to add a node to a standalone Engine Manager.

Steps to Reproduce:
1. Install a standalone Engine Manager on EL8 step by step with this documentation: https://www.ovirt.org/documentation/installing_ovirt_as_a_standalone_manager_with_local_databases/#Installing_the_Red_Hat_Virtualization_Manager_SM_localDB_deploy
2. Install a node using ovirt-node-ng-installer-4.4.1-2020071311.el8.iso
3. Try to add the node inside the Engine Manager

Log output:

    2020-07-21 10:14:17 CEST - included: /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/packages.yml for ovirtnode1.example.net
    2020-07-21 10:14:17 CEST - included: /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/vdsmid.yml for ovirtnode1.example.net
    2020-07-21 10:14:17 CEST - included: /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/pki.yml for ovirtnode1.example.net
    2020-07-21 10:14:17 CEST - included: /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/restart_services.yml for ovirtnode1.example.net
    2020-07-21 10:14:17 CEST - TASK [ovirt-host-deploy-vdsm : Install ovirt-host package] *********************
    2020-07-21 10:14:17 CEST -
    2020-07-21 10:14:17 CEST - fatal: [ovirtnode1.example.net]: FAILED! => {"changed": false, "module_stderr": "/bin/sh: /usr/bin/python2: No such file or directory\n", "module_stdout": "", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127}
    2020-07-21 10:14:17 CEST - { "status" : "OK", "msg" : "", "data" : { "uuid" : "bfb59a41-6672-4115-ba93-1755556b3d11", "counter" : 61, "stdout" : "fatal: [ovirtnode1.example.net]: FAILED! => {\"changed\": false, \"module_stderr\": \"/bin/sh: /usr/bin/python2: No such file or directory\\n\", \"module_stdout\": \"\", \"msg\": \"The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error\", \"rc\": 127}", "start_line" : 54, "end_line" : 55, "runner_ident" : "230c108e-cb2a-11ea-97da-00155d120809", "event" : "runner_on_failed", "pid" : 28965, "created" : "2020-07-21T08:14:16.528490", "parent_uuid" : "00155d12-0809-c845-9450-0000000001a1", "event_data" : { "playbook" : "ovirt-host-deploy.yml", "playbook_uuid" : "3a66c11c-02f9-492a-937d-21f1386e492d", "play" : "all", "play_uuid" : "00155d12-0809-c845-9450-000000000006", "play_pattern" : "all", "task" : "Install ovirt-host package", "task_uuid" : "00155d12-0809-c845-9450-0000000001a1", "task_action" : "yum", "task_args" : "", "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-deploy-vdsm/tasks/packages.yml:2", "role" : "ovirt-host-deploy-vdsm", "host" : "ovirtnode1.example.net", "remote_addr" : "ovirtnode1.example.net", "res" : { "module_stdout" : "", "module_stderr" : "/bin/sh: /usr/bin/python2: No such file or directory\n", "msg" : "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc" : 127, "_ansible_no_log" : false, "changed" : false }, "start" : "2020-07-21T08:14:15.365609", "end" : "2020-07-21T08:14:16.528185", "duration" : 1.162576, "ignore_errors" : null, "event_loop" : null, "uuid" : "bfb59a41-6672-4115-ba93-1755556b3d11" } }

Adding a blocker? flag, as this is clearly an upstream blocker.

Debugged the issue, so here is what happened.

The Ansible OS version detection logic tries to get the most "precise" OS version (https://github.com/ansible/ansible/blob/96b74d3e0b340f1bc6b3102d874f17516fe35e79/lib/ansible/module_utils/distro/_distro.py#L807).

It checks several places in the system for versions and compares the versions it finds there to see which one is the most specific, evaluated by the number of numbers in the version (actually the number of dots).

One of these places seems to be the os-release PRETTY_NAME, in which we normally put the string "oVirt Node" together with the full oVirt release version, e.g. 4.4.1.

The latest release, 4.4.1.1, made Ansible prefer it over the CentOS release version, 8.2.2004, since it contains 4 numbers vs. 3: 4.4.1.1 vs. 8.2.2004.

Thus, in order not to break the Ansible OS version detection logic, we need to make sure that we never provide anything more specific than the OS release version in the oVirt Node specific strings that Ansible evaluates for the OS version.

Can I download ovirt-node-ng-installer-4.4.1-2020072220.el8.iso and test it?
Any link?

(In reply to Gianluca Cecchi from comment #17)
> Can I download ovirt-node-ng-installer-4.4.1-2020072220.el8.iso and test it?
> Any link?

It should be available among master snapshots:
https://resources.ovirt.org/pub/ovirt-master-snapshot/iso/ovirt-node-ng-installer/

(In reply to Gianluca Cecchi from comment #17)
> Can I download ovirt-node-ng-installer-4.4.1-2020072220.el8.iso and test it?
> Any link?

Just please note that this build is not a final one.
It includes the fix for the specific issue of this BZ, namely making the Ansible OS detection logic work properly.
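The selection rule described in the debugging comment above (prefer the version string with the most dots) can be illustrated with a short sketch. This is a simplified illustration and not Ansible's actual code: the function name `pick_most_specific`, the order of the candidate lists, the "8" entry, and the rule that an earlier candidate wins a tie are all assumptions made for the example. Only 8.2.2004, 4.4.1 and 4.4.1.1 come from the comment.

```python
def pick_most_specific(candidates):
    """Return the candidate with the most dots; earlier entries win ties."""
    best = ""
    for version in candidates:
        if best == "" or version.count(".") > best.count("."):
            best = version
    return best


# Hypothetical candidate lists, ordered for illustration only.
before = ["8", "8.2.2004", "4.4.1"]    # PRETTY_NAME carries oVirt 4.4.1
after = ["8", "8.2.2004", "4.4.1.1"]   # PRETTY_NAME carries oVirt 4.4.1.1

print(pick_most_specific(before))                # -> 8.2.2004
print(pick_most_specific(after))                 # -> 4.4.1.1
print(pick_most_specific(after).split(".")[0])   # -> 4
```

Under these assumptions a three-part oVirt version only ties with 8.2.2004 and loses, while a four-part one wins, so the detected major version flips from 8 to 4. That is consistent with the "ansible_distribution_major_version": "4" fact reported earlier in this bug.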
If I try this iso I see that it tries to install the appliance from package ovirt-engine-appliance-4.4-20200716090128.
Will that be ok, or do I also have to use a different appliance to test a single node HCI install?

(In reply to Gianluca Cecchi from comment #20)
> If I try this iso I see that it tries to install the appliance in package
> ovirt-engine-appliance-4.4-20200716090128
> Will it be ok or do I have also to use a different appliance to test a
> single node HCI install?

I think that it should be OK for the test, but you can also use a more recent one, e.g.:
https://jenkins.ovirt.org/job/ovirt-appliance_master_build-artifacts-el8-x86_64/316/artifact/exported-artifacts/

Using ovirt-node-ng-installer-4.4.1-2020072220.el8.iso and then manually installing ovirt-engine-appliance-4.4-20200721234616.1.el8.x86_64.rpm, everything went well in a single host HCI setup, up to the final startup of the engine VM on shared storage, where the engine started but blocked at a grub error. I'm going to attach it.

Created attachment 1702396
grub error in engine final stage startup after exiting maintenance on shared storage
In my install, on the host I had a first disk of 100GB used for the install of the hypervisor and a second disk of 150GB used for Gluster, and I chose 100GB (as proposed by the gui) for the engine volume, 150GB for data and 50GB for vmstore.

df -h on host:

    [root@novirt2 ~]# df -h
    Filesystem Size Used Avail Use% Mounted on
    devtmpfs 32G 0 32G 0% /dev
    tmpfs 32G 228K 32G 1% /dev/shm
    tmpfs 32G 42M 32G 1% /run
    tmpfs 32G 0 32G 0% /sys/fs/cgroup
    /dev/mapper/onn-ovirt--node--ng--4.4.1.4--0.20200722.0+1 33G 7.5G 25G 24% /
    /dev/mapper/onn-home 1014M 40M 975M 4% /home
    /dev/mapper/onn-tmp 1014M 41M 974M 4% /tmp
    /dev/mapper/onn-var 15G 5.2G 9.9G 35% /var
    /dev/sda1 976M 114M 796M 13% /boot
    /dev/mapper/onn-var_log 8.0G 100M 7.9G 2% /var/log
    /dev/mapper/onn-var_crash 10G 105M 9.9G 2% /var/crash
    /dev/mapper/onn-var_log_audit 2.0G 50M 2.0G 3% /var/log/audit
    tmpfs 6.3G 0 6.3G 0% /run/user/0
    /dev/mapper/gluster_vg_sdb-gluster_lv_engine 100G 7.3G 93G 8% /gluster_bricks/engine
    /dev/mapper/gluster_vg_sdb-gluster_lv_data 150G 1.1G 149G 1% /gluster_bricks/data
    /dev/mapper/gluster_vg_sdb-gluster_lv_vmstore 50G 391M 50G 1% /gluster_bricks/vmstore
    novirt2st.storage.local:/engine 100G 8.3G 92G 9% /rhev/data-center/mnt/glusterSD/novirt2st.storage.local:_engine
    [root@novirt2 ~]#

*** Bug 1860695 has been marked as a duplicate of this bug. ***

*** Bug 1859778 has been marked as a duplicate of this bug. ***

*** Bug 1860708 has been marked as a duplicate of this bug. ***

(In reply to Gianluca Cecchi from comment #24)
> In my install, on host I had a first disk of 100Gb used for install of
> hypervisor and a second disk of 150Gb used for Gluster and I choose 100Gb
> (as proposed by the gui) for engine volume, 150Gb for data and 50Gb for
> vmstore

Could you get the version of the gluster packages included in this ovirt-ng?
Just do:
# rpm -qa |grep gluster

    [g.cecchi@novirt2 ~]$ rpm -qa|grep gluster
    gluster-ansible-features-1.0.5-6.el8.noarch
    vdsm-gluster-4.40.22-1.el8.x86_64
    gluster-ansible-maintenance-1.0.1-3.el8.noarch
    glusterfs-fuse-7.6-1.el8.x86_64
    glusterfs-libs-7.6-1.el8.x86_64
    glusterfs-7.6-1.el8.x86_64
    glusterfs-cli-7.6-1.el8.x86_64
    glusterfs-api-7.6-1.el8.x86_64
    gluster-ansible-cluster-1.0.0-1.el8.noarch
    gluster-ansible-repositories-1.0.1-2.el8.noarch
    gluster-ansible-roles-1.0.5-12.el8.noarch
    gluster-ansible-infra-1.0.4-10.el8.noarch
    glusterfs-geo-replication-7.6-1.el8.x86_64
    python3-gluster-7.6-1.el8.x86_64
    qemu-kvm-block-gluster-4.2.0-19.el8.x86_64
    glusterfs-server-7.6-1.el8.x86_64
    glusterfs-events-7.6-1.el8.x86_64
    glusterfs-client-xlators-7.6-1.el8.x86_64
    glusterfs-rdma-7.6-1.el8.x86_64
    libvirt-daemon-driver-storage-gluster-6.0.0-17.el8.x86_64
    [g.cecchi@novirt2 ~]$

Gluster v7.6 is old. I'm using 7.7 on ovirt 4.3.10. Maybe someone can update to the latest v7?

(In reply to Strahil Nikolov from comment #31)
> Gluster v7.6 is old. I'm using 7.7 on ovirt 4.3.10. Maybe someone can
> update to latest v7 ?

Please open an RFE for this.

(In reply to Strahil Nikolov from comment #31)
> Gluster v7.6 is old. I'm using 7.7 on ovirt 4.3.10. Maybe someone can
> update to latest v7 ?

Yes, Strahil. You are right.
The fix, https://review.gluster.org/#/c/glusterfs/+/24480/, has made it into the gluster-7.7 release.
I will raise a new bug for ovirt-ng to make use of gluster-7.7.

Verified with ovirt-ng: ovirt-node-ng-installer-4.4.1-2020072310.el8.iso
Gluster deployment is successful.

(In reply to Gianluca Cecchi from comment #30)
> [g.cecchi@novirt2 ~]$ rpm -qa|grep gluster
> gluster-ansible-features-1.0.5-6.el8.noarch
> vdsm-gluster-4.40.22-1.el8.x86_64
> gluster-ansible-maintenance-1.0.1-3.el8.noarch
> glusterfs-fuse-7.6-1.el8.x86_64
> glusterfs-libs-7.6-1.el8.x86_64
> glusterfs-7.6-1.el8.x86_64
> glusterfs-cli-7.6-1.el8.x86_64
> glusterfs-api-7.6-1.el8.x86_64
> gluster-ansible-cluster-1.0.0-1.el8.noarch
> gluster-ansible-repositories-1.0.1-2.el8.noarch
> gluster-ansible-roles-1.0.5-12.el8.noarch
> gluster-ansible-infra-1.0.4-10.el8.noarch
> glusterfs-geo-replication-7.6-1.el8.x86_64
> python3-gluster-7.6-1.el8.x86_64
> qemu-kvm-block-gluster-4.2.0-19.el8.x86_64
> glusterfs-server-7.6-1.el8.x86_64
> glusterfs-events-7.6-1.el8.x86_64
> glusterfs-client-xlators-7.6-1.el8.x86_64
> glusterfs-rdma-7.6-1.el8.x86_64
> libvirt-daemon-driver-storage-gluster-6.0.0-17.el8.x86_64
> [g.cecchi@novirt2 ~]$

I have raised an RFE to include gluster-7.7:
https://bugzilla.redhat.com/show_bug.cgi?id=1862588
This will solve your problem.