Description of problem:
In the RHOS-16.2 RHEL-8 integration line, OVB jobs are failing at this step: https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/playbooks/baremetal-full-overcloud-validate.yml#L19

All overcloud nodes are unreachable over SSH (Permission denied (publickey)) during fact gathering:
```
PLAY [setup dstat performance monitoring] **************************************
2022-07-20 23:43:03.536565 | primary | TASK [Gathering Facts] *********************************************************
2022-07-20 23:43:03.536625 | primary | Wednesday 20 July 2022 23:43:03 -0400 (0:00:00.288) 2:00:10.708 ********
2022-07-20 23:43:08.557699 | primary | fatal: [overcloud-controller-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-1' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-1: Permission denied (publickey).", "unreachable": true}
2022-07-20 23:43:08.615266 | primary | fatal: [overcloud-controller-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-0' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-0: Permission denied (publickey).", "unreachable": true}
2022-07-20 23:43:08.648094 | primary | fatal: [overcloud-controller-2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-2' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-2: Permission denied (publickey).", "unreachable": true}
2022-07-20 23:43:08.743256 | primary | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-novacompute-0' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-novacompute-0: Permission denied (publickey).", "unreachable": true}
2022-07-20 23:43:18.778759 | primary | ok: [127.0.0.2]
```
The unreachable hosts change the exit status of the playbook, which fails the job:
```
PLAY RECAP *********************************************************************
2022-07-21 00:29:10.571089 | primary | 127.0.0.2 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2022-07-21 00:29:10.571096 | primary | localhost : ok=18 changed=7 unreachable=0 failed=0 skipped=78 rescued=0 ignored=0
2022-07-21 00:29:10.571104 | primary | overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.571139 | primary | overcloud-controller-1 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.571149 | primary | overcloud-controller-2 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.572968 | primary | overcloud-novacompute-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.573014 | primary | undercloud : ok=300 changed=124 unreachable=0 failed=0 skipped=486 rescued=0 ignored=5
2022-07-21 00:29:10.573020 | primary |
2022-07-21 00:29:10.573033 | primary | Thursday 21 July 2022 00:29:10 -0400 (0:00:00.196) 2:46:17.743 *********
2022-07-21 00:29:10.573037 | primary | ===============================================================================
2022-07-21 00:29:10.663855 | primary | overcloud-deploy : Deploy the overcloud ------------------------------ 3189.09s
2022-07-21 00:29:10.663897 | primary | os_tempest : Execute tempest tests ----------------------------------- 2597.61s
2022-07-21 00:29:10.663902 | primary | tripleo.operator.tripleo_undercloud_install : undercloud install ----- 2090.16s
2022-07-21 00:29:10.663905 | primary | repo-setup : Setup repos on live host --------------------------------- 406.53s
2022-07-21 00:29:10.663909 | primary | tripleo.operator.tripleo_overcloud_node_introspect : Introspect node -- 367.31s
2022-07-21 00:29:10.665006 | primary | undercloud-setup : Run the package installation script ---------------- 304.53s
2022-07-21 00:29:10.665023 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 68.79s
2022-07-21 00:29:10.665028 | primary | modify-image : Close initramfs image ----------------------------------- 50.87s
2022-07-21 00:29:10.665032 | primary | tripleo.operator.tripleo_overcloud_node_import : Import node(s) -------- 34.78s
2022-07-21 00:29:10.665035 | primary | os_tempest : Install distro packages ----------------------------------- 29.98s
2022-07-21 00:29:10.665039 | primary | tripleo.operator.tripleo_overcloud_image_upload : Overcloud image upload -- 24.80s
2022-07-21 00:29:10.665044 | primary | build-test-packages : Pip install pre-installed DLRN ------------------- 18.00s
2022-07-21 00:29:10.665051 | primary | overcloud-prep-images : List overcloud flavors for Nova deployment ----- 16.91s
2022-07-21 00:29:10.665054 | primary | Gathering Facts -------------------------------------------------------- 15.64s
2022-07-21 00:29:10.665058 | primary | build-test-packages : Clean up loop devices created by mock ------------ 14.98s
2022-07-21 00:29:10.665066 | primary | build-test-packages : Check loop devices stat -------------------------- 14.18s
2022-07-21 00:29:10.665070 | primary | os_tempest : Executing python-tempestconf ------------------------------ 10.86s
2022-07-21 00:29:10.665073 | primary | modify-image : Extract initramfs image ---------------------------------- 9.49s
2022-07-21 00:29:10.665077 | primary | validate-perf : Install the latest version of dstat on overcloud -------- 9.24s
2022-07-21 00:29:10.665081 | primary | Add eth2 interface from eth2.conf --------------------------------------- 8.74s
2022-07-21 00:29:11.222806 | primary | +(./toci_quickstart.sh:161): main(): exit_value=4
2022-07-21 00:29:11.223360 | primary | +(./toci_quickstart.sh:164): main(): [[ 4 == 0 ]]
2022-07-21 00:29:11.223383 | primary | +(./toci_quickstart.sh:165): main(): [[ 4 != 0 ]]
2022-07-21 00:29:11.224419 | primary | +(./toci_quickstart.sh:165): main(): echo 'Playbook run of ovb.yml failed'
2022-07-21 00:29:11.225451 | primary | Playbook run of ovb.yml failed
2022-07-21 00:29:11.225486 | primary | +(./toci_quickstart.sh:165): main(): break
2022-07-21 00:29:11.228890 | primary | +(./toci_quickstart.sh:167): main(): [[ 4 == 0 ]]
2022-07-21 00:29:11.228917 | primary | +(./toci_quickstart.sh:167): main(): echo 'Playbook run failed'
2022-07-21 00:29:11.228924 | primary | Playbook run failed
2022-07-21 00:29:11.228932 | primary | +(./toci_quickstart.sh:170): main(): echo 'Quickstart completed.'
2022-07-21 00:29:11.228937 | primary | Quickstart completed.
2022-07-21 00:29:11.228941 | primary | +(./toci_quickstart.sh:171): main(): exit 4
```
The tempest tests finish successfully, but the job fails because the playbook exits with code 4. This started on 2022-07-19 05:52:55. We do not know what caused the issue. This is a tracker bug.
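For triage across several job logs, a small helper can list the hosts that the Ansible PLAY RECAP reports as unreachable. This is only a sketch: the function name `unreachable_hosts` and the `SAMPLE` excerpt are illustrative and not part of tripleo-ci, and the regular expression assumes recap lines keep the `host : ok=N changed=N unreachable=N failed=N` layout seen in the log above. As far as I know, ansible-playbook returns 4 when hosts are unreachable, which would be consistent with `exit_value=4` here, but that mapping should be checked against the Ansible version used in the job.

```python
import re

# Matches an Ansible PLAY RECAP entry such as:
#   overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 ...
# The Zuul "timestamp | primary |" prefix is ignored because search() is used.
RECAP_LINE = re.compile(
    r"(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def unreachable_hosts(log_text):
    """Return hosts whose recap entry has unreachable > 0, in log order."""
    hosts = []
    for line in log_text.splitlines():
        match = RECAP_LINE.search(line)
        if match and int(match.group("unreachable")) > 0:
            hosts.append(match.group("host"))
    return hosts


# Excerpt copied from the failing job's recap (illustrative subset).
SAMPLE = """\
2022-07-21 00:29:10.571089 | primary | 127.0.0.2 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2022-07-21 00:29:10.571104 | primary | overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.572968 | primary | overcloud-novacompute-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-07-21 00:29:10.573014 | primary | undercloud : ok=300 changed=124 unreachable=0 failed=0 skipped=486 rescued=0 ignored=5
"""

if __name__ == "__main__":
    print(unreachable_hosts(SAMPLE))
```

Running the same function over a passing log should return an empty list, which gives a quick way to separate this failure mode from genuine tempest failures.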
In a passing log, the same play reaches all overcloud nodes:
```
2022-07-18 00:25:00.806929 | primary |
2022-07-18 00:25:00.806980 | primary | PLAY [setup dstat performance monitoring] **************************************
2022-07-18 00:25:00.887341 | primary |
2022-07-18 00:25:00.887425 | primary | TASK [Gathering Facts] *********************************************************
2022-07-18 00:25:00.887464 | primary | Monday 18 July 2022 00:25:00 -0400 (0:00:00.399) 2:33:56.921 ***********
2022-07-18 00:25:27.688942 | primary | ok: [overcloud-controller-2]
2022-07-18 00:25:59.164076 | primary | ok: [127.0.0.2]
2022-07-18 00:26:00.075704 | primary | ok: [overcloud-novacompute-0]
2022-07-18 00:26:06.448511 | primary | ok: [overcloud-controller-0]
2022-07-18 00:26:08.157595 | primary | ok: [overcloud-controller-1]
```
and the playbook exits with status 0:
```
022-07-18 01:57:38.648215 | primary | PLAY RECAP *********************************************************************
2022-07-18 01:57:38.648237 | primary | 127.0.0.2 : ok=4 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2022-07-18 01:57:38.648242 | primary | localhost : ok=18 changed=7 unreachable=0 failed=0 skipped=74 rescued=0 ignored=0
2022-07-18 01:57:38.648246 | primary | overcloud-controller-0 : ok=8 changed=5 unreachable=0 failed=0 skipped=5 rescued=0 ignored=2
2022-07-18 01:57:38.648252 | primary | overcloud-controller-1 : ok=8 changed=5 unreachable=0 failed=0 skipped=5 rescued=0 ignored=2
2022-07-18 01:57:38.648257 | primary | overcloud-controller-2 : ok=8 changed=5 unreachable=0 failed=0 skipped=5 rescued=0 ignored=2
2022-07-18 01:57:38.648315 | primary | overcloud-novacompute-0 : ok=8 changed=5 unreachable=0 failed=0 skipped=5 rescued=0 ignored=2
2022-07-18 01:57:38.648323 | primary | undercloud : ok=300 changed=124 unreachable=0 failed=0 skipped=484 rescued=0 ignored=5
2022-07-18 01:57:38.648343 | primary |
2022-07-18 01:57:38.648481 | primary | Monday 18 July 2022 01:57:38 -0400 (0:00:00.242) 4:06:34.682 ***********
2022-07-18 01:57:38.648491 | primary | ===============================================================================
2022-07-18 01:57:38.650861 | primary | os_tempest : Execute tempest tests ----------------------------------- 5236.49s
2022-07-18 01:57:38.650933 | primary | overcloud-deploy : Deploy the overcloud ------------------------------ 4699.41s
2022-07-18 01:57:38.650942 | primary | tripleo.operator.tripleo_undercloud_install : undercloud install ----- 2374.66s
2022-07-18 01:57:38.650951 | primary | repo-setup : Setup repos on live host --------------------------------- 474.41s
2022-07-18 01:57:38.650960 | primary | undercloud-setup : Run the package installation script ---------------- 363.06s
2022-07-18 01:57:38.650969 | primary | tripleo.operator.tripleo_overcloud_node_introspect : Introspect node -- 341.45s
2022-07-18 01:57:38.650977 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 85.40s
2022-07-18 01:57:38.650985 | primary | Gathering Facts -------------------------------------------------------- 68.32s
2022-07-18 01:57:38.650993 | primary | modify-image : Close initramfs image ----------------------------------- 57.33s
2022-07-18 01:57:38.651002 | primary | os_tempest : Install distro packages ----------------------------------- 35.80s
2022-07-18 01:57:38.651010 | primary | tripleo.operator.tripleo_overcloud_node_import : Import node(s) -------- 35.57s
2022-07-18 01:57:38.651018 | primary | tripleo.operator.tripleo_overcloud_image_upload : Overcloud image upload -- 26.00s
2022-07-18 01:57:38.651034 | primary | os_tempest : Executing python-tempestconf ------------------------------ 25.62s
2022-07-18 01:57:38.651046 | primary | build-test-packages : Pip install pre-installed DLRN ------------------- 23.60s
2022-07-18 01:57:38.651055 | primary | overcloud-prep-images : List overcloud flavors for Nova deployment ----- 22.65s
2022-07-18 01:57:38.651062 | primary | os_tempest : Create router --------------------------------------------- 17.18s
2022-07-18 01:57:38.651068 | primary | build-test-packages : Clean up loop devices created by mock ------------ 15.53s
2022-07-18 01:57:38.651074 | primary | build-test-packages : Check loop devices stat -------------------------- 15.34s
2022-07-18 01:57:38.651080 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 13.88s
2022-07-18 01:57:38.651086 | primary | modify-image : Extract initramfs image --------------------------------- 10.56s
2022-07-18 01:57:39.289865 | primary | +(./toci_quickstart.sh:161): main(): exit_value=0
2022-07-18 01:57:39.291586 | primary | +(./toci_quickstart.sh:164): main(): [[ 0 == 0 ]]
2022-07-18 01:57:39.293276 | primary | +(./toci_quickstart.sh:164): main(): echo 'Playbook run of ovb.yml passed successfully'
2022-07-18 01:57:39.293341 | primary | Playbook run of ovb.yml passed successfully
2022-07-18 01:57:39.293348 | primary | +(./toci_quickstart.sh:165): main(): [[ 0 != 0 ]]
2022-07-18 01:57:39.296453 | primary | +(./toci_quickstart.sh:167): main(): [[ 0 == 0 ]]
2022-07-18 01:57:39.296517 | primary | +(./toci_quickstart.sh:167): main(): echo 'Playbook run passed successfully'
2022-07-18 01:57:39.296525 | primary | Playbook run passed successfully
2022-07-18 01:57:39.296531 | primary | +(./toci_quickstart.sh:170): main(): echo 'Quickstart completed.'
2022-07-18 01:57:39.296552 | primary | Quickstart completed.
2022-07-18 01:57:39.296557 | primary | +(./toci_quickstart.sh:171): main(): exit 0
2022-07-18 01:57:39.326978 | primary | +(/home/zuul/src/opendev.org/openstack/tripleo-ci/toci_gate_test.sh:193): main(): echo 'Run completed'
2022-07-18 01:57:39.327059 | primary | Run completed
2022-07-18 05:57:40.519850 | primary | ok: Runtime: 4:09:35.890357
2022-07-18 05:57:40.640590 |
2022-07-18 05:57:40.640799 | PLAY RECAP
2022-07-18 05:57:40.640907 | primary | ok: 11 changed: 7 unreachable: 0 failed: 0 skipped: 9 rescued: 0 ignored: 0
```