Bug 2109495 - Hosts unreachable during gather_facts and return exit status 4 leading to ovb job failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: distribution
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: Lon Hohberger
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-21 12:03 UTC by Chandan Kumar
Modified: 2022-07-27 08:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-27 08:59:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-17774 0 None None None 2022-07-21 12:04:51 UTC

Description Chandan Kumar 2022-07-21 12:03:38 UTC
Description of problem:


In the RHOS-16.2 RHEL-8 integration line, OVB jobs are failing at the following step because the overcloud hosts are unreachable during fact gathering:
https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/playbooks/baremetal-full-overcloud-validate.yml#L19
``` 
PLAY [setup dstat performance monitoring] **************************************

2022-07-20 23:43:03.536565 | primary |
TASK [Gathering Facts] *********************************************************

2022-07-20 23:43:03.536625 | primary | Wednesday 20 July 2022  23:43:03 -0400 (0:00:00.288)       2:00:10.708 ********

2022-07-20 23:43:08.557699 | primary | fatal: [overcloud-controller-1]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-1' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-1: Permission denied (publickey).", "unreachable": true}

2022-07-20 23:43:08.615266 | primary | fatal: [overcloud-controller-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-0' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-0: Permission denied (publickey).", "unreachable": true}

2022-07-20 23:43:08.648094 | primary | fatal: [overcloud-controller-2]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-controller-2' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-controller-2: Permission denied (publickey).", "unreachable": true}

2022-07-20 23:43:08.743256 | primary | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '127.0.0.2' (ECDSA) to the list of known hosts.\r\nWarning: Permanently added 'overcloud-novacompute-0' (ECDSA) to the list of known hosts.\r\ntripleo-admin@overcloud-novacompute-0: Permission denied (publickey).", "unreachable": true}

2022-07-20 23:43:18.778759 | primary | ok: [127.0.0.2]
```
The unreachable hosts change the exit status of the playbook, which causes the job to fail:
```
PLAY RECAP *********************************************************************

2022-07-21 00:29:10.571089 | primary | 127.0.0.2                  : ok=4    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

2022-07-21 00:29:10.571096 | primary | localhost                  : ok=18   changed=7    unreachable=0    failed=0    skipped=78   rescued=0    ignored=0

2022-07-21 00:29:10.571104 | primary | overcloud-controller-0     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

2022-07-21 00:29:10.571139 | primary | overcloud-controller-1     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

2022-07-21 00:29:10.571149 | primary | overcloud-controller-2     : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

2022-07-21 00:29:10.572968 | primary | overcloud-novacompute-0    : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

2022-07-21 00:29:10.573014 | primary | undercloud                 : ok=300  changed=124  unreachable=0    failed=0    skipped=486  rescued=0    ignored=5

2022-07-21 00:29:10.573020 | primary |

2022-07-21 00:29:10.573033 | primary | Thursday 21 July 2022  00:29:10 -0400 (0:00:00.196)       2:46:17.743 *********

2022-07-21 00:29:10.573037 | primary | ===============================================================================

2022-07-21 00:29:10.663855 | primary | overcloud-deploy : Deploy the overcloud ------------------------------ 3189.09s

2022-07-21 00:29:10.663897 | primary | os_tempest : Execute tempest tests ----------------------------------- 2597.61s

2022-07-21 00:29:10.663902 | primary | tripleo.operator.tripleo_undercloud_install : undercloud install ----- 2090.16s

2022-07-21 00:29:10.663905 | primary | repo-setup : Setup repos on live host --------------------------------- 406.53s

2022-07-21 00:29:10.663909 | primary | tripleo.operator.tripleo_overcloud_node_introspect : Introspect node -- 367.31s

2022-07-21 00:29:10.665006 | primary | undercloud-setup : Run the package installation script ---------------- 304.53s

2022-07-21 00:29:10.665023 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 68.79s

2022-07-21 00:29:10.665028 | primary | modify-image : Close initramfs image ----------------------------------- 50.87s

2022-07-21 00:29:10.665032 | primary | tripleo.operator.tripleo_overcloud_node_import : Import node(s) -------- 34.78s

2022-07-21 00:29:10.665035 | primary | os_tempest : Install distro packages ----------------------------------- 29.98s

2022-07-21 00:29:10.665039 | primary | tripleo.operator.tripleo_overcloud_image_upload : Overcloud image upload -- 24.80s

2022-07-21 00:29:10.665044 | primary | build-test-packages : Pip install pre-installed DLRN ------------------- 18.00s

2022-07-21 00:29:10.665051 | primary | overcloud-prep-images : List overcloud flavors for Nova deployment ----- 16.91s

2022-07-21 00:29:10.665054 | primary | Gathering Facts -------------------------------------------------------- 15.64s

2022-07-21 00:29:10.665058 | primary | build-test-packages : Clean up loop devices created by mock ------------ 14.98s

2022-07-21 00:29:10.665066 | primary | build-test-packages : Check loop devices stat -------------------------- 14.18s

2022-07-21 00:29:10.665070 | primary | os_tempest : Executing python-tempestconf ------------------------------ 10.86s

2022-07-21 00:29:10.665073 | primary | modify-image : Extract initramfs image ---------------------------------- 9.49s

2022-07-21 00:29:10.665077 | primary | validate-perf : Install the latest version of dstat on overcloud -------- 9.24s

2022-07-21 00:29:10.665081 | primary | Add eth2 interface from eth2.conf --------------------------------------- 8.74s

2022-07-21 00:29:11.222806 | primary | +(./toci_quickstart.sh:161): main(): exit_value=4

2022-07-21 00:29:11.223360 | primary | +(./toci_quickstart.sh:164): main(): [[ 4 == 0 ]]

2022-07-21 00:29:11.223383 | primary | +(./toci_quickstart.sh:165): main(): [[ 4 != 0 ]]

2022-07-21 00:29:11.224419 | primary | +(./toci_quickstart.sh:165): main(): echo 'Playbook run of ovb.yml failed'

2022-07-21 00:29:11.225451 | primary | Playbook run of ovb.yml failed

2022-07-21 00:29:11.225486 | primary | +(./toci_quickstart.sh:165): main(): break

2022-07-21 00:29:11.228890 | primary | +(./toci_quickstart.sh:167): main(): [[ 4 == 0 ]]

2022-07-21 00:29:11.228917 | primary | +(./toci_quickstart.sh:167): main(): echo 'Playbook run failed'

2022-07-21 00:29:11.228924 | primary | Playbook run failed

2022-07-21 00:29:11.228932 | primary | +(./toci_quickstart.sh:170): main(): echo 'Quickstart completed.'

2022-07-21 00:29:11.228937 | primary | Quickstart completed.

2022-07-21 00:29:11.228941 | primary | +(./toci_quickstart.sh:171): main(): exit 4
```
The tempest tests finish successfully, but the job fails because the playbook exits with status 4.
This started happening on 2022-07-19 05:52:55. We do not know what caused the issue. This is a tracker bug.
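
To spot this condition quickly in a job log, the PLAY RECAP can be scanned for hosts with a non-zero `unreachable` counter. The following is only a minimal sketch, assuming the recap lines look like the ones pasted above (optionally prefixed with the Zuul `timestamp | primary |` columns); the names `parse_recap` and `unreachable_hosts` are hypothetical helpers, not part of any tripleo or Ansible tooling.

```python
import re

# One host line of a PLAY RECAP, e.g.
#   "... | primary | overcloud-controller-0 : ok=0 changed=0 unreachable=1 ..."
# The leading "|" is optional so plain ansible-playbook output also matches.
RECAP_LINE = re.compile(
    r"(?:\|\s*)?(?P<host>[\w.\-]+)\s+:\s+(?P<counters>(?:\w+=\d+\s*)+)$"
)


def parse_recap(text):
    """Return {host: {counter_name: int}} for every recap line in text."""
    recap = {}
    for line in text.splitlines():
        match = RECAP_LINE.search(line.rstrip())
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            recap[match.group("host")] = {key: int(value) for key, value in pairs}
    return recap


def unreachable_hosts(text):
    """Return the sorted hosts whose recap shows unreachable > 0."""
    return sorted(
        host
        for host, counters in parse_recap(text).items()
        if counters.get("unreachable", 0) > 0
    )
```

Run against the failing recap above, this would be expected to report the three controllers and the compute node, while the task timing lines (which contain ` : ` but no `name=value` counters) are ignored.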

Comment 2 Chandan Kumar 2022-07-21 12:07:28 UTC
In a passing log, fact gathering succeeds on all hosts:
```
2022-07-18 00:25:00.806929 | primary |

2022-07-18 00:25:00.806980 | primary | PLAY [setup dstat performance monitoring] **************************************

2022-07-18 00:25:00.887341 | primary |

2022-07-18 00:25:00.887425 | primary | TASK [Gathering Facts] *********************************************************

2022-07-18 00:25:00.887464 | primary | Monday 18 July 2022  00:25:00 -0400 (0:00:00.399)       2:33:56.921 ***********

2022-07-18 00:25:27.688942 | primary | ok: [overcloud-controller-2]

2022-07-18 00:25:59.164076 | primary | ok: [127.0.0.2]

2022-07-18 00:26:00.075704 | primary | ok: [overcloud-novacompute-0]

2022-07-18 00:26:06.448511 | primary | ok: [overcloud-controller-0]

2022-07-18 00:26:08.157595 | primary | ok: [overcloud-controller-1]
```
and the playbook exits with status 0:
```
2022-07-18 01:57:38.648215 | primary | PLAY RECAP *********************************************************************

2022-07-18 01:57:38.648237 | primary | 127.0.0.2                  : ok=4    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0

2022-07-18 01:57:38.648242 | primary | localhost                  : ok=18   changed=7    unreachable=0    failed=0    skipped=74   rescued=0    ignored=0

2022-07-18 01:57:38.648246 | primary | overcloud-controller-0     : ok=8    changed=5    unreachable=0    failed=0    skipped=5    rescued=0    ignored=2

2022-07-18 01:57:38.648252 | primary | overcloud-controller-1     : ok=8    changed=5    unreachable=0    failed=0    skipped=5    rescued=0    ignored=2

2022-07-18 01:57:38.648257 | primary | overcloud-controller-2     : ok=8    changed=5    unreachable=0    failed=0    skipped=5    rescued=0    ignored=2

2022-07-18 01:57:38.648315 | primary | overcloud-novacompute-0    : ok=8    changed=5    unreachable=0    failed=0    skipped=5    rescued=0    ignored=2

2022-07-18 01:57:38.648323 | primary | undercloud                 : ok=300  changed=124  unreachable=0    failed=0    skipped=484  rescued=0    ignored=5

2022-07-18 01:57:38.648343 | primary |

2022-07-18 01:57:38.648481 | primary | Monday 18 July 2022  01:57:38 -0400 (0:00:00.242)       4:06:34.682 ***********

2022-07-18 01:57:38.648491 | primary | ===============================================================================

2022-07-18 01:57:38.650861 | primary | os_tempest : Execute tempest tests ----------------------------------- 5236.49s

2022-07-18 01:57:38.650933 | primary | overcloud-deploy : Deploy the overcloud ------------------------------ 4699.41s

2022-07-18 01:57:38.650942 | primary | tripleo.operator.tripleo_undercloud_install : undercloud install ----- 2374.66s

2022-07-18 01:57:38.650951 | primary | repo-setup : Setup repos on live host --------------------------------- 474.41s

2022-07-18 01:57:38.650960 | primary | undercloud-setup : Run the package installation script ---------------- 363.06s

2022-07-18 01:57:38.650969 | primary | tripleo.operator.tripleo_overcloud_node_introspect : Introspect node -- 341.45s

2022-07-18 01:57:38.650977 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 85.40s

2022-07-18 01:57:38.650985 | primary | Gathering Facts -------------------------------------------------------- 68.32s

2022-07-18 01:57:38.650993 | primary | modify-image : Close initramfs image ----------------------------------- 57.33s

2022-07-18 01:57:38.651002 | primary | os_tempest : Install distro packages ----------------------------------- 35.80s

2022-07-18 01:57:38.651010 | primary | tripleo.operator.tripleo_overcloud_node_import : Import node(s) -------- 35.57s

2022-07-18 01:57:38.651018 | primary | tripleo.operator.tripleo_overcloud_image_upload : Overcloud image upload -- 26.00s

2022-07-18 01:57:38.651034 | primary | os_tempest : Executing python-tempestconf ------------------------------ 25.62s

2022-07-18 01:57:38.651046 | primary | build-test-packages : Pip install pre-installed DLRN ------------------- 23.60s

2022-07-18 01:57:38.651055 | primary | overcloud-prep-images : List overcloud flavors for Nova deployment ----- 22.65s

2022-07-18 01:57:38.651062 | primary | os_tempest : Create router --------------------------------------------- 17.18s

2022-07-18 01:57:38.651068 | primary | build-test-packages : Clean up loop devices created by mock ------------ 15.53s

2022-07-18 01:57:38.651074 | primary | build-test-packages : Check loop devices stat -------------------------- 15.34s

2022-07-18 01:57:38.651080 | primary | validate-perf : Install the latest version of dstat on overcloud ------- 13.88s

2022-07-18 01:57:38.651086 | primary | modify-image : Extract initramfs image --------------------------------- 10.56s

2022-07-18 01:57:39.289865 | primary | +(./toci_quickstart.sh:161): main(): exit_value=0

2022-07-18 01:57:39.291586 | primary | +(./toci_quickstart.sh:164): main(): [[ 0 == 0 ]]

2022-07-18 01:57:39.293276 | primary | +(./toci_quickstart.sh:164): main(): echo 'Playbook run of ovb.yml passed successfully'

2022-07-18 01:57:39.293341 | primary | Playbook run of ovb.yml passed successfully

2022-07-18 01:57:39.293348 | primary | +(./toci_quickstart.sh:165): main(): [[ 0 != 0 ]]

2022-07-18 01:57:39.296453 | primary | +(./toci_quickstart.sh:167): main(): [[ 0 == 0 ]]

2022-07-18 01:57:39.296517 | primary | +(./toci_quickstart.sh:167): main(): echo 'Playbook run passed successfully'

2022-07-18 01:57:39.296525 | primary | Playbook run passed successfully

2022-07-18 01:57:39.296531 | primary | +(./toci_quickstart.sh:170): main(): echo 'Quickstart completed.'

2022-07-18 01:57:39.296552 | primary | Quickstart completed.

2022-07-18 01:57:39.296557 | primary | +(./toci_quickstart.sh:171): main(): exit 0

2022-07-18 01:57:39.326978 | primary | +(/home/zuul/src/opendev.org/openstack/tripleo-ci/toci_gate_test.sh:193): main(): echo 'Run completed'

2022-07-18 01:57:39.327059 | primary | Run completed

2022-07-18 05:57:40.519850 | primary | ok: Runtime: 4:09:35.890357

2022-07-18 05:57:40.640590 | 

2022-07-18 05:57:40.640799 | PLAY RECAP

2022-07-18 05:57:40.640907 | primary | ok: 11 changed: 7 unreachable: 0 failed: 0 skipped: 9 rescued: 0 ignored: 0
```
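
When comparing a passing and a failing run like the two quoted in this bug, it can help to list the hosts that answered `ok` in one log and `UNREACHABLE!` in the other. This is a rough sketch only, assuming the per-host result lines have the `ok: [host]` and `fatal: [host]: UNREACHABLE!` shapes seen in the pasted output; `host_results` and `regressions` are hypothetical names used for illustration.

```python
import re

# Per-host result lines as they appear in the pasted console output.
OK_LINE = re.compile(r"\bok: \[(?P<host>[^\]]+)\]")
UNREACHABLE_LINE = re.compile(r"\bfatal: \[(?P<host>[^\]]+)\]: UNREACHABLE!")


def host_results(log_text):
    """Map each host to 'ok' or 'unreachable' based on its last result line."""
    results = {}
    for line in log_text.splitlines():
        match = UNREACHABLE_LINE.search(line)
        if match:
            results[match.group("host")] = "unreachable"
            continue
        match = OK_LINE.search(line)
        if match:
            results[match.group("host")] = "ok"
    return results


def regressions(passing_log, failing_log):
    """Return hosts that were 'ok' in passing_log but unreachable in failing_log."""
    before = host_results(passing_log)
    after = host_results(failing_log)
    return sorted(
        host
        for host, status in after.items()
        if status == "unreachable" and before.get(host) == "ok"
    )
```

Feeding it the Gathering Facts excerpts from the passing and failing jobs would be expected to single out the overcloud nodes while leaving `127.0.0.2` out, since that host is `ok` in both runs.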

