Bug 1795792
| Summary: | Overcloud minor update fails 'host looking for a container name it would never have' | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Alistair Tonner <atonner> | ||||||
| Component: | Ceph-Ansible | Assignee: | Dimitri Savineau <dsavinea> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 4.0 | CC: | amsyedha, aschoen, ceph-eng-bugs, ceph-qe-bugs, dsavinea, flucifre, gabrioux, gcharot, gfidente, gmeno, johfulto, nthomas, nweinber, tchandra, tserlin, ykaul | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | 4.0 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | ceph-ansible-4.0.14-1.el8cp, ceph-ansible-4.0.14-1.el7cp | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 1796492 | Environment: | |||||||
| Last Closed: | 2020-01-31 12:48:52 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1642481, 1796492 | ||||||||
| Attachments: |
I see a bug like bz1792320 but on rolling_update.yml in ceph-ansible-4.0.13-1.el8cp.noarch

"Error: no container with name or ID ceph-mon-overcloud-controller-2 found: no such container"

    "failed: [overcloud-controller-0 -> 192.168.24.24] (item={'name': 'mgr.controller-0', 'path': '/var/lib/ceph/mgr/ceph-controller-0/keyring', 'copy_key': True}) => changed=true ",
    " - mgr.controller-0",
    " delta: '0:00:00.092795'",
    " end: '2020-01-28 10:50:22.704025'",
    " _raw_params: podman exec ceph-mon-overcloud-controller-2 ceph --cluster ceph auth get mgr.controller-0",
    " copy_key: true",
    " name: mgr.controller-0",
    " path: /var/lib/ceph/mgr/ceph-controller-0/keyring",
    " start: '2020-01-28 10:50:22.611230'",

The task "waiting for the containerized monitor to join the quorum"

https://github.com/ceph/ceph-ansible/blob/v4.0.13/infrastructure-playbooks/rolling_update.yml#L275-L285

was the last to run [1]

this should only affect minor updates of ceph.

[1]
    [fultonj@runcible stack]$ grep TASK ceph-update-run.log | tail -5
    2020-01-28 10:50:27 | "TASK [start ceph mon] **********************************************************",
    2020-01-28 10:50:27 | "TASK [start ceph mgr] **********************************************************",
    2020-01-28 10:50:27 | "TASK [restart containerized ceph mon] ******************************************",
    2020-01-28 10:50:27 | "TASK [non container | waiting for the monitor to join the quorum...] ***********",
    2020-01-28 10:50:27 | "TASK [container | waiting for the containerized monitor to join the quorum...] ***",
    [fultonj@runcible stack]$

Created attachment 1656122 [details]
ansible log from ceph-update-run
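
The `grep TASK ... | tail -5` check in [1] can also be scripted when the log needs further processing. The following is only a minimal Python sketch, assuming log lines have the `timestamp | "TASK [name] ***",` shape shown in [1]. The function name `last_tasks` is hypothetical and is not part of ceph-ansible or TripleO.

```python
import re

# Matches task headers such as:
#   2020-01-28 10:50:27 | "TASK [start ceph mon] ******",
# (assumed shape, taken from the grep output in this report)
TASK_RE = re.compile(r"TASK \[(?P<name>.*)\] \*+")


def last_tasks(lines, count=5):
    """Return the names of the last `count` TASK headers found in `lines`."""
    if count <= 0:
        return []
    names = []
    for line in lines:
        match = TASK_RE.search(line)
        if match:
            names.append(match.group("name"))
    return names[-count:]


if __name__ == "__main__":
    # Hypothetical usage: the path is the log file named in this report.
    with open("ceph-update-run.log") as handle:
        for name in last_tasks(handle):
            print(name)
```

The last name printed is the task that was running when the playbook stopped, which is the same information the shell pipeline gives.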
> this should only affect minor updates of ceph.
IMHO, if it impacts minor updates, it should impact major updates too, because both use the same playbook.
Hi Tejas,
I think you should start RC testing anyway. We will not have confirmation on Blocker/not blocker until Dimitri wakes.

John, you also get to call it a blocker (or not) for OSP. Please advise.

From my point of view, it is a blocker

(In reply to Federico Lucifredi from comment #7)
> Hi Tejas,
> I think you should start RC testing anyway. We will not have confirmation
> on Blocker/not blocker until Dimitri wakes.
>
> John, you also get to call it a blocker (or not) for OSP. Please advise.

Unfortunately I think it is, without it people won't be able to update their overcloud from 16 ga to the next 16 z (16.0.1)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312
Created attachment 1656120 [details]
ansible log output from update runs - includes listed error

Description of problem:

composable-minor_update-RHELOSP-48861 - job fails in overcloud update during ceph-update-run.sh -

Version-Release number of selected component (if applicable):

RHOS_TRUNK-16.0-RHEL-8-20200124.n.1

puppet-ceph-3.0.1-0.20191002213425.55a0f94.el8ost.noarch
ansible-role-redhat-subscription-1.0.5-0.20191022053336.6c67a40.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.1-0.20191108131052.2ad3189.el8ost.noarch
python3-tripleoclient-12.3.1-0.20191230195937.585fb28.el8ost.noarch
ansible-pacemaker-1.0.4-0.20191022042340.0e4d7c0.el8ost.noarch
ansible-role-atos-hsm-0.1.1-0.20191024165047.866e075.el8ost.noarch
python3-tripleo-common-11.3.3-0.20200121231250.3c68b48.el8ost.noarch
openstack-tripleo-common-11.3.3-0.20200121231250.3c68b48.el8ost.noarch
puppet-tripleo-11.4.1-0.20200118215809.6f9bf6c.el8ost.noarch
ansible-2.8.8-1.el8ae.noarch
ansible-config_template-1.0.1-0.20191122040234.ff61269.el8ost.noarch
openstack-tripleo-image-elements-10.6.1-0.20191022065313.7338463.el8ost.noarch
openstack-tripleo-validations-11.3.1-0.20191126041901.2bba53a.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-0.20200114185851.813f68b.el8ost.noarch
ansible-tripleo-ipsec-9.2.0-0.20191022054642.ffe104c.el8ost.noarch
ansible-role-thales-hsm-0.2.1-0.20191024165911.2803c6c.el8ost.noarch
ansible-role-openstack-operations-0.0.1-0.20191022044056.29cc537.el8ost.noarch
ceph-ansible-4.0.13-1.el8cp.noarch
python3-heat-agent-ansible-1.10.1-0.20191022061131.96b819c.el8ost.noarch
tripleo-ansible-0.4.2-0.20200110023759.ee731ba.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.1-0.20191230195937.585fb28.el8ost.noarch
ansible-role-chrony-1.0.2-0.20191022052427.03e7fbe.el8ost.noarch
ansible-role-tripleo-modify-image-1.1.1-0.20200122200932.58d7a5b.el8ost.noarch
ansible-role-container-registry-1.1.1-0.20191025041237.bf2e310.el8ost.noarch
openstack-tripleo-common-containers-11.3.3-0.20200121231250.3c68b48.el8ost.noarch

How reproducible:

Consistent -

Steps to Reproduce:
1. Deploy OSP16, 3 cont, 3 ceph, 2 compute, 2 ironic and composable nodes, execute minor update.
2.
3.

Actual results:

overcloud_update_run_CephStorage.sh fails pointing to ansible error:

    <192.168.24.24> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o 'IdentityFile=\"/var/lib/mistral/0ed610f4-1262-4635-a01c-9cdba029ce0b/ssh_private_key\"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User=\"tripleo-admin\"' -o ConnectTimeout=60 -o ControlPath=/root/.ansible/cp/%h-%r-%p 192.168.24.24 '/bin/sh -c '\"'\"'sudo -H -S -n -u root /bin/sh -c '\"'\"'\"'\"'\"'\"'\"'\"'echo BECOME-SUCCESS-jhmetxwsdgoftvpgvpogrwqknvupotmj ; /usr/bin/python3'\"'\"'\"'\"'\"'\"'\"'\"' && sleep 0'\"'\"''",
    "<192.168.24.24> (1, b'\\n{\"msg\": \"non-zero return code\", \"cmd\": [\"podman\", \"exec\", \"ceph-mon-overcloud-controller-2\", \"ceph\", \"--cluster\", \"ceph\", \"auth\", \"get\", \"mgr.controller-0\"], \"stdout\": \"\", \"stderr\": \"Error: no container with name or ID ceph-mon-overcloud-controller-2 found: no such container\", \"rc\": 125, \"start\": \"2020-01-28 10:50:22.611230\", \"end\": \"2020-01-28 10:50:22.704025\", \"delta\": \"0:00:00.092795\", \"changed\": true, \"failed\": true, \"invocation\": {\"module_args\": {\"_raw_params\": \"podman exec ceph-mon-overcloud-controller-2 ceph --cluster ceph auth get mgr.controller-0\", \"warn\": true, \"_uses_shell\": false, \"stdin_add_newline\": true, \"strip_empty_ends\": true, \"argv\": null, \"chdir\": null, \"executable\": null, \"creates\": null, \"removes\": null, \"stdin\": null}}}\\n', b'')",
    "failed: [overcloud-controller-0 -> 192.168.24.24] (item={'name': 'mgr.controller-0', 'path': '/var/lib/ceph/mgr/ceph-controller-0/keyring', 'copy_key': True}) => changed=true ",
    " - mgr.controller-0",
    " delta: '0:00:00.092795'",
    " end: '2020-01-28 10:50:22.704025'",
    " _raw_params: podman exec ceph-mon-overcloud-controller-2 ceph --cluster ceph auth get mgr.controller-0",
    " copy_key: true",
    " name: mgr.controller-0",
    " path: /var/lib/ceph/mgr/ceph-controller-0/keyring",
    " start: '2020-01-28 10:50:22.611230'",

Expected results:

Overcloud update should succeed and complete.

Additional info:
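
As a triage aid, the failed task result under "Actual results" can be reduced to the one container name podman could not find. The sketch below assumes the JSON payload has already been extracted from the Ansible log line and unescaped. `missing_container` is a hypothetical helper, not an Ansible or podman API, and the sample carries only a subset of the fields shown in the log.

```python
import json
import re

# podman's error text as it appears in the log in this report.
NO_CONTAINER_RE = re.compile(r"no container with name or ID (\S+) found")


def missing_container(result_json):
    """Return the container name podman could not find, or None."""
    result = json.loads(result_json)
    if result.get("rc", 0) == 0:
        # The command succeeded, so there is nothing to report.
        return None
    match = NO_CONTAINER_RE.search(result.get("stderr", ""))
    return match.group(1) if match else None


# Subset of the failed result from this report, rebuilt as plain JSON.
sample = json.dumps({
    "msg": "non-zero return code",
    "cmd": ["podman", "exec", "ceph-mon-overcloud-controller-2",
            "ceph", "--cluster", "ceph", "auth", "get", "mgr.controller-0"],
    "stderr": "Error: no container with name or ID "
              "ceph-mon-overcloud-controller-2 found: no such container",
    "rc": 125,
})

print(missing_container(sample))
```

Comparing the returned name with the host the task ran on (here `overcloud-controller-0 -> 192.168.24.24`) shows at a glance whether the command was aimed at a container from a different node.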