Bug 1847113
| Field | Value |
|---|---|
| Summary | After FFWD we should unset ContainerCeph3DaemonImage |
| Product | Red Hat OpenStack |
| Component | openstack-tripleo-heat-templates |
| Version | 16.1 (Train) |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Keywords | Triaged |
| Reporter | Giulio Fidente <gfidente> |
| Assignee | Jose Luis Franco <jfrancoa> |
| QA Contact | David Rosenfeld <drosenfe> |
| CC | aschultz, dmacpher, fpantano, hbrock, jfrancoa, johfulto, jpretori, jslagle, lbezdick, mbracho, mburns, morazi, pgrist, rbrady, spower, sputhenp |
| Target Milestone | rc |
| Target Release | 16.1 (Train on RHEL 8.2) |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | openstack-tripleo-heat-templates-11.3.2-0.20200616081527.396affd.el8ost |
| Doc Type | No Doc Update |
| Type | Bug |
| Clones | 1850212 |
| Bug Blocks | 1850212 |
| Last Closed | 2020-07-29 07:53:11 UTC |

Description
Giulio Fidente
2020-06-15 17:16:28 UTC
This issue presents after the converge step: when you run `openstack overcloud external-upgrade run --stack $STACK --tags ceph`, it fails with the following ceph-ansible error:
2020-06-15 11:08:47,273 p=264551 u=root n=ansible | TASK [container | disallow pre-nautilus OSDs and enable all new nautilus-only functionality] ***
2020-06-15 11:08:47,274 p=264551 u=root n=ansible | Monday 15 June 2020 11:08:47 -0400 (0:00:00.485) 0:20:09.714 ***********
2020-06-15 11:08:49,215 p=264551 u=root n=ansible | fatal: [osp-test-octopi-zorillas-controller-0 -> 10.10.0.116]: FAILED! => changed=true
cmd:
- podman
- exec
- ceph-mon-osp-test-octopi-zorillas-controller-0
- ceph
- osd
- require-osd-release
- nautilus
delta: '0:00:01.537954'
end: '2020-06-15 15:08:49.184682'
msg: non-zero return code
rc: 22
start: '2020-06-15 15:08:47.646728'
stderr: |-
Invalid command: nautilus not in luminous
osd require-osd-release luminous {--yes-i-really-mean-it} : set the minimum allowed OSD release to participate in the cluster
Error EINVAL: invalid command
Error: non zero exit code: 22: OCI runtime error
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
2020-06-15 11:08:49,216 p=264551 u=root n=ansible | NO MORE HOSTS LEFT *************************************************************
2020-06-15 11:08:49,219 p=264551 u=root n=ansible | PLAY RECAP *********************************************************************
2020-06-15 11:08:49,219 p=264551 u=root n=ansible | localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
2020-06-15 11:08:49,219 p=264551 u=root n=ansible | osp-test-octopi-zorillas-cephstorage-0 : ok=160 changed=14 unreachable=0 failed=0 skipped=251 rescued=0 ignored=0
2020-06-15 11:08:49,219 p=264551 u=root n=ansible | osp-test-octopi-zorillas-cephstorage-1 : ok=160 changed=14 unreachable=0 failed=0 skipped=251 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-cephstorage-2 : ok=161 changed=13 unreachable=0 failed=0 skipped=250 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-controller-0 : ok=420 changed=47 unreachable=0 failed=1 skipped=612 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-controller-1 : ok=297 changed=29 unreachable=0 failed=0 skipped=494 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-controller-2 : ok=293 changed=27 unreachable=0 failed=0 skipped=484 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-novacompute-0 : ok=114 changed=8 unreachable=0 failed=0 skipped=236 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | osp-test-octopi-zorillas-novacompute-1 : ok=111 changed=7 unreachable=0 failed=0 skipped=225 rescued=0 ignored=0
2020-06-15 11:08:49,220 p=264551 u=root n=ansible | Monday 15 June 2020 11:08:49 -0400 (0:00:01.946) 0:20:11.661 ***********
2020-06-15 11:08:49,221 p=264551 u=root n=ansible | ===============================================================================
2020-06-15 11:08:49,225 p=264551 u=root n=ansible | waiting for clean pgs... ----------------------------------------------- 36.07s
2020-06-15 11:08:49,225 p=264551 u=root n=ansible | gather and delegate facts ---------------------------------------------- 28.58s
2020-06-15 11:08:49,225 p=264551 u=root n=ansible | stop standby ceph mds -------------------------------------------------- 26.17s
2020-06-15 11:08:49,225 p=264551 u=root n=ansible | ceph-container-common : pulling osp-test-octopi-zorillas-undercloud.ctlplane.hextupleo.lab:8787/rhceph/rhceph-3-rhel7:3-40 image -- 17.95s
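Before retrying, it can help to confirm that the mon containers really are still on the RHCSv3 image and that the cluster still requires luminous. A minimal sketch, assuming a RHOSP controller where the mon container is named `ceph-mon-<short hostname>` as in the log above:

```bash
# Run on a controller node as root. Container name follows the
# ceph-mon-<short hostname> pattern seen in the log; adjust if yours differs.
MON="ceph-mon-$(hostname -s)"

# Which image is the mon actually running? If it still shows rhceph-3-rhel7,
# ceph-ansible pulled the RHCSv3 containers and 'require-osd-release nautilus'
# will fail exactly as above.
sudo podman ps --filter "name=$MON" --format '{{.Image}}'

# Releases of the daemons currently in the cluster, and the minimum OSD
# release the cluster enforces (still luminous before the upgrade).
sudo podman exec "$MON" ceph versions
sudo podman exec "$MON" ceph osd dump | grep require_osd_release
```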
As per comment #2, the rolling_update playbook was using RHCSv3 containers instead of RHCSv4 containers, which is why the following task fails: https://github.com/ceph/ceph-ansible/blob/v4.0.23/infrastructure-playbooks/rolling_update.yml#L945. So we need a way in THT for the person doing the upgrade to specify that the Ceph 4 containers should be used.

HOW TO AVOID THIS ISSUE

1. Before running the converge step, create a file called no_ceph3.yaml (or a similar name) containing the following:

        parameter_defaults:
          ContainerCeph3DaemonImage: ''

2. When you run the converge step, include the file as the last argument of your `openstack overcloud deploy` command, e.g. `openstack overcloud deploy ... -e no_ceph3.yaml`. If you have already run the converge step and encountered this bug, re-run it with this file included.

3. Proceed with the Ceph upgrade as usual by running a command like `openstack overcloud external-upgrade run --stack $STACK --tags ceph`.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148
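Putting the workaround steps above together, a minimal shell sketch; the stack name and the elided converge arguments ("...") are placeholders, so reuse the exact converge command from your own FFU procedure and only append the extra environment file:

```bash
# Hypothetical stack name; substitute your own.
STACK=overcloud

# 1. Environment file that unsets the Ceph 3 daemon image (same content as step 1 above).
cat > no_ceph3.yaml <<'EOF'
parameter_defaults:
  ContainerCeph3DaemonImage: ''
EOF

# 2. Re-run the converge step. Replace "..." with the exact arguments of your
#    existing converge command; only the trailing -e no_ceph3.yaml is new and
#    it must stay the last environment file on the command line.
openstack overcloud deploy \
  ... \
  -e no_ceph3.yaml

# 3. Run the Ceph upgrade.
openstack overcloud external-upgrade run --stack "$STACK" --tags ceph
```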