Bug 1479078 - rolling upgrade issues with satellite installation
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Hardware: x86_64 Linux
Priority: medium  Severity: medium
Target Milestone: rc
Target Release: 2.5
Assigned To: Guillaume Abrioux
Depends On:
Reported: 2017-08-07 17:51 EDT by Vikhyat Umrao
Modified: 2017-08-28 17:41 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-08-28 17:41:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Comment 2 Vikhyat Umrao 2017-08-07 17:53:12 EDT
rolling upgrade issues with satellite installation 



5.2. Upgrading Between Minor Versions and Applying Asynchronous Updates

We have opened a doc bug: https://bugzilla.redhat.com/show_bug.cgi?id=1479074 - this section does not mention that the customer first needs to upgrade the Ansible node.

Because of this, the customer ran the rolling upgrade without upgrading the ceph-ansible version (the ansible version was the one we shipped in Red Hat Ceph Storage 2.1).

The rolling upgrade playbook completed, but it reported:

- one OSD node as failed - we verified that the OSDs on this node have the latest RPM and the latest in-memory version
- and it did not unset the nodeep-scrub, noout and noscrub flags

We started troubleshooting the issue:

First, we upgraded to the latest ceph-ansible:
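A rough sketch of that step on the admin node (the repository name is an assumption; with a Satellite installation the ceph-ansible package may come through a different content view):

# enable the RHCS 2 Tools repository (assumption) and update ceph-ansible
subscription-manager repos --enable=rhel-7-server-rhceph-2-tools-rpms
yum update ceph-ansible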


# Issue 2

As soon as we upgraded to the latest ceph-ansible, it started failing, only on the first monitor node, with the following error:

This was set in all.yml because of the Satellite installation:
ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT
ceph_stable_rh_storage: true

2017-08-07 16:55:00,788 p=4873 u=root |  TASK [ceph-common : verify that a method was chosen for red hat storage] *******

2017-08-07 16:55:00,817 p=4873 u=root |  fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose between ceph_rhcs_cdn_install and ceph_rhcs_iso_install"}
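For context, a rough reconstruction of the ceph-common check behind this message (hypothetical, not the actual source; the real task may differ):

# hypothetical reconstruction: with ceph_stable_rh_storage enabled, one of the
# two RHCS install methods (CDN or ISO) must be selected, otherwise the play fails
- name: verify that a method was chosen for red hat storage
  fail:
    msg: "choose between ceph_rhcs_cdn_install and ceph_rhcs_iso_install"
  when:
    - ceph_stable_rh_storage | bool
    - not ceph_rhcs_cdn_install | bool
    - not ceph_rhcs_iso_install | bool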


We kept this setting, required for the Satellite installation, in all.yml:
ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT

and we commented out:
#ceph_stable_rh_storage: true
#ceph_rhcs_cdn_install: false 

2017-08-07 17:13:27,230 p=7053 u=root |  fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose an upstream installation source or read https://github.com/ceph/ceph-ansible/wiki"}

We manually unset the nodeep-scrub, noscrub and noout flags.
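For reference, a minimal sketch of clearing those flags from a node with the admin keyring:

# unset the flags left behind by the failed rolling upgrade
ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub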
Comment 5 Guillaume Abrioux 2017-08-09 16:45:26 EDT
From what I have seen:

The old rolling_update.yml is still being used, and it looks for the group_vars/all file:

  pre_tasks:
    - include_vars: roles/ceph-common/defaults/main.yml
    - include_vars: roles/ceph-mon/defaults/main.yml
    - include_vars: roles/ceph-restapi/defaults/main.yml
    - include_vars: group_vars/all
      failed_when: false
    - include_vars: group_vars/{{ mon_group_name }}
      failed_when: false
    - include_vars: group_vars/{{ restapi_group_name }}
      failed_when: false

but group_vars/all doesn't exist:

$~/Downloads/debug_bz/usr/share/ceph-ansible$ ls group_vars/all
ls: group_vars/all: No such file or directory

therefore it takes the default value for ceph_origin, which is:

roles/ceph-common/defaults/main.yml:83:ceph_origin: 'upstream' # or 'distro' or 'local'

and it ends up with the error we can see in the logs because it enters that condition.
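A rough sketch of the shape of that condition (hypothetical reconstruction, not the actual ceph-common source; the exact variable list may differ):

# hypothetical: fires when ceph_origin falls back to 'upstream' but no upstream
# source (stable, dev, ...) has been selected
- name: make sure an installation origin was chosen
  fail:
    msg: "choose an upstream installation source or read https://github.com/ceph/ceph-ansible/wiki"
  when:
    - ceph_origin == 'upstream'
    - not ceph_stable | bool
    - not ceph_dev | bool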


I think running the new version of rolling_update.yml should fix this error, since it will look for group_vars/all.yml (which actually exists) and set ceph_origin to 'distro' as expected.
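A sketch of what the relevant include presumably looks like in the newer rolling_update.yml (assumption based on the behaviour described above, not checked against the source):

  pre_tasks:
    - include_vars: group_vars/all.yml
      failed_when: false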
