Bug 1479078 - rolling upgrade issues with satellite installation
Status: CLOSED NOTABUG
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 2.3
Hardware: x86_64 Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: 2.5
Assigned To: Guillaume Abrioux
QA Contact: ceph-qe-bugs
Docs Contact:
Depends On:
Blocks:
Reported: 2017-08-07 17:51 EDT by Vikhyat Umrao
Modified: 2017-08-28 17:41 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-28 17:41:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Comment 2 Vikhyat Umrao 2017-08-07 17:53:12 EDT
rolling upgrade issues with satellite installation 

#Issue-1

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/installation_guide_for_red_hat_enterprise_linux/#upgrading_between_minor_versions_and_applying_asynchronous_updates

5.2. Upgrading Between Minor Versions and Applying Asynchronous Updates

We have opened a doc bug, https://bugzilla.redhat.com/show_bug.cgi?id=1479074, because this section does not say that the customer first needs to upgrade the ansible node.

Because of this, the customer ran the rolling upgrade without upgrading ceph-ansible (the ansible version was the one we shipped in Red Hat Ceph Storage 2.1).

The rolling upgrade playbook ran through, but:

- it listed one OSD node as failed, although we verified that the OSDs on this node have the latest rpm and in-memory version
- it did not unset the nodeep-scrub, noout and noscrub flags

We started troubleshooting the issue:

First, we upgraded to latest ceph-ansible:

ansible-2.2.3.0-1.el7.noarch
ceph-ansible-2.2.11-1.el7scon.noarch

#Issue-2

As soon as we upgraded to the latest ansible, the playbook started failing, only on the first monitor node, with the error shown below.

This was set in all.yml because of the satellite installation:
ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT
ceph_stable_rh_storage: true


2017-08-07 16:55:00,788 p=4873 u=root |  TASK [ceph-common : verify that a method was chosen for red hat storage] *******

2017-08-07 16:55:00,817 p=4873 u=root |  fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose between ceph_rhcs_cdn_install and ceph_rhcs_iso_install"}

#Issue-3

This was set in all.yml because of the satellite installation:
ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT

and these lines were commented out:
#ceph_stable_rh_storage: true
#ceph_rhcs_cdn_install: false 

2017-08-07 17:13:27,230 p=7053 u=root |  fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose an upstream installation source or read https://github.com/ceph/ceph-ansible/wiki"}


We have manually unset the nodeep-scrub, noscrub and noout flags.
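
For reference, the manual cleanup can be sketched as below. This is only an illustration, assuming the flags are cleared with the standard `ceph osd unset <flag>` command; the comment does not record the exact commands that were run, and the helper names `unset_commands` and `unset_flags` are hypothetical.

```python
import subprocess

# Flags named in this report; the playbook left them set after the failed run.
FLAGS = ("noout", "noscrub", "nodeep-scrub")


def unset_commands(flags=FLAGS):
    """Return one 'ceph osd unset <flag>' argument list per flag."""
    return [["ceph", "osd", "unset", flag] for flag in flags]


def unset_flags(flags=FLAGS, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for argv in unset_commands(flags):
        print(" ".join(argv))
        if not dry_run:
            # Requires the ceph CLI and an admin keyring on this host.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: prints the three commands without touching the cluster.
    unset_flags()
```

Running it with `dry_run=False` on a node with admin access would execute the commands; `ceph osd dump | grep flags` can then be used to confirm the flags are gone.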
Comment 5 Guillaume Abrioux 2017-08-09 16:45:26 EDT
From what I have seen:

The old rolling_update.yml is still being used, and it looks for a group_vars/all file:

  pre_tasks:
    - include_vars: roles/ceph-common/defaults/main.yml
    - include_vars: roles/ceph-mon/defaults/main.yml
    - include_vars: roles/ceph-restapi/defaults/main.yml
    - include_vars: group_vars/all
      failed_when: false
    - include_vars: group_vars/{{ mon_group_name }}
      failed_when: false
    - include_vars: group_vars/{{ restapi_group_name }}
      failed_when: false


but group_vars/all doesn't exist:

$~/Downloads/debug_bz/usr/share/ceph-ansible$ ls group_vars/all
ls: group_vars/all: No such file or directory

therefore it takes the default value for ceph_origin, which is:

roles/ceph-common/defaults/main.yml:83:ceph_origin: 'upstream' # or 'distro' or 'local'

It then ends up with the error we can see in the logs, because it enters this condition:

https://github.com/ceph/ceph-ansible/blob/v2.2.11/roles/ceph-common/tasks/checks/check_mandatory_vars.yml#L12-L23

I think running the new version of rolling_update.yml should fix this error, since it will look for group_vars/all.yml (which actually exists) and set ceph_origin to 'distro' as expected.
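
To make the fallback concrete, here is a minimal Python sketch of the behaviour described above. It is not ceph-ansible code: it models only the pre_tasks include_vars chain, ignores Ansible's own group_vars loading and variable precedence, and uses a toy parser instead of a YAML loader. The names `parse_simple_vars`, `include_vars`, `resolve` and `demo` are hypothetical; the only values taken from this report are the default `ceph_origin: 'upstream'` and the `ceph_origin: 'distro'` line from all.yml.

```python
import os
import tempfile

# Default quoted above from roles/ceph-common/defaults/main.yml.
ROLE_DEFAULTS = {"ceph_origin": "upstream"}


def parse_simple_vars(text):
    """Parse flat 'key: value' lines; a toy stand-in for a YAML loader."""
    result = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip("'\"")
    return result


def include_vars(variables, path, ignore_errors=False):
    """Merge variables from path; ignore_errors mimics 'failed_when: false'."""
    try:
        with open(path) as handle:
            variables.update(parse_simple_vars(handle.read()))
    except OSError:
        if not ignore_errors:
            raise
    return variables


def resolve(base_dir, group_vars_name):
    """Start from the role defaults, then try to load the group_vars file."""
    variables = dict(ROLE_DEFAULTS)
    path = os.path.join(base_dir, "group_vars", group_vars_name)
    return include_vars(variables, path, ignore_errors=True)


def demo():
    """Only group_vars/all.yml exists, as on the affected admin node."""
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "group_vars"))
        with open(os.path.join(base, "group_vars", "all.yml"), "w") as handle:
            handle.write("ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT\n")
        old = resolve(base, "all")["ceph_origin"]      # old playbook's path
        new = resolve(base, "all.yml")["ceph_origin"]  # new playbook's path
    return old, new


if __name__ == "__main__":
    print(demo())  # ('upstream', 'distro')
```

Because the missing file is silently ignored, the old path keeps the 'upstream' default with no warning, which is why the failure only surfaces later in the mandatory-variable check.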
