Rolling upgrade issues with Satellite installation

#Issue-1

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/installation_guide_for_red_hat_enterprise_linux/#upgrading_between_minor_versions_and_applying_asynchronous_updates
5.2. Upgrading Between Minor Versions and Applying Asynchronous Updates

We have opened a doc bug: https://bugzilla.redhat.com/show_bug.cgi?id=1479074

This section does not mention that the customer first needs to upgrade ceph-ansible on the Ansible administration node. Because of this, the customer ran the rolling upgrade without upgrading ceph-ansible (the installed version was still the one we shipped with Red Hat Ceph Storage 2.1). The rolling upgrade playbook otherwise went fine, but:

- it listed one OSD node as failed, although we verified that the OSDs on this node were running the latest RPM and the latest in-memory version
- it did not unset the nodeep-scrub, noout and noscrub flags

We started troubleshooting the issue. First, we upgraded to the latest ceph-ansible:

  ansible-2.2.3.0-1.el7.noarch
  ceph-ansible-2.2.11-1.el7scon.noarch

#Issue-2

As soon as we upgraded to the latest ceph-ansible, the playbook started failing, only on the first monitor node, with the error below.

The following was set in all.yml because of the Satellite installation:

  ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT
  ceph_stable_rh_storage: true

The playbook failed with:

  2017-08-07 16:55:00,788 p=4873 u=root | TASK [ceph-common : verify that a method was chosen for red hat storage] *******
  2017-08-07 16:55:00,817 p=4873 u=root | fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose between ceph_rhcs_cdn_install and ceph_rhcs_iso_install"}

#Issue-3

ceph_origin was still set in all.yml because of the Satellite installation:

  ceph_origin: 'distro' # or 'distro' NEEDED FOR SAT

with the following lines commented out:

  #ceph_stable_rh_storage: true
  #ceph_rhcs_cdn_install: false

This time the playbook failed with:

  2017-08-07 17:13:27,230 p=7053 u=root | fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "choose an upstream installation source or read https://github.com/ceph/ceph-ansible/wiki"}

We have manually unset the nodeep-scrub, noscrub and noout flags.
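For reference, a minimal sketch of how these flags can be cleared by hand with the standard ceph CLI, run from a monitor or admin node (the exact commands the customer used are not captured in this report):

  # clear the flags the interrupted rolling upgrade left behind
  ceph osd unset noout
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

  # confirm the flags line no longer lists them
  ceph osd dump | grep flags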
From what I have seen: the old rolling_update.yml is still being used, and it looks for a group_vars/all file (pre_tasks, lines 76-85 of the playbook):

  pre_tasks:
    - include_vars: roles/ceph-common/defaults/main.yml
    - include_vars: roles/ceph-mon/defaults/main.yml
    - include_vars: roles/ceph-restapi/defaults/main.yml
    - include_vars: group_vars/all
      failed_when: false
    - include_vars: group_vars/{{ mon_group_name }}
      failed_when: false
    - include_vars: group_vars/{{ restapi_group_name }}
      failed_when: false

but group_vars/all doesn't exist:

  $~/Downloads/debug_bz/usr/share/ceph-ansible$ ls group_vars/all
  ls: group_vars/all: No such file or directory

Therefore it takes the default value for ceph_origin, which is:

  roles/ceph-common/defaults/main.yml:83:ceph_origin: 'upstream' # or 'distro' or 'local'

It then ends up with the error we can see in the logs because it enters this condition:
https://github.com/ceph/ceph-ansible/blob/v2.2.11/roles/ceph-common/tasks/checks/check_mandatory_vars.yml#L12-L23

I think running the new version of rolling_update.yml should fix this error, since it looks for group_vars/all.yml (which actually exists) and will set ceph_origin to 'distro' as expected.
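Until the newer rolling_update.yml is in use, one possible workaround (my suggestion only, not something tested in this case) would be to give the old playbook the file name it expects by symlinking group_vars/all to the existing all.yml and re-running the upgrade. Paths assume the default /usr/share/ceph-ansible layout:

  cd /usr/share/ceph-ansible
  ls group_vars/                 # only all.yml is present, "all" is missing
  ln -s all.yml group_vars/all   # makes "include_vars: group_vars/all" resolve to all.yml

  # re-run the rolling upgrade with the same playbook path and inventory the
  # customer used (exact invocation not captured in this report)
  ansible-playbook rolling_update.yml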