Bug 1649957

Summary: jewel to luminous containerized upgrade fails when mgr is collocated with mons
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Coady LaCroix <clacroix>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Coady LaCroix <clacroix>
Severity: urgent
Priority: high
Version: 3.2
CC: aschoen, ceph-eng-bugs, clacroix, gabrioux, gmeno, hgurav, hnallurv, nthomas, rperiyas, sankarshan, seb, tserlin, vakulkar, vpoliset
Target Milestone: rc
Keywords: Automation, AutomationBlocker
Target Release: 3.2
Flags: vakulkar: automate_bug+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.rc5.el7cp; Ubuntu: ceph-ansible_3.2.0~rc5-2redhat1
Doc Type: If docs needed, set a value
Last Closed: 2019-01-03 19:02:22 UTC
Type: Bug
Attachments: container upgrade failure (flags: none)

Description Coady LaCroix 2018-11-14 23:40:14 UTC
Created attachment 1505856: container upgrade failure

Description of problem: 

While running the rolling update playbook to upgrade a containerized jewel installation to luminous (3.2), the playbook fails with the following message:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/fetch//0d5194c8-20d1-410e-be3b-ba05d14e25d8//etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring'
failed: [ceph-clacroix-1542220323880-node1-mon] (item={u'dest': u'/var/lib/ceph/mgr/ceph-ceph-clacroix-1542220323880-node1-mon/keyring', u'name': u'/etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring', u'copy_key': True}) => {"changed": false, "failed": true, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/ceph-ceph-clacroix-1542220323880-node1-mon/keyring", "name": "/etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring"}, "msg": "Could not find or access '~/fetch//0d5194c8-20d1-410e-be3b-ba05d14e25d8//etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring'"}

Before the upgrade, the cluster is configured to collocate the mgrs with the mons. The fetch directory is set to ~/fetch.
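
The path in the error message appears to be built from three parts: the fetch directory, a UUID (presumably the cluster fsid), and the keyring's path on the remote host. The following is a minimal Python sketch, not part of ceph-ansible, for checking from the Ansible controller which mgr keyrings are absent from the fetch directory before the playbook is run. The function names, the `cluster="ceph"` default, and the `<cluster>.mgr.<host>.keyring` naming pattern are assumptions inferred from the error text above.

```python
import os


def expected_keyring_path(fetch_directory, fsid, keyring_name):
    """Compose the local path that the failing copy task tried to read.

    Assumption: the layout is <fetch_directory>/<fsid>/<remote keyring path>,
    as suggested by the path in the error message.
    """
    # keyring_name is absolute on the remote host (e.g. /etc/ceph/...), so the
    # leading slash is stripped; otherwise os.path.join would discard the
    # fetch directory and fsid components.
    return os.path.join(
        os.path.expanduser(fetch_directory), fsid, keyring_name.lstrip("/")
    )


def missing_mgr_keyrings(fetch_directory, fsid, mgr_hosts, cluster="ceph"):
    """Return the expected mgr keyring paths that do not exist locally."""
    missing = []
    for host in mgr_hosts:
        # Naming pattern inferred from the error message (assumption).
        name = "/etc/ceph/{}.mgr.{}.keyring".format(cluster, host)
        path = expected_keyring_path(fetch_directory, fsid, name)
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

For example, calling `missing_mgr_keyrings("~/fetch", "0d5194c8-20d1-410e-be3b-ba05d14e25d8", ["ceph-clacroix-1542220323880-node1-mon"])` on the controller would list the keyring from the error above if it has not been fetched.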


Version-Release number of selected component (if applicable):
ceph-ansible-3.2.0-0.1.rc1.el7cp.noarch

How reproducible:
Every attempt to upgrade a containerized jewel installation to luminous 3.2.

Steps to Reproduce:
1. Install a containerized jewel cluster.
2. Configure the inventory to collocate the mgrs on the existing mons.
3. Run the rolling update playbook.

Actual results:
Failure (see above) during execution. Full logs attached.

Expected results:
Successful playbook execution and upgraded cluster.

Additional info:

Comment 3 Sébastien Han 2018-11-20 17:34:54 UTC
Can I see your inventory file? Do you have a [mgrs] section?
Thanks!


To give you more info: we need to determine why this task was skipped: https://github.com/ceph/ceph-ansible/blob/d5409109fbec7a318fae09ad469f10ac0aae3866/infrastructure-playbooks/rolling_update.yml#L257-L258

Comment 4 Coady LaCroix 2018-11-20 21:09:09 UTC
Yes, there is a [mgrs] section in the inventory file. The inventory file is updated after the jewel installation but before the upgrade to luminous: the entries from the [mons] section are copied to the [mgrs] section. The full contents are below. Note that this is from a separate run in which I reproduced the issue, so the hostnames won't match those in the original log; the structure, however, should be identical.

[mons]
ceph-clacroix-1542743011152-node1-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node3-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node2-mon monitor_interface=eth0
[osds]
ceph-clacroix-1542743011152-node5-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
ceph-clacroix-1542743011152-node4-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
ceph-clacroix-1542743011152-node6-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
[rgws]
ceph-clacroix-1542743011152-node9-rgw radosgw_interface=eth0
[clients]
ceph-clacroix-1542743011152-node10-client client_interface=eth0
[mgrs]
ceph-clacroix-1542743011152-node1-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node3-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node2-mon monitor_interface=eth0
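
To confirm that an inventory laid out like the one above really collocates the mgrs with the mons, a small check can be scripted. This is a rough Python sketch under stated assumptions: it is a hand-rolled parser for the simple INI-style layout shown here (group headers followed by one host per line), it is not Ansible's own inventory loader, and the function names are hypothetical. Sections such as `[group:vars]` or `[group:children]` are skipped rather than interpreted.

```python
def parse_inventory_groups(text):
    """Map each [group] header to the list of host names listed under it."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if ":" in name:
                # [group:vars] / [group:children] do not list hosts; ignore.
                current = None
            else:
                current = name
                groups.setdefault(current, [])
        elif current is not None:
            # The host name is the first token; host variables follow it.
            groups[current].append(line.split()[0])
    return groups


def mgrs_collocated_with_mons(groups):
    """True if a non-empty mgrs group contains only hosts that are also mons."""
    mgrs = set(groups.get("mgrs", []))
    mons = set(groups.get("mons", []))
    return bool(mgrs) and mgrs <= mons
```

Reading the inventory file into a string and passing it through `parse_inventory_groups` and then `mgrs_collocated_with_mons` reports whether every mgr host is also a mon host.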

Comment 5 seb 2018-11-27 13:25:48 UTC
*** Bug 1653667 has been marked as a duplicate of this bug. ***

Comment 10 Coady LaCroix 2018-12-04 22:23:04 UTC
Verified the issue has been resolved.

Comment 12 errata-xmlrpc 2019-01-03 19:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020