Bug 1649957 - jewel to luminous containerized upgrade fails when mgr is collocated with mons
Summary: jewel to luminous containerized upgrade fails when mgr is collocated with mons
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 3.2
Assignee: Guillaume Abrioux
QA Contact: Coady LaCroix
URL:
Whiteboard:
Duplicates: 1653667
Depends On:
Blocks:
 
Reported: 2018-11-14 23:40 UTC by Coady LaCroix
Modified: 2019-01-03 19:02 UTC
CC List: 14 users

Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.rc5.el7cp Ubuntu: ceph-ansible_3.2.0~rc5-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-03 19:02:22 UTC
Embargoed:
vakulkar: automate_bug+


Attachments
container upgrade failure (541.56 KB, application/zip), attached 2018-11-14 23:40 UTC by Coady LaCroix


Links
Github: ceph ceph-ansible pull 3372 (last updated 2018-11-27 13:36:32 UTC)
Red Hat Product Errata: RHBA-2019:0020 (last updated 2019-01-03 19:02:28 UTC)

Description Coady LaCroix 2018-11-14 23:40:14 UTC
Created attachment 1505856 [details]
container upgrade failure

Description of problem: 

While running the rolling update playbook to upgrade a containerized jewel installation to luminous (3.2), the playbook fails with the following message:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: AnsibleFileNotFound: Could not find or access '~/fetch//0d5194c8-20d1-410e-be3b-ba05d14e25d8//etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring'
failed: [ceph-clacroix-1542220323880-node1-mon] (item={u'dest': u'/var/lib/ceph/mgr/ceph-ceph-clacroix-1542220323880-node1-mon/keyring', u'name': u'/etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring', u'copy_key': True}) => {"changed": false, "failed": true, "item": {"copy_key": true, "dest": "/var/lib/ceph/mgr/ceph-ceph-clacroix-1542220323880-node1-mon/keyring", "name": "/etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring"}, "msg": "Could not find or access '~/fetch//0d5194c8-20d1-410e-be3b-ba05d14e25d8//etc/ceph/ceph.mgr.ceph-clacroix-1542220323880-node1-mon.keyring'"}

Prior to the upgrade, the cluster is configured to collocate the mgr daemons with the mons. The fetch directory is also configured as ~/fetch.
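
For context, the failing task is of the "copy a keyring previously fetched from a mon" variety. Below is a minimal sketch of that pattern, assuming variable names (fetch_directory, fsid) and an item structure that mirror the error output above; it is not the exact ceph-ansible task. If no earlier task ever wrote ceph.mgr.<hostname>.keyring into the fetch directory on the Ansible controller, this copy fails with AnsibleFileNotFound exactly as shown.

# Illustrative sketch only, not the upstream ceph-ansible code.
- name: copy the previously fetched mgr keyring to the mon/mgr node (illustrative)
  copy:
    src: "{{ fetch_directory }}/{{ fsid }}{{ item.name }}"
    dest: "{{ item.dest }}"
    owner: ceph
    group: ceph
    mode: "0600"
  with_items:
    - name: "/etc/ceph/ceph.mgr.{{ ansible_hostname }}.keyring"
      dest: "/var/lib/ceph/mgr/ceph-{{ ansible_hostname }}/keyring"
      copy_key: true
  when: item.copy_key | bool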


Version-Release number of selected component (if applicable):
ceph-ansible-3.2.0-0.1.rc1.el7cp.noarch

How reproducible:
Every attempt to upgrade a containerized jewel installation to luminous 3.2.

Steps to Reproduce:
1. Install jewel containerized 
2. Configure inventory to collocate mgr on existing mons
3. Run rolling update playbook

Actual results:
Failure (see above) during execution. Full logs attached.

Expected results:
Successful playbook execution and upgraded cluster.

Additional info:

Comment 3 Sébastien Han 2018-11-20 17:34:54 UTC
Can I see your inventory file? Do you have a [mgrs] section?
Thanks!


To give you more info: we need to determine why this task was skipped: https://github.com/ceph/ceph-ansible/blob/d5409109fbec7a318fae09ad469f10ac0aae3866/infrastructure-playbooks/rolling_update.yml#L257-L258
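
For readers without the source checked out: the linked lines guard the step that creates and fetches the mgr keyring during the mon upgrade. The sketch below is a simplified, hypothetical reconstruction (the task name, command, and conditions are illustrative, not the upstream code); the point is that if the when: conditions evaluate to false on a collocated mon/mgr node, the keyring never reaches the fetch directory and the later copy task fails as described above.

# Hypothetical reconstruction for illustration only.
- name: create mgr keyring(s) inside the running mon container (illustrative)
  command: >
    docker exec ceph-mon-{{ ansible_hostname }}
    ceph auth get-or-create mgr.{{ hostvars[item]['ansible_hostname'] }}
    mon 'allow profile mgr' osd 'allow *' mds 'allow *'
  with_items: "{{ groups.get('mgrs', []) }}"
  when:
    - groups.get('mgrs', []) | length > 0
    - inventory_hostname == groups['mons'][0]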

Comment 4 Coady LaCroix 2018-11-20 21:09:09 UTC
Yes, there is a [mgrs] section in the inventory file. The inventory file is updated after the jewel installation but before the upgrade to luminous: the entries from the [mons] section are copied into the [mgrs] section. Full contents are below. Note that this was a separate run where I reproduced the issue, so the hostnames won't match those in the original log; the structure, however, is identical.

[mons]
ceph-clacroix-1542743011152-node1-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node3-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node2-mon monitor_interface=eth0
[osds]
ceph-clacroix-1542743011152-node5-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
ceph-clacroix-1542743011152-node4-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
ceph-clacroix-1542743011152-node6-osd monitor_interface=eth0  devices='["/dev/vdb", "/dev/vdc", "/dev/vdd"]' 
[rgws]
ceph-clacroix-1542743011152-node9-rgw radosgw_interface=eth0
[clients]
ceph-clacroix-1542743011152-node10-client client_interface=eth0
[mgrs]
ceph-clacroix-1542743011152-node1-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node3-mon monitor_interface=eth0
ceph-clacroix-1542743011152-node2-mon monitor_interface=eth0
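
A quick way to confirm that Ansible resolves the [mgrs] section above as expected is a throwaway playbook that just prints the group membership (illustrative, not part of ceph-ansible; run it with ansible-playbook -i <inventory> <file>.yml):

# Illustrative sanity check: print which hosts Ansible sees in the mgrs group.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: print the hosts in the mgrs group
      debug:
        msg: "{{ groups.get('mgrs', []) }}"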

Comment 5 seb 2018-11-27 13:25:48 UTC
*** Bug 1653667 has been marked as a duplicate of this bug. ***

Comment 10 Coady LaCroix 2018-12-04 22:23:04 UTC
Verified the issue has been resolved.

Comment 12 errata-xmlrpc 2019-01-03 19:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020

