Bug 1519052

Summary: OSP 11 -> OSP12 overcloud with dedicated rados gateway node upgrade failed
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Yogev Rabl <yrabl>
Component: Ceph-Ansible Assignee: Sébastien Han <shan>
Status: CLOSED EOL QA Contact: Vasishta <vashastr>
Severity: high Docs Contact:
Priority: high    
Version: 2.4 CC: adeza, anharris, aschoen, aschultz, ccamacho, ceph-eng-bugs, flucifre, gabrioux, gfidente, gmeno, kdreyer, lbezdick, mburns, mcornea, nthomas, rhel-osp-director-maint, sankarshan, shan, tserlin, yrabl
Target Milestone: z5   
Target Release: 2.5   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-27 04:39:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1548354    
Attachments:
Description Flags
the ceph upgrade log none

Description Yogev Rabl 2017-11-30 01:26:56 UTC
Created attachment 1360646 [details]
the ceph upgrade log

Description of problem:
The upgrade failed when ceph-ansible could not start the rgw container. The error is:
fatal: [192.168.24.13]: FAILED! => {"changed": false, "failed": true, "msg": "AnsibleUndefinedVariable: No first item, sequence was empty."}

The rest of the ceph cluster was upgraded and containerized. 
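
The message "No first item, sequence was empty." is what a Jinja2-style `first` filter reports when it is applied to an empty sequence, and Ansible surfaces it as AnsibleUndefinedVariable. The log excerpt above does not say which variable was empty, so the following is only a minimal stdlib-only sketch of the failure mode, not the ceph-ansible code. The group names, the `UndefinedVariable` class, and the helper functions are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class UndefinedVariable(Exception):
    """Hypothetical stand-in for the undefined-variable error a template engine raises."""


def first(seq: Iterable[str]) -> str:
    """Return the first item of seq, failing the way a `first` filter does on empty input."""
    try:
        return next(iter(seq))
    except StopIteration:
        # Same wording as the message in the bug report.
        raise UndefinedVariable("No first item, sequence was empty.") from None


def first_or_default(seq: Iterable[str], default: Optional[str] = None) -> Optional[str]:
    """Guarded variant: return a default instead of raising on empty input."""
    return next(iter(seq), default)


# Hypothetical inventory (assumption): one group has a host, another is empty.
inventory: Dict[str, List[str]] = {
    "populated_group": ["192.168.24.13"],
    "empty_group": [],
}

if __name__ == "__main__":
    print(first(inventory["populated_group"]))        # works: the group has a host
    print(first_or_default(inventory["empty_group"]))  # guarded: no exception
    try:
        first(inventory["empty_group"])                # unguarded: raises
    except UndefinedVariable as exc:
        print(f"FAILED: {exc}")
```

If the root cause is an empty host group, a guard like `first_or_default`, or an explicit check that the group is non-empty before templating, would turn the opaque error into an actionable one.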

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.14-1.el7cp.noarch
puppet-tripleo-7.4.3-10.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-6.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-2.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
python-tripleoclient-7.3.3-6.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-16.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-common-7.6.3-6.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud with a dedicated node for the rados gateway in OSP11
2. Upgrade it to OSP12


Actual results:
The deployment failed because the rados gateway failed to start in a container.

Expected results:
The entire ceph cluster is running in containers

Additional info:

Comment 3 Carlos Camacho 2017-11-30 14:43:48 UTC
Is this related to https://bugzilla.redhat.com/show_bug.cgi?id=1519055?

Hi Giulio! Can you give us more information about this issue?

Comment 4 Giulio Fidente 2017-11-30 14:51:11 UTC
Looks like a real issue in ceph-ansible, moving to Ceph product

Comment 7 Ken Dreyer (Red Hat) 2017-11-30 20:17:44 UTC
Is there a reproducer outside of OSP?

Comment 8 Federico Lucifredi 2017-11-30 23:26:48 UTC
Looks like a blocker for 2.5, but OSPd-driven Ceph 3 upgrade from 11 to 12 is not a valid upgrade path for 3.0.

Giulio/Yogev please explain if you disagree with the above. My guess is you set target=3 because this is a Ceph-Ansible 3 issue.

Setting target=2.5

Comment 10 Giulio Fidente 2017-12-01 08:30:42 UTC
It looks like we have a similar issue with mons in BZ #1519055; this is probably a duplicate

Comment 20 Sébastien Han 2019-03-21 04:10:24 UTC
Nothing has been done on this. I'm the assignee but I won't be working on this. We need to triage this, assign it to someone else, and then move it to ASSIGNED.

Comment 24 Red Hat Bugzilla 2023-09-14 04:12:54 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days