Bug 1609007

Summary: [ceph-container] : dmcrypt OSDs are not starting after upgrading from 2.5.z1 to 3.0.z4
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Container
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Aron Gunn <agunn>
Priority: urgent
Version: 3.0
CC: agunn, anharris, ceph-eng-bugs, edonnell, evelu, flucifre, gabrioux, gmeno, hnallurv, kdreyer, seb, shan, tchandra
Target Milestone: rc
Keywords: Regression
Target Release: 3.1
Flags: shan: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.rc17.el7cp Ubuntu: ceph-ansible_3.1.0~rc17-2redhat1 rhceph:ceph-3.1-rhel-7-containers-candidate-38485-20180810211451
Doc Type: Bug Fix
Doc Text:
.A `dmcrypt` OSD comes up after upgrading a containerized {product} cluster to 3.x
Previously, on FileStore, `ceph-disk` created the lockbox partition for `dmcrypt` as partition number 3. With the introduction of BlueStore, this partition moved to position number 5, but `ceph-disk` was still trying to create the partition at position number 3, causing the OSD to fail. In this release, `ceph-disk` detects the correct partition to use for the lockbox, and `dmcrypt` OSDs start as expected after the upgrade.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-26 19:16:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version: ---
Embargoed:
Bug Depends On:    
Bug Blocks: 1584264    
Attachments:
File contains snippet of journald logs of an OSD service with verbose enabled (Flags: none)

Description Vasishta 2018-07-26 17:20:41 UTC
Created attachment 1470843 [details]
File contains snippet of journald logs of an OSD service with verbose enabled

Description of problem:
dmcrypt OSDs are not coming up after upgrading a containerized cluster from 2.5.z1 to 3.0.z4

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.39-1.el7cp.noarch

Upgrade from <brew-registry>/rhceph:2.5-7 to <live-registry>.rhceph/rhceph-3-rhel7:latest

How reproducible:
Always (1/1)

Steps to Reproduce:
(Path followed)
1. Initialize 2.5 containerized cluster
2. Upgrade manually to 2.5.z1
3. Upgrade cluster to 3.0.z4 using rolling_update
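For reference, the step-3 upgrade is typically driven like this (a sketch only; the inventory path and the group_vars values below are assumptions, not taken from this report):

  # group_vars/all.yml -- point ceph-ansible at the 3.x container image
  # before running the playbook (variable names from ceph-ansible's
  # containerized defaults):
  #   ceph_docker_registry: <live-registry>
  #   ceph_docker_image: rhceph/rhceph-3-rhel7
  #   ceph_docker_image_tag: latest
  $ cd /usr/share/ceph-ansible
  $ ansible-playbook -i <inventory> infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes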

Actual results:
dmcrypt OSDs failed to start

Expected results:
OSDs must be up and running

Additional info:
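One way to capture the OSD failure and to see the lockbox layout that the Doc Text above refers to (the device name and systemd unit name here are placeholders, not values from this report):

  # Journald logs of the failing containerized OSD unit:
  $ journalctl -u ceph-osd@sdb.service --no-pager | tail -n 200

  # FileStore dmcrypt OSDs created by ceph-disk keep the lockbox on
  # partition 3, while BlueStore uses partition 5 (per the Doc Text),
  # so the partition has to be identified by its name rather than by a
  # hard-coded number:
  $ sgdisk --print /dev/sdb
  $ blkid -o value -s PARTLABEL /dev/sdb3    # expect the ceph-disk lockbox name, "ceph lockbox"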

Comment 8 Sébastien Han 2018-07-31 10:31:34 UTC
This is critical, so I had to change the PM needinfo to an approval in order to push my patch.
Federico, I hope you don't mind; I didn't mean to overstep here.

If you disagree I can always revert the commit.
Thanks.

Comment 9 Sébastien Han 2018-07-31 10:32:40 UTC
Actually my commit got rejected:

remote: *** Checking commit b11e126e8131802a73fe9e79e2bc5bc23a36e0b0
remote: *** Resolves:
remote: ***   Unapproved:
remote: ***     rhbz#1609007 (needinfo+, qa_ack+, ceph-3.y?, pm_ack+, devel_ack+)
remote: *** No approved Bugzilla IDs referenced in log message or changelog for b11e126e8131802a73fe9e79e2bc5bc23a36e0b0
remote: *** Unapproved Bugzilla IDs referenced in log message or changelog for b11e126e8131802a73fe9e79e2bc5bc23a36e0b0
remote: *** Commit b11e126e8131802a73fe9e79e2bc5bc23a36e0b0 denied
remote: *** Current checkin policy requires:
remote:     (ceph-3.0 == ? or ceph-3.0 == +)
remote: *** See https://mojo.redhat.com/docs/DOC-1020853 for more information
remote: hooklet hooks/update.secondary.d/01-gitbzverify.py failed
remote: hooks/update.secondary died
remote: error: hook declined to update refs/heads/ceph-3.0-rhel-7
To ssh://pkgs.devel.redhat.com/containers/rhceph-rhel7
 ! [remote rejected] ceph-3.0-rhel-7 -> ceph-3.0-rhel-7 (hook declined)
error: failed to push some refs to 'ssh://shan.redhat.com/containers/rhceph-rhel7'

Ken, can you help me with this? Thanks

Comment 10 Harish NV Rao 2018-07-31 10:45:03 UTC
(In reply to leseb from comment #8)
> This is critical, so I had to change the PM needinfo to an approval in order
> to push my patch.
> Federico, I hope you don't mind; I didn't mean to overstep here.
> 
> If you disagree I can always revert the commit.
> Thanks.

@Sebastien, we need to change the milestone and target release to match 3.0z5, right?

Comment 11 Sébastien Han 2018-07-31 12:07:05 UTC
Done, Harish. Thanks.

Comment 15 Sébastien Han 2018-07-31 12:14:37 UTC
remote: *** Checking commit b11e126e8131802a73fe9e79e2bc5bc23a36e0b0
remote: *** Resolves:
remote: ***   Approved:
remote: ***     rhbz#1609007 (needinfo+, qa_ack+, devel_ack+, ceph-3.0?, pm_ack+, ceph-3.y?)
remote: *** Commit b11e126e8131802a73fe9e79e2bc5bc23a36e0b0 allowed
remote: * Publishing information for 1 commits
To ssh://pkgs.devel.redhat.com/containers/rhceph-rhel7
   b755c66..b11e126  ceph-3.0-rhel-7 -> ceph-3.0-rhel-7

Comment 18 Sébastien Han 2018-07-31 12:18:39 UTC
Also pushed the patch to 3.1:

remote: *** No rules for ceph-3.1-rhel-7.  Happy hacking!
remote: * Publishing information for 1 commits
To ssh://pkgs.devel.redhat.com/containers/rhceph-rhel7
   93eb0d2..46ddc34  ceph-3.1-rhel-7 -> ceph-3.1-rhel-7


commit 46ddc3429a6ee280ac0c6d4f39d31517061b7277 (HEAD -> ceph-3.1-rhel-7, origin/ceph-3.1-rhel-7)

Comment 23 Ken Dreyer (Red Hat) 2018-07-31 17:42:29 UTC
In order to resolve this in a build for RHCS 3.0 z-stream, we need a new upstream tag on the stable-3.0 branch in GitHub. Currently stable-3.0 is 21 commits ahead of v3.0.39.
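A sketch of that tagging step (the tag name below is hypothetical; it only illustrates the next tag after v3.0.39):

  $ git checkout stable-3.0
  $ git tag -a v3.0.40 -m "v3.0.40"   # hypothetical tag name
  $ git push origin v3.0.40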

Comment 34 errata-xmlrpc 2018-09-26 19:16:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2820