Bug 1366808

Summary: Ansible: upgrade nodes with encrypted OSDs and support for dm-crypt
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Federico Lucifredi <flucifre>
Component: ceph-ansible    Assignee: seb
Status: CLOSED ERRATA QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: unspecified Docs Contact: Bara Ancincova <bancinco>
Priority: unspecified    
Version: 2CC: adeza, aschoen, bancinco, ceph-eng-bugs, flucifre, gmeno, hnallurv, kdreyer, ldachary, nlevine, nthomas, rghatvis, sankarshan, seb, shan, tchandra, vashastr, vumrao
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
.Upgrading encrypted OSDs is now supported
Previously, the `ceph-ansible` utility did not support adding encrypted OSD nodes. As a consequence, an attempt to upgrade to a newer minor or major version failed on encrypted OSD nodes. In addition, Ansible returned the following error message during the disk activation task:
----
mount: unknown filesystem type 'crypto_LUKS'
----
With this update, `ceph-ansible` supports adding encrypted OSD nodes, and upgrading works as expected.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-14 15:50:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1388885    
Bug Blocks: 1322504, 1346350, 1383917, 1412948    
Attachments:
  Description                                       Flags
  workaround with encrypted OSD's rolling update    none
  ansible playbook log with ceph-osd commented      none

Description Federico Lucifredi 2016-08-12 22:36:18 UTC
Description of problem:

The current version of ceph-ansible does not support OSDs encrypted with dm-crypt. 
When installing async updates this causes benign but troubling errors, and a workaround is needed.

This is a regression against ceph-deploy functionality.

As ceph-ansible is the only way to upgrade the product, we need to address this promptly. 

Resolution:

Sebastien has implemented this upstream. The latest upstream version of ceph-ansible is able to perform upgrades and transparently supports dm-crypt encrypted OSDs.

Comment 2 Federico Lucifredi 2016-08-12 22:37:45 UTC
This should be targeted at the first async, but I only have releases 2 and 3 as options.

Comment 5 seb 2016-08-16 15:55:40 UTC
@Rakesh, I just did, let me know if it's clear.
Ken, I'll ping you tomorrow; there is something I'm not sure about with the cherry-pick stuff :)

Comment 7 seb 2016-08-17 14:55:49 UTC
Looks good to me, I just added "during the disk activation task", to point exactly where the playbook will fail.

Comment 10 seb 2016-10-07 08:38:16 UTC
Yup fixed with v1.0.8

Comment 11 Federico Lucifredi 2016-10-07 17:02:04 UTC
This will ship concurrently with RHCS 2.1.

Comment 14 Tejas 2016-11-08 16:40:19 UTC
Hi Bara,

  The doc text needs changing. The current version of ceph-ansible supports encrypted OSDs:
ceph-ansible-1.0.5-39.el7scon

Also, I ran rolling_update.yml; commenting out roles is not needed anymore.

@Seb, can you please confirm?

Thanks,
Tejas

Comment 17 seb 2016-11-09 10:40:23 UTC
LGTM Bara! Thanks!

Comment 20 Federico Lucifredi 2016-11-10 21:01:02 UTC
Docs, please make sure the workaround documented in 2.0 for use of encrypted OSDs is also documented in the 2.1 release notes.

Comment 22 Tejas 2016-11-14 15:27:44 UTC
Hi Seb,

   Doing a rolling_update with the workaround that Bara has mentioned in the doc text is causing a failure on the MON update (commenting out all roles except ceph-common).
I have attached the log here.

Thanks,
Tejas

Comment 23 Tejas 2016-11-14 15:28:36 UTC
Created attachment 1220469 [details]
workaround with encrypted OSD's rolling update

Comment 25 seb 2016-11-14 15:48:18 UTC
@Bara, please update the doc: you only need to comment out the ceph-osd role; the others are fine since "only" OSD nodes are impacted.

Tejas, I see 2 issues:

1. ceph.conf changed and tries to trigger a restart, which shouldn't be an issue if the ceph-mon role is called.
2. ceph-mon role was commented out

Can you try without commenting out ceph-mon?
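
For reference, a minimal sketch of the ceph-osd-only workaround on the admin node (the playbook path and the exact role entry format are assumptions and may differ between releases):

    # hypothetical illustration: disable only the ceph-osd role, leave the other roles intact
    sed -i 's|^\( *\)- ceph-osd|\1# - ceph-osd|' rolling_update.yml
    ansible-playbook -i <inventory> rolling_update.yml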

Comment 27 Tejas 2016-11-14 17:10:56 UTC
Seb,

  I tried without commenting out any role, but I hit this:
https://bugzilla.redhat.com/show_bug.cgi?id=1391675

So the MON updates go through fine.
I can't be certain about the OSD upgrades until that bug is resolved.

Thanks,
Tejas

Comment 28 Christina Meno 2016-11-14 19:44:20 UTC
I added a Depends On: https://bugzilla.redhat.com/show_bug.cgi?id=1394928
That is, the workaround Bara documented depends on https://bugzilla.redhat.com/show_bug.cgi?id=1394928.

Comment 31 Tejas 2016-11-15 11:40:43 UTC
Created attachment 1220807 [details]
ansible playbook log with ceph-osd commented

Comment 32 Harish NV Rao 2016-11-15 11:50:41 UTC
*** Bug 1395171 has been marked as a duplicate of this bug. ***

Comment 33 seb 2016-11-15 17:05:43 UTC
This doesn't seem to be an issue with Ansible; this looks more like a package upgrade problem. It seems that the dm device's permissions got changed from ceph:ceph to root:disk.
The udev rules look good; I'm not sure what changed the permissions.
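
A couple of quick checks that could narrow down which rule is winning (a sketch; the rules file path is the stock RHEL 7 location and is an assumption here):

    ls -l /dev/dm-*                                                  # current owner/group on the device-mapper nodes
    grep -n 'GROUP="disk"' /lib/udev/rules.d/50-udev-default.rules   # the default rule that would set root:disk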

Comment 34 Christina Meno 2016-11-15 17:36:07 UTC
It's a blocker -- we're asking for help and forming a plan, you'll know more as soon as we do.

Comment 35 Loic Dachary 2016-11-15 18:41:44 UTC
Could http://tracker.ceph.com/issues/17813 help?

Comment 36 John Poelstra 2016-11-15 18:55:05 UTC
Changing version to 2 so this bug shows up in our queries as it should.

Comment 37 Alfredo Deza 2016-11-15 20:30:13 UTC
After investigating this further, we determined that the one OSD having issues was failing because its journal had root permissions:

   # ls -alh /dev/dm-0
   brw-rw----. 1 root disk 253, 0 Nov 15 11:19 /dev/dm-0

We got to /dev/dm-0 because the journal points to:

    lrwxrwxrwx.   1 ceph ceph   48 Nov 15 06:04 journal -> /dev/mapper/cd92e354-e9ed-4207-bbfd-62e6778d3838

And the mapper points to /dev/dm-0:

    lrwxrwxrwx. 1 root root 7 Nov 15 11:19 /dev/mapper/cd92e354-e9ed-4207-bbfd-62e6778d3838 -> ../dm-0

In order to get this working we had to do the following (manually):

1) Set the right permissions on the actual device: 
    chown -R ceph:ceph /dev/dm-0

2) Reset the failure state for the ceph-osd service
    systemctl reset-failed ceph-osd

3) Start the osd daemon
    systemctl start ceph-osd

4) Verify that the OSD daemon came up and it shows in the osd tree:
    systemctl status ceph-osd
    ceph osd tree

Once all those steps were completed, we were able to run the rolling upgrade playbook again, but the cluster seems to be in a degraded state (unsure how this particular cluster got here) and the playbook is unable to complete.

This looks like a problem originating from the system udev rules that are racing to change permissions on these devices (see the linked tracker ticket). This leaves us with no guarantee that manually changing permissions in this way will persist.

At this point we would need to test this again with a new, healthy cluster, or wait for the current one to get out of the degraded state to verify that the playbook can complete correctly.

Comment 38 Loic Dachary 2016-11-15 22:00:54 UTC
For the record, an IRC log of inconclusive explorations:

<loicd> andrewschoen: I commented on the bz  but I still don't know how to workaround this race with dm.
<andrewschoen> loicd: we're trying a workaround of manually fixing the permissions and then rerunning the update right now
<andrewschoen> we did get the OSD back up with that
<loicd> ideally there would be a way to override the default user/group for a given device
<loicd> so that we don't have to fight udev over it 
<loicd> but maybe the root:root default user/group is hardcoded 
<loicd> or if that's not possible, it would be useful to ask udev to never chown devices
<loicd>  /lib/udev/rules.d/50-udev-default.rules
<loicd> it only changes the group but it does a chown(2) instead of a chgrp(2) and changes both
<loicd> systemd-229/src/udev/udev-rules.c is where it is interpreted
<loicd> although 7.2 is running 219 I guess it did not change much
<loicd> andrewschoen: if you can consistently reproduce the problem, it may be worth trying to comment out all lines in /lib/udev/rules.d/50-udev-default.rules that have GROUP="disk" and see if that fixes the problem. It does not solve the problem of packaging this workaround but ...
<andrewschoen> loicd: at this point we can't run the playbook again because the cluster is not healthy, we're gonna wait and see if it gets healthy and then try to run the upgrade again
<loicd> andrewschoen: ... it would confirm that it originates here
<loicd> andrewschoen: do you have a minimal reproducer ? 
<alfredodeza> not right now loicd
-*- loicd installing a cluster
<andrewschoen> the strange thing is that in the run that found this 3 OSDs were upgraded and started just fine before this one failed
<loicd> since it's impossible to guarantee the sequence in which udev events are fired, it is enough that something (anything) triggers the event that runs 50-udev-default.rules to revert the permissions
<loicd> it would help to have the output of udevadm monitor to see the sequence of udev events
<loicd> it should show that 50-udev-default.rules happens on the device before and *after* 95-ceph-osd.rules. If not ... it means the theory that 50-udev-default.rules is responsible for reverting permissions is false.
<loicd> andrewschoen: do you confirm the playbook does not, at any time, run partprobe or partx on the devices ? 
<loicd> I guess it does not otherwise this problem would have surfaced often.
<andrewschoen> loicd: I know for certain that this specific playbook run does not because the ceph-osd role is not in use for rolling upgrades of dmcrypt osds
<andrewschoen> loicd: I can't find use of partprobe or partx in the ceph-osd role either
<loicd> andrewschoen: from which ceph version to which ceph version ? 
<loicd> I thought maybe the somewhat recent re-addition of 60-ceph-by-parttypeuuid.rules would cause problem but it does not contain OWNER/GROUP/MODE
<andrewschoen> yeah, so 10.2.2 to 10.2.3
-*- loicd looking at http://tracker.ceph.com/versions/518
<loicd> https://github.com/ceph/ceph/pull/8754/files is about permissions but not about devices
<loicd> https://github.com/ceph/ceph/pull/10008/files is in the vicinity (parted) but I don't see any harm
<loicd> https://github.com/ceph/ceph/pull/10497/files is about partprobe but prevents a race which is good
<loicd> I don't see anything in v10.2.3 http://tracker.ceph.com/versions/518 that could, even remotely, introduce a regression on device permissions
<ceph-ircslackbot> <sage> loicd it could just be a timing thing?
<loicd> sage: I don't see how. The only thing I can think of is that too many udev events are fired. And if that's confirmed we'll have to figure out who's firing them.
<loicd> sage: a) udev fires an "add" event, gets to default.rules which chown root:disk, then gets to ceph.rules which chown ceph:ceph, b) ceph activate runs in the background, c) "something" fires udev modified, d) default.rules chown root:disk, e) ceph-disk activate fails because of the permission
<loicd> hum not even
<loicd> because another ceph-disk activate would then be run, wait for the first to finish failing and run after the chown is done
<loicd> because 50-udev-default.rules won't change permission on anything but udev add ( ACTION!="add", GOTO="default_permissions_end" )
-*- loicd out of ideas
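
As an addendum to the udevadm suggestion in the log above, this is the kind of capture that would show the rule ordering (a sketch; /dev/dm-0 is just the example device from this cluster):

    udevadm monitor --udev --property --subsystem-match=block                 # watch udev events while the playbook runs
    udevadm test $(udevadm info -q path -n /dev/dm-0) 2>&1 | grep -i rules    # list which rules files get applied to the device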

Comment 39 seb 2016-11-16 10:38:08 UTC
As a temporary workaround we could force the right permissions on the journal devices after the ceph-common role has been applied.

If it's urgent and a blocker we can either:

* document this as a well known but hard to reproduce issue
* introduce the change in the playbook so we keep the right permissions

What do you guys think?
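
For concreteness, the second option would boil down to something like this per affected node (a rough sketch only; picking out exactly which dm mappings back encrypted journals is left out, and dm-0 is just the example device from this cluster):

    # assumption: dm-0 is an encrypted journal mapping that lost its ownership
    chown ceph:ceph /dev/dm-0    # repeat for each affected mapping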

Comment 40 Tejas 2016-11-16 11:42:31 UTC
Seb,

 I had a question: can the permissions of the osd-lockbox cause this, since /dev/sdb3 is mounted and its permissions are root:disk?

[root@magna056 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       917G  2.7G  868G   1% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  8.6M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G     0  3.2G   0% /run/user/1000
/dev/sdb3       8.7M  179K  7.9M   3% /var/lib/ceph/osd-lockbox/6b074d80-96e4-4a1a-b195-0b7224b541b9
/dev/dm-1       922G   34M  922G   1% /var/lib/ceph/osd/ceph-2
[root@magna056 ~]# 
[root@magna056 ~]# 
[root@magna056 ~]# 
[root@magna056 ~]# ceph-disk list
/dev/dm-0 other, unknown
/dev/dm-1 other, xfs, mounted on /var/lib/ceph/osd/ceph-2
/dev/sda :
 /dev/sda1 other, ext4, mounted on /
/dev/sdb :
 /dev/sdb2 ceph journal (dmcrypt LUKS /dev/dm-0), for /dev/sdb1
 /dev/sdb3 ceph lockbox, active, for /dev/sdb1
 /dev/sdb1 ceph data (dmcrypt LUKS /dev/dm-1), cluster ceph, osd.2, journal /dev/sdb2
/dev/sdc other, unknown
/dev/sdd other, unknown
[root@magna056 ~]# 
[root@magna056 ~]# 
[root@magna056 ~]# ll /dev/dm*
brw-rw----. 1 ceph ceph 253, 0 Nov 16 10:33 /dev/dm-0
brw-rw----. 1 ceph ceph 253, 1 Nov 16 10:33 /dev/dm-1
[root@magna056 ~]# 
[root@magna056 ~]# 
[root@magna056 ~]# 
[root@magna056 ~]# ll /dev/sdb*
brw-rw----. 1 root disk 8, 16 Nov 16 10:33 /dev/sdb
brw-rw----. 1 ceph ceph 8, 17 Nov 16 10:33 /dev/sdb1
brw-rw----. 1 ceph ceph 8, 18 Nov 16 10:33 /dev/sdb2
brw-rw----. 1 root disk 8, 19 Nov 16 10:33 /dev/sdb3

Thanks,
Tejas

Comment 41 seb 2016-11-16 13:50:49 UTC
I don't know; we just know this has something to do with udev. So udev might have caused this for this particular OSD and all its dependencies.

Comment 42 Christina Meno 2016-11-16 14:35:36 UTC
Seb, we're recommending documenting as per https://bugzilla.redhat.com/show_bug.cgi?id=1366808#c37

Comment 43 seb 2016-11-16 14:47:09 UTC
Alright thanks Greg!

Comment 44 Tejas 2016-11-16 15:39:49 UTC
(In reply to Alfredo Deza from comment #37)
> After investigating this further we determined that the one OSD having
> issues was because the journal had root permissions:
> 
>    # ls -alh /dev/dm-0
>    brw-rw----. 1 root disk 253, 0 Nov 15 11:19 /dev/dm-0
> 
> We got to /dev/dm-0 because the journal points to:
> 
>     lrwxrwxrwx.   1 ceph ceph   48 Nov 15 06:04 journal ->
> /dev/mapper/cd92e354-e9ed-4207-bbfd-62e6778d3838
> 
> And the mapper points to /dev/dm-0:
> 
>     lrwxrwxrwx. 1 root root 7 Nov 15 11:19
> /dev/mapper/cd92e354-e9ed-4207-bbfd-62e6778d3838 -> ../dm-0
> 
> In order to get this working we had to do the following (manually):
> 
> 1) Set the right permissions on the actual device: 
>     chown -R ceph:ceph /dev/dm-0
> 
> 2) Reset the failure state for the ceph-osd service
>     systemctl reset-failed ceph-osd
> 
> 3) Start the osd daemon
>     systemctl start ceph-osd
> 
> 4) Verify that the OSD daemon came up and it shows in the osd tree:
>     systemctl status ceph-osd
>     ceph osd tree
> 
> Once all those steps were completed, we were able to run the rolling upgrade
> playbook again but the cluster seems to be in a degraded state (unsure how
> this particular cluster got here) and the playbook is unable to complete.
The cluster is able to reach an OK state after following these steps.
> 
> This looks like a problem originating by the system udev rules that are
> racing to change permissions on these devices (see linked tracker ticket).
> This leaves us with no guarantee that manually changing permissions in this
> way will persist.
> 
> At this point we would need to test this again with a new, healthy cluster,
> or wait for the current one to get out of the degraded state to verify that
> the playbook can complete correctly.

I ran again with a fresh cluster and saw the same issue on the first OSD node.
I ran the above steps, and the cluster was OK again.
After running rolling_update again, there were no issues on the first node, but I saw a failure on the second node, and so on.

Another observation:
Only the /dev/dm* devices pertaining to colocated journals had their permissions changed to root:root. The dm devices of dedicated journals were not changed.

Thanks,
Tejas

Comment 46 Christina Meno 2016-11-16 18:36:00 UTC
I recommend that we use yum update on failed nodes instead of another invocation of rolling_update, because of https://bugzilla.redhat.com/show_bug.cgi?id=1395820.

We've already got that process documented in the upgrade guide for 1.3 to 2.0.

Federico what do you think?

Comment 48 Christina Meno 2016-11-16 22:54:19 UTC
We will document this issue by telling customers to do a rolling update for clusters that do not have encrypted OSDs. If they have encrypted OSDs, then they should do a yum update (like we advised in the 1.3 to 2.0 upgrade steps).
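
For the encrypted-OSD case, a rough sketch of that yum-based path on each OSD node (the package and systemd target names are assumptions based on the 2.x packaging):

    yum update -y ceph-osd                 # pull the updated OSD packages, as in the 1.3 to 2.0 guide
    systemctl restart ceph-osd.target      # restart the OSD daemons on this node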

Comment 50 seb 2016-11-21 14:16:27 UTC
lgtm :)

Comment 52 Christina Meno 2017-02-15 19:05:23 UTC
Seb and Andrew,

What if we wrote some ansible to make sure the ceph user was in the disk group?
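
For context, the hypothetical shell equivalent of such a task would simply be:

    usermod -aG disk ceph      # add the ceph user to the supplementary disk group
    id ceph                    # verify that disk now shows up in the group list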

Comment 53 Andrew Schoen 2017-02-15 22:15:08 UTC
Gregory,

I also see root:root in the comments here, so I'm unsure if making sure that the ceph user is in the disk group would be enough.

Seb had an idea in https://bugzilla.redhat.com/show_bug.cgi?id=1366808#c39; maybe that would work? Seb, any thoughts on a workaround in ceph-ansible?

Comment 54 Andrew Schoen 2017-02-16 17:37:52 UTC
Tejas,

Would you please test this BZ again with the latest v2.1.9 version of ceph-ansible? I'm wondering if this is still even an issue now. I've been working on upstream tests for this and haven't been able to reproduce yet.

Thanks!

Andrew

Comment 56 Tejas 2017-02-17 03:33:59 UTC
As part of Ceph 2.2 testing, I have tested upgrades from 1.3.3 to 2.2 and from 2.1 to 2.2, all with colocated-journal and dedicated-journal encrypted OSDs.
This has been done on both RHEL and Ubuntu.
So far no issues have been seen.
So yes, this has been resolved in the rebased ceph-ansible.

Thanks,
Tejas

Comment 57 Tejas 2017-02-17 03:52:49 UTC
Bara,

  I don't think this needs doc text anymore.

Moving this bug to Verified state.

Thanks,
Tejas

Comment 59 seb 2017-02-20 20:56:56 UTC
Bara, the doc text field looks good to me.

Comment 60 Ken Dreyer (Red Hat) 2017-03-03 16:37:13 UTC
*** Bug 1373736 has been marked as a duplicate of this bug. ***

Comment 61 Ken Dreyer (Red Hat) 2017-03-03 16:41:18 UTC
*** Bug 1391468 has been marked as a duplicate of this bug. ***

Comment 63 errata-xmlrpc 2017-03-14 15:50:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:0515