Bug 1392631 - [Docs][Upgrade] Provide a script as a sample for those who want to migrate their RHOSP 9 deployments into enforcing mode after the upgrade to RHOSP 10
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-07 23:03 UTC by Omri Hochman
Modified: 2021-03-30 13:24 UTC
CC: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-30 13:24:52 UTC
Target Upstream Version:
wusui: needinfo+


Attachments
audit.log (262.46 KB, text/plain)
2016-11-07 23:05 UTC, Omri Hochman
no flags Details
relabel_osd_paths.sh (433 bytes, application/x-shellscript)
2017-04-26 10:19 UTC, Giulio Fidente
no flags Details
systemctl list-units output of ceph osds. (2.60 KB, text/plain)
2017-07-08 03:00 UTC, Warren
no flags Details


Links
Launchpad 1640241 (last updated 2016-11-08 17:41:06 UTC)

Description Omri Hochman 2016-11-07 23:03:15 UTC
osp-director-10: Multiple AVCs on the Ceph node when rebooting post upgrade 9 -> 10 (Ceph SELinux AVC denied errors)


Environment ( ceph-node):
-------------
openstack-selinux-0.7.12-1.el7ost.noarch
python-cephfs-10.2.2-41.el7cp.x86_64
ceph-osd-10.2.2-41.el7cp.x86_64
ceph-selinux-10.2.2-41.el7cp.x86_64
puppet-ceph-2.2.1-3.el7ost.noarch
ceph-common-10.2.2-41.el7cp.x86_64
ceph-mon-10.2.2-41.el7cp.x86_64
libcephfs1-10.2.2-41.el7cp.x86_64
ceph-base-10.2.2-41.el7cp.x86_64

undercloud:
-------------
instack-undercloud-5.0.0-2.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-1.2.el7ost.noarch


Steps: 
--------
(1) Deploy OSP9 with 3 controller 1 compute 1 ceph
(2) Attempt to upgrade osp9 to osp10 
(3) post upgrade -> reboot the undercloud + overcloud 
(4) check the ceph node for AVCs 
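
For step (4), a minimal way to look for the denials (a sketch using the standard RHEL 7 audit tooling, not something mandated by this bug):

    # Search today's audit records for AVC denials
    ausearch -m avc -ts today

    # Or grep the raw audit log directly
    grep 'type=AVC' /var/log/audit/audit.log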


Results: 
--------
Multiple AVCs on Ceph node


Ceph node /var/log/audit/audit.log: 
------------------------------------
type=AVC msg=audit(1478509118.347:229): avc:  denied  { read } for  pid=37679 comm="ceph-osd" name="magic" dev="sda2" ino=553648195 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=AVC msg=audit(1478509118.347:229): avc:  denied  { open } for  pid=37679 comm="ceph-osd" path="/srv/data/magic" dev="sda2" ino=553648195 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509118.347:229): arch=c000003e syscall=2 success=yes exit=4 a0=7fffb82c8520 a1=0 a2=7fffb82c853e a3=5 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509118.413:230): avc:  denied  { write } for  pid=37679 comm="ceph-osd" name="fsid" dev="sda2" ino=553648194 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509118.413:230): arch=c000003e syscall=2 success=yes exit=8 a0=7fffb82c9570 a1=2 a2=1a4 a3=18 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509118.413:231): avc:  denied  { lock } for  pid=37679 comm="ceph-osd" path="/srv/data/fsid" dev="sda2" ino=553648194 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509118.413:231): arch=c000003e syscall=72 success=yes exit=0 a0=8 a1=6 a2=7fffb82c92e0 a3=18 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509118.413:232): avc:  denied  { read write } for  pid=37679 comm="ceph-osd" name="data" dev="sda2" ino=553648192 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=dir
type=SYSCALL msg=audit(1478509118.413:232): arch=c000003e syscall=21 success=yes exit=0 a0=7f478dc5cc58 a1=6 a2=7f477f6e3380 a3=7f477f6fb960 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509118.413:233): avc:  denied  { add_name } for  pid=37679 comm="ceph-osd" name="fiemap_test" scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=dir
type=AVC msg=audit(1478509118.413:233): avc:  denied  { create } for  pid=37679 comm="ceph-osd" name="fiemap_test" scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509118.413:233): arch=c000003e syscall=2 success=yes exit=10 a0=7fffb82c5f90 a1=242 a2=1a4 a3=18 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509118.414:234): avc:  denied  { remove_name } for  pid=37679 comm="ceph-osd" name="fiemap_test" dev="sda2" ino=553648210 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=dir
type=AVC msg=audit(1478509118.414:234): avc:  denied  { unlink } for  pid=37679 comm="ceph-osd" name="fiemap_test" dev="sda2" ino=553648210 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509118.414:234): arch=c000003e syscall=87 success=yes exit=0 a0=7fffb82c5f90 a1=0 a2=0 a3=fffffff8 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509120.434:235): avc:  denied  { setattr } for  pid=37679 comm="ceph-osd" name="xattr_test" dev="sda2" ino=553648210 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509120.434:235): arch=c000003e syscall=190 success=yes exit=0 a0=a a1=7fffb82c7010 a2=7fffb82c7188 a3=4 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509120.434:236): avc:  denied  { getattr } for  pid=37679 comm="ceph-osd" name="xattr_test" dev="sda2" ino=553648210 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file
type=SYSCALL msg=audit(1478509120.434:236): arch=c000003e syscall=193 success=yes exit=4 a0=a a1=7fffb82c7010 a2=7fffb82c718c a3=4 items=0 ppid=1 pid=37679 auid=4294967295 uid=167 gid=167 euid=167 suid=167 fsuid=167 egid=167 sgid=167 fsgid=167 tty=(none) ses=4294967295 comm="ceph-osd" exe="/usr/bin/ceph-osd" subj=system_u:system_r:ceph_t:s0 key=(null)
type=AVC msg=audit(1478509120.533:237): avc:  denied  { rename } for  pid=37679 comm="ceph-osd" name="000009.dbtmp" dev="sda2" ino=562036849 scontext=system_u:system_r:ceph_t:s0 tcontext=system_u:object_r:var_t:s0 tclass=file

Comment 1 Omri Hochman 2016-11-07 23:05:16 UTC
Created attachment 1218298 [details]
audit.log

Comment 2 Giulio Fidente 2016-11-08 15:30:20 UTC
I suspect we need to run a restorecon after the upgrade. We could gather that from audit.log

Comment 3 Lon Hohberger 2016-11-08 15:36:03 UTC
Is /srv/data hard-coded (or part of the rpm build definitions) in ceph or an installation-specific option?

I don't have any listings for it when I check 'semanage fcontext -l', but that's not on an OSP deployment.

Comment 4 seb 2016-11-08 15:51:50 UTC
@Giulio so we just need to update the upgrade script right?

Comment 5 Giulio Fidente 2016-11-08 16:10:04 UTC
(In reply to seb from comment #4)
> @Giulio so we just need to update the upgrade script right?

ack, that's the default location TripleO uses for non-bdev OSDs; we'll need to set the appropriate context on it during the upgrade, as we move into "enforcing" mode with the upgrade

Comment 12 Omri Hochman 2016-11-10 16:54:47 UTC
It seems that the Ceph node, post upgrade 9 -> 10, is in 'Permissive' mode, therefore those AVCs won't have any bad effect.

Post Upgrade 9 -> 10  (Ceph-node): 
[root@overcloud-cephstorage-0 ~]# getenforce 
Permissive

According to our discussion, we're not changing SELinux to 'Enforcing' automatically during the upgrade; we can document 'how to set the Ceph node to Enforcing mode' with manual steps run post upgrade, fixing the labels in a way that avoids those AVC issues.

Comment 13 Marios Andreou 2016-11-11 13:30:17 UTC
So WRT the discussion on the review and the concerns about how long it could potentially take to run the chcon, I note that Keith estimates 'more than 20 minutes' for a really bad case. I think that is within the realm of the acceptable, but ultimately, if we can avoid it altogether, I'd prefer that :). Since this doesn't affect the upgrade, it could be a post-upgrade docs note.

Omri, I think we only have a say/interest here if this affects upgrades. As discussed at last night's scrum and noted by you in comment #12, even though we may see AVCs they should not cause problems, since the Ceph nodes will by default be Permissive. So, as of the current discussion, my understanding is that this is no longer a lifecycle issue or a QE test blocker. It is already assigned to Ceph and they will process it accordingly.

@gfidente please confirm this and clear the needinfo hanging over your good name

Comment 14 Giulio Fidente 2016-11-11 13:43:30 UTC
Omri, can you confirm that you don't see denials on the controller nodes?

Comment 15 Jeff Brown 2016-11-11 14:19:08 UTC
The Ceph team has dealt with this by creating a post-upgrade step that it has documented. We should be consistent with their approach.

They document it and let the admin deal with it.

It is mentioned in the downstream upgrade doc for 1.3.3 to 2.0:
https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/installation-guide-for-red-hat-enterprise-linux/chapter-5-upgrading-ceph-storage-cluster

This is what is mentioned there:

If SELinux is set to enforcing mode, then set a relabelling of the
SELinux context on files for the next reboot:

# touch /.autorelabel

We will look at possibly changing the touch command to the following, applied against the Ceph directories. We will verify with the Ceph team.

# chcon/restorecon the Ceph directories

WARNING

Relabeling will take a long time to complete, because SELinux must
traverse every file system and fix any mislabeled files.
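
A minimal sketch of what that could look like, assuming the default /var/lib/ceph directory and the /srv/data path seen in the AVCs above; note that, as comment 20 later points out, restorecon only helps if an fcontext rule already covers the path:

    # Restore SELinux contexts on the Ceph directories instead of a full filesystem relabel
    restorecon -Rv /var/lib/ceph /srv/data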

Comment 17 John Fulton 2016-11-11 22:11:15 UTC
Giulio,

Is the plan to have documentation to tell the user to create a NodeExtraConfigPost shell script [1]? 

Would the script run the commands from your proposed changes to major_upgrade_ceph_storage.sh [2]?

Please confirm and I can write up some doctext. 

Thanks,
  John

[1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10-beta/single/advanced-overcloud-customization/#sect-Customizing_Overcloud_PostConfiguration

[2] https://review.openstack.org/#/c/395097/3/extraconfig/tasks/major_upgrade_ceph_storage.sh

Comment 19 Derek 2016-11-14 21:55:07 UTC
Sean, it will likely be post-GA, given our current commitments for GA.

Comment 20 Giulio Fidente 2016-11-21 12:46:40 UTC
(In reply to John Fulton from comment #17)
> Giulio,
> 
> Is the plan to have documentation to tell the user to create a
> NodeExtraConfigPost shell script [1]? 
> 
> Would the script would run the commands from your proposed changes to
> major_upgrade_ceph_storage.sh [2] ? 
> 
> Please confirm and I can write up some doctext. 
> 
> Thanks,
>   John
> 
> [1]
> https://access.redhat.com/documentation/en/red-hat-openstack-platform/10-
> beta/single/advanced-overcloud-customization/#sect-
> Customizing_Overcloud_PostConfiguration
> 
> [2]
> https://review.openstack.org/#/c/395097/3/extraconfig/tasks/
> major_upgrade_ceph_storage.sh

hi John,

running an SELinux relabel might not be sufficient in our scenario because we don't have any fcontext rule describing which context should be set on the OSD data directories.

I think to add the fcontext rule we could test something like the following:

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      OSD_RPATH=$(realpath /var/lib/ceph/osd/ceph-${OSD_ID}/)
      if [ ${OSD_RPATH}/ != ${OSD_PATH} ]; then
        semanage fcontext -a -t ceph_var_lib_t "${OSD_RPATH}(/.*)?"
      fi
    done

At which point we should be able to continue as per existing docs with

    touch /.autorelabel
    reboot
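
A quick way to check that the rule was recorded before relabeling (a sketch; this just lists the locally added fcontext rules):

    # List local fcontext customizations and confirm the OSD paths are covered
    semanage fcontext -l -C | grep ceph_var_lib_t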

Comment 21 Giulio Fidente 2016-11-29 20:49:15 UTC
The AVC denials do not prevent the Ceph OSDs from working because we don't switch selinux to 'enforcing' mode during the upgrade.

The additional steps (which we need to test) to configure a cephstorage node to run in 'enforcing' mode after the upgrade are as follows:

1) On a Monitor node, set noout and norebalance flags for the OSDs:

    ceph osd set noout
    ceph osd set norebalance

2) On each cephstorage node, one by one, execute the following:

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      systemctl stop ceph-osd@${OSD_ID}
      OSD_RPATH=$(realpath /var/lib/ceph/osd/ceph-${OSD_ID}/)
      if [ ${OSD_RPATH}/ != ${OSD_PATH} ]; then
        semanage fcontext -a -t ceph_var_lib_t "${OSD_RPATH}(/.*)?"
      fi
    done
    touch /.autorelabel
    reboot

3) On a Monitor node, and after all cephstorage nodes have been upgraded, unset the noout and norebalance flags:

    ceph osd unset noout
    ceph osd unset norebalance

NOTE: while the noout and norebalance flags are set, the Ceph cluster will have a HEALTH_WARN status
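
To confirm the flags and the expected HEALTH_WARN on a Monitor node, something like the following should do (standard ceph CLI, a sketch rather than part of the procedure):

    # Cluster status should report HEALTH_WARN while the flags are set
    ceph -s
    # The noout,norebalance flags should appear in the osd map
    ceph osd dump | grep flags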

Comment 24 Lucy Bopf 2017-04-24 02:29:19 UTC
Giulio, Yogev, is this bug ready for the documentation team to work on? If not, can you please change the bug component to whichever team is currently responsible?

This will be immensely helpful for our bug tracking. Thanks!

Comment 27 Warren 2017-04-25 01:52:11 UTC
I have not yet run the full openstack 9 to openstack 10 upgrade because I have run into a subscription issue at the moment.

However, on a running system I ssh'ed to a ceph osd and ran the script.  OSD_PATH was not set, so the statement:

if [ ${OSD_RPATH}/ != ${OSD_PATH} ]; then

failed with:

-bash: [: /var/lib/ceph/osd/ceph-1/: unary operator expected

What should OSD_PATH be set to?

Comment 29 Warren 2017-04-25 22:07:38 UTC
Step 2 in comment 21 suggests doing the following:

   OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      systemctl stop ceph-osd@${OSD_ID}
      OSD_RPATH=$(realpath /var/lib/ceph/osd/ceph-${OSD_ID}/)
      if [ ${OSD_RPATH}/ != ${OSD_PATH} ]; then
        semanage fcontext -a -t ceph_var_lib_t "${OSD_RPATH}(/.*)?"
      fi
    done
    touch /.autorelabel
    reboot

I ssh'ed to an OSD node and tried running this script.

It failed with:
   -bash: [: /var/lib/ceph/osd/ceph-1/: unary operator expected

${OSD_PATH} is not set in my shell session.

Also, I think that the /var/lib/ceph/osd/ceph-* name may be incorrect for clusters not named ceph.

I would like to know how to proceed.  Is this a problem with the script or a problem with the system that I tested this on?

Thanks.
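
For what it's worth, the "unary operator expected" error is exactly what bash prints when ${OSD_PATH} expands to nothing inside an unquoted [ ] test. A possible correction, assuming the intent was to add a rule only when the OSD data directory resolves outside the default /var/lib/ceph path (my reading of the script, not a confirmed fix):

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      # Define the default path explicitly and quote both sides of the test
      OSD_PATH="/var/lib/ceph/osd/ceph-${OSD_ID}/"
      OSD_RPATH=$(realpath "/var/lib/ceph/osd/ceph-${OSD_ID}/")
      if [ "${OSD_RPATH}/" != "${OSD_PATH}" ]; then
        semanage fcontext -a -t ceph_var_lib_t "${OSD_RPATH}(/.*)?"
      fi
    done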

Comment 30 Giulio Fidente 2017-04-26 10:19:05 UTC
Created attachment 1274168 [details]
relabel_osd_paths.sh

Thanks for helping, I apologize. I am updating the script (attaching it to the bug instead of pasting it inline) and have revisited the steps to avoid the reboot.

1) On a Monitor node, set the noout and norebalance osd flags so that data is not rebalanced during the OSDs downtime:

    ceph osd set noout
    ceph osd set norebalance

2) On each CephStorage node, one by one and not on all nodes together, run the attached relabel_osd_paths.sh script.

The script will stop the OSDs on the node, add an SELinux fcontext rule to label the OSD data path, relabel the data path (which might take a long time, depending on how much data is stored by the OSD) and start the OSD again.

3) On a Monitor node, and after all CephStorage nodes have been upgraded, unset the noout and norebalance flags:

    ceph osd unset noout
    ceph osd unset norebalance

NOTE: while the noout and norebalance flags are set, the Ceph cluster will have a HEALTH_WARN status.

Can you guys please help me test both the script and the process again?
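
For reference while testing, a sketch of what such a relabel script might contain, reconstructed from the description above and the earlier snippets; it is not the attached relabel_osd_paths.sh itself, and the restorecon step stands in for the relabeling mentioned in the description:

    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      # Stop the OSD before touching its data path
      systemctl stop ceph-osd@${OSD_ID}
      OSD_RPATH=$(realpath "/var/lib/ceph/osd/ceph-${OSD_ID}/")
      # Label the OSD data path and relabel it (can take a long time on large OSDs)
      semanage fcontext -a -t ceph_var_lib_t "${OSD_RPATH}(/.*)?"
      restorecon -R "${OSD_RPATH}"
      systemctl start ceph-osd@${OSD_ID}
    done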

Comment 31 Warren 2017-04-26 23:29:56 UTC
I am currently upgrading OSP9 to OSP10 in order to test this.

I would like to know if my interpretation of this is correct.

After upgrading, we will not be in enforcing mode (from comment 25).
We should then run the steps in Comment 30.
After that, we should be in enforcing mode.

Comment 32 Giulio Fidente 2017-04-27 11:09:36 UTC
(In reply to Warren from comment #31)
> I am currently upgrading OSP9 to OSP10 in order to test this.
> 
> I would like to know if my interpretation of this is correct.
> 
> After upgrading, we will not be in enforcing mode (from comment 25).
> We should then run the steps in Comment 30.
> After that, we should be in enforcing mode.

hi Warren, yes, after the automated upgrade the CephStorage nodes will still be running SELinux in 'permissive' mode and the AVC denials will be seen in the audit.log

The process in comment #30 (and the script) will relabel the Ceph data but will not switch SELinux to 'enforcing' mode. When the process in comment #30 is finished, there should be no new denials in the audit.log and, if that is the case, the operator can manually switch SELinux to enforcing mode.

We could make the script switch SELinux to 'enforcing' mode, but that seems risky because we should first make sure there aren't new denials in the audit.log
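
That manual check and switch could look something like this (a sketch with standard RHEL 7 commands; /etc/sysconfig/selinux is a symlink to /etc/selinux/config):

    # Look for fresh denials after the relabel
    ausearch -m avc -ts today

    # If there are none, switch the running system to enforcing...
    setenforce 1
    # ...and make the change persistent across reboots
    sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config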

Comment 33 Warren 2017-05-10 06:42:51 UTC
Upgrading from OSP 9 to 10 has been problematical on the virtual magna machines.  Two problems have come up today.

Bringing up the OSPD 9 overcloud, I would get messages like:

2017-05-10 05:20:53 [NovaCompute]: CREATE_FAILED ResourceInError: resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

Oddly, if I run heat stack-delete overcloud, and rerun the overcloud install, the overcloud would come up.  My guess is that the first pass ends up setting stuff that makes things work on the second pass, but I am not sure.

The second problem is that once the undercloud is upgraded to OSP-10, I cannot ssh to some of my virtual machines (No route to host).  So I am unable to check the /var/log/yum.log files on those nodes to see if there is any update information.  arp -an shows:

? (192.0.2.11) at <incomplete> on br-ctlplane
instead of
? (192.0.2.11) at XX:XX:XX:XX:XX:XX on br-ctlplane (XX... was a MAC address)

I am reinstalling OSP 9 on one machine (magna050) right now (I will probably let it run for a while -- the overcloud part of the install takes over an hour and a half).  The other machine (magna032) is in the state where I cannot ssh to the virtual machines.   I can see that those virtual machines are still there:

virsh list --all
Id    Name                           State
----------------------------------------------------
 1     aardvark_magna032              running
 12    aardvark-ceph-2                running
 13    aardvark-ceph-3                running
 14    aardvark-ceph-1                running
 15    aardvark-compute-1             running
 16    aardvark-control-1             running

Comment 34 Warren 2017-06-21 04:23:24 UTC
I have gotten back to this after being waylaid by other issues.

I reinstalled OSP-9 and started upgrading to OSP-10, when I ran into some connection issues. I ended up in a situation where one ceph node and the compute node were still building while all the other nodes were up. The VM that I was running on then froze for some reason. When I brought things back up and reconnected, I figured that I could do a heat stack-delete and start the overcloud install again. The delete failed and I still had the overcloud sitting around.

I am running the overcloud install right now (reimaged everything) and it should be up by tomorrow morning.

I have also looked at the scripts and I believe that there is a problem, but before I put my foot in my mouth I want to double check things once things are up again.

Comment 35 Warren 2017-06-28 18:00:06 UTC
I will attempt to test the script today to make sure it behaves properly.  I am currently having connection issues and will probably have to reinstall on my test machine.

After that, I will attempt the OSP9 to OSP10 upgrade and check the behavior of  selinux and the script.

Comment 36 Warren 2017-07-08 03:00:55 UTC
Created attachment 1295425 [details]
systemctl list-units output of ceph osds.

Comment 37 Warren 2017-07-08 03:05:05 UTC
It appears that systemctl stop ceph-osd@${OSD_ID} may not work in the script in all cases.

On my test system, I see that the /var/lib/ceph/osd directory has ceph-2, ceph-5 and ceph-7 directories, but the running OSD services are named ceph-osd.2.1499453467.081271239.service, ceph-osd.5.1499453485.880793667.service and ceph-osd.7.1499453506.087426145.service.

[heat-admin@overcloud-cephstorage-0 ~]$ ls /var/lib/ceph/osd
ceph-2  ceph-5  ceph-7
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemsctl list-units | grep ceph
sudo: systemsctl: command not found
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl list-units | grep ceph
  sys-devices-pci0000:00-0000:00:07.0-ata4-host3-target3:0:0-3:0:0:0-block-sdb-sdb1.device loaded active plugged   QEMU_HARDDISK ceph\x20data
  sys-devices-pci0000:00-0000:00:07.0-ata4-host3-target3:0:0-3:0:0:0-block-sdb-sdb2.device loaded active plugged   QEMU_HARDDISK ceph\x20journal
  sys-devices-pci0000:00-0000:00:07.0-ata5-host4-target4:0:0-4:0:0:0-block-sdc-sdc1.device loaded active plugged   QEMU_HARDDISK ceph\x20data
  sys-devices-pci0000:00-0000:00:07.0-ata5-host4-target4:0:0-4:0:0:0-block-sdc-sdc2.device loaded active plugged   QEMU_HARDDISK ceph\x20journal
  sys-devices-pci0000:00-0000:00:07.0-ata6-host5-target5:0:0-5:0:0:0-block-sdd-sdd1.device loaded active plugged   QEMU_HARDDISK ceph\x20data
  sys-devices-pci0000:00-0000:00:07.0-ata6-host5-target5:0:0-5:0:0:0-block-sdd-sdd2.device loaded active plugged   QEMU_HARDDISK ceph\x20journal
  var-lib-ceph-osd-ceph\x2d2.mount                                                         loaded active mounted   /var/lib/ceph/osd/ceph-2
  var-lib-ceph-osd-ceph\x2d5.mount                                                         loaded active mounted   /var/lib/ceph/osd/ceph-5
  var-lib-ceph-osd-ceph\x2d7.mount                                                         loaded active mounted   /var/lib/ceph/osd/ceph-7
  ceph-osd.2.1499453467.081271239.service                                                  loaded active running   /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph -f
  ceph-osd.5.1499453485.880793667.service                                                  loaded active running   /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph -f
  ceph-osd.7.1499453506.087426145.service                                                  loaded active running   /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=128MB /usr/bin/ceph-osd -i 7 --pid-file /var/run/ceph/osd.7.pid -c /etc/ceph/ceph.conf --cluster ceph -f
  ceph.service                                                                             loaded active exited    LSB: Start Ceph distributed file system daemons at boot time
[heat-admin@overcloud-cephstorage-0 ~]$ 

On another system that I just looked at, I saw the following:
[ubuntu@magna032 ~]$ ls /var/lib/ceph/osd
ceph-4  ceph-6  ceph-8
[ubuntu@magna032 ~]$ sudo systemctl list-units | grep ceph
  sys-devices-pci0000:00-0000:00:1f.2-ata2-host1-target1:0:0-1:0:0:0-block-sdb-sdb1.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20data
  sys-devices-pci0000:00-0000:00:1f.2-ata2-host1-target1:0:0-1:0:0:0-block-sdb-sdb2.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20journal
  sys-devices-pci0000:00-0000:00:1f.2-ata3-host2-target2:0:0-2:0:0:0-block-sdc-sdc1.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20data
  sys-devices-pci0000:00-0000:00:1f.2-ata3-host2-target2:0:0-2:0:0:0-block-sdc-sdc2.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20journal
  sys-devices-pci0000:00-0000:00:1f.2-ata4-host3-target3:0:0-3:0:0:0-block-sdd-sdd1.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20data
  sys-devices-pci0000:00-0000:00:1f.2-ata4-host3-target3:0:0-3:0:0:0-block-sdd-sdd2.device loaded active plugged   Hitachi_HUA722010CLA330 ceph\x20journal
  ceph-mon                                                                loaded active running   Ceph Monitor
  ceph-osd                                                                     loaded active running   Ceph OSD
  ceph-osd                                                                     loaded active running   Ceph OSD
  ceph-osd                                                                     loaded active running   Ceph OSD
  system-ceph\x2dmon.slice                                                                 loaded active active    system-ceph\x2dmon.slice
  system-ceph\x2dosd.slice                                                                 loaded active active    system-ceph\x2dosd.slice
[ubuntu@magna032 ~]$
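
A more defensive stop loop that does not assume the ceph-osd@<id> unit naming might look like this (a sketch; it stops whatever running ceph-osd units systemd reports, whichever naming scheme is in use):

    # Stop every running ceph-osd service unit, regardless of its exact name
    for UNIT in $(systemctl list-units --state=running --no-legend 'ceph-osd*' | awk '{ print $1 }'); do
      systemctl stop "${UNIT}"
    done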

Comment 38 Warren 2017-08-11 21:16:14 UTC
Finally got an OSPD 9 deployment upgraded to OSPD 10. The script works fine here. I set enforcing mode, ran the script on each OSD, rebooted, and saw no new audit.log messages of type=USER_AVC.

Comment 39 Lucy Bopf 2017-09-04 05:08:02 UTC
Giulio, Warren,

Now that the script has been verified, are there documentation changes we need to make? If so, I will move the status back to 'NEW' so that we can add this to our backlog.

Comment 40 Giulio Fidente 2017-11-24 09:27:19 UTC
I think we might provide the script as a sample for those who want to migrate their OSP9 deployments into enforcing mode after the upgrade to OSP10. Warren, what do you think?

Comment 41 Lucy Bopf 2017-12-21 01:33:11 UTC
Restoring needinfo on Warren re comment 40.

Comment 42 Warren 2018-01-15 17:07:14 UTC
Giulio's attachment in Comment 30 (attachment 1274168 [details]) is what we would want to document.  The new paragraph in the upgrade document should read something like:

When migrating OSP9 deployments to OSP10 and moving them into enforcing mode, run the following script after the upgrade to OSP10.

              <insert Giulio's script (attachment 1274168 [details]) here>

Comment 44 Lucy Bopf 2018-04-24 02:31:18 UTC
Moving back to 'NEW' to be rescoped as resources allow.

Also, changing DFG designation to 'Upgrades', given this is exclusively about an upgrade use case.

Comment 45 Sofer Athlan-Guyot 2018-08-07 12:14:06 UTC
Hi,

reviving this old bz, as it has received some attention recently.

So the current state is that:

 1. osp9 was in permissive mode:  

Then the relabeling happens during the installation of ceph-selinux. You will need ceph 2.5z2 or more recent in the osp10 content view for the relabeling to be fast. If you have an older version, please get a newer one unless you want to wait hours for the relabeling.

To activate selinux on the node you then just proceed as usual: after the upgrade, change the SELINUX value in /etc/sysconfig/selinux to enforcing and reboot the node.
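
For example (a sketch; /etc/sysconfig/selinux is a symlink to /etc/selinux/config on RHEL 7):

    # Make enforcing mode persistent, then reboot the node
    sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
    reboot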

 2. osp9 was in selinux disabled mode.

Then relabeling wasn't done during the upgrade. To enable selinux, you have to follow this procedure:

  a. reboot in permissive mode by changing the configuration in /etc/sysconfig/selinux
  b. stop all the osd on the node:

for instance using:

   OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
   for OSD_ID in $OSD_IDS; do
       systemctl stop ceph-osd@${OSD_ID}
   done

  c. fix the labels:  ceph-disk fix --all
  d. you can now reboot in enforcing mode after having changed /etc/sysconfig/selinux appropriately
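
After the reboot in step d, a quick sanity check might look like this (a sketch; standard commands, not part of the procedure above):

    getenforce                     # should print Enforcing
    ceph osd tree                  # confirm the node's OSDs came back up
    ausearch -m avc -ts today      # check for new denials since the switch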

Enjoy selinux and stop ... https://stopdisablingselinux.com/

Comment 46 Sofer Athlan-Guyot 2018-08-07 12:16:54 UTC
Hi Dan,

do you think this could belong in the osp10 upgrade documentation?

Comment 47 Dan Macpherson 2018-08-07 14:02:15 UTC
I think it's possible. Would the steps outlined in comment #45 be a pre or post upgrade procedure?

Comment 48 Sofer Athlan-Guyot 2018-08-16 08:21:00 UTC
Hi Dan,

(In reply to Dan Macpherson from comment #47)
> I think it's possible. Would the steps outlined in comment #45 be a pre or
> post upgrade procedure?

A bit of both :)

it would have to be a pre-upgrade *check* (see the sketch after this list):
 - check whether your osp9 is in permissive or disabled selinux mode;
   - if permissive, check that your osp10 satellite view has at least ceph 2.5z2
     - if not, get it, as your osd node upgrade will take hours without it.
     - do the ceph non-controller upgrade as usual
   - if disabled, do the ceph non-controller upgrade as usual;
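
A sketch of that pre-upgrade check on a Ceph node; the ceph-selinux package name comes from the environment listing in the description, and mapping the installed version to the ceph 2.5z2 content is left to the operator:

    # Current SELinux mode: Enforcing, Permissive or Disabled
    getenforce

    # Installed ceph packages, to compare against the ceph 2.5z2 (or newer) content
    rpm -q ceph-selinux ceph-osd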

Now, *after* the ceph osd upgrade, you can make the switch to enforcing mode. This can be done one node at a time just after each node upgrade, or after all the nodes have been upgraded, but still one node at a time (as it requires a reboot).

1. from permissive:

To activate selinux on the node you then just proceed as usual: after the upgrade, change the SELINUX value in /etc/sysconfig/selinux to enforcing and reboot the node.


2. from disabled, it's step 2 in comment #45

On a more general note, we need a "things to manually check before upgrading" section in osp10/11/12/13/14/ffwd. For 15 we should have a tool for it.

I'm about to create some more content for it in another bz, starting with osp10. The selinux mode/ceph-selinux version would be one of those items, so it would be better to make a generic "Manual checks before upgrading" section and include the ceph osd checks in it than to write a dedicated "check ceph osd" section.

Comment 49 Dan Macpherson 2018-08-20 04:06:38 UTC
ACK. Thanks, Sofer.

Comment 53 Dan Macpherson 2021-03-30 13:24:52 UTC
OSP10 was EOL on Dec 16, 2019 and this most likely doesn't affect later versions. If the issue still occurs, feel free to reopen this BZ against OSP 16.1.

