Bug 1544836

Summary: container naming for OSDs makes them difficult to manage
Product: Red Hat Ceph Storage
Reporter: Ben England <bengland>
Component: Ceph-Ansible
Assignee: leseb <shan>
Status: CLOSED ERRATA
QA Contact: Shreekar <sshreeka>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: medium
Version: 3.0
CC: adeza, anharris, aschoen, ceph-eng-bugs, ddharwar, edonnell, gfidente, gmeno, hnallurv, jtaleric, nthomas, sankarshan, shan, tserlin, twilkins, vashastr
Target Milestone: rc
Target Release: 3.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.beta5.el7cp, Ubuntu: ceph-ansible_3.2.0~beta5-2redhat1
Doc Type: Bug Fix
Doc Text:
.Ansible now sets container and service names that correspond with OSD numbers
When containerized Ceph OSDs were deployed with the `ceph-ansible` utility, the resulting container names and service names of the OSDs did not correspond in any way to the OSD number and were thus difficult to find and use. With this update, `ceph-ansible` has been improved to set container and service names that correspond with OSD numbers. Note that this change does not affect existing deployed OSDs.
Last Closed: 2019-01-03 19:01:20 UTC
Type: Bug
Bug Blocks: 1629656    

Description Ben England 2018-02-13 15:16:32 UTC
Description of problem:

When containerized Ceph OSDs are deployed with ceph-ansible, the resulting container names and service names of the OSDs do not correspond in any way to the OSD number, so they are difficult to find and use (details below). This differs from non-containerized Ceph, where OSD services are identified by OSD number and an OSD process can easily be found by searching for "-i NNN", where NNN is the OSD number. Although logs are forwarded to the hypervisor in RHOSP, they are again hard to read there, because you need to run "journalctl -u ceph-osd@XXXX", where XXXX is the OSD's block device name, not its number.

The RHOSP Ceph DFG team's view was that this is a ceph-ansible issue and really has nothing to do with OpenStack, which I think is correct. They suggested that I file this bug so that we could start tracking the problem and discussing what to do about it. I am concerned that this will never be fixed if it ships in RHOSP 13, a long-term release, because of the upgrade requirements that fixing it would involve.

Version-Release number of selected component (if applicable):

ceph-ansible master branch (sorry, I'm not sure which branches and tags are used for RHCS or RHOSP).

How reproducible:

every time

Steps to Reproduce:
1. deploy RHOSP 12
2. try to find the ID of the container that houses OSD N
3. try to start/stop OSD N
4. try to look at the logs of the container for OSD N

Actual results:

You can't easily do any of these things.  These are common tasks for Ceph admins because block devices fail for a variety of reasons, including hardware failure, upgrades, etc.  

Expected results:

container name should embed the OSD number so that it can be directly referenced if it is running.

service name should embed the OSD number so that it can be directly started/stopped.

as a consequence, it would become easy to examine OSD logs with "journalctl -u ceph-osd@N"

Additional info:

Script to find the container corresponding to OSD N (it prints each ceph-osd container ID followed by the OSD log file found inside that container):

for c in $(docker ps | grep ceph-osd | awk '{ print $1 }'); do
   echo -n "$c "; docker exec "$c" bash -c \
   'ls /var/run/ceph/ceph-osd.*.log'
done

But if OSDs are flapping (going up and down), the container ID may no longer be valid by the time you access it. Container names are more stable, but a reboot could cause the container's block device name to no longer match the container name, since Linux does not guarantee block device name stability across reboots.

Comment 3 Ben England 2018-02-16 13:51:12 UTC
Here's a suggestion for how it could work; I have briefly discussed it with LVM developer Joe Thornber (ejt@redhat.com).

-- preparation:

Am I right that the problem is that we don't know the OSD number at the time we first need to name the container and the unit? If so, there is a way around that: ask Ceph to assign an OSD number up front with

osdnum=$(ceph osd create)

and then pass this into the script/container that prepares the OSD. 

Then, when the OSD is prepared with "ceph-volume lvm prepare --osd-id NNN", ceph-volume would tag the volume using LVM's tagging feature, which stores the tag in the VG metadata:

# lvchange --addtag "cephosd=NNN" /dev/vg_...

Unfortunately, ceph-volume does not accept an OSD number on the command line, so there is currently no way to pass one to "ceph-volume lvm prepare". We would need to fix ceph-volume to accept it, and fix the container that performs OSD preparation to pass it. By the way, ceph-disk used to have this feature; from "ceph-disk prepare --help":

--osd-id ID           unique OSD id to assign this disk to

If we are preparing a non-LVM OSD device, it will be filestore (right, Alfredo?), and "ceph-osd --mkfs" will write the "whoami" file containing the OSD number into the mountpoint directory, as before.
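The preparation steps above can be sketched as a dry run that only prints what would happen. This is a minimal illustration, not ceph-ansible code: it assumes the OSD number was already obtained (in the proposal, via "ceph osd create"), treats "--osd-id" as the option the proposal asks ceph-volume to accept, and uses ceph-osd@<N> / ceph-osd-<N> as illustrative names.

```shell
#!/bin/sh
# Dry-run sketch of the proposed preparation flow: it only prints commands and
# names, nothing is executed. Assumptions: the OSD number was already obtained
# (in the proposal, via "osdnum=$(ceph osd create)"), "--osd-id" is the option
# the proposal asks ceph-volume to accept, and the ceph-osd@<N> / ceph-osd-<N>
# names are illustrative.
plan_osd_prepare() {
    osdnum=$1   # OSD number assigned by Ceph before preparation
    lv=$2       # volume to prepare, given as vg/lv
    echo "ceph-volume lvm prepare --osd-id $osdnum --data $lv"
    # The tag that ceph-volume would record, shown as the equivalent lvchange.
    echo "lvchange --addtag cephosd=$osdnum $lv"
    # Because the number is known up front, unit and container names can embed it.
    echo "unit=ceph-osd@$osdnum container=ceph-osd-$osdnum"
}
```

For example, "plan_osd_prepare 12 vg_ceph/osd0" (a hypothetical VG/LV) prints the three lines for OSD 12. The point of the ordering is that the number exists before any unit or container is named.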

-- activation:

If we encounter an LVM volume with the LVM tag "cephosd=NNN", then we know to activate OSD NNN on this volume.

If we encounter a "simple" OSD device (created by ceph-disk) that uses filestore, we get the OSD number by mounting it and reading the "whoami" file in the mountpoint, as before.
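The two activation cases above can be expressed as one lookup that prefers the LVM tag and falls back to the filestore "whoami" file. This is a sketch under assumptions: the tag list is passed in the comma-separated form that "lvs -o lv_tags --noheadings" prints, the "cephosd=NNN" tag name is the one proposed here, and the filestore OSD is already mounted; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch of the proposed activation lookup (hypothetical helper).
# $1: comma-separated LVM tags, e.g. "cephosd=12,other=x" (empty for non-LVM)
# $2: mountpoint of an already-mounted "simple" filestore OSD (may be empty)
# Prints the OSD number, or returns 1 if neither source provides one.
osd_id_for_volume() {
    tags=$1
    mountpoint=$2
    # Split the tag list on commas and pick the first cephosd=<number> tag.
    id=$(printf '%s\n' "$tags" | tr ',' '\n' \
        | sed -n 's/^ *cephosd=\([0-9][0-9]*\)$/\1/p' | head -n 1)
    if [ -n "$id" ]; then
        echo "$id"
    elif [ -n "$mountpoint" ] && [ -r "$mountpoint/whoami" ]; then
        # Filestore OSDs record their number in the whoami file.
        cat "$mountpoint/whoami"
    else
        return 1
    fi
}
```

Checking the tag first means LVM-based OSDs never need to be mounted just to learn their number; only legacy ceph-disk devices take the slower path.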

Comment 5 leseb 2018-04-27 09:50:39 UTC
Indeed, a nice to have but not in our roadmap at the moment.
I can't exactly tell you when this will be available. This won't be in 3.1 RHCS, unlikely in 3.2 so maybe 3.3.


Comment 6 Ben England 2018-09-13 19:08:34 UTC
*** Bug 1628713 has been marked as a duplicate of this bug. ***

Comment 7 Ben England 2018-09-13 19:19:45 UTC
This is still not fixed in RHOSP 13 AFAICT. It is a major problem if you are trying to troubleshoot or maintain a containerized Ceph cluster. I know we're moving to Rook and Kubernetes, but in the meantime many sites are getting by with what we have, and Kubernetes isn't running everywhere yet. Suppose you need to start an OSD but don't know which unit file to use: how do you find out?

I would settle for embedding both the OSD number and the device name in the container name. Then you could just run "docker ps | grep osd | grep _N_ | awk '{ print $1 }'" to get the container ID.
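The lookup suggested above can be sketched with awk comparing the OSD-number field exactly. The naming scheme "ceph-osd_<N>_<device>" is an assumption made up to illustrate the proposal, not what ceph-ansible ships; the input is assumed to come from "docker ps --format '{{.ID}} {{.Names}}'".

```shell
#!/bin/sh
# Sketch: print the ID of the container for OSD N, assuming (hypothetically)
# container names of the form ceph-osd_<N>_<device>, e.g. ceph-osd_12_sdb.
# Reads "<container-id> <container-name>" lines on stdin.
container_for_osd() {
    n=$1
    # Compare the number field exactly, so OSD 1 never matches OSD 12.
    awk -v n="$n" '{
        split($2, f, "_")
        if (f[1] == "ceph-osd" && f[2] == n) print $1
    }'
}
```

Usage: docker ps --format '{{.ID}} {{.Names}}' | container_for_osd 12. Splitting the name into fields plays the same role as the underscores in "grep _N_", but is harder to fool if a device name ever contains digits surrounded by underscores.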

As for the unit file, this could be done with softlinks, for example, because when the unit file is created, Ceph knows which OSD number goes with that block device.
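One way to read the softlink idea: at deployment time, record a symlink named after the OSD number whose target is the device name, then resolve OSD N to the device-named unit later. This is only a sketch; the mapping directory and both helper names are hypothetical, and the symlink target is just a name (it does not need to exist).

```shell
#!/bin/sh
# Hypothetical helpers sketching an OSD-number -> device mapping via symlinks.
# record_osd_device DIR OSD DEV: create DIR/OSD -> DEV (e.g. DIR/12 -> sdb)
record_osd_device() {
    dir=$1 osd=$2 dev=$3
    mkdir -p "$dir"
    ln -sfn "$dev" "$dir/$osd"   # dangling link is fine: only the name matters
}

# unit_for_osd DIR OSD: print the device-named unit for OSD, or return 1
unit_for_osd() {
    dir=$1 osd=$2
    dev=$(readlink "$dir/$osd") || return 1
    echo "ceph-osd@$dev"
}
```

With a mapping recorded when the unit is created, an admin could run something like systemctl restart "$(unit_for_osd /path/to/map 12)" without knowing the device name. The path here is illustrative.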

Comment 8 Yaniv Kaul 2018-09-14 02:49:32 UTC
(In reply to leseb from comment #5)
> Indeed, a nice to have but not in our roadmap at the moment.
> I can't exactly tell you when this will be available. This won't be in 3.1
> RHCS, unlikely in 3.2 so maybe 3.3.
> Thanks.

Why not? This looks like an important supportability item to me, and not very difficult to implement (I reckon; I did not look at this beyond the brief comments above!)

Comment 10 Ben England 2018-09-19 18:49:27 UTC
The containerized Ceph documentation says to restart the container using its block device name, but does not say how to determine what that name is.


Comment 11 leseb 2018-09-26 08:58:31 UTC
Yaniv, comment 5 was written a long time ago; since then I realized that this could be implemented as part of the ceph-volume containerization work happening here: https://github.com/ceph/ceph-ansible/pull/2866. So this is ongoing, and my goal is to have it in 3.2.


Comment 16 Vasishta 2018-10-30 17:35:42 UTC

We observed that OSD service names are set to the corresponding OSD IDs only when osd_scenario is set to 'lvm'. When osd_scenario is set to 'collocated' or 'non-collocated', OSD service names still contain device names.

We think this will confuse users and thus affect usability.

Moving back to ASSIGNED state; please let us know if there are any concerns.

Vasishta shastry
QE, Ceph

Comment 17 leseb 2018-10-30 17:42:16 UTC
You must use lvm for both containerized and non-containerized deployments.

The collocated and non-collocated scenarios are not encouraged anymore; you should do all your testing with lvm.

Comment 18 Ben England 2018-10-31 15:36:00 UTC
What about sites that are being upgraded? These will still have the problem, yes? If so, will Bluestore migration in RHCS 4 remove this issue?

Comment 19 leseb 2018-11-07 11:25:21 UTC
Yes, already deployed OSDs will still have the @<disk> naming.
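On an upgraded host, both naming styles can therefore coexist. A small sketch for telling them apart, assuming (as described in this bug) that new units are named ceph-osd@<OSD ID> and old ones ceph-osd@<device>, and that an instance consisting only of digits is an OSD ID; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a ceph-osd unit name as old-style (device) or new-style
# (OSD ID). Assumes an all-digit instance is an OSD ID; anything else is
# treated as a device name. Returns 1 for an empty instance.
osd_unit_style() {
    inst=${1#ceph-osd@}      # strip the "ceph-osd@" prefix
    inst=${inst%.service}    # strip an optional ".service" suffix
    case $inst in
        '')       return 1 ;;
        *[!0-9]*) echo "device $inst" ;;
        *)        echo "osd-id $inst" ;;
    esac
}
```

For example, "osd_unit_style ceph-osd@sdb" reports a device-named (pre-upgrade) unit, while "osd_unit_style ceph-osd@12.service" reports an ID-named one.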

Comment 22 leseb 2018-11-13 13:05:27 UTC
lgtm, thanks!

Comment 26 errata-xmlrpc 2019-01-03 19:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.