Bug 1724428
| Summary: | The "host" signature in "ceph osd status" remains unchanged on moving an OSD disk from failed node to a new node (workaround: mgr restart) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Neha Berry <nberry> |
| Component: | RADOS | Assignee: | Neha Ojha <nojha> |
| Status: | CLOSED ERRATA | QA Contact: | Manohar Murthy <mmurthy> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.0 | CC: | bniver, ceph-eng-bugs, dzafman, ebenahar, etamir, hyelloji, jdurgin, kchai, madam, mkasturi, nojha, ocs-bugs, owasserm, prsurve, ratamir, sostapov, suprasad, tserlin |
| Target Milestone: | rc | Keywords: | AutomationBackLog |
| Target Release: | 4.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-14.2.8-7.el8, ceph-14.2.8-6.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-19 17:30:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Neha Berry
2019-06-27 05:19:36 UTC
As I understand it, the OSD does not automatically change where it sits in the CRUSH map, since doing so can cause data movement. Rook does not control the OSD's CRUSH location beyond making sure the host is set correctly in the OSD context. @Josh, what is your expectation for OSDs moving to a different node? How should the tree be updated?

Neha, I was able to reproduce this and found that the inconsistency is resolved after I restart the ceph manager pod. This means it is likely an issue in which the ceph-mgr cache is not being invalidated. It looks very similar to http://tracker.ceph.com/issues/40011 / https://bugzilla.redhat.com/show_bug.cgi?id=1705464.

Before ceph-mgr restart:

```
[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker1 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+

[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       0.01758 root default
-5       0.01758     host worker0
 0   hdd 0.00879         osd.0        up  1.00000 1.00000
 1   hdd 0.00879         osd.1        up  1.00000 1.00000
```

After ceph-mgr restart:

```
[nwatkins@smash rook]$ kubectl -n rook-ceph exec -it rook-ceph-tools-7cf4cc7568-kz4q6 ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id |   host  |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
| 1  | worker0 | 1027M | 8188M |    0   |     0   |    0   |     0   | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
```

Here is an upstream tracker issue for this: http://tracker.ceph.com/issues/40871

Component changed to ceph per @Noah's analysis.

Upstream backport fix posted at https://github.com/ceph/ceph/pull/30624.

@Neha

> # ceph version
> ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)

I just realized we were testing a community release. Unlike downstream releases, we cannot cherry-pick into our own branch at will; upstream has its own release schedule. How can you test a release that has not shipped yet? Could you shed some light on this?

Docs bug for adding the MGR restart to the procedure for moving an OSD disk between nodes: bug 1789436.

*** Bug 1776750 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
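For reference, the stale-cache symptom and the mgr-restart workaround discussed above can be checked from the command line. The snippet below is a minimal sketch, not a prescribed procedure: it assumes a Rook-deployed cluster using Rook's usual defaults (namespace `rook-ceph`, mgr pod label `app=rook-ceph-mgr`), that `jq` is available, and it uses OSD IDs 0 and 1 purely as examples; adjust names to your environment.

```sh
# 1. Spot the stale entry: the hostname recorded in the OSD metadata
#    (kept by the monitors) should match the "host" column printed by
#    "ceph osd status" from the mgr's cache.
for id in 0 1; do
    echo "osd.${id} metadata hostname: $(ceph osd metadata ${id} -f json | jq -r .hostname)"
done
ceph osd status

# 2. Workaround: restart the ceph-mgr so it rebuilds its cache.
#    In a Rook cluster, deleting the mgr pod lets Kubernetes recreate it
#    (namespace and label are Rook's defaults and may differ).
kubectl -n rook-ceph delete pod -l app=rook-ceph-mgr

# 3. Re-check: the "host" column should now reflect the OSD's new node.
ceph osd status
```

On a non-containerized deployment, restarting the active mgr daemon with systemd (for example `systemctl restart ceph-mgr.target` on the node running the active mgr) should have the same effect of refreshing the cache.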