
Bug 1849575

Summary: [cephadm] 5.0 Zap (erase) device option makes the deleted OSDs live
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Preethi <pnataraj>
Component: Cephadm
Assignee: Juan Miguel Olmo <jolmomar>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Docs Contact: Karen Norteman <knortema>
Priority: unspecified
Version: 5.0
CC: peter598philip, sewagner, tserlin, vereddy
Target Release: 5.0
Hardware: x86_64
OS: Linux
Fixed In Version: ceph-16.0.0-6275.el8cp
Doc Type: No Doc Update
Last Closed: 2021-08-30 08:25:38 UTC
Type: Bug

Comment 7 Preethi 2020-11-10 17:05:19 UTC
Hi Juan,

The issue is still seen with the latest build of the Pacific-based 5.0 cluster:

I removed osd.4 by marking the OSD down and out, followed by the remove command. After performing zap, the deleted OSD is listed as up in ceph osd tree, and ceph -s counts it among the up and in OSDs.

Output below:

[ceph: root@magna094 /]# ceph osd down 4
marked down osd.4. 
[ceph: root@magna094 /]# ceph osd out 4
marked out osd.4. 
[ceph: root@magna094 /]# ceph osd rm 4
removed osd.4


[ceph: root@magna094 /]# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         23.65216  root default                                
 -5          1.81940      host magna067                           
  3    hdd   0.90970          osd.3          up   1.00000  1.00000
  5    hdd   0.90970          osd.5          up   1.00000  1.00000
 -7          2.72910      host magna073                           
  6    hdd   0.90970          osd.6          up   1.00000  1.00000
  7    hdd   0.90970          osd.7          up   1.00000  1.00000
  8    hdd   0.90970          osd.8          up   1.00000  1.00000
-17          2.72910      host magna075                           
 11    hdd   0.90970          osd.11         up   1.00000  1.00000
 17    hdd   0.90970          osd.17         up   1.00000  1.00000
 23    hdd   0.90970          osd.23         up   1.00000  1.00000
-15          2.72910      host magna076                           
 13    hdd   0.90970          osd.13         up   1.00000  1.00000
 19    hdd   0.90970          osd.19         up   1.00000  1.00000
 25    hdd   0.90970          osd.25         up   1.00000  1.00000
-19          2.72910      host magna077                           
  9    hdd   0.90970          osd.9          up   1.00000  1.00000
 15    hdd   0.90970          osd.15         up   1.00000  1.00000
 21    hdd   0.90970          osd.21         up   1.00000  1.00000
-13          2.72910      host magna079                           
 10    hdd   0.90970          osd.10         up   1.00000  1.00000
 16    hdd   0.90970          osd.16         up   1.00000  1.00000
 22    hdd   0.90970          osd.22         up   1.00000  1.00000
-11          2.72910      host magna092                           
 12    hdd   0.90970          osd.12         up   1.00000  1.00000
 18    hdd   0.90970          osd.18         up   1.00000  1.00000
 24    hdd   0.90970          osd.24         up   1.00000  1.00000
 -9          2.72910      host magna093                           
 14    hdd   0.90970          osd.14         up   1.00000  1.00000
 20    hdd   0.90970          osd.20         up   1.00000  1.00000
 26    hdd   0.90970          osd.26         up   1.00000  1.00000
 -3          2.72910      host magna094                           
  0    hdd   0.90970          osd.0          up   1.00000  1.00000
  1    hdd   0.90970          osd.1          up   1.00000  1.00000
  2    hdd   0.90970          osd.2          up   1.00000  1.00000


[ceph: root@magna094 /]# ceph orch ls osd.None --export
No services reported
[ceph: root@magna094 /]# ceph orch device zap magna067 /dev/sdc --force
Error EINVAL: Zap failed: /bin/podman:stderr WARNING: The same type, major and minor should not be used for multiple devices.
/bin/podman:stderr --> Zapping: /dev/sdc
/bin/podman:stderr --> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835
/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835 bs=1M count=10 conv=fsync
/bin/podman:stderr  stderr: 10+0 records in
/bin/podman:stderr 10+0 records out
/bin/podman:stderr 10485760 bytes (10 MB, 10 MiB) copied, 0.092196 s, 114 MB/s
/bin/podman:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr Running command: /usr/sbin/vgremove -v -f ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr  stderr: Logical volume ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835 in use.
/bin/podman:stderr --> Unable to remove vg ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr -->  RuntimeError: command returned non-zero exit status: 5
Traceback (most recent call last):
  File "<stdin>", line 6041, in <module>
  File "<stdin>", line 1276, in _infer_fsid
  File "<stdin>", line 1359, in _infer_image
  File "<stdin>", line 3588, in command_ceph_volume
  File "<stdin>", line 1038, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445 -e NODE_NAME=magna067 -v /var/run/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6:/var/run/ceph:z -v /var/log/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6:/var/log/ceph:z -v /var/lib/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445 lvm zap --destroy /dev/sdc
[ceph: root@magna094 /]# 
[ceph: root@magna094 /]# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         23.65216  root default                                
 -5          1.81940      host magna067                           
  3    hdd   0.90970          osd.3          up   1.00000  1.00000
  5    hdd   0.90970          osd.5          up   1.00000  1.00000
 -7          2.72910      host magna073                           
  6    hdd   0.90970          osd.6          up   1.00000  1.00000
  7    hdd   0.90970          osd.7          up   1.00000  1.00000
  8    hdd   0.90970          osd.8          up   1.00000  1.00000
-17          2.72910      host magna075                           
 11    hdd   0.90970          osd.11         up   1.00000  1.00000
 17    hdd   0.90970          osd.17         up   1.00000  1.00000
 23    hdd   0.90970          osd.23         up   1.00000  1.00000
-15          2.72910      host magna076                           
 13    hdd   0.90970          osd.13         up   1.00000  1.00000
 19    hdd   0.90970          osd.19         up   1.00000  1.00000
 25    hdd   0.90970          osd.25         up   1.00000  1.00000
-19          2.72910      host magna077                           
  9    hdd   0.90970          osd.9          up   1.00000  1.00000
 15    hdd   0.90970          osd.15         up   1.00000  1.00000
 21    hdd   0.90970          osd.21         up   1.00000  1.00000
-13          2.72910      host magna079                           
 10    hdd   0.90970          osd.10         up   1.00000  1.00000
 16    hdd   0.90970          osd.16         up   1.00000  1.00000
 22    hdd   0.90970          osd.22         up   1.00000  1.00000
-11          2.72910      host magna092                           
 12    hdd   0.90970          osd.12         up   1.00000  1.00000
 18    hdd   0.90970          osd.18         up   1.00000  1.00000
 24    hdd   0.90970          osd.24         up   1.00000  1.00000
 -9          2.72910      host magna093                           
 14    hdd   0.90970          osd.14         up   1.00000  1.00000
 20    hdd   0.90970          osd.20         up   1.00000  1.00000
 26    hdd   0.90970          osd.26         up   1.00000  1.00000
 -3          2.72910      host magna094                           
  0    hdd   0.90970          osd.0          up   1.00000  1.00000
  1    hdd   0.90970          osd.1          up   1.00000  1.00000
  2    hdd   0.90970          osd.2          up   1.00000  1.00000
  4                0  osd.4                  up   1.00000  1.00000
[ceph: root@magna094 /]# ceph -s
  cluster:
    id:     c97c2c8c-0942-11eb-ae18-002590fbecb6
    health: HEALTH_ERR
            Module 'diskprediction_local' has failed: No module named 'sklearn'
            1 pool(s) full
            12 slow ops, oldest one blocked for 1016 sec, mon.magna094 has slow ops
 
  services:
    mon: 3 daemons, quorum magna094,magna067,magna073 (age 4d)
    mgr: magna094.hussmr(active, since 30h), standbys: magna067.cudixx
    mds: test:1 {0=test.magna076.xymdrn=up:active} 2 up:standby
    osd: 27 osds: 27 up (since 17m), 27 in (since 17m)
    rgw: 2 daemons active (myorg.us-east-1.magna092.bxiihn, myorg.us-east-1.magna093.nhekwk)
 
  data:
    pools:   31 pools, 937 pgs
    objects: 443 objects, 6.9 MiB
    usage:   4.9 GiB used, 24 TiB / 24 TiB avail
    pgs:     937 active+clean
 
  io:
    client:   85 B/s rd, 0 op/s rd, 0 op/s wr
 


ceph version:

[root@magna094 ubuntu]# ./cephadm version
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445
ceph version 16.0.0-6275.el8cp (d1e0606106224ac333f1c245150d7484cb626841) pacific (dev)

[root@magna094 ubuntu]# rpm -qa |grep cephadm
cephadm-16.0.0-6817.el8cp.x86_64
[root@magna094 ubuntu]#

Comment 9 Preethi 2020-11-19 16:45:10 UTC
@Juan, the issue is not seen when we set the all-available-devices OSD flag to unmanaged (a sketch of setting that flag follows below). The following steps were performed before issuing the rm command:

1) systemctl disable ceph-osd@4
2) systemctl stop ceph-osd@4
3) ceph osd out 4
4) ceph osd crush remove osd.4
5) ceph auth del osd.4
6) ceph osd rm 4

and then performed ceph orch device zap magna067 /dev/sdc --force. The zap completes without any issue; no problem is observed.
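For reference, a minimal sketch of setting the unmanaged flag mentioned above (assuming the OSDs were deployed with the all-available-devices spec; this command is not taken from the cluster transcript in this bug):

# stop cephadm from automatically (re)deploying OSDs on available devices
ceph orch apply osd --all-available-devices --unmanaged=true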

Comment 12 errata-xmlrpc 2021-08-30 08:25:38 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294

Comment 13 Peter Philip 2024-12-19 08:59:02 UTC
Hello,
The cephadm 5.0 zap option erases device data but does not clear the LVM metadata, which causes deleted OSDs to remain listed as active. Be sure to manually clear the LVM data for complete removal.
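A hedged sketch of such a manual LVM cleanup (untested here; <vg_name> and <lv_name> are placeholders, and /dev/sdc stands in for whichever device is being zapped; verify with lvs/vgs before removing anything):

# identify any leftover ceph-* logical volumes and their volume groups
lvs -o lv_name,vg_name
# remove the logical volume, its volume group, and the physical volume label
lvremove -f <vg_name>/<lv_name>
vgremove -f <vg_name>
pvremove /dev/sdc
# wipe any remaining LVM/filesystem signatures from the device
wipefs -a /dev/sdc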


Best Regards,
Peter Philip