Bug 1849575
| Summary: | [cephadm] 5.0 Zap (erase) device option makes the deleted OSDs live | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Preethi <pnataraj> |
| Component: | Cephadm | Assignee: | Juan Miguel Olmo <jolmomar> |
| Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
| Severity: | medium | Docs Contact: | Karen Norteman <knortema> |
| Priority: | unspecified | | |
| Version: | 5.0 | CC: | peter598philip, sewagner, tserlin, vereddy |
| Target Milestone: | --- | | |
| Target Release: | 5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-16.0.0-6275.el8cp | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-08-30 08:25:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
@Juan, the issue is not seen when we set the "all available devices" OSD flag to unmanaged. The following steps must be performed before issuing the rm command:

1) systemctl disable ceph-osd@4
2) systemctl stop ceph-osd@4
3) ceph osd out 4
4) ceph osd crush remove osd.4
5) ceph auth del osd.4
6) ceph osd rm 4

Then perform ceph orch device zap magna067 /dev/sdc --force --> zap completes without any issue; no issue observed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294
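A minimal shell sketch consolidating the removal sequence above, assuming osd.4 on host magna067 backed by /dev/sdc as in this report. The `ceph orch apply osd --all-available-devices --unmanaged=true` line is the standard cephadm command for the "unmanaged" flag mentioned above; treat the whole script as an illustration, not a verified procedure for every cluster:

```bash
#!/usr/bin/env bash
# Sketch of the manual OSD removal sequence from this comment.
# Assumptions: OSD id 4, host magna067, device /dev/sdc (taken from this report).
set -euo pipefail

# Stop the orchestrator from redeploying OSDs onto freed devices; without this,
# cephadm may recreate the OSD as soon as the device is zapped.
ceph orch apply osd --all-available-devices --unmanaged=true

# On the OSD host: stop the daemon so its logical volume is released.
systemctl disable ceph-osd@4
systemctl stop ceph-osd@4

# Remove the OSD from the OSD map, CRUSH map, and auth database.
ceph osd out 4
ceph osd crush remove osd.4
ceph auth del osd.4
ceph osd rm 4

# Only now zap the backing device.
ceph orch device zap magna067 /dev/sdc --force
```

With the daemon stopped first, the vgremove step inside zap no longer finds the LV in use, which matches the "no issue observed" result above.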
Hi Juan,

The issue is still present with the latest build of the Pacific-based 5.0 cluster. I removed osd.4 by marking the OSD down and out, followed by the remove command. After performing zap, the deleted OSD is listed as up in ceph osd tree, and ceph -s shows the OSD as up and in in the cluster.

Below output:

[ceph: root@magna094 /]# ceph osd down 4
marked down osd.4.
[ceph: root@magna094 /]# ceph osd out 4
marked out osd.4.
[ceph: root@magna094 /]# ceph osd rm 4
removed osd.4
[ceph: root@magna094 /]# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         23.65216  root default
 -5          1.81940      host magna067
  3    hdd   0.90970          osd.3           up   1.00000  1.00000
  5    hdd   0.90970          osd.5           up   1.00000  1.00000
 -7          2.72910      host magna073
  6    hdd   0.90970          osd.6           up   1.00000  1.00000
  7    hdd   0.90970          osd.7           up   1.00000  1.00000
  8    hdd   0.90970          osd.8           up   1.00000  1.00000
-17          2.72910      host magna075
 11    hdd   0.90970          osd.11          up   1.00000  1.00000
 17    hdd   0.90970          osd.17          up   1.00000  1.00000
 23    hdd   0.90970          osd.23          up   1.00000  1.00000
-15          2.72910      host magna076
 13    hdd   0.90970          osd.13          up   1.00000  1.00000
 19    hdd   0.90970          osd.19          up   1.00000  1.00000
 25    hdd   0.90970          osd.25          up   1.00000  1.00000
-19          2.72910      host magna077
  9    hdd   0.90970          osd.9           up   1.00000  1.00000
 15    hdd   0.90970          osd.15          up   1.00000  1.00000
 21    hdd   0.90970          osd.21          up   1.00000  1.00000
-13          2.72910      host magna079
 10    hdd   0.90970          osd.10          up   1.00000  1.00000
 16    hdd   0.90970          osd.16          up   1.00000  1.00000
 22    hdd   0.90970          osd.22          up   1.00000  1.00000
-11          2.72910      host magna092
 12    hdd   0.90970          osd.12          up   1.00000  1.00000
 18    hdd   0.90970          osd.18          up   1.00000  1.00000
 24    hdd   0.90970          osd.24          up   1.00000  1.00000
 -9          2.72910      host magna093
 14    hdd   0.90970          osd.14          up   1.00000  1.00000
 20    hdd   0.90970          osd.20          up   1.00000  1.00000
 26    hdd   0.90970          osd.26          up   1.00000  1.00000
 -3          2.72910      host magna094
  0    hdd   0.90970          osd.0           up   1.00000  1.00000
  1    hdd   0.90970          osd.1           up   1.00000  1.00000
  2    hdd   0.90970          osd.2           up   1.00000  1.00000
[ceph: root@magna094 /]# ceph orch ls osd.None --export
No services reported
[ceph: root@magna094 /]# ceph orch device zap magna067 /dev/sdc --force
Error EINVAL: Zap failed:
/bin/podman:stderr WARNING: The same type, major and minor should not be used for multiple devices.
/bin/podman:stderr --> Zapping: /dev/sdc
/bin/podman:stderr --> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835
/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835 bs=1M count=10 conv=fsync
/bin/podman:stderr  stderr: 10+0 records in
/bin/podman:stderr 10+0 records out
/bin/podman:stderr 10485760 bytes (10 MB, 10 MiB) copied, 0.092196 s, 114 MB/s
/bin/podman:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr Running command: /usr/sbin/vgremove -v -f ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr  stderr: Logical volume ceph-e5bd52b5-931f-428c-8ef3-a2946689a851/osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835 in use.
/bin/podman:stderr --> Unable to remove vg ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
/bin/podman:stderr --> RuntimeError: command returned non-zero exit status: 5
Traceback (most recent call last):
  File "<stdin>", line 6041, in <module>
  File "<stdin>", line 1276, in _infer_fsid
  File "<stdin>", line 1359, in _infer_image
  File "<stdin>", line 3588, in command_ceph_volume
  File "<stdin>", line 1038, in call_throws
RuntimeError: Failed command: /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445 -e NODE_NAME=magna067 -v /var/run/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6:/var/run/ceph:z -v /var/log/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6:/var/log/ceph:z -v /var/lib/ceph/c97c2c8c-0942-11eb-ae18-002590fbecb6/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445 lvm zap --destroy /dev/sdc

[ceph: root@magna094 /]#
[ceph: root@magna094 /]# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME           STATUS  REWEIGHT  PRI-AFF
 -1         23.65216  root default
 -5          1.81940      host magna067
  3    hdd   0.90970          osd.3           up   1.00000  1.00000
  5    hdd   0.90970          osd.5           up   1.00000  1.00000
 -7          2.72910      host magna073
  6    hdd   0.90970          osd.6           up   1.00000  1.00000
  7    hdd   0.90970          osd.7           up   1.00000  1.00000
  8    hdd   0.90970          osd.8           up   1.00000  1.00000
-17          2.72910      host magna075
 11    hdd   0.90970          osd.11          up   1.00000  1.00000
 17    hdd   0.90970          osd.17          up   1.00000  1.00000
 23    hdd   0.90970          osd.23          up   1.00000  1.00000
-15          2.72910      host magna076
 13    hdd   0.90970          osd.13          up   1.00000  1.00000
 19    hdd   0.90970          osd.19          up   1.00000  1.00000
 25    hdd   0.90970          osd.25          up   1.00000  1.00000
-19          2.72910      host magna077
  9    hdd   0.90970          osd.9           up   1.00000  1.00000
 15    hdd   0.90970          osd.15          up   1.00000  1.00000
 21    hdd   0.90970          osd.21          up   1.00000  1.00000
-13          2.72910      host magna079
 10    hdd   0.90970          osd.10          up   1.00000  1.00000
 16    hdd   0.90970          osd.16          up   1.00000  1.00000
 22    hdd   0.90970          osd.22          up   1.00000  1.00000
-11          2.72910      host magna092
 12    hdd   0.90970          osd.12          up   1.00000  1.00000
 18    hdd   0.90970          osd.18          up   1.00000  1.00000
 24    hdd   0.90970          osd.24          up   1.00000  1.00000
 -9          2.72910      host magna093
 14    hdd   0.90970          osd.14          up   1.00000  1.00000
 20    hdd   0.90970          osd.20          up   1.00000  1.00000
 26    hdd   0.90970          osd.26          up   1.00000  1.00000
 -3          2.72910      host magna094
  0    hdd   0.90970          osd.0           up   1.00000  1.00000
  1    hdd   0.90970          osd.1           up   1.00000  1.00000
  2    hdd   0.90970          osd.2           up   1.00000  1.00000
  4           0              osd.4           up   1.00000  1.00000
[ceph: root@magna094 /]# ceph -s
  cluster:
    id:     c97c2c8c-0942-11eb-ae18-002590fbecb6
    health: HEALTH_ERR
            Module 'diskprediction_local' has failed: No module named 'sklearn'
            1 pool(s) full
            12 slow ops, oldest one blocked for 1016 sec, mon.magna094 has slow ops

  services:
    mon: 3 daemons, quorum magna094,magna067,magna073 (age 4d)
    mgr: magna094.hussmr(active, since 30h), standbys: magna067.cudixx
    mds: test:1 {0=test.magna076.xymdrn=up:active} 2 up:standby
    osd: 27 osds: 27 up (since 17m), 27 in (since 17m)
    rgw: 2 daemons active (myorg.us-east-1.magna092.bxiihn, myorg.us-east-1.magna093.nhekwk)

  data:
    pools:   31 pools, 937 pgs
    objects: 443 objects, 6.9 MiB
    usage:   4.9 GiB used, 24 TiB / 24 TiB avail
    pgs:     937 active+clean

  io:
    client: 85 B/s rd, 0 op/s rd, 0 op/s wr

ceph version:

[root@magna094 ubuntu]# ./cephadm version
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-96803-20201013192445
ceph version 16.0.0-6275.el8cp (d1e0606106224ac333f1c245150d7484cb626841) pacific (dev)
[root@magna094 ubuntu]# rpm -qa |grep cephadm
cephadm-16.0.0-6817.el8cp.x86_64
[root@magna094 ubuntu]#
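The vgremove failure above ("Logical volume ... in use") shows that the OSD process still held the logical volume open when zap ran, which is why the OSD comes back up afterwards. A minimal diagnostic sketch for checking this on the OSD host before zapping; the VG/LV names are copied from the transcript above, and the commands are standard LVM/util-linux/psmisc tools rather than part of the cephadm flow in this report:

```bash
#!/usr/bin/env bash
# Sketch: confirm the OSD's logical volume is released before zapping.
# VG/LV names below are copied from the failing transcript; adjust per host.
VG=ceph-e5bd52b5-931f-428c-8ef3-a2946689a851
LV=osd-block-d7410ec3-a1a6-428b-9f0b-f3f329cf5835

# Show what is stacked on the raw device; the ceph LV should appear under /dev/sdc.
lsblk /dev/sdc

# Map Ceph LVs back to OSD ids (run inside the ceph-volume container on cephadm hosts).
ceph-volume lvm list

# The 6th lv_attr character is 'o' while the device is open; vgremove fails
# with "Logical volume ... in use" exactly in that state.
lvs -o lv_name,vg_name,lv_attr "$VG"

# Identify the holder; for a live OSD this is typically the ceph-osd process.
fuser -v "/dev/$VG/$LV"
```

If the LV still shows as open here, stopping the OSD daemon first (as in the removal sequence later in this report) should release it and let zap complete.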