Created attachment 1414140 [details]
File contains contents of ansible-playbook log

Description of problem:
Shrinking an OSD backed by NVMe disks fails in the task "deallocate osd(s) id when ceph-disk destroy fail" with:
"Error EBUSY: osd.<id> is still up; must be down before removal. "

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.28-1.el7cp.noarch

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure a containerized cluster with NVMe disks for OSDs
2. Try to shrink an OSD

Actual results:
TASK [deallocate osd(s) id when ceph-disk destroy fail] --------------------------------
"stderr_lines": [
    "Error EBUSY: osd.7 is still up; must be down before removal. "
],

Expected results:
OSD must be removed successfully

Additional info:
The task TASK [stop osd services (container)] completed with status 'ok'.
Can you check if the container is still running? Perhaps we tried to stop the wrong service. Thanks.
Hi Sebastien,

As I remember, the container was still running. (Unfortunately I no longer have the environment.)

I think we tried to stop the wrong service: the log (attachment) shows "name": "ceph-osd@nvme0n1p". Following the naming convention, the service name should have been "ceph-osd@nvme0n1".

The logic in shrink-osd.yml [1] that derives the service name from the device path does not seem to work for NVMe disks:

- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ item.0.stdout[:-1] | regex_replace('/dev/', '') }}"

Stripping only the last character ("item.0.stdout[:-1]") works for devices like /dev/sda1, but NVMe partitions end in "p<digit>" (e.g. /dev/nvme0n1p1). I think it would have been fine if we had "item.0.stdout[:-2]" only for NVMe disks.

[1] https://github.com/ceph/ceph-ansible/blob/37117071ebb7ab3cf68b607b6760077a2b46a00d/infrastructure-playbooks/shrink-osd.yml#L119-L121

Regards,
Vasishta Shastry
AQE, Ceph
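The suffix-stripping problem described above can be sketched in Python. This is a minimal illustration of the idea (handle the NVMe "p<digit>" suffix separately from plain trailing digits), not the actual patch that landed in ceph-ansible; strip_partition is a hypothetical helper name:

```python
import re

def strip_partition(dev):
    """Strip a trailing partition suffix from a block device path.

    SCSI/SATA/virtio names end in a bare digit ('/dev/sda1' -> 'sda'),
    while NVMe names use a 'p<digit>' suffix ('/dev/nvme0n1p1' -> 'nvme0n1').
    Stripping a fixed number of characters, as the playbook did with
    item.0.stdout[:-1], leaves a stray 'p' behind for NVMe devices.
    """
    name = dev.replace("/dev/", "")
    if re.match(r"nvme\d+n\d+p\d+$", name):
        # NVMe: remove the whole 'p<digits>' partition suffix
        return re.sub(r"p\d+$", "", name)
    # Other devices: remove trailing partition digits only
    return re.sub(r"\d+$", "", name)

print(strip_partition("/dev/sda1"))       # sda
print(strip_partition("/dev/nvme0n1p1"))  # nvme0n1
```

With the fixed-offset slice, "/dev/nvme0n1p1" would yield "nvme0n1p", producing the nonexistent service name "ceph-osd@nvme0n1p" seen in the log, so the OSD container was never actually stopped.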
*** Bug 1555793 has been marked as a duplicate of this bug. ***
Will be in the next release, v3.0.32.
Working fine with ceph-ansible-3.0.32-1.el7cp.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1563