Bug 1561456 - [ceph-ansible] [ceph-container] : shrink OSD with NVMe disks - failing as OSD services are not stopped
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z3
Target Release: 3.0
Assignee: leseb
QA Contact: Vasishta
Docs Contact: Erin Donnelly
Duplicates: 1555793 (view as bug list)
Depends On:
Blocks: 1553254 1557269 1572368 1600697
Reported: 2018-03-28 11:36 UTC by Vasishta
Modified: 2018-07-12 19:43 UTC
CC: 13 users

Doc Text:
.The `shrink-osd` playbook supports NVMe drives

Previously, the `shrink-osd` Ansible playbook did not support shrinking OSDs backed by an NVMe drive. NVMe drive support has been added in this release.
Last Closed: 2018-05-15 18:20:31 UTC


Attachments (Terms of Use)
File contains contents ansible-playbook log (70.80 KB, text/plain)
2018-03-28 11:36 UTC, Vasishta


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1563 None None None 2018-05-15 18:21 UTC
Github ceph ceph-ansible pull 2537 None None None 2018-04-20 09:15 UTC

Description Vasishta 2018-03-28 11:36:46 UTC
Created attachment 1414140 [details]
File contains contents ansible-playbook log

Description of problem:
Shrinking an OSD backed by NVMe disks fails in the task "deallocate osd(s) id when ceph-disk destroy fail" with the error "Error EBUSY: osd.<id> is still up; must be down before removal."

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.28-1.el7cp.noarch

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure containerized cluster with NVMe disks for OSDs
2. Try to shrink an OSD.


Actual results:
TASK [deallocate osd(s) id when ceph-disk destroy fail]
--------------------------------
"stderr_lines": [
        "Error EBUSY: osd.7 is still up; must be down before removal. "
    ], 

Expected results:
The OSD should be removed successfully.

Additional info:

The task "stop osd services (container)" completed with status 'ok'.

Comment 6 leseb 2018-04-12 12:14:43 UTC
Can you check if the container is still running?
Perhaps we tried to stop the wrong service.

Thanks.

Comment 7 Vasishta 2018-04-12 13:12:32 UTC
Hi Sebastien,

As I remember, the container was still running. (Unfortunately I don't have the environment at the moment.)

I think we must have tried to stop the wrong service, as I see "name": "ceph-osd@nvme0n1p" in the log (attachment). Following the naming convention, the service name should have been "ceph-osd@nvme0n1".

The logic we have in shrink-osd.yml [1] to derive the service name does not seem to work for NVMe disks.


- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ item.0.stdout[:-1] | regex_replace('/dev/', '') }}"

I think it would work if we used "item.0.stdout[:-2]" for NVMe disks only.

[1] https://github.com/ceph/ceph-ansible/blob/37117071ebb7ab3cf68b607b6760077a2b46a00d/infrastructure-playbooks/shrink-osd.yml#L119-L121
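The naming issue above can be illustrated with a small Python sketch of the string handling. The `osd_service_name` helper is hypothetical (it is not the playbook's code or the upstream fix in the linked PR); it mimics the Jinja expression `item.0.stdout[:-1] | regex_replace('/dev/', '')`, which strips one trailing character to turn a partition path into a device name. That works for names like `/dev/sdb1`, but NVMe partitions end in `p<N>`, so the `p` is left behind unless the whole suffix is removed:

```python
import re

def osd_service_name(partition: str) -> str:
    """Hypothetical helper: derive a ceph-osd systemd unit name from a
    partition path. The playbook's expression effectively did
    dev[:-1], which yields 'nvme0n1p' for '/dev/nvme0n1p1'."""
    dev = partition.replace('/dev/', '')
    if re.match(r'nvme\d+n\d+p\d+$', dev):
        # NVMe partitions carry a 'p<N>' suffix; strip it entirely
        dev = re.sub(r'p\d+$', '', dev)   # nvme0n1p1 -> nvme0n1
    else:
        # classic block devices only append the partition digit
        dev = dev[:-1]                    # sdb1 -> sdb
    return f"ceph-osd@{dev}"

print(osd_service_name('/dev/sdb1'))       # ceph-osd@sdb
print(osd_service_name('/dev/nvme0n1p1'))  # ceph-osd@nvme0n1
```

This is only a sketch of the failure mode; the actual correction landed via the ceph-ansible pull request tracked above.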


Regards,
Vasishta Shastry
AQE, Ceph

Comment 9 leseb 2018-04-20 09:44:40 UTC
*** Bug 1555793 has been marked as a duplicate of this bug. ***

Comment 10 leseb 2018-04-23 21:02:22 UTC
Will be in the next release v3.0.32

Comment 14 Vasishta 2018-05-09 07:15:44 UTC
working fine with ceph-ansible-3.0.32-1.el7cp

Comment 17 errata-xmlrpc 2018-05-15 18:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563

