Bug 1561456 - [ceph-ansible] [ceph-container] : shrink OSD with NVMe disks - failing as OSD services are not stopped
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z3
Target Release: 3.0
Assigned To: leseb
QA Contact: Vasishta
Docs Contact: Erin Donnelly
Duplicates: 1555793
Blocks: 1553254 1557269 1572368 1600697
Reported: 2018-03-28 07:36 EDT by Vasishta
Modified: 2018-07-12 15:43 EDT
CC List: 13 users

Fixed In Version: RHEL: ceph-ansible-3.0.32-1.el7cp Ubuntu: ceph-ansible_3.0.32-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `shrink-osd` playbook supports NVMe drives
Previously, the `shrink-osd` Ansible playbook did not support shrinking OSDs backed by an NVMe drive. NVMe drive support has been added in this release.
Last Closed: 2018-05-15 14:20:31 EDT
Type: Bug


Attachments
ansible-playbook log (70.80 KB, text/plain), attached 2018-03-28 07:36 EDT by Vasishta


External Trackers
Github ceph/ceph-ansible/pull/2537 (last updated 2018-04-20 05:15 EDT)
Red Hat Product Errata RHBA-2018:1563 (last updated 2018-05-15 14:21 EDT)

Description Vasishta 2018-03-28 07:36:46 EDT
Created attachment 1414140 [details]
ansible-playbook log

Description of problem:
Shrinking an OSD backed by NVMe disks fails in the task "deallocate osd(s) id when ceph-disk destroy fail" with the error "Error EBUSY: osd.<id> is still up; must be down before removal."

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.28-1.el7cp.noarch

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure containerized cluster with NVMe disks for OSDs
2. Try to shrink an OSD.


Actual results:
TASK [deallocate osd(s) id when ceph-disk destroy fail]
--------------------------------
"stderr_lines": [
        "Error EBUSY: osd.7 is still up; must be down before removal. "
    ], 

Expected results:
The OSD is removed successfully.

Additional info:

The task TASK [stop osd services (container)] completed earlier in the run with status 'ok'.
Comment 6 leseb 2018-04-12 08:14:43 EDT
Can you check if the container is still running?
Perhaps we tried to stop the wrong service.

Thanks.
Comment 7 Vasishta 2018-04-12 09:12:32 EDT
Hi Sebastien,

As I remember, the container was still running. (Unfortunately I don't have the environment as of now.)

I think we must have tried to stop the wrong service, as I see "name": "ceph-osd@nvme0n1p" in the log (attachment). Following the naming convention, the service name should have been "ceph-osd@nvme0n1".
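
For illustration (this debug task is mine, not from the playbook) - SATA/SAS partition names append a bare digit to the device name, while NVMe partition names append "p" plus a digit, so stripping one trailing character yields the wrong device, and hence the wrong service name, for NVMe:

# Hypothetical debug task, not from shrink-osd.yml: shows why stripping
# a single trailing character misparses NVMe partition names.
- debug:
    msg:
      - "{{ '/dev/sda1'[:-1] }}"       # -> /dev/sda      (correct device)
      - "{{ '/dev/nvme0n1p1'[:-1] }}"  # -> /dev/nvme0n1p (wrong, device is /dev/nvme0n1)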

The logic we have in shrink-osd.yml [1] to find out the service name does not seem to work for NVMe disks.


- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ item.0.stdout[:-1] | regex_replace('/dev/', '') }}"

I think it would have been fine if we used "item.0.stdout[:-2]", but only for NVMe disks (see the sketch after the link below).

[1] https://github.com/ceph/ceph-ansible/blob/37117071ebb7ab3cf68b607b6760077a2b46a00d/infrastructure-playbooks/shrink-osd.yml#L119-L121
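
A minimal sketch of that idea (my guess, assuming single-digit partition numbers - not necessarily the fix that lands upstream):

# Hypothetical sketch, not the merged change: strip two trailing
# characters ("p<N>") for NVMe partitions, one ("<N>") otherwise.
- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ (item.0.stdout[:-2] if 'nvme' in item.0.stdout else item.0.stdout[:-1]) | regex_replace('/dev/', '') }}"
    state: stopped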


Regards,
Vasishta Shastry
AQE, Ceph
Comment 9 leseb 2018-04-20 05:44:40 EDT
*** Bug 1555793 has been marked as a duplicate of this bug. ***
Comment 10 leseb 2018-04-23 17:02:22 EDT
Will be in the next release v3.0.32
Comment 14 Vasishta 2018-05-09 03:15:44 EDT
Working fine with ceph-ansible-3.0.32-1.el7cp.
Comment 17 errata-xmlrpc 2018-05-15 14:20:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563
