Bug 1561456

Summary: [ceph-ansible] [ceph-container] : shrink OSD with NVMe disks - failing as OSD services are not stopped
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Erin Donnelly <edonnell>
Priority: unspecified
Version: 3.0
CC: adeza, agunn, aschoen, ceph-eng-bugs, edonnell, gmeno, hnallurv, jquinn, kdreyer, nthomas, sankarshan, shan, tchandra
Target Milestone: z3
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-ansible-3.0.32-1.el7cp Ubuntu: ceph-ansible_3.0.32-2redhat1
Doc Type: Bug Fix
Doc Text:
.The `shrink-osd` playbook supports NVMe drives
Previously, the `shrink-osd` Ansible playbook did not support shrinking OSDs backed by an NVMe drive. NVMe drive support has been added in this release.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 18:20:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1553254, 1557269, 1572368, 1600697    
Attachments:
Description: File contains contents ansible-playbook log
Flags: none

Description Vasishta 2018-03-28 11:36:46 UTC
Created attachment 1414140 [details]
File contains contents ansible-playbook log

Description of problem:
Shrinking an OSD backed by an NVMe disk fails in the task [deallocate osd(s) id when ceph-disk destroy fail] with the error "Error EBUSY: osd.<id> is still up; must be down before removal."

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.28-1.el7cp.noarch

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure a containerized cluster with NVMe disks for the OSDs.
2. Try to shrink an OSD using the shrink-osd.yml playbook.


Actual results:
TASK [deallocate osd(s) id when ceph-disk destroy fail]
--------------------------------
"stderr_lines": [
        "Error EBUSY: osd.7 is still up; must be down before removal. "
    ], 

Expected results:
The OSD should be removed successfully.

Additional info:

Note that the preceding task, TASK [stop osd services (container)], completed with status 'ok'.

Comment 6 Sébastien Han 2018-04-12 12:14:43 UTC
Can you check if the container is still running?
Perhaps we tried to stop the wrong service.

Thanks.

Comment 7 Vasishta 2018-04-12 13:12:32 UTC
Hi Sebastien,

As I remember, the container was still running. (Unfortunately, I don't have the environment available right now.)

I think we must have tried to stop the wrong service, as I see "name": "ceph-osd@nvme0n1p" in the log (attachment). Following the naming convention, the service name should have been "ceph-osd@nvme0n1".

The logic we have in shrink-osd.yml [1] to find the service name does not seem to work for NVMe disks.


- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ item.0.stdout[:-1] | regex_replace('/dev/', '') }}"

I think it would have worked if we used "item.0.stdout[:-2]", but only for NVMe disks: NVMe partitions are named like nvme0n1p1, so two trailing characters ("p1") need to be stripped, not one.
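
For concreteness, here is how the slicing behaves on the two naming schemes (device paths are illustrative):

  "/dev/sda1"[:-1]       -> "/dev/sda"       (correct)
  "/dev/nvme0n1p1"[:-1]  -> "/dev/nvme0n1p"  (wrong service name)
  "/dev/nvme0n1p1"[:-2]  -> "/dev/nvme0n1"   (correct, but only while the partition number is a single digit, and wrong for sd* disks)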

[1] https://github.com/ceph/ceph-ansible/blob/37117071ebb7ab3cf68b607b6760077a2b46a00d/infrastructure-playbooks/shrink-osd.yml#L119-L121
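
Since no single slice covers both schemes, a regex that strips an optional "p" before the trailing digits might be cleaner. A minimal sketch, assuming item.0.stdout always holds a partition path as in the existing task (illustrative only, not the actual patch that landed in v3.0.32):

- name: stop osd services (container)
  service:
    # Strip "/dev/" and the partition suffix:
    #   /dev/nvme0n1p1 -> nvme0n1   (NVMe partitions end in "p<N>")
    #   /dev/sda1      -> sda       (sd* partitions end in a bare digit)
    name: "ceph-osd@{{ item.0.stdout | regex_replace('/dev/', '') | regex_replace('p?[0-9]+$', '') }}"
    state: stopped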


Regards,
Vasishta Shastry
AQE, Ceph

Comment 9 Sébastien Han 2018-04-20 09:44:40 UTC
*** Bug 1555793 has been marked as a duplicate of this bug. ***

Comment 10 Sébastien Han 2018-04-23 21:02:22 UTC
Will be in the next release, v3.0.32.

Comment 14 Vasishta 2018-05-09 07:15:44 UTC
Working fine with ceph-ansible-3.0.32-1.el7cp.

Comment 17 errata-xmlrpc 2018-05-15 18:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563