Bug 1561456 - [ceph-ansible] [ceph-container] : shrink OSD with NVMe disks - failing as OSD services are not stopped
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z3
Target Release: 3.0
Assignee: leseb
QA Contact: Vasishta
Docs Contact: Erin Donnelly
Duplicates: 1555793 (view as bug list)
Depends On:
Blocks: 1553254 1557269 1572368 1600697
Reported: 2018-03-28 11:36 UTC by Vasishta
Modified: 2018-07-12 19:43 UTC
CC: 13 users

Doc Text:
.The `shrink-osd` playbook supports NVMe drives

Previously, the `shrink-osd` Ansible playbook did not support shrinking OSDs backed by an NVMe drive. NVMe drive support has been added in this release.
Last Closed: 2018-05-15 18:20:31 UTC


Attachments (Terms of Use)
File contains contents ansible-playbook log (70.80 KB, text/plain)
2018-03-28 11:36 UTC, Vasishta


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1563 None None None 2018-05-15 18:21 UTC
Github ceph ceph-ansible pull 2537 None None None 2018-04-20 09:15 UTC

Description Vasishta 2018-03-28 11:36:46 UTC
Created attachment 1414140 [details]
File contains contents ansible-playbook log

Description of problem:
Shrinking an OSD backed by NVMe disks fails in the task "deallocate osd(s) id when ceph-disk destroy fail" with the error "Error EBUSY: osd.<id> is still up; must be down before removal."

Version-Release number of selected component (if applicable):
ceph-ansible-3.0.28-1.el7cp.noarch

How reproducible:
Always (3/3)

Steps to Reproduce:
1. Configure containerized cluster with NVMe disks for OSDs
2. Try to shrink an OSD.


Actual results:
TASK [deallocate osd(s) id when ceph-disk destroy fail]
--------------------------------
"stderr_lines": [
        "Error EBUSY: osd.7 is still up; must be down before removal. "
    ], 

Expected results:
The OSD should be removed successfully.

Additional info:

The task "stop osd services (container)" completed with status 'ok'.

Comment 6 leseb 2018-04-12 12:14:43 UTC
Can you check if the container is still running?
Perhaps we tried to stop the wrong service.

Thanks.

Comment 7 Vasishta 2018-04-12 13:12:32 UTC
Hi Sebastien,

As I remember, the container was still running. (Unfortunately I don't have the environment at the moment.)

I think we must have tried to stop the wrong service, as I see "name": "ceph-osd@nvme0n1p" in the log (attachment). Following the naming convention, the service name should have been "ceph-osd@nvme0n1".

The logic we have in shrink-osd.yml [1] to derive the service name does not seem to work for NVMe disks.


- name: stop osd services (container)
  service:
    name: "ceph-osd@{{ item.0.stdout[:-1] | regex_replace('/dev/', '') }}"

I think it would work if we used "item.0.stdout[:-2]" for NVMe disks only.

[1] https://github.com/ceph/ceph-ansible/blob/37117071ebb7ab3cf68b607b6760077a2b46a00d/infrastructure-playbooks/shrink-osd.yml#L119-L121
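The naming issue above can be illustrated with a small Python sketch of the string handling. The `osd_service_name` helper is hypothetical (it is not the playbook's code or the upstream fix in the linked PR); it mimics the Jinja expression `item.0.stdout[:-1] | regex_replace('/dev/', '')`, which strips one trailing character to turn a partition path into a device name. That works for names like `/dev/sdb1`, but NVMe partitions end in `p<N>`, so the `p` is left behind unless the whole suffix is removed:

```python
import re

def osd_service_name(partition: str) -> str:
    """Hypothetical helper: derive a ceph-osd systemd unit name from a
    partition path. The playbook's expression effectively did
    dev[:-1], which yields 'nvme0n1p' for '/dev/nvme0n1p1'."""
    dev = partition.replace('/dev/', '')
    if re.match(r'nvme\d+n\d+p\d+$', dev):
        # NVMe partitions carry a 'p<N>' suffix; strip it entirely
        dev = re.sub(r'p\d+$', '', dev)   # nvme0n1p1 -> nvme0n1
    else:
        # classic block devices only append the partition digit
        dev = dev[:-1]                    # sdb1 -> sdb
    return f"ceph-osd@{dev}"

print(osd_service_name('/dev/sdb1'))       # ceph-osd@sdb
print(osd_service_name('/dev/nvme0n1p1'))  # ceph-osd@nvme0n1
```

This is only a sketch of the failure mode; the actual correction landed via the ceph-ansible pull request tracked above.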


Regards,
Vasishta Shastry
AQE, Ceph

Comment 9 leseb 2018-04-20 09:44:40 UTC
*** Bug 1555793 has been marked as a duplicate of this bug. ***

Comment 10 leseb 2018-04-23 21:02:22 UTC
Will be in the next release v3.0.32

Comment 14 Vasishta 2018-05-09 07:15:44 UTC
working fine with ceph-ansible-3.0.32-1.el7cp

Comment 17 errata-xmlrpc 2018-05-15 18:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1563

