Bug 1616159 - [ceph-ansible] [ceph-container] : switch from rpm to containerized - OSDs not coming up after the switch saying encrypted device still in use
Summary: [ceph-ansible] [ceph-container] : switch from rpm to containerized - OSDs not coming up after the switch saying encrypted device still in use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z3
Target Release: 3.3
Assignee: Dimitri Savineau
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1628763
 
Reported: 2018-08-15 06:55 UTC by Vasishta
Modified: 2019-12-19 17:59 UTC
CC List: 9 users

Fixed In Version: RHEL: ceph-ansible-3.2.37-1.el7cp Ubuntu: ceph-ansible_3.2.37-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1896392
Environment:
Last Closed: 2019-12-19 17:58:55 UTC
Embargoed:


Attachments
File contains contents of OSD journald log and contents of all.yml, inventory (303.51 KB, text/plain)
2018-08-15 06:55 UTC, Vasishta
File contains contents of ansible-playbook log (1.98 MB, text/plain)
2018-08-15 07:00 UTC, Vasishta
File contains playbook log (1.84 MB, text/plain)
2019-11-27 09:55 UTC, Vasishta


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 4560 0 'None' closed switch_to_containers: umount osd lockbox partition 2020-12-21 23:31:42 UTC
Github ceph ceph-ansible pull 4562 0 'None' closed switch_to_containers: umount osd lockbox partition (bp #4560) 2020-12-21 23:31:42 UTC
Github ceph ceph-ansible pull 4563 0 'None' closed switch_to_containers: umount osd lockbox partition (bp #4560) 2020-12-21 23:32:14 UTC
Github ceph ceph-ansible pull 4788 0 'None' closed switch_to_containers: fix umount ceph partitions 2020-12-21 23:32:14 UTC
Red Hat Product Errata RHSA-2019:4353 0 None None None 2019-12-19 17:59:11 UTC

Description Vasishta 2018-08-15 06:55:42 UTC
Created attachment 1476046 [details]
File contains contents of OSD journald log and contents of all.yml, inventory

Description of problem:
After switching from an rpm-based deployment to a containerized one, the OSDs do not come up; they fail with an error saying the encrypted device is still in use.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc17.el7cp.noarch
ceph-osd-12.2.5-38 
ceph-3.1-rhel-7-containers-candidate-38485-20180810211451

How reproducible:
(1/1)

Steps to Reproduce:
1. Configure an rpm-based cluster with at least one dmcrypt OSD
2. Run switch-from-non-containerized-to-containerized-ceph-daemons.yml (see the example invocation below)
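
For reference, a typical invocation looks like this (run from the ceph-ansible checkout; the inventory path is an assumption, adjust to the local setup):

ansible-playbook -i /etc/ansible/hosts \
    infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml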

Actual results:
The playbook failed; the OSDs do not come up after the new OSD service is restarted by ansible-playbook.

Expected results:
The OSDs must be up and running, and the playbook must complete its run successfully.

Comment 3 Vasishta 2018-08-15 07:00:33 UTC
Created attachment 1476048 [details]
File contains contents of ansible-playbook log

ansible-playbook log

Comment 4 Sébastien Han 2018-08-20 15:29:56 UTC
Can I get access to one machine with the problem?

Comment 6 Sébastien Han 2018-08-21 10:16:54 UTC
I'm not sure how the Ansible 'file' module works, but I suspect it walks through the directory and then applies the permissions recursively, and the object disappeared in between. But this is a different problem, I agree.

Comment 8 Giridhar Ramaraju 2019-08-05 13:08:48 UTC
Updating the QA Contact to Hemant. Hemant will reroute these to the appropriate QE Associate.

Regards,
Giri

Comment 13 Vasishta 2019-11-27 09:49:49 UTC
Hi,

None of the OSD directories were unmounted, because the task [1] was skipped for all of them. I suspect the return value of "docker ps --filter='name=ceph-osd'" at [2] was expected to be something other than the actual output -

"stdout_lines": [
        "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
    ]


[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
[2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263
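
The behaviour is easy to reproduce (a minimal sketch, assuming no ceph-osd containers are running on the host): without '-q', docker ps prints a header line even when nothing matches, so stdout is never empty and an emptiness check on it misfires.

-bash-4.2# docker ps --filter='name=ceph-osd'
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
-bash-4.2# docker ps --filter='name=ceph-osd' | wc -l
1
-bash-4.2# docker ps -q --filter='name=ceph-osd' | wc -l
0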


The playbook therefore fails while waiting for the cluster to reach active+clean, since the OSDs never come up.
Moving to ASSIGNED state.


Regards,
Vasishta Shastry
QE, Ceph

Comment 14 Vasishta 2019-11-27 09:55:31 UTC
Created attachment 1640051 [details]
File contains playbook log

Comment 15 Vasishta 2019-11-27 10:16:04 UTC
> None of the OSD directories were unmounted, because the task [1] was skipped
> for all of them. I suspect the return value of "docker ps
> --filter='name=ceph-osd'" at [2] was expected to be something other than the
> actual output -
> 
> "stdout_lines": [
>         "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
>     ]
> 
> [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263

I think we are missing '-q' at [2]

Comment 17 Guillaume Abrioux 2019-11-27 12:19:33 UTC
(In reply to Vasishta from comment #15)
> > None of the OSD directories were unmounted, because the task [1] was skipped
> > for all of them. I suspect the return value of "docker ps
> > --filter='name=ceph-osd'" at [2] was expected to be something other than the
> > actual output -
> > 
> > "stdout_lines": [
> >         "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
> >     ]
> > 
> > [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> > [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263
> 
> I think we are missing '-q' at [2]

I don't think so.

In fact, the command at https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263 always exits with status 0, whether or not a container matches, e.g.:

-bash-4.2# docker ps -a -q --filter name=ceph-mon-mon0
ecb65415cf53
-bash-4.2# echo $?
0
-bash-4.2# docker ps -a -q --filter name=foo
-bash-4.2# echo $?
0
-bash-4.2#
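
For context, the linked fix (ceph-ansible PR 4560, "switch_to_containers: umount osd lockbox partition") makes the playbook unmount the OSD lockbox partition so the dm-crypt device can be released and reopened by the containerized OSD. A rough sketch of the manual equivalent (the mountpoint follows the usual ceph-disk dmcrypt layout; the partition UUID is a placeholder):

-bash-4.2# mount | grep osd-lockbox          # lockbox partition left mounted after the switch
-bash-4.2# umount /var/lib/ceph/osd-lockbox/<partition-uuid>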

Comment 22 Vasishta 2019-12-13 11:31:41 UTC
Hi, 

Still observing that after the playbook switches the dmcrypt-enabled OSDs to containers, the OSDs won't start, reporting that the crypt devices are still in use.
Moving back to ASSIGNED state.

Was using ceph-ansible-3.2.37-1.el7cp.noarch
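
To confirm what is still holding a crypt device on an affected host, checks along these lines can help (a diagnostic sketch; the dm device name and OSD id are placeholders):

-bash-4.2# dmsetup ls --target crypt                     # list dm-crypt devices
-bash-4.2# dmsetup info <dm-name> | grep -i 'open count' # non-zero while the device is in use
-bash-4.2# fuser -vm /var/lib/ceph/osd/ceph-<id>         # processes holding the mounted OSD dir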

Regards,
Vasishta Shastry
QE, Ceph

Comment 28 errata-xmlrpc 2019-12-19 17:58:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353

