Bug 1616159

Summary: [ceph-ansible] [ceph-container] : switch from rpm to containerized - OSDs not coming up after the switch saying encrypted device still in use
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Priority: medium
Version: 3.1
CC: aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, nthomas, pasik, tserlin, vashastr
Target Milestone: z3
Target Release: 3.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.2.37-1.el7cp; Ubuntu: ceph-ansible_3.2.37-2redhat1
Clones: 1896392 (view as bug list)
Last Closed: 2019-12-19 17:58:55 UTC
Type: Bug
Bug Blocks: 1628763    
Attachments:
- File contains contents of OSD journald log and contents of all.yml, inventory
- File contains contents ansible-playbook log
- File contains playbook log

Description Vasishta 2018-08-15 06:55:42 UTC
Created attachment 1476046 [details]
File contains contents of OSD journald log and contents of all.yml, inventory

Description of problem:
Switch from rpm to containerized: OSDs are not coming up after the switch, reporting that the encrypted device is still in use.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc17.el7cp.noarch
ceph-osd-12.2.5-38 
ceph-3.1-rhel-7-containers-candidate-38485-20180810211451

How reproducible:
(1/1)

Steps to Reproduce:
1. Configure an rpm-based cluster with at least one dmcrypt OSD
2. Run switch-from-non-containerized-to-containerized-ceph-daemons.yml (see the sketch below)
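
A minimal invocation sketch for step 2 (the inventory path and working directory are assumptions; adjust to your environment):

# run from the ceph-ansible directory; /etc/ansible/hosts is a hypothetical inventory path
ansible-playbook -i /etc/ansible/hosts \
    infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml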

Actual results:
Playbook failed; OSDs are not coming up after the new OSD service is restarted by ansible-playbook.

Expected results:
OSDs must be up and running, and the playbook must complete its run successfully.

Comment 3 Vasishta 2018-08-15 07:00:33 UTC
Created attachment 1476048 [details]
File contains contents ansible-playbook log

ansible-playbook log

Comment 4 Sébastien Han 2018-08-20 15:29:56 UTC
Can I get access to one machine with the problem?

Comment 6 Sébastien Han 2018-08-21 10:16:54 UTC
I'm not sure how the Ansible 'file' module works, but I suspect it does a run through the directory and then applies the permissions recursively, and the object disappeared in between. But this is a different problem, I agree.

Comment 8 Giridhar Ramaraju 2019-08-05 13:08:48 UTC
Updating the QA Contact to Hemant. Hemant will reroute this to the appropriate QE Associate.

Regards,
Giri


Comment 13 Vasishta 2019-11-27 09:49:49 UTC
Hi,

None of the OSD directories were unmounted, as the task [1] was skipped for all directories. I suspect that the return value of "docker ps --filter='name=ceph-osd'" at [2] was expected to be something other than the actual output -

"stdout_lines": [
        "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
    ]


[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
[2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263
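
The pattern at [1] and [2] is presumably something like the following (a sketch only; the task names, the variable, and the exact 'when' condition are my assumptions, not the playbook verbatim):

- name: collect running osd containers            # sketch of the check at [2]
  command: docker ps --filter='name=ceph-osd'
  register: osd_containers

- name: umount osd directories                    # sketch of the task at [1]
  command: umount {{ item }}
  with_items: "{{ osd_dirs.stdout_lines }}"       # hypothetical variable
  when: "'ceph-osd' in osd_containers.stdout"     # header-only output never contains 'ceph-osd', so this skips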


Thus the playbook fails while waiting for the cluster to reach active+clean, as the OSDs are not coming up.
Moving to ASSIGNED state.


Regards,
Vasishta Shastry
QE, Ceph

Comment 14 Vasishta 2019-11-27 09:55:31 UTC
Created attachment 1640051 [details]
File contains playbook log

Comment 15 Vasishta 2019-11-27 10:16:04 UTC
> None of the OSD directories were unmounted, as the task [1] was skipped for
> all directories. I suspect that the return value of "docker ps
> --filter='name=ceph-osd'" at [2] was expected to be something other than the
> actual output -
> 
> "stdout_lines": [
>         "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
>     ]
> 
> [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263

I think we are missing '-q' at [2]
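
A quick demonstration of the difference (the filter value is arbitrary; any non-matching name behaves the same):

-bash-4.2# docker ps --filter name=ceph-osd
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
-bash-4.2# docker ps -q --filter name=ceph-osd
-bash-4.2#

With no match, the plain form still emits the header line (so stdout_lines is never empty), while -q emits nothing.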

Comment 17 Guillaume Abrioux 2019-11-27 12:19:33 UTC
(In reply to Vasishta from comment #15)
> > None of the OSD directories were unmounted, as the task [1] was skipped for
> > all directories. I suspect that the return value of "docker ps
> > --filter='name=ceph-osd'" at [2] was expected to be something other than the
> > actual output -
> > 
> > "stdout_lines": [
> >         "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
> >     ]
> > 
> > [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> > [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263
> 
> I think we are missing '-q' at [2]

I don't think so.

In fact, the command at https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263 always returns 0, whether or not anything matches, e.g.:

-bash-4.2# docker ps -a -q --filter name=ceph-mon-mon0
ecb65415cf53
-bash-4.2# echo $?
0
-bash-4.2# docker ps -a -q --filter name=foo
-bash-4.2# echo $?
0
-bash-4.2#
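
Since the exit status is uninformative here, a check would have to inspect the output instead (a sketch, not what the playbook currently does):

-bash-4.2# test -n "$(docker ps -a -q --filter name=ceph-mon-mon0)" && echo running
running
-bash-4.2# test -n "$(docker ps -a -q --filter name=foo)" && echo running
-bash-4.2#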

Comment 22 Vasishta 2019-12-13 11:31:41 UTC
Hi, 

Still observing that after the playbook switches OSDs for which dmcrypt is enabled, the OSDs won't start, reporting that the crypt devices are still in use.
Moving back to ASSIGNED state.

Was using ceph-ansible-3.2.37-1.el7cp.noarch
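
For the record, the leftover crypt mappings can be inspected like this (the mapping name is hypothetical; substitute the one from the OSD's error message):

-bash-4.2# dmsetup ls --target crypt          # list active dm-crypt mappings
-bash-4.2# cryptsetup status <mapping-name>   # reports whether the mapping is active and in use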

Regards,
Vasishta Shastry
QE, Ceph

Comment 28 errata-xmlrpc 2019-12-19 17:58:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353