Created attachment 1476046 [details]
File contains contents of OSD journald log and contents of all.yml, inventory

Description of problem:
Switch from RPM-based to containerized deployment - OSDs do not come up after the switch, reporting that the encrypted device is still in use.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc17.el7cp.noarch
ceph-osd-12.2.5-38
ceph-3.1-rhel-7-containers-candidate-38485-20180810211451

How reproducible:
(1/1)

Steps to Reproduce:
1. Configure an RPM-based cluster with at least one dmcrypt OSD
2. Run switch-from-non-containerized-to-containerized-ceph-daemons.yml (see the example invocation below)

Actual results:
The playbook failed; OSDs do not come up after the new OSD service is restarted by the playbook.

Expected results:
OSDs must be up and running, and the playbook must complete its run successfully.
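For reference, a hypothetical invocation of the switch playbook used in step 2; the inventory path is an assumption about a typical setup and is not taken from the attached logs.

# Hypothetical invocation (inventory path assumed, not from the attached logs);
# run from the ceph-ansible directory, e.g. /usr/share/ceph-ansible
ansible-playbook -i /etc/ansible/hosts \
    infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml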
Created attachment 1476048 [details]
File contains contents of the ansible-playbook log
Can I get access to one machine with the problem?
I'm not sure how the ansible 'file' module works, but I suspect it does a run through the directory and then applies the permissions recursively, and the object disappeared in between. But this is a different problem, I agree.
Updating the QA Contact to Hemant. Hemant will reroute these to the appropriate QE Associate.

Regards,
Giri
Hi,

None of the OSD directories were unmounted, as the task [1] was skipped for all directories. I suspect that the return value of "docker ps --filter='name=ceph-osd'" at [2] was expected to be something other than the actual output -

"stdout_lines": [
    "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
]

[1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
[2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263

The playbook therefore fails while waiting for the cluster to reach active+clean, as the OSDs do not come up.

Moving to ASSIGNED state.

Regards,
Vasishta Shastry
QE, Ceph
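For illustration, the behaviour described above can be reproduced on any host with no ceph-osd container running (this transcript is illustrative, not taken from the affected machine): without '-q', docker ps prints its column header even when nothing matches the filter, so the registered stdout_lines is never empty.

# Illustrative run, not from the affected machine: with no matching container,
# `docker ps` still prints the column header, so stdout is non-empty.
-bash-4.2# docker ps --filter='name=ceph-osd'
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES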
Created attachment 1640051 [details]
File contains playbook log
> None of the OSD directories were unmounted, as the task [1] was skipped for
> all directories. I suspect that the return value of "docker ps
> --filter='name=ceph-osd'" at [2] was expected to be something other than the
> actual output -
>
> "stdout_lines": [
>     "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
> ]
>
> [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263

I think we are missing '-q' at [2]
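To illustrate what '-q' would change (again an illustrative run, not from the affected machine): the quiet flag suppresses the header and prints only container IDs, so the output is genuinely empty when no ceph-osd container matches the filter.

# Illustrative run: with '-q' the header is suppressed, so no output means
# no matching container.
-bash-4.2# docker ps -q --filter='name=ceph-osd'
-bash-4.2#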
(In reply to Vasishta from comment #15)
> > None of the OSD directories were unmounted, as the task [1] was skipped for
> > all directories. I suspect that the return value of "docker ps
> > --filter='name=ceph-osd'" at [2] was expected to be something other than the
> > actual output -
> >
> > "stdout_lines": [
> >     "CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES"
> > ]
> >
> > [1] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L274
> > [2] https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263
>
> I think we are missing '-q' at [2]

I don't think so. In fact, https://github.com/ceph/ceph-ansible/blob/stable-3.2/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml#L263 always returns 0, eg:

-bash-4.2# docker ps -a -q --filter name=ceph-mon-mon0
ecb65415cf53
-bash-4.2# echo $?
0
-bash-4.2# docker ps -a -q --filter name=foo
-bash-4.2# echo $?
0
-bash-4.2#
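A minimal sketch of the implication (not the playbook's actual logic): since the exit code is 0 whether or not a container matches the filter, any gating has to look at the quiet output being empty rather than at the return code.

# Sketch only, not the playbook's actual condition: check the (quiet) output,
# not the exit code, to decide whether any ceph-osd container is running.
if [ -z "$(docker ps -a -q --filter name=ceph-osd)" ]; then
    echo "no ceph-osd container found; safe to proceed with unmounting OSD directories"
fi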
Hi,

Still observing that after the playbook tries to switch OSDs for which dmcrypt is enabled, the OSDs won't start, reporting that the crypt devices are still in use.

Moving back to ASSIGNED state.

Was using ceph-ansible-3.2.37-1.el7cp.noarch.

Regards,
Vasishta Shastry
QE, Ceph
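For anyone triaging the "crypt devices still in use" state, a hedged diagnostic sketch (the mapping name is a placeholder, not taken from the logs in this bug):

# Diagnostic sketch only; <mapping-name> is a placeholder
dmsetup ls --target crypt            # list dm-crypt mappings still present
cryptsetup status <mapping-name>     # show whether the mapping is still active
lsblk                                # check whether the mapped device is still mounted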
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:4353