Bug 1505011
Summary: | [Ceph-Ansible 3.0.3-1.el7cp] don't default restart machines during purge-cluster | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vasu Kulkarni <vakulkar> |
Component: | Ceph-Ansible | Assignee: | Sébastien Han <shan> |
Status: | CLOSED ERRATA | QA Contact: | Vasu Kulkarni <vakulkar> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 3.0 | CC: | adeza, anharris, aschoen, ceph-eng-bugs, ceph-qe-bugs, gmeno, hnallurv, kdreyer, nthomas, sankarshan |
Target Milestone: | rc | ||
Target Release: | 3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-ansible-3.0.6-1.el7cp Ubuntu: ceph-ansible_3.0.6-2redhat1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-12-05 23:48:29 UTC | Type: | Bug |
Description
Vasu Kulkarni
2017-10-21 02:37:09 UTC
The restart is definitely not optimal and should be removed if possible. This is a bad workaround: it means we are giving up on finding the root cause of why /var/lib/ceph cannot be deleted. Our CI does not reproduce this, so it would be nice to investigate in your environment, at least to understand why this directory cannot be removed. Thanks in advance for your help.

Sébastien, I think the issue is in Ceph itself. It pops up now and then because it is highly dependent on the order of the purge and on which background tasks are running. We have seen this problem in ceph-deploy too, where a few monitor locks in /var/lib/ceph/mon prevent it from cleaning up that directory. I believe you could fail purge-cluster here as well if it fails to clean up /var/lib/ceph; users can then rerun it with reboot enabled, which could be documented.

A recent example from ceph-deploy, where a few tests failed due to stale locks on /var/lib/ceph/mon even after purge:
http://pulpito.ceph.com/teuthology-2017-10-21_05:55:02-ceph-deploy-luminous-distro-basic-vps/
Command failed on vpm091 with status 1: 'sudo tar cz -f /tmp/tmpPRsXUl -C /var/lib/ceph/mon -- .'

Moved this to 3.0 since it looks like it's blocking Vasu's testing.

https://github.com/ceph/ceph-ansible/releases/tag/v3.0.6 contains the fix. Ken, please build a new package. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:3387
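
For readers following the thread, the suggestion made above (fail the purge when /var/lib/ceph cannot be cleaned up, rather than rebooting by default) could be sketched roughly as below. This is only an illustration in Python, not the ceph-ansible implementation, which is written as Ansible playbooks. The names `purge_data_dir` and `PurgeError` are hypothetical and do not come from the bug report or from ceph-ansible.

```python
import shutil
from pathlib import Path


class PurgeError(RuntimeError):
    """Hypothetical error: the data directory survived the purge attempt."""


def purge_data_dir(path):
    """Try to delete `path`; fail loudly instead of rebooting if it remains."""
    target = Path(path)
    # ignore_errors=True lets the removal continue past individual failures
    # (for example a stale lock); the existence check below decides success.
    shutil.rmtree(target, ignore_errors=True)
    if target.exists():
        raise PurgeError(
            f"{target} could not be removed; rerun the purge with reboot enabled"
        )
```

Under this assumption, a caller would run `purge_data_dir("/var/lib/ceph")` and treat a `PurgeError` as the signal to rerun the purge with the reboot option turned on.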