Bug 1624341
| Summary: | All OSDs down after OSP FFU | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Gregory Charot <gcharot> |
| Component: | Container | Assignee: | Erwan Velu <evelu> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Vasishta <vashastr> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.1 | CC: | ceph-eng-bugs, evelu, gabrioux, gfidente, hnallurv, mbracho, shan |
| Target Milestone: | rc | | |
| Target Release: | 3.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-09 12:22:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1578730 | | |
| Attachments: | | | |
Description
Gregory Charot
2018-08-31 09:23:43 UTC
Created attachment 1480063
osd-logs
osd-logs - error at the end of the file
Created attachment 1480065
ceph-ansible-logs
ceph-ansible logs from mistral
Created attachment 1480066
THT
@Gregory, if you want this bug to be in the 3.1 release notes, please add 1584264 to the Blocks field. Currently this bug is not targeted to 3.1 (GA Sep 12th).
I investigated the issue and found some improvements that should avoid this situation: https://github.com/ceph/ceph-container/pull/1179
That said, I don't know what default grace time we have in the product, but I'd suggest at least 30 seconds to avoid docker sending a SIGKILL too soon. Could someone check whether the default grace time can be increased as well? Improving our code is fine, but it would be safer to increase the grace time too.
Latest available version is ceph-ansible-3.2.0-1.el7cp from http://access.redhat.com/errata/RHBA-2019:0020
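
For readers unfamiliar with the grace-time concern raised in this thread, the following is a minimal Python sketch of the general stop pattern: send SIGTERM, wait for a grace period, and only then fall back to SIGKILL. It is not the ceph-container or ceph-ansible code. The names `stop_with_grace` and `spawn_child`, and the child script standing in for an OSD that needs time to shut down, are hypothetical and exist only for illustration. The sketch assumes a POSIX system.

```python
import subprocess
import sys


def stop_with_grace(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns "terminated" if the process exited within the grace period,
    "killed" if it had to be force-killed.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: no chance left to clean up
        proc.wait()
        return "killed"


# Hypothetical stand-in for a daemon whose clean shutdown takes
# sys.argv[1] seconds once SIGTERM arrives.
CHILD = """
import signal, sys, time

def on_term(signum, frame):
    time.sleep(float(sys.argv[1]))  # simulated cleanup work
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def spawn_child(cleanup_seconds):
    """Start the stand-in daemon and wait until its SIGTERM handler is set."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(cleanup_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # blocks until the child prints "ready"
    return proc
```

With a grace period longer than the cleanup time the child exits on its own; with a shorter one it is force-killed mid-cleanup, which is the situation the suggested 30-second minimum is meant to avoid. On the docker CLI, the equivalent knob is the `-t`/`--time` option of `docker stop`.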