| Field | Value |
|---|---|
| Summary | Most of the instances fail to boot. Need RCA |
| Product | Red Hat OpenStack |
| Component | openstack-cinder |
| Reporter | Jaison Raju \<jraju\> |
| Assignee | Jon Bernard \<jobernar\> |
| Status | CLOSED NOTABUG |
| QA Contact | nlevinki \<nlevinki\> |
| Severity | urgent |
| Priority | unspecified |
| Version | 6.0 (Juno) |
| CC | areis, berrange, dpeacock, eharney, esandeen, fahmed, fjayalat, jobernar, jraju, knoel, kwolf, lyarwood, pbandark, sgotliv, sputhenp, swhiteho, tbarron, yeylon |
| Target Milestone | --- |
| Target Release | 8.0 (Liberty) |
| Hardware | All |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2016-02-25 13:53:06 UTC |
| Type | Bug |
Description

Jaison Raju, 2016-02-03 10:06:08 UTC

Created attachment 1120711 [details] — console
Created attachment 1120712 [details] — console

The only thing I can add is that the order of events is unclear (to me). If a guest filesystem was mounted on the compute host, that could certainly cause the corruption we're seeing. But if the corruption happened earlier, and the host mounting was an attempt to investigate and/or repair a previously corrupted filesystem, then this may not be the initial cause.

Most of their instances (both Linux and Windows) fail to boot. We need to formulate an action plan, together with Engineering, to recover them:

- Assess the current level of corruption and determine whether it can be recovered.
- If it can be recovered, create an action plan together with Engineering and execute it.
- I would prefer a Bomgar session with CEE, Engineering, and the Field team (Felix Tsang) to recover one Linux and one Windows VM. Engineering can then disengage and GSS can handle the rest of the instances. We can then concentrate on the RCA as sev2.

I have just heard from the SA that they are trying to rebuild these instances from images; the RCA is now the first priority, so that we can prevent this from happening again.

Rather than a dd of the first 10 MB of the corrupted volume, please use the xfs_metadump tool to capture all of the metadata. If filenames and attributes are not considered sensitive information, please use the "-o" option. Then compress the dump and attach it to this bug. Thanks, -Eric

Created attachment 1123080 [details] — for the attachment to comment 26

This is not a Cinder bug, and most probably not even an OpenStack bug. Once you discover what is running os-prober, reopen and assign to the relevant component.
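Eric's metadata-capture request can be sketched as a short shell sequence. The device path and output filename below are placeholders, not values taken from this bug; substitute the actual Cinder volume device:

```shell
# Placeholder device: replace with the corrupted volume's block device.
DEV=/dev/vdb
DUMP=/tmp/volume.metadump

# xfs_metadump copies only the filesystem metadata (no file data) into a
# dump image suitable for attaching to a bug report. Filenames and
# extended attributes are obfuscated by default; -o disables that
# obfuscation, so pass it only if names are not considered sensitive.
xfs_metadump -o "$DEV" "$DUMP"

# Compress before attaching.
gzip "$DUMP"
```

The dump can later be restored to a sparse image with `xfs_mdrestore` for offline analysis, without ever needing the original volume.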
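For context on the closing comment: os-prober is run by grub2-mkconfig to find other operating systems, and in doing so it can probe and mount every visible block device, including attached guest volumes, which matches the corruption pattern described above. A minimal sketch of checking for and disabling it on a RHEL-family compute host, assuming the standard /etc/default/grub layout:

```shell
# Check whether os-prober is installed on the compute host.
rpm -q os-prober

# If it is, either remove the package or tell GRUB not to invoke it,
# so grub2-mkconfig stops probing (and mounting) guest volumes.
echo 'GRUB_DISABLE_OS_PROBER=true' >> /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg
```

This is a mitigation sketch, not a fix taken from the bug itself; the bug was closed NOTABUG once os-prober was identified as the likely actor.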