Bug 1230759
Summary: | nova fails to evacuate instance due to invalid shared storage state | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Fabio Massimo Di Nitto <fdinitto> | |
Component: | openstack-nova | Assignee: | Vladik Romanovsky <vromanso> | |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | nlevinki <nlevinki> | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 7.0 (Kilo) | CC: | berrange, dasmith, eglynn, fdinitto, jschluet, kchamart, rscarazz, sbauza, sferdjao, sgordon, srevivo, vromanso | |
Target Milestone: | z5 | Keywords: | ZStream | |
Target Release: | 7.0 (Kilo) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1295603 | Environment: | ||
Last Closed: | 2017-07-18 13:45:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1185030, 1251948, 1261487, 1295603 |
Description
Fabio Massimo Di Nitto
2015-06-11 13:42:20 UTC
I have been able to trigger this problem with shared storage as well. Raising severity.

I have tested the scratch build provided to me here: http://download.devel.redhat.com/brewroot/work/tasks/7275/9347275/ which is supposed to be 2015.1.0-4 plus the fix for #1230237. With it I have successfully tested failover and creation of instances for over 5 hours without any glitch. At this point I can only suspect a regression between .4 and .9.

One extra piece of information that might be useful. When I first switched from local to shared storage with the .8+patch build, I followed this process:

1) stopped nova everywhere
2) wiped /var/lib/nova/instances clean on all nodes
3) mounted the NFS export on /var/lib/nova/instances (it was already clean)
4) started nova again across the board

I recall pretty clearly that the /var/lib/nova/instances/compute_nodes file was NOT there. I was looking for it out of curiosity (since I had seen it on a non-shared-storage installation) and I was interested to see how its contents change with shared storage. I thought that was normal and did not give it any weight.

After rolling back to .4+patch (stop everything, wipe everything, downgrade, start), the file is now there, with all the relevant info about the registered compute nodes that can access a given shared storage.

Perhaps that could be part of the reason why we see the problem with shared storage. Maybe it's not relevant at all, but I thought it might be good to know anyway.

After a full redeploy with the .10 packages, I have been unable to reproduce this problem (with shared storage). I am lowering the priority, even though the severity remains unchanged (due to the potential impact on customers).

I suspect that the move from non-shared to shared storage confused some internal state (even though /var/lib/nova/instances was properly wiped on all nodes while the services were shut down). On a fresh install the problem does not happen. Perhaps there is a flag somewhere in the db that is not updated properly? Just a guess at this point.
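For anyone retracing the check described above, here is a minimal sketch of how one might inspect the compute_nodes file on a node. It assumes the file is a JSON object mapping each compute hostname to a last-seen timestamp; that layout is an assumption and is not confirmed anywhere in this report. The helper name read_storage_users is hypothetical and is not part of nova.

```python
import json
from pathlib import Path


def read_storage_users(instances_dir):
    """Return the parsed contents of <instances_dir>/compute_nodes.

    Returns None when the file does not exist, which is the situation
    described above after the switch to shared storage.
    Assumption: the file holds a JSON object {hostname: timestamp}.
    """
    path = Path(instances_dir) / "compute_nodes"
    if not path.is_file():
        return None
    with path.open() as f:
        return json.load(f)
```

Calling read_storage_users("/var/lib/nova/instances") on each compute node and comparing the results would show whether every node that mounts the export has registered itself in the file.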
Hi Fabio, any further re-occurrences of this?

I haven't seen it since comment #6 with shared storage. No testing has been done without shared storage.

Since we haven't had any reports of this being reproduced since https://bugzilla.redhat.com/show_bug.cgi?id=1230759#c6, where Fabio notes he was not seeing it with the .10 version of the packages, I am closing this. Please re-open if this issue re-occurs.