Update instance host and task state when post live migration fails

If a live migration fails during post-processing, the instance can end up shut down on the source node and stuck in the migrating task state. By that point the instance is already running on the target node, so the instance host and task state should be updated to reflect that.
I talked with Joachim on IRC this morning - we mentioned coming up with a workaround, since the upstream fix might be touchy and take a while. Having looked over the sosreports and code, the only workaround I can come up with is manually updating instance.host in the database once a situation like this has been detected. It sucks, but once the system is in an inconsistent state, I don't see another way of fixing it.

Also, be aware that because post_live_migration failed on the source, some other things didn't get done:

* VIFs on the source weren't unplugged.
* Port bindings weren't updated to reflect the instance being on the destination.

There are a bunch of other bugs that I believe are either identical or similar enough that it's worth it to analyse them all before coming up with a fix. That's my next step, and then I'll post a patch.
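For illustration only, the manual workaround could look roughly like the sketch below. It runs against a throwaway in-memory SQLite table, not a real Nova database. The table layout (an `instances` table with `uuid`, `host`, `task_state` and `deleted` columns), the helper names `make_demo_db` and `repoint_instance`, and the host names are all assumptions made for the example. Clearing `task_state` alongside `host` follows the bug title rather than anything stated in the comment above. It does not address the VIF and port binding leftovers.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the instances table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "uuid TEXT PRIMARY KEY, host TEXT, task_state TEXT, deleted INTEGER)"
    )
    # One instance stranded mid-migration: the record still points at the source.
    conn.execute(
        "INSERT INTO instances VALUES ('uuid-1', 'source-host', 'migrating', 0)"
    )
    conn.commit()
    return conn


def repoint_instance(conn, instance_uuid, dest_host):
    """Point a stranded instance record at the destination host.

    Only touches rows that are not deleted and are still marked as
    migrating, so running it twice is harmless. Returns the number of
    rows changed.
    """
    cur = conn.execute(
        "UPDATE instances SET host = ?, task_state = NULL "
        "WHERE uuid = ? AND deleted = 0 AND task_state = 'migrating'",
        (dest_host, instance_uuid),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    db = make_demo_db()
    changed = repoint_instance(db, "uuid-1", "dest-host")
    print("rows updated:", changed)
```

The `task_state = 'migrating'` guard in the WHERE clause is a deliberate choice: it keeps the statement from touching instances that are in a healthy state.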
Since Joachim asked on IRC, here's the list of BZs that I think are related/duplicates:

* bz 1289858
* bz 1630771
* bz 1636280
Bz 1289858 and bz 1630771 are indeed identical, and are both caused by failures when calling Cinder - either Cinder itself, or something "in front" of Cinder, like Keystone or HAProxy.

I've proposed [1] upstream. It's really "dumb", but by virtue of being "dumb" it's also simple and minimizes potential side effects. Since the single common root cause in all 3 bugs is the external API call, I've just wrapped it in a try/except. Let's see what the community thinks.

[1] https://review.openstack.org/609517

PS: Bz 1636280 is similar but unrelated, and will need a different fix.
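As a rough sketch of the try/except idea (not the actual patch in [1]), the shape is something like the following. The `Instance` class, the `post_live_migration` function and the `cleanup_volumes` callable are hypothetical stand-ins invented for this example, not Nova's real code. The point is only that a failure in the external call gets logged and swallowed, so the host and task state update still happens.

```python
import logging

LOG = logging.getLogger(__name__)


class Instance:
    """Hypothetical, minimal stand-in for an instance record."""

    def __init__(self, host, task_state):
        self.host = host
        self.task_state = task_state


def post_live_migration(instance, dest_host, cleanup_volumes):
    """Finish a live migration on the source side.

    cleanup_volumes is a hypothetical callable standing in for the
    external API call (for example, to Cinder) that can fail.
    """
    try:
        cleanup_volumes(instance)
    except Exception:
        # The guest is already running on the destination, so log the
        # failure and carry on instead of aborting the whole cleanup.
        LOG.exception("Volume cleanup failed during post live migration")

    # Reached whether or not the external call succeeded.
    instance.host = dest_host
    instance.task_state = None
    return instance


if __name__ == "__main__":
    def unreachable_cinder(_instance):
        raise RuntimeError("volume API unreachable")

    inst = post_live_migration(
        Instance("source-host", "migrating"), "dest-host", unreachable_cinder
    )
    print(inst.host, inst.task_state)
```

Catching a broad `Exception` is what makes this "dumb": it does not try to tell a Keystone timeout from an HAProxy error, it just refuses to let any of them leave the instance record pointing at the wrong host.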
*** Bug 1630771 has been marked as a duplicate of this bug. ***
*** Bug 1289858 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0074