Bug 1572017
Summary: | Undercloud neutron and heat db corrupted after commenting out network-isolation.yaml from answers.yaml and doing overcloud deploy | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Stan Toporek <stoporek> |
Component: | openstack-heat | Assignee: | Thomas Hervé <therve> |
Status: | CLOSED WORKSFORME | QA Contact: | Ronnie Rasouli <rrasouli> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 12.0 (Pike) | CC: | bfournie, bhaley, emacchi, mbayer, mburns, sbaker, shardy, srevivo, stoporek, therve |
Target Milestone: | --- | Keywords: | Triaged, ZStream |
Target Release: | 12.0 (Pike) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-10-03 06:56:48 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Stan Toporek
2018-04-26 00:58:30 UTC
Several things here:

1) "A update of the overcloud was attempted and failed"
We'll need a dedicated BZ about this update failure so we can investigate it properly.

2) "A suggestion was made to comment out the network-isolation.yaml from answers.yaml"
Who made the suggestion? Can you also share your templates?

3) "This caused the undercloud neutron and heat db's corrupted during another update of the overcloud"
I'm not sure about that statement. What made you think the database was corrupted? What was the symptom?

Before going to the next steps, I would like answers to these questions so that we can help you efficiently.

Created attachment 1427695
Heat Templates
Answer to 1: In contact with Andrew Ludwar about the redeploy failure; opening another bug, 1572686.
Answer to 2: Provided; the templates are attached to this bug.
Answer to 3: The redeploy created new UUIDs for the subnets and attempted to build new subnets. The subnets already existed, so creation failed, but the new UUIDs replaced the original UUIDs from the first deploy in the neutron and heat databases. That is the core of the issue. I cannot upload the databases due to size limitations. I will attempt to update the bug with a diff of the databases made with mysqldbcompare.

Any updates?

Cannot recreate the condition, where removing network-isolation.yaml from the deploy creates new networks, in a lab instance of an OSP 12 stack. I could recreate the first deployment error from bug 1572686. I then ran a deploy without network-isolation.yaml, and it worked without changing any network or subnet UUIDs. Will continue trying to replicate the issue.

Any updates on how to help the customer recover from this? They need to add more storage and compute nodes. Is there a way to use SQL to get the data needed to reconstruct the heat DB from the overcloud DBs? Could we extract that data and rebuild it into a database that we could restore into the undercloud? I am sure this is not the last time this situation will happen, so having a solution would be very helpful in the future.

Created attachment 1441229
Production overcloud db of running stack
Possible source of the network UUIDs needed for heat template DB recovery
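The answer to question 3 says the redeploy replaced the subnets' original UUIDs with new ones. A minimal sketch, not taken from this bug, of how that replacement could be confirmed: it assumes two JSON snapshots in the shape produced by `openstack subnet list -f json` (objects with `ID` and `Name` keys), one taken before and one after the redeploy, and it assumes subnet names are unique. The subnet names and UUIDs in the example are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def index_by_name(rows: List[dict]) -> Dict[str, str]:
    """Map subnet name -> UUID, assuming rows shaped like `openstack subnet list -f json`."""
    # Assumes names are unique; a duplicated name keeps only the last row seen.
    return {row["Name"]: row["ID"] for row in rows}


def changed_uuids(before: List[dict], after: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old_id, new_id)} for subnets present in both snapshots whose UUID differs."""
    old = index_by_name(before)
    new = index_by_name(after)
    # Only names present in both snapshots are compared; added or removed subnets are ignored.
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


if __name__ == "__main__":
    # Hypothetical snapshots; real ones would be read with json.load() from saved CLI output.
    before = json.loads('[{"ID": "1111", "Name": "internal_api_subnet"},'
                        ' {"ID": "2222", "Name": "storage_subnet"}]')
    after = json.loads('[{"ID": "9999", "Name": "internal_api_subnet"},'
                       ' {"ID": "2222", "Name": "storage_subnet"}]')
    for name, (old_id, new_id) in changed_uuids(before, after).items():
        print(f"{name}: {old_id} -> {new_id}")
```

With the example data this prints `internal_api_subnet: 1111 -> 9999`, flagging the one subnet whose UUID changed. Keying on the name rather than the UUID is the point of the comparison: the UUID is exactly what the redeploy is suspected of rewriting.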
Thanks, Thomas. What do we need to look at in the neutron database?

Is this still an issue, and is there any way I can help?

Sorry, maybe my question should have been: given that there are two neutron DBs linked here, what problems occur at startup, or what differences do we need to investigate? I'm guessing we have to make sure things are synced correctly based on the old heat template? Please let me know if we need to set up a call, as it's still not clear to me what needs to be done with the DBs.

We fixed the database issue. Now we're seeing some issues with Ceph, but hopefully they are close to being resolved.