Bug 1462306
Summary: | Failed create Instance under load: Remote error: NoSuchColumnError | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Yuri Obshansky <yobshans>
Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn>
Status: | CLOSED NOTABUG | QA Contact: | Joe H. Rahme <jhakimra>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 11.0 (Ocata) | CC: | berrange, dasmith, eglynn, kchamart, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-06-30 14:15:16 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Yuri Obshansky
2017-06-16 16:56:20 UTC
Created attachment 1288416 [details]
controller nova-conductor.log
Created attachment 1288417 [details]
controller mysqld.log
Your conductor log indicates two things:

1. Occasionally nova times out waiting for neutron.
2. A _lot_ of database traffic is taking a very long time to complete.

Both of these could come from simply overwhelming the system (maybe the database?) with too much traffic.

The NoSuchColumnError sounds like incomplete setup to me, as that should never be possible unless the schema doesn't match the code. Since you didn't provide the whole log, it's hard to draw much of a conclusion from what you have provided. I would double-check your deployment and make sure that you have synced your schema levels to match the code on all databases.

For diagnosing the issues in the conductor log, I would start by checking the database load to see if it's beyond a reasonable level for your deployment. Next, I would figure out why neutron is timing out and try to resolve that. I'm not sure what your deployment (hardware) looks like, but 20 parallel threads of the load you described is a LOT of traffic.

Hi Dan,

I re-sent you the mail that describes the test configuration and reports. Unfortunately, I cannot attach it to the bug. But I'll retest everything starting from June 25 and update you with the results.

Regarding traffic: I performed the same test on the same hardware on RHOS9 and 10 without failures. The bug reproduces only in RHOS11. And I don't think this is a LOT of traffic. It is ONLY 20 threads for 3 controllers and 6 computes, where each server is:

Dell Inc. PowerEdge R620/0KCKR5
Red Hat Enterprise Linux Server release 7.3 (Maipo)
24 x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
65758316 kB

Yuri

Thanks for the email context.

It's definitely a lot of traffic. Knowing that the same deployment was able to handle it in previous releases is a good data point.

The errors in the conductor log are almost definitely related to stress on the database (regardless of where it's coming from). It sounds like the people who have replied on your mail thread have some ideas to resolve that.

The NoSuchColumn error is structural. It should either always happen, or never happen. It indicates that the schema of some database we're talking to doesn't match what we expect it to be. I can't really think of any reason that would be dependent on load.

Because the other errors indicate an overstressed database, I would say you should resolve those things and then see if we're still hitting the NoSuchColumn error.

(In reply to Dan Smith from comment #5)
> Thanks for the email context.
>
> It's definitely a lot of traffic. Knowing that the same deployment was able
> to handle it in previous releases is a good data point.
>
> The errors in the conductor log are almost definitely related to stress on
> the database (regardless of where it's coming from). It sounds like the
> people who have replied on your mail thread have some ideas to resolve that.

I'm going to monitor the database and update everyone with the results.

> The NoSuchColumn error is structural. It should either always happen, or
> never happen. It indicates that the schema of some database we're talking to
> doesn't match what we expect it to be. I can't really think of any reason
> that would be dependent on load.

I'll ping you to check the database structure when I deploy RHOS11 next week. Is that OK?

> Because the other errors indicate an overstressed database, I would say you
> should resolve those things and then see if we're still hitting the
> NoSuchColumn error.

If you reproduce this, please re-open and needinfo me. We're going to close this to get it off our dashboard.
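
The point made in the thread, that a missing-column error is structural rather than load-dependent, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: an in-memory SQLite database stands in for the real one, and the table and column names (`instances`, `new_col`) are hypothetical, not Nova's actual schema. It also does not reproduce SQLAlchemy's `NoSuchColumnError` itself; it only shows that a query against a schema lacking an expected column fails on every attempt, however many times it runs.

```python
import sqlite3

# Hypothetical columns the "code" expects; not Nova's real schema.
EXPECTED_COLUMNS = ["id", "uuid", "new_col"]


def make_stale_db():
    """Create an in-memory database whose schema predates `new_col`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, uuid TEXT)")
    return conn


def missing_columns(conn, table, expected):
    """Return the expected column names that `table` lacks, sorted."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must be trusted input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    actual = {row[1] for row in rows}
    return sorted(set(expected) - actual)


def count_failures(conn, attempts):
    """Run the same query `attempts` times and count how often it fails."""
    failures = 0
    for _ in range(attempts):
        try:
            conn.execute("SELECT id, uuid, new_col FROM instances")
        except sqlite3.OperationalError:
            # SQLite reports the unknown column when preparing the statement.
            failures += 1
    return failures


if __name__ == "__main__":
    conn = make_stale_db()
    print("missing:", missing_columns(conn, "instances", EXPECTED_COLUMNS))
    print("failures out of 5:", count_failures(conn, 5))
```

On a real deployment the schema level would normally be checked with Nova's own tooling (for example `nova-manage db version` and `nova-manage api_db version`) on each database, rather than with a script like this.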