Bug 1571309
| Summary: | overcloud deployment fails due to cinder-manage db sync timeout exceeded | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | bigswitch <rhosp-bugs-internal> |
| Component: | openstack-neutron | Assignee: | Assaf Muller <amuller> |
| Status: | CLOSED WORKSFORME | QA Contact: | Toni Freger <tfreger> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 12.0 (Pike) | CC: | abishop, amuller, chrisw, mburns, nyechiel, srevivo |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-02 14:01:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
bigswitch
2018-04-24 13:40:00 UTC
hi, the sosreport is larger than 20MB. is there an alternative way to upload/share it?

Hello, Does this need extra info from our end to debug this further? Also, can you point to some doc that describes how to share attachments that are larger than 20MB? Thanks! Aditya Vaja

This seems more of a neutron issue if cinder is failing due to a network connectivity problem. Another thing to note is that in OSP-12, cinder runs on the baremetal host.

> Another thing to note is that in OSP-12, cinder runs on the baremetal host.
Yep - cinder is on baremetal, but glance-api is containerized.
And the error started happening after updating the container images to latest available tag. So I thought it would be somehow related :)
It could be a neutron issue. However, the IP configured for glance-api in cinder.conf is not observed in the setup (when checking by running `ip a` on controller and compute nodes). Not sure if it's a VIP (virtual IP), which is why I thought it might be a configuration issue for cinder.
Just my 2 cents.
- Aditya
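
For anyone repeating the `ip a` check described above, here is a minimal Python sketch of the same idea: collect the addresses reported by `ip a` and test whether a given address is among them. The function names and the sample address in the usage note are hypothetical and are not taken from this setup; it only tells you whether the address is configured on the node where it runs.

```python
import re
import subprocess


def addresses_from_ip_output(text):
    """Return the set of IPv4/IPv6 addresses listed in `ip a` style output."""
    # Matches lines such as "inet 10.0.0.5/24 ..." or "inet6 fe80::1/64 ..."
    return set(re.findall(r"\binet6?\s+([0-9a-fA-F:.]+)", text))


def is_configured_locally(address, ip_output=None):
    """Check whether `address` appears on any local interface.

    If `ip_output` is None, `ip a` is run on the current host;
    otherwise the supplied text is parsed (useful for saved output).
    """
    if ip_output is None:
        ip_output = subprocess.run(
            ["ip", "a"], capture_output=True, text=True, check=True
        ).stdout
    return address in addresses_from_ip_output(ip_output)
```

Usage would be something like `is_configured_locally("172.16.2.10")` on each controller and compute node, substituting the glance-api address found in cinder.conf.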
Please attach an sosreport.

Hi Assaf, Can you let me know how to attach or share sosreport which is greater than 20MB? Thanks! - Aditya

ah, nvm. found it in the KB: https://access.redhat.com/solutions/2112

Created attachment 1427369 [details]
sosreport part 1
Created attachment 1427375 [details]
sosreport part 2
Created attachment 1427376 [details]
sosreport part 3
Created attachment 1427377 [details]
sosreport part 4
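
For reference, splitting a large file into parts that each fit under the 20MB attachment limit can be sketched in Python as below. This is not necessarily the method the KB article describes; the function name, the `.partNN` naming scheme, and the 19 MiB default are assumptions chosen for illustration.

```python
from pathlib import Path


def split_file(path, max_bytes=19 * 1024 * 1024):
    """Split `path` into numbered parts of at most `max_bytes` bytes each.

    Parts are written next to the source file as <name>.part01, <name>.part02, ...
    Returns the list of part paths in order.
    """
    src = Path(path)
    parts = []
    with src.open("rb") as fh:
        index = 1
        while True:
            chunk = fh.read(max_bytes)
            if not chunk:
                break  # end of file reached
            part = src.with_name(f"{src.name}.part{index:02d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```

Concatenating the parts in order reproduces the original file byte for byte.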
Hello Assaf, I've attached the sosreport by splitting it into 4 parts, since the file was larger than 20MB. Please let me know if I can provide any other information to help debug it. Thanks! - Aditya

Hi, We found a similar issue in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1452082 - which might be the same root cause. We have a single overcloud controller as well. Does that require a change to the deploy command? Our current deployment command looks like this:

openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -e /home/stack/templates/node-info.yaml \
  -e /home/stack/templates/overcloud_images.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/templates/bigswitch-p.yaml \
  -e /home/stack/templates/bigswitch_images.yaml \
  --ntp-server 10.8.29.9 --timeout 150

The roles_data.yaml override is only there to enable an extra service on compute for BSN. Everything else is default. Please let us know if this helps and if we can provide more info. Thanks! - Aditya

moving needinfo to target Assaf directly

We tried an HA setup as well. Still the same problem, where the deployment failed at step 3:

(undercloud) [stack@rhosp12-director ~]$ openstack stack failures list --long overcloud | grep Error
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: Command exceeded timeout",
"Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Command exceeded timeout",
(undercloud) [stack@rhosp12-director ~]$

"Debug: Finishing transaction 55716760",
"Debug: Storing state",
"Debug: Stored state in 0.09 seconds",
"Notice: Applied catalog in 330.24 seconds",
"Debug: Applying settings catalog for sections reporting, metrics",
"Debug: Finishing transaction 99257380",
"Debug: Received report to process from overcloud-controller-0.bigswitch.com",
"Debug: Processing report from overcloud-controller-0.bigswitch.com with processor Puppet::Reports::Store"
],
"failed_when_result": true
}
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/69104930-415d-4e85-9e18-2210bebe06a7_playbook.retry

PLAY RECAP *********************************************************************
localhost : ok=4 changed=1 unreachable=0 failed=1

deploy_stderr: |

The issue is seen after updating the overcloud images with overcloud-full-latest-12.0.tar -> /usr/share/rhosp-director-images/overcloud-full-12.0-20180404.1.el7ost.tar. The previous overcloud image, dated "overcloud-full-12.0-20180126.1.el7ost.tar", worked fine.

We aren't able to reproduce this. I would advise opening a support ticket and working with GSS to figure this out.
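
When sharing failure output like the above with support, it can help to reduce it to the distinct Puppet error messages. Below is a minimal Python sketch, assuming the output of `openstack stack failures list --long overcloud` has been saved as text; the function name is hypothetical, and it only picks up double-quoted strings that begin with `Error: `, as they appear in this report.

```python
import re


def quoted_errors(text):
    """Return the distinct double-quoted "Error: ..." messages, in first-seen order."""
    seen = []
    for message in re.findall(r'"(Error: [^"]*)"', text):
        if message not in seen:
            seen.append(message)
    return seen
```

Applied to the output in this report, it would return the two `Cinder::Db::Sync` timeout messages and skip the `Debug:` and `Notice:` lines.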