Bug 1418010
Summary: | Overcloud upgrade to RHEL 7.3 is failing | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eduard Barrera <ebarrera> |
Component: | openstack-tripleo | Assignee: | Marios Andreou <mandreou> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Arik Chernetsky <achernet> |
Severity: | unspecified | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 (Liberty) | CC: | apetrich, aschultz, augol, bfournie, djuran, hjensas, jslagle, mandreou, mburns, mcornea, mschuppe, rhel-osp-director-maint, sathlang, therve |
Target Milestone: | async | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2017-03-08 10:38:21 UTC | Type: | Bug |
Description
Eduard Barrera
2017-01-31 15:28:20 UTC
Eduard, can you add the contents of /var/log/messages from the node that is having problems (controller0)? Thanks.

Thanks Eduard. It would also be useful to see the status of the interfaces via 'ip a' from when you logged in and "found out controller0 with all the interfaces down". I wasn't able to correlate that to the sosreports. Also, can you run the command "neutron agent-list" on the undercloud? It would be good if you could run it both after sourcing 'stackrc' and again after sourcing 'overcloudrc', once the deployment finishes.

Some thoughts upon looking through the logs...

1. The debug output shows these error messages:

    \nERROR bb01-ctrl0 failed to join cluster in 600 seconds\n", "deploy_stderr": "Error: unable to start corosync\nError: cluster is not currently running on this node

2. I'm seeing a lot of neutron connectivity related issues in /var/log/messages on the controller. I'm not sure if this is just due to upgrading, as other neutron agents may be down periodically, but there are many messages of this type:

    14:10:26 bb01-ctrl0 ceilometer-polling: 2017-02-01 14:10:26.508 17552 ERROR ceilometer.neutron_client [-] internalURL endpoint for network service not found

    (these eventually went away...)

    Feb 1 15:16:02 bb01-ctrl0 glance-api: 2017-02-01 15:16:02.795 19026 ERROR glance.registry.client.v1.client [req-a9091770-5ac6-4408-aadc-f5a809e7b985 13fe9767454c4ca6a2f618a1e61c878a d41ceb78a14d46b79ab4140a731d75ec - - -] Registry client request GET /images/1fd81867-561f-43dd-96e3-89fd8487c67b raised NotFound

    Feb 1 15:33:12 bb01-ctrl0 neutron-lbaasv2-agent: 2017-02-01 15:33:12.741 2868 ERROR neutron.common.rpc [-] Timeout in RPC method get_ready_devices. Waiting for 47 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough

    Feb 1 15:33:59 bb01-ctrl0 neutron-lbaasv2-agent: 2017-02-01 15:33:59.861 2868 ERROR neutron_lbaas.agent.agent_manager [-] Unable to retrieve ready devices

3. I don't really see a connection between this issue and the initscripts bug https://bugzilla.redhat.com/show_bug.cgi?id=1367580, mainly because I don't see NM bringing down any interfaces. In that bug there are interface down messages such as:

    Aug 9 20:44:44 overcloud-controller-1 nm-dispatcher: Dispatching action 'down' for br-ex

    but I don't see interfaces being brought down in the logs here.

4. The supplied journal for NetworkManager on the controller doesn't appear to have captured any problems.

5. The update stayed at "In Progress" until it was terminated, and all of the resources had this message:

    UPDATE paused until Hook pre-update is cleared

The status of this bug is still NEW, though from the comments I see that the issue was resolved. If there is a need to track something here, please raise a new bug.
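
For anyone triaging a similar /var/log/messages, a minimal Python sketch of how the ERROR lines quoted in the comments could be tallied per service to see which agents are noisiest. The function name `count_errors_by_program` and the regular expression are illustrative assumptions, not part of any OpenStack tooling; the pattern assumes the traditional syslog prefix (month, day, time, host, program) seen in the quoted lines, so lines without that prefix are skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this bug:
#   "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<program>[^\s:\[]+)(?:\[\d+\])?:\s+(?P<message>.*)$"
)


def count_errors_by_program(lines):
    """Count lines whose message contains ' ERROR ', grouped by program name."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match and " ERROR " in match.group("message"):
            counts[match.group("program")] += 1
    return counts


# Sample input: lines copied from the comments in this bug.
SAMPLE = [
    "Feb 1 15:16:02 bb01-ctrl0 glance-api: 2017-02-01 15:16:02.795 19026 ERROR "
    "glance.registry.client.v1.client [req-a9091770-5ac6-4408-aadc-f5a809e7b985 "
    "13fe9767454c4ca6a2f618a1e61c878a d41ceb78a14d46b79ab4140a731d75ec - - -] "
    "Registry client request GET /images/1fd81867-561f-43dd-96e3-89fd8487c67b "
    "raised NotFound",
    "Feb 1 15:33:12 bb01-ctrl0 neutron-lbaasv2-agent: 2017-02-01 15:33:12.741 2868 "
    "ERROR neutron.common.rpc [-] Timeout in RPC method get_ready_devices. "
    "Waiting for 47 seconds before next attempt.",
    "Feb 1 15:33:59 bb01-ctrl0 neutron-lbaasv2-agent: 2017-02-01 15:33:59.861 2868 "
    "ERROR neutron_lbaas.agent.agent_manager [-] Unable to retrieve ready devices",
]

if __name__ == "__main__":
    # To scan a real file instead: count_errors_by_program(open("/var/log/messages"))
    for program, total in count_errors_by_program(SAMPLE).most_common():
        print(program, total)
```

Grouping by program rather than by message text keeps the summary short when the same agent logs many distinct errors, as the neutron agents do here.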