Bug 1410894

Summary: Occasionally, UI does not update with correct information
Product: Red Hat OpenStack
Reporter: Dan Trainor <dtrainor>
Component: puppet-tripleo
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: medium
Priority: unspecified
Version: 10.0 (Newton)
CC: dtrainor, jjoyce, jpichon, jschluet, jtomasek, rhel-osp-director-maint, slinaber, therve, tvignaud
Target Milestone: rc
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-6.3.0-6.el7ost.noarch
Last Closed: 2017-05-17 19:54:36 UTC
Type: Bug
Attachments: websocket connection in console (flags: none)

Description Dan Trainor 2017-01-06 18:05:59 UTC
Description of problem:

Occasionally, the UI does not update its display to reflect changes made to the deployment, or does not present correct information about the state of the deployment configuration.

For example, when importing nodes, after clicking "Register Nodes" the dialog never receives the message indicating that the nodes have been successfully defined. The "Register Nodes" indicator keeps spinning, but refreshing the page shows that the nodes were successfully registered and are available for introspection.

In another example, after a successful deployment the "Plan overcloud deployment" screen provides login details for the deployed Overcloud, but does not properly populate the "Overcloud IP address", "Username", or "Password" fields.

In both examples, refreshing the page in the browser eventually populates this information correctly. A few upstream bugs (linked to this bug) already exist for these examples, but they are likely related and part of a larger issue with Zaqar.

When this problem occurs, a connection error is logged in zaqar.log (with debugging turned on):

2017-01-06 11:49:51.241 5066 DEBUG zaqar.common.decorators [(None,) f1111a3b2aa347c28f68cd03da2c8935 9e7742d7ab384c06b75c7c8f5cc7b957 - - -] [project_id:9e7742d7ab384c06b75c7c8f5cc7b957] Messages collection POST: {"project_id": "9e7742d7ab384c06b75c7c8f5cc7b957", "queue_name": "tripleo"} wrapper /usr/lib/python2.7/site-packages/zaqar/common/decorators.py:49
2017-01-06 11:49:51.242 5066 DEBUG zaqar.common.pipeline [(None,) f1111a3b2aa347c28f68cd03da2c8935 9e7742d7ab384c06b75c7c8f5cc7b957 - - -] [project_id:9e7742d7ab384c06b75c7c8f5cc7b957] Stage <zaqar.storage.mongodb.messages.MessageQueueHandler object at 0x49c1dd0> does not implement get_metadata consumer /usr/lib/python2.7/site-packages/zaqar/common/pipeline.py:94
2017-01-06 11:49:51.247 5066 DEBUG zaqar.notification.notifier [(None,) f1111a3b2aa347c28f68cd03da2c8935 9e7742d7ab384c06b75c7c8f5cc7b957 - - -] [project_id:9e7742d7ab384c06b75c7c8f5cc7b957] Notifying subscriber {'confirmed': False, 'age': 2206, 'id': '586fc21180385743e1c390ad', 'subscriber': u'http://undercloud.localdomain:40329/719934cf-e06b-41d3-8e44-8a7f34f5a132', 'source': u'tripleo', 'ttl': 3600, 'options': {}} post /usr/lib/python2.7/site-packages/zaqar/notification/notifier.py:61
2017-01-06 11:49:51.248 5066 DEBUG zaqar.notification.notifier [(None,) f1111a3b2aa347c28f68cd03da2c8935 9e7742d7ab384c06b75c7c8f5cc7b957 - - -] [project_id:9e7742d7ab384c06b75c7c8f5cc7b957] Notifying subscriber {'confirmed': False, 'age': 1348, 'id': '586fc56b80385713c90549fe', 'subscriber': u'http://undercloud.localdomain:35674/62aa9552-304b-4eba-a22d-a9aa7749388a', 'source': u'tripleo', 'ttl': 3600, 'options': {}} post /usr/lib/python2.7/site-packages/zaqar/notification/notifier.py:61
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook [-] webhook task got exception: HTTPConnectionPool(host='undercloud.localdomain', port=40329): Max retries exceeded with url: /719934cf-e06b-41d3-8e44-8a7f34f5a132 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x4bb4a90>: Failed to establish a new connection: [Errno 111] Connection refused',)).
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook Traceback (most recent call last):
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/zaqar/notification/tasks/webhook.py", line 44, in execute
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     headers=headers)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/requests/api.py", line 111, in post
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     return request('post', url, data=data, json=json, **kwargs)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/requests/api.py", line 57, in request
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     return session.request(method=method, url=url, **kwargs)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 475, in request
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     resp = self.send(prep, **send_kwargs)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 585, in send
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     r = adapter.send(request, **kwargs)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 467, in send
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook     raise ConnectionError(e, request=request)
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook ConnectionError: HTTPConnectionPool(host='undercloud.localdomain', port=40329): Max retries exceeded with url: /719934cf-e06b-41d3-8e44-8a7f34f5a132 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x4bb4a90>: Failed to establish a new connection: [Errno 111] Connection refused',))
2017-01-06 11:49:51.269 5066 ERROR zaqar.notification.tasks.webhook


Version-Release number of selected component (if applicable):
openstack-tripleo-ui-1.0.5-3.el7ost.noarch
python-zaqarclient-1.2.0-2.el7ost.noarch
openstack-zaqar-3.0.0-3.el7ost.noarch
puppet-zaqar-9.4.0-1.el7ost.noarch

How reproducible:
Most of the time; there is no specific pattern or reliable way to reproduce it 100% of the time.


Steps to Reproduce:
1. Register nodes or run a successful deployment in the UI.
2. Note that node registration never appears to complete, or that information is missing from the successful-deployment notification.

Actual results:
Some node- or deployment-related data does not appear in the UI.


Expected results:
All node- and deployment-related data appears in the UI.


Additional info:

Comment 1 Thomas Hervé 2017-01-09 09:19:54 UTC
I believe the traceback is just noise. What happened is that zaqar-server was restarted, so it picked a new internal port, and the old subscription now points at a port where nothing is listening. We can see that there are two existing subscriptions, so the UI presumably managed to reconnect after the restart and subscribe again.

I tested locally, and even with one of the subscriptions failing, the other one receives the messages properly.
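The isolation described above (a stale subscription fails on its own while the live one still gets the message) can be illustrated with a minimal sketch. This is not Zaqar's notifier code (the traceback shows Zaqar using `requests`); `notify_all`, `Receiver`, `unused_port`, and the URLs are hypothetical, and only the Python standard library is used. One subscriber points at a local port with nothing listening, like the refused port 40329 in the log, and the other at a live local HTTP server.

```python
import json
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any http_proxy settings so requests go straight to 127.0.0.1.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def notify_all(subscriber_urls, message, timeout=2.0):
    """Hypothetical fan-out: POST `message` to each subscriber independently."""
    body = json.dumps(message).encode()
    delivered, failed = [], []
    for url in subscriber_urls:
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            with _OPENER.open(request, timeout=timeout):
                pass
            delivered.append(url)
        except OSError:
            # URLError (wrapping e.g. ConnectionRefusedError) is an OSError subclass;
            # a dead subscriber only affects its own delivery.
            failed.append(url)
    return delivered, failed


received = []


class Receiver(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def unused_port():
    """Grab a free port and release it, so nothing is listening there."""
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


server = HTTPServer(("127.0.0.1", 0), Receiver)
threading.Thread(target=server.serve_forever, daemon=True).start()

stale = f"http://127.0.0.1:{unused_port()}/stale-subscription"
live = f"http://127.0.0.1:{server.server_address[1]}/live-subscription"

delivered, failed = notify_all([stale, live], {"queue_name": "tripleo"})
server.shutdown()
print("delivered:", delivered)
print("failed:", failed)
```

Catching the error per subscriber is what keeps one refused connection from blocking the rest; the stale entry is only logged as a failure, which matches the "noise" reading of the traceback.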

That leaves a few questions:
 * Why does Zaqar get restarted?
 * Is there only one instance of the UI when your issue happens?
 * Is the UI able to reconnect properly to Zaqar?

Comment 2 Jiri Tomasek 2017-01-11 16:32:11 UTC
If the websocket connection disconnects, you can see this in the Network tab of the browser console (see attachment). Could you verify whether the websocket connection is still running when you experience this bug? The GUI is currently not able to reconnect without a full page refresh. When an error occurs on the websocket connection, the GUI should notify the user about it.

Comment 3 Jiri Tomasek 2017-01-11 16:32:53 UTC
Created attachment 1239536
websocket connection in console

Comment 4 Ola Pavlenko 2017-02-01 16:07:05 UTC
Dan,

Does Jirka's suggestion help? Could you please re-test?

Comment 5 Dan Trainor 2017-04-11 16:23:53 UTC
Adding a tunnel timeout[0] was necessary to keep the Zaqar connection open for longer than the default of 2 seconds. This timeout was added in the haproxy container for HTTPS, since services such as Zaqar are accessed via a modified URL over HTTPS[1] (e.g. https://undercloud:443/zaqar).

---

[0] https://review.openstack.org/#/c/453127/
[1] https://blueprints.launchpad.net/tripleo/+spec/proxy-undercloud-api-services
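Why a tunnel timeout matters for the websocket can be shown with a simplified model. Per the haproxy documentation, once a connection becomes a tunnel (as an upgraded websocket does), `timeout tunnel` supersedes the client and server timeouts; without it, an idle tunnel is closed by whichever of those is shorter. The sketch below is an assumption-laden illustration, not the actual puppet-tripleo change: the `zaqar_ws` section name and all timeout values are hypothetical, and `websocket_idle_limit_ms` and the other helpers are invented for this example.

```python
import re
from typing import Optional

# haproxy time units expressed in milliseconds; a bare number means milliseconds.
_UNIT_MS = {"us": 0.001, "ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(value: str) -> float:
    """Convert an haproxy duration such as '30s' or '1h' to milliseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised haproxy duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"]


def section_timeouts(section: str) -> dict:
    """Collect 'timeout <name> <value>' directives from one config section."""
    found = {}
    for line in section.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "timeout":
            found[parts[1]] = parse_duration_ms(parts[2])
    return found


def websocket_idle_limit_ms(section: str) -> Optional[float]:
    """Simplified model of how long an idle, upgraded (tunnel) connection survives."""
    timeouts = section_timeouts(section)
    if "tunnel" in timeouts:
        # In tunnel mode, 'timeout tunnel' supersedes the client and server timeouts.
        return timeouts["tunnel"]
    sides = [timeouts[k] for k in ("client", "server") if k in timeouts]
    # Without it, whichever side's inactivity timeout is shorter closes the idle tunnel.
    return min(sides) if sides else None


# Hypothetical listen block; the name and values are illustrative only.
BEFORE = """\
listen zaqar_ws
  timeout client 30s
  timeout server 30s
"""
AFTER = BEFORE + "  timeout tunnel 1h\n"

print(websocket_idle_limit_ms(BEFORE))  # 30000
print(websocket_idle_limit_ms(AFTER))   # 3600000
```

The model ignores haproxy's other timers (for example `timeout http-request` before the upgrade); it only captures the idle limit once the websocket is established.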

Comment 6 Dan Trainor 2017-04-11 16:29:13 UTC
Works in puppet-tripleo-6.3.0-6.el7ost.noarch

Comment 9 errata-xmlrpc 2017-05-17 19:54:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245