> Description of problem:
If a cluster is under such heavy load that the pcsd daemon is unable to provide useful data, the jQuery/Ember framework inside a browser instance breaks in such a way that it always fails to show the cluster status, even if pcsd provides full data later. The only way out is a full page reload.

The supposedly "guilty" JSON response looks like this:

{"cluster_name":"STSRHTS29046","error_list":[],"warning_list":[],"quorate":false,"status":"error","node_list":[{"name":"virt-131","status":"unknown","warning_list":[],"error_list":[]},{"name":"virt-123","status":"unknown","warning_list":[],"error_list":[]}],"resource_list":[],"available_features":[]}

The following JavaScript error can be observed in the browser's console:

Uncaught TypeError: Cannot read property 'length' of undefined
    at Function.each (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/jquery-1.9.1.min.js:3:5102)
    at Class.<anonymous> (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/nodes-ember.js:812:7)
    at ComputedPropertyPrototype.get (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:4628:38)
    at get (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:1916:17)
    at Ember._getPath (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:1994:12)
    at get (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:1911:12)
    at getWithGlobals (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:6812:10)
    at Binding._sync (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:7007:23)
    at DeferredActionQueues.flush (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:5678:24)
    at Backburner.end (https://virt-123.cluster-qe.lab.eng.brq.redhat.com:2224/js/ember-1.4.0.js:5769:27)

> Version-Release number of selected component (if applicable):
pcs-0.9.152-10.el7.x86_64

> How reproducible:
Sometimes (depends on the possibility of getting an almost-empty JSON response)

> Steps to Reproduce:
1. In this case it was enough to follow the setup from bug 1395959
2. Halt one of the nodes
3. Open cluster details in a web interface served from the running node
4. Use the browser's inspector to watch the requests

> Actual results:
The first cluster_status request returns empty JSON with no useful data, and a JS error is shown in the debug console. Even when a subsequent cluster_status request returns a full JSON response (over 300 kB in my case), the cluster management page still does not show any cluster details.

> Expected results:
The cluster management page is updated as soon as the browser receives any valid response.

> Additional info:
A snapshot from Chrome's network log (columns: name, status, type, initiator, size, time):

main 200 document Other 147 KB 1.51 s
style.css 304 stylesheet main:4 344 B 36 ms
overpass.css 304 stylesheet main:5 344 B 50 ms
liberation.css 304 stylesheet main:6 344 B 64 ms
jquery-ui-1.10.1.custom.css 304 stylesheet main:7 344 B 80 ms
jquery-1.9.1.min.js 304 script main:12 344 B 155 ms
jquery-ui-1.10.1.custom.min.js 304 script main:13 344 B 95 ms
handlebars-v1.2.1.js 304 script main:14 344 B 151 ms
ember-1.4.0.js 304 script main:15 344 B 108 ms
pcsd.js 304 script main:16 344 B 137 ms
nodes-ember.js 304 script main:1252 344 B 103 ms
overpass_regular-web.woff 200 font jquery-1.9.1.min.js:5 (from memory cache) 0 ms
overpass_bold-web.woff 200 font jquery-1.9.1.min.js:5 (from memory cache) 0 ms
LiberationSans-Regular.ttf 200 font jquery-1.9.1.min.js:5 (from memory cache) 0 ms
LiberationSans-Bold.ttf 200 font jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-bg_inset-soft_25_000000_1x100.png 200 png jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-bg_gloss-wave_25_333333_500x100.png 200 png jquery-1.9.1.min.js:5 (from memory cache) 0 ms
pbar-ani.gif 200 gif jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-icons_cccccc_256x240.png 200 png jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-bg_glass_20_555555_1x400.png 200 png jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-bg_flat_50_5c5c5c_40x100.png 200 png jquery-1.9.1.min.js:5 (from memory cache) 0 ms
ui-bg_glass_40_0078a3_1x400.png 200 png jquery-1.9.1.min.js:3 (from memory cache) 0 ms
ui-icons_ffffff_256x240.png 200 png jquery-1.9.1.min.js:3 (from memory cache) 0 ms
cluster_status 200 xhr jquery-1.9.1.min.js:5 757 B 30.18 s
HAM-logo.png 304 png ember-1.4.0.js:20860 344 B 31 ms
get_resource_agent_metadata?agent=ocf%3Aheartbeat%3Aapache 200 xhr jquery-1.9.1.min.js:5 5.0 KB 3.49 s
Shell_bg.png 200 png main:-Infinity (from memory cache) 0 ms
action-icons.png 200 png main:-Infinity (from memory cache) 0 ms
field_bg.png 200 png main:-Infinity (from memory cache) 0 ms
get_resource_agent_metadata?agent=ocf%3Aheartbeat%3Aapache 200 xhr jquery-1.9.1.min.js:5 5.0 KB 5.98 s
get_fence_agent_metadata?agent=stonith%3Afence_apc 200 xhr jquery-1.9.1.min.js:5 14.4 KB 4.05 s
favicon.ico 200 vnd.microsoft.icon Other 759 B 134 ms
cluster_properties 200 xhr jquery-1.9.1.min.js:5 12.8 KB 2.77 s
cluster_status 200 xhr jquery-1.9.1.min.js:5 315 KB 29.52 s
get_fence_agent_metadata?agent=stonith%3Afence_xvm 200 xhr jquery-1.9.1.min.js:5 13.3 KB 1.66 s
get_resource_agent_metadata?agent=ocf%3Aheartbeat%3ADummy 200 xhr jquery-1.9.1.min.js:5 1.5 KB 1.74 s
cluster_status 200 xhr jquery-1.9.1.min.js:5 315 KB 15.31 s
It looks like the crash occurs on this line in nodes-ember.js:

$.each(self.get("group_list"), function(_, group) {

It happens when Pcs.resourcesContainer.group_list is not an array. This should be easy to fix: we can return an empty list from the groups_enum function if we cannot get the groups. But it would be better to fix the issue at its root instead of where it manifests itself.

Ondrej, can you take a look at this and fix it in the place where the fix fits best? Thanks!
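A minimal sketch of the "return an empty list" idea mentioned above, for illustration only. The helper names asArray and collectGroups are hypothetical and do not come from pcs, and the shape of the container object is an assumption; this is not the upstream fix.

```javascript
// Hypothetical helper, not part of pcs: coerce anything that is not an
// array into an empty array so iteration code never sees undefined.
function asArray(value) {
  return Array.isArray(value) ? value : [];
}

// Stand-in for the code in nodes-ember.js that walks group_list.
// The container shape used here is an assumption for illustration.
function collectGroups(resourcesContainer) {
  var groups = [];
  asArray(resourcesContainer.group_list).forEach(function (group) {
    groups.push(group);
  });
  return groups;
}

// An incomplete status response may leave group_list undefined.
console.log(collectGroups({}).length); // 0
// A complete response is iterated as usual.
console.log(collectGroups({ group_list: ["group1", "group2"] }).length); // 2
```

A guard like this only hides the symptom at the point of iteration; it does not explain why group_list was missing in the first place.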
Upstream patch:
https://github.com/ClusterLabs/pcs/commit/51580b1b38745b29e97229a2e938694ac4166b8

TEST:
2-node cluster, nodes: rhel7-node1, rhel7-node2

Open the nodes page of the tested cluster from the web UI of a node which is not part of the cluster.

Block port 2224 (pcsd) on both cluster nodes:
[root@rhel7-node1 ~]# iptables -I OUTPUT -p tcp --dport 2224 -j DROP
[root@rhel7-node1 ~]# iptables -I INPUT -p tcp --dport 2224 -j DROP
[root@rhel7-node2 ~]# iptables -I OUTPUT -p tcp --dport 2224 -j DROP
[root@rhel7-node2 ~]# iptables -I INPUT -p tcp --dport 2224 -j DROP

After the next update of the web UI, the cluster nodes are marked as offline and there is no JS error in the JS console.
After Fix:
[vm-rhel72-1 ~] $ rpm -q pcs
pcs-0.9.156-1.el7.x86_64

see comment 4
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1958