Created attachment 1361526 [details]
'NONE' cluster id

Description of problem:
=======================
On a one-day-old web-admin server, an attempt to log in resulted in a 'forbidden' error. The web-admin server was unpingable, so we logged in to the hypervisor to see whether any resource (memory/CPU) was constrained. Everything seemed to be within limits for the hypervisor. Then the 'free -m' and 'df' commands resulted in 'Input/output error: Read only file system'. With no other command giving any output, we were forced to reboot the hypervisor. Once it was up, we powered on the Web-admin VM and restarted the tendrl-node-agent and tendrl-gluster-integration services. All the hosts and clusters showed up in the UI, but alongside the existing cluster entries the list also shows a new entry 'NONE' in the cluster id field.

Version-Release number of selected component (if applicable):
tendrl-grafana-plugins-1.5.4-8.el7rhgs.noarch
tendrl-selinux-1.5.4-1.el7rhgs.noarch
tendrl-node-agent-1.5.4-8.el7rhgs.noarch
tendrl-ansible-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-8.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-1.el7rhgs.noarch
tendrl-commons-1.5.4-5.el7rhgs.noarch
tendrl-api-1.5.4-3.el7rhgs.noarch
tendrl-api-httpd-1.5.4-3.el7rhgs.noarch
tendrl-notifier-1.5.4-5.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch

How reproducible:
=================
1/1

Actual results:
===============
A new cluster entry 'NONE' appears under the cluster id field.

Expected results:
=================
We should see the clusters that are configured in Web-admin.

Additional info:
================
Attached screenshots
Hi Anjana, could you please make sure all the right flags are set, so that this can be tracked further for this release?
Checked with:
tendrl-commons-1.5.4-9.el7rhgs.noarch
tendrl-api-1.5.4-4.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-14.el7rhgs.noarch
tendrl-ansible-1.5.4-7.el7rhgs.noarch
tendrl-node-agent-1.5.4-16.el7rhgs.noarch
tendrl-ui-1.5.4-6.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-14.el7rhgs.noarch
tendrl-notifier-1.5.4-6.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch

I re-imported the cluster and killed the web-admin server machine several times, but I was not able to reproduce the issue. There is no new line in the cluster list and no line with NONE as the id.
Yes, we were able to reproduce it in older versions.
I've tried to reproduce it on the RHGS WA 3.3.1 (last GA) version, but without any luck. I've tried various combinations of the following actions:

* killing the RHGS WA Server VM
* filling up the /var/lib/etcd partition
* accidentally remounting filesystems in a read-only state
* restarting tendrl services on Storage nodes

Do you know whether it is reproducible on the RHGS WA 3.3.1 (latest GA) version? Do we have a clearer reproduction scenario? Or do we have any PRs/commits related to this issue, to get some idea of what was changed, what the reproducer might be, and what should be tested?
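For anyone repeating the read-only remount scenario above, a small check like the following may help confirm that a filesystem really did end up read-only before the tendrl services are restarted. This is only a sketch, not part of Tendrl: the function name `read_only_mounts` is made up for illustration, and it assumes the standard Linux /proc/mounts layout (device, mount point, filesystem type, comma-separated options).

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Assumes /proc/mounts-style lines:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # On a Linux host, list every mount that is currently read-only.
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print(mount_point)
```

Note that some pseudo-filesystems are mounted read-only during normal operation, so the entries of interest are the real data partitions such as / or /var/lib/etcd.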
@gowtham, could you please provide the PR-related info here?
If the dev team doesn't provide detailed evidence for:

* the fact that this is reproducible
* work (code changes) done to fix this particular problem

by the end of this week, I will propose to drop this BZ and close it as an invalid bug.
PM ACK is already set for this issue to be dropped.
I'm closing this BZ (see comment 17 for details on why), as was discussed at the program meeting on 2018-08-14. Both development (Nishanth) and product management (Anand) agree.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days