Description of problem:
Hit this issue when I had an RHGS-console management node which managed 3 clusters:
Cluster1: 4 node 3.1.3 build
Cluster2: 4 node 3.1.3 build
Cluster3: 6 node 3.2 interim build (3.8.4-10)
Raised BZ 1412982, which describes the issue faced in Cluster3. When Cluster3 was not fully healthy, we moved all of its hosts to maintenance from the Console. But as soon as we activated one (or all) of its 6 hosts, all the hosts of Cluster1 and Cluster2 would become unresponsive or unreachable. When the Cluster3 hosts were moved back to maintenance, the Cluster1 and Cluster2 hosts would recover on their own.
Things to note:
* I had 16 nodes managed from the single console. I am not sure if we recommend a limit on the number of hosts.
* I had a mix of two gluster versions (3.1.3 and 3.2) managed from the same node, albeit in different clusters. Do we forbid our customers from having this kind of setup?
* I had a mix of RHEL versions: Cluster1 was RHEL 6 and Cluster2 was RHEL 7. Again, I do not think that should be a problem.
ovirt-engine logs will be posted at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/
Meanwhile, I will update this space if I hit this again, with more information if I can gather it.
Version-Release number of selected component (if applicable):
[root@dhcp35-51 ovirt-engine]# rpm -qa | grep gluster
[root@dhcp35-51 ovirt-engine]# rpm -qa | grep vdsm
[root@dhcp35-51 ovirt-engine]# rpm -qa | grep rhsc
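The three `rpm -qa | grep ...` commands above were captured without their output. A minimal sketch of gathering the same version information in a single pass so it can be attached to the bug (the package names in the example invocation are hypothetical placeholders, not the actual versions from this setup):

```shell
#!/bin/sh
# Combine the three separate greps above into one extended-regex pattern
# and sort the result for easier comparison across nodes.
collect_versions() {
    # Reads a package list on stdin; on a real node this is fed by: rpm -qa
    grep -E 'gluster|vdsm|rhsc' | sort
}

# Illustrative invocation with placeholder package names:
printf '%s\n' kernel-3.10.0 vdsm-4.17.33 glusterfs-3.8.4 rhsc-3.1.3 \
    | collect_versions
```

On the management node this would be run as `rpm -qa | collect_versions > versions.txt` and attached alongside the engine logs.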
Hit it once
[qe@rhsqe-repo 1412991]$ hostname
[qe@rhsqe-repo 1412991]$ ls
[qe@rhsqe-repo 1412991]$ pwd
[qe@rhsqe-repo 1412991]$ ls -lrt ovirt-engine/
-rwxr-xr-x. 1 qe qe 290838 Jan 13 16:18 server.log
drwxr-xr-x. 2 qe qe 4096 Jan 13 16:18 ovirt-log-collector
-rwxr-xr-x. 1 qe qe 3572970 Jan 13 16:18 engine.log-20170112.gz
-rwxr-xr-x. 1 qe qe 3028435 Jan 13 16:18 engine.log-20170101.gz
-rwxr-xr-x. 1 qe qe 16712194 Jan 13 16:18 engine.log
-rwxr-xr-x. 1 qe qe 1028842 Jan 13 16:18 engine.log-20170110.gz
drwxr-xr-x. 2 qe qe 4096 Jan 13 16:18 dump
-rwxr-xr-x. 1 qe qe 4298763 Jan 13 16:18 engine.log-20170103.gz
drwxr-xr-x. 2 qe qe 4096 Jan 13 16:18 host-deploy
-rwxr-xr-x. 1 qe qe 1565 Jan 13 16:18 boot.log
-rwxr-xr-x. 1 qe qe 2540303 Jan 13 16:18 engine.log-20170113.gz
-rwxr-xr-x. 1 qe qe 4354855 Jan 13 16:18 engine.log-20170102.gz
-rwxr-xr-x. 1 qe qe 1854237 Jan 13 16:18 engine.log-20170109.gz
-rwxr-xr-x. 1 qe qe 0 Jan 13 16:18 console.log
-rwxr-xr-x. 1 qe qe 3032538 Jan 13 16:18 engine.log-20170111.gz
drwxr-xr-x. 2 qe qe 4096 Jan 13 16:18 setup
drwxr-xr-x. 2 qe qe 4096 Jan 13 16:18 notifier
-rwxr-xr-x. 1 qe qe 4288885 Jan 13 16:18 engine.log-20170104.gz
Sweta: We need sequenced steps to reproduce this bug. Otherwise it is impossible to understand what is happening in the system.
Closing as there are no further enhancements planned for RHGS-C.