Created attachment 1358503 [details]
File1

Description of problem:
Created the volume using hostnames resolvable by the hosts.

[root@gqas001 ~]# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.96.140  gqas001.sbu.lab.eng.bos.redhat.com gqas001
192.168.96.141  gqas004.sbu.lab.eng.bos.redhat.com gqas004
192.168.96.142  gqas010.sbu.lab.eng.bos.redhat.com gqas010
192.168.96.143  gqas012.sbu.lab.eng.bos.redhat.com gqas012
192.168.96.144  gqac006.sbu.lab.eng.bos.redhat.com gqac006
192.168.96.145  gqac025.sbu.lab.eng.bos.redhat.com gqac025
192.168.96.146  gqac026.sbu.lab.eng.bos.redhat.com gqac026
192.168.96.147  gqac027.sbu.lab.eng.bos.redhat.com gqac027

Version-Release number of selected component (if applicable):
3.8.4-52

How reproducible:
100%

Steps to Reproduce:
1. Created the initial volume using the FQDN, i.e. gqas001.sbu.lab.eng.bos.redhat.com; it showed up active and green in the dashboard.
2. Created an arbiter volume using the short hostname, i.e. gqas001.
3. Even after waiting half an hour, the arbiter volume created using gqas001 does not show up.

Actual results:
The arbiter volume was shown as down the whole time.

Expected results:
The volume should show up within minutes.

Additional info:

[root@gqas001 ~]# gluster v info

Volume Name: arbiter
Type: Replicate
Volume ID: 83b4cb59-c903-44c4-abdb-a2c6294c24e4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gqas012:/bricks/arbiter_1
Brick2: gqas010:/bricks/arbiter_1
Brick3: gqas004:/bricks/arbiter_1 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Volume Name: arbiter-2
Type: Replicate
Volume ID: aadda253-b450-403c-8115-e8d8f41d0feb
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gqas012.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_21
Brick2: gqas010.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2
Brick3: gqas004.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2 (arbiter)
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 14cec5af-14b3-407a-b849-b0e68e979ea1
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas001.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas010.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: on
transport.address-family: inet
nfs.disable: on
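For reference, a minimal sketch of the volume-create commands that would produce this layout (the exact invocations are my assumption; volume names and brick paths are taken from the gluster v info output above):

# gluster volume create arbiter replica 3 arbiter 1 gqas012:/bricks/arbiter_1 gqas010:/bricks/arbiter_1 gqas004:/bricks/arbiter_1
# gluster volume start arbiter

# gluster volume create arbiter-2 replica 3 arbiter 1 gqas012.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_21 gqas010.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2 gqas004.sbu.lab.eng.bos.redhat.com:/bricks/arbiter_2
# gluster volume start arbiter-2

Per the description, the volume whose bricks are addressed by short hostnames is the one that does not show up in the dashboard.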
No code fix is required. This is essentially an issue with sync intervals, so it will not be seen with the latest builds, which have reduced default sync intervals, when run on the recommended hardware configuration.
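If someone wants to confirm the interval on an affected setup, a minimal sketch of the check (the config file path and the sync_interval key name are assumptions on my part, not taken from this BZ):

# grep -ri sync_interval /etc/tendrl/
/etc/tendrl/gluster-integration/gluster-integration.conf.yaml:sync_interval: 60

A smaller value means volume-state changes reach etcd, and therefore the dashboard, sooner.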
Restoring the qa ack.
Could you review this BZ, which we discussed at the "RHGS WA with RHS One testing" meeting on 2018-03-05, and add a comment here with the severity from the RHSOne perspective? Thanks a lot for the feedback.
Could you recheck the status of this BZ and, if possible, add the Fixed-in-Version (FiV) so it can be moved to the ON_QA state?
I've tested it with multiple scenarios (various volume configurations; use of FQDNs, short hostnames, or IPs) and in all cases the Tendrl "Volumes" page and the various Gluster dashboards properly show all the volumes in the "Up" state.

>> VERIFIED

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected components:

Tendrl Server

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

# rpm -qa | grep -e tendrl -e collectd -e gluster -e etcd | sort
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
etcd-3.2.7-1.el7.x86_64
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
python-etcd-0.4.5-2.el7rhgs.noarch
rubygem-etcd-0.3.0-2.el7rhgs.noarch
tendrl-ansible-1.6.3-3.el7rhgs.noarch
tendrl-api-1.6.3-2.el7rhgs.noarch
tendrl-api-httpd-1.6.3-2.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-1.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-1.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch

Gluster Storage Server

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

# cat /etc/redhat-storage-release
Red Hat Gluster Storage Server 3.4.0

# rpm -qa | grep -e tendrl -e collectd -e gluster | sort
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
glusterfs-3.12.2-8.el7rhgs.x86_64
glusterfs-api-3.12.2-8.el7rhgs.x86_64
glusterfs-cli-3.12.2-8.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-8.el7rhgs.x86_64
glusterfs-events-3.12.2-8.el7rhgs.x86_64
glusterfs-fuse-3.12.2-8.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-8.el7rhgs.x86_64
glusterfs-libs-3.12.2-8.el7rhgs.x86_64
glusterfs-rdma-3.12.2-8.el7rhgs.x86_64
glusterfs-server-3.12.2-8.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.2.x86_64
python2-gluster-3.12.2-8.el7rhgs.x86_64
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
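For anyone re-running this verification, a minimal sketch of the CLI side of the cross-check (the volume name shown is just an example):

# gluster volume info | grep -e '^Volume Name' -e '^Status'
Volume Name: arbiter
Status: Started

Every volume reported as Started by the CLI can then be compared against the Tendrl "Volumes" page and the Grafana dashboards to confirm it is listed as "Up".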
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616
*** Bug 1616194 has been marked as a duplicate of this bug. ***