Created attachment 1428673 [details]
Screenshot of RHGSWA page which shows bricks from only two nodes

Description of problem:
A 3-node gluster cluster was created and a plain distributed gluster volume was created on it. Each node in the cluster had two interfaces: a 1 GbE interface for management and a 10 GbE interface for data I/O. The gluster peer probe was done over the 10 GbE interface, and the volume was created using the 10 GbE addresses as well.

Here is the output of gluster volume info:

Volume Name: vol1
Type: Distribute
Volume ID: d2b11ebc-956b-4ceb-8fe2-7eea06f1940e
Status: Started
Snapshot Count: 0
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 172.17.40.14:/bricks/b01/g
Brick2: 172.17.40.15:/bricks/b01/g
Brick3: 172.17.40.16:/bricks/b01/g
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on

This cluster was then imported into RHGSWA. The expectation was that RHGSWA would import the cluster and show 3 hosts (172.17.40.14, 172.17.40.15 and 172.17.40.16), each with one brick. Bricks were visible for 2 of the 3 hosts, but the 3rd node was detected via its 1 GbE interface instead of the 10 GbE one, so no bricks were shown for it. The 3rd node should also have been detected via the 10 GbE interface.

Just FYI, the tendrl server connects to the tendrl nodes over the management (1 GbE) interface, and I suspect the host from which the volume information is being fetched is detected via 1 GbE (gprfs015.sbu.lab.eng.bos.redhat.com), which is why no bricks are shown for it. See the screenshot attached to the bug for a better understanding.
Snippet of inventory file:

[gluster_servers]
gprfs014.sbu.lab.eng.bos.redhat.com
gprfs015.sbu.lab.eng.bos.redhat.com
gprfs016.sbu.lab.eng.bos.redhat.com

[tendrl_server]
dhcp159-16.sbu.lab.eng.bos.redhat.com

[all:vars]
etcd_ip_address=10.16.159.16
etcd_fqdn=dhcp159-16.sbu.lab.eng.bos.redhat.com
graphite_fqdn=dhcp159-16.sbu.lab.eng.bos.redhat.com

Version-Release number of selected component (if applicable):

On Tendrl Server
----------------
rpm -qa | grep tendrl
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-ui-1.6.3-1.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-2.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-1.el7rhgs.noarch
tendrl-notifier-1.6.3-2.el7rhgs.noarch
tendrl-api-1.6.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-1.el7rhgs.noarch
tendrl-ansible-1.6.3-2.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch

On Storage Nodes
----------------
rpm -qa | grep tendrl
tendrl-node-agent-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-2.el7rhgs.noarch
tendrl-collectd-selinux-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch

How reproducible:
Always
I can't do a peer probe using one interface and then create a volume with bricks on another interface's IPs; gluster gives an error for this.

storage node1: eth0: 10.70.43.186  eth1: 10.70.42.230
storage node2: eth0: 10.70.43.151  eth1: 10.70.42.231
storage node3: eth0: 10.70.43.153  eth1: 10.70.43.14

From 10.70.43.186 (all eth0):

gluster peer probe 10.70.43.151
gluster peer probe 10.70.43.153

gluster peer status:
Number of Peers: 2

Hostname: 10.70.43.151
Uuid: 3f088d2b-105a-4a3d-817f-88cc2ce9cc10
State: Peer in Cluster (Connected)

Hostname: 10.70.43.153
Uuid: 7e81cdd5-5dad-458c-9bdc-db8abe574e7e
State: Peer in Cluster (Connected)

Volume create using the eth1 IPs:

gluster volume create V1 10.70.42.230:/root/glusters/b1 10.70.42.231:/root/glusters/b1 10.70.43.14:/root/glusters/b1 force
volume create: V1: failed: Host 10.70.42.231 is not in 'Peer in Cluster' state

But 10.70.42.231 is there on eth1:

eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1a:4a:f7:23:20 brd ff:ff:ff:ff:ff:ff
    inet 10.70.43.151/22 brd 10.70.43.255 scope global dynamic eth0
       valid_lft 81298sec preferred_lft 81298sec
    inet6 2620:52:0:4628:21a:4aff:fef7:2320/64 scope global noprefixroute dynamic
       valid_lft 2591732sec preferred_lft 604532sec
    inet6 fe80::21a:4aff:fef7:2320/64 scope link
       valid_lft forever preferred_lft forever
eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1a:4a:f7:23:24 brd ff:ff:ff:ff:ff:ff
    inet 10.70.42.231/22 brd 10.70.43.255 scope global dynamic eth1
       valid_lft 81301sec preferred_lft 81301sec
    inet6 2620:52:0:4628:21a:4aff:fef7:2324/64 scope global noprefixroute dynamic
       valid_lft 2591732sec preferred_lft 604532sec
    inet6 fe80::21a:4aff:fef7:2324/64 scope link
       valid_lft forever preferred_lft forever
I have talked with Shekhar Berry about this bug. The actual problem is that during the ansible installation he gave the eth0 IPs for all storage nodes, while during the peer probe he used the eth1 IPs.

However, multiple interfaces are possible if the peer probes are done as follows:

- peer probe node B from node A using eth1
- now peer probe node C from node B

If you then look at peer status, the hostname of node B appears in the "Other names" field:

Hostname: 10.70.42.231
Uuid: 42422510-bb1c-42f8-b324-00658e2371ca
State: Peer in Cluster (Connected)
Other names:
dhcp43-151.lab.eng.blr.redhat.com

Hostname: 10.70.43.186
Uuid: 568a2fbe-7f8c-4d38-a01c-b1cca1879d36
State: Peer in Cluster (Connected)

So now we can create bricks using eth1 even though the peer probe was done over eth0. But if we use socket.gethostbyname("10.70.42.231"), it always gives the IP 10.70.43.151, so the peer-probe hostname won't match the brick hostname, and no brick is displayed for that node. The problem here is the handling of the "Other names" field.
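To illustrate the matching problem described above, here is a minimal Python sketch (this is NOT the actual Tendrl code; the function name find_peer_for_brick and the dict layout are hypothetical). If brick-to-peer matching only compares against the peer's primary Hostname, a brick identified via a secondary interface or its FQDN never matches; consulting the "Other names" entries as well makes the match succeed:

```python
def find_peer_for_brick(brick_host, peers):
    """Return the peer that owns brick_host, checking the primary
    hostname first and then any entries under "Other names"."""
    for peer in peers:
        if brick_host == peer["hostname"]:
            return peer
        # Without this extra check, a brick addressed via a secondary
        # name/interface is never associated with its peer, so the UI
        # shows no bricks for that node.
        if brick_host in peer.get("other_names", []):
            return peer
    return None


# Peers as reported by the gluster peer status output above.
peers = [
    {"uuid": "42422510-bb1c-42f8-b324-00658e2371ca",
     "hostname": "10.70.42.231",
     "other_names": ["dhcp43-151.lab.eng.blr.redhat.com"]},
    {"uuid": "568a2fbe-7f8c-4d38-a01c-b1cca1879d36",
     "hostname": "10.70.43.186",
     "other_names": []},
]

# Matches via "Other names", not via the primary hostname:
peer = find_peer_for_brick("dhcp43-151.lab.eng.blr.redhat.com", peers)
```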
The PR for this issue is under review: https://github.com/Tendrl/node-agent/pull/815
Nishanth, I feel all it needs is verification, and given the latest discussions around multiple-network support, I feel it's worth doing the verification again.