1579928 – using auth.allow with hostnames or fqdn breaks volume; volume heal info errors

Summary: using auth.allow with hostnames or fqdn breaks volume; volume heal info errors

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterfs
Sub Component:
Version:	rhgs-3.3
Hardware:	Unspecified
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Sanju
QA Contact:	Bala Konda Reddy M
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-18 17:29 UTC by Matthias Muench
Modified:	2018-11-06 07:54 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-11-06 07:54:47 UTC
Embargoed:
Dependent Products:

Attachments
getaddr.c (2.10 KB, text/x-csrc) 2018-05-29 11:35 UTC, Mohit Agrawal	no flags

Description Matthias Muench 2018-05-18 17:29:04 UTC

Description of problem:
Using auth.allow with symbolic hostnames breaks glusterfs for replicated or distributed-replicated volumes (with or without arbiter). After a client has mounted the volume and written files, running `gluster volume heal ${volname} info` reports either "Status: Transport endpoint is not connected" or "volgeo7: Not able to fetch volfile from glusterd
Volume heal failed."


Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-54.8.el7rhgs.x86_64

How reproducible:
regularly


Steps to Reproduce:
1. Create a replicated or distributed-replicated volume (e.g. volgeo6).
2. Create a host list: `for i in `seq 1 254`; do echo gl-dummycl-$i >> hostlist_names_254; done`
3. Add a valid client name to the list.
4. gluster volume set volgeo6 auth.allow `cat hostlist_names_254`
5. Mount the volume from the client: `mount -t glusterfs gl-n4:/volgeo6 /gluster/volgeo6`
6. Write data: `cp -r /usr/lib /* /gluster/volgeo6/`
7. On the RHGS server: gluster vol heal volgeo6 info

Actual results:
[root@gl-n5 glusterfs]# gluster vol heal volgeo6 info
date
Brick gl-n4.private-eval.local:/rhgs/brick_o/brick
Status: Connected
Number of entries: 0

Brick gl-n5.private-eval.local:/rhgs/brick_o/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick gl-n6.private-eval.local:/rhgs/brick_o/brick
Status: Connected
Number of entries: 0



Expected results:
[root@gl-n5 glusterfs]# gluster vol heal volgeo6 info
date
Brick gl-n4.private-eval.local:/rhgs/brick_o/brick
Status: Connected
Number of entries: 0

Brick gl-n5.private-eval.local:/rhgs/brick_o/brick
Status: Connected
Number of entries: 0

Brick gl-n6.private-eval.local:/rhgs/brick_o/brick
Status: Connected
Number of entries: 0




Additional info:
With IP addresses in auth.allow, this works.
Affected volumes in the data: volgeo6, volgeo7 (using hostnames). Unaffected volumes: volgeo4, volgeo5 (using IP addresses).
Data available from: https://github.com/mattmuench/bug-gluster/tree/master/bug-authallow-hostnames
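
When repeating the reproduction, the heal info output shown under "Actual results" can be filtered for bricks whose status is anything other than Connected. The following is a minimal sketch, assuming the output format quoted above; the function name disconnected_bricks is hypothetical and not part of the gluster CLI.

```shell
# Hypothetical helper: reads `gluster vol heal <vol> info` output on stdin
# and prints every brick whose Status line is not "Connected".
disconnected_bricks() {
    awk '
        # Remember the brick that the following Status line belongs to.
        /^Brick / { brick = $2 }
        # Any status other than "Connected" is reported.
        /^Status:/ && $0 != "Status: Connected" { print brick }
    '
}

# Usage (on an RHGS server):
#   gluster vol heal volgeo6 info | disconnected_bricks
```

An empty result means every brick reported "Status: Connected".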

Comment 2 Mohit Agrawal 2018-05-29 11:34:41 UTC

Hi Matthias,

   I don't think the problem is that the glusterfs code does not accept FQDNs. I believe the
   problem is in your environment: DNS is not able to resolve the hostnames.
   
   Below are the errors logged when the hostnames are resolved with getaddrinfo;
   as you can see, each call fails with "Name or service not known".

   >>>>>>>>>>>>>>>>>>>
    
   [2018-05-05 10:40:10.022307] I [addr.c:55:compare_addr_and_update] 0-/rhgs/brick_m/brick: allowed = "gl-dummycl-45", received addr = "172.20.11.15"
[2018-05-05 10:40:10.023607] W [MSGID: 101075] [common-utils.c:3550:gf_is_same_address] 0-gl-dummycl-45: error in getaddrinfo: Name or service not known

[2018-05-05 10:40:10.023625] I [addr.c:55:compare_addr_and_update] 0-/rhgs/brick_m/brick: allowed = "gl-dummycl-46", received addr = "172.20.11.15"
[2018-05-05 10:40:10.024874] W [MSGID: 101075] [common-utils.c:3550:gf_is_same_address] 0-gl-dummycl-46: error in getaddrinfo: Name or service not known

[2018-05-05 10:40:10.024900] I [addr.c:55:compare_addr_and_update] 0-/rhgs/brick_m/brick: allowed = "gl-dummycl-47", received addr = "172.20.11.15"
[2018-05-05 10:40:10.026125] W [MSGID: 101075] [common-utils.c:3550:gf_is_same_address] 0-gl-dummycl-47: error in getaddrinfo: Name or service not known

[2018-05-05 10:40:10.026143] I [addr.c:55:compare_addr_and_update] 0-/rhgs/brick_m/brick: allowed = "gl-dummycl-48", received addr = "172.20.11.15"
[2018-05-05 10:40:10.027485] W [MSGID: 101075] [common-utils.c:3550:gf_is_same_address] 0-gl-dummycl-48: error in getaddrinfo: Name or service not known


 >>>>>>>>>>>>>>>>>>>>>

 Either add these hostnames to your DNS or add them to /etc/hosts so that they resolve successfully. Before passing the names to auth.allow, you can use the attached program to check whether getaddrinfo resolves a hostname successfully.

 1) Compile the attached program:
    gcc getaddr.c -o getaddr
 2) Run the program as follows:
    ./getaddr <host-name> <ip-addr>

Regards
Mohit Agrawal
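
For a whole host list, the same kind of check can be run in a loop before the names are passed to auth.allow. This is only a sketch and not the attached getaddr.c: it assumes a glibc system, where `getent ahosts` resolves names through getaddrinfo, and the function names resolve_host and unresolvable_hosts are hypothetical.

```shell
# Hypothetical wrapper around the resolver. On glibc systems `getent ahosts`
# goes through getaddrinfo(3); the exit status is non-zero if the name
# does not resolve.
resolve_host() {
    getent ahosts "$1" </dev/null >/dev/null 2>&1
}

# Reads one hostname per line on stdin and prints those that do not resolve.
unresolvable_hosts() {
    while IFS= read -r host; do
        [ -n "$host" ] || continue              # skip blank lines
        resolve_host "$host" || printf '%s\n' "$host"
    done
}

# Usage: an empty result means every name in the list resolved.
#   unresolvable_hosts < hostlist_names_254
```

Keeping the resolver in its own function makes it easy to swap in a different lookup command on systems without getent.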

Comment 3 Mohit Agrawal 2018-05-29 11:35:49 UTC

Created attachment 1445355 [details]
getaddr.c

Comment 6 Matthias Muench 2018-06-29 15:21:45 UTC

I checked again after setting up all hostnames in DNS. Once the names resolve properly, it works. After toggling back to unknown names by removing them from DNS again, the problem can easily be triggered.
For RHGS 3.3, the issue seems to be caused by FQDNs/hostnames that cannot be resolved properly.

It should be checked in RHGS 3.4.0 whether this can be triggered again, using unresolvable FQDN.
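
When retesting, the getaddrinfo warnings quoted in comment 2 give a quick way to see which auth.allow entries a brick could not resolve. The following is a minimal sketch that assumes the brick log keeps the message format shown there; the function name unresolved_from_log and the log path in the usage line are assumptions.

```shell
# Hypothetical helper: reads brick log text on stdin and prints each
# auth.allow entry for which getaddrinfo reported an error, once per name.
unresolved_from_log() {
    sed -n 's/.*gf_is_same_address] 0-\(.*\): error in getaddrinfo.*/\1/p' |
        LC_ALL=C sort -u
}

# Usage (the log path is an assumption; adjust it to the actual brick log):
#   unresolved_from_log < /var/log/glusterfs/bricks/rhgs-brick_m-brick.log
```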
