Bug 1257343 - vol heal info fails when transport.socket.bind-address is set in glusterd
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: x86_64 Linux
Priority: unspecified  Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.2
Assigned To: Mohamed Ashiq
QA Contact: RajeshReddy
Keywords: ZStream
Depends On:
Blocks: 1260783 1277997 1285168 1285962
Reported: 2015-08-26 16:48 EDT by Paul Cuzner
Modified: 2016-09-17 08:15 EDT (History)
12 users

See Also:
Fixed In Version: glusterfs-3.7.5-9
Doc Type: Bug Fix
Doc Text:
Previously, when transport.socket.bind-address was set in glusterd, the volfile fetch for heal operations failed. As a result, executing the heal command 'gluster volume heal <VOLNAME> info' produced the following error: "<vol_name>: Not able to fetch volfile from glusterd. Volume heal failed." With this fix, the volfile server transport type is set to "unix", and the heal command 'gluster volume heal <VOLNAME> info' no longer fails, even when glusterd is bound to a specific IP.
Story Points: ---
Clone Of:
: 1277997
Environment:
Last Closed: 2016-03-01 00:34:37 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
mliyazud: needinfo+


Attachments: None
Description Paul Cuzner 2015-08-26 16:48:29 EDT
Description of problem:
When glusterd is bound to a specific IP, shd operates correctly, but the 'vol heal info' command still attempts to contact glusterd on 127.0.0.1 and fails. Admins therefore have no visibility into self-heal activity, and automated monitoring (Nagios, etc.) will not show heal information either.

Version-Release number of selected component (if applicable):
glusterfs 3.7.x (and prior releases?)

How reproducible:
Every time. I noticed this in a test environment I'm using for containers, where I bind glusterd to the host IP.

Steps to Reproduce:
1. use transport.socket.bind-address to bind glusterd to a specific IP on each host
2. run the vol heal <vol> info command
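
For reference, a minimal way to do step 1, assuming the default /etc/glusterfs/glusterd.vol location and a placeholder address 192.0.2.10, is to add the bind-address option to the management volume definition and restart glusterd:

        # /etc/glusterfs/glusterd.vol (excerpt, placeholder IP)
        volume management
            type mgmt/glusterd
            option working-directory /var/lib/glusterd
            option transport-type socket,rdma
            option transport.socket.bind-address 192.0.2.10
        end-volume

        # then restart glusterd and query heal status
        systemctl restart glusterd
        gluster volume heal <vol> info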


Actual results:
In the self-heal log file:

[2015-08-25 03:50:55.412092] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-08-25 03:50:55.424020] E [socket.c:2332:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-08-25 03:50:55.424055] E [MSGID: 104024] [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected) [Transport endpoint is not connected]
[2015-08-25 03:50:55.424071] I [MSGID: 104025] [glfs-mgmt.c:744:mgmt_rpc_notify] 0-glfs-mgmt: Exhausted all volfile servers [Transport endpoint is not connected]

Expected results:
vol heal info should work whether glusterd is bound to a specific IP or not

Additional info:
The issue has been discussed with Ravi (ravishankar@redhat.com) and Humble (hchiramm@redhat.com), and the cause was identified as "localhost" being hardcoded in the glfs_set_volfile_server() call within glfs-heal.c.

e.g. 
        ret = glfs_set_volfile_server (fs, "tcp", "localhost", 24007);
        if (ret) {
                printf("Setting the volfile server failed, %s\n", strerror (errno));
                goto out;
        }
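
For context, the Doc Text above notes that the fix has glfsheal fetch the volfile over glusterd's unix socket instead of tcp://localhost:24007, so the bind-address no longer matters. A minimal sketch of that call (the socket path /var/run/glusterd.socket is an assumption for illustration; the actual patch may use a build-time constant):

        /* Sketch: point gfapi at glusterd's unix socket rather than
         * tcp://localhost:24007; the port argument is ignored for "unix".
         * The socket path below is an assumption for illustration. */
        ret = glfs_set_volfile_server (fs, "unix", "/var/run/glusterd.socket", 0);
        if (ret) {
                printf ("Setting the volfile server failed, %s\n", strerror (errno));
                goto out;
        }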


Comment 2 Humble Chirammal 2015-10-07 07:07:13 EDT
AFAICT, the hard-coded part (-s localhost) in glusterd clients like quota, rebalance, etc. can be changed to use the 'bind-address'.

For example, in the quota client daemon, we currently have:

        runinit (&runner);
        runner_add_args (&runner, SBIN_DIR"/glusterfs",
                         "-s", "localhost",
                         "--volfile-id", volname,
                         "--use-readdirp=no",
                         "--client-pid", QUOTA_CRAWL_PID,
                         "-l", logfile, mountdir, NULL);

iic, we have the bind address in the dict (THIS->options) as 'transport.socket.bind-address'. We can use this address instead of 'localhost' and pass it to the runner.

something like:

        char *vol_server = "localhost";
        data_t *bind_addr = dict_get (THIS->options, "transport.socket.bind-address");

        if (bind_addr)
                vol_server = data_to_str (bind_addr);

         runner_add_args (&runner, SBIN_DIR"/glusterfs",
-                         "-s", "localhost",
+                         "-s", vol_server,
Comment 3 Humble Chirammal 2015-10-12 05:00:25 EDT
The bz comment#2 gives a workaround for glusterd clients like quota, rebalance, etc. However, iic, we need a different solution for libgfapi-based clients like glfsheal.
Comment 4 Humble Chirammal 2015-11-02 00:47:24 EST
(In reply to Humble Chirammal from comment #3)
> The bz comment#2 give a workaround for glusterd clients like
> quota,rebalance.

It should have been 'replace-brick' instead of 'quota, rebalance'. As mentioned earlier, 'heal/quota' etc. need a solution where the CLI makes an RPC call to get the IP and uses it for the client connection.
Comment 5 Mohamed Ashiq 2015-11-30 05:53:29 EST
patch:

https://code.engineering.redhat.com/gerrit/62580
Comment 7 RajeshReddy 2015-12-15 05:33:05 EST
Tested with glusterfs-server-3.7.5-11. After adding the transport.socket.bind-address = <IP address> entry to glusterd.info, I was able to launch heal and self-heal is working fine, so marking this bug as verified.
Comment 8 Bhavana 2016-02-04 02:01:57 EST
Hi Ashiq,

I have updated the doc-text info. Please sign-off on the same if it looks ok.
Comment 9 Mohamed Ashiq 2016-02-04 04:40:21 EST
Looks good to me.
Comment 11 errata-xmlrpc 2016-03-01 00:34:37 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html
