Bug 765337 (GLUSTER-3605)

Summary: Client complains on non existent server running on port 24008
Product: [Community] GlusterFS
Reporter: Harshavardhana <fharshav>
Component: rdma
Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.2.3
CC: amarts, andrei, cww, gluster-bugs, vijay, vraman
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:18:04 UTC
Bug Blocks: 849122

Description Harshavardhana 2011-09-21 19:40:08 UTC
The server is running on port 24010. A client mount strangely keeps defaulting to port 24008 for a while.

During this time I forced remote-port to 24010 by writing a new volfile.

After a umount and a remount, fetching the volfile from the server made the client start connecting on 24010.

It seems the RDMA internal RPC exchange causes a dummy port to be listened on, so the mount only starts working after the first RPC exchange.

Comment 1 Raghavendra G 2011-09-22 01:50:50 UTC
glusterd rdma transport listens on 24008. Clients first connect to glusterd (through port 24008), fetch the volfile and then connect to the server exporting appropriate brick.

Comment 2 Harshavardhana 2011-09-22 02:00:06 UTC
(In reply to comment #1)
> glusterd rdma transport listens on 24008. Clients first connect to glusterd
> (through port 24008), fetch the volfile and then connect to the server
> exporting appropriate brick.

But I was receiving 'Connection refused'. Does this mean that the glusterd started over RDMA failed in some way?

I have seen that many times now.

Comment 3 Vijay Bellur 2011-09-22 02:16:46 UTC
> But i was receiving 'Connection Refused' does this mean that the 'glusterd'
> started over RDMA failed in some ways? 
> 

Was glusterd started before the ib modules were loaded?

Comment 4 Harshavardhana 2011-09-22 02:24:41 UTC
(In reply to comment #3)
> > But i was receiving 'Connection Refused' does this mean that the 'glusterd'
> > started over RDMA failed in some ways? 
> > 
> 
> Was glusterd started before the ib modules were loaded?

glusterd was installed about a day after InfiniBand was configured; this is CentOS 6.0.

Comment 5 Amar Tumballi 2011-09-22 02:29:59 UTC
If rdma is present in the machine while starting glusterd, then it should be listening on 24008. Can you confirm the port is open for listening by 'netstat -ntlp' ?
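
For anyone who wants to script the check above instead of reading netstat output, here is a minimal sketch. It only attempts a plain TCP connect, so it is an assumption that the rdma listener is visible this way; an RDMA listener may not answer a TCP probe at all. The helper name `port_is_listening` and the 127.0.0.1 host are illustrative, not part of GlusterFS.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connect to host:port succeeds.

    This is a TCP-level probe only (hypothetical helper); it says nothing
    about whether an RDMA listener is bound to the same port number.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 24008 is the glusterd rdma port and 24010 the brick port from this report.
    for port in (24008, 24010):
        state = "open" if port_is_listening("127.0.0.1", port) else "closed or unreachable"
        print(f"port {port}: {state}")
```

Run it on the server itself; a "closed or unreachable" result for 24008 would match the 'Connection refused' seen on the client.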

Comment 6 Harshavardhana 2011-09-22 03:03:39 UTC
(In reply to comment #5)
> If rdma is present in the machine while starting glusterd, then it should be
> listening on 24008. Can you confirm the port is open for listening by 'netstat
> -ntlp' ?

From what I remember, after starting glusterd, port 24008 never showed up in netstat, so I had to restart it 2-3 times.

Then I wrote a new volfile just to connect to the server process from the client by specifying remote-port.

After that I unmounted and fetched the volfile again from the server. This time the client connected to the volume.

We have seen this repeatedly at a couple of customer sites.

Hopefully it is reproducible in our labs.
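
For reference, the hand-written volfile workaround described above amounts to a protocol/client stanza with an explicit remote-port. The sketch below renders such a stanza. The volume name, host name, and brick path are hypothetical placeholders, not the values used at the customer site, and the helper `client_volume_stanza` is illustrative only.

```python
def client_volume_stanza(name, remote_host, remote_port, remote_subvolume,
                         transport="rdma"):
    """Render a protocol/client volume stanza that pins the brick port.

    All argument values are supplied by the caller; nothing here is read
    from a running glusterd.
    """
    lines = [
        f"volume {name}",
        "    type protocol/client",
        f"    option transport-type {transport}",
        f"    option remote-host {remote_host}",
        f"    option remote-port {remote_port}",
        f"    option remote-subvolume {remote_subvolume}",
        "end-volume",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder names; 24010 is the brick port mentioned in this report.
    print(client_volume_stanza("test-client-0", "server1", 24010, "/export/brick1"))
```

Mounting with a volfile like this bypasses the volfile fetch from glusterd on 24008, which is why it worked while that port was refusing connections.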

Comment 7 Amar Tumballi 2011-09-28 04:29:33 UTC
Will try to reproduce in our labs and update you. But it will take some time, as machines with IB are in demand.

Comment 8 Amar Tumballi 2012-12-20 07:52:29 UTC
http://review.gluster.org/4323 should fix this...

Comment 9 Andrei Mikhailovsky 2013-02-05 14:54:55 UTC
I am having a very similar issue using rdma-only transport with 3.3.1. After downgrading to 3.3.0 the problem goes away. I am using CentOS 6.3 on clients and Ubuntu 12.04 on file servers.