Bug 763815 (GLUSTER-2083)

Summary: [3.1.1qa5]: After replace-brick NFS portmap registration failed
Product: [Community] GlusterFS
Reporter: Harshavardhana <fharshav>
Component: nfs
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED WORKSFORME
Severity: low
Priority: low
Version: mainline
CC: cww, gluster-bugs, shehjart, vijay
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Harshavardhana 2010-11-10 22:05:53 UTC
A volume stop and start makes it work again.

Comment 1 Harshavardhana 2010-11-11 01:05:02 UTC
[2010-11-10 16:52:40.622842] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:43.628288] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:46.633742] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:49.638282] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:52.643817] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:55.648590] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:58.653951] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:01.661029] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:04.667193] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:07.672639] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:10.677231] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:13.682807] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:16.692675] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed
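The errors above recur at a steady cadence, which suggests the client keeps re-querying the portmap for dist-client-1 instead of giving up. A minimal Python sketch to measure that interval from the timestamps, assuming only the log line format shown above (the helper `failure_intervals` is hypothetical, not part of GlusterFS):

```python
from datetime import datetime

# Four of the client-handshake errors quoted above, copied verbatim.
LOG = """\
[2010-11-10 16:52:40.622842] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:43.628288] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:46.633742] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:49.638282] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
"""


def failure_intervals(log: str) -> list[float]:
    """Seconds elapsed between consecutive client_query_portmap_cbk errors."""
    stamps = []
    for line in log.splitlines():
        if "client_query_portmap_cbk" not in line:
            continue
        # The timestamp is the text between the leading '[' and the first ']'.
        ts = line[1:line.index("]")]
        stamps.append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print([round(gap, 3) for gap in failure_intervals(LOG)])
# [3.005, 3.005, 3.005]
```

On this excerpt the failures are about three seconds apart, i.e. the lookup is retried indefinitely rather than failing once.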


Steps to reproduce:


[root@compel1 ~]# gluster volume info 

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel4:/export1


Copy about 4000 files into the volume.

gluster volume replace-brick dist compel4:/export1 compel1:/export2 start

gluster volume replace-brick dist compel4:/export1 compel1:/export2 status 

The status command reports that migration is in progress and, once it is finished, prints the number of files migrated.

gluster volume replace-brick dist compel4:/export1 compel1:/export2 commit

All files were migrated correctly (verified on the backend).
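Because the old and new brick arguments must be identical across the start, status and commit phases, generating the commands from one set of arguments avoids slips such as a `;` in place of `:`. A minimal Python sketch, assuming the `gluster volume replace-brick VOLUME OLD NEW start|status|commit` form used in the steps above; `replace_brick_cmds` and `run` are hypothetical helpers, and the sketch issues `status` only once rather than waiting for migration to finish:

```python
import shlex
import subprocess


def replace_brick_cmds(volume: str, old: str, new: str) -> list[list[str]]:
    """Build the start, status and commit invocations from the steps above.

    Note: status is emitted once; in practice it has to be repeated until
    migration is reported as finished before commit is issued.
    """
    base = ["gluster", "volume", "replace-brick", volume, old, new]
    return [base + [phase] for phase in ("start", "status", "commit")]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print("+", shlex.join(argv))
        if not dry_run:
            # check=True aborts the sequence if a phase exits non-zero.
            subprocess.run(argv, check=True)


run(replace_brick_cmds("dist", "compel4:/export1", "compel1:/export2"))
```

With the default `dry_run=True` it only prints the three commands, which makes it easy to review the brick arguments before running anything against a live volume.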

[root@compel4 ~]# gluster volume info dist

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel1:/export2

Now "showmount -e compel1" fails, and mounting the volume over NFS fails as well:

[root@compel4 ~]# showmount -e compel1
mount clntudp_create: RPC: Program not registered
[root@compel4 ~]# mount compel1:/dist /mnt
mount: mount to NFS server 'compel1' failed: System Error: Connection refused.
[root@compel4 ~]# 


It looks like the NFS server is never restarted: the running NFS server still has the old configuration, in which dist-client-1 still points to the replaced brick compel4:/export1, as shown below:

"
volume dist-client-0
    type protocol/client
    option remote-host compel1
    option remote-subvolume /export1
    option transport-type tcp
end-volume

volume dist-client-1
    type protocol/client
    option remote-host compel4
    option remote-subvolume /export1
    option transport-type tcp
end-volume

volume dist-dht
    type cluster/distribute
    subvolumes dist-client-0 dist-client-1
end-volume

volume dist-write-behind
    type performance/write-behind
    subvolumes dist-dht
end-volume

volume dist-read-ahead
    type performance/read-ahead
    subvolumes dist-write-behind
end-volume

volume dist-io-cache
    type performance/io-cache
    subvolumes dist-read-ahead
end-volume
"
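The stale entry can be made explicit by comparing the protocol/client volumes in the running NFS volfile with the bricks reported by `gluster volume info dist` after the commit. A minimal Python sketch, assuming the volfile syntax quoted above (`volume`, `option`, `end-volume`); `client_bricks` and `stale_clients` are hypothetical helpers, not part of GlusterFS:

```python
# Client volumes from the volfile loaded by the running NFS server (abridged from above).
NFS_VOLFILE = """\
volume dist-client-0
    type protocol/client
    option remote-host compel1
    option remote-subvolume /export1
    option transport-type tcp
end-volume

volume dist-client-1
    type protocol/client
    option remote-host compel4
    option remote-subvolume /export1
    option transport-type tcp
end-volume
"""

# Bricks listed by `gluster volume info dist` after the replace-brick commit.
CURRENT_BRICKS = {"compel1:/export1", "compel1:/export2"}


def client_bricks(volfile: str) -> dict[str, str]:
    """Map each client volume to the host:path brick it connects to."""
    bricks, name, opts = {}, None, {}
    for raw in volfile.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "volume":
            name, opts = parts[1], {}
        elif parts[0] == "option" and len(parts) >= 3:
            opts[parts[1]] = parts[2]
        elif parts[0] == "end-volume":
            # Only client volumes carry both remote-host and remote-subvolume.
            if "remote-host" in opts and "remote-subvolume" in opts:
                bricks[name] = f"{opts['remote-host']}:{opts['remote-subvolume']}"
            name = None
    return bricks


def stale_clients(volfile: str, current: set[str]) -> dict[str, str]:
    """Return client volumes whose brick is no longer part of the volume."""
    return {n: b for n, b in client_bricks(volfile).items() if b not in current}


print(stale_clients(NFS_VOLFILE, CURRENT_BRICKS))
# {'dist-client-1': 'compel4:/export1'}
```

Only dist-client-1 is flagged, which matches the repeated portmap errors for dist-client-1 in Comment 1: the NFS server keeps asking for the port of a brick that was removed by the commit.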

Comment 2 Pranith Kumar K 2011-01-11 07:10:09 UTC
(In reply to comment #1)
> A volume stop and start makes it working again.

Hi Harsha,
      I tried the steps provided in the bug description, but the problem does not reproduce for me. Could you please let me know if I missed anything?

pranith @ ~/workspace/1repo
15:28:03 :) $ sudo gluster volume info

Volume Name: pranith
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/3
Brick2: pranith-laptop:/tmp/1

pranith @ ~/workspace/1repo
15:28:10 :) $ showmount -e localhost
Export list for localhost:
/pranith *

pranith @ ~/workspace/1repo
15:28:19 :) $ sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 start
replace-brick started successfully

pranith @ ~/workspace/1repo
15:29:38 :) $ sleep 5 ; sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 status
Number of files migrated = 4000        Migration complete 

pranith @ ~/workspace/1repo
15:30:21 :) $ ls /tmp/2

pranith @ ~/workspace/1repo
15:30:26 :) $ git pull
Already up-to-date.

pranith @ ~/workspace/1repo
15:32:09 :) $ sleep 5 ; sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 commit
replace-brick commit successful

pranith @ ~/workspace/1repo
15:34:02 :) $ ls /tmp/2

pranith @ ~/workspace/1repo
15:34:06 :) $ showmount -e localhost
Export list for localhost:
/pranith *

Comment 3 Harshavardhana 2011-01-11 07:13:45 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > A volume stop and start makes it working again.
> 
> hi Harsha,
>       I tried the steps provided in the bug description. It is not happening.
> Could you please let me know if I missed anything?.
> 
> pranith @ ~/workspace/1repo
> 15:28:03 :) $ sudo gluster volume info
> 

This has been working since 3.1.1; the report was against 3.1.1qa5. If you are not able to reproduce it, it can be closed.

Comment 4 Shehjar Tikoo 2011-01-11 07:40:57 UTC
I think Pranith's patch to de-register ports through glusterd fixed this problem.

Comment 5 Pranith Kumar K 2011-01-11 07:49:45 UTC
(In reply to comment #4)
> I think Pranith's patch to de-register ports through glusterd fixed this
> problem.

Shehjar,
     That patch has not been accepted yet. It's working fine even without that patch.

Pranith.