Bug 763815 (GLUSTER-2083) - [3.1.1qa5]: After replace-brick NFS portmap registration failed
Summary: [3.1.1qa5]: After replace-brick NFS portmap registration failed
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2083
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-11-11 01:05 UTC by Harshavardhana
Modified: 2015-03-23 01:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Harshavardhana 2010-11-10 22:05:53 UTC
Stopping and starting the volume makes it work again.

Comment 1 Harshavardhana 2010-11-11 01:05:02 UTC
[2010-11-10 16:52:40.622842] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:43.628288] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:46.633742] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:49.638282] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:52.643817] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:55.648590] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:52:58.653951] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:01.661029] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:04.667193] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:07.672639] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:10.677231] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:13.682807] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed to get the port number for remote subvolume
[2010-11-10 16:53:16.692675] E [client-handshake.c:1067:client_query_portmap_cbk] dist-client-1: failed
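
The errors above repeat roughly every three seconds and all name the same translator. A minimal sketch for tallying them per translator follows; the function name `count_portmap_failures` is hypothetical, and it assumes each log message sits on a single line in the format shown above.

```shell
# Count "failed to get the port number" errors per client translator.
# Reads a glusterfs log on stdin and prints "<translator> <count>" lines.
count_portmap_failures() {
    awk '
        /client_query_portmap_cbk\]/ && /failed to get the port number/ {
            # The translator name is the field right after the "...cbk]" token.
            for (i = 1; i < NF; i++) {
                if ($i ~ /client_query_portmap_cbk\]$/) {
                    name = $(i + 1)
                    sub(/:$/, "", name)   # drop the trailing colon
                    count[name]++
                }
            }
        }
        END {
            for (n in count) print n, count[n]
        }
    ' | sort
}
```

It could be run as `count_portmap_failures < nfs.log`, where `nfs.log` stands in for wherever the Gluster NFS server log lives on the machine (the path is not given in this report). A count that keeps growing for one translator would suggest the client is still retrying a brick that no longer exists.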


Steps to reproduce:


[root@compel1 ~]# gluster volume info 

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel4:/export1


Copy about 4000 files into the volume.

gluster volume replace-brick dist compel4:/export1 compel1:/export2 start

gluster volume replace-brick dist compel4:/export1 compel1:/export2 status 

The status shows a "migration in progress" message, then a message with the number of files migrated once it has finished.

gluster volume replace-brick dist compel4:/export1 compel1:/export2 commit

All files were migrated properly; this was verified on the backend.

[root@compel4 ~]# gluster volume info dist

Volume Name: dist
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: compel1:/export1
Brick2: compel1:/export2

Now "showmount -e compel1" results in 

[root@compel4 ~]# showmount -e compel1
mount clntudp_create: RPC: Program not registered
[root@compel4 ~]# mount compel1:/dist /mnt
mount: mount to NFS server 'compel1' failed: System Error: Connection refused.
[root@compel4 ~]# 
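
The "Program not registered" error suggests the mount program is missing from the portmapper on compel1. One way to confirm this is to inspect the output of `rpcinfo -p`. The sketch below is only an illustration: the helper name `nfs_registered` is hypothetical, and it checks for the standard RPC program numbers 100003 (nfs) and 100005 (mountd).

```shell
# Report whether NFS (100003) and mountd (100005) appear in `rpcinfo -p` output.
# Reads the rpcinfo listing on stdin; returns 0 only if both programs are present.
nfs_registered() {
    awk '
        $1 == "100003" { nfs = 1 }
        $1 == "100005" { mountd = 1 }
        END {
            print "nfs:", (nfs ? "registered" : "missing")
            print "mountd:", (mountd ? "registered" : "missing")
            exit !(nfs && mountd)
        }
    '
}
```

Usage would be `rpcinfo -p compel1 | nfs_registered`. In the failing state described here, at least one of the two programs would be expected to show as missing.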


It looks like the NFS server is never restarted after the commit, and it fails because the running NFS server still has the old configuration, as shown below:

"
+------------------------------------------------------------------------------+
volume dist-client-0
    type protocol/client
    option remote-host compel1
    option remote-subvolume /export1
    option transport-type tcp
end-volume

volume dist-client-1
    type protocol/client
    option remote-host compel4
    option remote-subvolume /export1
    option transport-type tcp
end-volume

volume dist-dht
    type cluster/distribute
    subvolumes dist-client-0 dist-client-1
end-volume

volume dist-write-behind
    type performance/write-behind
    subvolumes dist-dht
end-volume

volume dist-read-ahead
    type performance/read-ahead
    subvolumes dist-write-behind
end-volume

volume dist-io-cache
    type performance/io-cache
    subvolumes dist-read-ahead
end-volume

"
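
To compare a graph like the one above with the brick list from `gluster volume info`, the host and path of each protocol/client translator can be pulled out of the dump. This is a rough sketch: the function name `volfile_bricks` is hypothetical, and it assumes `remote-host` appears before `remote-subvolume` inside each volume block, as it does here.

```shell
# List the bricks that a volfile's protocol/client translators point at,
# one "host:path" per line. Reads the volfile or a logged graph dump on stdin.
volfile_bricks() {
    # Strip any "N: " prefixes that a log dump may carry, then pair the options.
    sed 's/^ *[0-9][0-9]*: *//' | awk '
        $1 == "option" && $2 == "remote-host"      { host = $3 }
        $1 == "option" && $2 == "remote-subvolume" { print host ":" $3 }
    '
}
```

For the graph in this comment it prints `compel1:/export1` and `compel4:/export1`, while `gluster volume info dist` lists `compel1:/export2` as Brick2 after the commit. That mismatch is what points at a stale NFS server graph.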

Comment 2 Pranith Kumar K 2011-01-11 07:10:09 UTC
(In reply to comment #1)
> A volume stop and start makes it working again.

hi Harsha,
      I tried the steps provided in the bug description, but the problem does not occur. Could you please let me know if I missed anything?

pranith @ ~/workspace/1repo
15:28:03 :) $ sudo gluster volume info

Volume Name: pranith
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: pranith-laptop:/tmp/3
Brick2: pranith-laptop:/tmp/1

pranith @ ~/workspace/1repo
15:28:10 :) $ showmount -e localhost
Export list for localhost:
/pranith *

pranith @ ~/workspace/1repo
15:28:19 :) $ sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 start
replace-brick started successfully

pranith @ ~/workspace/1repo
15:29:38 :) $ sleep 5 ; sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 status
Number of files migrated = 4000        Migration complete 

pranith @ ~/workspace/1repo
15:30:21 :) $ ls /tmp/2

pranith @ ~/workspace/1repo
15:30:26 :) $ git pull
Already up-to-date.

pranith @ ~/workspace/1repo
15:32:09 :) $ sleep 5 ; sudo gluster volume replace-brick pranith pranith-laptop:/tmp/1 pranith-laptop:/tmp/2 commit
replace-brick commit successful

pranith @ ~/workspace/1repo
15:34:02 :) $ ls /tmp/2

pranith @ ~/workspace/1repo
15:34:06 :) $ showmount -e localhost
Export list for localhost:
/pranith *

Comment 3 Harshavardhana 2011-01-11 07:13:45 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > A volume stop and start makes it working again.
> 
> hi Harsha,
>       I tried the steps provided in the bug description. It is not happening.
> Could you please let me know if I missed anything?.
> 
> pranith @ ~/workspace/1repo
> 15:28:03 :) $ sudo gluster volume info
> 

This has been working since 3.1.1; the bug was filed against 3.1.1qa5. If you are not able to reproduce it, it can be closed.

Comment 4 Shehjar Tikoo 2011-01-11 07:40:57 UTC
I think Pranith's patch to de-register ports through glusterd fixed this problem.

Comment 5 Pranith Kumar K 2011-01-11 07:49:45 UTC
(In reply to comment #4)
> I think Pranith's patch to de-register ports through glusterd fixed this
> problem.

Shehjar,
     That patch is not yet accepted. It's working fine even without that patch.

Pranith.

