Bug 763607 (GLUSTER-1875) - NFS failover results in unknown error 526
Summary: NFS failover results in unknown error 526
Keywords:
Status: CLOSED NOTABUG
Alias: GLUSTER-1875
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1-beta
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-10-08 23:45 UTC by Allen Lu
Modified: 2015-12-01 16:45 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments: none

Description Allen Lu 2010-10-08 23:45:41 UTC
The client is unable to resume I/O to its NFS mounts after a node fails over following a power-off.

Setup
Two nodes with a replicated volume using backend server host IPs (not vIPs).
Use ucarp to monitor a vIP on each node.
Mount the client to the vIP on either node.
Power off the node that the client is mounted to.

ucarp fails the vIP over to the other node. I/O resumes after about 30-45 seconds, but all operations fail with unknown error 526.
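Roughly, the client side looks like the sketch below (the volume name and vIP are from our setup; the mount point and mount options here are illustrative assumptions only):

# Hypothetical client mount against the ucarp-managed vIP (10.1.30.252);
# the volume name mirrorvol1 comes from the server logs, options are assumed.
mount -t nfs -o vers=3,proto=tcp 10.1.30.252:/mirrorvol1 /mnt/nfs

# Start some I/O, then power off the node currently holding the vIP:
cp -v /usr/lib/libsvn_* /mnt/nfs/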

Killing only ucarp does work, but that does not reflect a true failover, since the backend continues to communicate over the host IPs.


cp: cannot create regular file `./libsvn_ra_svn-1.so.0.0.0': Unknown error 526
`/usr/lib/libsvn_repos-1.so.0' -> `./libsvn_repos-1.so.0'
cp: cannot create regular file `./libsvn_repos-1.so.0': Unknown error 526
`/usr/lib/libsvn_repos-1.so.0.0.0' -> `./libsvn_repos-1.so.0.0.0'
...

Solution needed

Clients need to be able to continue I/O where they left off, against the replicate partner node.

Clients also need to be able to fail back once the node is back online, which requires either triggering self-heal before the server is brought back into service or using background, non-blocking self-heal.
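One way to trigger self-heal from a client before putting the recovered node back into service is to walk the mount and stat every file; a sketch, assuming the illustrative /mnt/nfs mount point from above:

# Hypothetical self-heal trigger: stat'ing every entry makes replicate look it
# up on both subvolumes and heal out-of-sync copies in the background.
find /mnt/nfs -noleaf -print0 | xargs --null stat > /dev/null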

Comment 1 Shehjar Tikoo 2010-10-09 03:21:21 UTC
Can I please have the log from the NFS server to which the client is connected at the time the replica fails over?

Comment 2 Shehjar Tikoo 2010-10-09 03:21:35 UTC
(In reply to comment #1)
> Can I please have the log from the NFS server to which the client is connected
> at the time the replica fails over?

In TRACE log level.

Comment 3 Anand Avati 2010-10-14 02:20:24 UTC
Unknown error 526 is a bug in the Linux kernel NFS client: it returns a bad error number to applications when the server returns the SERVERFAULT RPC status for an NFS3 ACCESS call. SERVERFAULT is a generic status code for all kinds of RPC failures.

In our case, the distribute or replicate translator returns ENOTCONN as the RPC reply when calls land before all subvolumes are connected. Our gNFS server translates this into the generic SERVERFAULT status (which is correct), but doing so ends up triggering the Linux NFS client bug.

In 3.1 we have redone the event notification handling of the distribute translator so that it signals the NFS server that the volume is ready only after all subvolumes are either connected or have a confirmed failure. This should fix this bug, in the sense that we avoid the scenario of returning SERVERFAULT for the NFS3 ACCESS call during failovers: the request is processed only after all subvolumes are up, so such an error never has to be returned at all.
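For reference, errno 526 is the kernel-internal ESERVERFAULT value defined in include/linux/errno.h, which is never supposed to reach userspace. A quick way to confirm on a box with kernel sources (the path below is an assumption):

grep ESERVERFAULT /usr/src/linux/include/linux/errno.h
# prints, roughly:
# #define ESERVERFAULT    526     /* An untranslatable error occurred */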

Comment 4 Allen Lu 2010-10-14 15:29:54 UTC
Avati,

NFS logs on the server that the vIP failed over to:

tail -f /var/log/glusterfs/nfs.log
[2010-10-13 18:25:45.288091] W [afr-self-heal-common.c:584:afr_sh_pending_to_delta] afr_sh_pending_to_delta: Unable to get dict value.
[2010-10-13 18:25:45.288116] W [afr-self-heal-common.c:584:afr_sh_pending_to_delta] afr_sh_pending_to_delta: Unable to get dict value.
[2010-10-13 18:25:45.295452] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] mirrorvol1-replicate-0: background  meta-data data self-heal completed on /bin/stat
[2010-10-13 18:28:18.522523] E [client-handshake.c:116:rpc_client_ping_timer_expired] mirrorvol1-client-0: Server 10.1.30.127:24009 has not responded in the last 42 seconds, disconnecting.
[2010-10-13 18:28:18.522765] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2ba3fe516509] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2ba3fe515cae] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2ba3fe515c1e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(FINODELK(30)) called at 2010-10-13 18:27:11.779234
[2010-10-13 18:28:18.522827] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2ba3fe516509] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2ba3fe515cae] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2ba3fe515c1e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(XATTROP(33)) called at 2010-10-13 18:27:11.779366
[2010-10-13 18:28:18.522908] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x2ba3fe516509] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x2ba3fe515cae] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x2ba3fe515c1e]))) rpc-clnt: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2010-10-13 18:27:36.459191
[2010-10-13 18:28:21.525870] E [socket.c:1657:socket_connect_finish] mirrorvol1-client-0: connection to 10.1.30.127:24009 failed (No route to host)
[2010-10-13 18:33:38.482826] I [client-handshake.c:699:select_server_supported_programs] mirrorvol1-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2010-10-13 18:33:38.873463] I [client-handshake.c:535:client_setvolume_cbk] mirrorvol1-client-0: Connected to 10.1.30.127:24009, attached to remote volume '/mnt1'.

At this point, the client hangs forever.

ucarp processes and scripts on the server that was failed over to, using 10.1.30.252:

root      3256  3212  0 17:57 pts/0    00:00:02 ucarp -P -z -b 1 -i eth0 -s 10.1.30.126 -v 2 -p mypass1 -a 10.1.30.253 --upscript=/etc/vip-up-253.sh --downscript=/etc/vip-down-253.sh
root      3261  3212  0 17:57 pts/0    00:00:01 ucarp -z -b 2 -i eth0 -s 10.1.30.126 -v 1 -p mypass -a 10.1.30.252 --upscript=/etc/vip-up-252.sh --downscript=/etc/vip-down-252.sh

# more vip-up-252.sh
#!/bin/sh
/sbin/ip addr add 10.1.30.252/24 dev eth0

arping -U -I eth0 10.1.30.252 -c 1
[root@alu-vm2 etc]# more vip-up-253.sh
#!/bin/sh
/sbin/ip addr add 10.1.30.253/24 dev eth0

arping -U -I eth0 10.1.30.253 -c 1
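The matching down-scripts are not pasted here; they just release the address, roughly like this (contents assumed, mirroring the up-scripts above):

#!/bin/sh
# Hypothetical vip-down-252.sh: drop the vIP when ucarp demotes this node.
/sbin/ip addr del 10.1.30.252/24 dev eth0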

Comment 5 Shehjar Tikoo 2010-10-14 16:08:01 UTC
(In reply to comment #3)
> Unknown error 526 is a bug in the Linux kernel NFS client: it returns a bad
> error number to applications when the server returns the SERVERFAULT RPC
> status for an NFS3 ACCESS call. SERVERFAULT is a generic status code for all
> kinds of RPC failures.
> 
> In our case, the distribute or replicate translator returns ENOTCONN as the
> RPC reply when calls land before all subvolumes are connected. Our gNFS
> server translates this into the generic SERVERFAULT status (which is
> correct), but doing so ends up triggering the Linux NFS client bug.
> 
> In 3.1 we have redone the event notification handling of the distribute
> translator so that it signals the NFS server that the volume is ready only
> after all subvolumes are either connected or have a confirmed failure. This
> should fix this bug, in the sense that we avoid the scenario of returning
> SERVERFAULT for the NFS3 ACCESS call during failovers: the request is
> processed only after all subvolumes are up, so such an error never has to be
> returned at all.

No, I think this is the scenario where the NFS server is already running on the server that was failed over to, while glusterfsd is migrated to this same server.

Allen, the resolution is to have the NFS server also restarted through the scripts that start up a new glusterfsd on the server that was failed over to. That's when the above logic will kick in.
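Roughly, that means extending the up-script to restart the Gluster NFS process after taking over the vIP. A sketch only: it assumes the NFS server is the glusterd-managed glusterfs process and that restarting glusterd respawns it; the process pattern and init script path are assumptions.

#!/bin/sh
# Hypothetical extension of vip-up-252.sh: take over the vIP, then restart the
# Gluster NFS server so it comes up only after its subvolumes are connected.
/sbin/ip addr add 10.1.30.252/24 dev eth0
arping -U -I eth0 10.1.30.252 -c 1

# Kill the running Gluster NFS process, if any, and let glusterd respawn it.
pkill -f 'glusterfs.*nfs' 2>/dev/null
/etc/init.d/glusterd restart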

Comment 6 Allen Lu 2010-10-14 17:37:20 UTC
Shehjar, in 3.1, do I need to start a new NFSd?

Comment 7 Harshavardhana 2010-10-14 18:04:36 UTC
> Allen, the resolution is to have nfs server also restarted through the scripts
> that start up a new glusterfsd on the server failed to. Thats when the above
> logic will kick in.

Shehjar, 

  This is no longer about the 526 error itself; that part is still valid, since the NFS client is not able to translate it into a better error message.

We are filing a new bug for the hang, since what we have is a 3.1 mirror configuration that is supposed to work with the ucarp failover.

Remember that we are not using vIPs internally; our volumes are defined with the physical IPs, so when a node goes down, replicate should kick in its logic. If you look at the log, the "Connected" message at the end is the only one; no ping_timeout or any such frame timeout ever happened, so the NFS client was still trying to connect to an nfs/server that was unresponsive. 10.1.30.127 in the log is the node that went down, but we did not see any ping_timeout or frames unwinding.

And this behaviour is reproducible.

We ran a lot of tcpdumps to look at the connections; there is a connection initiated from the client side, but the server never sends an ACK back.

Also note that the NFS client was using VIP1, i.e. 10.1.30.252, which was on node1 and was migrated to node2 upon shutdown. This was not a clean shutdown but an abrupt power-off.

Also, your suggestion to restart everything does not hold good in this case.

Comment 8 Allen Lu 2010-10-14 18:06:21 UTC
This bug is no longer valid. Please see new bug 763688 for the current NFS failover issue.

