Bug 1003798 - U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Unspecified Unspecified
Priority: urgent, Severity: urgent
Assigned To: Amar Tumballi
Sudhir D
Depends On:
Reported: 2013-09-03 05:08 EDT by Sachidananda Urs
Modified: 2013-12-18 19:09 EST
CC List: 5 users

See Also:
Fixed In Version: glusterfs-
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-09-23 18:25:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---

Attachments
sosreports and core (7.16 MB, application/x-tar)
2013-09-03 05:10 EDT, Sachidananda Urs

Description Sachidananda Urs 2013-09-03 05:08:43 EDT
Description of problem:
Upon upgrading from U4 to U6, I see I/O errors on the client. The mount point is not accessible.

[root@bob-the-minion fuse]# ls /mnt/nfs 
ls: cannot access /mnt/nfs: Input/output error
[root@bob-the-minion fuse]# 

Version-Release number of selected component (if applicable):
glusterfs built on Aug 29 2013 09:33:47

Steps to Reproduce:

1. Install U4 and create a 1x2 volume
2. Mount the volume with NFS
3. Create some data.
4. Unmount the client and stop the volume.
5. Upgrade to U6
6. Try creating data again.
7. Brick processes crash.

Actual results:
Brick processes crash and the mount point is inaccessible.

Additional info:
Attaching sosreports and core files from both the servers.
Comment 1 Sachidananda Urs 2013-09-03 05:10:03 EDT
Created attachment 793074 [details]
sosreports and core
Comment 3 Vivek Agarwal 2013-09-04 01:57:04 EDT
Per discussion with Sac, the client in question is 2.1. Not for bigbend or u6.
Comment 4 rjoseph 2013-09-04 04:05:58 EDT
As per discussion with Sachidananda:

The server was upgraded from U4 to U6.
After the upgrade, a 2.1 client was used to connect to the server. The brick crashed when the 2.1 client started I/O operations.
Comment 5 rjoseph 2013-09-04 04:10:56 EDT
NFS returns an I/O error because the brick server has crashed. This is the expected behavior.

The problem is in the RPC layer, where the server tries to decode the data sent by the client without checking the client version.
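
The missing version check can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the fix that shipped: it assumes a hypothetical request structure that records the RPC program version negotiated with the client, and the names `client_req`, `PROG_VERSION_OLD`, `PROG_VERSION_NEW` and the `decode_writev_*` functions are invented for the example rather than taken from GlusterFS. The idea is to choose a decoder from the client version and reject unknown versions with an error instead of decoding the payload blindly.

```c
#include <stddef.h>

/* Hypothetical protocol versions; the real GlusterFS program numbers differ. */
enum { PROG_VERSION_OLD = 1, PROG_VERSION_NEW = 2 };

/* Hypothetical request: version negotiated at handshake plus the raw payload. */
struct client_req {
    int version;
    const unsigned char *payload;
    size_t len;
};

/* Hypothetical decoder for the old wire format, which carries no xdata. */
static int decode_writev_old(const struct client_req *req) {
    (void)req;
    return 0;
}

/* Hypothetical decoder for the new format: a non-empty payload must exist. */
static int decode_writev_new(const struct client_req *req) {
    return (req->payload != NULL || req->len == 0) ? 0 : -1;
}

/* Pick the decoder from the client version instead of assuming one layout. */
static int decode_writev(const struct client_req *req) {
    switch (req->version) {
    case PROG_VERSION_OLD:
        return decode_writev_old(req);
    case PROG_VERSION_NEW:
        return decode_writev_new(req);
    default:
        return -1; /* unknown version: fail this request, do not crash */
    }
}
```

For example, a request tagged with an unknown version (say 99) makes `decode_writev` return -1, so the server can fail that single request rather than bring down the brick process.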

The following is the backtrace from the crash:

#0  0x000000368aa8860b in memcpy () from /lib64/libc.so.6
#1  0x00007ffdf8d05c9f in memdup (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at /usr/include/bits/string3.h:52
#2  dict_unserialize (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at dict.c:2428
#3  0x00007ffdef585fbb in server_writev (req=0x7ffdeee95208) at server3_1-fops.c:3393
#4  0x00007ffdf8ae27a7 in rpcsvc_handle_rpc_call (svc=0xecafa0, trans=<value optimized out>, msg=<value optimized out>) at rpcsvc.c:502
#5  0x00007ffdf8ae28a3 in rpcsvc_notify (trans=0xf88180, mydata=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpcsvc.c:612
#6  0x00007ffdf8ae3308 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#7  0x00007ffdf55d0a04 in socket_event_poll_in (this=0xf88180) at socket.c:1677
#8  0x00007ffdf55d0ae7 in socket_event_handler (fd=<value optimized out>, idx=8, data=0xf88180, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#9  0x00007ffdf8d2e104 in event_dispatch_epoll_handler (event_pool=0xea2e20) at event.c:785
#10 event_dispatch_epoll (event_pool=0xea2e20) at event.c:847
#11 0x00000000004077d4 in main (argc=<value optimized out>, argv=0x7fff67008cc8) at glusterfsd.c:1841

Here, server_writev calls dict_unserialize without checking the client version, passing it xdata_val for parsing. It looks like the 2.1 client sends a different xdata_val than what is expected here.
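
The backtrace ends in `memcpy` inside `memdup`, called from `dict_unserialize`, which suggests that a length read from the client's buffer was trusted. A defensive decoder validates every length against the bytes actually received before copying anything. The sketch below assumes a hypothetical length-prefixed layout (a pair count, then per pair a key length, a value length, the key bytes and the value bytes, with all lengths as big-endian 32-bit integers); it is loosely modeled on, and not guaranteed to match, the GlusterFS dictionary wire format, and `checked_unserialize` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value; the caller guarantees 4 bytes are available. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Validation pass over the hypothetical layout described above.
 * Returns the number of pairs, or -1 if any length points past the buffer.
 * A real decoder would copy each key and value only after these checks pass.
 */
static int checked_unserialize(const unsigned char *buf, size_t size) {
    if (buf == NULL || size < 4)
        return -1;
    uint32_t count = read_be32(buf);
    size_t off = 4;
    for (uint32_t i = 0; i < count; i++) {
        if (size - off < 8)          /* not enough bytes for the two lengths */
            return -1;
        uint32_t keylen = read_be32(buf + off);
        uint32_t vallen = read_be32(buf + off + 4);
        off += 8;
        if (keylen > size - off || vallen > size - off - keylen)
            return -1;               /* claimed length exceeds what was sent */
        off += (size_t)keylen + vallen;
    }
    return (int)count;
}
```

For instance, a buffer whose single pair claims a 100-byte key but carries only 3 bytes after the length fields is rejected with -1 before any copy would run.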
Comment 7 Sachidananda Urs 2013-09-07 04:58:49 EDT
Connected from 2.1 clients and did some I/O; no crashes were seen and the I/O was successful.

[root@bob-the-minion fuse-2.0]# glusterfs --version
glusterfs built on Sep  6 2013 10:27:55

Comment 8 Scott Haines 2013-09-23 18:25:11 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

