Bug 1003798 - U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Summary: U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-03 09:08 UTC by Sachidananda Urs
Modified: 2013-12-19 00:09 UTC
5 users

Fixed In Version: glusterfs-3.4.0.32rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:25:11 UTC
Embargoed:


Attachments
sosreports and core (7.16 MB, application/x-tar)
2013-09-03 09:10 UTC, Sachidananda Urs

Description Sachidananda Urs 2013-09-03 09:08:43 UTC
Description of problem:
Upon upgrading from U4 to U6, I see I/O errors on the client. The mount point is not accessible.

[root@bob-the-minion fuse]# ls /mnt/nfs 
ls: cannot access /mnt/nfs: Input/output error
[root@bob-the-minion fuse]# 

Version-Release number of selected component (if applicable):
glusterfs 3.3.0.14rhs built on Aug 29 2013 09:33:47

Steps to Reproduce:

1. Install U4 and create a 1x2 volume.
2. Mount the volume with NFS.
3. Create some data.
4. Unmount the client and stop the volume.
5. Upgrade to U6.
6. Try creating data again.
7. Brick processes crash.

Actual results:
Brick processes crash and the mount point is inaccessible.

Additional info:
Attaching sosreports and core files from both servers.

Comment 1 Sachidananda Urs 2013-09-03 09:10:03 UTC
Created attachment 793074 [details]
sosreports and core

Comment 3 Vivek Agarwal 2013-09-04 05:57:04 UTC
Per discussion with Sac, the client in question is 2.1. This is not an issue for Big Bend or U6.

Comment 4 rjoseph 2013-09-04 08:05:58 UTC
As per discussion with Sachidananda:

The server was upgraded from U4 to U6.
After the upgrade, a 2.1 client was used to connect to the server. The brick crashed when the 2.1 client started I/O operations.

Comment 5 rjoseph 2013-09-04 08:10:56 UTC
NFS gives an I/O error because the brick server has crashed; this is expected behavior.

The problem is in the RPC layer, where the server tries to decode the data sent by the client without checking the client version.

Following is the backtrace from the crash:

#0  0x000000368aa8860b in memcpy () from /lib64/libc.so.6
#1  0x00007ffdf8d05c9f in memdup (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at /usr/include/bits/string3.h:52
#2  dict_unserialize (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at dict.c:2428
#3  0x00007ffdef585fbb in server_writev (req=0x7ffdeee95208) at server3_1-fops.c:3393
#4  0x00007ffdf8ae27a7 in rpcsvc_handle_rpc_call (svc=0xecafa0, trans=<value optimized out>, msg=<value optimized out>) at rpcsvc.c:502
#5  0x00007ffdf8ae28a3 in rpcsvc_notify (trans=0xf88180, mydata=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpcsvc.c:612
#6  0x00007ffdf8ae3308 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#7  0x00007ffdf55d0a04 in socket_event_poll_in (this=0xf88180) at socket.c:1677
#8  0x00007ffdf55d0ae7 in socket_event_handler (fd=<value optimized out>, idx=8, data=0xf88180, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#9  0x00007ffdf8d2e104 in event_dispatch_epoll_handler (event_pool=0xea2e20) at event.c:785
#10 event_dispatch_epoll (event_pool=0xea2e20) at event.c:847
#11 0x00000000004077d4 in main (argc=<value optimized out>, argv=0x7fff67008cc8) at glusterfsd.c:1841


Here server_writev calls dict_unserialize without checking the client version. server_writev passes xdata_val for parsing; it looks like the 2.1 client sends a different xdata_val than what is expected here.

Comment 7 Sachidananda Urs 2013-09-07 08:58:49 UTC
Connected from 2.1 clients and did some I/O; no crashes seen, I/O successful.

Client: 
[root@bob-the-minion fuse-2.0]# glusterfs --version
glusterfs 3.4.0.32rhs built on Sep  6 2013 10:27:55

Server:
glusterfs 3.3.0.14rhs

Comment 8 Scott Haines 2013-09-23 22:25:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

