Bug 1003798

Summary: U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sachidananda Urs <surs>
Component: glusterd
Assignee: Amar Tumballi <amarts>
Status: CLOSED ERRATA
QA Contact: Sudhir D <sdharane>
Severity: urgent
Priority: urgent
Version: 2.0
CC: rhs-bugs, rjoseph, vagarwal, vbellur, vraman
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0.32rhs-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 22:25:11 UTC
Type: Bug
Attachments: sosreports and core

Description Sachidananda Urs 2013-09-03 09:08:43 UTC
Description of problem:
Upon upgrading from U4 to U6, I see I/O errors on the client. The mount point is not accessible.

[root@bob-the-minion fuse]# ls /mnt/nfs 
ls: cannot access /mnt/nfs: Input/output error
[root@bob-the-minion fuse]# 

Version-Release number of selected component (if applicable):
glusterfs 3.3.0.14rhs built on Aug 29 2013 09:33:47

Steps to Reproduce:

1. Install U4 and create a 1x2 volume.
2. Mount the volume over NFS.
3. Create some data.
4. Unmount the client and stop the volume.
5. Upgrade to U6.
6. Try creating data again.
7. Brick processes crash.

Actual results:
Brick processes crash and the mount point is inaccessible.

Additional info:
Attaching sosreports and core files from both the servers.

Comment 1 Sachidananda Urs 2013-09-03 09:10:03 UTC
Created attachment 793074 [details]
sosreports and core

Comment 3 Vivek Agarwal 2013-09-04 05:57:04 UTC
Per discussion with Sac, the client in question is 2.1. Not for Big Bend or U6.

Comment 4 rjoseph 2013-09-04 08:05:58 UTC
As per discussion with Sachidananda:

The server was upgraded from U4 to U6.
After the upgrade, a 2.1 client was used to connect to the server. The brick crashed when the 2.1 client started I/O operations.

Comment 5 rjoseph 2013-09-04 08:10:56 UTC
NFS returns an I/O error because the brick process has crashed; this is the expected behavior.

The problem is in the RPC layer, where the server tries to decode the data sent by the client without checking the client version.

Following is the backtrace output from crash:

#0  0x000000368aa8860b in memcpy () from /lib64/libc.so.6
#1  0x00007ffdf8d05c9f in memdup (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at /usr/include/bits/string3.h:52
#2  dict_unserialize (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at dict.c:2428
#3  0x00007ffdef585fbb in server_writev (req=0x7ffdeee95208) at server3_1-fops.c:3393
#4  0x00007ffdf8ae27a7 in rpcsvc_handle_rpc_call (svc=0xecafa0, trans=<value optimized out>, msg=<value optimized out>) at rpcsvc.c:502
#5  0x00007ffdf8ae28a3 in rpcsvc_notify (trans=0xf88180, mydata=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpcsvc.c:612
#6  0x00007ffdf8ae3308 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#7  0x00007ffdf55d0a04 in socket_event_poll_in (this=0xf88180) at socket.c:1677
#8  0x00007ffdf55d0ae7 in socket_event_handler (fd=<value optimized out>, idx=8, data=0xf88180, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#9  0x00007ffdf8d2e104 in event_dispatch_epoll_handler (event_pool=0xea2e20) at event.c:785
#10 event_dispatch_epoll (event_pool=0xea2e20) at event.c:847
#11 0x00000000004077d4 in main (argc=<value optimized out>, argv=0x7fff67008cc8) at glusterfsd.c:1841


Here server_writev calls dict_unserialize without checking the client version. server_writev passes xdata_val for parsing, and it looks like the 2.1 client sends a different xdata_val than what is expected here.

Comment 7 Sachidananda Urs 2013-09-07 08:58:49 UTC
Connected from 2.1 clients and ran some I/O; no crashes seen, I/O successful.

Client: 
[root@bob-the-minion fuse-2.0]# glusterfs --version
glusterfs 3.4.0.32rhs built on Sep  6 2013 10:27:55

Server:
glusterfs 3.3.0.14rhs

Comment 8 Scott Haines 2013-09-23 22:25:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html