Bug 1003798 - U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Summary: U6: NFS Transport: I/O error on upgrading from U4 -> U6 {Servers Crash}
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-09-03 09:08 UTC by Sachidananda Urs
Modified: 2013-12-19 00:09 UTC
5 users

Fixed In Version: glusterfs-3.4.0.32rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:25:11 UTC
Embargoed:


Attachments
sosreports and core (7.16 MB, application/x-tar)
2013-09-03 09:10 UTC, Sachidananda Urs

Description Sachidananda Urs 2013-09-03 09:08:43 UTC
Description of problem:
Upon upgrading from U4 to U6, I see I/O errors on the client. The mount point is not accessible.

[root@bob-the-minion fuse]# ls /mnt/nfs 
ls: cannot access /mnt/nfs: Input/output error
[root@bob-the-minion fuse]# 

Version-Release number of selected component (if applicable):
glusterfs 3.3.0.14rhs built on Aug 29 2013 09:33:47

Steps to Reproduce:

1. Install U4 and create a 1x2 volume.
2. Mount the volume with NFS.
3. Create some data.
4. Unmount the client and stop the volume.
5. Upgrade to U6.
6. Try creating data again.
7. Brick processes crash.

Actual results:
Brick processes crash and the mount point is inaccessible.

Additional info:
Attaching sosreports and core files from both servers.

Comment 1 Sachidananda Urs 2013-09-03 09:10:03 UTC
Created attachment 793074 [details]
sosreports and core

Comment 3 Vivek Agarwal 2013-09-04 05:57:04 UTC
Per discussion with Sac, the client in question is 2.1. This is not an issue for Big Bend or U6.

Comment 4 rjoseph 2013-09-04 08:05:58 UTC
As per discussion with Sachidananda:

The server was upgraded from U4 to U6.
After the upgrade, a 2.1 client was used to connect to the server. The brick crashed when the 2.1 client started I/O operations.

Comment 5 rjoseph 2013-09-04 08:10:56 UTC
NFS gives an I/O error because the brick server has crashed; this is expected behavior.

The problem is in the RPC layer, where the server tries to decode the data sent by the client without checking the client version.

Following is the backtrace from the crash:

#0  0x000000368aa8860b in memcpy () from /lib64/libc.so.6
#1  0x00007ffdf8d05c9f in memdup (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at /usr/include/bits/string3.h:52
#2  dict_unserialize (orig_buf=<value optimized out>, size=<value optimized out>, fill=0xf2d878) at dict.c:2428
#3  0x00007ffdef585fbb in server_writev (req=0x7ffdeee95208) at server3_1-fops.c:3393
#4  0x00007ffdf8ae27a7 in rpcsvc_handle_rpc_call (svc=0xecafa0, trans=<value optimized out>, msg=<value optimized out>) at rpcsvc.c:502
#5  0x00007ffdf8ae28a3 in rpcsvc_notify (trans=0xf88180, mydata=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpcsvc.c:612
#6  0x00007ffdf8ae3308 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#7  0x00007ffdf55d0a04 in socket_event_poll_in (this=0xf88180) at socket.c:1677
#8  0x00007ffdf55d0ae7 in socket_event_handler (fd=<value optimized out>, idx=8, data=0xf88180, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#9  0x00007ffdf8d2e104 in event_dispatch_epoll_handler (event_pool=0xea2e20) at event.c:785
#10 event_dispatch_epoll (event_pool=0xea2e20) at event.c:847
#11 0x00000000004077d4 in main (argc=<value optimized out>, argv=0x7fff67008cc8) at glusterfsd.c:1841


Here server_writev calls dict_unserialize without checking the client version. server_writev passes xdata_val for parsing; it looks like the 2.1 client sends a different xdata_val than what is expected here.

Comment 7 Sachidananda Urs 2013-09-07 08:58:49 UTC
Connected from 2.1 clients and did some I/O; no crashes seen, I/O successful.

Client: 
[root@bob-the-minion fuse-2.0]# glusterfs --version
glusterfs 3.4.0.32rhs built on Sep  6 2013 10:27:55

Server:
glusterfs 3.3.0.14rhs

Comment 8 Scott Haines 2013-09-23 22:25:11 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

