Bug 977218 - quota: two bricks down with crash

Summary: quota: two bricks down with crash

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	glusterd
Sub Component:
Version:	2.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	vpshastry
QA Contact:	Saurabh
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-06-24 05:50 UTC by Saurabh
Modified:	2016-01-19 06:12 UTC
CC List:	6 users
Fixed In Version:	glusterfs-3.4.0.12rhs.beta1-1
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-09-23 22:39:52 UTC
Embargoed:
Dependent Products:

Description Saurabh 2013-06-24 05:50:51 UTC

Description of problem:
Quota is enabled on the volume, and two bricks have gone down with a crash recorded in their brick logs.

No core was dumped.

Volume info:

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 16ce5fbe-f72a-41c4-b3da-282d78acc1c0
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.98:/rhs/bricks/d1r1
Brick2: 10.70.37.174:/rhs/bricks/d1r2
Brick3: 10.70.37.136:/rhs/bricks/d2r1
Brick4: 10.70.37.168:/rhs/bricks/d2r2
Brick5: 10.70.37.98:/rhs/bricks/d3r1
Brick6: 10.70.37.174:/rhs/bricks/d3r2
Brick7: 10.70.37.136:/rhs/bricks/d4r1
Brick8: 10.70.37.168:/rhs/bricks/d4r2
Brick9: 10.70.37.98:/rhs/bricks/d5r1
Brick10: 10.70.37.174:/rhs/bricks/d5r2
Brick11: 10.70.37.136:/rhs/bricks/d6r1
Brick12: 10.70.37.168:/rhs/bricks/d6r2


Volume status:
[root@quota1 ~]# gluster volume status
Status of volume: dist-rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.37.98:/rhs/bricks/d1r1			49152	Y	2453
Brick 10.70.37.174:/rhs/bricks/d1r2			49152	Y	2376
Brick 10.70.37.136:/rhs/bricks/d2r1			49152	Y	2308
Brick 10.70.37.168:/rhs/bricks/d2r2			49152	Y	2406
Brick 10.70.37.98:/rhs/bricks/d3r1			49153	Y	2462
Brick 10.70.37.174:/rhs/bricks/d3r2			49153	Y	2385
Brick 10.70.37.136:/rhs/bricks/d4r1			N/A	N	2317
Brick 10.70.37.168:/rhs/bricks/d4r2			N/A	N	2415
Brick 10.70.37.98:/rhs/bricks/d5r1			49154	Y	2471
Brick 10.70.37.174:/rhs/bricks/d5r2			49154	Y	2394
Brick 10.70.37.136:/rhs/bricks/d6r1			49154	Y	2326
Brick 10.70.37.168:/rhs/bricks/d6r2			49154	Y	2424
NFS Server on localhost					2049	Y	2483
Self-heal Daemon on localhost				N/A	Y	2490
NFS Server on 54fb720d-b816-4e2a-833c-7fbffc5a5363	2049	Y	2436
Self-heal Daemon on 54fb720d-b816-4e2a-833c-7fbffc5a5363	N/A	Y	2443
NFS Server on 8ba345c1-0723-4c6c-a380-35f2d4c706c7	2049	Y	2338
Self-heal Daemon on 8ba345c1-0723-4c6c-a380-35f2d4c706c7	N/A	Y	2346
NFS Server on 6f39ec04-96fb-4aa7-b31a-29f11d34dac1	2049	Y	2407
Self-heal Daemon on 6f39ec04-96fb-4aa7-b31a-29f11d34dac1	N/A	Y	2413

There are no active volume tasks
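In the status output above, the two crashed bricks are the rows whose Online column is `N` (d4r1 and d4r2). A minimal C sketch of that check follows, assuming each brick row has the whitespace-separated layout shown in this report (`Brick <host>:<path>`, port, online flag, PID). The function name `brick_row_is_offline` is hypothetical and not part of the gluster CLI.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true when a `gluster volume status` brick row
 * reports Online = N. Assumes the layout seen in this report:
 * "Brick <host>:<path> <port> <Y|N> <pid>", separated by spaces or tabs. */
bool brick_row_is_offline(const char *row)
{
    char buf[512];
    char *fields[8];
    int n = 0;

    /* strtok modifies its input, so tokenize a bounded copy of the row. */
    snprintf(buf, sizeof buf, "%s", row);

    for (char *tok = strtok(buf, " \t\n"); tok != NULL && n < 8;
         tok = strtok(NULL, " \t\n"))
        fields[n++] = tok;

    /* Only brick rows qualify; NFS Server and Self-heal Daemon rows have
     * a different number of fields and are skipped. */
    if (n != 5 || strcmp(fields[0], "Brick") != 0)
        return false;

    /* fields: [0]="Brick" [1]=host:path [2]=port [3]=online [4]=pid */
    return strcmp(fields[3], "N") == 0;
}
```

Feeding each line of the status output through this helper would flag only the d4r1 and d4r2 rows; the port of an offline brick is `N/A`, so the online flag, not the port, is the reliable field to test.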



Version-Release number of selected component (if applicable):
[root@quota4 ~]# rpm -qa | grep glusterfs
glusterfs-3.4rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4rhs-1.el6rhs.x86_64
glusterfs-server-3.4rhs-1.el6rhs.x86_64

How reproducible:
Seen on two bricks.

Steps to Reproduce:

Rather than exact steps, this is the sequence of operations performed: I installed the new ISO and the new quota RPMs, tried something very basic, and found that bricks had gone down over the weekend. fs-sanity was running during that time.

1. gluster volume quota <vol-name> enable
2. gluster volume quota <vol-name> limit-usage / 2GB
3. Mount the volume using NFS on a client
4. Create some data
5. gluster volume quota <vol-name> list
6. gluster volume quota <vol-name> limit-usage / 100GB
7. Invoke fs-sanity

Actual results:
On Monday morning, nfs.log was reporting "Connection refused" errors. On closer inspection, the bricks turned out to be down because of a crash:

[2013-06-21 08:22:52.448361] W [quota.c:1752:quota_fstat_cbk] 0-dist-rep-quota: quota context not set in inode (gfid:16484a2a-92a9-48e2-918d-3394aa242d31)
[2013-06-21 08:22:52.453355] W [quota.c:1752:quota_fstat_cbk] 0-dist-rep-quota: quota context not set in inode (gfid:16484a2a-92a9-48e2-918d-3394aa242d31)
[2013-06-21 08:22:52.457778] W [quota.c:1752:quota_fstat_cbk] 0-dist-rep-quota: quota context not set in inode (gfid:16484a2a-92a9-48e2-918d-3394aa242d31)
[2013-06-21 08:22:52.462332] W [quota.c:1752:quota_fstat_cbk] 0-dist-rep-quota: quota context not set in inode (gfid:16484a2a-92a9-48e2-918d-3394aa242d31)
pending frames:
frame : type(0) op(8)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-06-21 08:24:13
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4rhs
/lib64/libc.so.6(+0x32920)[0x7f24710de920]
/usr/lib64/libglusterfs.so.0(uuid_unpack+0x0)[0x7f2471e645d0]
/usr/lib64/libglusterfs.so.0(+0x44366)[0x7f2471e64366]
/usr/lib64/libglusterfs.so.0(uuid_utoa+0x37)[0x7f2471e45b47]
/usr/lib64/glusterfs/3.4rhs/xlator/features/quota.so(quota_rename_cbk+0x30e)[0x7f246a9fc8ce]
/usr/lib64/libglusterfs.so.0(default_rename_cbk+0xfd)[0x7f2471e43fdd]
/usr/lib64/libglusterfs.so.0(+0x2d4da)[0x7f2471e4d4da]
/usr/lib64/libglusterfs.so.0(call_resume+0xa0)[0x7f2471e4ee90]
/usr/lib64/glusterfs/3.4rhs/xlator/features/marker.so(marker_rename_done+0x7c)[0x7f246ac12b6c]
/usr/lib64/glusterfs/3.4rhs/xlator/features/marker.so(marker_rename_release_newp_lock+0x282)[0x7f246ac130a2]
/usr/lib64/glusterfs/3.4rhs/xlator/performance/io-threads.so(iot_inodelk_cbk+0xb9)[0x7f246b032a79]
/usr/lib64/glusterfs/3.4rhs/xlator/features/locks.so(pl_common_inodelk+0x46c)[0x7f246b2559dc]
/usr/lib64/glusterfs/3.4rhs/xlator/features/locks.so(pl_inodelk+0x1d)[0x7f246b2562bd]
/usr/lib64/glusterfs/3.4rhs/xlator/performance/io-threads.so(iot_inodelk_wrapper+0x14d)[0x7f246b03629d]
/usr/lib64/libglusterfs.so.0(call_resume+0x51b)[0x7f2471e4f30b]
/usr/lib64/glusterfs/3.4rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f246b03ea08]
/lib64/libpthread.so.0(+0x7851)[0x7f24717e0851]
/lib64/libc.so.6(clone+0x6d)[0x7f247119490d]
---------



Both downed bricks show the same crash trace, but no core dump was found anywhere.
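The trace shows signal 11 (SIGSEGV on Linux) raised in `uuid_unpack`, reached through `uuid_utoa` from `quota_rename_cbk`. That is consistent with the rename callback handing an invalid or NULL pointer to the GFID formatter, although this report does not state the root cause. The C sketch below only illustrates the general defensive pattern of checking the pointer before formatting a 16-byte GFID in its 8-4-4-4-12 hex form. `struct demo_inode`, `demo_gfid_str` and `DEMO_GFID_STR_LEN` are hypothetical names, not glusterfs APIs, and this is not the fix shipped in the advisory.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* 32 hex digits + 4 dashes + terminating NUL. */
#define DEMO_GFID_STR_LEN 37

/* Hypothetical stand-in for an inode carrying a 16-byte GFID. */
struct demo_inode {
    unsigned char gfid[16];
};

/* Hypothetical helper: writes the GFID of inode into out (at least
 * DEMO_GFID_STR_LEN bytes) in canonical 8-4-4-4-12 form. When inode is
 * NULL it writes a placeholder instead of dereferencing the pointer.
 * Returns out so the call can be used directly in a log statement. */
const char *demo_gfid_str(const struct demo_inode *inode, char *out)
{
    if (inode == NULL) {
        strcpy(out, "(null)");  /* 7 bytes, fits in the buffer */
        return out;
    }

    const unsigned char *g = inode->gfid;
    snprintf(out, DEMO_GFID_STR_LEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             g[0], g[1], g[2], g[3], g[4], g[5], g[6], g[7],
             g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
    return out;
}
```

A callback would pass whichever inode pointer it is about to log, for example `demo_gfid_str(maybe_null_inode, buf)`; the guard costs one comparison and turns a potential segfault into a readable log line.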

Comment 6 Scott Haines 2013-09-23 22:39:52 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

Comment 7 Scott Haines 2013-09-23 22:43:49 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
