Bug 1239317 - quota+afr: quotad crash "afr_local_init (local=0x0, priv=0x7fddd0372220, op_errno=0x7fddce1434dc) at afr-common.c:4112"
Summary: quota+afr: quotad crash "afr_local_init (local=0x0, priv=0x7fddd0372220, op_errno=0x7fddce1434dc) at afr-common.c:4112"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: quota
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Raghavendra G
QA Contact: Saurabh
URL:
Whiteboard:
Duplicates: 1239319 1240316
Depends On:
Blocks: 1202842 1240254 1240906
 
Reported: 2015-07-05 16:06 UTC by Saurabh
Modified: 2016-09-17 12:38 UTC
CC: 10 users

Fixed In Version: glusterfs-3.7.1-9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1240254
Environment:
Last Closed: 2015-07-29 05:09:32 UTC
Target Upstream Version:


Attachments
sosreport of nfs14 (12.78 MB, application/x-xz)
2015-07-05 16:12 UTC, Saurabh
coredump of quotad on nfs14 (911.79 KB, application/x-xz)
2015-07-05 16:17 UTC, Saurabh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1495 0 normal SHIPPED_LIVE Important: Red Hat Gluster Storage 3.1 update 2015-07-29 08:26:26 UTC

Description Saurabh 2015-07-05 16:06:46 UTC
Description of problem:
I was running iozone over an nfs-ganesha mount, with the volume mounted as vers=4. Meanwhile, a failover of the nfs-ganesha process was triggered successfully and I/O resumed post-failover, but later I found that quotad had crashed and dumped core.


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-7.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Seen for the first time.

Steps to Reproduce:
1. Create a volume of 6x2 type.
2. Configure nfs-ganesha.
3. Mount the volume with vers=4.
4. Trigger a failover by killing the nfs-ganesha process on the node from which the mount in step 3 was done.
5. Wait for I/O to resume and check the status of things.

Actual results:
At step 5:
1. The nfs-ganesha process on the node on which the failover had happened was killed (no coredump reported).
2. quotad was killed and a coredump was reported.

The above two points are not a sequence of events, but separate issues seen on the machine.

(gdb) bt
#0  afr_local_init (local=0x0, priv=0x7fddd0372220, op_errno=0x7fddce1434dc) at afr-common.c:4112
#1  0x00007fddd53572ae in afr_discover (frame=0x7fdde0867118, this=0x7fddd0015610, loc=0x7fddcd0a4074, xattr_req=0x7fdde02617f0) at afr-common.c:2178
#2  0x00007fddd535789d in afr_lookup (frame=0x7fdde0867118, this=0x7fddd0015610, loc=0x7fddcd0a4074, xattr_req=0x7fdde02617f0) at afr-common.c:2327
#3  0x00007fddd50d9be9 in dht_discover (frame=<value optimized out>, this=<value optimized out>, loc=<value optimized out>) at dht-common.c:515
#4  0x00007fddd50dd02e in dht_lookup (frame=0x7fdde086706c, this=0x7fddd001ada0, loc=0x7fddce143840, xattr_req=<value optimized out>)
    at dht-common.c:2171
#5  0x00007fddd4ea1296 in qd_nameless_lookup (this=<value optimized out>, frame=<value optimized out>, req=<value optimized out>, 
    xdata=0x7fdde02617f0, lookup_cbk=0x7fddd4ea2250 <quotad_aggregator_lookup_cbk>) at quotad.c:126
#6  0x00007fddd4ea2377 in quotad_aggregator_lookup (req=<value optimized out>) at quotad-aggregator.c:327
#7  0x00007fdde2a74ee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7fddc801fc20) at rpcsvc.c:703
#8  0x00007fdde2a75123 in rpcsvc_notify (trans=0x7fddc801e3b0, mydata=<value optimized out>, event=<value optimized out>, data=0x7fddc801fc20)
    at rpcsvc.c:797
#9  0x00007fdde2a76ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:543
#10 0x00007fddd77dc255 in socket_event_poll_in (this=0x7fddc801e3b0) at socket.c:2290
#11 0x00007fddd77dde4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fddc801e3b0, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:2403
#12 0x00007fdde2d0f970 in event_dispatch_epoll_handler (data=0x7fddd00fba80) at event-epoll.c:575
#13 event_dispatch_epoll_worker (data=0x7fddd00fba80) at event-epoll.c:678
#14 0x00007fdde1d96a51 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fdde170096d in clone () from /lib64/libc.so.6


[root@nfs14 ~]# gluster volume status vol2
Status of volume: vol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r12          49156     0          Y       2665 
Brick 10.70.46.27:/rhs/brick1/d1r22         49156     0          Y       20958
Brick 10.70.46.25:/rhs/brick1/d2r12         49156     0          Y       17883
Brick 10.70.46.29:/rhs/brick1/d2r22         49155     0          Y       20935
Brick 10.70.46.8:/rhs/brick1/d3r12          49157     0          Y       2684 
Brick 10.70.46.27:/rhs/brick1/d3r22         49157     0          Y       20977
Brick 10.70.46.25:/rhs/brick1/d4r12         49157     0          Y       17902
Brick 10.70.46.29:/rhs/brick1/d4r22         49156     0          Y       20954
Brick 10.70.46.8:/rhs/brick1/d5r12          49158     0          Y       2703 
Brick 10.70.46.27:/rhs/brick1/d5r22         49158     0          Y       20996
Brick 10.70.46.25:/rhs/brick1/d6r12         49158     0          Y       17921
Brick 10.70.46.29:/rhs/brick1/d6r22         49157     0          Y       20973
Self-heal Daemon on localhost               N/A       N/A        Y       9905 
Quota Daemon on localhost                   N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.27             N/A       N/A        Y       10010
Quota Daemon on 10.70.46.27                 N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.8              N/A       N/A        Y       2341 
Quota Daemon on 10.70.46.8                  N/A       N/A        Y       2352 
Self-heal Daemon on 10.70.46.22             N/A       N/A        Y       7660 
Quota Daemon on 10.70.46.22                 N/A       N/A        Y       7668 
Self-heal Daemon on 10.70.46.25             N/A       N/A        Y       7479 
Quota Daemon on 10.70.46.25                 N/A       N/A        Y       7484 
 
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs14 ~]# gluster volume info vol2
 
Volume Name: vol2
Type: Distributed-Replicate
Volume ID: 30ab7484-1480-46d5-8f83-4ab27199640d
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.46.8:/rhs/brick1/d1r12
Brick2: 10.70.46.27:/rhs/brick1/d1r22
Brick3: 10.70.46.25:/rhs/brick1/d2r12
Brick4: 10.70.46.29:/rhs/brick1/d2r22
Brick5: 10.70.46.8:/rhs/brick1/d3r12
Brick6: 10.70.46.27:/rhs/brick1/d3r22
Brick7: 10.70.46.25:/rhs/brick1/d4r12
Brick8: 10.70.46.29:/rhs/brick1/d4r22
Brick9: 10.70.46.8:/rhs/brick1/d5r12
Brick10: 10.70.46.27:/rhs/brick1/d5r22
Brick11: 10.70.46.25:/rhs/brick1/d6r12
Brick12: 10.70.46.29:/rhs/brick1/d6r22
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
ganesha.enable: on
features.cache-invalidation: on
performance.readdir-ahead: on
nfs-ganesha: enable



Expected results:
quotad should not crash

Additional info:

Comment 2 Saurabh 2015-07-05 16:12:39 UTC
Created attachment 1046279 [details]
sosreport of nfs14

Comment 3 Saurabh 2015-07-05 16:17:32 UTC
Created attachment 1046280 [details]
coredump of quotad on nfs14

Comment 6 Vijaikumar Mallikarjuna 2015-07-06 06:35:07 UTC
*** Bug 1239319 has been marked as a duplicate of this bug. ***

Comment 8 Vijaikumar Mallikarjuna 2015-07-08 06:57:16 UTC
*** Bug 1240316 has been marked as a duplicate of this bug. ***

Comment 11 Saurabh 2015-07-14 13:17:25 UTC
Executed a similar test as mentioned in the description section, and this time I did not see the crash.

Comment 12 errata-xmlrpc 2015-07-29 05:09:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

