Description of problem:
I was running iozone on a volume exported through nfs-ganesha and mounted with vers=4. While the I/O was in progress, a failover of the nfs-ganesha process was triggered; the failover succeeded and I/O resumed, but I later found that quotad had crashed and dumped core.
Version-Release number of selected component (if applicable):
glusterfs-3.7.1-7.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64
How reproducible:
Seen for the first time.
Steps to Reproduce:
1. Create a volume of 6x2 (distributed-replicate) type.
2. Configure nfs-ganesha on the cluster.
3. Mount the volume over NFS with vers=4.
4. Trigger a failover by killing the nfs-ganesha process on the node from which the mount in step 3 was done.
5. Wait for I/O to resume and check the state of the cluster (an approximate command sequence is sketched below).
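For reference, the steps above correspond roughly to the command sequence below. This is only a sketch, not the exact commands from the test run: the server names, brick paths, ganesha VIP and mount point are placeholders, and the nfs-ganesha HA prerequisites (shared storage, ganesha-ha.conf) are assumed to be in place already.
# 1. Create and start a 6x2 distributed-replicate volume (12 bricks, replica 2),
#    then enable quota so that quotad is running.
gluster volume create vol2 replica 2 \
    server1:/rhs/brick1/d1r1 server2:/rhs/brick1/d1r2 \
    server3:/rhs/brick1/d2r1 server4:/rhs/brick1/d2r2 \
    server1:/rhs/brick1/d3r1 server2:/rhs/brick1/d3r2 \
    server3:/rhs/brick1/d4r1 server4:/rhs/brick1/d4r2 \
    server1:/rhs/brick1/d5r1 server2:/rhs/brick1/d5r2 \
    server3:/rhs/brick1/d6r1 server4:/rhs/brick1/d6r2
gluster volume start vol2
gluster volume quota vol2 enable
# 2. Enable nfs-ganesha for the cluster and export the volume through it.
gluster nfs-ganesha enable
gluster volume set vol2 ganesha.enable on
# 3. On the client, mount via the ganesha VIP with NFSv4 and start iozone.
mount -t nfs -o vers=4 <ganesha-VIP>:/vol2 /mnt/vol2
# 4. On the server node backing that mount, kill nfs-ganesha to trigger failover.
pkill -9 ganesha.nfsd
# 5. After I/O resumes, check daemon state and look for coredumps.
gluster volume status vol2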
Actual results:
Observed at step 5:
1. The nfs-ganesha process on the node on which the failover had happened is killed (no coredump reported).
2. quotad is killed and a coredump is reported.
The two points above are not a sequence of events, but separate issues seen on the machine.
(gdb) bt
#0 afr_local_init (local=0x0, priv=0x7fddd0372220, op_errno=0x7fddce1434dc) at afr-common.c:4112
#1 0x00007fddd53572ae in afr_discover (frame=0x7fdde0867118, this=0x7fddd0015610, loc=0x7fddcd0a4074, xattr_req=0x7fdde02617f0) at afr-common.c:2178
#2 0x00007fddd535789d in afr_lookup (frame=0x7fdde0867118, this=0x7fddd0015610, loc=0x7fddcd0a4074, xattr_req=0x7fdde02617f0) at afr-common.c:2327
#3 0x00007fddd50d9be9 in dht_discover (frame=<value optimized out>, this=<value optimized out>, loc=<value optimized out>) at dht-common.c:515
#4 0x00007fddd50dd02e in dht_lookup (frame=0x7fdde086706c, this=0x7fddd001ada0, loc=0x7fddce143840, xattr_req=<value optimized out>)
at dht-common.c:2171
#5 0x00007fddd4ea1296 in qd_nameless_lookup (this=<value optimized out>, frame=<value optimized out>, req=<value optimized out>,
xdata=0x7fdde02617f0, lookup_cbk=0x7fddd4ea2250 <quotad_aggregator_lookup_cbk>) at quotad.c:126
#6 0x00007fddd4ea2377 in quotad_aggregator_lookup (req=<value optimized out>) at quotad-aggregator.c:327
#7 0x00007fdde2a74ee5 in rpcsvc_handle_rpc_call (svc=<value optimized out>, trans=<value optimized out>, msg=0x7fddc801fc20) at rpcsvc.c:703
#8 0x00007fdde2a75123 in rpcsvc_notify (trans=0x7fddc801e3b0, mydata=<value optimized out>, event=<value optimized out>, data=0x7fddc801fc20)
at rpcsvc.c:797
#9 0x00007fdde2a76ad8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
at rpc-transport.c:543
#10 0x00007fddd77dc255 in socket_event_poll_in (this=0x7fddc801e3b0) at socket.c:2290
#11 0x00007fddd77dde4d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7fddc801e3b0, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2403
#12 0x00007fdde2d0f970 in event_dispatch_epoll_handler (data=0x7fddd00fba80) at event-epoll.c:575
#13 event_dispatch_epoll_worker (data=0x7fddd00fba80) at event-epoll.c:678
#14 0x00007fdde1d96a51 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fdde170096d in clone () from /lib64/libc.so.6
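Note: frame #0 shows afr_local_init() being entered with local=0x0, so the crash is a NULL-pointer dereference of the frame-local structure while quotad issues a nameless lookup (quotad_aggregator_lookup -> qd_nameless_lookup -> dht_lookup -> afr_discover).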
[root@nfs14 ~]# gluster volume status vol2
Status of volume: vol2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r12 49156 0 Y 2665
Brick 10.70.46.27:/rhs/brick1/d1r22 49156 0 Y 20958
Brick 10.70.46.25:/rhs/brick1/d2r12 49156 0 Y 17883
Brick 10.70.46.29:/rhs/brick1/d2r22 49155 0 Y 20935
Brick 10.70.46.8:/rhs/brick1/d3r12 49157 0 Y 2684
Brick 10.70.46.27:/rhs/brick1/d3r22 49157 0 Y 20977
Brick 10.70.46.25:/rhs/brick1/d4r12 49157 0 Y 17902
Brick 10.70.46.29:/rhs/brick1/d4r22 49156 0 Y 20954
Brick 10.70.46.8:/rhs/brick1/d5r12 49158 0 Y 2703
Brick 10.70.46.27:/rhs/brick1/d5r22 49158 0 Y 20996
Brick 10.70.46.25:/rhs/brick1/d6r12 49158 0 Y 17921
Brick 10.70.46.29:/rhs/brick1/d6r22 49157 0 Y 20973
Self-heal Daemon on localhost N/A N/A Y 9905
Quota Daemon on localhost N/A N/A N N/A
Self-heal Daemon on 10.70.46.27 N/A N/A Y 10010
Quota Daemon on 10.70.46.27 N/A N/A N N/A
Self-heal Daemon on 10.70.46.8 N/A N/A Y 2341
Quota Daemon on 10.70.46.8 N/A N/A Y 2352
Self-heal Daemon on 10.70.46.22 N/A N/A Y 7660
Quota Daemon on 10.70.46.22 N/A N/A Y 7668
Self-heal Daemon on 10.70.46.25 N/A N/A Y 7479
Quota Daemon on 10.70.46.25 N/A N/A Y 7484
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks
[root@nfs14 ~]# gluster volume info vol2
Volume Name: vol2
Type: Distributed-Replicate
Volume ID: 30ab7484-1480-46d5-8f83-4ab27199640d
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.46.8:/rhs/brick1/d1r12
Brick2: 10.70.46.27:/rhs/brick1/d1r22
Brick3: 10.70.46.25:/rhs/brick1/d2r12
Brick4: 10.70.46.29:/rhs/brick1/d2r22
Brick5: 10.70.46.8:/rhs/brick1/d3r12
Brick6: 10.70.46.27:/rhs/brick1/d3r22
Brick7: 10.70.46.25:/rhs/brick1/d4r12
Brick8: 10.70.46.29:/rhs/brick1/d4r22
Brick9: 10.70.46.8:/rhs/brick1/d5r12
Brick10: 10.70.46.27:/rhs/brick1/d5r22
Brick11: 10.70.46.25:/rhs/brick1/d6r12
Brick12: 10.70.46.29:/rhs/brick1/d6r22
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
ganesha.enable: on
features.cache-invalidation: on
performance.readdir-ahead: on
nfs-ganesha: enable
Expected results:
quotad should not crash.
Additional info:
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-1495.html