Bug 1214169
| Summary: | glusterfsd crashed while rebalance and self-heal were in progress | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shruti Sampat <ssampat> | ||||
| Component: | core | Assignee: | Raghavendra G <rgowdapp> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.7.0 | CC: | bugs, byarlaga, gluster-bugs, nbalacha, ravishankar, rgowdapp, skoduri, ssampat | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | glusterfs-3.7.3 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1223915 1236945 1238535 (view as bug list) | Environment: | |||||
| Last Closed: | 2015-07-30 09:50:48 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1186580, 1223915, 1236945, 1238535 | ||||||
| Attachments: |
|
||||||
|
Description
Shruti Sampat
2015-04-22 07:06:26 UTC
On the same volume, another brick process crashed after some time. From the brick logs - [2015-04-22 12:31:23.720702] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/netprio_cgroup.h frame : type(0) op(0) frame : type(0) op(20) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(40) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2015-04-22 12:31:23 configuration details: [2015-04-22 12:31:23.731428] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/raw.h argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.7dev /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3ad30221c6] /usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3ad303de2f] /lib64/libc.so.6[0x3ad14326a0] /lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x3ad180c380] /usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(+0x589b)[0x7f484d94689b] /usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(quota_fill_inodectx+0x1fa)[0x7f484d94f6aa] /usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(quota_readdirp_cbk+0x13e)[0x7f484d94fa9e] /usr/lib64/glusterfs/3.7dev/xlator/features/marker.so(marker_readdirp_cbk+0x13e)[0x7f484db71bbe] /usr/lib64/libglusterfs.so.0(default_readdirp_cbk+0xc2)[0x3ad302e622] /usr/lib64/glusterfs/3.7dev/xlator/features/locks.so(pl_readdirp_cbk+0x18b)[0x7f484e5b6cfb] /usr/lib64/glusterfs/3.7dev/xlator/features/access-control.so(posix_acl_readdirp_cbk+0x27a)[0x7f484e7d0b7a] /usr/lib64/glusterfs/3.7dev/xlator/features/bitrot-stub.so(br_stub_readdirp_cbk+0x214)[0x7f484e9db304] /usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_do_readdir+0x1b8)[0x7f484f871498] /usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_readdirp+0x1ee)[0x7f484f872fde] /usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333] /usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333] /usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333] /usr/lib64/glusterfs/3.7dev/xlator/features/bitrot-stub.so(br_stub_readdirp+0x259)[0x7f484e9d8e29] /usr/lib64/glusterfs/3.7dev/xlator/features/access-control.so(posix_acl_readdirp+0x19d)[0x7f484e7cd4bd] /usr/lib64/glusterfs/3.7dev/xlator/features/locks.so(pl_readdirp+0x204)[0x7f484e5b5d94] /usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333] /usr/lib64/libglusterfs.so.0(default_readdirp_resume+0x142)[0x3ad3029db2] /usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x3ad3046470] /usr/lib64/glusterfs/3.7dev/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f484e1a1388] /lib64/libpthread.so.0[0x3ad18079d1] /lib64/libc.so.6(clone+0x6d)[0x3ad14e88fd] --------- While self-heal was running, I attempted to remove a directory on the mount point. # rm -fr linux-3.19.4 rm: cannot remove `linux-3.19.4/include/crypto': Directory not empty rm: cannot remove `linux-3.19.4/include/drm': Directory not empty rm: cannot remove `linux-3.19.4/include/media': Directory not empty rm: cannot remove `linux-3.19.4/include/net/netfilter': Directory not empty rm: cannot remove `linux-3.19.4/include/net/bluetooth': Directory not empty rm: cannot remove `linux-3.19.4/include/net/caif': Directory not empty rm: cannot remove `linux-3.19.4/include/net/tc_act': Directory not empty I also see a lot of the following messages in brick logs - <snip> [2015-04-22 12:31:14.461836] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/netns [2015-04-22 12:31:18.132176] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/caif/caif_layer.h [2015-04-22 12:31:23.675448] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/ip6_fib.h [2015-04-22 12:31:23.691089] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/ip_fib.h [2015-04-22 12:31:23.699589] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/lib80211.h [2015-04-22 12:31:23.711344] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/neighbour.h </snip> From the core for the first crash (core.28981) :
(gdb) t 12
[Switching to thread 12 (Thread 0x7f558ef38700 (LWP 28986))]#0 0x0000003d13839230 in ?? () from /usr/lib64/libsqlite3.so.0
(gdb) bt
#0 0x0000003d13839230 in ?? () from /usr/lib64/libsqlite3.so.0
#1 0x0000003d1381f9a1 in sqlite3_initialize () from /usr/lib64/libsqlite3.so.0
#2 0x0000003d13841949 in ?? () from /usr/lib64/libsqlite3.so.0
#3 0x00007f558dcd0cba in gf_open_sqlite3_conn (args=0x7f559821cbe8, db_conn=0x7f55880795f0) at gfdb_sqlite3.c:190
#4 gf_sqlite3_init (args=0x7f559821cbe8, db_conn=0x7f55880795f0) at gfdb_sqlite3.c:428
#5 0x00007f558dccbe1a in init_db (args=0x7f559821cbe8, gfdb_db_type=GFDB_SQLITE3) at gfdb_data_store.c:251
#6 0x00007f558dee2546 in init (this=0x7f558800a530) at changetimerecorder.c:1046
#7 0x0000003d1401f662 in __xlator_init (xl=0x7f558800a530) at xlator.c:397
#8 xlator_init (xl=0x7f558800a530) at xlator.c:420
#9 0x0000003d1405ec01 in glusterfs_graph_init (graph=<value optimized out>) at graph.c:319
#10 0x0000003d1405ed37 in glusterfs_graph_activate (graph=<value optimized out>, ctx=0xc77010) at graph.c:659
#11 0x0000000000405a9b in glusterfs_process_volfp (ctx=0xc77010, fp=0x7f55880008e0) at glusterfsd.c:2160
#12 0x000000000040d0c0 in mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x7f55986556d4) at glusterfsd-mgmt.c:1552
#13 0x0000003d1480ec65 in rpc_clnt_handle_reply (clnt=0xcdce60, pollin=0x7f55880016a0) at rpc-clnt.c:766
#14 0x0000003d14810102 in rpc_clnt_notify (trans=<value optimized out>, mydata=0xcdce90, event=<value optimized out>,
data=<value optimized out>) at rpc-clnt.c:894
#15 0x0000003d1480b7f8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>,
data=<value optimized out>) at rpc-transport.c:543
#16 0x00007f558f799bbd in socket_event_poll_in (this=0xcde9b0) at socket.c:2290
#17 0x00007f558f79b6ed in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xcde9b0, poll_in=1,
poll_out=0, poll_err=0) at socket.c:2403
#18 0x0000003d1407e150 in event_dispatch_epoll_handler (data=0xcdfb70) at event-epoll.c:572
#19 event_dispatch_epoll_worker (data=0xcdfb70) at event-epoll.c:674
#20 0x0000003d124079d1 in start_thread () from /lib64/libpthread.so.0
#21 0x0000003d120e88fd in iopl () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()
(gdb) t 12
[Switching to thread 12 (Thread 0x7f558ef38700 (LWP 28986))]#0 0x0000003d13839230 in ?? () from /usr/lib64/libsqlite3.so.0
(gdb) bt
#0 0x0000003d13839230 in ?? () from /usr/lib64/libsqlite3.so.0
#1 0x0000003d1381f9a1 in sqlite3_initialize () from /usr/lib64/libsqlite3.so.0
#2 0x0000003d13841949 in ?? () from /usr/lib64/libsqlite3.so.0
#3 0x00007f558dcd0cba in gf_open_sqlite3_conn (args=0x7f559821cbe8, db_conn=0x7f55880795f0) at gfdb_sqlite3.c:190
#4 gf_sqlite3_init (args=0x7f559821cbe8, db_conn=0x7f55880795f0) at gfdb_sqlite3.c:428
#5 0x00007f558dccbe1a in init_db (args=0x7f559821cbe8, gfdb_db_type=GFDB_SQLITE3) at gfdb_data_store.c:251
#6 0x00007f558dee2546 in init (this=0x7f558800a530) at changetimerecorder.c:1046
#7 0x0000003d1401f662 in __xlator_init (xl=0x7f558800a530) at xlator.c:397
#8 xlator_init (xl=0x7f558800a530) at xlator.c:420
#9 0x0000003d1405ec01 in glusterfs_graph_init (graph=<value optimized out>) at graph.c:319
#10 0x0000003d1405ed37 in glusterfs_graph_activate (graph=<value optimized out>, ctx=0xc77010) at graph.c:659
#11 0x0000000000405a9b in glusterfs_process_volfp (ctx=0xc77010, fp=0x7f55880008e0) at glusterfsd.c:2160
#12 0x000000000040d0c0 in mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x7f55986556d4) at glusterfsd-mgmt.c:1552
#13 0x0000003d1480ec65 in rpc_clnt_handle_reply (clnt=0xcdce60, pollin=0x7f55880016a0) at rpc-clnt.c:766
#14 0x0000003d14810102 in rpc_clnt_notify (trans=<value optimized out>, mydata=0xcdce90, event=<value optimized out>,
data=<value optimized out>) at rpc-clnt.c:894
#15 0x0000003d1480b7f8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>,
data=<value optimized out>) at rpc-transport.c:543
#16 0x00007f558f799bbd in socket_event_poll_in (this=0xcde9b0) at socket.c:2290
#17 0x00007f558f79b6ed in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xcde9b0, poll_in=1,
poll_out=0, poll_err=0) at socket.c:2403
#18 0x0000003d1407e150 in event_dispatch_epoll_handler (data=0xcdfb70) at event-epoll.c:572
#19 event_dispatch_epoll_worker (data=0xcdfb70) at event-epoll.c:674
#20 0x0000003d124079d1 in start_thread () from /lib64/libpthread.so.0
#21 0x0000003d120e88fd in iopl () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()
So it looks like a request has come and been accepted before all the xlators in the graph have been initialised
Created attachment 1018374 [details]
sosreport of Node1
Please file a separate BZ for the second crash as it is a different issue. BZ #1215550 filed for the second crash. Assigning it to Nithya who seem to have been actively debugging the issue. Please assign it back if otherwise. Thanks! Another crash in master which seems similar to this issue: http://build.gluster.org/job/rackspace-regression-2GB-triggered/9585/consoleFull The glusterfs_graph_init() still is in progress before completion of which notifications were being received. REVIEW: http://review.gluster.org/11463 (protocol/server: start listening only after all xlators in graph are initialised.) posted (#2) for review on release-3.7 by Raghavendra G (rgowdapp) REVIEW: http://review.gluster.org/11504 (protocol/server: fail setvolume if any of xlators is not initialized yet) posted (#1) for review on release-3.7 by Raghavendra G (rgowdapp) COMMIT: http://review.gluster.org/11504 committed in release-3.7 by Raghavendra G (rgowdapp) ------ commit 12c854b7a28a8d764f0446d2e0133c447c2537c2 Author: Raghavendra G <rgowdapp> Date: Wed Jul 1 16:54:55 2015 +0530 protocol/server: fail setvolume if any of xlators is not initialized yet We can only start recieving fops only when all xlators in graph are initialized. Change-Id: Id79100bab5878bb2518ed133c1118554fbb35229 BUG: 1214169 Signed-off-by: Raghavendra G <rgowdapp> Reviewed-on: http://review.gluster.org/11504 Tested-by: NetBSD Build System <jenkins.org> Tested-by: Gluster Build System <jenkins.com> This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report. glusterfs-3.7.3 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078 [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user |