Bug 1354262 - (Quota on) When glusterfsd init fails and exits, it sometimes leads to a crash
Summary: (Quota on) When glusterfsd init fails and exits, it sometimes leads to a crash
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Vijay Bellur
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On: 1346549
Blocks:
 
Reported: 2016-07-11 05:25 UTC by Manikandan
Modified: 2018-02-01 05:55 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1346549
Environment:
Last Closed: 2018-02-01 05:55:13 UTC
Embargoed:



Description Manikandan 2016-07-11 05:25:58 UTC
+++ This bug was initially created as a clone of Bug #1346549 +++

Description of problem:
Create a volume like this:
Volume Name: test
Type: Distributed-Disperse
Volume ID: 78bd1b85-cfe9-401e-ac1e-dc9e072ed4db
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1:/disk1
Brick2: node-2:/disk1
Brick3: node-3:/disk1
Brick4: node-1:/disk2
Brick5: node-2:/disk2
Brick6: node-3:/disk2
Options Reconfigured:
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on

Then I unmounted /disk{1..3} and set /disk{1..3} read-only.

I made several attempts to run 'gluster vol start test force', and sometimes glusterfsd crashed.

glusterfsd's log:
[2016-06-15 09:44:20.567687] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.12 (args: /usr/sbin/glusterfsd -s node-1 --volfile-id test.node-1.disk2 -p /var/lib/glusterd/vols/test/run/node-1-disk2.pid -S /var/run/gluster/bddd1d1330cb529b05a3a9266879baee.socket --brick-name /disk2 -l /var/log/glusterfs/bricks/disk2.log --xlator-option *-posix.glusterd-uuid=dee1dcb8-280b-4b4c-b5a6-6ad7dbd0360a --brick-port 49153 --xlator-option test-server.listen-port=49153)
[2016-06-15 09:44:20.575048] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-06-15 09:44:20.580116] I [graph.c:269:gf_add_cmdline_options] 0-test-server: adding option 'listen-port' for volume 'test-server' with value '49153'
[2016-06-15 09:44:20.580187] I [graph.c:269:gf_add_cmdline_options] 0-test-posix: adding option 'glusterd-uuid' for volume 'test-posix' with value 'dee1dcb8-280b-4b4c-b5a6-6ad7dbd0360a'
[2016-06-15 09:44:20.580607] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/disk2: skip format check for non-addr auth option auth.login./disk2.allow
[2016-06-15 09:44:20.580765] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/disk2: skip format check for non-addr auth option auth.login.8306814a-3bf6-49b0-b75a-95665c2ba483.password
[2016-06-15 09:44:20.582297] I [rpcsvc.c:2196:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2016-06-15 09:44:20.582499] W [MSGID: 101002] [options.c:957:xl_opt_validate] 0-test-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2016-06-15 09:44:20.583012] W [socket.c:3759:reconfigure] 0-test-quota: NBIO on -1 failed (Bad file descriptor)
[2016-06-15 09:44:20.583141] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-06-15 09:44:20.588207] E [index.c:188:index_dir_create] 0-test-index: /disk2/.glusterfs/indices/xattrop: Failed to create (Permission denied)
[2016-06-15 09:44:20.588401] E [MSGID: 101019] [xlator.c:435:xlator_init] 0-test-index: Initialization of volume 'test-index' failed, review your volfile again
[2016-06-15 09:44:20.588512] E [graph.c:322:glusterfs_graph_init] 0-test-index: initializing translator failed
[2016-06-15 09:44:20.588613] E [graph.c:662:glusterfs_graph_activate] 0-graph: init failed
[2016-06-15 09:44:20.590554] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x307) [0x40dbe7] -->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x13a) [0x408c7a] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x5f) [0x40831f] ) 0-: received signum (1), shutting down
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2016-06-15 09:44:20
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1


gdb bt:
Core was generated by `/usr/sbin/glusterfsd -s node-1 --volfile-id test.node-1.disk2 -p /var/lib/glust'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd73c6ff688 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) bt
#0  0x00007fd73c6ff688 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x00007fd73c7006f8 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x00007fd7418dae26 in __GI___backtrace (array=array@entry=0x7fd735bbfb80, size=size@entry=200) at ../sysdeps/x86_64/backtrace.c:109
#3  0x00007fd742411ea2 in _gf_msg_backtrace_nomem (level=level@entry=GF_LOG_ALERT, stacksize=stacksize@entry=200) at logging.c:1095
#4  0x00007fd74243713d in gf_print_trace (signum=11, ctx=0x2049010) at common-utils.c:615
#5  <signal handler called>
#6  0x00007fd73644adb0 in ?? ()
#7  0x00007fd7421df8e4 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7fd73803ef80, event=<optimized out>, data=0x7fd7380420f0) at rpc-clnt.c:957
#8  0x00007fd7421db593 in rpc_transport_notify (this=this@entry=0x7fd7380420f0, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x7fd7380420f0) at rpc-transport.c:546
#9  0x00007fd73d579f8f in socket_connect_finish (this=this@entry=0x7fd7380420f0) at socket.c:2429
#10 0x00007fd73d57a3af in socket_event_handler (fd=fd@entry=12, idx=idx@entry=3, data=0x7fd7380420f0, poll_in=0, poll_out=4, poll_err=0) at socket.c:2459
#11 0x00007fd74247f9fa in event_dispatch_epoll_handler (event=0x7fd735bc0e90, event_pool=0x2067da0) at event-epoll.c:575
#12 event_dispatch_epoll_worker (data=0x7fd73801e670) at event-epoll.c:678
#13 0x00007fd741b9d182 in start_thread (arg=0x7fd735bc1700) at pthread_create.c:312
#14 0x00007fd7418ca47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) 



Version-Release number of selected component (if applicable):
glusterfs 3.7.12 (per the brick log above)

How reproducible:
Intermittent; it takes several forced starts to hit.

Steps to Reproduce:
1. Create a distributed-disperse volume with quota enabled (volume info above).
2. Unmount the brick filesystems and set them read-only, so brick init fails.
3. Run 'gluster vol start test force' repeatedly.

Actual results:
glusterfsd sometimes crashes with SIGSEGV while shutting down after xlator init fails.

Expected results:
glusterfsd exits cleanly when init fails; no crash.

Additional info:

--- Additional comment from jiademing.dd on 2016-06-14 22:04:29 EDT ---

After analysis: rpc_clnt_notify() ends up calling quota_enforcer_notify(), because quota registers that callback via rpc_clnt_register_notify (rpc, quota_enforcer_notify, this). When glusterfsd exits, it calls glusterfs_graph_destroy(), which dlclose()s each xlator's xl->dlhandle.

So if dlclose(xl->dlhandle) runs before rpc_clnt_notify() fires, quota_enforcer_notify() in quota.so has already been unloaded; the pending callback then jumps into unmapped memory and crashes. This matches frame #6 in the backtrace, which resolves to no symbol.
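To make that sequence concrete, below is a minimal standalone C sketch of the failure mode. It is not GlusterFS code: quota.so, the symbol lookup, and the dispatch shape are illustrative stand-ins for the xlator shared object, rpc_clnt_register_notify(), and rpc_clnt_notify().

/* Sketch: a callback living in a dlopen()ed shared object is registered
 * with a dispatcher, the object is dlclose()d (as glusterfs_graph_destroy()
 * does with xl->dlhandle), and the dispatcher then invokes the stale
 * pointer. Build with: cc -o race race.c -ldl (needs a ./quota.so that
 * exports quota_enforcer_notify; names are purely illustrative). */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*notify_fn_t)(void);

static notify_fn_t registered_cb;   /* stands in for the rpc-clnt notify slot */

int main(void)
{
    /* quota.so stands in for the real xlator shared object */
    void *handle = dlopen("./quota.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* analogous to rpc_clnt_register_notify(rpc, quota_enforcer_notify, this) */
    registered_cb = (notify_fn_t) dlsym(handle, "quota_enforcer_notify");
    if (!registered_cb) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        return EXIT_FAILURE;
    }

    /* analogous to glusterfs_graph_destroy() unloading the xlator: the
     * .so's text segment is unmapped, but registered_cb still holds an
     * address inside it */
    dlclose(handle);

    /* analogous to the connect event arriving afterwards in
     * rpc_clnt_notify(): this call jumps into unmapped memory and raises
     * SIGSEGV, like frame #6 ("?? ()") in the backtrace */
    registered_cb();

    return EXIT_SUCCESS;
}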

Comment 4 Sanoj Unnikrishnan 2017-02-07 07:10:35 UTC
From the looks of it, this seems to be a race between a connect event and graph destroy. So the component is either core, which handles graph switch, or protocol/server, which should have waited until the graph is activated before listening for incoming connections.
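For illustration, here is a minimal C sketch of the ordering comment 4 suggests, using a condition variable so the event thread cannot dispatch a connect event before the graph is active. The flag and function shapes are hypothetical, not the actual protocol/server implementation.

/* Sketch: the event thread blocks until the graph has been activated,
 * so a connect event can never race with graph init/destroy.
 * Build with: cc -o gate gate.c -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t graph_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  graph_cond = PTHREAD_COND_INITIALIZER;
static bool            graph_active;

/* listener/event thread: refuses to dispatch connect events until the
 * graph is fully activated */
static void *event_thread(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&graph_lock);
    while (!graph_active)
        pthread_cond_wait(&graph_cond, &graph_lock);
    pthread_mutex_unlock(&graph_lock);
    printf("graph active, safe to dispatch connect events\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, event_thread, NULL);

    sleep(1);  /* stands in for graph init, which may instead fail and exit */

    /* signalled only after the equivalent of glusterfs_graph_activate()
     * has succeeded; on init failure the process would tear down without
     * ever letting the event thread dispatch into translator code */
    pthread_mutex_lock(&graph_lock);
    graph_active = true;
    pthread_cond_broadcast(&graph_cond);
    pthread_mutex_unlock(&graph_lock);

    pthread_join(t, NULL);
    return 0;
}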

Comment 5 Amar Tumballi 2018-02-01 05:55:13 UTC
Thank you for your bug report.

We are no longer releasing bug fixes or any other updates for this version. This bug will be set to CLOSED WONTFIX to reflect this. Please reopen it if the problem is still observed after upgrading to the latest version.

[Also, considering this is in the path of cleanup_and_exit(), i.e., at the time the process is stopping, we would not focus on this bug for now.]

