Previously, when quota was enabled again on a volume, the system tried to access a NULL transport object, leading to a crash. With this fix, a new transport connection is created every time quota is enabled.
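The sketch below is a minimal illustration of the idea behind the fix, not the actual GlusterFS change; all names in it (quota_conn_t, transport_connect, and so on) are invented for the example. It shows a quota-enable path that always builds a fresh transport, so a connection torn down by an earlier disable is never reused.

-------------------------------------------------------------------------------
/* Minimal sketch, NOT the actual GlusterFS code.  It only illustrates the
 * idea of the fix: rebuild the transport on every quota enable instead of
 * reusing one that a previous disable may have destroyed.  All names here
 * (quota_conn_t, transport_connect, ...) are made up for the example. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
        char connected;                  /* stand-in for a real transport  */
} transport_t;

typedef struct {
        transport_t *trans;              /* becomes NULL on quota disable  */
} quota_conn_t;

/* Dummy stand-ins for the real connection setup/teardown. */
static transport_t *transport_connect(const char *volname)
{
        (void)volname;
        transport_t *t = calloc(1, sizeof(*t));
        if (t)
                t->connected = 1;
        return t;
}

static void transport_destroy(transport_t *t)
{
        free(t);
}

/* "quota disable": the transport goes away and trans is left NULL. */
static void quota_disable(quota_conn_t *conn)
{
        transport_destroy(conn->trans);
        conn->trans = NULL;
}

/* "quota enable": always create a fresh transport, so a stale NULL
 * pointer from an earlier disable is never dereferenced later on. */
static int quota_enable(quota_conn_t *conn, const char *volname)
{
        if (conn->trans) {
                transport_destroy(conn->trans);
                conn->trans = NULL;
        }
        conn->trans = transport_connect(volname);
        return conn->trans ? 0 : -1;
}

int main(void)
{
        quota_conn_t conn = { 0 };

        quota_enable(&conn, "dis_rep_vol");            /* first enable      */
        quota_disable(&conn);                          /* trans == NULL     */
        if (quota_enable(&conn, "dis_rep_vol") == 0)   /* re-enable works   */
                printf("quota re-enabled with a fresh transport\n");

        quota_disable(&conn);
        return 0;
}
-------------------------------------------------------------------------------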
Created attachment 893229: gdb backtrace
Description of problem:
--------------------------
glusterfsd processes of a distributed-replicate volume crashed after quota was re-enabled on a volume where it had previously been enabled and then disabled.
The following was seen when the core was examined with gdb -
-------------------------------------------------------------------------------
(gdb) p *rpc->conn.rpc_clnt
$4 = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 39 times>, __align = 0}, notifyfn = 0x7f34b77c2690 <quota_enforcer_notify>, conn = {lock = {__data = {__lock = 0, __count = 0,
__owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, trans = 0x0,
config = {rpc_timeout = 0, remote_port = 0, remote_host = 0x0}, reconnect = 0x0, timer = 0x0, ping_timer = 0x0, rpc_clnt = 0x2344ef0, connected = 0 '\000',
saved_frames = 0x233ecc0, frame_timeout = 1800, last_sent = {tv_sec = 1399457050, tv_usec = 294940}, last_received = {tv_sec = 1399456455, tv_usec = 40203},
ping_started = 0, name = 0x233c240 "dis_rep_vol-quota"}, mydata = 0x22ebd60, xid = 160030, programs = {next = 0x2344fd8, prev = 0x2344fd8}, reqpool = 0x233c1c0,
saved_frames_pool = 0x2347520, ctx = 0x22ab010, refcount = 1, auth_null = 0, disabled = 1 '\001'}
Note that 'trans' is NULL in the output above.
The full gdb backtrace is attached.
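For illustration only (hypothetical names, not the GlusterFS source), the snippet below shows why a NULL 'trans' translates into a brick crash: a request path that dereferences conn->trans unconditionally segfaults once trans is 0x0, which is exactly the state left behind by the earlier quota disable.

-------------------------------------------------------------------------------
/* Illustration only: hypothetical names, not the GlusterFS source.  A NULL
 * check fails the request instead of crashing; without it, the dereference
 * of a NULL transport is a segfault in the brick process. */
#include <stdio.h>
#include <errno.h>

typedef struct transport {
        int (*submit)(struct transport *t, const char *req);
} transport_t;

typedef struct {
        transport_t *trans;
} conn_t;

static int submit_request(conn_t *conn, const char *req)
{
        /* Unsafe variant: conn->trans->submit(...) crashes when trans is
         * NULL, the state seen in the core dump above. */
        if (conn->trans == NULL) {
                fprintf(stderr, "quota enforcer: transport not connected\n");
                return -ENOTCONN;
        }
        return conn->trans->submit(conn->trans, req);
}

int main(void)
{
        conn_t conn = { .trans = NULL };        /* state seen in the core  */
        int ret = submit_request(&conn, "enforce /dir");
        printf("submit returned %d instead of crashing\n", ret);
        return 0;
}
-------------------------------------------------------------------------------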
Version-Release number of selected component (if applicable):
glusterfs-3.5qa2-0.369.git500a656.el6rhs.x86_64
How reproducible:
Saw it once in my setup.
Steps to Reproduce:
1. Enable quota on a distributed-replicate volume and set a limit. Mount the volume on a client and write data until the soft limit set on a particular directory is crossed.
2. After a while, disable quota.
3. After a while, enable quota again.
Actual results:
Brick processes crashed.
Expected results:
Brick processes should not crash.
Additional info:
Verified as fixed in glusterfs-3.6.0.14-1.el6rhs.x86_64
Performed the following steps for both distribute and distributed-replicate volumes -
1. Enable quota on the volume, and set limit on a directory.
2. Cause the usage of the directory to exceed the limit set.
3. Disable quota on the volume and enable it again.
glusterfsd crash not seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html