Bug 765225 (GLUSTER-3493) - [glusterfs-3.2.3]: glusterfs server crashed
Summary: [glusterfs-3.2.3]: glusterfs server crashed
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-3493
Product: GlusterFS
Classification: Community
Component: quota
Version: 3.2.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-30 05:33 UTC by Raghavendra Bhat
Modified: 2012-02-01 05:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-01 05:17:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Raghavendra Bhat 2011-08-30 05:33:41 UTC
The glusterfs server crashed while running tests.

Setup:

Replicate setup with replica count 2
1 fuse client and 1 nfs client
quota and profile enabled with quota set on a directory

Both the fuse client and the nfs client were running sanity scripts inside the directory where the quota limit is set.

Brought one of the servers down, waited for some time, and brought it back up. On the other server, volume set operations were running in a loop. Both clients were running find <mount_point> | xargs stat to trigger self-heal as soon as the server came back up.

After the server came back up, both servers died.

This is the backtrace of the core:

Core was generated by `/usr/local/sbin/glusterfsd --xlator-option mirror-server.listen-port=24010 -s l'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000030b8e7288e in free () from /lib64/libc.so.6
(gdb) bt
#0  0x00000030b8e7288e in free () from /lib64/libc.so.6
#1  0x00002acd2cd6cac2 in dict_destroy (this=0x2aaab000d080) at ../../../libglusterfs/src/dict.c:404
#2  0x00002acd2cd6cb81 in dict_unref (this=0x2aaab000d080) at ../../../libglusterfs/src/dict.c:430
#3  0x00002acd2cda3345 in call_stub_destroy_wind (stub=0x2acd2e191c34) at ../../../libglusterfs/src/call-stub.c:3540
#4  0x00002acd2cda3857 in call_stub_destroy (stub=0x2acd2e191c34) at ../../../libglusterfs/src/call-stub.c:3832
#5  0x00002acd2cda3979 in call_resume (stub=0x2acd2e191c34) at ../../../libglusterfs/src/call-stub.c:3865
#6  0x00002aaaab883199 in iot_worker (data=0x136ea60) at ../../../../../xlators/performance/io-threads/src/io-threads.c:129
#7  0x00000030b960673d in start_thread () from /lib64/libpthread.so.0
#8  0x00000030b8ed44bd in clone () from /lib64/libc.so.6
(gdb) info thr
  11 Thread 6093  0x00000030b8ed48a8 in epoll_wait () from /lib64/libc.so.6
  10 Thread 6094  0x00000030b960e838 in do_sigwait () from /lib64/libpthread.so.0
  9 Thread 6095  0x00000030b8e9a541 in nanosleep () from /lib64/libc.so.6
  8 Thread 6102  0x00000030b960b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 6103  0x00000030b960b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 6104  0x00000030b960d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
  5 Thread 6105  0x00000030b960b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 6106  0x00000030b960b732 in ?? () from /lib64/libpthread.so.0
  3 Thread 6107  0x00000030b960b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 6153  0x00000030b960b737 in ?? () from /lib64/libpthread.so.0
* 1 Thread 8783  0x00000030b8e7288e in free () from /lib64/libc.so.6
(gdb)  t 2
[Switching to thread 2 (Thread 6153)]#0  0x00000030b960b737 in ?? () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00000030b960b737 in ?? () from /lib64/libpthread.so.0
#1  0x00002aaaabaada3f in quota_update_inode_contribution (frame=0x2acd2df58650, cookie=0x2acd2dcdc374, this=0x1368fe0, op_ret=0, 
    op_errno=22, inode=0x2aaaac3c0d54, buf=0x419cee30, dict=0x2aaab80077e0, postparent=0x419cedc0)
    at ../../../../../xlators/features/marker/src/marker-quota.c:1626
#2  0x00002aaaab88340f in iot_lookup_cbk (frame=0x2acd2dcdc374, cookie=0x2acd2dc8a418, this=0x1367f20, op_ret=0, op_errno=22, 
    inode=0x2aaaac3c0d54, buf=0x419cee30, xattr=0x2aaab80077e0, postparent=0x419cedc0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:199
#3  0x00002aaaab6720cb in pl_lookup_cbk (frame=0x2acd2dc8a418, cookie=0x2acd2dcdb7ec, this=0x1366f20, op_ret=0, op_errno=22, 
    inode=0x2aaaac3c0d54, buf=0x419cee30, dict=0x2aaab80077e0, postparent=0x419cedc0)
    at ../../../../../xlators/features/locks/src/posix.c:1452
#4  0x00002aaaab45a2f9 in posix_acl_lookup_cbk (frame=0x2acd2dcdb7ec, cookie=0x2acd2dcc5460, this=0x1365de0, op_ret=0, op_errno=22, 
    inode=0x2aaaac3c0d54, buf=0x419cee30, xattr=0x2aaab80077e0, postparent=0x419cedc0)
    at ../../../../../xlators/system/posix-acl/src/posix-acl.c:708
#5  0x00002aaaab23d4a4 in posix_lookup (frame=0x2acd2dcc5460, this=0x1364c40, loc=0x2acd2e183278, xattr_req=0x2aaab8021dd0)
    at ../../../../../xlators/storage/posix/src/posix.c:616
#6  0x00002aaaab45a60c in posix_acl_lookup (frame=0x2acd2dcdb7ec, this=0x1365de0, loc=0x2acd2e183278, xattr=0x2aaab8021dd0)
    at ../../../../../xlators/system/posix-acl/src/posix-acl.c:753
#7  0x00002aaaab672575 in pl_lookup (frame=0x2acd2dc8a418, this=0x1366f20, loc=0x2acd2e183278, xattr_req=0x2aaab8021dd0)
    at ../../../../../xlators/features/locks/src/posix.c:1491
#8  0x00002aaaab883629 in iot_lookup_wrapper (frame=0x2acd2dcdc374, this=0x1367f20, loc=0x2acd2e183278, xattr_req=0x2aaab8021dd0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:209
#9  0x00002acd2cd9d358 in call_resume_wind (stub=0x2acd2e183240) at ../../../libglusterfs/src/call-stub.c:2408
#10 0x00002acd2cda3954 in call_resume (stub=0x2acd2e183240) at ../../../libglusterfs/src/call-stub.c:3859
#11 0x00002aaaab883199 in iot_worker (data=0x136ea60) at ../../../../../xlators/performance/io-threads/src/io-threads.c:129
#12 0x00000030b960673d in start_thread () from /lib64/libpthread.so.0
#13 0x00000030b8ed44bd in clone () from /lib64/libc.so.6
(gdb)  f 1
#1  0x00002aaaabaada3f in quota_update_inode_contribution (frame=0x2acd2df58650, cookie=0x2acd2dcdc374, this=0x1368fe0, op_ret=0, 
    op_errno=22, inode=0x2aaaac3c0d54, buf=0x419cee30, dict=0x2aaab80077e0, postparent=0x419cedc0)
    at ../../../../../xlators/features/marker/src/marker-quota.c:1626
1626            LOCK (&contribution->lock);
(gdb) p *contribution
$1 = {contri_list = {next = 0x100000000, prev = 0x200000004}, contribution = 46912720142304, 
  gfid = "@\321\001\270\252*\000\000\000\000\000\000\000\000\000", lock = -1}
(gdb)  p contribution->contri_list
$3 = {next = 0x100000000, prev = 0xa}
(gdb) p contribution->contri_list->next
$4 = (struct list_head *) 0x100000000
(gdb) p *contribution->contri_list->next
Cannot access memory at address 0x100000000
(gdb)
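
In the gdb session above, contri_list.next is 0x100000000, which is not a readable address, so the contribution being locked at marker-quota.c:1626 looks as if it had already been freed or overwritten. One defensive pattern for this kind of stale pointer, sketched here in C under assumptions, is to confirm that a node is still reachable from a trusted list head before touching it. The struct list_head below is a minimal stand-in rather than the libglusterfs definition, list_contains is a hypothetical helper, and this is not the change that was merged for this bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a doubly linked list node (assumption, not
 * the libglusterfs definition). */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Make head an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Append node at the tail of the list anchored at head. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Hypothetical check: return true if node is currently linked into the
 * list anchored at head. Only nodes reachable from head are
 * dereferenced; the candidate pointer is compared, never read, so a
 * stale pointer cannot fault here. The caller is assumed to hold
 * whatever lock protects the list.
 */
static bool list_contains(const struct list_head *head,
                          const struct list_head *node)
{
    assert(head != NULL);

    for (const struct list_head *cur = head->next; cur != head;
         cur = cur->next) {
        if (cur == node)
            return true;
    }
    return false;
}
```

A caller would take the list lock, call list_contains, and only then lock the individual entry. The walk is linear in the list length, so this suits debugging builds better than hot paths, and it does not help if the list itself has been corrupted.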

Comment 1 Raghavendra G 2011-08-31 01:46:44 UTC
Hi Jhonny,

Is it on version 3.2.2? This seems to be a case of memory corruption. After 3.2.2, the following patches aimed at fixing corruption have gone in:
e559ea5f8056
276142d543f61296
0564d1198bd7fa9

Can you take a release which has these fixes and rerun the tests?

regards,
Raghavendra.

Comment 2 Raghavendra Bhat 2011-08-31 01:53:57 UTC
It was found in 3.2.3 only. Since Bugzilla did not have a 3.2.3 entry in the release field, I marked it as 3.2.2. You can see that the subject contains 3.2.3.

Comment 3 Anand Avati 2011-09-19 05:22:30 UTC
CHANGE: http://review.gluster.com/390 (Change-Id: I060e62c1fbb288179063a6d64d73bad1a6572661) merged in master by Vijay Bellur (vijay)

Comment 4 Anand Avati 2011-09-20 05:41:03 UTC
CHANGE: http://review.gluster.com/389 (Change-Id: Idb31e845bc876f46b476d8fa769d67d8db89e4a1) merged in release-3.2 by Vijay Bellur (vijay)

Comment 5 Amar Tumballi 2011-09-28 04:23:46 UTC
There is a patch in, but we still need to find and fix the actual root cause.

Comment 6 Raghavendra G 2011-10-03 02:43:02 UTC
Hi Raghu,

Most likely this is a duplicate of bug #765356. Can you check whether http://review.gluster.com/#patch,sidebyside,538,1,xlators/features/marker/src/marker-quota.c fixes the issue? Or you can just use the latest release-3.2.

regards,
Raghavendra.

Comment 7 Amar Tumballi 2012-02-01 05:17:58 UTC
This works fine with later releases. Please file a new bug if it happens again.

