Bug 1104915 - glusterfsd crashes while doing stress tests
Summary: glusterfsd crashes while doing stress tests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact:
URL:
Whiteboard:
Depends On: 1097102
Blocks: glusterfs-3.5.1
 
Reported: 2014-06-05 03:13 UTC by Krutika Dhananjay
Modified: 2014-06-24 11:06 UTC
CC: 8 users

Fixed In Version: glusterfs-3.5.1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1097102
Environment:
Last Closed: 2014-06-24 11:06:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2014-06-05 03:13:18 UTC
+++ This bug was initially created as a clone of Bug #1097102 +++

Description of problem:

glusterfsd dumped core while running intense I/O on machines with 60 drives.

Backtrace:
 
(gdb) bt
#0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
#1  0x00007feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9", 
    fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
#2  0x00007feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
#3  0x00007feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
#5  0x00007feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
#6  0x00007feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
#7  0x00007feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>)
    at server-helpers.c:354
#8  0x00007feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0)
    at server.c:527
#9  0x00007feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
#10 0x00007feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0)
    at rpcsvc.c:758
#11 0x00007feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:512
#12 0x00007feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, 
    poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
#13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, 
    poll_err=0) at socket.c:2240
#14 0x00007feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
#15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
#16 0x0000000000407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
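
Frames #0 through #3 point at a NULL inode being handed to uuid_utoa(): with a NULL inode_t, taking the address of its gfid member yields the member's offset rather than valid memory, which is consistent with the "in=0x8 <Address 0x8 out of bounds>" in frame #0 (in the real inode_t, a table pointer precedes gfid, putting gfid at offset 8 on x86_64). A standalone illustration of that failure pattern follows; the struct is a hypothetical stand-in, not the actual inode_t:

/* Minimal reproduction of the "offset-of-NULL" pointer seen in the
 * backtrace; compile with: cc -o nullgfid nullgfid.c */
#include <stdio.h>

typedef struct {
        void          *table;    /* stand-in for the member before gfid */
        unsigned char  gfid[16]; /* lands at offset 8 on x86_64 */
} fake_inode_t;

int
main (void)
{
        fake_inode_t *inode = NULL;  /* inode already gone at cleanup time */

        /* Taking the array's address does not dereference the pointer yet
         * (formally undefined behavior, but in practice the compiler just
         * computes NULL + 8) ... */
        unsigned char *gfid = inode->gfid;
        printf ("gfid pointer = %p\n", (void *) gfid);  /* prints 0x8 */

        /* ... the first read through that pointer, inside uuid_unpack(),
         * is the actual segfault:
         *     unsigned char c = gfid[0];
         */
        return 0;
}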

Version-Release number of selected component (if applicable):
glusterfs 3.6.0 built on May 10 2014 13:57:11

How reproducible:
Intermittent.

Steps to Reproduce:
Create a 6x2 (distributed-replicate) volume and run intense I/O on it.

Attached brick log.

--- Additional comment from Nagaprasad Sathyanarayana on 2014-05-14 01:29:58 EDT ---

Can you please provide some details about the tests performed?

--- Additional comment from Sachidananda Urs on 2014-05-14 02:17:06 EDT ---

1. Compilebench, which compiles the kernel.
2. fsstress, run as part of LTP.
3. Small-file creation/deletion.
4. rsync of a huge amount of data as part of archival tests.

Comment 1 Anand Avati 2014-06-05 04:39:33 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Anand Avati 2014-06-08 13:27:25 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#2) for review on master by Krutika Dhananjay (kdhananj)

Comment 3 Anand Avati 2014-06-08 13:36:58 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#3) for review on master by Krutika Dhananjay (kdhananj)

Comment 4 Anand Avati 2014-06-09 07:15:33 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#4) for review on master by Krutika Dhananjay (kdhananj)

Comment 5 Anand Avati 2014-06-09 08:10:12 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#5) for review on master by Krutika Dhananjay (kdhananj)

Comment 6 Anand Avati 2014-06-09 11:37:34 UTC
REVIEW: http://review.gluster.org/7981 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#6) for review on master by Krutika Dhananjay (kdhananj)

Comment 7 Anand Avati 2014-06-12 01:43:09 UTC
COMMIT: http://review.gluster.org/7981 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b9856eca80e2f820c88f60fdc6cb1427905671af
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Jun 5 09:22:34 2014 +0530

    features/locks: Clean up logging of cleanup in DISCONNECT codepath
    
    Now, gfid is printed as opposed to path in cleanup messages.
    
    Also, refkeeper update is eliminated in inodelk and entrylk.
    Instead, the patch ensures inode and pl_inode are kept alive as
    long as there is at least one lock (granted/blocked) on an inode.
    
    Also, every inode is unref'd appropriately on a DISCONNECT from the
    lock-owning client.
    
    Change-Id: I531b1a02fe1b889fdd7f54b1fd522e78a18ed1df
    BUG: 1104915
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/7981
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
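
To make the fix described above concrete: instead of tracking inode liveness through the separate refkeeper, each granted or blocked lock pins its inode with a reference, and the DISCONNECT cleanup path drops that reference only after it has finished logging. The sketch below captures that discipline with simplified, hypothetical types: inode_ref()/inode_unref() mirror real GlusterFS helpers in spirit, but everything here is illustrative, not the actual patch.

#include <stdio.h>
#include <stdlib.h>

typedef struct inode {
        int refcount;
        unsigned char gfid[16];
} inode_t;

typedef struct lock {
        inode_t     *inode;   /* every lock holds a ref on its inode */
        struct lock *next;
} lock_t;

static inode_t *
inode_ref (inode_t *inode)
{
        inode->refcount++;
        return inode;
}

static void
inode_unref (inode_t *inode)
{
        if (--inode->refcount == 0)
                free (inode);
}

/* Adding a lock (granted or blocked) takes a ref, so the inode stays
 * alive as long as at least one lock points at it. */
static lock_t *
lock_add (lock_t **head, inode_t *inode)
{
        lock_t *lock = calloc (1, sizeof (*lock));
        lock->inode  = inode_ref (inode);
        lock->next   = *head;
        *head        = lock;
        return lock;
}

/* DISCONNECT cleanup: the gfid can be logged safely because the lock's
 * reference is still held; it is released only afterwards. */
static void
client_cleanup (lock_t **head)
{
        lock_t *lock = *head;
        while (lock) {
                lock_t *next = lock->next;
                printf ("releasing lock on inode %p\n", (void *) lock->inode);
                inode_unref (lock->inode);
                free (lock);
                lock = next;
        }
        *head = NULL;
}

int
main (void)
{
        inode_t *inode  = calloc (1, sizeof (*inode));
        lock_t  *locks  = NULL;
        inode->refcount = 1;              /* creator's own reference */

        lock_add (&locks, inode);         /* refcount -> 2 */
        client_cleanup (&locks);          /* refcount -> 1, inode survives */
        inode_unref (inode);              /* refcount -> 0, freed */
        return 0;
}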

Comment 8 Anand Avati 2014-06-12 06:46:08 UTC
REVIEW: http://review.gluster.org/8042 (features/locks: Clean up logging of cleanup in DISCONNECT codepath) posted (#1) for review on release-3.5 by Krutika Dhananjay (kdhananj)

Comment 9 Anand Avati 2014-06-23 09:38:55 UTC
COMMIT: http://review.gluster.org/8042 committed in release-3.5 by Niels de Vos (ndevos) 
------
commit 5888a89fa8950be38ed3c5b000a37013f6656031
Author: Krutika Dhananjay <kdhananj>
Date:   Thu Jun 5 09:22:34 2014 +0530

    features/locks: Clean up logging of cleanup in DISCONNECT codepath
    
    Backport of http://review.gluster.org/7981
    
    Now, gfid is printed as opposed to path in cleanup messages.
    
    Also, refkeeper update is eliminated in inodelk and entrylk.
    Instead, the patch ensures inode and pl_inode are kept alive as
    long as there is at least one lock (granted/blocked) on an inode.
    
    Also, every inode is unref'd appropriately on a DISCONNECT from the
    lock-owning client.
    
    Change-Id: I234db688ad0d314f4936a16cc5af70a3bd071970
    BUG: 1104915
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/8042
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Niels de Vos <ndevos>

Comment 10 Niels de Vos 2014-06-24 11:06:42 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.1, please reopen this bug report.

glusterfs-3.5.1 has been announced on the Gluster Users mailing list [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-June/040723.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

