Bug 1049278 - [SNAPSHOT]: glusterd crashed while performing IO and taking snapshot at the same time
Summary: [SNAPSHOT]: glusterd crashed while performing IO and taking snapshot at the same time
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Avra Sengupta
QA Contact: senaik
Whiteboard: SNAPSHOT
Depends On: 1048831
Reported: 2014-01-07 10:29 UTC by Rahul Hinduja
Modified: 2016-09-17 12:55 UTC
CC: 8 users

Fixed In Version: glusterfs-3.4.1.snap.feb05.2014
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2014-09-22 19:31:33 UTC
Target Upstream Version:


Links: Red Hat Product Errata RHEA-2014:1278 (SHIPPED_LIVE): Red Hat Storage Server 3.0 bug fix and enhancement update, 2014-09-22 23:26:55 UTC

Description Rahul Hinduja 2014-01-07 10:29:36 UTC
Description of problem:

While IO was in progress from the FUSE and NFS mounts, multiple snapshots of a given volume were being created. glusterd crashed with the following backtrace:

(gdb) bt
#0  gf_store_mkstemp (shandle=0x64656b7361) at store.c:66
#1  0x00007fc9085be0c2 in glusterd_store_perform_snap_volume_store (volinfo=0xeb4fa0, snap_volinfo=0x7fc8f0195aa0) at glusterd-store.c:1371
#2  0x00007fc9085be17f in glusterd_store_snap_volume (volinfo=0xeb4fa0, snap=0x7fc8f019ab40) at glusterd-store.c:1422
#3  0x00007fc9085be453 in glusterd_store_perform_snap_store (volinfo=0xeb4fa0) at glusterd-store.c:1523
#4  0x00007fc9085fdce8 in glusterd_do_snap (volinfo=0xeb4fa0, snapname=0x7fc8fc124570 "snap33", dict=0x7fc90a8399f8, cg=<value optimized out>, cg_id=0x0, volcount=1, 
    snap_volid=0x7fc8fc1228a0 "!\267\017*P\323D\254\274\234\247\304\365r-snaps-E", cg_name=0x0) at glusterd-snapshot.c:3171
#5  0x00007fc9085ff666 in glusterd_snapshot_create_commit (dict=<value optimized out>, op_errstr=0x110a080, rsp_dict=<value optimized out>) at glusterd-snapshot.c:4055
#6  0x00007fc908600873 in glusterd_snapshot (dict=0x7fc90a8399f8, op_errstr=0x110a080, rsp_dict=0x7fc90a839a84) at glusterd-snapshot.c:4356
#7  0x00007fc908604f3e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fc90a8399f8, op_errstr=0x110a080, rsp_dict=0x7fc90a839a84) at glusterd-mgmt.c:174
#8  0x00007fc9086021c3 in glusterd_handle_commit_fn (req=0x7fc9084ee02c) at glusterd-mgmt-handler.c:546
#9  0x00007fc9085773cf in glusterd_big_locked_handler (req=0x7fc9084ee02c, actor_fn=0x7fc908601f80 <glusterd_handle_commit_fn>) at glusterd-handler.c:78
#10 0x0000003f09c4cdd2 in synctask_wrap (old_task=<value optimized out>) at syncop.c:293
#11 0x0000003213043bf0 in ?? () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce:
1. Create a cluster of 4 servers (server1-4)
2. Create a volume (vol0)
3. Mount the volume on the client (FUSE and NFS)
4. Start creating snapshots of the volume from server1 (a for loop was used to create 100 snapshots)
5. Start IO from the mount point
6. After 30 snapshots were created successfully, snapshot creation failed and glusterd crashed on server2

Note: IO was generated with the arequal script:
./run.sh -w /mnt/vol0 -t arequal -l /mnt/logs-vol0/arequal.log
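The steps above can be sketched as a shell function. Peer names, brick paths, and mount points are placeholders, and the `gluster snapshot create` syntax shown follows the RHS 3.0-era CLI of these development builds, so it may differ on other releases:

```shell
#!/bin/sh
# Illustrative sketch of steps 1-6; not a verbatim script from the report.
reproduce() {
    VOL=vol0

    # 1. Form the trusted pool (run on server1)
    for s in server2 server3 server4; do
        gluster peer probe "$s"
    done

    # 2. Create and start a volume spanning the 4 peers (bricks are placeholders)
    gluster volume create $VOL server1:/bricks/b0 server2:/bricks/b0 \
                               server3:/bricks/b0 server4:/bricks/b0
    gluster volume start $VOL

    # 3. Mount the volume on the client over both FUSE and NFS
    mount -t glusterfs server1:/$VOL /mnt/vol0
    mount -t nfs -o vers=3 server1:/$VOL /mnt/vol0-nfs

    # 5. Start IO from the mount point (arequal wrapper, as in the report)
    ./run.sh -w /mnt/vol0 -t arequal -l /mnt/logs-vol0/arequal.log &

    # 4. Create 100 snapshots in a loop from server1
    for i in $(seq 1 100); do
        gluster snapshot create snap$i $VOL || break
    done
}
```

In the reported run, the loop failed after roughly 30 successful snapshots, at which point glusterd on server2 had crashed.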

Actual results:

glusterd crashed with logs as follows:

frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-01-07 02:59:21
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.snap.dec30.2013git

Expected results:

glusterd should not crash, and snapshot creation should succeed

Comment 3 Rahul Hinduja 2014-01-21 12:05:31 UTC
Issue is reproducible with build: glusterfs-3.4.1.snap.jan15.2014git-1.el6.x86_64

IO pattern: compile_kernel


#0  gf_store_mkstemp (shandle=0x2d74736f68250000) at store.c:66
#1  0x00007fcaae138e82 in glusterd_store_perform_snap_volume_store (volinfo=0x7fca98001720, snap_volinfo=0x7fcaa84581d0) at glusterd-store.c:1379
#2  0x00007fcaae138f3f in glusterd_store_snap_volume (volinfo=0x7fca98001720, snap=0x7fcaa845afa0) at glusterd-store.c:1430
#3  0x00007fcaae139213 in glusterd_store_perform_snap_store (volinfo=0x7fca98001720) at glusterd-store.c:1534
#4  0x00007fcaae17c680 in glusterd_do_snap (volinfo=0x7fca98001720, snapname=0x7fcaa843fb80 "s45", dict=0x7fcab03c5638, cg=0x0, cg_id=0x0, volcount=1, 
    snap_volid=0x7fca985d7640 "\227\250Kں\355G\215\217V\210\336z\300\245(", cg_name=0x0) at glusterd-snapshot.c:3114
#5  0x00007fcaae17d1ac in glusterd_snapshot_create_commit (dict=<value optimized out>, op_errstr=0x24d7698, rsp_dict=<value optimized out>) at glusterd-snapshot.c:4026
#6  0x00007fcaae17d5c3 in glusterd_snapshot (dict=0x7fcab03c5638, op_errstr=0x24d7698, rsp_dict=0x7fcab03c7eb0) at glusterd-snapshot.c:4404
#7  0x00007fcaae18143e in gd_mgmt_v3_commit_fn (op=GD_OP_SNAP, dict=0x7fcab03c5638, op_errstr=0x24d7698, rsp_dict=0x7fcab03c7eb0) at glusterd-mgmt.c:174
#8  0x00007fcaae181f97 in glusterd_mgmt_v3_commit (conf=0x20a7890, op=GD_OP_SNAP, op_ctx=0x7fcab03c5034, req_dict=0x7fcab03c5638, op_errstr=0x24d7698, npeers=3)
    at glusterd-mgmt.c:957
#9  0x00007fcaae1845ec in glusterd_mgmt_v3_initiate_snap_phases (req=0x209ae5c, op=GD_OP_SNAP, dict=0x7fcab03c5034) at glusterd-mgmt.c:1578
#10 0x00007fcaae17baab in glusterd_handle_snapshot_fn (req=0x209ae5c) at glusterd-snapshot.c:4656
#11 0x00007fcaae0f148f in glusterd_big_locked_handler (req=0x209ae5c, actor_fn=0x7fcaae17b580 <glusterd_handle_snapshot_fn>) at glusterd-handler.c:78
#12 0x00000033d1c4ce52 in synctask_wrap (old_task=<value optimized out>) at syncop.c:293
#13 0x0000003213043bf0 in ?? () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()
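An editorial aside: in both backtraces the crashing frame's shandle argument (0x64656b7361 in the original description, 0x2d74736f68250000 here) is not a plausible heap address. Interpreted as little-endian bytes, each value is printable ASCII, which suggests the handle was read from freed or overwritten memory rather than being a random wild pointer. A small sketch to decode them (the `decode` helper is illustrative, not from the codebase):

```shell
#!/bin/sh
# Decode a hex pointer value as the little-endian bytes it occupies in
# memory: the last hex pair is the lowest-address byte.
decode() {
    hex=$1
    out=""
    while [ -n "$hex" ]; do
        rest=${hex%??}               # strip the last byte pair
        pair=${hex#"$rest"}          # the stripped pair = lowest remaining byte
        hex=$rest
        [ "$pair" = "00" ] && continue   # skip NUL padding bytes
        out="$out$(printf "\\x$pair")"
    done
    printf '%s\n' "$out"
}

decode 64656b7361          # shandle from the original description
decode 2d74736f68250000    # shandle from this comment
```

Running this prints "asked" and "%host-": fragments of ordinary string data where a pointer should be, consistent with a use-after-free or heap corruption of the store handle.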

Moving the bug back to the assigned state.

Comment 4 senaik 2014-02-14 12:15:46 UTC
Faced a brick crash while verifying this bug; the crash is tracked by bz 1048831. Marking this bug as dependent on bz 1048831.

bt :

(gdb) bt
#0  0x0000003e33032925 in raise () from /lib64/libc.so.6
#1  0x0000003e33034105 in abort () from /lib64/libc.so.6
#2  0x0000003e33070837 in __libc_message () from /lib64/libc.so.6
#3  0x0000003e33076166 in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003e33078c93 in _int_free () from /lib64/libc.so.6
#5  0x00007fd2d06f603a in ltable_delete_locks (ltable=0x7fd2b0000ee0) at posix.c:2559
#6  0x00007fd2d06f6466 in disconnect_cbk (this=<value optimized out>, client=<value optimized out>) at posix.c:2619
#7  0x0000003555a63d9d in gf_client_disconnect (client=0x1cb7b50) at client_t.c:374
#8  0x00007fd2cbbbf608 in server_connection_cleanup (this=0x1c72570, client=0x1cb7b50, flags=<value optimized out>)
    at server-helpers.c:244
#9  0x00007fd2cbbbae0c in server_rpc_notify (rpc=<value optimized out>, xl=0x1c72570, event=<value optimized out>, 
    data=0x1cb6d50) at server.c:558
#10 0x0000003555e07cc5 in rpcsvc_handle_disconnect (svc=0x1c74490, trans=0x1cb6d50) at rpcsvc.c:682
#11 0x0000003555e09800 in rpcsvc_notify (trans=0x1cb6d50, mydata=<value optimized out>, 
    event=<value optimized out>, data=0x1cb6d50) at rpcsvc.c:720
#12 0x0000003555e0af18 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, 
    data=<value optimized out>) at rpc-transport.c:512
#13 0x00007fd2d1d72761 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, 
    data=0x1cb6d50, poll_in=<value optimized out>, poll_out=0, poll_err=24) at socket.c:1071
#14 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x1cb6d50, 
    poll_in=<value optimized out>, poll_out=0, poll_err=24) at socket.c:2239
#15 0x0000003555a66107 in event_dispatch_epoll_handler (event_pool=0x1c44ee0) at event-epoll.c:384
#16 event_dispatch_epoll (event_pool=0x1c44ee0) at event-epoll.c:445
#17 0x000000000040680a in main (argc=19, argv=0x7ffff54e6288) at glusterfsd.c:1964

Comment 6 Avra Sengupta 2014-03-24 11:37:00 UTC
Fixed with http://review.gluster.org/#/c/6903/

Comment 7 Nagaprasad Sathyanarayana 2014-04-21 06:17:47 UTC
Marking snapshot BZs to RHS 3.0.

Comment 8 Nagaprasad Sathyanarayana 2014-05-19 10:56:31 UTC
Setting flags required to add BZs to RHS 3.0 Errata

Comment 10 Rahul Hinduja 2014-06-05 08:40:00 UTC
Verified with build: glusterfs-

No crash was observed while taking snapshots of a volume while arequal IO was in progress from the FUSE and NFS clients.

Moving the bug to the verified state.

Comment 12 errata-xmlrpc 2014-09-22 19:31:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

