Bug 820887 - glusterfs process crashed
Summary: glusterfs process crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra Bhat
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-05-11 10:08 UTC by Shwetha Panduranga
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:11:09 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments:
Mount log file (92.29 KB, text/x-log)
2012-05-11 11:37 UTC, Shwetha Panduranga
valgrind logs (32.00 KB, application/octet-stream)
2012-05-11 11:37 UTC, Shwetha Panduranga
Backtrace of core (14.46 KB, application/octet-stream)
2012-05-11 11:38 UTC, Shwetha Panduranga

Description Shwetha Panduranga 2012-05-11 10:08:15 UTC
Description of problem:
The glusterfs process crashed due to invalid memory reads in fd_ctx_dump while taking a statedump.

Valgrind log:-
-------------
==2196== Invalid read of size 8
==2196==    at 0x4C571D1: fd_ctx_dump (fd.c:1051)
==2196==    by 0x4C41E69: inode_dump (inode.c:1614)
==2196==    by 0x4C42236: inode_table_dump (inode.c:1668)
==2196==    by 0x4C61456: gf_proc_dump_xlator_info (statedump.c:408)
==2196==    by 0x4C61EBF: gf_proc_dump_info (statedump.c:668)
==2196==    by 0x407972: glusterfs_sigwaiter (glusterfsd.c:1390)
==2196==    by 0x3A89A077F0: start_thread (in /lib64/libpthread-2.12.so)
==2196==    by 0x736B6FF: ???
==2196==  Address 0xda60c60 is 8 bytes after a block of size 296 alloc'd
==2196==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==2196==    by 0x4C5A7A8: __gf_calloc (mem-pool.c:150)
==2196==    by 0x4C56393: __fd_create (fd.c:603)
==2196==    by 0x4C5647E: fd_create (fd.c:634)
==2196==    by 0x6753AF8: fuse_create_resume (fuse-bridge.c:1766)
==2196==    by 0x6749153: fuse_resolve_done (fuse-resolve.c:467)
==2196==    by 0x6749229: fuse_resolve_all (fuse-resolve.c:496)
==2196==    by 0x674911C: fuse_resolve (fuse-resolve.c:453)
==2196==    by 0x6749200: fuse_resolve_all (fuse-resolve.c:492)
==2196==    by 0x67492A3: fuse_resolve_continue (fuse-resolve.c:512)
==2196==    by 0x6748C9E: fuse_resolve_parent (fuse-resolve.c:282)
==2196==    by 0x67490EC: fuse_resolve (fuse-resolve.c:446)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa40

How reproducible:
------------------
often

Steps to Reproduce:
----------------------
1. Create a distribute-replicate volume (2 x 3; available space: 200GB).
2. Create two gluster mounts and one NFS mount.
3. Start dd on one of the gluster mounts and on the NFS mount.
4. Start ping_pong on a file on the other gluster mount.
5. Bounce bricks: 2 bricks from each replicate pair.
6. Enable quota and set the quota limit-usage to 150GB on the volume.
7. Run add-brick on the volume.
8. Start rebalance.
9. Stop rebalance.
10. Bounce bricks: one brick from each replicate pair.

Repeat steps 8 to 10 two or three times.

glusterfs process crashed
  
Actual results:
-----------------
/root/create_dir_files.sh: line 20: cd: /mnt/gfsc1/fuse1.5: Transport endpoint is not connected
mkdir: cannot create directory `dir.5': Transport endpoint is not connected
/root/create_dir_files.sh: line 14: cd: dir.5: Transport endpoint is not connected
dd: opening `file.1': Transport endpoint is not connected
dd: opening `file.2': Transport endpoint is not connected
dd: opening `file.3': Transport endpoint is not connected
dd: opening `file.4': Transport endpoint is not connected
dd: opening `file.5': Transport endpoint is not connected

Expected results:
-----------------
glusterfs process should not crash

Additional info:
------------------

[05/10/12 - 22:43:12 root@QA-19 scripts]# gluster v i

Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5

After Add-brick operation:-
---------------------------

[05/10/12 - 23:51:36 root@QA-19 scripts]# gluster volume info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5
Brick7: 172.17.251.58:/export/brick6
Brick8: 172.17.251.59:/export/brick7
Brick9: 172.17.251.59:/export/brick8
Options Reconfigured:
features.limit-usage: /:150GB
features.quota: on

Comment 1 Shwetha Panduranga 2012-05-11 11:37:18 UTC
Created attachment 583811
Mount log file

Comment 2 Shwetha Panduranga 2012-05-11 11:37:59 UTC
Created attachment 583812
valgrind logs

Comment 3 Shwetha Panduranga 2012-05-11 11:38:24 UTC
Created attachment 583813
Backtrace of core

Comment 4 Anand Avati 2012-05-17 10:49:05 UTC
CHANGE: http://review.gluster.com/3335 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in master by Vijay Bellur (vijay)

Comment 5 Anand Avati 2012-05-18 13:02:35 UTC
CHANGE: http://review.gluster.com/3369 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in release-3.3 by Vijay Bellur (vijay)
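
Note on the change above: the patch title indicates that fd_ctx_dump now bounds its loop by fd->xl_count. The sketch below is a minimal, self-contained C illustration of that idea, not the merged patch. Every name prefixed with demo_ is a hypothetical stand-in for the real libglusterfs types, and the assumption that the pre-fix loop used a count taken from somewhere other than the fd (for example, the xlator count of a newer graph) is inferred from the patch title and the valgrind trace rather than stated in this report.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a per-xlator fd context slot. */
struct demo_fd_ctx {
    uint64_t key;   /* nonzero when the slot is in use */
    uint64_t value;
};

/* Hypothetical, simplified stand-in for an fd object. */
struct demo_fd {
    int xl_count;             /* number of slots allocated at creation time */
    struct demo_fd_ctx *ctx;  /* array of exactly xl_count slots */
};

/* Allocate an fd whose context array has room for xl_count slots. */
static struct demo_fd *demo_fd_create(int xl_count)
{
    struct demo_fd *fd = calloc(1, sizeof(*fd));
    if (fd == NULL)
        return NULL;

    fd->ctx = calloc((size_t)xl_count, sizeof(*fd->ctx));
    if (fd->ctx == NULL) {
        free(fd);
        return NULL;
    }
    fd->xl_count = xl_count;
    return fd;
}

/*
 * Print every in-use slot and return how many were found.
 * The loop is bounded by the fd's own xl_count, so it can never read
 * past the array that was allocated for this particular fd, even if
 * the process-wide xlator count has grown since the fd was created.
 */
static int demo_fd_ctx_dump(const struct demo_fd *fd)
{
    int used = 0;

    assert(fd != NULL);
    for (int i = 0; i < fd->xl_count; i++) {
        if (fd->ctx[i].key != 0) {
            printf("slot %d: key=%" PRIu64 " value=%" PRIu64 "\n",
                   i, fd->ctx[i].key, fd->ctx[i].value);
            used++;
        }
    }
    return used;
}

/* Release the fd and its context array. */
static void demo_fd_destroy(struct demo_fd *fd)
{
    if (fd != NULL) {
        free(fd->ctx);
        free(fd);
    }
}
```

Under these assumptions, an fd created with 16 slots is always dumped as 16 slots, regardless of how many xlators a later graph contains.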

Comment 6 Shwetha Panduranga 2012-05-21 11:10:06 UTC
Bug fixed. Verified on 3.3.0qa42.

Steps to verify:-
----------------
1. Create a distribute-replicate volume (2 x 3).

2. Create a fuse mount.

3. Run "open-fd-test <filename>" on the fuse mount. (Source is available in glusterfs/extras/test/open-fd-tests.c. open-fd-test waits for user input, so enter a string; the string is written to <filename>.)

4. Perform a graph change.

5. Take a statedump of the mount process.

If there is a crash, then the bug still exists.

