Bug 820887 - glusterfs process crashed
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Raghavendra Bhat
Depends On:
Blocks: 817967
Reported: 2012-05-11 06:08 EDT by Shwetha Panduranga
Modified: 2015-12-01 11:45 EST

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:11:09 EDT
Type: Bug


Attachments
Mount log file (92.29 KB, text/x-log)
2012-05-11 07:37 EDT, Shwetha Panduranga
valgrind logs (32.00 KB, application/octet-stream)
2012-05-11 07:37 EDT, Shwetha Panduranga
Backtrace of core (14.46 KB, application/octet-stream)
2012-05-11 07:38 EDT, Shwetha Panduranga

Description Shwetha Panduranga 2012-05-11 06:08:15 EDT
Description of problem:
The glusterfs process crashed due to invalid reads in fd_ctx_dump while a statedump was being taken.

Valgrind log:-
-------------
==2196== Invalid read of size 8
==2196==    at 0x4C571D1: fd_ctx_dump (fd.c:1051)
==2196==    by 0x4C41E69: inode_dump (inode.c:1614)
==2196==    by 0x4C42236: inode_table_dump (inode.c:1668)
==2196==    by 0x4C61456: gf_proc_dump_xlator_info (statedump.c:408)
==2196==    by 0x4C61EBF: gf_proc_dump_info (statedump.c:668)
==2196==    by 0x407972: glusterfs_sigwaiter (glusterfsd.c:1390)
==2196==    by 0x3A89A077F0: start_thread (in /lib64/libpthread-2.12.so)
==2196==    by 0x736B6FF: ???
==2196==  Address 0xda60c60 is 8 bytes after a block of size 296 alloc'd
==2196==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==2196==    by 0x4C5A7A8: __gf_calloc (mem-pool.c:150)
==2196==    by 0x4C56393: __fd_create (fd.c:603)
==2196==    by 0x4C5647E: fd_create (fd.c:634)
==2196==    by 0x6753AF8: fuse_create_resume (fuse-bridge.c:1766)
==2196==    by 0x6749153: fuse_resolve_done (fuse-resolve.c:467)
==2196==    by 0x6749229: fuse_resolve_all (fuse-resolve.c:496)
==2196==    by 0x674911C: fuse_resolve (fuse-resolve.c:453)
==2196==    by 0x6749200: fuse_resolve_all (fuse-resolve.c:492)
==2196==    by 0x67492A3: fuse_resolve_continue (fuse-resolve.c:512)
==2196==    by 0x6748C9E: fuse_resolve_parent (fuse-resolve.c:282)
==2196==    by 0x67490EC: fuse_resolve (fuse-resolve.c:446)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa40

How reproducible:
------------------
often

Steps to Reproduce:
----------------------
1. Create a distribute-replicate volume (2 x 3; available space: 200GB).
2. Create 2 gluster mounts and one NFS mount.
3. Start dd on one of the gluster mounts and on the NFS mount.
4. Start ping_pong on a file on the other gluster mount.
5. Bounce bricks: 2 bricks from each replicate pair.
6. Enable quota and set limit-usage to 150GB on the volume.
7. Run add-brick on the volume.
8. Start rebalance.
9. Stop rebalance.
10. Bounce bricks: one brick from each replicate pair.

Repeat steps 8 to 10 two to three times.

The glusterfs process crashes.
  
Actual results:
-----------------
/root/create_dir_files.sh: line 20: cd: /mnt/gfsc1/fuse1.5: Transport endpoint is not connected
mkdir: cannot create directory `dir.5': Transport endpoint is not connected
/root/create_dir_files.sh: line 14: cd: dir.5: Transport endpoint is not connected
dd: opening `file.1': Transport endpoint is not connected
dd: opening `file.2': Transport endpoint is not connected
dd: opening `file.3': Transport endpoint is not connected
dd: opening `file.4': Transport endpoint is not connected
dd: opening `file.5': Transport endpoint is not connected

Expected results:
-----------------
glusterfs process should not crash

Additional info:
------------------

[05/10/12 - 22:43:12 root@QA-19 scripts]# gluster v i

Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5

After Add-brick operation:-
---------------------------

[05/10/12 - 23:51:36 root@QA-19 scripts]# gluster volume info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5
Brick7: 172.17.251.58:/export/brick6
Brick8: 172.17.251.59:/export/brick7
Brick9: 172.17.251.59:/export/brick8
Options Reconfigured:
features.limit-usage: /:150GB
features.quota: on
Comment 1 Shwetha Panduranga 2012-05-11 07:37:18 EDT
Created attachment 583811 [details]
Mount log file
Comment 2 Shwetha Panduranga 2012-05-11 07:37:59 EDT
Created attachment 583812 [details]
valgrind logs
Comment 3 Shwetha Panduranga 2012-05-11 07:38:24 EDT
Created attachment 583813 [details]
Backtrace of core
Comment 4 Anand Avati 2012-05-17 06:49:05 EDT
CHANGE: http://review.gluster.com/3335 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in master by Vijay Bellur (vijay@gluster.com)
Comment 5 Anand Avati 2012-05-18 09:02:35 EDT
CHANGE: http://review.gluster.com/3369 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in release-3.3 by Vijay Bellur (vijay@gluster.com)
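
The commit title above, read together with the Valgrind log in the description, suggests that the dump loop walked past the end of the per-fd context array, and that the fix bounds the loop by a count kept in the fd itself. A minimal sketch of that idea follows. It is a stand-alone model and not the libglusterfs code: struct fd_ctx_slot, struct model_fd and the model_fd_* functions are hypothetical names, and the assumption that the old loop bound came from somewhere other than the fd (for example a newer graph with more translators) is an inference, not something this report states.

```c
/* Illustrative model only; these are NOT the real libglusterfs types. */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one per-translator context slot. */
struct fd_ctx_slot {
    uint64_t key;
    uint64_t value;
};

/* Hypothetical stand-in for an fd that remembers its own slot count. */
struct model_fd {
    int xl_count;               /* number of slots allocated at creation */
    struct fd_ctx_slot *ctx;
};

/* Allocate the context array for the translator count known at creation. */
static struct model_fd *model_fd_create(int graph_xl_count)
{
    struct model_fd *fd;

    if (graph_xl_count <= 0)
        return NULL;

    fd = calloc(1, sizeof(*fd));
    if (fd == NULL)
        return NULL;

    fd->ctx = calloc((size_t)graph_xl_count, sizeof(*fd->ctx));
    if (fd->ctx == NULL) {
        free(fd);
        return NULL;
    }
    fd->xl_count = graph_xl_count;
    return fd;
}

static void model_fd_destroy(struct model_fd *fd)
{
    if (fd == NULL)
        return;
    free(fd->ctx);
    free(fd);
}

/*
 * Dump the context slots. The loop bound is fd->xl_count, the size the
 * array was allocated with, so it stays in bounds even if the active
 * graph has grown since the fd was created. Returns the slots visited.
 */
static int model_fd_ctx_dump(const struct model_fd *fd, FILE *out)
{
    int visited = 0;

    assert(fd != NULL && fd->ctx != NULL && out != NULL);

    for (int i = 0; i < fd->xl_count; i++) {
        if (fd->ctx[i].key != 0)
            fprintf(out, "slot %d: key=%" PRIu64 " value=%" PRIu64 "\n",
                    i, fd->ctx[i].key, fd->ctx[i].value);
        visited++;
    }
    return visited;
}
```

In this model, an fd created with 16 slots is always dumped with exactly 16 iterations, regardless of how many translators a later graph contains.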
Comment 6 Shwetha Panduranga 2012-05-21 07:10:06 EDT
Bug fixed. Verified on 3.3.0qa42.

Steps to verify:-
----------------
1. Create a distribute-replicate volume (2 x 3).

2. Create a fuse mount.

3. Run "open-fd-test <filename>" on the fuse mount. (Source is available in glusterfs/extras/test/open-fd-tests.c. open-fd-test waits for user input, so enter a string; the string is written to <filename>.)

4. Perform a graph change.

5. Take a statedump of the mount process.

If the process crashes, the bug still exists.
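
The backtrace in the description shows the dump running from glusterfs_sigwaiter, so the statedump in step 5 can be requested by signalling the mount process. A minimal sketch, assuming SIGUSR1 is the statedump signal for the build under test; request_statedump is a hypothetical helper, not part of glusterfs, and the PID of the mount process has to be looked up separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the process with the given PID to take a
 * statedump by sending it SIGUSR1 (assumed to be the statedump signal).
 * Returns 0 if the signal was delivered, -1 otherwise.
 */
static int request_statedump(pid_t pid)
{
    /* Refuse 0 and negative values: kill() treats them as process groups. */
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }

    if (kill(pid, SIGUSR1) != 0) {
        perror("kill");
        return -1;
    }
    return 0;
}
```

A return value of 0 only means the signal was sent. Whether the mount process survived the dump still has to be checked afterwards, for example by accessing the mount point.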
