Description of problem:
glusterfs process crashed due to invalid reads.

Valgrind log:
-------------
==2196== Invalid read of size 8
==2196==    at 0x4C571D1: fd_ctx_dump (fd.c:1051)
==2196==    by 0x4C41E69: inode_dump (inode.c:1614)
==2196==    by 0x4C42236: inode_table_dump (inode.c:1668)
==2196==    by 0x4C61456: gf_proc_dump_xlator_info (statedump.c:408)
==2196==    by 0x4C61EBF: gf_proc_dump_info (statedump.c:668)
==2196==    by 0x407972: glusterfs_sigwaiter (glusterfsd.c:1390)
==2196==    by 0x3A89A077F0: start_thread (in /lib64/libpthread-2.12.so)
==2196==    by 0x736B6FF: ???
==2196==  Address 0xda60c60 is 8 bytes after a block of size 296 alloc'd
==2196==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==2196==    by 0x4C5A7A8: __gf_calloc (mem-pool.c:150)
==2196==    by 0x4C56393: __fd_create (fd.c:603)
==2196==    by 0x4C5647E: fd_create (fd.c:634)
==2196==    by 0x6753AF8: fuse_create_resume (fuse-bridge.c:1766)
==2196==    by 0x6749153: fuse_resolve_done (fuse-resolve.c:467)
==2196==    by 0x6749229: fuse_resolve_all (fuse-resolve.c:496)
==2196==    by 0x674911C: fuse_resolve (fuse-resolve.c:453)
==2196==    by 0x6749200: fuse_resolve_all (fuse-resolve.c:492)
==2196==    by 0x67492A3: fuse_resolve_continue (fuse-resolve.c:512)
==2196==    by 0x6748C9E: fuse_resolve_parent (fuse-resolve.c:282)
==2196==    by 0x67490EC: fuse_resolve (fuse-resolve.c:446)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa40

How reproducible:
------------------
often

Steps to Reproduce:
----------------------
1. Create a distribute-replicate volume (2 x 3; available space: 200GB).
2. Create 2 gluster mounts and one NFS mount.
3. Start dd on one of the gluster mounts and on the NFS mount.
4. Start ping_pong on a file on the other gluster mount.
5. Bounce bricks: 2 bricks from each replicate pair.
6. Enable quota and set the quota limit-usage to 150GB on the volume.
7. Add a brick to the volume.
8. Start rebalance.
9. Stop rebalance.
10. Bounce bricks: one brick from each replicate pair.

Repeat steps 8 to 10 two or three times. The glusterfs process crashed.

Actual results:
-----------------
/root/create_dir_files.sh: line 20: cd: /mnt/gfsc1/fuse1.5: Transport endpoint is not connected
mkdir: cannot create directory `dir.5': Transport endpoint is not connected
/root/create_dir_files.sh: line 14: cd: dir.5: Transport endpoint is not connected
dd: opening `file.1': Transport endpoint is not connected
dd: opening `file.2': Transport endpoint is not connected
dd: opening `file.3': Transport endpoint is not connected
dd: opening `file.4': Transport endpoint is not connected
dd: opening `file.5': Transport endpoint is not connected

Expected results:
-----------------
glusterfs process should not crash.

Additional info:
------------------
[05/10/12 - 22:43:12 root@QA-19 scripts]# gluster v i

Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5

After add-brick operation:
---------------------------
[05/10/12 - 23:51:36 root@QA-19 scripts]# gluster volume info

Volume Name: vol
Type: Distributed-Replicate
Volume ID: 44a636c0-c661-45ca-a959-557b56664c98
Status: Started
Number of Bricks: 3 x 3 = 9
Transport-type: tcp
Bricks:
Brick1: 172.17.251.58:/export/brick0
Brick2: 172.17.251.59:/export/brick1
Brick3: 172.17.251.60:/export/brick2
Brick4: 172.17.251.58:/export/brick3
Brick5: 172.17.251.59:/export/brick4
Brick6: 172.17.251.60:/export/brick5
Brick7: 172.17.251.58:/export/brick6
Brick8: 172.17.251.59:/export/brick7
Brick9: 172.17.251.59:/export/brick8
Options Reconfigured:
features.limit-usage: /:150GB
features.quota: on
Created attachment 583811 [details] Mount log file
Created attachment 583812 [details] valgrind logs
Created attachment 583813 [details] Backtrace of core
CHANGE: http://review.gluster.com/3335 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/3369 (libglusterfs/fd: while dumping the fd_ctx use fd->xl_count) merged in release-3.3 by Vijay Bellur (vijay)
Bug fixed. Verified on 3.3.0qa42.

Steps to verify:
----------------
1. Create a distribute-replicate volume (2 x 3).
2. Create a fuse mount.
3. Run "open-fd-test <filename>" on the fuse mount (source available in glusterfs/extras/test/open-fd-tests.c). open-fd-test waits for user input, so input a string; the input string is written to <filename>.
4. Perform a graph change.
5. Take a statedump of the mount process.

If there is a crash, the bug still exists.