Bug 1387214

Summary: SMB [md-cache]: running dd from a CIFS mount to create a large file while doing multiple graph switches causes an I/O error and floods the logs
Product: Red Hat Gluster Storage
Reporter: surabhi <sbhaloth>
Component: md-cache
Assignee: Poornima G <pgurusid>
Status: CLOSED WORKSFORME
QA Contact: Vivek Das <vdas>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, pgurusid, rcyriac, rgowdapp, rhinduja, rhs-bugs, rjoseph, sbhaloth, vdas
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-20 07:47:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description surabhi 2016-10-20 11:16:44 UTC
Description of problem:
******************************
While running I/O from a CIFS client and performing graph switches multiple times, an input/output error is seen on the client, along with a flood of warning messages in the client logs.

[2016-10-19 14:10:07.870224] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a4f5cd8) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870263] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a543010) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870301] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542f54) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870339] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542e98) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870402] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542ddc) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870443] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542d20) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870481] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542c64) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870528] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542ba8) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870569] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542aec) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870607] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542a30) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870670] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a542974) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870710] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a4f5c1c) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870748] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a5428b8) with refcount(2) found during cleanup
[2016-10-19 14:10:07.870787] W [inode.c:1809:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x40d) [0x7fd4f61f15ed] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7fd4f5af85d1] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xd7) [0x7fd4f5af84d7] ) 0-gfapi: Active inode(0x7fd48a5427fc) with refcount(2) found during cleanup
:


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64

How reproducible:
With multiple graph switches

Steps to Reproduce:
1. Run I/O from a CIFS mount as explained in the description.
2. Command used: dd if=/dev/zero of=file2 bs=1G count=1024
3. Trigger graph switches multiple times while the dd is running (a rough reproducer sketch follows).
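
A minimal sketch of such a reproducer, assuming a volume named testvol mounted over CIFS at /mnt/cifs (both placeholders) and using performance.write-behind as the option being toggled; any option toggle that regenerates the client graph should do:

    # Start the large write on the CIFS mount in the background.
    dd if=/dev/zero of=/mnt/cifs/file2 bs=1G count=1024 &

    # While dd runs, toggle a volume option repeatedly; each
    # 'gluster volume set' regenerates the client graph (a graph switch).
    for i in $(seq 1 20); do
        gluster volume set testvol performance.write-behind off
        sleep 30
        gluster volume set testvol performance.write-behind on
        sleep 30
    done
    wait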

Actual results:
Input/output error on the client and a flood of warning messages in the logs.


Expected results:
There should be no input/output errors, and the logs should not be flooded with warnings.


Additional info:

Comment 2 Poornima G 2016-10-27 08:40:33 UTC
The log messages show that a fini happened; was there a disconnect?
Could you try other graph switches, e.g. toggling quick-read, or an add-brick? Is this seen when md-cache is disabled as well? (A rough sketch of these triggers follows.)
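
For reference, a rough sketch of those triggers, assuming a volume named testvol (placeholder); exact option names can be confirmed with 'gluster volume set help':

    # Toggling quick-read off/on regenerates the client graph.
    gluster volume set testvol performance.quick-read off
    gluster volume set testvol performance.quick-read on

    # An add-brick also causes a graph switch (brick path is a placeholder;
    # replicated volumes may need a new replica count as well).
    gluster volume add-brick testvol server1:/bricks/brick2/testvol

    # md-cache is loaded via the stat-prefetch option, so this should
    # drop it from the graph to test with md-cache disabled.
    gluster volume set testvol performance.stat-prefetch off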

Comment 3 surabhi 2016-11-07 12:13:35 UTC
Tried the test with graph switches by turning readdir-ahead on and off, and quick-read on and off; did not observe the issue.

Will test once again with write-behind and md-cache off and update the bug.

Comment 4 Atin Mukherjee 2016-11-08 13:08:35 UTC
Putting the needinfo back on Surabhi, as we still need the results with md-cache turned off along with write-behind.

Comment 5 surabhi 2016-11-17 06:19:40 UTC
Vivek,
Could you please test and update the results as per comment #3?

Comment 8 Vivek Das 2016-11-23 08:44:01 UTC
Summary of tests
------------------
1. With md-cache disabled
      Command used: dd
      ll /mnt/cifs (in a loop) where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

      Create deep directory using python script
      ll /mnt/cifs (in a loop)  where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

2. With md-cache enabled
      Command used: dd
      ll /mnt/cifs (in a loop)  where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

      Create deep directory using python script
      ll /mnt/cifs (in a loop)  where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

3. Distaf test cases that toggle a graph-switch option ON & OFF while the dd command is running.
      Ran this in a loop (a sketch of the harness follows).
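
A sketch of the harness these scenarios describe, with testvol, /mnt/cifs, and the loop intervals as placeholder assumptions; the mkdir line is a stand-in for the Python deep-directory script mentioned above:

    # Loop 1: repeated listings of the mount point ('ll' is ls -l).
    while true; do ls -l /mnt/cifs >/dev/null; sleep 1; done &

    # Loop 2: keep toggling write-behind to force graph switches.
    while true; do
        gluster volume set testvol performance.write-behind off
        sleep 30
        gluster volume set testvol performance.write-behind on
        sleep 30
    done &

    # Workload: the large dd ...
    dd if=/dev/zero of=/mnt/cifs/file2 bs=1G count=1024
    # ... or a deep directory tree (stand-in for the Python script).
    mkdir -p "/mnt/cifs/$(printf 'd%.0s/' $(seq 1 100))"

    # Stop the background loops when done.
    kill %1 %2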

After trying all the above test scenarios, the original I/O error was NOT seen. However, I was hitting an issue in the same scenario where the dd command hangs, for which I have reported bug #1396449. It may be that the dd hang prevents the I/O error from surfacing, so we can wait for the fix to #1396449 and then retry the above steps to reproduce the I/O error.

Comment 9 Raghavendra G 2016-11-23 08:51:25 UTC
Is it possible to attach statedump of smbd process when I/O is hung?
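
One possible way to capture that, assuming the gfapi client statedump command is available in this glusterd version (syntax worth verifying; volume name and host are placeholders):

    # Find the smbd process serving the hung connection.
    pgrep -l smbd

    # Ask glusterd to dump the gfapi client's state; the dump files
    # typically land under /var/run/gluster on the client host.
    gluster volume statedump testvol client <smb-server-hostname>:<smbd-pid>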

Comment 12 surabhi 2016-11-29 10:01:28 UTC
As per triage, we all agree that this BZ has to be fixed in rhgs-3.2.0. Providing qa_ack.

Comment 17 Vivek Das 2016-12-20 06:53:36 UTC
Versions:
--------
glusterfs-3.8.4-9.el7rhgs.x86_64
samba-client-4.4.6-3.el7rhgs.x86_64



1. With md-cache enabled
      Command used: dd
      ll /mnt/cifs (in a loop)  where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

      Command used: dd
      ll /mnt/cifs (in a loop) where /mnt/cifs is the mount point
      readdir-ahead on & off (in a loop)

      Create deep directory using python script
      ll /mnt/cifs (in a loop)  where /mnt/cifs is the mount point
      write-behind on & off (in a loop)

2. Distaf test cases that toggle a graph-switch option ON & OFF while the dd command is running.
      Ran this in a loop.

No input/output errors or floods of warning messages were encountered.