Bug 767948

Summary: [7235e5b1af090ffc9d87ac59daadf7926433b495] fsx segfaults for large number of operations on fusemount
Product: [Community] GlusterFS Reporter: Rahul C S <rahulcs>
Component: io-cacheAssignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainlineCC: gluster-bugs, kaushal, rfortier
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:38:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: 1f3a0dd4742a2fcd3215aee4a5e22125d7ea4f4d Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description Rahul C S 2011-12-15 10:53:43 UTC
Description of problem:
running fsx for very less operations does not cause and error & it passes.
Only when running fsx for number of operations greater than 50000 induces the segfault. 

if option N is not present then it runs for infinity number of operations. I checked on the backend ext4 fs and found that segfault was not happening.[

root@Dagobah:~/newmount# /opt/qa/tools/fsx -R -W file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
skipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
short read: 0x0 bytes instead of 0x896f
LOG DUMP (314546 total operations):
314547(179 mod 256): READ	0x69a7 thru 0x9721	(0x2d7b bytes)
314548(180 mod 256): WRITE	0x1cf2d thru 0x26681	(0x9755 bytes)
Segmentation fault

fsx logged this message in dmesg for the segfault:
[65852.012707] fsx[20769]: segfault at b ip 00007f66cbe2c436 sp 00007fff6ad5ef20 error 4 in libc-2.13.so[7f66cbde5000+18a000]


2nd Trial:
root@Dagobah:~/newmount# /opt/qa/tools/fsx -R -W -N 500000 file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
qskipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x3fffe
short read: 0x0 bytes instead of 0x9ca3
LOG DUMP (389625 total operations):
389626(250 mod 256): READ	0x8913 thru 0xf6a2	(0x6d90 bytes)
389627(251 mod 256): READ	0x1e547 thru 0x225e9	(0x40a3 bytes)
389628(252 mod 256): READ	0x12dbd thru 0x20e29	(0xe06d bytes)
389629(253 mod 256): WRITE	0x20fdc thru 0x2d9d5	(0xc9fa bytes)
389630(254 mod 256): WRITE	0x35693 thru 0x3d3f5	(0x7d63 bytes)
389631(255 mod 256): READ	0x3587c thru 0x3d8d6	(0x805b bytes)
Segmentation fault

dmesg entry:
[67820.836352] fsx[20915]: segfault at b ip 00007f9f8e186436 sp 00007fff3a196c70 error 4 in libc-2.13.so[7f9f8e13f000+18a000]

Nothing in our log files. 

Version-Release number of selected component (if applicable):


How reproducible:
Everytime.

Steps to Reproduce:
1. create a volume with 1 brick
2. run "/opt/qa/tools/fsx -R -W -N 500000 file" on the mount point.

  
Actual results:
fsx segfaults showing number of operations done on the file.

Expected results:
fsx should complete 500000 ops and say "All operations complete A-OK!"

Additional info:

Comment 1 Kaushal 2012-02-29 07:53:04 UTC
The segmentation fault here is a problem of fsx not gluster. The segfault occurs while printing the log to the logfile. But this segfault is hit because one of the reads done by fsx, returns with a size smaller than what was requested.

Testing this further with a small program which tries to read the the same data that fsx tried and failed. When I run this program on the file on the mount point, read returns a size lesser than the size requested, as in fsx. But running the program on the file on the brick, read returns the correct size.

Comment 2 Kaushal 2012-03-01 07:13:22 UTC
Some more observations after further testing. 
The short read in fsx seems to happen at holes in the file, mostly those holes which are at the end of the file.
This problem goes away when disabling the io-cache translator.
This behavior is similar to https://bugzilla.redhat.com/show_bug.cgi?id=782286

Comment 3 Kaushal 2012-03-01 10:06:16 UTC
Put in an incorrect bug link above. The correct bug is https://bugzilla.redhat.com/show_bug.cgi?id=788962

Comment 4 Anand Avati 2012-03-20 16:49:05 UTC
CHANGE: http://review.gluster.com/2981 (performance/io-cache: destroy the pages which survived in __ioc_page_destroy because of non NULL waitq in ioc_fault_cbk.) merged in master by Vijay Bellur (vijay)

Comment 5 Anand Avati 2012-03-20 17:42:48 UTC
CHANGE: http://review.gluster.com/2982 (performance/read-ahead: destroy pages which survived from being flushed in flush_region because of frames waiting on them, after those frames are served.) merged in master by Vijay Bellur (vijay)

Comment 6 Rahul C S 2012-04-05 09:38:08 UTC
root@Dagobah:/data/mounts/fuse# /opt/qa/tools/fsx -R -W -N 500000 file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
skipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x3fffe
skipping zero size read
All operations completed A-OK!

Works as expected with latest git head.