Description of problem: Running fsx for a small number of operations does not cause an error and it passes; the segfault is induced only when running fsx for more than 50000 operations. If the -N option is not given, fsx runs for an unlimited number of operations. I checked on the backend ext4 filesystem and found that the segfault was not happening there.

root@Dagobah:~/newmount# /opt/qa/tools/fsx -R -W file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
skipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
short read: 0x0 bytes instead of 0x896f
LOG DUMP (314546 total operations):
314547(179 mod 256): READ 0x69a7 thru 0x9721 (0x2d7b bytes)
314548(180 mod 256): WRITE 0x1cf2d thru 0x26681 (0x9755 bytes)
Segmentation fault

fsx logged this message in dmesg for the segfault:
[65852.012707] fsx[20769]: segfault at b ip 00007f66cbe2c436 sp 00007fff6ad5ef20 error 4 in libc-2.13.so[7f66cbde5000+18a000]

2nd trial:
root@Dagobah:~/newmount# /opt/qa/tools/fsx -R -W -N 500000 file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
skipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x3fffe
short read: 0x0 bytes instead of 0x9ca3
LOG DUMP (389625 total operations):
389626(250 mod 256): READ 0x8913 thru 0xf6a2 (0x6d90 bytes)
389627(251 mod 256): READ 0x1e547 thru 0x225e9 (0x40a3 bytes)
389628(252 mod 256): READ 0x12dbd thru 0x20e29 (0xe06d bytes)
389629(253 mod 256): WRITE 0x20fdc thru 0x2d9d5 (0xc9fa bytes)
389630(254 mod 256): WRITE 0x35693 thru 0x3d3f5 (0x7d63 bytes)
389631(255 mod 256): READ 0x3587c thru 0x3d8d6 (0x805b bytes)
Segmentation fault

dmesg entry:
[67820.836352] fsx[20915]: segfault at b ip 00007f9f8e186436 sp 00007fff3a196c70 error 4 in libc-2.13.so[7f9f8e13f000+18a000]

Nothing in our log files.

Version-Release number of selected component (if applicable):

How reproducible: Every time.

Steps to Reproduce:
1. Create a volume with 1 brick.
2. Run "/opt/qa/tools/fsx -R -W -N 500000 file" on the mount point.

Actual results: fsx segfaults, showing the number of operations done on the file.

Expected results: fsx should complete 500000 ops and say "All operations completed A-OK!"

Additional info:
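The reproduction steps above can be sketched as a shell session. This is a config fragment, not output from the report: the hostname, volume name, brick path, and mount point are placeholders for whatever your test setup uses.

```shell
# Placeholder names: server1, testvol, /bricks/brick1, /mnt/glusterfs.
gluster volume create testvol server1:/bricks/brick1
gluster volume start testvol

# Mount the volume via FUSE and run fsx on it.
mount -t glusterfs server1:/testvol /mnt/glusterfs
cd /mnt/glusterfs
/opt/qa/tools/fsx -R -W -N 500000 file
```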
The segmentation fault here is a problem in fsx, not Gluster. The segfault occurs while printing the log to the logfile, but it is hit because one of the reads done by fsx returns with a size smaller than what was requested. I tested this further with a small program that tries to read the same data that fsx tried and failed on. When I run this program on the file on the mount point, read returns a size smaller than the size requested, as in fsx. But when I run the program on the file on the brick, read returns the correct size.
Some more observations after further testing. The short read in fsx seems to happen at holes in the file, mostly holes at the end of the file. The problem goes away when the io-cache translator is disabled. This behavior is similar to https://bugzilla.redhat.com/show_bug.cgi?id=782286
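For reference, disabling the io-cache translator can be done per volume from the gluster CLI; "testvol" below is a placeholder volume name, not one from this report.

```shell
# Turn the io-cache translator off on the affected volume (workaround,
# not a fix); re-enable with "on" once a fixed build is installed.
gluster volume set testvol performance.io-cache off
```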
I put in an incorrect bug link above. The correct bug is https://bugzilla.redhat.com/show_bug.cgi?id=788962
CHANGE: http://review.gluster.com/2981 (performance/io-cache: destroy the pages which survived in __ioc_page_destroy because of non NULL waitq in ioc_fault_cbk.) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/2982 (performance/read-ahead: destroy pages which survived from being flushed in flush_region because of frames waiting on them, after those frames are served.) merged in master by Vijay Bellur (vijay)
root@Dagobah:/data/mounts/fuse# /opt/qa/tools/fsx -R -W -N 500000 file
mapped writes DISABLED
truncating to largest ever: 0x32740
truncating to largest ever: 0x39212
truncating to largest ever: 0x3bae9
truncating to largest ever: 0x3c1e3
truncating to largest ever: 0x3d1cd
truncating to largest ever: 0x3e8b8
truncating to largest ever: 0x3ed14
truncating to largest ever: 0x3f9c2
truncating to largest ever: 0x3ff9f
truncating to largest ever: 0x3ffa9
truncating to largest ever: 0x3ffb0
truncating to largest ever: 0x3ffce
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fff1
skipping zero size read
truncating to largest ever: 0x3fff4
skipping zero size read
skipping zero size write
skipping zero size write
skipping zero size read
skipping zero size read
skipping zero size read
skipping zero size read
truncating to largest ever: 0x3fffe
skipping zero size read
All operations completed A-OK!

Works as expected with latest git head.