Bug 1218951 - Spurious failures in fop-sanity.t
Summary: Spurious failures in fop-sanity.t
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1218940
Blocks: 1163543
Reported: 2015-05-06 09:23 UTC by Nithya Balachandran
Modified: 2016-06-16 12:59 UTC

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1218940
Environment:
Last Closed: 2016-06-16 12:59:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nithya Balachandran 2015-05-06 09:23:47 UTC
+++ This bug was initially created as a clone of Bug #1218940 +++

Description of problem:

tests/basic/fops-sanity.t fails randomly

Version-Release number of selected component (if applicable):


How reproducible:
Randomly on the regression systems
However, it is consistently reproducible if tests/basic/fops-sanity.c is modified to add the following line immediately before the call to open():
fprintf(stdout, "Test string");

Steps to Reproduce:
1.
2.
3.

Actual results:
Result: PASS
tests/basic/fops-sanity.t .. 
1..11
ok 1
ok 2
No volumes present
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
volume set: success
ok 9
read failed: No data available
read returning junk
fd based file operation 1 failed
read failed: No data available
read returning junk
fstat failed : No data available
fd based file operation 2 failed
read failed: No data available
read returning junk
dup fd based file operation failed
not ok 10 
FAILED COMMAND: ./fops-sanity damn
/home/nbalachandran/Projects/gluster/code/glusterfs
ok 11
Failed 1/11 subtests

Expected results:
Test should pass

Additional info:

--- Additional comment from Nithya Balachandran on 2015-05-06 04:59:51 EDT ---

The man page for open() states that a mode must be supplied when the flags include O_CREAT. The open() calls in fops-sanity.c did not pass the mode argument, so the file was created with an arbitrary mode, which sometimes produced a 'T' (sticky bit) file. DHT then treated the file as a linkto file instead of a data file, causing the test to fail.

Fix:
Provide a mode argument to the open() calls.
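
For illustration, a minimal sketch of an open() call with the mode supplied. The helper name open_for_create() is hypothetical and is not taken from fops-sanity.c; the 0666 value is chosen only because it matches the working strace in this report.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the actual fops-sanity.c code. */
static int open_for_create(const char *path)
{
    /* O_CREAT is set, so the third (mode) argument must be passed.
     * The kernel still applies the process umask to this value,
     * e.g. 0666 with umask 022 yields a 0644 file. */
    return open(path, O_RDWR | O_CREAT, 0666);
}
```

With a umask of 022 the created file ends up with mode 0644, which agrees with the fstat() result in the working strace.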

Comment 1 Nithya Balachandran 2015-05-06 09:25:17 UTC
Patch posted at:

http://review.gluster.org/#/c/10590/

Comment 2 Nithya Balachandran 2015-05-07 05:13:26 UTC
Additional information:


When the mode is omitted, open() appears to pass an arbitrary value for it. That value can include the T bit, in which case a DHT 'linkto' file is created. Fragments of the strace output for both the failed and the fixed test are as follows:


Working Strace output:
----------------------------------------------
arch_prctl(ARCH_SET_FS, 0x7f0fb9a8c740) = 0
mprotect(0x3062bb4000, 16384, PROT_READ) = 0
mprotect(0x604000, 4096, PROT_READ)     = 0
mprotect(0x306261f000, 4096, PROT_READ) = 0
munmap(0x7f0fb9a8f000, 116173)          = 0
open("temp-xattr-test-file_2", O_RDWR|O_CREAT, 0666) = 3    <==== correct mode 
ftruncate(3, 0)                         = 0
write(3, "This is my second string\n", 25) = 25
lseek(3, 0, SEEK_SET)                   = 0
read(3, "This is my second string\n", 25) = 25
fstat(3, {st_mode=S_IFREG|0644, st_size=25, ...}) = 0
fchmod(3, 0640)                         = 0
fchown(3, 10001, 10001)                 = 0
fsync(3)                                = 0
fsetxattr(3, "trusted.xattr-test", "working", 8, 0) = 0
fdatasync(3)                            = 0
flistxattr(3, NULL, 0)                  = 36
fgetxattr(3, "trusted.xattr-test", 0x0, 0) = 8
fremovexattr(3, "trusted.xattr-test")   = 0
close(3)                                = 0
unlink("temp-xattr-test-file_2")        = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0fb9aab000
write(1, "fd based file operation 2 passed"..., 33fd based file operation 2 passed
) = 33
exit_group(0)

Comment 3 Nithya Balachandran 2015-05-07 05:16:21 UTC
Strace without the fix. As this is random behaviour, the mode here maps to a non-linkto file so the tests pass in this case:
mprotect(0x604000, 4096, PROT_READ)     = 0
mprotect(0x306261f000, 4096, PROT_READ) = 0
munmap(0x7fb8dd4f8000, 116173)          = 0
open("temp-xattr-test-file_2", O_RDWR|O_CREAT, 03777775624535142) = 3  <=== weird mode

ftruncate(3, 0)                         = 0
write(3, "This is my second string\n", 25) = 25
lseek(3, 0, SEEK_SET)                   = 0
read(3, "This is my second string\n", 25) = 25
fstat(3, {st_mode=S_IFREG|S_ISVTX|0140, st_size=25, ...}) = 0
fchmod(3, 0640)                         = 0
fchown(3, 10001, 10001)                 = 0
fsync(3)                                = 0
fsetxattr(3, "trusted.xattr-test", "working", 8, 0) = 0
fdatasync(3)                            = 0
flistxattr(3, NULL, 0)                  = 36
fgetxattr(3, "trusted.xattr-test", 0x0, 0) = 8
fremovexattr(3, "trusted.xattr-test")   = 0
close(3)                                = 0
unlink("temp-xattr-test-file_2")        = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb8dd514000
write(1, "fd based file operation 2 passed"..., 33fd based file operation 2 passed
) = 33
exit_group(0)                           = ?

Comment 4 Nithya Balachandran 2015-05-07 05:19:43 UTC
This test can be made to fail consistently by adding an fprintf(stdout, ...) call before the open() call.
Strace without the fix and with the additional fprintf(). The mode here maps to a linkto file, so the test fails:

mprotect(0x306261f000, 4096, PROT_READ) = 0
munmap(0x7f3e795c1000, 116173)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 9), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3e795dd000
open("temp-xattr-test-file_2", O_RDWR|O_CREAT, 03014256725020) = 3
ftruncate(3, 0)                         = 0
write(3, "This is my second string\n", 25) = 25
lseek(3, 0, SEEK_SET)                   = 0
read(3, 0x7fffd6b88750, 25)             = -1 ENODATA (No data available)
write(2, "read failed: No data available\n", 31read failed: No data available
) = 31
write(2, "read returning junk\n", 20read returning junk
)   = 20
fstat(3, 0x7fffd6b887d0)                = -1 ENODATA (No data available)
write(2, "fstat failed : No data available"..., 33fstat failed : No data available
) = 33
fchmod(3, 0640)                         = 0
fchown(3, 10001, 10001)                 = 0
fsync(3)                                = 0
fsetxattr(3, "trusted.xattr-test", "working", 8, 0) = 0
fdatasync(3)                            = 0
flistxattr(3, NULL, 0)                  = 36
fgetxattr(3, "trusted.xattr-test", 0x0, 0) = 8
fremovexattr(3, "trusted.xattr-test")   = 0
close(3)                                = 0
unlink("temp-xattr-test-file_2")        = 0
write(2, "fd based file operation 2 failed"..., 33fd based file operation 2 failed
) = 33
write(1, "Testing", 7Testing)                  = 7
exit_group(-1)                          = ?
+++ exited with 255 +++

Comment 5 Nithya Balachandran 2015-05-07 05:21:55 UTC
I don't know how the strange mode value is selected, but passing in a correct mode solves the problem.
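
A likely explanation, stated here as an assumption and not confirmed in this report: open() is variadic, so when O_CREAT is set and no mode is passed, the value is read from whatever happens to occupy the third-argument slot. That would also explain why an unrelated fprintf() before the call changes the result. The sketch below uses hypothetical helper names to mask the raw values shown by strace down to the permission bits and check for the sticky bit. The exact rule DHT uses to recognise a linkto file is not shown in this report and is not modelled here.

```c
/* Hypothetical helpers for reading the raw mode values printed by strace. */

/* Octal 01000 is the value of S_ISVTX (sticky bit) on Linux. */
#define STICKY_BIT 01000u

/* Keep only the permission and special bits (the lowest 12 bits). */
static unsigned int mode_perm_bits(unsigned long long raw_mode)
{
    return (unsigned int)(raw_mode & 07777ULL);
}

/* Return non-zero when the sticky ('T') bit is set in raw_mode. */
static int mode_has_sticky(unsigned long long raw_mode)
{
    return (mode_perm_bits(raw_mode) & STICKY_BIT) != 0;
}
```

Applied to the values in the straces above, 03014256725020 masks to 05020 and 03777775624535142 masks to 05142. Both have the sticky bit set, whereas the explicit 0666 does not.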

Comment 6 Anand Avati 2015-05-07 08:54:11 UTC
COMMIT: http://review.gluster.org/10590 committed in master by Raghavendra G (rgowdapp) 
------
commit a661f7f54cef34aa39894818568a2c1b462e8cbc
Author: Nithya Balachandran <nbalacha>
Date:   Tue May 5 23:47:58 2015 +0530

    tests: Spurious failure in fop-sanity.t
    
    Modified the calls to open in fops-sanity.c to pass in
    the mode as well if flags includes O_CREAT (as per man page).
    The missing mode randomly caused T files to be created causing DHT
    to treat them as linkto files and fail the fop.
    
    Modified 2 other files where the mode was not being provided.
    
    Change-Id: I047573d43655b4957d0703f7df36238f7e729c1f
    BUG: 1218951
    Signed-off-by: Nithya Balachandran <nbalacha>
    Reviewed-on: http://review.gluster.org/10590
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

Comment 7 Raghavendra Talur 2016-03-08 20:16:46 UTC
The "tests" component is for the tests framework only.
File a bug under the tests component if you find a bug in 
1. any of the *.rc files under tests/ 
2. run-tests.sh


For everything else, the bug should be filed on
1. component which is being tested by .t file if the .t file requires fix.
2. component which is causing a valid .t file to fail in regression.

I have used my best judgement here to move the bug to the right component.
In case of ambiguity, I have placed the blame on the .t file component.

Please consider test bugs under the same backlog list that tracks other bugs in your component.

Comment 8 Niels de Vos 2016-06-16 12:59:05 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

