Bug 961993

Summary: POSIX test suite failure.
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: high
Priority: medium
Reporter: Ben Turner <bturner>
Assignee: Ravishankar N <ravishankar>
QA Contact: Ben Turner <bturner>
CC: nsathyan, rhs-bugs, rwheeler, vagarwal, vbellur
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-12-03 17:22:17 UTC

Description Ben Turner 2013-05-10 22:00:41 UTC
Description of problem:

I started running a new POSIX test suite developed by Red Hat (it's called doio) as part of the FS sanity overnight runs, and I am seeing failures on the 2.1 bits.  Here is the output:

Iterations:            Infinite
Profile Dir:           None
History Queue Dir:     /tmp
History Queue Depth    100
Inter-operation delay: None:0
Prealloc type:         Sparse
Lock regions:          None
Verify Syscall:        read
Message Interval:      1000
Release Interval:      0
IOV count:             3
Processes:             8
Dump XIORS:            No
Ignore Errors:         No
---------------------------------------------------------------
xiogen starting up with the following:

Iterations:      500000
Seed:            27317
Offset-mode:     sequential
Overlap Flag:    off
Skip Creats:     off
Zero Length File:off
Mintrans:        512
Maxtrans:        4096
Requests:        read,write
Syscalls:        read,readv,write,writev
IO Type:         buffered

Test Devices:

Path                                                      Size
                                                        (bytes)
---------------------------------------------------------------
doio_2                                                 51200000
*** xdoio(pid: 27323) DATA COMPARISON ERROR doio_2 ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 7954
    1st 32 expected bytes:  I:27323:write*I:27323:write*I:27
    1st 32 actual bytes:    ................................

*** xdoio(pid: 27320) DATA COMPARISON ERROR doio_2 ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 3204
    1st 32 expected bytes:  I:27320:write*I:27320:write*I:27
    1st 32 actual bytes:    ................................

*** xdoio(pid: 27319) DATA COMPARISON ERROR doio_2 ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 0
    1st 32 expected bytes:  I:27319:write*I:27319:write*I:27
    1st 32 actual bytes:    ................................

Pid    xiors    syscall    wall       write    read     lock_wait
                seconds    seconds    MB/s     MB/s     seconds 
---    -----    -------    -------    -----    ----     ---------
27319  1        0.0032     1.7847     0.002    0.002    0.000   
27320  1        0.0020     1.7783     0.000    0.000    0.000   
27321  2        1.7722     1.7779     0.000    0.003    0.000   
27322  2        0.0059     1.7867     0.001    0.003    0.000   
27323  1        1.7724     1.7760     0.001    0.001    0.000   
27324  1        0.0050     1.7809     0.000    0.001    0.000   
27325  1        0.0024     1.7866     0.000    0.000    0.000   
27326  3        1.7784     1.7836     0.002    0.005    0.000   
===    =====    =======    =======    =====    ====     =========
       12       5.3414     0.0000     inf      inf      0.000   
Dumping XIOR Queue to /tmp/xdoio.27318.XIORQ

This same test runs fine on the Anshi GA bits.
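
For reference on what the errors above mean: each doio writer tags the bytes it writes with its own PID ("I:<pid>:write*") and then reads them back to verify them (Verify Syscall: read, per the settings above). The failures show the reads returning NUL bytes where that tag was expected. A minimal shell analogue of the check, not doio itself, assuming a hypothetical mount at /mnt/testvol:

pattern=$(printf 'I:%s:write*I:%s:write*' $$ $$)   # PID-tagged record, like doio's
printf '%s' "$pattern" > /mnt/testvol/doio_check   # write it out
readback=$(cat /mnt/testvol/doio_check)            # read it back to verify
if [ "$readback" != "$pattern" ]; then
    echo "DATA COMPARISON ERROR: expected '$pattern', got '$readback'"
fi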

Version-Release number of selected component (if applicable):

glusterfs-3.4.0.5rhs-1.el6rhs.x86_64

How reproducible:

Every run.

Steps to Reproduce:
1.  Create 6x2 volume.
2.  Mount via glusterfs.
3.  Run doio (a sketch of these commands follows below).
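
Something along these lines, where server1/server2, the /bricks paths, the volume name testvol, and the mount point /mnt/testvol are all hypothetical placeholders:

# 6x2 = 6 distribute subvolumes, each a 2-way replica pair (12 bricks total)
gluster volume create testvol replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b3 \
    server1:/bricks/b4 server2:/bricks/b4 \
    server1:/bricks/b5 server2:/bricks/b5 \
    server1:/bricks/b6 server2:/bricks/b6
gluster volume start testvol

# mount via the glusterfs native client
mount -t glusterfs server1:/testvol /mnt/testvol

# run the test from the mount; the exact flags are whatever the FS sanity
# harness passes (the xiogen/xdoio settings are dumped in the output above)
cd /mnt/testvol && doio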
  
Actual results:

Failure.

Expected results:

Test pass.

Additional info:

I'll update the BZ with more details on what doio is doing at the time of the failure as I find them.

Comment 3 Amar Tumballi 2013-05-16 09:32:26 UTC
Ben, we found a memory corruption issue after the 3.4.0.6rhs build. When you get a chance, can you please try re-running it with the new ISO (so it has the 3.4.0.8rhs bits)?

Comment 9 Ben Turner 2013-10-08 19:06:20 UTC
This is now working with cifs as of:

glusterfs-3.4.0.34rhs-1.el6rhs.x86_64

glusterfs tests are queued.

Comment 10 Ben Turner 2013-10-11 19:12:26 UTC
I spoke too soon: doio is working on cifs and glusterfs mounts _only_ with pure distribute volumes. On pure replicate and distribute-replicate volumes the doio writer processes hang in D state. It's not failing immediately like before; now it just hangs.
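
For anyone triaging the hang, a quick way to find the stuck writers and see where they are blocked in the kernel (hypothetical diagnostic commands, run as root on the client):

# list processes in uninterruptible sleep and the kernel function they wait in
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# for a given stuck PID, dump its kernel stack
cat /proc/<PID>/stack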

Comment 11 Vivek Agarwal 2014-02-20 08:36:23 UTC
Adding the 3.0 flag and removing 2.1.z.

Comment 14 Vivek Agarwal 2015-12-03 17:22:17 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.