Bug 379021

Summary: GFS2: xdoio with jdata and directio fails
Product: Red Hat Enterprise Linux 5
Reporter: Robert Peterson <rpeterso>
Component: kernel
Assignee: Steve Whitehouse <swhiteho>
Status: CLOSED NOTABUG
QA Contact: GFS Bugs <gfs-bugs>
Severity: low
Priority: low
Version: 5.1
CC: cluster-maint, djansa, rpeterso
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-11-13 08:37:35 UTC

Description Robert Peterson 2007-11-12 23:03:11 UTC
Description of problem:
I reran a single-node test based on something I picked up from Dean
and the test failed.

Version-Release number of selected component (if applicable):
RHEL5.2 beta

How reproducible:
Always

Steps to Reproduce:

service cman start
service clvmd start
mkfs.gfs2 -O -t bob_cluster2:test_gfs -p lock_dlm -j 3 /dev/trin_vg/hell
mount -tgfs2 /dev/trin_vg/hell /mnt/hell
chattr +j /mnt/hell
PATH=$PATH:/usr/tests/sts-rhel5.1/bin/
cd /mnt/hell
xiogen -i 60s -f direct -m sequential -s read -t 1b -T1000b -F 1000b:single_reader_direct_sequential | xdoio -p fill
  
Actual results:
Can not open single_reader_direct_sequential: Invalid argument
(and the test fails)

Expected results:
No error; test should succeed.

Additional info:
This is very similar to (or even the same as) bug #254073.
At the time, we attributed the problem to a bad patch we did for 008,
reverted the patch and it seemed to work again.  So the bug was closed
as NOTABUG.

I did an strace -f of the failure, and saw this:

...The file in question is opened, and 512000 bytes written to it...
[pid 14997] write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 512) = 512
[pid 14997] write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 512) = 512
[pid 14997] write(3, "", 0)             = 0
[pid 14997] close(3)                    = 0
[pid 14997] brk(0x1ed58000)             = 0x1ed58000
[pid 14997] brk(0x1ed47000)             = 0x1ed47000
[pid 14997] brk(0x1ed33000)             = 0x1ed33000
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd2000, FUTEX_WAKE, 1) = 1
[pid 14997] futex(0x2aaaaacd0000, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 14998] <... futex resumed> )       = 0
[pid 14998] open("single_reader_direct_sequential", O_RDONLY|O_DIRECT) = -1 EINVAL (Invalid argument)
[pid 14998] write(2, "Can not open single_reader_direc"..., 63Can not open single_reader_direct_sequential: Invalid argument
) = 63
[pid 14998] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14998] futex(0x2aaaaacd0000, FUTEX_WAKE, 1 <unfinished ...>
[pid 14997] <... futex resumed> )       = 0
[pid 14998] <... futex resumed> )       = 1
[pid 14997] brk(0x1ed5b000)             = 0x1ed5b000
[pid 14997] brk(0x1ed47000)             = 0x1ed47000
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd2000, FUTEX_WAKE, 1) = 0
...
It's that second open that seems to be the problem.
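
For anyone who wants to check this without the xiogen/xdoio harness, here is a minimal sketch of the failing call taken from the strace above. The helper name try_direct_open is made up for illustration; it is not part of xdoio. It only repeats the open() with the same flags and reports the errno.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from xdoio): try to open `path` read-only
 * with O_DIRECT, as the second open in the strace does.
 * Returns 0 on success, otherwise the errno value from open(2).
 */
int try_direct_open(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0) {
        int err = errno;   /* save before stdio can change it */
        fprintf(stderr, "Can not open %s: %s\n", path, strerror(err));
        return err;
    }
    close(fd);
    return 0;
}
```

Called as try_direct_open("single_reader_direct_sequential") from /mnt/hell after the chattr +j, this would be expected to return EINVAL if it hits the same failure as the test.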

Comment 1 Robert Peterson 2007-11-12 23:04:47 UTC
I should mention that this same test works just fine if I don't
do the chattr to turn on the jdata attribute.


Comment 2 Robert Peterson 2007-11-12 23:38:11 UTC
This problem still occurs in the -4 version, but does NOT occur in the
SRPM version of gfs2 in the -54 kernel.


Comment 3 Steve Whitehouse 2007-11-13 08:37:35 UTC
jdata and directio is a meaningless combination. It should always have been
disallowed; even though it appeared to work before, it wasn't actually doing
jdata at all. We had previously agreed that we'd disable this combination,
hence the new behaviour.


Comment 4 Wendy Cheng 2007-11-13 14:43:57 UTC
But the puzzle here is that none of the GFS2 code paths explicitly disables
this combination, does it? Why would the open fail?

Comment 5 Steve Whitehouse 2007-11-13 14:55:07 UTC
The error is returned due to the (intentional) NULL pointer in the directIO slot
of the address space operations for jdata files.
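
To illustrate the mechanism, here is a simplified userspace model of that check. It is a sketch only, not kernel or GFS2 source: the struct, the two operation tables, and model_open are hypothetical names, and the assumption is that an O_DIRECT open is refused with EINVAL whenever the file's direct IO slot is empty.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical stand-in for the address space operations table. */
struct model_aops {
    int (*direct_IO)(void);
};

/* Placeholder direct IO routine for files that support it. */
int model_direct_io(void)
{
    return 0;
}

/* Ordinary file: slot filled. jdata file: slot intentionally NULL. */
const struct model_aops model_ordinary_aops = { .direct_IO = model_direct_io };
const struct model_aops model_jdata_aops    = { .direct_IO = NULL };

/*
 * Model of the open-time check (assumed behaviour): an O_DIRECT open
 * fails with -EINVAL when there is no direct IO operation to call.
 * Opens without O_DIRECT are unaffected.
 */
int model_open(const struct model_aops *aops, int flags)
{
    if ((flags & O_DIRECT) && (aops == NULL || aops->direct_IO == NULL))
        return -EINVAL;
    return 0;
}
```

In this model, model_open(&model_jdata_aops, O_RDONLY | O_DIRECT) returns -EINVAL while the same call on model_ordinary_aops returns 0, which matches the observation that the test passes without the jdata attribute.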