Description of problem:
I reran a single-node test based on something I picked up from Dean and the test failed.

Version-Release number of selected component (if applicable):
RHEL5.2 beta

How reproducible:
Always

Steps to Reproduce:
service cman start
service clvmd start
mkfs.gfs2 -O -t bob_cluster2:test_gfs -p lock_dlm -j 3 /dev/trin_vg/hell
mount -tgfs2 /dev/trin_vg/hell /mnt/hell
chattr +j /mnt/hell
PATH=$PATH:/usr/tests/sts-rhel5.1/bin/
cd /mnt/hell
xiogen -i 60s -f direct -m sequential -s read -t 1b -T1000b -F 1000b:single_reader_direct_sequential | xdoio -p fill

Actual results:
Can not open single_reader_direct_sequential: Invalid argument
(and the test fails)

Expected results:
No error; test should succeed.

Additional info:
This is very similar to (or even the same as) bug #254073. At the time, we attributed the problem to a bad patch we did for 008, reverted the patch and it seemed to work again. So the bug was closed as NOTABUG.

I did an strace -f of the failure, and saw this:

...The file in question is opened, and 512000 bytes written to it...
[pid 14997] write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 512) = 512
[pid 14997] write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 512) = 512
[pid 14997] write(3, "", 0) = 0
[pid 14997] close(3) = 0
[pid 14997] brk(0x1ed58000) = 0x1ed58000
[pid 14997] brk(0x1ed47000) = 0x1ed47000
[pid 14997] brk(0x1ed33000) = 0x1ed33000
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd2000, FUTEX_WAKE, 1) = 1
[pid 14997] futex(0x2aaaaacd0000, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 14998] <... futex resumed> ) = 0
[pid 14998] open("single_reader_direct_sequential", O_RDONLY|O_DIRECT) = -1 EINVAL (Invalid argument)
[pid 14998] write(2, "Can not open single_reader_direc"..., 63Can not open single_reader_direct_sequential: Invalid argument
) = 63
[pid 14998] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14998] futex(0x2aaaaacd0000, FUTEX_WAKE, 1 <unfinished ...>
[pid 14997] <... futex resumed> ) = 0
[pid 14998] <... futex resumed> ) = 1
[pid 14997] brk(0x1ed5b000) = 0x1ed5b000
[pid 14997] brk(0x1ed47000) = 0x1ed47000
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd1000, FUTEX_WAKE, 1) = 0
[pid 14997] futex(0x2aaaaacd2000, FUTEX_WAKE, 1) = 0
...

It's that second open that seems to be the problem.
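The failing call in the strace can probably be isolated without xiogen/xdoio. The following is a minimal sketch, not part of the test suite: the helper name try_open_direct is made up for illustration, and it assumes it is run from the GFS2 mount point after the file has already been created by the test.

```python
import errno
import os

# O_DIRECT is Linux-specific; fall back to 0 where Python does not expose it.
O_DIRECT = getattr(os, "O_DIRECT", 0)


def try_open_direct(path):
    """Open path read-only with O_DIRECT (hypothetical helper).

    Returns 0 on success, or the errno value reported by open(2).
    """
    try:
        fd = os.open(path, os.O_RDONLY | O_DIRECT)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


if __name__ == "__main__":
    # File name taken from the failing test in this report.
    err = try_open_direct("single_reader_direct_sequential")
    print("open:", "OK" if err == 0 else errno.errorcode.get(err, err))
```

If the strace is representative, running this against the jdata file should report EINVAL, while the same file without the jdata attribute should open cleanly.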
I should mention that this same test works just fine if I don't do the chattr to turn on the jdata attribute.
This problem still occurs in the -4 version, but does NOT occur in the SRPM version of gfs2 in the -54 kernel.
jdata and direct I/O are a meaningless combination. It should always have been disallowed, and even though it appeared to work before, it wasn't actually doing jdata at all. We had previously agreed that we'd disable this combination, hence the new behaviour.
But the puzzle here is that none of the GFS2 code paths explicitly disables this combination. Or does one? Why would the open fail?
The error is returned because of the (intentional) NULL pointer in the direct_IO slot of the address space operations for jdata files.
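To illustrate why no explicit GFS2 check is needed, here is a toy model of the mechanism. It assumes, as I understand the generic open path of this kernel era, that an O_DIRECT open is rejected with -EINVAL when the file's address space operations have no direct_IO method. The class, function and table names below are invented for the sketch and are not the kernel's or GFS2's actual identifiers.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AddressSpaceOps:
    """Toy stand-in for the kernel's address space operations table."""
    name: str
    direct_IO: Optional[Callable] = None  # None models a NULL slot


def vfs_open_check(aops, o_direct):
    """Model of the generic check: return 0, or -EINVAL for an
    O_DIRECT open on a mapping that has no direct_IO method."""
    if o_direct and aops.direct_IO is None:
        return -errno.EINVAL
    return 0


# Hypothetical tables: non-jdata files provide a direct_IO method,
# jdata files leave the slot empty on purpose.
ordered_aops = AddressSpaceOps("ordered", direct_IO=lambda *args: 0)
jdata_aops = AddressSpaceOps("jdata")

if __name__ == "__main__":
    for aops in (ordered_aops, jdata_aops):
        print(aops.name, vfs_open_check(aops, o_direct=True))
```

In this model the filesystem never refuses anything itself; leaving the slot empty is enough for the generic layer to fail the open, which matches the EINVAL seen in the strace.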