COMMIT: https://review.gluster.org/7447 committed in master by Raghavendra G (firstname.lastname@example.org)
Author: Raghavendra G <email@example.com>
Date: Fri Apr 11 15:58:47 2014 +0530
performance/read-ahead: prevent stale data being returned to application.
Assume that an fd is shared by two application threads/processes.
T0 a read is triggered from app-thread t1 and the read call passes
   through write-behind.
T1 app-thread t2 issues a write. The page on which the read from t1 is
   waiting is marked stale.
T2 write-behind caches the write and indicates write completion to the
   application.
T3 app-thread t2 issues a read to the same region. Since there is
   already a page for that region (created as part of the read at T0),
   this read request waits on that page to be filled (though it is
   stale, which is a bug).
T4 the read (triggered at T0) completes from the brick (with the write
   still pending). Now both read requests from t1 and t2 are served this
   data (though the data is stale from app-thread t2's perspective -
   which is a bug).
T5 the write is flushed to the brick by write-behind.
The fix is to not serve data from a stale page, but instead to initiate
a fresh read to the back-end.
Signed-off-by: Raghavendra G <firstname.lastname@example.org>
Smoke: Gluster Build System <email@example.com>
CentOS-regression: Gluster Build System <firstname.lastname@example.org>
Reviewed-by: Csaba Henk <email@example.com>
NetBSD-regression: NetBSD Build System <firstname.lastname@example.org>
Reviewed-by: Zhou Zhengping <email@example.com>
Reviewed-by: Amar Tumballi <firstname.lastname@example.org>
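For illustration, here is a minimal sketch in C of the idea behind the fix (hypothetical names and structure, not the actual read-ahead code): a waiting reader must not be served from a page that a later write has marked stale; instead the cached copy is discarded and a fresh read is sent to the back-end.

/* Sketch only: models a read-ahead page with a stale flag. */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

struct ra_page_sim {
    off_t  offset;   /* file offset this page caches */
    size_t size;     /* bytes cached */
    char  *data;     /* cached contents */
    bool   ready;    /* filled by a completed back-end read */
    bool   stale;    /* set when a later write overlapped this region */
};

/* Decide whether a waiting reader may be served from the page.
 * Returns true if served from cache; false if a fresh back-end
 * read was initiated instead (the buggy behavior was to hand out
 * the cached data even when page->stale was set). */
static bool
serve_or_refetch(struct ra_page_sim *page, char *dst, size_t len,
                 void (*refetch)(off_t offset, size_t size))
{
    if (page->ready && !page->stale) {
        memcpy(dst, page->data, len < page->size ? len : page->size);
        return true;
    }
    refetch(page->offset, page->size);
    return false;
}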
Description of problem:
"Disk Stress" and "Disk Verification" jobs always failed on win7-32/win2012 guests with "viostor+glusterfs". And job can pass without glusterfs
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot the guest with:
-M pc -m 2G -smp 2 \
-cpu Nehalem \
-usb -device usb-tablet \
-drive file=gluster://10.66.73.33:24007/gv0/win7-32.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none \
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup1 -device e1000,netdev=hostnet0,mac=00:52:4f:75:65:35,id=net0 \
-uuid 4870b5fd-65d1-488a-9333-3c4e095285b0 \
-rtc-td-hack -no-kvm-pit-reinjection \
-chardev socket,id=a111,path=/tmp/monitor-win2012-serial,server,nowait -mon chardev=a111,mode=readline \
-name win7-32-blk-glusterfs \
-drive file=gluster://10.66.73.33:24007/gv0/win7-32-data.raw,if=none,media=disk,format=raw,rerror=stop,werror=stop,cache=none,aio=native,id=blk-disk1,serial=win8.1-32-blk -device virtio-blk-pci,drive=blk-disk1,id=disk1 \
-vnc :1 -vga cirrus \
-monitor stdio \
-cdrom /usr/share/virtio-win/virtio-win.iso \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0
2. Submit the jobs in HCK 2.1.
With glusterfs the job always fails; without glusterfs the job passes.
gluster volume info
Volume Name: gv0
Volume ID: 67dac294-691b-47e9-8e31-c99be865428f
Number of Bricks: 1
The HCK failure is caused by a Gluster bug: the read-ahead and write-behind
xlators produce an incorrect I/O pattern that leads to data corruption.
The workload is generated by the Windows HCK DiskIO.exe tool:
diskio /d "PCI\VEN_1AF4&DEV_1001&SUBSYS_00021AF4&REV_00\3&13C0B0C5&0&20" /b 32KB /t 00:10:00 /c sv /a /o
Block size: 32 KB
IO depth: 32
Pattern: sequential write+read to verify
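For reference, the same write-then-read-verify pattern can be approximated with a small C program (a hypothetical stand-in for DiskIO.exe; the real tool keeps up to 32 requests in flight, which this single-threaded sketch does not, so it may not reproduce the race by itself):

/* Sequential write+read-verify in 32 KB blocks. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK (32 * 1024)

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <device-or-file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    static char wbuf[BLOCK], rbuf[BLOCK];

    for (long blk = 0; blk < 1024; blk++) {
        off_t off = (off_t)blk * BLOCK;
        memset(wbuf, (int)(blk & 0xff), sizeof(wbuf)); /* per-block pattern */

        if (pwrite(fd, wbuf, BLOCK, off) != BLOCK) { perror("pwrite"); return 1; }
        if (pread(fd, rbuf, BLOCK, off) != BLOCK)  { perror("pread");  return 1; }

        if (memcmp(wbuf, rbuf, BLOCK) != 0)
            fprintf(stderr, "Data corruption at block offset %lld!\n",
                    (long long)off);
    }
    close(fd);
    return 0;
}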
The error message from DiskIO.exe is:
T8 Error: Data corruption at block offset 1285390336!
T9 Error: Data corruption at block offset 1285488640!
T4 Error: Data corruption at block offset 1285586944!
T7 Error: Data corruption at block offset 1285455872!
T8 Error: Data corruption at block offset 1286537216!
The offsets and number of corruptions can change between runs.
The following SystemTap output for QEMU shows the write+read for the first
corruption at offset 1285390336:
virtio_blk_handle_write req 0x7f9e81543130 sector 2510528 nsectors 64
bdrv_aio_multiwrite num_callbacks 1 num_reqs 1
virtio_blk_rw_complete req 0x7f9e81543130 ret 0
virtio_blk_handle_read req 0x7f9e81973330 sector 2510528 nsectors 64
bdrv_aio_readv bs 0x7f9e7fac7740 sector_num 2510528 nb_sectors 64 opaque 0x7f9e81973330
virtio_blk_rw_complete req 0x7f9e81973330 ret 0
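(These are QEMU trace events; assuming a QEMU built with a tracing backend such as SystemTap/dtrace or the simple backend, comparable output can be captured by starting QEMU with -trace events=/tmp/events, where /tmp/events lists the virtio_blk_* and bdrv_aio_* events of interest.)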
This matches the workload's I/O pattern:
1. A write completes and is followed by a read.
2. sector 2510528 == offset 1285390336 (512-byte sectors)
3. nsectors 64 == 32 KB block size
So far everything seems normal.
The I/O pattern at the Gluster network protocol level should be similar, but
the Wireshark capture reveals that the client has produced an invalid I/O
pattern which causes the data corruption:
Frame 546, WRITE, 1285324800, 64 KB
Frame 578, READ, 1285292032, 128 KB
Frame 579, READ, 1285292032, 128 KB
Frame 582, WRITE, 1285390336, 32 KB
The WRITE command in Frame 582 was sent *after* the READ command in Frame 579.
There are also other quirks:
1. An identical read request was sent in Frames 578 and 579. This is duplicate I/O, and both requests are pending at the same time!
2. The write request was sent in Frame 582 while the read requests from Frames 578 and 579 were still pending. In POSIX I/O, issuing reads and writes to the same region in parallel is undefined! (See the sketch after this list.)
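For clarity, here is a sketch of the ordering invariant a write-caching layer has to maintain (hypothetical helper, not GlusterFS code): a read may only be dispatched to the server once every cached write overlapping its byte range has been flushed, and identical in-flight reads should be coalesced rather than duplicated.

#include <stdbool.h>
#include <sys/types.h>

/* True if [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect. */
static bool
ranges_overlap(off_t a_off, size_t a_len, off_t b_off, size_t b_len)
{
    return a_off < (off_t)(b_off + b_len) && b_off < (off_t)(a_off + a_len);
}

By this test, the 32 KB write at offset 1285390336 (Frame 582) clearly overlaps the pending 128 KB reads at offset 1285292032 (Frames 578/579): 1285292032 + 131072 = 1285423104 > 1285390336.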
I disabled various combinations of xlators and discovered that this bug only happens when read-ahead and write-behind are enabled together.
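For anyone repeating that experiment: the two xlators can be toggled per volume with the standard volume options, e.g.

gluster volume set gv0 performance.read-ahead off
gluster volume set gv0 performance.write-behind off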
Gluster team: please see the files I have attached, they should allow you to debug this in detail.
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.
glusterfs-3.12.0 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and on the update infrastructure for your distribution.