Bug 1414242 - [whql][virtio-block+glusterfs]"Disk Stress" and "Disk Verification" job always failed on win7-32/win2012/win2k8R2 guest
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: read-ahead
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1071722 1449311 1449313 1449314
 
Reported: 2017-01-18 07:06 UTC by Raghavendra G
Modified: 2017-09-05 17:25 UTC
CC: 20 users

Fixed In Version: glusterfs-3.12.0
Clone Of: 1071722
Cloned to: 1449311 1449313 1449314
Environment:
Last Closed: 2017-09-05 17:25:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments

Comment 1 Nigel Babu 2017-01-18 12:50:37 UTC
Please add a public description of what you're solving to the bug, unless it's a security issue, in which case use your judgment to make only the essential details public.

Comment 2 Worker Ant 2017-05-09 10:06:42 UTC
COMMIT: https://review.gluster.org/7447 committed in master by Raghavendra G (rgowdapp) 
------
commit 2ff39c5cbea6fbda0d7a442f55e6dc2a72efb171
Author: Raghavendra G <rgowdapp>
Date:   Fri Apr 11 15:58:47 2014 +0530

    performance/read-ahead: prevent stale data being returned to application.
    
    Assume that fd is shared by two application threads/processes.
    
    T0 read is triggered from app-thread t1 and read call passes through
       write-behind.
    T1 app-thread t2 issues a write. The page on which the read from t1 is
       waiting is marked stale.
    T2 write-behind caches the write and indicates to the application that
       the write is complete.
    T3 app-thread t2 issues a read to the same region. Since there is
       already a page for that region (created as part of the read at T0),
       this read request waits on that page to be filled (even though it is
       stale, which is a bug).
    T4 the read (triggered at T0) completes from the brick (with the write
       still pending). Now both read requests from t1 and t2 are served this
       data (though the data is stale from app-thread t2's perspective -
       which is a bug).
    T5 the write is flushed to the brick by write-behind.
    
    The fix is to not serve data from a stale page, but instead to initiate
    a fresh read to the back-end.
    
    Change-Id: Id6af733464fa41bb4e81fd29c7451c73d06453fb
    BUG: 1414242
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: https://review.gluster.org/7447
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Csaba Henk <csaba>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Zhou Zhengping <johnzzpcrystal>
    Reviewed-by: Amar Tumballi <amarts>
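
The idea behind the fix above can be sketched roughly as follows. This is only a simplified illustration, not the actual read-ahead xlator code, and all names in it are made up:

/* Simplified sketch of the fix idea (not GlusterFS source; all names
 * are hypothetical).  A cached read-ahead page that an intervening write
 * has marked stale must not be handed to waiting readers; a fresh read
 * has to be sent to the backend instead. */
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

struct ra_page_sketch {
        off_t  offset;   /* file offset this page covers           */
        size_t size;     /* page size                              */
        bool   stale;    /* set when a write dirties this region   */
        char  *data;     /* cached contents, valid only if !stale  */
};

static void
serve_read (struct ra_page_sketch *page,
            void (*issue_backend_read) (off_t offset, size_t size),
            void (*deliver_to_reader) (const char *data, size_t size))
{
        if (page->stale) {
                /* Before the fix: the reader waited on this page and was
                 * eventually served its stale contents.  After the fix:
                 * the stale page is bypassed and fresh data is fetched. */
                issue_backend_read (page->offset, page->size);
                return;
        }

        deliver_to_reader (page->data, page->size);
}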

Comment 3 Raghavendra G 2017-05-09 15:01:10 UTC
Description of problem:
"Disk Stress" and "Disk Verification" jobs always failed on win7-32/win2012 guests with "viostor+glusterfs". And job can pass without glusterfs

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.59rhs-1.el7.x86_64
virtio-win-1.6.8-4.el7.noarch
kernel-3.10.0-84.el7.x86_64
qemu-kvm-rhev-1.5.3-45.el7.x86_64
seabios-1.7.2.2-11.el7.x86_64
windows guest:win7-32/win2012

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with:
/usr/libexec/qemu-kvm \
-M pc -m 2G -smp 2 \
-cpu Nehalem \
-usb -device usb-tablet \
-drive file=gluster://10.66.73.33:24007/gv0/win7-32.raw,format=raw,if=none,id=drive-ide0-0-0,werror=stop,rerror=stop,cache=none \
-device ide-drive,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup1 -device e1000,netdev=hostnet0,mac=00:52:4f:75:65:35,id=net0 \
-uuid 4870b5fd-65d1-488a-9333-3c4e095285b0 \
-rtc-td-hack -no-kvm-pit-reinjection \
-chardev socket,id=a111,path=/tmp/monitor-win2012-serial,server,nowait -mon chardev=a111,mode=readline \
-name win7-32-blk-glusterfs \
-drive file=gluster://10.66.73.33:24007/gv0/win7-32-data.raw,if=none,media=disk,format=raw,rerror=stop,werror=stop,cache=none,aio=native,id=blk-disk1,serial=win8.1-32-blk -device virtio-blk-pci,drive=blk-disk1,id=disk1 \
-vnc :1 -vga cirrus \
-monitor stdio \
-cdrom /usr/share/virtio-win/virtio-win.iso \
-global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Submit the jobs in HCK 2.1.

Actual results:
The jobs always fail.

Expected results:
The jobs pass.

Additional info:
gluster volume info
Volume Name: gv0
Type: Distribute
Volume ID: 67dac294-691b-47e9-8e31-c99be865428f
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.73.33:/data/gv0/brick1

Analysis:
The HCK failure is caused by a Gluster bug: the read-ahead and write-behind
xlators produce an incorrect I/O pattern that leads to data corruption.

The workload is generated by the Windows HCK DiskIO.exe tool:
diskio /d "PCI\VEN_1AF4&DEV_1001&SUBSYS_00021AF4&REV_00\3&13C0B0C5&0&20" /b 32KB /t 00:10:00 /c sv /a /o

Block size: 32 KB
IO depth: 32
Pattern: sequential write+read to verify
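
The verification pattern the tool exercises is essentially write-then-read-back-and-compare. As a rough POSIX sketch of that pattern (only an illustration; the actual DiskIO.exe runs inside the Windows guest and its internals are not shown here):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE (32 * 1024)      /* matches the /b 32KB option */

/* Write a known pattern at 'offset', read it back and compare.
 * Returns 0 on success, -1 on I/O error or data mismatch. */
static int verify_block (int fd, off_t offset)
{
        static char wbuf[BLOCK_SIZE], rbuf[BLOCK_SIZE];

        memset (wbuf, (int)((offset / BLOCK_SIZE) & 0xff), BLOCK_SIZE);

        if (pwrite (fd, wbuf, BLOCK_SIZE, offset) != BLOCK_SIZE)
                return -1;
        if (pread (fd, rbuf, BLOCK_SIZE, offset) != BLOCK_SIZE)
                return -1;
        if (memcmp (wbuf, rbuf, BLOCK_SIZE) != 0) {
                fprintf (stderr, "Error: Data corruption at block offset %lld!\n",
                         (long long)offset);
                return -1;
        }
        return 0;
}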

The error messages from DiskIO.exe are:
T8 Error: Data corruption at block offset 1285390336!
T9 Error: Data corruption at block offset 1285488640!
T4 Error: Data corruption at block offset 1285586944!
T7 Error: Data corruption at block offset 1285455872!
T8 Error: Data corruption at block offset 1286537216!

The offsets and number of corruptions can change between runs.

The following SystemTap output for QEMU shows the write+read for the first
corruption at offset 1285390336:
virtio_blk_handle_write req 0x7f9e81543130 sector 2510528 nsectors 64
[...]
bdrv_aio_multiwrite num_callbacks 1 num_reqs 1
virtio_blk_rw_complete req 0x7f9e81543130 ret 0
[...]
virtio_blk_handle_read req 0x7f9e81973330 sector 2510528 nsectors 64
bdrv_aio_readv bs 0x7f9e7fac7740 sector_num 2510528 nb_sectors 64 opaque 0x7f9e81973330
[...]
virtio_blk_rw_complete req 0x7f9e81973330 ret 0

This matches the workload's I/O pattern:
1. A write completes and is followed by a read.
2. sector 2510528 == offset 1285390336 (512-byte sectors)
3. nsectors 64 == 32 KB block size
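
For reference, with 512-byte sectors the conversions work out as:

2510528 sectors * 512 bytes/sector = 1285390336 bytes
     64 sectors * 512 bytes/sector = 32768 bytes = 32 KB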

So far everything seems normal.

The I/O pattern at the Gluster network protocol level should be similar, but
the Wireshark output reveals the client has produced an invalid I/O pattern that
causes data corruption:
Frame 546, WRITE, 1285324800, 64 KB
Frame 578, READ, 1285292032, 128 KB
Frame 579, READ, 1285292032, 128 KB
Frame 582, WRITE, 1285390336, 32 KB
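
This view can be obtained by filtering the client-side capture on the GlusterFS protocol, assuming a tshark build that includes the GlusterFS dissector (the capture file name below is only a placeholder):

tshark -r gluster-client.pcap -Y glusterfs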

The WRITE command in Frame 582 was sent *after* the READ command in Frame 579.

There are also other quirks:
1. An identical read request was sent in Frames 578 and 579.  This is a duplicate I/O, and both requests are pending at the same time!
2. The write request in Frame 582 was sent while the read requests from Frames 578 and 579 were still pending.  In POSIX I/O, issuing reads and writes to the same sectors in parallel is undefined!

I disabled various combinations of xlators and discovered that this bug only happens when read-ahead and write-behind are enabled together.
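
For anyone reproducing this, the two xlators can be toggled per volume with the standard performance options, e.g. for the gv0 volume used here:

gluster volume set gv0 performance.read-ahead off
gluster volume set gv0 performance.write-behind off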

Gluster team: please see the files I have attached; they should allow you to debug this in detail.

Comment 4 Shyamsundar 2017-09-05 17:25:47 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

