Bug 1273509 - KVM PPC guest disk fails with "sd 0:0:0:0: aborting command. lun 0x8000000000000000"
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: ppc64le Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 7.3
Assigned To: David Gibson
QA Contact: Virtualization Bugs
Keywords: Reopened
Depends On:
Blocks: 1230910 RHEV3.6PPC
 
Reported: 2015-10-20 11:23 EDT by Thomas Huth
Modified: 2016-02-21 08:32 EST
18 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-16 03:34:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Guest console log (67.91 KB, text/plain)
2015-10-20 23:32 EDT, Qunfang Zhang
dmesg output of the crashed guest1 (45.97 KB, text/plain)
2015-10-30 10:35 EDT, Thomas Huth
dmesg output of the crashed guest36 (37.95 KB, text/plain)
2015-10-30 10:38 EDT, Thomas Huth
Patch with the additional kernel debug print statements (4.12 KB, patch)
2015-11-05 16:37 EST, Thomas Huth
Guest that crashed with the additional kernel printk / dump_stack statements (107.86 KB, text/plain)
2015-11-05 16:47 EST, Thomas Huth


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 132155 None None None Never

Description Thomas Huth 2015-10-20 11:23:34 EDT
Description of problem:
While trying to reproduce bug 1269467, I saw one of the guests crash in a different way today, with a disk failure:

[20013.523437] ltptest gf01 start
[20204.858446] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c62b860
[20204.858562] sd 0:0:0:0: successfully aborted task tag 0xc00000007c62b860
[20204.858907] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c620588
[20204.858954] sd 0:0:0:0: successfully aborted task tag 0xc00000007c620588
[20204.960232] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c620760
[20204.960294] sd 0:0:0:0: successfully aborted task tag 0xc00000007c620760
[20204.960623] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c620938
[20204.966444] sd 0:0:0:0: successfully aborted task tag 0xc00000007c620938
[20204.967093] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c620b10
[20204.967151] sd 0:0:0:0: successfully aborted task tag 0xc00000007c620b10
[20204.967740] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c620ce8
[20204.967963] sd 0:0:0:0: successfully aborted task tag 0xc00000007c620ce8
[20204.968513] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c621098
[20204.968695] sd 0:0:0:0: successfully aborted task tag 0xc00000007c621098
[20446.246082] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c622130
[20446.246170] sd 0:0:0:0: successfully aborted task tag 0xc00000007c622130
[20446.247877] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c622e18
[20446.247998] sd 0:0:0:0: successfully aborted task tag 0xc00000007c622e18
[20446.250508] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c6233a0
[20446.250556] sd 0:0:0:0: successfully aborted task tag 0xc00000007c6233a0
[20446.250885] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c623928
[20446.250926] sd 0:0:0:0: successfully aborted task tag 0xc00000007c623928
[20446.251532] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000007c623eb0
[20446.251579] sd 0:0:0:0: successfully aborted task tag 0xc00000007c623eb0
[20446.251804] sd 0:0:0:0: [sda] Medium access timeout failure. Offlining disk!
[20446.251881] sd 0:0:0:0: resetting device. lun 0x8000000000000000
[20446.251921] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[20446.251964] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 84 38 00 04 00 00
[20446.252008] blk_update_request: I/O error, dev sda, sector 17073208
[20446.252110] sd 0:0:0:0: rejecting I/O to offline device
[20446.252140] sd 0:0:0:0: [sda] killing request
[20446.252312] sd 0:0:0:0: rejecting I/O to offline device
[20446.252342] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[20446.252346] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 a4 38 00 00 80 00
[20446.252347] blk_update_request: I/O error, dev sda, sector 17081400
[20446.252534] sd 0:0:0:0: [sda] killing request
[20446.252713] sd 0:0:0:0: rejecting I/O to offline device
[20446.252823] sd 0:0:0:0: rejecting I/O to offline device
[20446.253020] sd 0:0:0:0: rejecting I/O to offline device
[20446.253066] sd 0:0:0:0: rejecting I/O to offline device
[20446.253135] sd 0:0:0:0: rejecting I/O to offline device
[20446.253190] sd 0:0:0:0: rejecting I/O to offline device
[20446.253386] sd 0:0:0:0: rejecting I/O to offline device
[20446.253730] sd 0:0:0:0: rejecting I/O to offline device
[20446.253771] sd 0:0:0:0: rejecting I/O to offline device
[20446.253813] sd 0:0:0:0: rejecting I/O to offline device
[20446.253854] Read-error on swap-device (8:0:1213056)
[20446.253898] Read-error on swap-device (8:0:1213184)
[20446.253937] Read-error on swap-device (8:0:1213312)
[20446.253988] Read-error on swap-device (8:0:1213440)
[20446.254028] sd 0:0:0:0: rejecting I/O to offline device
[20446.254067] Read-error on swap-device (8:0:2263168)
[20446.254123] Read-error on swap-device (8:0:2263296)
[20446.254161] Read-error on swap-device (8:0:2263424)
[20446.254196] Read-error on swap-device (8:0:2263552)
[20446.254226] Read-error on swap-device (8:0:2263680)
[20446.254257] sd 0:0:0:0: rejecting I/O to offline device
[20446.254285] Read-error on swap-device (8:0:2288128)
[20446.254324] Read-error on swap-device (8:0:2288256)
[20446.254362] Read-error on swap-device (8:0:2288384)
[20446.254406] sd 0:0:0:0: rejecting I/O to offline device
[20446.254444] Read-error on swap-device (8:0:2358656)
[20446.254485] Read-error on swap-device (8:0:2358784)
[20446.254524] Read-error on swap-device (8:0:2358912)
[20446.254563] sd 0:0:0:0: rejecting I/O to offline device
[20446.254633] sd 0:0:0:0: rejecting I/O to offline device
[20446.254694] sd 0:0:0:0: rejecting I/O to offline device
[20446.254735] sd 0:0:0:0: rejecting I/O to offline device
[20446.254940] Read-error on swap-device (8:0:1165440)
[20446.255208] Read-error on swap-device (8:0:1165568)
[20446.255253] Read-error on swap-device (8:0:1165696)
[20446.255298] Read-error on swap-device (8:0:1165824)
[20446.255340] Read-error on swap-device (8:0:1165952)
[20446.255812] Read-error on swap-device (8:0:1166080)
[20446.255900] Read-error on swap-device (8:0:1166208)
[20446.255949] Read-error on swap-device (8:0:1166336)
[20446.255997] sd 0:0:0:0: rejecting I/O to offline device
[20446.256046] sd 0:0:0:0: rejecting I/O to offline device
[20446.256109] sd 0:0:0:0: rejecting I/O to offline device
               ... ("sd 0:0:0:0: rejecting I/O to offline device" repeated ~130 more times, timestamps 20446.256157 through 20446.262026; elided for brevity) ...
[20446.262071] sd 0:0:0:0: rejecting I/O to offline device
[20446.262126] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[20446.262155] sd 0:0:0:0: rejecting I/O to offline device
[20446.262158] sd 0:0:0:0: rejecting I/O to offline device
[20446.262160] sd 0:0:0:0: rejecting I/O to offline device
[20446.262198] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[20446.262199] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 bc b8 00 04 00 00
[20446.262200] blk_update_request: I/O error, dev sda, sector 17087672
[20446.262311] Buffer I/O error on device sda5, logical block 1498007
[20446.262311] lost page write due to I/O error on sda5
[20446.262312] Buffer I/O error on device sda5, logical block 1498008
[20446.262313] lost page write due to I/O error on sda5
[20446.262314] Buffer I/O error on device sda5, logical block 1498009
[20446.262314] lost page write due to I/O error on sda5
[20446.262315] Buffer I/O error on device sda5, logical block 1498010
[20446.262315] lost page write due to I/O error on sda5
[20446.262316] Buffer I/O error on device sda5, logical block 1498011
[20446.262316] lost page write due to I/O error on sda5
[20446.262317] Buffer I/O error on device sda5, logical block 1498012
[20446.262317] lost page write due to I/O error on sda5
[20446.262318] Buffer I/O error on device sda5, logical block 1498013
[20446.262318] lost page write due to I/O error on sda5
[20446.262319] Buffer I/O error on device sda5, logical block 1498014
[20446.262319] lost page write due to I/O error on sda5
[20446.262320] Buffer I/O error on device sda5, logical block 1498015
[20446.262320] lost page write due to I/O error on sda5
[20446.262929] sd 0:0:0:0: rejecting I/O to offline device
[20446.262931] Read-error on swap-device (8:0:1520768)
[20446.262932] Read-error on swap-device (8:0:1520896)
[20446.262932] Read-error on swap-device (8:0:1521024)
[20446.264716] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 a4 b8 00 04 00 00
[20446.264754] blk_update_request: I/O error, dev sda, sector 17081528
[20446.264823] sd 0:0:0:0: rejecting I/O to offline device
[20446.264883] sd 0:0:0:0: rejecting I/O to offline device
[20446.264921] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[20446.264965] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 8c 38 00 04 00 00
[20446.264998] sd 0:0:0:0: rejecting I/O to offline device
[20446.264999] Read-error on swap-device (8:0:2348160)
[20446.265001] Read-error on swap-device (8:0:2348288)
[20446.265002] Read-error on swap-device (8:0:2348416)
[20446.265003] Read-error on swap-device (8:0:2348544)
[20446.265004] Read-error on swap-device (8:0:2348672)
[20446.265005] Read-error on swap-device (8:0:2348800)
[20446.265005] Read-error on swap-device (8:0:2348928)
[20446.265006] Read-error on swap-device (8:0:2349056)
[20446.265036] sd 0:0:0:0: rejecting I/O to offline device
[20446.265047] Read-error on swap-device (8:0:4545664)
[20446.265055] master[1628]: bad frame in setup_rt_frame: 00003fffde3f8490 nip 0000000026ea59cc lr 0000000026ea59cc
[20446.265063] master[1628]: bad frame in setup_rt_frame: 00003fffde3f8490 nip 0000000026ea59cc lr 0000000026ea59cc
[20446.265074] Read-error on swap-device (8:0:4545792)
[20446.265076] Read-error on swap-device (8:0:4545920)
[20446.265077] Read-error on swap-device (8:0:4546048)
[20446.265078] Read-error on swap-device (8:0:4546176)
[20446.265080] Read-error on swap-device (8:0:4546304)
[20446.265081] Read-error on swap-device (8:0:4546432)
[20446.265082] Read-error on swap-device (8:0:4546560)
[20446.265592] blk_update_request: I/O error, dev sda, sector 17075256
[20446.265632] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[20446.265673] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 94 38 00 04 00 00
[20446.265726] blk_update_request: I/O error, dev sda, sector 17077304
[20446.265762] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[20446.265804] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 90 38 00 04 00 00
[20446.265839] blk_update_request: I/O error, dev sda, sector 17076280
[20446.265880] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[20446.265919] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 a8 b8 00 04 00 00
[20446.265951] blk_update_request: I/O error, dev sda, sector 17082552
[20446.265986] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[20446.266026] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 ac b8 00 04 00 00
[20446.266058] blk_update_request: I/O error, dev sda, sector 17083576
[20446.266093] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[20446.266132] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 b0 b8 00 04 00 00
[20446.266164] blk_update_request: I/O error, dev sda, sector 17084600
[20446.266242] sd 0:0:0:0: rejecting I/O to offline device
[20446.266278] sd 0:0:0:0: rejecting I/O to offline device
[20446.266306] sd 0:0:0:0: rejecting I/O to offline device
[20446.266334] sd 0:0:0:0: rejecting I/O to offline device
[20446.266363] sd 0:0:0:0: rejecting I/O to offline device
[20446.266390] sd 0:0:0:0: rejecting I/O to offline device
[20446.266419] sd 0:0:0:0: rejecting I/O to offline device
[20446.266447] sd 0:0:0:0: rejecting I/O to offline device
[20446.266540] sd 0:0:0:0: rejecting I/O to offline device
[20446.266568] Read-error on swap-device (8:0:1302784)
[20446.266595] sd 0:0:0:0: rejecting I/O to offline device
[20446.266622] Read-error on swap-device (8:0:1303040)
[20446.266649] sd 0:0:0:0: rejecting I/O to offline device
[20446.266675] Read-error on swap-device (8:0:1303424)
[20446.266735] sd 0:0:0:0: rejecting I/O to offline device
[20446.266762] Read-error on swap-device (8:0:1522304)
[20446.266788] Read-error on swap-device (8:0:1522432)
[20446.266814] Read-error on swap-device (8:0:1522560)
[20446.266840] Read-error on swap-device (8:0:1522688)
[20446.266903] XFS (sda5): metadata I/O error: block 0x1185b5e ("xlog_iodone") error 5 numblks 64
[20446.266949] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267023] XFS (sda5): Log I/O Error Detected.  Shutting down filesystem
[20446.267055] XFS (sda5): Please umount the filesystem and rectify the problem(s)
[20446.267095] XFS (sda5): metadata I/O error: block 0x1185b1e ("xlog_iodone") error 5 numblks 64
[20446.267140] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267198] XFS (sda5): metadata I/O error: block 0x1185ade ("xlog_iodone") error 5 numblks 64
[20446.267244] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267301] XFS (sda5): metadata I/O error: block 0x1185a9e ("xlog_iodone") error 5 numblks 64
[20446.267346] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267403] XFS (sda5): metadata I/O error: block 0x1185a5e ("xlog_iodone") error 5 numblks 64
[20446.267448] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267506] XFS (sda5): metadata I/O error: block 0x1185a1e ("xlog_iodone") error 5 numblks 64
[20446.267551] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267610] XFS (sda5): metadata I/O error: block 0x11859de ("xlog_iodone") error 5 numblks 64
[20446.267655] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.267713] XFS (sda5): metadata I/O error: block 0x118599e ("xlog_iodone") error 5 numblks 64
[20446.267758] XFS (sda5): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xd000000000d6c208
[20446.293524] sd 0:0:0:0: rejecting I/O to offline device
[20446.293652] Read-error on swap-device (8:0:1301120)
[20446.293744] sd 0:0:0:0: rejecting I/O to offline device
[20446.293834] Read-error on swap-device (8:0:1301376)
[20446.295057] sd 0:0:0:0: rejecting I/O to offline device
[20446.295151] Read-error on swap-device (8:0:2349184)
[20446.295250] Read-error on swap-device (8:0:2349312)
[20446.295356] Read-error on swap-device (8:0:2349440)
[20446.295462] Read-error on swap-device (8:0:2349568)
[20446.295568] Read-error on swap-device (8:0:2349696)
[20446.295673] Read-error on swap-device (8:0:2349824)
[20446.295778] Read-error on swap-device (8:0:2349952)
[20446.295884] Read-error on swap-device (8:0:2350080)
[20446.296012] sd 0:0:0:0: rejecting I/O to offline device
[20446.296067] Read-error on swap-device (8:0:2352000)
[20446.296095] Read-error on swap-device (8:0:2352128)
[20446.296139] systemd[1]: bad frame in setup_rt_frame: 00003fffe037c800 nip 00003fff7f6c478c lr 0000000058c68150
[20446.296198] systemd[1]: bad frame in setup_rt_frame: 00003fffe037c800 nip 00003fff7f6c478c lr 0000000058c68150
[20446.296577] sd 0:0:0:0: rejecting I/O to offline device
[20446.296671] Read-error on swap-device (8:0:4550784)
[20446.296766] Read-error on swap-device (8:0:4550912)
[20446.296861] Read-error on swap-device (8:0:4551040)
[20446.296956] Read-error on swap-device (8:0:4551168)
[20446.297054] sd 0:0:0:0: rejecting I/O to offline device
[20446.297151] Read-error on swap-device (8:0:4551424)
[20446.297247] Read-error on swap-device (8:0:4551552)
[20446.297342] Read-error on swap-device (8:0:4551680)
[20446.297711] audit: *NO* daemon at audit_pid=578
[20446.298314] sd 0:0:0:0: rejecting I/O to offline device
[20446.298347] Read-error on swap-device (8:0:1292416)
[20446.325321] sd 0:0:0:0: rejecting I/O to offline device
[20446.325383] Read-error on swap-device (8:0:1366656)
[20446.325414] sd 0:0:0:0: rejecting I/O to offline device
[20446.325441] Read-error on swap-device (8:0:1366912)
[20446.325587] XFS (sda5): xfs_log_force: error -5 returned.
[20446.357960] sd 0:0:0:0: rejecting I/O to offline device
[20446.358006] Read-error on swap-device (8:0:1053824)
[20446.358034] Read-error on swap-device (8:0:1053952)
[20446.358061] Read-error on swap-device (8:0:1054080)
[20446.358088] Read-error on swap-device (8:0:1054208)
[20446.358115] Read-error on swap-device (8:0:1054336)
[20446.358145] sd 0:0:0:0: rejecting I/O to offline device
[20446.358172] Read-error on swap-device (8:0:1054592)
[20446.358198] Read-error on swap-device (8:0:1054720)
[20446.525908] sd 0:0:0:0: rejecting I/O to offline device
[20446.525979] Read-error on swap-device (8:0:1233024)
[20446.526008] Read-error on swap-device (8:0:1233152)
[20446.526035] Read-error on swap-device (8:0:1233280)
[20446.526061] Read-error on swap-device (8:0:1233408)
[20446.526093] Read-error on swap-device (8:0:1233536)
[20446.526120] Read-error on swap-device (8:0:1233664)
[20446.526226] sd 0:0:0:0: rejecting I/O to offline device
[20446.526253] Read-error on swap-device (8:0:1229952)
[20446.526279] Read-error on swap-device (8:0:1230080)
[20446.526308] Read-error on swap-device (8:0:1230208)
[20446.526334] Read-error on swap-device (8:0:1230336)
[20446.526361] Read-error on swap-device (8:0:1230464)
[20446.526386] Read-error on swap-device (8:0:1230592)
[20446.526413] Read-error on swap-device (8:0:1230720)
[20446.526439] Read-error on swap-device (8:0:1230848)
[20446.526483] runltp[17789]: bad frame in setup_rt_frame: 00003fffdfdd27c0 nip 00003fff7ebf478c lr 000000001004fc54
[20446.526541] runltp[17789]: bad frame in setup_rt_frame: 00003fffdfdd27c0 nip 00003fff7ebf478c lr 000000001004fc54
[20446.526806] sd 0:0:0:0: rejecting I/O to offline device
[20446.526836] Read-error on swap-device (8:0:4553856)
[20446.526864] Read-error on swap-device (8:0:4553984)
[20446.526891] Read-error on swap-device (8:0:4554112)
[20446.526917] Read-error on swap-device (8:0:4554240)
[20446.526951] Read-error on swap-device (8:0:4554368)
[20446.526986] Read-error on swap-device (8:0:4554496)
[20446.527023] Read-error on swap-device (8:0:4554624)
[20446.527060] Read-error on swap-device (8:0:4554752)
[20446.527111] sd 0:0:0:0: rejecting I/O to offline device
[20446.527115] screen[1214]: bad frame in setup_rt_frame: 00003fffc72dfd00 nip 00000000433f603c lr 00000000433f603c
[20446.527130] screen[1214]: bad frame in setup_rt_frame: 00003fffc72dfd00 nip 00000000433f603c lr 00000000433f603c
[20446.527549] Read-error on swap-device (8:0:1234304)
[20446.527577] Read-error on swap-device (8:0:1234432)
[20446.527605] Read-error on swap-device (8:0:1234560)
[20446.527634] Read-error on swap-device (8:0:1234688)
[20446.527663] Read-error on swap-device (8:0:1234816)
[20446.527691] Read-error on swap-device (8:0:1234944)
[20446.533033] sd 0:0:0:0: rejecting I/O to offline device
[20446.533156] Read-error on swap-device (8:0:1056896)
[20446.533250] Read-error on swap-device (8:0:1057024)
[20446.533340] Read-error on swap-device (8:0:1057152)
[20446.533429] Read-error on swap-device (8:0:1057280)
[20446.533519] Read-error on swap-device (8:0:1057408)
[20446.533608] Read-error on swap-device (8:0:1057536)
[20446.533697] Read-error on swap-device (8:0:1057664)
[20446.533786] Read-error on swap-device (8:0:1057792)
[20446.533913] make[371]: bad frame in setup_rt_frame: 00003fffc8241820 nip 00003fff981109ac lr 000000001001563c
[20446.536655] sd 0:0:0:0: rejecting I/O to offline device
[20446.536699] Read-error on swap-device (8:0:1193216)
[20446.536734] sd 0:0:0:0: rejecting I/O to offline device
[20446.536761] Read-error on swap-device (8:0:1193856)
[20446.536831] systemd-journald[32174]: /run/log/journal/f500732e88ed4f15b18f051ba27149e9/system.journal: Journal file corrupted, rotating.
[20447.002146] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
               
[20447.002396] CPU: 0 PID: 1 Comm: systemd Not tainted 3.10.0-325.el7.ppc64le #1
[20447.002438] Call Trace:
[20447.002489] [c00000007eec39a0] [c000000000017760] show_stack+0x80/0x330 (unreliable)
[20447.002559] [c00000007eec3a50] [c00000000095ea88] dump_stack+0x30/0x44
[20447.002602] [c00000007eec3a70] [c0000000009548b8] panic+0xf4/0x26c
[20447.002649] [c00000007eec3af0] [c0000000000da680] do_exit+0xb50/0xb60
[20447.002689] [c00000007eec3be0] [c0000000000da74c] do_group_exit+0x5c/0x100
[20447.002729] [c00000007eec3c20] [c0000000000f3130] get_signal_to_deliver+0x210/0xa20
[20447.002775] [c00000007eec3d10] [c00000000001828c] do_signal+0x5c/0x320
[20447.002815] [c00000007eec3e00] [c0000000000186ac] do_notify_resume+0x8c/0x100
[20447.002855] [c00000007eec3e30] [c00000000000a730] ret_from_except_lite+0x5c/0x60

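For cross-checking the log: a SCSI Write(10) CDB encodes the starting LBA and the transfer length big-endian, so the CDBs quoted above can be decoded directly into the sector numbers that blk_update_request reports. A minimal decoding sketch (editor's illustration, not part of the original report; the helper name is made up):

```python
def decode_write10(cdb_hex: str):
    """Decode a SCSI Write(10) CDB as printed in the kernel log above.

    Byte 0 is the opcode (0x2a = WRITE(10)), bytes 2-5 the big-endian
    logical block address, bytes 7-8 the big-endian transfer length.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert cdb[0] == 0x2A, "not a Write(10) CDB"
    lba = int.from_bytes(cdb[2:6], "big")
    length = int.from_bytes(cdb[7:9], "big")
    return lba, length

# First failing CDB from the log: matches "I/O error, dev sda, sector 17073208"
print(decode_write10("2a 00 01 04 84 38 00 04 00 00"))  # → (17073208, 1024)
```

The decoded LBAs agree with the sectors in the corresponding blk_update_request lines, confirming the failures are ordinary 512 KiB and 64 KiB writes to the root disk rather than corrupted commands.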
Version-Release number of selected component (if applicable):
Guest and host kernel: 3.10.0-325.el7.ppc64le
QEMU: qemu-kvm-rhev-2.3.0-31.el7.ppc64le

How reproducible:
I have only seen it once so far.

Steps to Reproduce:
See bug 1269467 for how to set up the guests.

Actual results:
Guest crashed

Expected results:
Guest should not crash

Additional info:
Not sure whether it is related, but earlier today the partition that contains the guest disk images ran full, so the guests were paused in the meantime. After freeing some space, I resumed the guests and they continued running. The crash happened a couple of hours later.
Also, not sure if it is related, but in the libvirt qemu log file, there are these messages about "vscsi_process_tsk_mgmt":

2015-10-20 07:18:44.827+0000: starting up libvirt version: 1.2.17, package: 12.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-09-30-10:20:47, ppc-026.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-31.el7)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name bz1256718g16 -S -machine pseries-rhel7.2.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid cdcec26c-fdf6-400b-bb84-43d87c2a3d2b -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-bz1256718g16/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device spapr-vscsi,id=scsi0,reg=0x2000 -usb -drive file=/scratch/bz1256718g16.img,if=none,id=drive-scsi0-0-0-0,format=qcow2 -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=71,id=hostnet0,vhost=on,vhostfd=65 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:58:fa:ce:85:16,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on
char device redirected to /dev/pts/36 (label charserial0)
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 01
vscsi_process_tsk_mgmt 08
Comment 3 David Gibson 2015-10-20 19:31:52 EDT
I don't think it's plausible we'll fix this for RHEL 7.2 (plus qemu-kvm-rhev has a slightly different release schedule).  Moving to RHEL 7.2.z.  It also looks like it is a problem with the spapr-vscsi driver, which I don't think will be our recommended choice for normal operations, so I think we can afford to defer on that basis.
Comment 4 David Gibson 2015-10-20 19:51:21 EDT
A quick look at the spapr-vscsi code suggests that the "vscsi_process_tsk_mgmt" messages probably aren't related.  They look to be the result of an unconditional fprintf() in what appears to be a standard if not particularly often called code path.  Which is a bug in itself, but probably not related to the problem here.
Comment 5 Qunfang Zhang 2015-10-20 23:31:24 EDT
Hi, David and Thomas

During the testing on the host kernel-3.10.0-325.el7.ppc64le, I usually hit the issue that guest failed to boot with some xfs error like below:

2015-10-20 03:18:19: [   21.151786] XFS (dm-0): Internal error XFS_WANT_CORRUPTED_GOTO at line 1600 of file fs/xfs/libxfs/xfs_alloc.c.  Caller xfs_free_extent+0x118/0x160 [xfs]
2015-10-20 03:18:19: [   21.151892] CPU: 1 PID: 981 Comm: accounts-daemon Not tainted 3.10.0-325.el7.ppc64le #1
2015-10-20 03:18:19: [   21.151937] Call Trace:
2015-10-20 03:18:19: [   21.151982] [c000000006ca3690] [c000000000017760] show_stack+0x80/0x330 (unreliable)
2015-10-20 03:18:19: [   21.152078] [c000000006ca3740] [c00000000095ea88] dump_stack+0x30/0x44
2015-10-20 03:18:19: [   21.152143] [c000000006ca3760] [d0000000016c5c0c] xfs_error_report+0x5c/0x70 [xfs]
2015-10-20 03:18:19: [   21.152206] [c000000006ca37c0] [d00000000166be08] xfs_free_ag_extent+0x558/0xa80 [xfs]
2015-10-20 03:18:19: [   21.152270] [c000000006ca3880] [d00000000166fba8] xfs_free_extent+0x118/0x160 [xfs]
2015-10-20 03:18:19: [   21.152335] [c000000006ca3930] [d0000000016bd114] xfs_bmap_finish+0x1c4/0x230 [xfs]
2015-10-20 03:18:19: [   21.152402] [c000000006ca3990] [d0000000016dbce8] xfs_itruncate_extents+0x1b8/0x360 [xfs]
2015-10-20 03:18:19: [   21.152468] [c000000006ca3a90] [d0000000016dbf50] xfs_inactive_truncate+0xc0/0x2d0 [xfs]
2015-10-20 03:18:19: [   21.152534] [c000000006ca3af0] [d0000000016dc508] xfs_inactive+0x178/0x1a0 [xfs]
2015-10-20 03:18:19: [   21.152600] [c000000006ca3b20] [d0000000016e3c94] xfs_fs_evict_inode+0x94/0x110 [xfs]
2015-10-20 03:18:19: [   21.152659] [c000000006ca3b60] [c00000000032b5d0] evict+0xf0/0x350
2015-10-20 03:18:19: [   21.152705] [c000000006ca3ba0] [c000000000321e58] d_kill+0x148/0x230
2015-10-20 03:18:19: [   21.152750] [c000000006ca3be0] [c000000000322db0] dput+0x280/0x2b0
2015-10-20 03:18:19: [   21.152796] [c000000006ca3c30] [c00000000031b314] SyS_renameat2+0x6f4/0x7b0
2015-10-20 03:18:19: [   21.152845] [c000000006ca3e10] [c00000000031b440] SyS_rename+0x30/0x50
2015-10-20 03:18:19: [   21.152895] [c000000006ca3e30] [c00000000000a17c] system_call+0x38/0xb4
2015-10-20 03:18:19: [   21.152945] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 135 of file fs/xfs/xfs_bmap_util.c.  Return address = 0xd0000000016bd15c
2015-10-20 03:18:19: [   21.168398] XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
2015-10-20 03:18:19: [   21.168471] XFS (dm-0): Please umount the filesystem and rectify the problem(s)


Is this a possible similar issue with this one or might be a new one?  Currently it's easy to reproduce in my environment with the following version:

kernel-3.10.0-325.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7.ppc64le

I will attach a full guest console log.

Thanks,
Qunfang
Comment 6 Qunfang Zhang 2015-10-20 23:32 EDT
Created attachment 1084961 [details]
Guest console log
Comment 7 David Gibson 2015-10-21 00:15:55 EDT
Qunfang,

My best guess would be tha the XFS problem is a different bug, but it's possible it's related to this one.

Are you able to reproduce the XFS problem with a newly installed guest, or only with an existing guest image?  If it's the second, it's possible your image has been corrupted, rather than it being a problem with the actual guest code.
Comment 8 Qunfang Zhang 2015-10-21 00:32:19 EDT
(In reply to David Gibson from comment #7)
> Qunfang,
> 
> My best guess would be tha the XFS problem is a different bug, but it's
> possible it's related to this one.
> 
> Are you able to reproduce the XFS problem with a newly installed guest, or
> only with an existing guest image?  If it's the second, it's possible your
> image has been corrupted, rather than it being a problem with the actual
> guest code.

David

It's reproduced on a newly installed guest. I was running a test loop to test bug bug 1269467 yesterday, first it installed a rhel7.2 guest and then update kernel to -325, and then it implemented a simple "boot" case. This issue happened at that moment.  Could I file a new bz to track it in case it's different issue? 

Thanks,
Qunfang
Comment 9 Laurent Vivier 2015-10-21 14:46:42 EDT
The cause of the VM crash is the init process exited:

[20447.002146] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

It seems the init process, systemd, exited because it was not able to manage a corrupted journal file:

[20446.536831] systemd-journald[32174]: /run/log/journal/f500732e88ed4f15b18f051ba27149e9/system.journal: Journal file corrupted, rotating.

But it also seems swap is totally unavailable:

[20446.533786] Read-error on swap-device (8:0:1057792)

So some userspace processes also crash:

[20446.296139] systemd[1]: bad frame in setup_rt_frame: 00003fffe037c800 nip 000
03fff7f6c478c lr 0000000058c68150
[20446.527130] screen[1214]: bad frame in setup_rt_frame: 00003fffc72dfd00 nip 00000000433f603c lr 00000000433f603c
[20446.533913] make[371]: bad frame in setup_rt_frame: 00003fffc8241820 nip 00003fff981109ac lr 000000001001563c

The root cause is the disk has been offlined:

[20446.251804] sd 0:0:0:0: [sda] Medium access timeout failure. Offlining disk!

The kernel tried to reset the disk, with no result:

[20446.251881] sd 0:0:0:0: resetting device. lun 0x8000000000000000
[20446.251921] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_ABORT driverbyte=DR
IVER_OK
[20446.251964] sd 0:0:0:0: [sda] CDB: Write(10) 2a 00 01 04 84 38 00 04 00 00
[20446.252008] blk_update_request: I/O error, dev sda, sector 17073208

So we can guess the problem is at the qemu level, but connecting gdb to the qemu process show us that nothing is hanging:

  Id   Target Id         Frame 
  6    Thread 0x3fffab81eaf0 (LWP 26018) "qemu-kvm" syscall ()
    at ../sysdeps/unix/sysv/linux/powerpc/syscall.S:29
  5    Thread 0x3fffaa7eeaf0 (LWP 26026) "qemu-kvm" 0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690c30, mutex=0x2859cbb8 <qemu_global_mutex>)
    at pthread_cond_wait.c:155
  4    Thread 0x3fffa9fceaf0 (LWP 26027) "qemu-kvm" 0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690cc0, mutex=0x2859cbb8 <qemu_global_mutex>)
    at pthread_cond_wait.c:155
  3    Thread 0x3fffa97aeaf0 (LWP 26028) "qemu-kvm" 0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690db0, mutex=0x2859cbb8 <qemu_global_mutex>)
    at pthread_cond_wait.c:155
  2    Thread 0x3fffa8f8eaf0 (LWP 26029) "qemu-kvm" 0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690f00, mutex=0x2859cbb8 <qemu_global_mutex>)
    at pthread_cond_wait.c:155
* 1    Thread 0x3fffab825400 (LWP 26008) "qemu-kvm" 0x00003fffad248194 in __GI_ppoll (fds=0x10015761a00, nfds=<optimized out>, timeout=<optimized out>, 
    sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/ppoll.c:56

All threads are ready and waiting for events...

Qemu monitor can give us some information:

info block -v
drive-scsi0-0-0-0: /scratch/bz1256718g16.img (qcow2)
    Cache mode:       writeback
    Backing file:     /var/lib/libvirt/images/bz1256718.img (chain depth: 1)

Images:
image: /scratch/bz1256718g16.img
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 7.7G
cluster_size: 65536
backing file: /var/lib/libvirt/images/bz1256718.img
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
image: /var/lib/libvirt/images/bz1256718.img
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

info block-jobs
No active jobs

info blockstats
drive-scsi0-0-0-0: rd_bytes=2017307136 wr_bytes=33990882304 rd_operations=32470 wr_operations=389920 flush_operations=37547 wr_total_time_ns=132269016401546 rd_total_time_ns=4856657752054 flush_total_time_ns=14766355215433 rd_merged=0 wr_merged=0

But there is no obvious reason of the disk timeout.
Comment 10 David Gibson 2015-10-22 00:33:07 EDT
It looks to me like the disk has stopped responding to requests completely.  I think that would explain all the several guest side symptoms mentioned in comment 9.

I think attaching gdb is the right place to start.  Because of all the VMs still running there everything is slow, though.

So, I took a core from the live qemu process using gcore(1), which I've transferred over to ibm-p8-virt-01:~dwg/bz1274509/core.26008

I've installed the debuginfos there as well, and having a look.

All thread backtraces:

(gdb) thread 1
[Switching to thread 1 (Thread 0x3fffab81eaf0 (LWP 26018))]
#0  syscall () at ../sysdeps/unix/sysv/linux/powerpc/syscall.S:29
29		PSEUDO_RET
(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/powerpc/syscall.S:29
#1  0x0000000028381aec in futex_wait (val=4294967295, ev=<optimized out>) at util/qemu-thread-posix.c:301
#2  qemu_event_wait (ev=<optimized out>) at util/qemu-thread-posix.c:399
#3  0x000000002839638c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:233
#4  0x00003fffae108728 in start_thread (arg=0x3fffab81eaf0) at pthread_create.c:310
#5  0x00003fffad257ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

(gdb) thread 2
[Switching to thread 2 (Thread 0x3fffaa7eeaf0 (LWP 26026))]
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690c30, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
155	      lll_futex_wait (&cond->__data.__futex, futex_val, pshared);
(gdb) bt
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690c30, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
#1  0x00000000283815d8 in qemu_cond_wait (cond=<optimized out>, mutex=<optimized out>) at util/qemu-thread-posix.c:132
#2  0x0000000028011a1c in qemu_kvm_wait_io_event (cpu=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:912
#3  qemu_kvm_cpu_thread_fn (arg=0x10016d50000) at /usr/src/debug/qemu-2.3.0/cpus.c:949
#4  0x00003fffae108728 in start_thread (arg=0x3fffaa7eeaf0) at pthread_create.c:310
#5  0x00003fffad257ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

(gdb) thread 3
[Switching to thread 3 (Thread 0x3fffa9fceaf0 (LWP 26027))]
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690cc0, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
155	      lll_futex_wait (&cond->__data.__futex, futex_val, pshared);
(gdb) bt
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690cc0, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
#1  0x00000000283815d8 in qemu_cond_wait (cond=<optimized out>, mutex=<optimized out>) at util/qemu-thread-posix.c:132
#2  0x0000000028011a1c in qemu_kvm_wait_io_event (cpu=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:912
#3  qemu_kvm_cpu_thread_fn (arg=0x10016d78000) at /usr/src/debug/qemu-2.3.0/cpus.c:949
#4  0x00003fffae108728 in start_thread (arg=0x3fffa9fceaf0) at pthread_create.c:310
#5  0x00003fffad257ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

(gdb) thread 4
[Switching to thread 4 (Thread 0x3fffa97aeaf0 (LWP 26028))]
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690db0, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
155	      lll_futex_wait (&cond->__data.__futex, futex_val, pshared);
(gdb) bt
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690db0, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
#1  0x00000000283815d8 in qemu_cond_wait (cond=<optimized out>, mutex=<optimized out>) at util/qemu-thread-posix.c:132
#2  0x0000000028011a1c in qemu_kvm_wait_io_event (cpu=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:912
#3  qemu_kvm_cpu_thread_fn (arg=0x10016da0000) at /usr/src/debug/qemu-2.3.0/cpus.c:949
#4  0x00003fffae108728 in start_thread (arg=0x3fffa97aeaf0) at pthread_create.c:310
#5  0x00003fffad257ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

(gdb) thread 5
[Switching to thread 5 (Thread 0x3fffa8f8eaf0 (LWP 26029))]
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690f00, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
155	      lll_futex_wait (&cond->__data.__futex, futex_val, pshared);
(gdb) bt
#0  0x00003fffae10dd50 in __pthread_cond_wait (cond=0x10015690f00, mutex=0x2859cbb8 <qemu_global_mutex>) at pthread_cond_wait.c:155
#1  0x00000000283815d8 in qemu_cond_wait (cond=<optimized out>, mutex=<optimized out>) at util/qemu-thread-posix.c:132
#2  0x0000000028011a1c in qemu_kvm_wait_io_event (cpu=<optimized out>) at /usr/src/debug/qemu-2.3.0/cpus.c:912
#3  qemu_kvm_cpu_thread_fn (arg=0x10016dc8000) at /usr/src/debug/qemu-2.3.0/cpus.c:949
#4  0x00003fffae108728 in start_thread (arg=0x3fffa8f8eaf0) at pthread_create.c:310
#5  0x00003fffad257ae0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:109

(gdb) thread 6
[Switching to thread 6 (Thread 0x3fffab825400 (LWP 26008))]
#0  0x00003fffad248194 in __GI_ppoll (fds=0x10015761a00, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>)
    at ../sysdeps/unix/sysv/linux/ppoll.c:56
56	      result = INLINE_SYSCALL (ppoll, 5, fds, nfds, timeout, sigmask,
(gdb) bt
#0  0x00003fffad248194 in __GI_ppoll (fds=0x10015761a00, nfds=<optimized out>, timeout=<optimized out>, sigmask=<optimized out>)
    at ../sysdeps/unix/sysv/linux/ppoll.c:56
#1  0x00000000282ebeec in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu-timer.c:310
#3  0x00000000282eafcc in os_host_main_loop_wait (timeout=-1) at main-loop.c:248
#4  main_loop_wait (nonblocking=<optimized out>) at main-loop.c:503
#5  0x0000000027fcc630 in main_loop () at vl.c:1818
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4394


Looks like threads 2, 3, 4, 5 are the vcpu threads, so 1 & 6 are likely to be the ones handling IO.

I think we'll need to try to look into the spapr-vscsi internal data structures to see if we can spot something awry.
Comment 11 David Gibson 2015-10-22 00:39:41 EDT
Thought of another approach, to get some info via the qemu hmp monitor.  I'm still taking a rather scattershot approach at the moment, in the hopes that some thread to tug shows up.

ibm-p8-virt-02:~$ sudo virsh qemu-monitor-command --hmp bz1256718g16 info block
drive-scsi0-0-0-0: /scratch/bz1256718g16.img (qcow2)
    Cache mode:       writeback
    Backing file:     /var/lib/libvirt/images/bz1256718.img (chain depth: 1)

ibm-p8-virt-02:~$ sudo virsh qemu-monitor-command --hmp bz1256718g16 info qtree
bus: main-system-bus
  type System
  dev: spapr-pci-host-bridge, id ""
    index = 0 (0x0)
    buid = 576460752840294400 (0x800000020000000)
    liobn = 2147483648 (0x80000000)
    mem_win_addr = 1102195982336 (0x100a0000000)
    mem_win_size = 66571993088 (0xf80000000)
    io_win_addr = 1101659111424 (0x10080000000)
    io_win_size = 65536 (0x10000)
    dynamic-reconfiguration = true
    dma_win_addr = 0 (0x0)
    dma_win_size = 1073741824 (0x40000000)
    bus: pci.0
      type PCI
      dev: virtio-balloon-pci, id "balloon0"
        class = 255 (0xff)
        virtio-pci-bus-master-bug-migration = false
        disable-legacy = false
        disable-modern = true
        addr = 03.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        class Class 00ff, addr 00:03.0, pci id 1af4:1002 (sub 1af4:0005)
        bar 0: i/o at 0x20 [0x3f]
        bus: virtio-bus
          type virtio-pci-bus
          dev: virtio-balloon-device, id ""
            indirect_desc = true
            event_idx = true
            notify_on_empty = true
            any_layout = true
      dev: virtio-net-pci, id "net0"
        ioeventfd = false
        vectors = 3 (0x3)
        virtio-pci-bus-master-bug-migration = false
        disable-legacy = false
        disable-modern = true
        addr = 01.0
        romfile = "pxe-virtio.rom"
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        class Ethernet controller, addr 00:01.0, pci id 1af4:1000 (sub 1af4:0001)
        bar 0: i/o at 0x40 [0x5f]
        bar 1: mem at 0xc0000000 [0xc0000fff]
        bar 6: mem at 0xffffffffffffffff [0x3fffe]
        bus: virtio-bus
          type virtio-pci-bus
          dev: virtio-net-device, id ""
            csum = true
            guest_csum = true
            gso = true
            guest_tso4 = true
            guest_tso6 = true
            guest_ecn = true
            guest_ufo = true
            guest_announce = true
            host_tso4 = true
            host_tso6 = true
            host_ecn = true
            host_ufo = true
            mrg_rxbuf = true
            status = true
            ctrl_vq = true
            ctrl_rx = true
            ctrl_vlan = true
            ctrl_rx_extra = true
            ctrl_mac_addr = true
            ctrl_guest_offloads = true
            mq = false
            mac = "52:58:fa:ce:85:16"
            vlan = <null>
            netdev = "hostnet0"
            x-txtimer = 150000 (0x249f0)
            x-txburst = 256 (0x100)
            tx = ""
            indirect_desc = true
            event_idx = true
            notify_on_empty = true
            any_layout = true
      dev: pci-ohci, id ""
        masterbus = ""
        num-ports = 3 (0x3)
        firstport = 0 (0x0)
        addr = 00.0
        romfile = ""
        rombar = 1 (0x1)
        multifunction = false
        command_serr_enable = true
        class USB controller, addr 00:00.0, pci id 106b:003f (sub 1af4:1100)
        bar 0: mem at 0xc0080000 [0xc00800ff]
        bus: usb-bus.0
          type usb-bus
  dev: spapr-vio-bridge, id ""
    bus: spapr-vio
      type spapr-vio-bus
      dev: spapr-vty, id "vty@30000000"
        reg = 805306368 (0x30000000)
        chardev = "charserial0"
        irq = 4103 (0x1007)
      dev: spapr-vscsi, id "scsi0"
        reg = 8192 (0x2000)
        irq = 4102 (0x1006)
        bus: scsi0.0
          type SCSI
          dev: scsi-hd, id "scsi0-0-0-0"
            drive = "drive-scsi0-0-0-0"
            logical_block_size = 512 (0x200)
            physical_block_size = 512 (0x200)
            min_io_size = 0 (0x0)
            opt_io_size = 0 (0x0)
            discard_granularity = 4096 (0x1000)
            ver = "2.3.0"
            serial = ""
            vendor = "QEMU"
            product = "QEMU HARDDISK"
            removable = false
            dpofua = false
            wwn = 0 (0x0)
            port_wwn = 0 (0x0)
            port_index = 0 (0x0)
            max_unmap_size = 1073741824 (0x40000000)
            max_io_size = 2147483647 (0x7fffffff)
            cyls = 16383 (0x3fff)
            heads = 16 (0x10)
            secs = 63 (0x3f)
            channel = 0 (0x0)
            scsi-id = 0 (0x0)
            lun = 0 (0x0)
      dev: spapr-nvram, id "nvram@71000000"
        reg = 1895825408 (0x71000000)
        drive = ""
        irq = 4097 (0x1001)
  dev: spapr-rtc, id ""
  dev: xics-kvm, id ""

ibm-p8-virt-02:~$ sudo virsh qemu-monitor-command --hmp bz1256718g16 info block-jobs
No active jobs
Comment 12 David Gibson 2015-10-22 00:50:19 EDT
Took a look in the logs in /var/log/libvirt/qemu/*

There doesn't look to be anything in bz1256718g16.log except the vscsi_process_tsk_mgmt messages Thomas already mentioned.

I cross checked those in the other logfiles, and they appear in many, so initially that seemed to confirm my guess that the message was unrelated to the problems.

BUT then I noticed that most of the logs include "vscsi_process_tsk_mgmt 01".  Only two of the logs include "vscsi_process_tsk_mgmt 08": bz1256718g16.log and bz12567g23.log.

g23 doesn't show as crashed, but when I went to the console, it's spewing 
        [163473.344034] XFS (sda5): xfs_log_force: error -5 returned.



So it seems that "vscsi_process_tsk_mgmt 08" - which is the last message in the logs for both g16 and g23, is an indication that something has gone badly wrong.


But, actually, that doesn't seem so illuminating as I hoped.  Looks like that code is just the code for the operation that vscsi_process_tsk_mgmt() is being asked to perform.  08 is:
    SRP_TSK_LUN_RESET      = 0x08,

So chances are that is just the signs that the guest has already hit problems and is trying to recover by resetting the LUN.  The actual problem in qemu probably occurred before that.
Comment 13 David Gibson 2015-10-22 00:52:17 EDT
Laurent,

I'm out of time for today.  Does any of the poking about above give you some things to try?

The current state of the qemu threads probably isn't very useful, as you've noted.  I think the next place to look is in the spapr-vscsi internal data structures.  Unfortunately navigating to them from globals via the QOM heirarchy labyrinth will probably be pretty tricky :/.
Comment 14 David Gibson 2015-10-22 01:01:32 EDT
Qunfang, regarding comment 5.

My initial guess was the XFS problem was unrelated, but given the new evidence of bz12567g23, I now strongly suspect these problems are related.

Do your XFS problems on appear when using spapr-vscsi, or do they also appear using virtio-blk and/or virtio-scsi?
Comment 15 Laurent Vivier 2015-10-22 06:36:17 EDT
I've tried to read data from the disk and it works, so the disk image is ok:

qemu-monitor-command --hmp bz1256718g16 "qemu-io drive-scsi0-0-0-0 \"read -v 0x0000 0x10000\""
Comment 16 Laurent Vivier 2015-10-22 08:34:20 EDT
(In reply to David Gibson from comment #14)
> Qunfang, regarding comment 5.
> 
> My initial guess was the XFS problem was unrelated, but given the new
> evidence of bz12567g23, I now strongly suspect these problems are related.
> 
> Do your XFS problems on appear when using spapr-vscsi, or do they also
> appear using virtio-blk and/or virtio-scsi?

According to the logs, the disk seems to be on an USB bus.
Comment 17 David Gibson 2015-10-22 20:14:54 EDT
Laurent, re comment 16.

Sorry, which logs do you mean indicate a disk on USB?
Comment 18 David Gibson 2015-10-22 20:17:11 EDT
Qunfang,

Are you able to reproduce the XFS problems with *guest* kernel versiojn -324 (or earlier?  I'm trying to work out if the XFS patches in -325 could be connected to this.
Comment 19 Qunfang Zhang 2015-10-22 21:19:00 EDT
(In reply to David Gibson from comment #18)
> Qunfang,
> 
> Are you able to reproduce the XFS problems with *guest* kernel versiojn -324
> (or earlier?  I'm trying to work out if the XFS patches in -325 could be
> connected to this.

When I hit the bug in the first place, it could be reproduced for both -324 and -325 *guest kernel* on a *host* with -325 kernel.  

For the comment 4 question, I will test it today since I did not get the host yesterday.
Comment 20 David Gibson 2015-10-23 00:03:06 EDT
Qunfang,

Thanks for the extra info in comment 19.  To clarify a different case, have you ever seen the problem with a *host* kernel earlier than that -325?
Comment 21 Qunfang Zhang 2015-10-23 04:19:55 EDT
(In reply to David Gibson from comment #20)
> Qunfang,
> 
> Thanks for the extra info in comment 19.  To clarify a different case, have
> you ever seen the problem with a *host* kernel earlier than that -325?

Hi, David and Thomas

It's proved that this is an autotest script issue, two or multiple qemu-kvm process start to boot a same guest image...  After revert that patch, the issue does not exist. Sorry for the trouble taking to you.

Qunfang
Comment 22 Qunfang Zhang 2015-10-23 04:22:02 EDT
(In reply to Qunfang Zhang from comment #21)
> (In reply to David Gibson from comment #20)
> > Qunfang,
> > 
> > Thanks for the extra info in comment 19.  To clarify a different case, have
> > you ever seen the problem with a *host* kernel earlier than that -325?
> 
> Hi, David and Thomas
> 
> It's proved that this is an autotest script issue, two or multiple qemu-kvm
> process start to boot a same guest image...  After revert that patch, the
> issue does not exist. Sorry for the trouble taking to you.

It is caused by a buggy autotest patch.

> 
> Qunfang
Comment 23 Laurent Vivier 2015-10-23 08:46:41 EDT
(In reply to David Gibson from comment #17)
> Laurent, re comment 16.
> 
> Sorry, which logs do you mean indicate a disk on USB?

Sorry David, I didn't read correctly the logs from comment 6:

2015-10-20 03:17:45: Populating /pci@800000020000000
2015-10-20 03:17:45:                      00 2800 (D) : 1af4 1000    virtio [ net ]
2015-10-20 03:17:45:                      00 2000 (D) : 1af4 1004    virtio [ scsi ]
2015-10-20 03:17:45: Populating /pci@800000020000000/scsi@4
2015-10-20 03:17:46:        SCSI: Looking for devices
2015-10-20 03:17:46:           100000000000000 DISK     : "QEMU     QEMU HARDDISK    2.3."
2015-10-20 03:17:46:                      00 1800 (D) : 106b 003f    serial bus [ usb-ohci ]
2015-10-20 03:17:46:                      00 0000 (D) : 1234 1111

The disk seems to be on virtio-scsi.
Comment 24 David Gibson 2015-10-25 20:04:44 EDT
Qunfang,

Thanks for the update.  Closing as NOTABUG.
Comment 25 David Gibson 2015-10-25 21:09:45 EDT
Sorry, got confused ther.  The XFS problems turned out to be an autotest bug, but the original spapr-vscsi problem reported by Thomas isn't.  Re-opening to track that.

Without reproducing at least a second time I doubt we'll make much progress, though.
Comment 26 David Gibson 2015-10-25 22:36:36 EDT
Ah, I think I might have an explanation for this.

I just tried restarting all the guests on ibm-p8-virt-02 to do another round of tests for bug 1269467.  Almost immediately 20 of the 30 guests went into paused state.  I tried manually resuming one and it gave errors similar to the one here.

Further investigation showed that the problem was /scratch was full, so qemu paused the guests because it needed to expand their disk images.  I've now redistributed the guests between /scratch and /var/lib/libvirt/images so there should be enough space for them.

So, I suspect that the original case was due to the same problem, but the guest images hadn't expanded quite so much yet, meaning it only hit on one VM.

Thomas, do you remember if you manually resumed a guest at any point?  Or somehow the guest might have ended up with an error mode other than enospc, so it would attempt to carry on and crash when /scratch ran out of room.

I'll leave this open a bit longer just in case we hit this again in circumstances which make it clear it's not the space problem.
Comment 27 IBM Bug Proxy 2015-10-26 10:22:29 EDT
------- Comment From hannsj_uhl@de.ibm.com 2015-10-26 14:21 EDT-------
.
Comment 28 Thomas Huth 2015-10-27 11:57:47 EDT
(In reply to David Gibson from comment #26)
> Thomas, do you remember if you manually resumed a guest at any point?  Or
> somehow the guest might have ended up with an error mode other than enospc,
> so it would attempt to carry on and crash when /scratch ran out of room.

As I mentioned it already in the description, I also ran into the problem that the partition that contains the disk images was full - and libvirt then of course put some guests into the paused state. I then manually deleted some of the guests (I initially defined 40, then deleted 10 of them) and then resumed the paused guests. But IIRC, the crash rather happened a couple of hours later (but my memory might be wrong here).

However, with the current test run that you've set up on the ibm-p8-rhevm-06 machine, two more guests crashed with this issue! And as far as I can see, the host partition with the disk images was not full this time. Both libvirt logs show the "vscsi_process_tsk_mgmt 08" again. The kernel output on guest 36 starts a little bit different this time:

[ 2026.139285] ltptest gf01 start
[ 2160.410767] INFO: task kworker/u8:2:222 blocked for more than 120 seconds.
[ 2217.030631] INFO: rcu_sched detected stalls on CPUs/tasks: { 2} (detected by 0, t=6099 jiffies, g=12594, c=12593, q=0)
[ 2217.030632] Task dump for CPU 2:
[ 2217.030635] kworker/u8:0    R  running task        0  5610      2 0x00000880
[ 2217.030682] Workqueue: scsi_tmf_0 scmd_eh_abort_handler
[ 2217.030683] Call Trace:
[ 2217.031301] [c0000000038336b0] [c0000000038336c0] 0xc0000000038336c0 (unreliable)
[ 2217.031303] [c000000003833880] [c00000000320aec0] 0xc00000000320aec0
[ 2217.031305] [c0000000038338e0] [00000000000000ee] 0xee
[ 2387.197167] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc0000000032091a8
[ 2387.197177] sd 0:0:0:0: successfully aborted task tag 0xc0000000032091a8
[ 2387.200353] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2387.200398] kworker/u8:2    D 0000000000000000     0   222      2 0x00000800
[ 2387.200465] Workqueue: writeback bdi_writeback_workfn (flush-8:0)
[ 2387.200509] Call Trace:
[ 2387.200551] [c0000000038fae00] [0000000000000008] 0x8 (unreliable)
[ 2387.200606] [c0000000038fafd0] [c0000000000164f8] __switch_to+0x268/0x450
[ 2387.200658] [c0000000038fb030] [c0000000009425b4] __schedule+0x324/0xa40
[ 2387.200699] [c0000000038fb100] [c00000000093eba8] schedule_timeout+0x388/0x460
[ 2387.200752] [c0000000038fb200] [c0000000009421dc] io_schedule+0xcc/0x180
[ 2387.200797] [c0000000038fb240] [c000000000460674] get_request+0x294/0x950
[ 2387.200840] [c0000000038fb380] [c000000000465364] blk_queue_bio+0xd4/0x570
[ 2387.200881] [c0000000038fb3e0] [c00000000045ff48] submit_bio+0x1a8/0x310
[ 2387.200954] [c0000000038fb490] [d000000000c79514] xfs_submit_ioend_bio.isra.9+0x54/0xa0 [xfs]
[ 2387.201019] [c0000000038fb4b0] [d000000000c79690] xfs_submit_ioend+0x130/0x190 [xfs]
[ 2387.201083] [c0000000038fb510] [d000000000c79994] xfs_vm_writepage+0x2a4/0x650 [xfs]
[ 2387.201141] [c0000000038fb5f0] [c00000000024efa8] write_cache_pages.constprop.10+0x1e8/0x510
[ 2387.201192] [c0000000038fb730] [c00000000024f320] generic_writepages+0x50/0x80
[ 2387.201258] [c0000000038fb7b0] [d000000000c77e38] xfs_vm_writepages+0x68/0xa0 [xfs]
[ 2387.201363] [c0000000038fb7f0] [c000000000250e10] do_writepages+0x50/0xe0
[ 2387.201407] [c0000000038fb860] [c000000000345ac0] __writeback_single_inode+0x60/0x340
[ 2387.201456] [c0000000038fb8b0] [c000000000347748] writeback_sb_inodes+0x358/0x660
[ 2387.201506] [c0000000038fb9c0] [c000000000347b84] __writeback_inodes_wb+0x134/0x1a0
[ 2387.201552] [c0000000038fba30] [c000000000347f6c] wb_writeback+0x37c/0x4d0
[ 2387.201592] [c0000000038fbb10] [c0000000003488bc] bdi_writeback_workfn+0x39c/0x5e0
[ 2387.201640] [c0000000038fbc50] [c000000000103cec] process_one_work+0x1dc/0x680
[ 2387.201687] [c0000000038fbcf0] [c000000000104310] worker_thread+0x180/0x500
[ 2387.201741] [c0000000038fbd80] [c00000000011061c] kthread+0xec/0x100
[ 2387.241720] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000000320af28
[ 2387.241784] sd 0:0:0:0: successfully aborted task tag 0xc00000000320af28
[ 2387.241882] sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc00000000320b2d8
[ 2387.241883] sd 0:0:0:0: aborted task tag 0xc00000000320b2d8 completed
[ 2387.241955] sd 0:0:0:0: [sda] Medium access timeout failure. Offlining disk!
[ 2387.242031] sd 0:0:0:0: resetting device. lun 0x8000000000000000
[ 2387.242039] sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
[ 2387.242042] sd 0:0:0:0: [sda] CDB: Read(10) 28 00 01 71 17 10 00 00 90 00
[ 2387.242045] blk_update_request: I/O error, dev sda, sector 24188688
[ 2387.242062] sd 0:0:0:0: rejecting I/O to offline device

The kernel output on the other crashed guest (guest 1) starts immediately with the "sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc000000003382890" again.
Comment 29 David Gibson 2015-10-27 20:27:45 EDT
Drat, yes those two crashes do look like the same problem, and we're definitely not out of disk space this time.  Looks like this is our new top-priority bug.
Comment 30 David Gibson 2015-10-27 23:30:10 EDT
I connected gdb to the qemu for guest bz1256718g36 on ibm-p8-rhevm-06.lab4.eng.bos.redhat.com.  Using gdb's nifty ability to call functions in the target (qdev_get_machine()), I traced my way through to locating the VSCSIState structure for the dead disk:

(gdb) p ((VSCSIState *)0x10014dd8000)->vdev
$58 = {
  qdev = {
    parent_obj = {
      class = 0x10014c83e00, 
      free = 0x3fff99f1abb0 <g_free>, 
      properties = {
        tqh_first = 0x100164d3a70, 
        tqh_last = 0x100164d4a50
      }, 
      ref = 4, 
      parent = 0x10014c50b70
    }, 
    id = 0x10014c70498 "scsi0", 
    realized = true, 
    pending_deleted_event = false, 
    opts = 0x10014d102d0, 
    hotplugged = 0, 
    parent_bus = 0x10014ca4890, 
    gpios = {
      lh_first = 0x0
    }, 
    child_bus = {
      lh_first = 0x10014dd8380
    }, 
    num_child_bus = 1, 
    instance_id_alias = -1, 
    alias_required_for_version = 0
  }, 
  reg = 8192, 
  irq = 4102, 
  signal_state = 1, 
  crq = {
    qladdr = 201326592, 
    qsize = 65536, 
    qnext = 51360, 
    SendFunc = 0x268ac3e0 <vscsi_do_crq>
  }, 
  as = {
    rcu = {
      next = 0x0, 
      func = 0x0
    }, 
    name = 0x10014c7f288 "scsi0", 
    root = 0x10014dd8188, 
    current_map = 0x10014c52190, 
    ioeventfd_nb = 0, 
    ioeventfds = 0x0, 
    dispatch = 0x100164db310, 
    next_dispatch = 0x100164db310, 
    dispatch_listener = {
      begin = 0x26821070 <mem_begin>, 
      commit = 0x26820960 <mem_commit>, 
      region_add = 0x26821600 <mem_add>, 
      region_del = 0x0, 
      region_nop = 0x26821600 <mem_add>, 
      log_start = 0x0, 
      log_stop = 0x0, 
      log_sync = 0x0, 
      log_global_start = 0x0, 
      log_global_stop = 0x0, 
      eventfd_add = 0x0, 
      eventfd_del = 0x0, 
      coalesced_mmio_add = 0x0, 
      coalesced_mmio_del = 0x0, 
      priority = 0, 
      address_space_filter = 0x10014dd80a0, 
      link = {
        tqe_next = 0x10016420250, 
        tqe_prev = 0x10014fe06d0
      }
    }, 
    address_spaces_link = {
      tqe_next = 0x10016420208, 
      tqe_prev = 0x10014fe06e0
    }
  }, 
  mrroot = {
    parent_obj = {
      class = 0x10014e501e0, 
      free = 0x0, 
      properties = {
        tqh_first = 0x100164d3d40, 
        tqh_last = 0x100164d3f10
      }, 
      ref = 1, 
      parent = 0x10014dd8000
    }, 
    ops = 0x26d0cfa0 <unassigned_mem_ops>, 
    iommu_ops = 0x0, 
    opaque = 0x0, 
    container = 0x0, 
    size = {
      lo = 2147483648, 
      hi = 0
    }, 
    addr = 0, 
    destructor = 0x2687d640 <memory_region_destructor_none>, 
    ram_addr = 0, 
    align = 0, 
    subpage = false, 
    terminates = false, 
    romd_mode = true, 
    ram = false, 
    skip_dump = false, 
    readonly = false, 
    enabled = true, 
    rom_device = false, 
    warning_printed = false, 
    flush_coalesced_mmio = false, 
    alias = 0x0, 
    alias_offset = 0, 
    priority = 0, 
    may_overlap = false, 
    subregions = {
      tqh_first = 0x10014e40420, 
      tqh_last = 0x10014dd8338
    }, 
    subregions_link = {
      tqe_next = 0x0, 
      tqe_prev = 0x0
    }, 
    coalesced = {
      tqh_first = 0x0, 
      tqh_last = 0x10014dd8250
    }, 
    name = 0x1001641c6a0 "iommu-spapr-root", 
    dirty_log_mask = 0 '\000', 
    ioeventfd_nb = 0, 
    ioeventfds = 0x0, 
    iommu_notify = {
      notifiers = {
        lh_first = 0x0
      }
    }
  }, 
  mrbypass = {
    parent_obj = {
      class = 0x10014e501e0, 
      free = 0x0, 
      properties = {
        tqh_first = 0x100164d3f70, 
        tqh_last = 0x100164d4140
      }, 
      ref = 1, 
      parent = 0x10014dd8000
    }, 
    ops = 0x26d0cfa0 <unassigned_mem_ops>, 
    iommu_ops = 0x0, 
    opaque = 0x0, 
    container = 0x10014dd8188, 
    size = {
      lo = 2147483648, 
      hi = 0
    }, 
    addr = 0, 
    destructor = 0x2687d640 <memory_region_destructor_none>, 
    ram_addr = 0, 
    align = 0, 
    subpage = false, 
    terminates = false, 
    romd_mode = true, 
    ram = false, 
    skip_dump = false, 
    readonly = false, 
    enabled = false, 
    rom_device = false, 
    warning_printed = false, 
    flush_coalesced_mmio = false, 
    alias = 0x10014c84000, 
    alias_offset = 0, 
    priority = 1, 
    may_overlap = true, 
    subregions = {
      tqh_first = 0x0, 
      tqh_last = 0x10014dd8328
    }, 
    subregions_link = {
      tqe_next = 0x0, 
      tqe_prev = 0x10014e404d8
    }, 
    coalesced = {
      tqh_first = 0x0, 
      tqh_last = 0x10014dd8348
    }, 
    name = 0x1001641c700 "iommu-spapr-bypass", 
    dirty_log_mask = 0 '\000', 
    ioeventfd_nb = 0, 
    ioeventfds = 0x0, 
    iommu_notify = {
      notifiers = {
        lh_first = 0x0
      }
    }
  }, 
  tcet = 0x10014e40380
}


(gdb) p ((VSCSIState *)0x10014dd8000)->bus
$57 = {
  qbus = {
    obj = {
      class = 0x10014ca4260, 
      free = 0x0, 
      properties = {
        tqh_first = 0x100164d2da0, 
        tqh_last = 0x100164d53b0
      }, 
      ref = 3, 
      parent = 0x10014dd8000
    }, 
    parent = 0x10014dd8000, 
    name = 0x1001641c840 "scsi0.0", 
    hotplug_handler = 0x10014dd8380, 
    max_index = 1, 
    realized = true, 
    children = {
      tqh_first = 0x1001641c8e0, 
      tqh_last = 0x1001641c8f0
    }, 
    sibling = {
      le_next = 0x0, 
      le_prev = 0x10014dd8060
    }
  }, 
  busnr = 0, 
  unit_attention = {
    key = 0 '\000', 
    asc = 0 '\000', 
    ascq = 0 '\000'
  }, 
  info = 0x26d0dbb8 <vscsi_scsi_info>
}


All the requests in the ->reqs seem to be inactive.

The current state, however, is after the guest has tried a whole pile of recovery actions, apparently to no avail.  It looks like we've hit a SCSI error, which triggers aborting of all outstanding commands, and then various levels of reset.

I think we're going to need to instrument the guest kernel and/or qemu in order to give us some information closer to the initial error state.  Which means finding time to reproduce this around the testing for bug 1269467.
Comment 31 David Gibson 2015-10-27 23:41:01 EDT
IBM,

Could you try to reproduce this too?  Can't be sure at this stage, but so far this smells like a bug which would be upstream (and therefore in PowerKVM) as well.

Current reproducer, such as it is:
    1. Start 30-40 guests all running the LTP testsuite in a loop, from a spapr-vscsi virtual disk.
    2. Put the host under memory pressure
    3. Wait several days
Comment 32 Zhengtong 2015-10-29 00:23:00 EDT
Hi David,

I am trying to reproduce this bug. I have some questions about your reproducer.

1. When you were reproducing, what was the memory size of your guests? Will 30-40 guests cause memory overcommit on the host?
2. You tested with a spapr-vscsi virtual disk, right? I mean the OS disk.
Comment 33 David Gibson 2015-10-29 00:43:26 EDT
Zhengtong,

So far we've seen this bug while trying to reproduce bug 1269467 - the steps from comment 31 are based on that.  To answer your questions:
    1. The guests we've been using so far which have shown this have 2G of RAM each.  It looks like this will not, on its own, overcommit host memory, but we've also been using Thomas' "malloc bomb" program from bug 1269467 to force the system into memory pressure.

    2. That's correct.  So far, it looks like this bug is specific to the spapr-vscsi device.  The guests just have a single 10G (logical size) system disk on spapr-vscsi, backed by qcow2.  I believe many of the guest disk images are sharing a qcow2 backing file, but I don't know if that is relevant.
Comment 35 David Gibson 2015-10-29 06:19:09 EDT
Thoughts on debugging instrumentation.

 * Making qemu pause the guest when it issues the "vscsi_process_tsk_mgmt 08" message might be a good idea to avoid having the guest continue to spin its wheels fruitlessly.

 * On the guest side, about the furthest I managed to trace the problem back by code inspection was scsi_unjam_host().  Putting a dump_stack() in there to see where it's being called from might be a good start.

 * Again in the guest, putting some printk()s when Scsi_host->host_failed (IIUC this is the number of failed commands on the scsi controller) is incremented is probably a good idea too.

 * From there we'll want to instrument qemu to see when it is failing a command and work backwards towards the root cause.

Since the reproducer is so difficult, working back from that will be pretty painful.

Another approach that could be attempted in parallel:
  * When the "vscsi_process_tsk_mgmt 08" message is printed, set a flag in the driver to turn on verbose logging.
  * Once the flag is on, print some messages on every request; this might shed some light on why the guest's attempts to reset the vscsi controller aren't improving the situation at all.
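The flag-based approach could look roughly like this. This is a minimal stand-alone sketch of the idea, not actual ibmvscsi driver code; the names vscsi_debug_verbose, vscsi_log() and vscsi_on_lun_reset() are made up for illustration:

```c
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical flag: stays off until the first LUN reset is observed. */
static int vscsi_debug_verbose = 0;

/* Print debug messages only once verbose mode has been armed. */
static void vscsi_log(const char *fmt, ...)
{
    va_list ap;

    if (!vscsi_debug_verbose)
        return;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Would be called from the task-management path when the
 * "vscsi_process_tsk_mgmt 08" (LUN reset) case is hit. */
static void vscsi_on_lun_reset(void)
{
    vscsi_debug_verbose = 1;
    vscsi_log("vscsi: LUN reset seen, verbose request logging enabled\n");
}
```

After the flag is armed, every request path would call vscsi_log() with its tag and status, so the log shows what the controller does during the (apparently futile) reset attempts.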
Comment 36 Laurent Vivier 2015-10-29 15:45:49 EDT
(In reply to David Gibson from comment #35)
> Thoughts on debugging instrumentation.
> 
>  * Making qemu pause the guest when it issues the "vscsi_process_tsk_mgmt
> 08" message might be a good idea to avoid having the guest continue to spin
> its wheels fruitlessly.

As "vscsi_process_tsk_mgmt 08" is a SRP_TSK_LUN_RESET, perhaps we can consider the reset as the cause and not the consequence of the problem.

According to the logs, the following functions are called to abort commands and reset the LUN:

ibmvscsi_eh_abort_handler()
ibmvscsi_eh_device_reset_handler() -> SRP_TSK_LUN_RESET

As the host is heavily loaded, and the time base of the guest is the TBR of the host, I think we can easily get some timeouts at the guest level (especially if we manually stop and start the guest, see BZ1264258).

We can also try slowing down the SCSI LUN to reproduce the problem more easily:

"scsi-disk: add -device scsi-disk, slow=on property"
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg03004.html
Comment 37 Thomas Huth 2015-10-30 10:35 EDT
Created attachment 1087944 [details]
dmesg output of the crashed guest1
Comment 38 Thomas Huth 2015-10-30 10:38 EDT
Created attachment 1087946 [details]
dmesg output of the crashed guest36
Comment 39 Thomas Huth 2015-10-30 11:31:14 EDT
(In reply to Laurent Vivier from comment #36)
[...]
> We can also try slow down the SCSI LUN to reproduce the problem easily:
> 
> "scsi-disk: add -device scsi-disk, slow=on property"
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg03004.html

Good idea ... I tried to provoke the problem manually today by pausing and resuming a guest which is doing disk accesses, but I failed to trigger this issue this way.
So I think using a patched QEMU might be a good approach instead. However, I think we need a slightly different patch. Looking at the dmesg output, there is a "Medium access timeout failure. Offlining disk!" message at the point in time when the real trouble starts. In the kernel sources, this is printed by sd_eh_action() in drivers/scsi/sd.c, which features this comment:

	/*
	 * The device has timed out executing a medium access command.
	 * However, the TEST UNIT READY command sent during error
	 * handling completed successfully. Either the device is in the
	 * process of recovering or has it suffered an internal failure
	 * that prevents access to the storage medium.
	 */
	sdkp->medium_access_timed_out++;

	/*
	 * If the device keeps failing read/write commands but TEST UNIT
	 * READY always completes successfully we assume that medium
	 * access is no longer possible and take the device offline.
	 */
	if (sdkp->medium_access_timed_out >= sdkp->max_medium_access_timeouts) {
		scmd_printk(KERN_ERR, scmd,
			    "Medium access timeout failure. Offlining disk!\n");
		scsi_device_set_state(scmd->device, SDEV_OFFLINE);

		return FAILED;
	}

So to provoke this situation, we could try to patch QEMU so that it delays READ and WRITE commands on request, but not other commands like TEST UNIT READY. The patch you suggested seems to do it just the other way round ... but I hope it can be adapted quite easily...
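The selection could be keyed on the first CDB byte (the SCSI opcode). A rough sketch of the decision logic only - the opcode values are standard SCSI, but the function name is made up for illustration and is not part of the QEMU patch mentioned above:

```c
#include <stdint.h>

/* Standard SCSI opcodes (first byte of the CDB). */
#define TEST_UNIT_READY 0x00
#define READ_6          0x08
#define WRITE_6         0x0a
#define READ_10         0x28
#define WRITE_10        0x2a
#define READ_16         0x88
#define WRITE_16        0x8a

/* Return nonzero if this command should be artificially delayed:
 * delay the medium-access READs/WRITEs, but let TEST UNIT READY
 * (and everything else) complete normally, so that sd_eh_action()
 * sees exactly the "medium access timed out, but TUR succeeded"
 * pattern that offlines the disk. */
static int vscsi_should_delay(uint8_t opcode)
{
    switch (opcode) {
    case READ_6: case READ_10: case READ_16:
    case WRITE_6: case WRITE_10: case WRITE_16:
        return 1;
    default:
        return 0;
    }
}
```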
Comment 40 Zhengtong 2015-11-02 22:04:54 EST
Hi Thomas, 
Is there a QEMU build patched for easier reproduction now? I have been trying to reproduce it, and have always failed in the past ~70 hours.
Comment 41 Thomas Huth 2015-11-03 01:24:47 EST
(In reply to Zhengtong from comment #40)
> Hi Thomas, 
> Is there a QEMU build patched for easier reproduction now? I have been
> trying to reproduce it, and have always failed in the past ~70 hours.

I'm still working on it - unfortunately, it is not as easy as I thought. I can trigger the "sd 0:0:0:0: aborting command. lun 0x8000000000000000 ..." message in the guest with my modified QEMU, but the guest then recovers and I do not get the rest of the crash. I'll let you know when I find a better approach.
Comment 42 Thomas Huth 2015-11-03 16:25:47 EST
I haven't been able to force the very same situation with my modified QEMU yet. But looking at the kernel sources and the dmesg output, I have an (unverified) theory about what might be the problem here: The message "Medium access timeout failure. Offlining disk!" indicates that the function sd_eh_action() has been called and that it set the SDEV_OFFLINE state on the disk. Then the function ibmvscsi_eh_device_reset_handler() is called, since we can see the "resetting device. lun ..." message in the log - and I assume that the reset worked, since we did not see any of the other messages from that function in the log.
However, any further SCSI commands fail, since the SDEV_OFFLINE flag is still set for the device (as indicated by the "rejecting I/O to offline device" messages) ... so I wonder whether the ibmvscsi code should clear the SDEV_OFFLINE flag after successfully resetting the device?
Comment 43 David Gibson 2015-11-03 23:15:13 EST
Hm, I can't seem, to see other SCSI drivers clearing the SDEV_OFFLINE state that way, so that doesn't really seem like the right approach.

This still looks to me like the LUN reset and the disk offlining are symptoms, not the cause, triggered by something else going wrong causing commands to fail.
Comment 44 Zhengtong 2015-11-04 22:01:29 EST
Until now, I could only reproduce the error message in /var/log/messages after the guest rebooted.

Nov  4 02:11:54 dhcp71-185 kernel: sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc000000005804438
Nov  4 02:11:54 dhcp71-185 kernel: sd 0:0:0:0: successfully aborted task tag 0xc000000005804438
Nov  4 02:11:54 dhcp71-185 kernel: sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc0000000058049c0
Nov  4 02:11:54 dhcp71-185 kernel: sd 0:0:0:0: aborted task tag 0xc0000000058049c0 completed


Hi Thomas, 
Do you think the other error info in comment #c0 will also be printed into the message log? May I ask where you collected those logs? I only checked /var/log/messages for error messages, so maybe I missed some important place to check.

I am still running this test with nohup for the LTP process in each guest, so I can collect all the messages caused by the LTP test. Waiting for more results.
Comment 45 Thomas Huth 2015-11-05 02:20:45 EST
Zhengtong, since the disk of the guest fails with this bug, the dying kernel no longer has a chance to write the messages out to the log. So this is how I got the logs:

1. When using libvirt, make sure that the guest is not automatically rebooted when it crashes, i.e. use "<on_crash>preserve</on_crash>" in the xml config.

2. When the guest crashed ("virsh list" should say so), take a memory dump of the guest: virsh dump --memory-only --file /path/to/guestname.dump guestname

3. Make sure that you've got the "kernel-debuginfo" package installed that matches the kernel of your guest (needed for the vmlinux in the next step)

4. Start the "crash" utility:
   crash /usr/lib/debug/lib/modules/3.10.0-327.el7.ppc64le/vmlinux /path/to/guestname.dump

5. Type "dmesg" at the crash-prompt to get all the kernel messages
Comment 46 Thomas Huth 2015-11-05 03:45:07 EST
Zhengtong (and all others who are trying to reproduce this issue): It would be helpful to increase the SCSI driver logging level in each of the guests with this command (from the sg3_utils package) when starting the test:

 scsi_logging_level -s -E 3 -T 3

... that hopefully gives some more valuable debug information when the problem occurs.
Comment 47 Thomas Huth 2015-11-05 07:46:39 EST
I somehow managed to get a guest crash again yesterday evening, and this time I had some dump_stack() statements in various places of the SCSI driver code - unfortunately that was still not enough to get a full picture of what's going on. But from what I can see, after a couple of failed READs and WRITEs, sd_eh_action() in drivers/scsi/sd.c set the device to SDEV_OFFLINE (likely because the other commands like TEST UNIT READY still finished in time). Then, later, the SCSI code tries to unjam the device, finally also calling scsi_eh_ready_devs(), which then calls ibmvscsi_eh_device_reset_handler() - and the device reset succeeds! However, since nothing in the error recovery chain clears SDEV_OFFLINE anymore, further SCSI commands are rejected by the driver, and the guest finally crashes since the root disk cannot be accessed anymore.

So I can see two possibilities right now:

1) The host was really just so badly overloaded that the READs and WRITEs did not finish in time. In that case we should maybe consider adding a hack to set the device online again after the reset succeeded - or simply increase max_medium_access_timeouts to a much higher value (it's currently only 2!) - or tell customers to use virtio with dataplane instead in case they have a heavily loaded host.

2) There is still a bug in QEMU and the READ or WRITE commands sometimes get lost / do not finish. We could instrument QEMU to take time stamps for the VSCSI requests and, in case the guest finally aborts one of the commands (i.e. calls vscsi_process_tsk_mgmt() with SRP_TSK_ABORT_TASK), print out the time the request has been in flight - and maybe even stop QEMU so we can debug it further.
Comment 48 Thomas Huth 2015-11-05 16:35:07 EST
I've got another crash, now with a guest kernel that contains some more debug printks and dump_stack statements. As far as I can see, my previous comment still applies: the scsi sd driver sets SDEV_OFFLINE since some READs or WRITEs failed, but that state never gets cleared again, even though the reset succeeds afterwards.

Anyway, this is how I can make a crash happen a little more easily:

1) Run "while true; do ./swapforce ; done" again in the background (http://team.virt.bos.redhat.com/~thuth/swapforce.c)

2) Put all guest disks on a dedicated partition ("/scratch" on our virt-02 server) and generate some additional traffic on that disk. When the disk runs full, the guests get paused by libvirt; wait a little bit and then resume them:

while true ; do date ; dd if=/dev/zero of=/scratch/zero.dat bs=100M ; sleep 600 ; rm -f /scratch/zero.dat ; for ((i=50;i<=79;i++)); do virsh resume ltp-stressor-g$i; done ; sleep 320 ; done
Comment 49 Thomas Huth 2015-11-05 16:37 EST
Created attachment 1090383 [details]
Patch with the additional kernel debug print statements

Added the patch with the additional debug printks and dump_stack statements that I've used so far
Comment 50 Thomas Huth 2015-11-05 16:47 EST
Created attachment 1090400 [details]
Guest that crashed with the additional kernel printk / dump_stack statements
Comment 51 Zhengtong 2015-11-06 01:12:57 EST
(In reply to Thomas Huth from comment #48)
> I've got another crash, now with a guest kernel that contains some more
> debug printks and dump_stack statements. As far as I can see, my previous
> comment still applies, the scsi sd driver sets SDEV_OFFLINE since some READs
> or WRITEs failed, but that state gets never cleared again, even though the
> reset succeeds afterwards.
> 
> Anyway, this is how I get a crash happen a little bit easier:
> 
> 1) Run "while true; do ./swapforce ; done" again in the background
> (http://team.virt.bos.redhat.com/~thuth/swapforce.c)
> 
> 2) Put all guest disks on a dedicated partition ("/scratch" on our virt-02
> server) and make some additional traffic on that disk. When the disk runs
> full, the guests get paused by libvirt, then wait a little bit and then
> resume them:
> 
> while true ; do date ; dd if=/dev/zero of=/scratch/zero.dat bs=100M ; sleep
> 600 ; rm -f /scratch/zero.dat ; for ((i=50;i<=79;i++)); do virsh resume
> ltp-stressor-g$i; done ; sleep 320 ; done

hi Thomas, 
In these steps, does the LTP test still need to be running in the guests?
Comment 52 Thomas Huth 2015-11-06 08:18:26 EST
(In reply to Zhengtong from comment #51)
> 
> hi Thomas, 
> In these steps, does the LTP test still need to be running in the guests?

Yes, I still had LTP running in the guests ... additionally, it might be good to add some more disk traffic in the guests, so I also had a "while true ; do ls -R / ; dd if=/dev/sda of=/dev/null bs=10M ; done" loop running there. I start all of these automatically from /etc/rc.local in the guests, together with LTP, so that I only have to start the guests without needing to log in anymore. I can also provide you with my disk image if you want, just let me know.
Comment 53 IBM Bug Proxy 2015-11-06 08:23:20 EST
------- Comment From hannsj_uhl@de.ibm.com 2015-11-06 13:20 EDT-------
.
Comment 54 Zhengtong 2015-11-08 21:30:13 EST
Hi Thomas,
It would be great if you could share your image with me. I have been trying to reproduce this for more than a week, but with no progress except the "sd 0:0:0:0: aborting command. lun 0x8000000000000000, tag 0xc000000005804438" error message.

Do you have a way to share it with me, or should we get in contact on IRC?
Comment 55 Thomas Huth 2015-11-09 08:17:01 EST
Disk image has been provided ==> clearing needinfo.
Comment 56 Thomas Huth 2015-11-09 08:18:50 EST
(In reply to David Gibson from comment #30)
> I connected gdb to the qemu for guest bz1256718g36 on
> ibm-p8-rhevm-06.lab4.eng.bos.redhat.com.  Using gdb's nifty ability to call
> functions in the target (qdev_get_machine()) traced my way through to
> locating the VSCSIState structure for the dead disk

By the way, in case we need it again, I think I've found an even more convenient way to get to the VSCSIState structure in gdb (assuming there is only one vscsi disk):

print object_resolve_path_type("","spapr-vscsi",0)
Comment 57 David Gibson 2015-11-09 22:04:57 EST
Kevin,

We're hoping to get some block expertise in on this.

The problem is that under high load conditions, the details of which we haven't pinned down yet, one of the virtual disks of a VM on Power stops working.  It seems to only happen with the Power-specific spapr-vscsi (qemu side) / ibmvscsi (guest side) virtual disk device.  We're not yet sure whether the root cause is on the guest side or the host side (qemu).

Our current working theory is that it's just because spapr-vscsi doesn't perform that well, and so it's easier to trigger access timeouts on it.  That in turn triggers error handling in the guest kernel, which causes it to offline the disk; after that, nothing else works, of course.

So the questions are:
   1) Does that seem like a plausible theory
   2) The disk stays offline, even after the driver attempts a LUN reset (which is reported as successful) - is that the expected behaviour?
   3) Should the guest driver be doing anything to mark the disk back online after a reset?
Comment 58 Kevin Wolf 2015-11-10 04:53:06 EST
If a host is very loaded, it might happen that requests are completed rather
slowly. I'm not sure how likely it is that it could be slow enough that
timeouts trigger, but you could enable/add some tracepoints in qemu and then
check the timing of requests around the point where the disk failed.

In order to exclude any error conditions that come from the host, I would
recommend testing with werror=stop,rerror=stop. Then the device emulation will
only fail requests that are already invalid (i.e. the guests did something
wrong, like accessing beyond the disk size), but I/O errors would stop the VM.
The other option is that the guest decides on its own that the request failed,
while in reality they succeed, which seems to be what is happening here (the
guest driver times out).

Your questions 2 and 3 really seem to be related to the guest driver. You
should probably ask kernel block layer people about how a Linux block driver
should behave. I can't help you with that.
Comment 59 IBM Bug Proxy 2015-11-10 05:04:29 EST
------- Comment From bugproxy@us.ibm.com 2015-10-22 15:03 EDT-------

Comment 60 Thomas Huth 2015-11-10 07:04:36 EST
I've now done a test with a modified QEMU to see how long the commands take if the host is very loaded: I've added a timestamp to the vscsi_req struct which gets initialized in vscsi_got_payload(). In vscsi_command_complete() and in vscsi_process_tsk_mgmt() (for cancelled requests), I check the current time again and print the time difference when it is bigger than 20 seconds.

After running that on a loaded system (with my "swapforce" stressor and some additional disk traffic in the background), I can quickly see that a lot of commands sometimes take more than 40 seconds to finish. And sometimes they also take more than 100 seconds ... and some apparently even more than 120 seconds - this is when the guest decides that the command is dead and cancels it via the SRP_TSK_ABORT_TASK functionality.
Comment 61 Hanns-Joachim Uhl 2015-11-10 07:49:43 EST
... please ignore comment #59 ... sorry for the noise ...
Comment 62 Thomas Huth 2015-11-13 04:53:11 EST
I've now also asked one of our kernel SCSI experts for advice, and he confirmed that it is normal that SCSI disks are not automatically brought online again once they've been marked as offline. He also suggested trying to increase the SCSI timeout values for the disk.

So I've now run my test (50 guests running LTP, on the host ibm-p8-virt-01 with the additional "swapforce" stressor and additional disk traffic on the host) for two and a half days, with increased timeouts for the vSCSI disks of the guests: so far I haven't seen any guest crashes during these 2.5 days. Thus I think a valid fix / workaround for this issue is simply to increase the timeout values to avoid this situation if necessary.
Comment 65 Thomas Huth 2015-11-16 03:34:34 EST
Closing this now as NOTABUG since it is not really a bug in itself, but rather a follow-on error from overloading the host. In case somebody hits this "Medium access timeout failure. Offlining disk!" problem with spapr-vscsi disks, the following steps should be taken to avoid this situation:

1) Use virtio-blk with dataplane instead of spapr-vscsi if possible

2) In case this is not possible, try to make sure that the host is not overloaded

3) If overloading of the host cannot be avoided, increase the timeout values of the vSCSI disk in the guest, e.g. something like this:

echo 50 > /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/max_medium_access_timeouts
echo 300 > /sys/devices/vio/2000/host0/target0:0:0/subsystem/drivers/sd/0:0:0:0/timeout
