Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1540696

Summary:	VDO crash recovery code can lead to BUG: unable to handle kernel paging request
Product:	Red Hat Enterprise Linux 7
Component:	kmod-kvdo
Version:	7.5
Status:	CLOSED ERRATA
Severity:	high
Priority:	high
Reporter:	Thomas Jaskiewicz <tjaskiew>
Assignee:	Thomas Jaskiewicz <tjaskiew>
QA Contact:	Jakub Krysl <jkrysl>
Docs Contact:
CC:	awalsh, jkrysl, limershe, salmy, sweettea
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	6.1.0.133
Doc Type:	If docs needed, set a value
Doc Text:
Story Points:	---
Clone Of:
Environment:
Last Closed:	2018-04-10 16:27:17 UTC
Type:	Bug
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Category:	---
oVirt Team:	---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Description Thomas Jaskiewicz 2018-01-31 19:26:22 UTC

Description of problem:

Our nightly VDO testing includes tests that simulate a system crash by abruptly changing the storage device to read-only.  We then expect to be able to start the VDO device with only acceptable data loss.  Last night's test run showed us an instance of a BUG: unable to handle kernel paging request during VDO crash recovery.

Version-Release number of selected component (if applicable):


How reproducible:

This is hard to reproduce.  The test crashed the storage device before we had written a full index for the first time.  The recovery code interpreted the old data in storage as an index, and the index search code walked off the end of a buffer, which caused the kernel page fault.

The test probably failed to write one of the sectors of the index, and if that sector read as all zeroes, it could lead to the kernel page fault.

Steps to Reproduce:
1.  Crash the system after we have started a VDO but before we have written a full index, with a bad data block on the storage device.  This is possible but not likely.


Actual results:
BUG: unable to handle kernel paging request


Expected results:
No BUG message.  Crash recovery should complete without an unhandled kernel paging request.

Additional info:

We do try to verify that the index page is valid before using it.  If the page has a last byte that is all ones, we will not see this problem.  The fix is to ensure that the last byte of an index page is all ones.
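
As an illustration only, here is a minimal Python sketch of one way to read "ensure that the last byte of an index page is all ones": reject a page at validation time when its final byte is not 0xFF.  This is not the kvdo code.  The names has_guard_byte and load_index_page are hypothetical, and the real fix may enforce the guard differently (for example by rewriting the byte instead of rejecting the page).

```python
ALL_ONES = 0xFF  # a byte with every bit set


def has_guard_byte(page: bytes) -> bool:
    """Return True if the page is non-empty and its last byte is all ones."""
    return bool(page) and page[-1] == ALL_ONES


def load_index_page(raw: bytes) -> bytes:
    """Hypothetical validation step: refuse pages without the guard byte.

    A sector that was never written and reads back as zeroes fails
    this check, so the search code never sees it.
    """
    if not has_guard_byte(raw):
        raise ValueError("index page rejected: last byte is not all ones")
    return raw
```

With a check like this, a page that reads back as all zeroes is rejected before any search runs, instead of being searched and overrun.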

The only reasonable QA testing is SanityOnly.

Comment 4 Thomas Jaskiewicz 2018-02-01 16:34:13 UTC

This is not a security bug, because we do notice when we have gone off the end of a buffer.  When we search the index, there is a step where we look for the next nonzero bit in the bit stream.  We normally have 7 guard bytes of all ones at the end of the index, which prevent buffer overruns.  In this case, the simulated crash caused us not to write the sector containing the guard bytes, and we read all zeroes instead.
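
The role of the guard bytes can be sketched in Python as below.  This is a simplified model, not the kvdo implementation.  The bit order, the 16-byte payload, and the names next_set_bit and make_page are assumptions made for illustration; only the count of 7 guard bytes comes from the comment above.  An IndexError stands in for the kernel page fault.

```python
GUARD_BYTES = 7  # count taken from the comment above; layout is assumed


def next_set_bit(buf: bytes, start_bit: int) -> int:
    """Return the index of the first 1 bit at or after start_bit.

    The scan has no explicit bounds check, like the search step it
    models.  Running past the end raises IndexError, which stands in
    for the page fault.
    """
    bit = start_bit
    while True:
        byte = buf[bit // 8]  # IndexError here models the overrun
        if byte & (1 << (bit % 8)):  # assumed LSB-first bit order
            return bit
        bit += 1


def make_page(payload: bytes, write_guard: bool = True) -> bytes:
    """Append the guard area; an unwritten sector reads back as zeroes."""
    fill = b"\xff" if write_guard else b"\x00"
    return payload + fill * GUARD_BYTES


intact = make_page(bytes(16))                   # guard bytes present
torn = make_page(bytes(16), write_guard=False)  # guard sector never written

print(next_set_bit(intact, 0))  # scan stops at the first guard bit
try:
    next_set_bit(torn, 0)
except IndexError:
    print("scan ran off the end of the buffer")
```

With the guard bytes in place, the scan always stops inside the buffer even when the payload is all zeroes.  Without them, nothing stops the scan before it runs off the end.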

Comment 6 Jakub Krysl 2018-02-15 10:17:21 UTC

I could not find a way to reproduce it. Regression tests found no issues, so verified with SanityOnly.

Comment 9 errata-xmlrpc 2018-04-10 16:27:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0900