Bug 1264076

Summary:	crypt target is not properly handling 'suspend --noflush'
Product:	[Fedora] Fedora	Reporter:	Zdenek Kabelac <zkabelac>
Component:	kernel	Assignee:	Mikuláš Patočka <mpatocka>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	rawhide	CC:	agk, gansalmon, gmazyland, itamar, jonathan, kernel-maint, madhu.chinakonda, mbroz, mchehab, msnitzer, okozina
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-09-21 16:46:43 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Zdenek Kabelac 2015-09-17 12:54:35 UTC

Description of problem:


When using the 'crypt' target with parallel encryption (an enhancement from the 4.0 kernel),
we lose the ability to use 'dmsetup suspend --noflush' to suspend the device without blocking.

As of 4.3-rc1, when a crypt device is in use (busy) and 'suspend --noflush' is executed, it communicates with the layer below, so such a target can't be replaced if the layer below is 'frozen'.

An easy reproducer:

create an LV
run luksFormat on top of this LV
open it, mkfs and mount
create some load on the mounted volume
(e.g. while : ; do echo 1 ; sleep 1; done >/mnt/test/write)
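
The steps above could be scripted roughly as follows. This is only a sketch: the VG name "vg", the ext4 filesystem, the RUN dry-run switch and the function name are assumptions for illustration, not taken from the report. The 52M size is chosen to match the 106496-sector linear target in the table below (106496 x 512 bytes = 52 MiB).

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN="" (as root, on a scratch machine) to really run them.
RUN=${RUN-echo}

# Hypothetical names; only "lvol0", "cryptdev" and /mnt/test appear in the report.
VG=${VG:-vg}
LV=${LV:-lvol0}
CRYPT=${CRYPT:-cryptdev}
MNT=${MNT:-/mnt/test}

setup_reproducer() {
    $RUN lvcreate -L 52M -n "$LV" "$VG"                 # create the LV
    $RUN cryptsetup luksFormat "/dev/$VG/$LV"           # LUKS header on top of it
    $RUN cryptsetup luksOpen "/dev/$VG/$LV" "$CRYPT"    # open as /dev/mapper/$CRYPT
    $RUN mkfs.ext4 "/dev/mapper/$CRYPT"                 # filesystem type is an assumption
    $RUN mount "/dev/mapper/$CRYPT" "$MNT"
}

setup_reproducer
```

Once the volume is mounted, the write loop from the report supplies the load.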

With a table like this:
vg-lvol0: 0 106496 linear 7:0 2048
cryptdev: 0 102400 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096

suspend vg-lvol0
suspend --noflush --nolockfs  cryptdev

--> gets frozen, while it should have passed.
(to unblock: resume the LV, then resume the crypt device)
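
For reference, the sequence and its recovery written out as full dmsetup calls. This is a sketch: the RUN dry-run switch and the function names are hypothetical, while the device names come from the table above.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# Reverse-order suspend from the report: lower device first, then the crypt device.
trigger_hang() {
    $RUN dmsetup suspend vg-lvol0
    # This is the call reported to block while cryptdev is busy.
    $RUN dmsetup suspend --noflush --nolockfs cryptdev
}

# Recovery in the order given in the report. When run for real, this has to
# come from a second shell, because the suspend above does not return.
unblock() {
    $RUN dmsetup resume vg-lvol0
    $RUN dmsetup resume cryptdev
}

trigger_hang
unblock
```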

As a workaround, the user can open the LUKS device with:

cryptsetup --perf-submit_from_crypt_cpus --perf-same_cpu_crypt luksOpen

In this case 'suspend --noflush' always works
(so it's possible to replace the table line).
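
To check whether an already open device carries these flags, the optional parameters at the end of its crypt table line can be inspected. The sketch below assumes the usual dm-crypt table layout (start, length, "crypt", cipher, key, iv_offset, device, offset, then an optional count followed by that many parameters); the function names are hypothetical.

```shell
#!/bin/sh
# Print the optional dm-crypt parameters of one "dmsetup table" line, one per line.
# Accepts the line with or without the leading "name:" prefix.
crypt_opts() {
    printf '%s\n' "$1" | awk '
        {
            i = 1
            if ($1 ~ /:$/) i = 2           # skip the "name:" prefix
            if ($(i+2) != "crypt") exit 1  # not a crypt target
            n = $(i+8) + 0                 # optional parameter count, 0 if absent
            for (k = 1; k <= n; k++) print $(i+8+k)
        }'
}

# Succeed only if both flags set by the --perf-* options are present.
has_workaround_flags() {
    opts=$(crypt_opts "$1") || return 1
    printf '%s\n' "$opts" | grep -qx same_cpu_crypt &&
    printf '%s\n' "$opts" | grep -qx submit_from_crypt_cpus
}

# Sample line: the table from the report with the two flags appended.
line='cryptdev: 0 102400 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096 2 same_cpu_crypt submit_from_crypt_cpus'
if has_workaround_flags "$line"; then
    echo "workaround flags present"
else
    echo "default dm-crypt threading"
fi
```

On a live system the line would come from "dmsetup table cryptdev" instead of the sample string.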

Version-Release number of selected component (if applicable):
cryptsetup-1.6.8-2.fc24.x86_64

Expected results:
suspend --noflush  always works

Additional info:

Comment 1 Milan Broz 2015-09-17 13:00:27 UTC

This is a problem inside the dm-crypt kernel module...

Comment 2 Mikuláš Patočka 2015-09-21 16:46:43 UTC

Suspend must be done from the top device to the bottom. Suspending in reverse order isn't supposed to work: it is racy, and if any bio is blocked in the lower device, the suspend of the higher device gets stuck. The dm-crypt parallelization probably changes the timing so that it triggers the race condition.
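
A minimal sketch of that ordering, assuming the caller lists the stack from top to bottom. The function names and the RUN dry-run switch are hypothetical. Resuming bottom first mirrors the unblock order given in the description (LV, then crypt device); the comment above does not state a resume order.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# Arguments: device names ordered top (e.g. the crypt device) to bottom (e.g. the LV).
# The flags mirror the crypt suspend used in the report.
suspend_stack() {
    for dev in "$@"; do
        $RUN dmsetup suspend --noflush --nolockfs "$dev"
    done
}

# Same argument order; devices are resumed bottom first, top last.
resume_stack() {
    rev=
    for dev in "$@"; do
        rev="$dev $rev"
    done
    for dev in $rev; do
        $RUN dmsetup resume "$dev"
    done
}

suspend_stack cryptdev vg-lvol0
resume_stack cryptdev vg-lvol0
```

With this order the crypt device is suspended while the LV below it can still complete I/O, which avoids the situation described in the report.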