Bug 1264076

Summary: crypt target is not properly handling 'suspend --noflush'
Product: [Fedora] Fedora
Reporter: Zdenek Kabelac <zkabelac>
Component: kernel
Assignee: Mikuláš Patočka <mpatocka>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: agk, gansalmon, gmazyland, itamar, jonathan, kernel-maint, madhu.chinakonda, mbroz, mchehab, msnitzer, okozina
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-09-21 16:46:43 UTC
Type: Bug

Description Zdenek Kabelac 2015-09-17 12:54:35 UTC
Description of problem:


When using the 'crypt' target with parallel encryption (an enhancement in the 4.0 kernel),
we lost the ability to use 'dmsetup suspend --noflush' to suspend a device without blocking.

As of 4.3-rc1, when the crypt device is in use (busy) and 'suspend --noflush' is executed, it still communicates with the layer below, so the target cannot be replaced if the layer below is 'frozen' (suspended).

An easy reproducer:

create an LV
run luksFormat on top of this LV
open it, mkfs, and mount
create some load on the mounted volume
(e.g. while : ; do echo 1 ; sleep 1; done >/mnt/test/write)

With a table like this:
vg-lvol0: 0 106496 linear 7:0 2048
cryptdev: 0 102400 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096

suspend vg-lvol0
suspend --noflush --nolockfs cryptdev

--> the second suspend blocks, although it should have passed.
(To unblock: resume the LV, then resume the crypt device.)
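The crypt line in the table above follows the dm-crypt table layout `<start> <length> crypt <cipher> <key> <iv_offset> <device path> <offset> [<#opt_params> <opt_params>]`. A minimal Python sketch that parses such a line, to make the stacking explicit; `CryptTarget` and `parse_crypt_line` are hypothetical names, not part of dmsetup or cryptsetup, and the code assumes the `<name>: ` prefix that `dmsetup table` prints for each device.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CryptTarget:
    """Fields of one dm-crypt table line (hypothetical helper type)."""
    name: str
    start: int        # first sector of the mapping
    length: int       # size of the mapping in 512-byte sectors
    cipher: str
    key: str
    iv_offset: int
    device: str       # backing device, as major:minor or path
    offset: int       # start sector on the backing device
    opt_params: List[str] = field(default_factory=list)


def parse_crypt_line(line: str) -> CryptTarget:
    """Parse '<name>: <start> <length> crypt <cipher> <key> <iv_offset>
    <device> <offset> [<#opt_params> <opt_params>...]'."""
    name, sep, table = line.partition(":")
    fields = table.split()
    if not sep or len(fields) < 8 or fields[2] != "crypt":
        raise ValueError(f"not a crypt table line: {line!r}")
    opt_params: List[str] = []
    if len(fields) > 8:
        count = int(fields[8])  # number of optional parameters that follow
        opt_params = fields[9:9 + count]
    return CryptTarget(
        name=name.strip(),
        start=int(fields[0]),
        length=int(fields[1]),
        cipher=fields[3],
        key=fields[4],
        iv_offset=int(fields[5]),
        device=fields[6],
        offset=int(fields[7]),
        opt_params=opt_params,
    )


line = ("cryptdev: 0 102400 crypt aes-xts-plain64 "
        "0000000000000000000000000000000000000000000000000000000000000000 0 253:0 4096")
t = parse_crypt_line(line)
print(t.device, t.offset, t.offset + t.length)  # 253:0 4096 106496
```

For `cryptdev`, the backing device is `253:0` and 4096 + 102400 = 106496 sectors, exactly the length of `vg-lvol0`. This is consistent with the crypt device sitting directly on the LV, assuming `vg-lvol0` is the device-mapper device 253:0.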

As a workaround, the user can use:

cryptosetup --perf-submit_from_crypt_cpus --perf-same_cpu_crypt luksOpen

to open the LUKS device; in this case 'suspend --noflush' always works
(so it is possible to replace the table line).
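As far as we understand, these two cryptsetup options make cryptsetup activate the mapping with the dm-crypt optional table parameters `same_cpu_crypt` (encrypt on the CPU that submitted the I/O) and `submit_from_crypt_cpus` (submit writes directly from the encryption threads instead of a separate write thread). A sketch, under that assumption, of a hypothetical helper `with_crypt_perf_flags` that adds both parameters to a crypt table string (without the `<name>: ` prefix) while keeping any optional parameters already present:

```python
def with_crypt_perf_flags(table: str) -> str:
    """Hypothetical helper: return a dm-crypt table string (no '<name>:' prefix)
    that also carries the same_cpu_crypt and submit_from_crypt_cpus optional
    parameters. Assumes the layout
    <start> <length> crypt <cipher> <key> <iv_offset> <device> <offset>
    [<#opt_params> <opt_params>...]."""
    fields = table.split()
    if len(fields) < 8 or fields[2] != "crypt":
        raise ValueError(f"not a crypt table: {table!r}")
    base = fields[:8]
    # Keep any optional parameters that are already present.
    opts = fields[9:9 + int(fields[8])] if len(fields) > 8 else []
    for flag in ("same_cpu_crypt", "submit_from_crypt_cpus"):
        if flag not in opts:
            opts.append(flag)
    # The optional-parameter count precedes the parameters themselves.
    return " ".join(base + [str(len(opts))] + opts)


print(with_crypt_perf_flags("0 102400 crypt aes-xts-plain64 <key> 0 253:0 4096"))
# 0 102400 crypt aes-xts-plain64 <key> 0 253:0 4096 2 same_cpu_crypt submit_from_crypt_cpus
```

Reloading such a table by hand would require the real key rather than the all-zero field shown in the table above; opening the device with the cryptsetup options avoids handling the key directly.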

Version-Release number of selected component (if applicable):
cryptsetup-1.6.8-2.fc24.x86_64

Expected results:
'suspend --noflush' always works without blocking.

Comment 1 Milan Broz 2015-09-17 13:00:27 UTC
This is a problem inside the dm-crypt kernel module.

Comment 2 Mikuláš Patočka 2015-09-21 16:46:43 UTC
Devices must be suspended from the top device down to the bottom one. Suspending in reverse order isn't supposed to work, because it is racy: if any bio is blocked in the lower device, the suspend of the higher device gets stuck. The dm-crypt parallelization probably changes the timing so that it triggers this race condition.
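The ordering argument can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not dm core: `Device`, `bottom_up` and `top_down` are hypothetical names, a suspended device simply holds incoming bios, and a suspend "completes" only if the device has no bios still in flight below it. It does not model the timing changes introduced by the dm-crypt parallelization.

```python
class Device:
    """Toy model of a device-mapper device stacked on an optional lower device.

    Hypothetical simplification: a suspended device holds incoming bios, and a
    bio counts as in flight at every active layer until the bottom completes it.
    """

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower
        self.suspended = False
        self.held = []       # bios queued while suspended (noflush keeps them)
        self.in_flight = 0   # bios mapped by this device but not yet completed

    def submit(self, bio, on_complete):
        if self.suspended:
            self.held.append((bio, on_complete))
            return
        self.in_flight += 1

        def done():
            self.in_flight -= 1
            on_complete()

        if self.lower is None:
            done()  # the bottom device completes the bio at once
        else:
            self.lower.submit(bio, done)

    def suspend(self):
        """Stop accepting bios; return True if the suspend can complete,
        False if it would wait forever for in-flight bios blocked below."""
        self.suspended = True
        return self.in_flight == 0

    def resume(self):
        self.suspended = False
        held, self.held = self.held, []
        for bio, on_complete in held:
            self.submit(bio, on_complete)


def bottom_up():
    """Suspend the LV first, then the crypt device (the order in this report)."""
    lv = Device("vg-lvol0")
    crypt = Device("cryptdev", lower=lv)
    lv.suspend()
    crypt.submit("write", lambda: None)  # passes crypt, then blocks in the LV
    stuck = not crypt.suspend()          # waits for the bio held by the LV
    lv.resume()                          # unblocking: resume the LV first
    return stuck, crypt.in_flight


def top_down():
    """Suspend the crypt device first, then the LV (the supported order)."""
    lv = Device("vg-lvol0")
    crypt = Device("cryptdev", lower=lv)
    crypt_ok = crypt.suspend()
    crypt.submit("write", lambda: None)  # held at the top, never reaches the LV
    lv_ok = lv.suspend()
    return crypt_ok and lv_ok, len(crypt.held)


print(bottom_up())  # (True, 0): stuck until the LV is resumed
print(top_down())   # (True, 1): both suspends complete, one bio held on top
```

In the bottom-up order the crypt device's suspend cannot finish until the LV is resumed, which matches the unblocking sequence in the description; in the top-down order the write is held at the crypt device and both suspends complete.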