Bug 242985

Summary: kernel dm-crypt: OOM and lockup when using PAE kernel
Product: [Fedora] Fedora
Reporter: Daphne Shaw <dshaw>
Component: kernel
Assignee: Milan Broz <mbroz>
Status: CLOSED RAWHIDE
QA Contact: Corey Marthaler <cmarthal>
Severity: medium
Priority: low
Version: rawhide
CC: agk, ccb, cebbert, christophe.varoqui, davej, dwysocha, egoggin, jbrassow, junichi.nomura, kueda, lmb, mbroz, prockai, pvrabec, tranlan
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-03-30 02:50:35 UTC
Attachments: Output of the crasher script

Description Daphne Shaw 2007-06-06 20:01:05 UTC
Description of problem:

Running mke2fs on a dm-crypt device causes a flurry of OOMs on the console and
finally a general lockup.

Version-Release number of selected component (if applicable):

kernel-PAE-2.6.20-1.2952.fc6
device-mapper-1.02.13-1.fc6

How reproducible:

This is a SATA system, and the first SATA disk is partitioned as follows:

Disk /dev/sda: 160.0 GB, 160000000000 bytes
255 heads, 63 sectors/track, 19452 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

/dev/sda1               1          13      104391   83  Linux
/dev/sda2              14          26      104422+  83  Linux
/dev/sda3              27         400     3004155   82  Linux swap / Solaris
/dev/sda4             401       19452   153035190    5  Extended
/dev/sda5             401         712     2506108+  83  Linux
/dev/sda6             713        1024     2506108+  83  Linux
/dev/sda7            1025        2332    10506478+  83  Linux
/dev/sda8            2333        3640    10506478+  83  Linux
/dev/sda9            3641       19452   127009858+  83  Linux

Creating a dm-crypt device for /dev/sda9:

 echo "0 254017661 crypt aes-cbc-essiv:sha256 01234567890123456789012345678901 0 /dev/sda9 2056" \
   | dmsetup create crypt-device
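
The length field in the table above appears to be the partition size in 512-byte sectors minus the 2056-sector offset: fdisk lists /dev/sda9 as 127009858+ 1K blocks (the "+" marks an extra half block), which is 254019717 sectors, and 254019717 - 2056 = 254017661. A minimal sketch of that arithmetic follows; `crypt_table` is a hypothetical helper name, not something from this report, and the sector count is assumed to be supplied by the caller:

```shell
# Hypothetical helper: build a dm-crypt table line for a device.
# Arguments: device, device size in 512-byte sectors, offset in sectors,
#            cipher spec, hex key.
crypt_table() {
    dev=$1; sectors=$2; offset=$3; cipher=$4; key=$5
    # The mapped length is whatever remains after skipping the offset.
    len=$((sectors - offset))
    # Fields: start length target cipher key iv_offset device offset
    echo "0 $len crypt $cipher $key 0 $dev $offset"
}

# Values from this report: 127009858.5 KiB = 254019717 sectors.
crypt_table /dev/sda9 254019717 2056 aes-cbc-essiv:sha256 01234567890123456789012345678901
```

On a live system the sector count could be taken from `blockdev --getsz /dev/sda9`, and the resulting line piped to `dmsetup create crypt-device`.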

Make a filesystem on it:

 mke2fs /dev/mapper/crypt-device

For me, if I repeat the mke2fs step 3-4 times, the system will start to OOM over
and over for a few seconds, and then freeze up completely.

Comment 1 Daphne Shaw 2007-06-07 00:00:54 UTC
Also note that creating a dm-crypt device using a whole unpartitioned disk on
the same machine (e.g. "/dev/sdb" instead of "/dev/sda9") works just fine.

Comment 2 Milan Broz 2007-06-07 08:43:28 UTC
Please attach system info: memory size and syslog messages.

Is it reproducible with the standard (non-PAE) kernel?
You can use lvmdump (from the lvm2 package) to collect information about the
system and attach it to this bz.

Does it help if you run sync between repeated mke2fs runs?


Comment 3 Daphne Shaw 2007-06-07 12:57:12 UTC
There were no lines at all logged in syslog (once the OOMing starts, things go
bad very quickly).  There were some OOM reports on the console speeding by, but
none in syslog.

Here's memory info:
             total       used       free     shared    buffers     cached
Mem:       4142464     192156    3950308          0      76096      56992
-/+ buffers/cache:      59068    4083396
Swap:      3004144          0    3004144

Note that no swap is used, and the machine has plenty of free RAM as well when
it starts to OOM.

I will test your other questions shortly.

Comment 4 Daphne Shaw 2007-06-12 17:42:32 UTC
I just tested kernel-PAE-2.6.20-1.2952.fc6 against kernel-2.6.20-1.2952.fc6.

I was able to repeat the failure using kernel-PAE-2.6.20-1.2952.fc6
I was NOT able to repeat the failure using kernel-2.6.20-1.2952.fc6

That is, it only fails with the PAE kernel.  Sync-ing after each run did not
make a difference on either PAE or non-PAE: PAE always failed, and non-PAE
always succeeded.
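
When comparing the two kernels, it may help to record which variant each run actually booted. A small sketch, assuming that PAE builds carry "PAE" somewhere in the `uname -r` release string (an assumption about Fedora kernel naming, not something stated in this report); `kernel_variant` is a hypothetical helper name:

```shell
# Hypothetical helper: classify a kernel release string as PAE or non-PAE.
# Assumption: PAE builds include the substring "PAE" in their release string.
kernel_variant() {
    case "$1" in
        *PAE*) echo "PAE" ;;
        *)     echo "non-PAE" ;;
    esac
}

# Classify the currently running kernel.
kernel_variant "$(uname -r)"
```

Logging this next to each test result removes any doubt about which kernel produced a given failure.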


Comment 5 Chuck Ebbert 2007-06-20 21:01:27 UTC
Can you get the contents of /proc/vmstat:

(1) before running mke2fs (or is that what's above)
(2) after each run of mke2fs that succeeds

Comment 6 Daphne Shaw 2007-06-21 22:22:54 UTC
Created attachment 157581
Output of the crasher script

Using this script, I can get the failure to happen within 3-4 runs.  The
attachment is the output.  Note that run #3 didn't complete.

for i in $(seq 1 10)
do
    echo "Pass $i" >> output
    echo BEFORE >> output
    cat /proc/vmstat >> output   # counters before the run
    sync
    mke2fs /dev/mapper/crypt-device
    echo AFTER >> output
    cat /proc/vmstat >> output   # counters after the run
    sync
done
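
To make the BEFORE/AFTER snapshots easier to compare, the log written by the loop above could be reduced to the counters that changed in each pass. This is only a sketch under the assumption that the log has exactly the "Pass N" / BEFORE / AFTER layout shown; `vmstat_deltas` is a hypothetical helper name:

```shell
# Hypothetical helper: print, per pass, each /proc/vmstat counter whose value
# differs between the BEFORE and AFTER snapshots in the given log file.
vmstat_deltas() {
    awk '
        /^Pass /       { pass = $2; next }          # remember the pass number
        $1 == "BEFORE" { phase = "before"; next }
        $1 == "AFTER"  { phase = "after";  next }
        # vmstat lines have the form "<name> <value>"
        NF == 2 && phase == "before" { before[$1] = $2 }
        NF == 2 && phase == "after" && ($1 in before) && $2 != before[$1] {
            printf "pass %s %s %+d\n", pass, $1, $2 - before[$1]
        }
    ' "$1"
}

# Usage (with the log file name used by the loop above):
#   vmstat_deltas output
```

A pass that never reaches its AFTER marker, such as the run that did not complete, simply produces no lines.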

Comment 7 Chuck Ebbert 2007-06-22 19:17:57 UTC
Kernel 2962 has dm-crypt bugfixes from 2.6.22 applied. Can you test that?
It's in the updates-testing repo.

Comment 8 Daphne Shaw 2007-06-22 19:56:48 UTC
I tested kernel 2962, and there is still a problem.  It shows up in a slightly
different fashion in that the machine freezes without first showing the OOMs on
the console, but the end result is the same.

Comment 9 Milan Broz 2007-11-28 07:09:49 UTC
I cannot reproduce this on a 2.6.24-rc rawhide kernel
(kernel-PAE-2.6.24-0.42.rc3.git1.fc9, using 6GB RAM).

There were some changes (per-BDI limits, dm-crypt bugfixes) in the 2.6.24-rc
kernel which, I think, should prevent it.
(I expect the problem was related to committing too much work to the internal
crypt threads.)
Could you please verify that it works with a 2.6.24 test kernel?

[changed fc6 -> fc-devel]