Red Hat Bugzilla – Bug 281691
kernel dm crypt: ext3 fs errors if using encrypted swap and suspend
Last modified: 2013-02-28 23:05:50 EST
Description of problem:
I am using cryptsetup to fully encrypt my harddisk (except for a small /boot
partition to boot off). It works really well - I had to create my own initrd to
include all the crypto and dm kernel modules, and cryptsetup - but it rocks.
However, every "few" reboots (e.g. full reboot or recovering from a
hibernate-to-disk) fsck reports the file system is unclean and kicks off a full
fsck. Sometimes it finds nothing wrong, and sometimes it has to fix up some file
- typically in /tmp.
When I notice this, if I go back through the syslogs, I can see this was bound
to happen, as the kernel would have been reporting a ext3 error beforehand.
Sep 2 16:54:18 tnz-jhaar-lt kernel: EXT3-fs error (device dm-0):
ext3_free_blocks_sb: bit already cleared for block 7548941
dm-0 is my ext3-based "/" partition. I also encrypt my swap partition - and that
has never caused a problem BTW...
I first saw this on FC6 and thought it indicated I had a bad disk. I replaced
the disk and took the opportunity to install from scratch FC7. So maybe this is
actually a hardware problem too - but that's pretty unlikely.
Version-Release number of selected component (if applicable):
FC7, fully updated via yum, running 188.8.131.52-65.fc7
Happens every reboot I do after those ext3 errors show up in syslog. Happened
Aug 14, Aug 28 and Sep 6.
Steps to Reproduce:
1. no idea
(In reply to comment #0)
> Sep 2 16:54:18 tnz-jhaar-lt kernel: EXT3-fs error (device dm-0):
> ext3_free_blocks_sb: bit already cleared for block 7548941
> I first saw this on FC6 and thought it indicated I had a bad disk. I replaced
> the disk and took the opportunity to install from scratch FC7. So maybe this is
> actually a hardware problem too - but that's pretty unlikely.
Maybe it is not the disk but the controller/mainboard or memory that is defective
here. Can you run memtest86+ to check your memory?
> dm-0 is my ext3-based "/" partition. I also encrypt my swap partition - and
> that has never caused a problem BTW...
Do you use your swap partition a lot or is it rather unused?
Btw. there is already a patch for encrypted root being developed, maybe you want
to use it and help there, see #124789
ah, and probably you can use smartmontools to check your hard drives. Do you
have only one ext3 error each time, or are there several of them?
I've let memtest86+ run overnight on the machine - it detected no RAM problems.
I have also sat down and gone through the syslogs. This problem with "EXT3-fs
error" errors occurring happens minutes to hours after a reboot or
hibernate-to-disk - and produces either one or many " EXT3-fs error" records. It
just happened this morning when I brought my laptop into work - got 8500+ in a
one minute period! I immediately rebooted and fsck'ed the disk - other than a
bunch of inode errors, it looked fine. However, I can see two files in
/lost+found from a few months ago - one is an /sbin/iscsid binary - something I
don't think I've ever used...
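To put numbers on bursts like this, the syslog can be summarised per minute and per device. The following is a minimal sketch, assuming the usual one-line syslog format shown in the log excerpts in this report; the function name count_ext3_errors is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog text on stdin and prints one line per
# (month, day, HH:MM, device) with the number of "EXT3-fs error" records.
count_ext3_errors() {
    awk '/EXT3-fs error \(device / {
        minute = substr($3, 1, 5)          # HH:MM part of the timestamp
        dev = $0
        sub(/.*\(device /, "", dev)        # drop everything up to the device name
        sub(/\).*/, "", dev)               # drop everything after it
        key = $1 " " $2 " " minute " " dev
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

It could be used as: count_ext3_errors < /var/log/messages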
Well big things have happened since last time. I managed to convince Dell it was
a hardware fault, and they replaced both my harddisk again and my motherboard
(with the disk controller).
...but here it is a week later and I've just had a serious occurrence of the
"EXT3-fs error" yet again. This time ext3 lost the inodes of 4 files on my
harddisk - including /usr/bin/swatch and /lib/iptables/libipt_CLASSIFY.so -
which means after the fsck they were GONE.
This has got to be a software problem doesn't it?
One thing. When I went to restore my FC7 system onto the new harddisk, I hit the
same problem I had the first time - namely that the FC7 boot CD/DVD doesn't have
any support for cryptsetup or the appropriate kern modules. So I couldn't use
FC7 to actually create the cryptsetup partitions to restore onto. So I grabbed
Ubuntu (which does support cryptsetup) and used it to create the encrypted
partitions, and then restored onto that.
So the question is: does the Sept release of cryptsetup on Ubuntu match what
you'd expect? If not, if you can tell me how to create an encrypted partition
using a FC7 DVD I'd be happy to do it again...
BTW: "cryptsetup luksDump" and "cryptsetup status" don't return anything that
looks like a version number. If there are issues with cryptsetup, probably being
able to tell what version created a partition would help from a support
I've just had a thought - could this be a configuration problem more than a
I created my own initrd to mount the encrypted root and swap partitions at boot
mkdir -p /dev/mapper
cryptsetup luksOpen /dev/sda3 root-enc
mount -t ext3 /dev/mapper/root-enc /mnt-root
cryptsetup luksOpen /dev/sda2 swap-enc --key-file=/mnt-root/etc/crypto-swap.key
echo Creating root device.
mkrootdev -t ext3 -o defaults,noreservation,ro /dev/mapper/root-enc
but I've done nothing to correctly umount it all during a halt,reboot or (more
As it 99.99% works, I wonder if it could be that root&swap are umounted
correctly - but there isn't a "cryptsetup remove"? Could that cause subtle
BTW I don't use /etc/crypttab as I specifically mount root and swap in initrd...
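For reference, an explicit teardown of the swap mapping at shutdown might look like the sketch below. The mapping name swap-enc comes from the initrd commands above; the run helper and the DRY_RUN variable are hypothetical additions for illustration, and nothing in this report shows that a missing teardown is the cause of the corruption. Only the swap mapping is handled, since the root mapping cannot be closed while the system is still running from it.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default here) commands are only
# printed; set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off swap on the mapped device, then remove the dm-crypt mapping.
close_encrypted_swap() {
    name=${1:-swap-enc}
    run swapoff "/dev/mapper/$name"
    run cryptsetup luksClose "$name"
}
```

Calling close_encrypted_swap with no arguments prints the two commands it would run.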
I have reinstalled again - this time placing the dm_crypt root and swap
partitions on top of LVM - which appears to be the more "correct" way (although
a waste of time on a laptop IMHO). Nothing but Redhat tools were used to
construct this version.
Anyway, as Milan Broz requested - attached is the lvmdump from this system.
More symptoms. I successfully suspended (via pm-hibernate) 6+ times today, each
time it booted, initrd would ask for the password to unlock the root partition,
and then called a password file on (the now unencrypted) /etc/ to decrypt the
swap - so it could resume. All worked splendidly...
Until the last time. It resumed well, but almost immediately started reporting
Oct 18 17:59:17 tnz-jhaar-lt kernel: EXT3-fs error (device dm-2):
ext3_free_blocks_sb: bit already cleared for block 7162385
Over 300 such events in ONE sec - and then no more reports. But I didn't
notice and merrily went about my business installing software and generally
Then 2 hours later there was a sudden burst of over 700 events - and the system
actually froze at that stage and I rebooted, and had to manually "fsck -y /" to
fix it. Didn't lose any files that time - but I normally do :-(
So this laptop has had its disk replaced 3 times and its motherboard twice at
my insistence that this isn't a software problem. But it must be?
Here's the section of my custom "init" in my initrd related to cryptsetup:
echo Scanning logical volumes
lvm vgscan --ignorelockingfailure
echo Activating logical volumes
lvm vgchange -ay --ignorelockingfailure VolGroup00
cryptsetup luksOpen /dev/VolGroup00/LogVol00 root-enc
mount -t ext3 /dev/mapper/root-enc /mnt-root
cryptsetup luksOpen /dev/VolGroup00/LogVol01 swap-enc --key-file=/mnt-root/etc/c
echo Creating root device.
mkrootdev -t ext3 -o defaults,noreservation,ro /dev/mapper/root-enc
echo Mounting root filesystem.
Created attachment 230781 [details]
lvmdump of affected system
There is now a discussion about corruptions with dm-crypt on the dm-crypt
Maybe this is the same issue that you reported.
That looks like a different issue to mine.
I have a new piece of information (I am using dm-crypt to encrypt my entire system
- both root (/) and swap. i.e. only /boot isn't encrypted, and initrd calls
cryptsetup to initialize the crypto).
My problem appears to occur exclusively when I suspend-to-disk. If I do a full
shutdown and restart, then I never seem to trigger the problem. However, if I
suspend, then there's around a 1-in-4 chance that ext3 will declare something's
wrong and will do a full fsck. If I'm lucky, it will find nothing wrong, if I'm
unlucky, files go missing. The fact that it "mostly" works makes me feel this
cannot be a configuration or "things being done in the wrong order during
This laptop has had all hardware components replaced (thanks Dell) and still has
this symptom - so I'm left thinking this has to be a software problem.
Oh yeah - I've reinstalled this laptop in both LVM (with dm-crypt on top) and
raw partition mode (i.e. dm-crypt on top of /dev/sda) and got the same issue -
so this isn't a LVM problem for me.
(In reply to comment #8)
No, this is a different issue (the issue you pointed out was caused by faulty hw).
(In reply to comment #9)
> My problem appears to occur exclusively when I suspend-to-disk.
yes, this is very important information.
Do you see a corruption without encrypted swap ?
(using encrypted root filesystem only)
(I am trying to find out in which part of the process corruption happens.)
You mean run it with unencrypted swap?
OK, I've re-jigged it and we'll see what happens. I should know within a few
days/week if it's going to happen or not
OK, I've been hauling my laptop between home and work all week, suspending to
(unencrypted) disk exclusively, and have had ZERO problems.
So it looks like this issue only occurs when the swap partition is encrypted -
and then only some of the time.
Hope that helps
Kernel problem: it probably sometimes loses data during hibernate while writing to
swap through dm-crypt.
So is this a known problem, or should I be reporting it to someone else?...
PS: It is still working fine (ie suspend to disk) with unencrypted swap.
(In reply to comment #15)
> So is this a known problem, or should I be reporting it to someone else?...
If you are able to reproduce this on upstream kernel, maybe someone on kernel
list could help.
(I expect that some flush is missing in the process of suspend, so there is
still some unfinished work in the crypt queue. Maybe things will complicate a
little bit more because of recent changes in block layer - zero-sized barriers
which are still rejected by DM targets. Just quick thoughts, this need some
I have this problem in my dm-crypt TODO list but currently there are some issues
with higher priority.
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.
There hasn't been much activity on this bug for a while.
Jason, have you been able to test this on an upstream kernel? If not, do you need
Milan, have you been able to look further into this issue?
I've been running dm-crypt on my / partition - but have removed it from swap as
that was where the problem was.
I've just re-enabled that and will start suspending-to-disk again 100% crypto.
I'll let you know in a few days if the problem comes back again
Well that didn't take long :-(
I did 4 cycles of suspending encrypted disk to encrypted swap and each time let
it reboot all the way up to a working state. It looked good.
However, on the 4th time, it also looked good, but then 10 minutes after the
final services had restarted/unfrozen, I started seeing these infamous words again
kernel: EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared
for block 7907504
kernel: EXT3-fs error (device dm-2): ext3_free_inode: bit already cleared for
kernel: EXT3-fs warning (device dm-2): ext3_unlink: Deleting nonexistent file
a real mess :-(
Unless you have any other ideas, I'm going back to unencrypted swap - before I
lose any more /usr/bin files...
We need to add another sync in the suspend path - I have already read the code but
still have had no time to create a patch and do some tests + a kernel build.
Anyway increasing severity of this bug.
FYI I have just replaced my Dell X300 with a Dell 430 laptop and moved up to FC8.
I implemented encrypted "/" again - but left swap unencrypted due to this fault.
It has been working 100% well for 3 weeks (hibernating to disk several times a
week) - until today...
Same problem - ext3 errors all over the place after coming out of
hibernate/suspend. I've just rebooted, typed in the password to decrypt the disk
and now I'm seeing
ata1.00: BMDMA stat 0x25
ata1.00: cmd c8/00........
EXT3-fs: Can't read superblock on 2nd try
It's toast :-(
Either my disk just died (yeah, right) or dm-crypt just killed it.
I'm going back to unencrypted with encfs. That was rock-solid. :-(
The last issue seems to me like a hw fault... or not?
(There should be no problems in encrypted root only.)
Yes. I think I jumped the gun. It's just that I've had 3 disk replacements since
I started using dm-crypt - I'm starting to blame it for everything.
Do you know if there's any work on the encrypted-swap-and-hibernation bug I've
Adding this bug to the F9 blocker list because encrypted root and swap is a supported
configuration in the F9 time frame.
Jason, please could you confirm my assumptions (from attached logs):
- corruption was seen even on uniprocessor (no dualcore/SMP, just single CPU)
- you are using different encryption for swap and root (aes+twofish)
Can we get a statement as to what this bug actually is, and if it's really a
blocker for Fedora 9 (of which there is very little time left for development)?
(In reply to comment #25)
> Jason, please could you confirm my assumptions (from attached logs):
> - corruption was seen even on uniprocessor (no dualcore/SMP, just single CPU)
> - you are using different encryption for swap and root (aes+twofish)
Sorry I took so long to answer this - I never received an email alert.
I have had it on two laptops: one single-processor, and now this one - a dualcore.
As far as what crypto type is in use, I *think* they are different. Can you tell
me what command I could run that would tell me what dm-crypt settings are on each?
(In reply to comment #27)
> me what command I could run that would tell me what dm-crypt settings are on
Don't worry - lvmdump did the trick.
With this newer machine I have been unsuccessful even with using the same crypto
options for swap as well as the root - i.e aes-cbc-essiv:sha256
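For anyone wanting to check the cipher of each mapping without a full lvmdump, the crypt lines of "dmsetup table" carry it. Below is a minimal sketch, assuming the multi-device output format of "dmsetup table" run with no device argument (each line prefixed with "name:"); the function name crypt_ciphers is made up for illustration. Depending on the dmsetup version the raw table may contain key material, so treat it as sensitive.

```shell
#!/bin/sh
# Hypothetical helper: reads `dmsetup table` output on stdin and prints
# "<mapping name> <cipher spec>" for every dm-crypt target.
crypt_ciphers() {
    awk '$4 == "crypt" {
        name = $1
        sub(/:$/, "", name)   # strip the trailing colon after the mapping name
        print name, $5        # field 5 is the cipher spec
    }'
}
```

As root it could be used as: dmsetup table | crypt_ciphers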
I was under the impression that hibernate was unsupported with encrypted swap -
am I wrong here? I'm just going through the F9 blocker list. I realize that
encrypted swap is the default with F9 if you tick the 'encrypt system' box in
anaconda. I tried to hibernate my encrypted rawhide laptop and completely
failed today - the system just booted when I turned it back on rather than
resuming from swap.
If it is true that hibernate is unsupported w/encrypted swap, then we're going
to need a release note...
To do "proper" full disk encryption (like all the commercial Windows products do
BTW...), you really have to encrypt the swap.
What's really missing with cryptsetup is some form of kernel password storage
area, where a "cryptsetkey" command early in the initrd boot process could
prompt for the password, and then use it on any future invocation of cryptsetup.
That way you could prompt for the password, and then use it to decrypt swap
and/or root before doing the resume. Without it I for one am stuck in the
hand-crafted hell of creating a static password file on (encrypted) root, and
running cryptsetup on root first to grab the key file to decrypt swap - before
the resume!
I only came up with the "cryptsetkey" concept last night - I might have to
harass the cryptsetup author about it :-) After it had mounted everything it
needed to in initrd, you could run "cryptsetkey --delete" to trash the password
from "kernel memory" (I'm no programmer - but hopefully you get the gist ;-)
So to get back to your question, yes - you are probably correct. However, I
think it's a bit bizarre Linux distributions still haven't figured out how to do
"proper" whole disk encryption when Windows figured it out many years ago. :-(
(In reply to comment #30)
> That way you could prompt for the password, and then use it to decrypt swap
> and/or root before doing the resume. Without it I for one am stuck in the
> hand-crafted hell of creating a static password file on (encrypted) root, and
> running cryptsetup on root first to grab the key file to decrypt swap - before
> the resume!
Exactly this is now possible in Fedora9. It asks for LUKS password before
running LVM scan, so all physical volumes can be fully encrypted. And resume runs
from logical volume mapped to swap on this encrypted volume.
I run two notebooks, both (in simple tests) resumed from encrypted swap.
(But the bug this bugzilla is about is still here, just I wasn't able to
reproduce it without additional hacks yet).
Anyway, I saw another problem: because root and swap are encrypted, the standard
initscripts don't correctly umount/remove the encryption mapping (because cryptsetup
and the initscripts run from the device which needs to be umounted/luksClosed!).
So there should be some shutdown ramdisk or so (I need to check in a recent
version of F9; if it is still true, it needs a new bug report).
> I only came up with the "cryptsetkey" concept last night - I might have to
> harass the cryptsetup author about it :-) After it had mounted everything it
> needed to in initrd, you could run "cryptsetkey --delete" to trash the password
> from "kernel memory" (I'm no programmer - but hopefully you get the gist ;-)
You mean password for unlocking LUKS? It is not stored in memory after unlocking
A command to wipe the master key (used by the encryption algorithm in dm-crypt)
from kernel memory is already supported in the dm-crypt kernel module through the
dm message interface... no idea why it is not used. (It is already possible with
dmsetup - see some thread on the dm-crypt mailing list.)
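A rough sketch of the dmsetup route follows. It assumes the dm-crypt "key wipe" message, which the kernel accepts only while the mapping is suspended; the run helper, the DRY_RUN variable and the function name wipe_crypt_key are hypothetical. Suspending a mapping that the running system depends on (such as root) will stall all I/O to it until the key is set again and the device is resumed, so this is only an illustration of the interface, not a recommended shutdown step.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default here) commands are only
# printed; set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Suspend the mapping, then ask dm-crypt to wipe its key from kernel memory.
wipe_crypt_key() {
    name=$1   # mapping name, e.g. swap-enc
    run dmsetup suspend "$name"
    run dmsetup message "$name" 0 key wipe
}
```

With the default dry-run setting, wipe_crypt_key swap-enc only prints the two dmsetup commands.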
I'll make some more notes to this bugzilla later, just currently busy with some
other work, sorry.
What about remounting readonly? Having to add a ramdisk just to do shutdown
cleanly is a bit severe. Can't remounting readonly before powering off get
around the problem? Or is it that dm-crypt still has some unfinished writes
hanging about? If so, wouldn't that be a bug?
Sure, it remounts read-only if it cannot umount. But this is enough for
a non-encrypted system, not for a dm-crypted one.
Master key is still in memory after read-only remount (possible DRAM data
retention attack etc.)
I am not sure if dm-crypt internal queue is flushed here properly (but sync
should be enough here in shutdown path - so probably not big problem).
Ok, our default setup, which is swap as part of LVM, and encrypting the LVM
physical volume, works just fine with suspend and hibernate. I'm going to
remove this from the blocker list, as it's not really a case our installers will
I talked to the primary authors of dm-crypt this week and they said I should be
doing that too.
I'll reinstall my laptop next week - and put both swap and root into the same
LVM. We'll see what that does :-)
I've been running FC8 on a LVM'ed dm-crypt volume as per the above suggestions
for over 3 weeks now with ZERO problems.
That appears to be it! Having separate dm-crypt partitions for swap and root was
the problem - putting both on the same dm-crypt partition appears to have solved
everything for me :-)
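The layout that worked here (one dm-crypt volume used as the LVM physical volume, with root and swap as logical volumes inside it) could be built roughly as sketched below. The partition, mapping, volume group and logical volume names and the 2G swap size are all assumptions for illustration, as are the run helper and the DRY_RUN variable; the Fedora 9 installer does the equivalent itself.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default here) commands are only
# printed; set DRY_RUN=0 to really execute them (needs root, destroys data).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One LUKS volume as the only PV; root and swap both live inside it.
plan_encrypted_pv() {
    part=$1   # e.g. /dev/sda2 (assumed)
    map=$2    # e.g. pv-enc (assumed)
    vg=$3     # e.g. VolGroup00
    run cryptsetup luksFormat "$part"
    run cryptsetup luksOpen "$part" "$map"
    run pvcreate "/dev/mapper/$map"
    run vgcreate "$vg" "/dev/mapper/$map"
    run lvcreate -L 2G -n swap "$vg"
    run lvcreate -l 100%FREE -n root "$vg"
    run mkswap "/dev/$vg/swap"
    run mkfs.ext3 "/dev/$vg/root"
}
```

For example, plan_encrypted_pv /dev/sda2 pv-enc VolGroup00 prints the eight commands in order without touching the disk.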
So yes, I'm now looking forward to FC9 with the supported crypto - no more need
for me to manually create initrd's :-)
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
Hi, I have the same problem, but I use neither LVM nor an encrypted filesystem,
and I get a similar error:
kernel: EXT3-fs error (device /dev/sda1): ext3_free_blocks_sb: bit already cleared
for block 7907504
And it arose after updating to kernel 220.127.116.11. After shutting down the
computer and starting the system again, I get this error and the whole partition
is bad. My root partition is full because the file /var/log/messages fills the
whole root partition. If I delete this file, the partition is still full. And
after checking the partition with e2fsck I get many errors. And now I'm not able
to boot the system.
The bug reported in comment #38 is something related to ext3 corruption, for sure not related to volume encryption.
I was not able to reproduce it, and several kernel versions (and even Fedora versions) have been released since this bug was opened...
Closing this bug. If you still see corruption when using a recent Fedora version and encrypted swap, please open a new bug with the exact kernel version and a description of how to reproduce it.
(Because F9 and F10 support encryption in the installer and there have been no bug reports so far, I expect it is fixed...)