Description of problem:
I am using cryptsetup to fully encrypt my hard disk (except for a small /boot partition to boot from). It works really well: I had to create my own initrd to include all the crypto and dm kernel modules and cryptsetup, but it rocks. However, every few reboots (either a full reboot or a resume from hibernate-to-disk), fsck reports that the file system is unclean and starts a full check. Sometimes it finds nothing wrong, and sometimes it has to fix up some file, typically in /tmp. When I notice this and go back through the syslogs, I can see it was bound to happen, because the kernel had already been reporting an ext3 error beforehand:

Sep 2 16:54:18 tnz-jhaar-lt kernel: EXT3-fs error (device dm-0): ext3_free_blocks_sb: bit already cleared for block 7548941

dm-0 is my ext3-based "/" partition. I also encrypt my swap partition, and that has never caused a problem, BTW.

I first saw this on FC6 and thought it indicated a bad disk. I replaced the disk and took the opportunity to do a fresh install of FC7. So this could still be a hardware problem, but that seems pretty unlikely.

Version-Release number of selected component (if applicable):
FC7, fully updated via yum, running kernel 2.6.22.4-65.fc7 and cryptsetup-luks-1.0.3-4.fc7

How reproducible:
It happens on every reboot I do after those ext3 errors show up in syslog. It happened on Aug 14, Aug 28 and Sep 6.

Steps to Reproduce:
1. No idea.
(In reply to comment #0)
> Sep 2 16:54:18 tnz-jhaar-lt kernel: EXT3-fs error (device dm-0):
> ext3_free_blocks_sb: bit already cleared for block 7548941
> I first saw this on FC6 and thought it indicated I had a bad disk. I replaced
> the disk and took the opportunity to install from scratch FC7. So maybe this is
> actually a hardware problem too - but that's pretty unlikely.

Maybe it is not the disk but the controller/mainboard or the memory that is defective here. Can you run memtest86+ to check your memory?

> dm-0 is my ext3-based "/" partition. I also encrypt my swap partition - and
> that has never caused a problem BTW...

Do you use your swap partition a lot, or is it mostly unused?

BTW, there is already a patch for encrypted root under development; maybe you want to use it and help out there, see #124789.
Ah, and you could probably use smartmontools to check your hard drives. Do you get only a single ext3 error each time, or are there several of them?
I let memtest86+ run overnight on the machine, and it detected no RAM problems. I have also sat down and gone through the syslogs. The "EXT3-fs error" messages start appearing minutes to hours after a reboot or hibernate-to-disk, and there are either one or many of them each time. It just happened this morning when I brought my laptop into work: I got 8500+ in a one-minute period! I immediately rebooted and fsck'ed the disk; other than a bunch of inode errors, it looked fine. However, I can see two files in /lost+found from a few months ago. One is an /sbin/iscsid binary, something I don't think I've ever used...
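Since the errors arrive in bursts (one message or thousands), it can help to see exactly when each burst started and how large it was. A minimal sketch that buckets the kernel messages per minute and per device, assuming the standard syslog line format shown in the original report and /var/log/messages as the default log file; the function name ext3_error_bursts is hypothetical.

```shell
#!/bin/sh
# ext3_error_bursts [LOGFILE]
# Count "EXT3-fs error" kernel messages per minute and per device in a
# syslog-format file (default /var/log/messages). Prints one line per bucket,
# "<month> <day> <HH:MM> <device> <count>", in order of first appearance.
ext3_error_bursts() {
    awk '
        /kernel: EXT3-fs error [(]device / {
            # Field 3 is HH:MM:SS; keep HH:MM to bucket by minute.
            minute = substr($3, 1, 5)
            # The device name sits between "(device " and the next ")".
            dev = $0
            sub(/.*[(]device /, "", dev)
            sub(/[)].*/, "", dev)
            key = $1 " " $2 " " minute " " dev
            if (!(key in count)) order[++n] = key
            count[key]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "${1:-/var/log/messages}"
}
```

For example, "ext3_error_bursts /var/log/messages | sort -n -k5 | tail -n 5" lists the five heaviest minutes. Warnings such as "EXT3-fs warning" are deliberately not counted.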
Well, big things have happened since last time. I managed to convince Dell it was a hardware fault, and they replaced both my hard disk (again) and my motherboard (with the disk controller). ...But here it is a week later, and I've just had a serious occurrence of the "EXT3-fs error" yet again. This time ext3 lost the inodes of 4 files on my hard disk, including /usr/bin/swatch and /lib/iptables/libipt_CLASSIFY.so, which means that after the fsck they were GONE. This has got to be a software problem, hasn't it?

One more thing. When I went to restore my FC7 system onto the new hard disk, I hit the same problem I had the first time: the FC7 boot CD/DVD doesn't have any support for cryptsetup or the appropriate kernel modules. So I couldn't use FC7 to actually create the cryptsetup partitions to restore onto. Instead I grabbed Ubuntu (which does support cryptsetup), used it to create the encrypted partitions, and then restored onto those. So the question is: does the September release of cryptsetup on Ubuntu match what you'd expect? If not, and if you can tell me how to create an encrypted partition using an FC7 DVD, I'd be happy to do it again...

BTW: "cryptsetup luksDump" and "cryptsetup status" don't return anything that looks like a version number. If there are issues with cryptsetup, wouldn't being able to tell which version created a partition help from a support perspective?

Thanks
Jason
I've just had a thought: could this be a configuration problem rather than a software one? I created my own initrd to mount the encrypted root and swap partitions at boot time:

    mkblkdevs
    mkdir -p /dev/mapper
    cryptsetup luksOpen /dev/sda3 root-enc
    mkdir /mnt-root
    mount -t ext3 /dev/mapper/root-enc /mnt-root
    cryptsetup luksOpen /dev/sda2 swap-enc --key-file=/mnt-root/etc/crypto-swap.key
    umount /mnt-root
    resume /dev/mapper/swap-enc
    echo Creating root device.
    mkrootdev -t ext3 -o defaults,noreservation,ro /dev/mapper/root-enc

But I've done nothing to tear it all down properly during a halt, reboot or (more importantly) hibernate. Since it works 99.99% of the time, I wonder whether root and swap do get unmounted correctly, but because nothing ever runs "cryptsetup remove", that isn't enough. Could that cause subtle corruption? BTW, I don't use /etc/crypttab, as I specifically set up root and swap in the initrd...
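Whether the missing teardown matters is exactly the open question here, but for reference, a minimal sketch of an explicit teardown at halt/reboot for the two mappings this initrd creates could look like this. It assumes the mapping names root-enc and swap-enc from the script above; the run helper and crypt_teardown function are hypothetical, and by default the commands are only printed (set DRY_RUN=0 to execute them). Root is deliberately not closed: its mapping cannot be removed while / is still mounted from it, so a read-only remount is as far as the running system can go.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical shutdown-time teardown for the mappings created by the initrd.
crypt_teardown() {
    # Flush dirty data through dm-crypt to the disk.
    run sync
    # Stop using the encrypted swap, then remove its mapping (and key).
    run swapoff /dev/mapper/swap-enc
    run cryptsetup luksClose swap-enc
    # root-enc cannot be closed while / is mounted from it; remount read-only.
    run mount -o remount,ro /
}
```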
I have reinstalled again, this time placing the dm-crypt root and swap partitions on top of LVM, which appears to be the more "correct" way (although a waste of time on a laptop IMHO). Nothing but Red Hat tools were used to build this installation. Anyway, as Milan Broz requested, attached is the lvmdump from this system.

More symptoms. I successfully hibernated (via pm-hibernate) 6+ times today. Each time it booted, the initrd asked for the password to unlock the root partition, then used a key file in the (now decrypted) /etc/ to unlock swap so that it could resume. All worked splendidly... until the last time. It resumed fine, but almost immediately started reporting

Oct 18 17:59:17 tnz-jhaar-lt kernel: EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared for block 7162385

Over 300 such events in ONE second, and then no more. But I didn't notice, and merrily went about my business installing software and generally doing I/O. Then, 2 hours later, there was a sudden burst of over 700 events; the system froze at that point, I rebooted, and had to run "fsck -y /" manually to fix it. I didn't lose any files that time, but I normally do :-(

So this laptop has had its disk replaced 3 times and its motherboard twice at my insistence that this isn't a software problem. But surely it must be one?

Here's the section of my custom "init" in my initrd related to cryptsetup:

    echo Scanning logical volumes
    lvm vgscan --ignorelockingfailure
    echo Activating logical volumes
    lvm vgchange -ay --ignorelockingfailure VolGroup00
    cryptsetup luksOpen /dev/VolGroup00/LogVol00 root-enc
    mkdir /mnt-root
    mount -t ext3 /dev/mapper/root-enc /mnt-root
    cryptsetup luksOpen /dev/VolGroup00/LogVol01 swap-enc --key-file=/mnt-root/etc/crypto-swap.key
    umount /mnt-root
    resume /dev/mapper/swap-enc
    echo Creating root device.
    mkrootdev -t ext3 -o defaults,noreservation,ro /dev/mapper/root-enc
    echo Mounting root filesystem.
    mount /sysroot
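One detail of this script may be worth checking: it mounts the decrypted root file system read-write ("mount -t ext3" with no options) before calling resume, only to read the swap key file. The kernel's software-suspend documentation warns that modifying on-disk file system state between suspend and resume can corrupt data, because the resumed kernel still holds its pre-hibernation view of that file system, and a read-write ext3 mount can replay the journal and update the superblock. Whether that explains the corruption here is a hypothesis, not something established in this report, although it would be consistent with the later observation that the problem only shows up when swap is encrypted. A minimal sketch of the same step using a read-only mount that also skips journal replay (the ext3 "noload" option), written as plain sh with a hypothetical dry-run run helper; the initrd's own nash interpreter may not accept these options in the same form.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Open swap using the key file on root without writing to root:
# "ro" avoids a read-write mount, and ext3 "noload" skips journal replay,
# which would otherwise write to the disk even on a read-only mount.
# Skipping replay can give a slightly stale view; acceptable for a key file.
open_swap_readonly_root() {
    run mount -t ext3 -o ro,noload /dev/mapper/root-enc /mnt-root
    run cryptsetup luksOpen /dev/VolGroup00/LogVol01 swap-enc \
        --key-file=/mnt-root/etc/crypto-swap.key
    run umount /mnt-root
}
```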
Created attachment 230781: lvmdump of affected system
There is now a discussion about corruption with dm-crypt on the dm-crypt mailing list: http://thread.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/2381 Maybe this is the same issue that you reported.
That looks like a different issue from mine. I have a new piece of information. (I am using dm-crypt to encrypt my entire system, both root (/) and swap; i.e. only /boot isn't encrypted, and the initrd calls cryptsetup to set up the crypto.) My problem appears to occur exclusively when I suspend-to-disk. If I do a full shutdown and restart, I never seem to trigger the problem. However, if I suspend, there's around a 1-in-4 chance that ext3 will declare something's wrong and run a full fsck. If I'm lucky, it finds nothing wrong; if I'm unlucky, files go missing. The fact that it "mostly" works makes me feel this cannot be a configuration problem or a "things being done in the wrong order during hibernation" problem. This laptop has had all its hardware components replaced (thanks Dell) and still shows this symptom, so I'm left thinking this has to be a software problem.
Oh yeah, I've reinstalled this laptop both with LVM (with dm-crypt on top) and in raw partition mode (i.e. dm-crypt on top of /dev/sda) and got the same issue, so this isn't an LVM problem for me.

Jason
(In reply to comment #8)
No, this is a different issue (the one you pointed out was caused by faulty hardware).

(In reply to comment #9)
> My problem appears to occur exclusively when I suspend-to-disk.

Yes, this is very important information. Do you see corruption without encrypted swap (using an encrypted root filesystem only)? I am trying to find out in which part of the process the corruption happens.
You mean run it with unencrypted swap? OK, I've re-jigged it and we'll see what happens. I should know within a few days to a week whether it's going to happen or not.
OK, I've been hauling my laptop between home and work all week, suspending to disk exclusively with unencrypted swap, and have had ZERO problems. So it looks like this issue only occurs when the swap partition is encrypted, and even then only some of the time. Hope that helps.
This is a kernel problem: it probably sometimes loses data during hibernation, when writing to swap through dm-crypt.
Hi there,

So is this a known problem, or should I be reporting it to someone else?...

Thanks
Jason

PS: It is still working fine (i.e. suspend to disk) with unencrypted swap.
(In reply to comment #15)
> So is this a known problem, or should I be reporting it to someone else?...

If you are able to reproduce this on an upstream kernel, maybe someone on the kernel mailing list could help. (I expect that a flush is missing somewhere in the suspend process, so some unfinished work is still left in the crypt queue. Things may get a little more complicated because of recent changes in the block layer: zero-sized barriers, which DM targets still reject. These are just quick thoughts; this needs some analysis...) This problem is on my dm-crypt TODO list, but currently there are some issues with higher priority.
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage There hasn't been much activity on this bug for a while. Jason, have you been able to test this on an upstream kernel? If not, do you need one built? Milan, have you been able to look further into this issue?
I've been running dm-crypt on my / partition, but had removed it from swap, as that was where the problem was. I've just re-enabled it and will start suspending to disk again with everything encrypted. I'll let you know in a few days if the problem comes back.
Well, that didn't take long :-( I did 4 cycles of suspending the encrypted disk to encrypted swap, and each time let it come all the way back up to a working state. It looked good. On the 4th cycle it also looked good, but then, 10 minutes after the last services had restarted/unfrozen, I started seeing these infamous messages again:

kernel: EXT3-fs error (device dm-2): ext3_free_blocks_sb: bit already cleared for block 7907504
kernel: EXT3-fs error (device dm-2): ext3_free_inode: bit already cleared for inode 3935185
kernel: EXT3-fs warning (device dm-2): ext3_unlink: Deleting nonexistent file (3344191), 0

A real mess :-( Unless you have any other ideas, I'm going back to unencrypted swap before I lose any more /usr/bin files...
We need to add another sync in the suspend path. I have already read the code, but still haven't had time to create a patch, some tests and a kernel build. Anyway, increasing the severity of this bug.
FYI, I have just replaced my Dell X300 with a Dell 430 laptop and moved up to FC8. I implemented encrypted "/" again, but left swap unencrypted because of this fault. It had been working perfectly for 3 weeks (hibernating to disk several times a week) until today... Same problem: ext3 errors all over the place after coming out of hibernate/suspend. I've just rebooted, typed in the password to decrypt the disk, and now I'm seeing

ata1.00: BMDMA stat 0x25
ata1.00: cmd c8/00........
EXT3-fs: Can't read superblock on 2nd try
mount failed.

It's toast :-( Either my disk just died (yeah, right) or dm-crypt just killed it. I'm going back to unencrypted partitions with encfs. That was rock-solid. :-(
The last issue looks like a hardware fault to me... or not? (There should be no problems with an encrypted root only.)
Yes, I think I jumped the gun. It's just that I've had 3 disk replacements since I started using dm-crypt, so I'm starting to blame it for everything. Do you know if there's any work going on for the encrypted-swap-and-hibernation bug I've been seeing? Thanks
Adding this bug to the F9 blocker list because encrypted root and swap is a supported configuration in the F9 time frame.
Jason, could you please confirm my assumptions (from the attached logs):
- corruption was seen even on a uniprocessor (no dual-core/SMP, just a single CPU)
- you are using different encryption for swap and root (aes+twofish)
Can we get a statement as to what this bug actually is, and whether it's really a blocker for Fedora 9 (for which there is very little development time left)?
(In reply to comment #25)
> Jason, please could you confirm my assumptions (from attached logs):
> - corruption was seen even on uniprocessor (no dualcore/SMP, just single CPU)
> - you are using different encryption for swap and root (aes+twofish)
>

Sorry I took so long to answer this; I never received an email alert. I have seen it on two laptops: one single-processor, and now this one, a dual-core. As for which crypto types are in use, I *think* they are different. Can you tell me what command I could run that would tell me what dm-crypt settings are on each?

Thanks
Jason
(In reply to comment #27)
> me what command I could run that would tell me what dm-crypt settings are on
> each?

Don't worry, lvmdump did the trick. On this newer machine I still see the failure even when using the same crypto options for swap and root, i.e. aes-cbc-essiv:sha256.
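For a quicker check than lvmdump, "cryptsetup status <name>" (run as root) prints the parameters of an active mapping, including a "cipher:" line. A minimal sketch that compares the cipher spec of two mappings, assuming that "cipher:" line format; the function names crypt_cipher and same_cipher are hypothetical, and the mapping names in the usage example come from the earlier init script.

```shell
#!/bin/sh
# Extract the cipher spec (e.g. aes-cbc-essiv:sha256) from
# "cryptsetup status" output read on standard input.
crypt_cipher() {
    awk '$1 == "cipher:" { print $2; exit }'
}

# Print the cipher spec of two active mappings; succeed only if both are
# known and identical. Needs root and cryptsetup.
same_cipher() {
    a=$(cryptsetup status "$1" | crypt_cipher)
    b=$(cryptsetup status "$2" | crypt_cipher)
    echo "$1: ${a:-unknown}"
    echo "$2: ${b:-unknown}"
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: "same_cipher root-enc swap-enc && echo same cipher".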
I was under the impression that hibernation was unsupported with encrypted swap; am I wrong here? I'm just going through the F9 blocker list. I realize that encrypted swap is the default in F9 if you tick the 'encrypt system' box in anaconda. I tried to hibernate my encrypted Rawhide laptop today and it failed completely: the system just booted normally when I turned it back on, rather than resuming from swap. If it is true that hibernation is unsupported with encrypted swap, then we're going to need a release note...
To do "proper" full disk encryption (like all the commercial Windows products do, BTW...), you really have to encrypt the swap. What's really missing from cryptsetup is some form of kernel password storage area, where a "cryptsetkey" command early in the initrd boot process could prompt for the password, which any later invocation of cryptsetup could then use. That way you could prompt for the password, and then use it to decrypt swap and/or root before doing the resume. Without it I for one am stuck in the hand-crafted hell of creating a static password file on (encrypted) root, and running cryptsetup on root first to grab the key file to decrypt swap - before the resume!

I only came up with the "cryptsetkey" concept last night - I might have to harass the cryptsetup author about it :-) After it had mounted everything it needed to in initrd, you could run "cryptsetkey --delete" to trash the password from "kernel memory" (I'm no programmer - but hopefully you get the gist ;-)

So to get back to your question, yes, you are probably correct. However, I think it's a bit bizarre that Linux distributions still haven't figured out how to do "proper" whole-disk encryption when Windows figured it out many years ago. :-(
(In reply to comment #30)
> That way you could prompt for the password, and then use it to decrypt swap
> and/or root before doing the resume. Without it I for one am stuck in the
> hand-crafted hell of creating a static password file on (encrypted) root, and
> running cryptsetup on root first to grab the key file to decrypt swap - before
> the resume!

Exactly this is now possible in Fedora 9. It asks for the LUKS password before running the LVM scan, so the whole physical volume can be encrypted. Resume then runs from the logical volume mapped to swap on this encrypted volume. I run two notebooks, and both (in simple tests) resumed from encrypted swap. (But the bug this report is about is still here; I just haven't been able to reproduce it without additional hacks yet.)

Anyway, I saw another problem: because root and swap are encrypted, the standard initscripts don't correctly unmount/remove the encryption mappings (because cryptsetup and the initscripts run from the very device that needs to be unmounted and luksClosed!). So there should be some kind of shutdown ramdisk. (I need to check whether this is still true in a recent F9 version; if it is, it needs a new bug report.)

> I only came up with the "cryptsetkey" concept last night - I might have to
> harass the cryptsetup author about it :-) After it had mounted everything it
> needed to in initrd, you could run "cryptsetkey --delete" to trash the password
> from "kernel memory" (I'm no programmer - but hopefully you get the gist ;-)

You mean the password for unlocking LUKS? IMHO it is not stored in memory after unlocking. Wiping the master key (the key used by the encryption algorithm in dm-crypt) from kernel memory is already supported by the dm-crypt kernel module through the dm message interface... no idea why it is not used. (It is already possible with dmsetup; see a thread on the dm-crypt mailing list.)

I'll add some more notes to this bug later; I'm just busy with other work right now, sorry.
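For reference, the dm message interface mentioned here is driven with "dmsetup message". A minimal sketch, assuming a kernel whose dm-crypt target accepts the "key wipe" message. As far as I know, dm-crypt only accepts key messages while the mapping is suspended, and once the key is wiped the mapping cannot do any more I/O or be resumed without a new "key set", so this only makes sense for a device that is no longer needed, for example as the very last step of shutdown, run from something that does not live on that device. The run helper and wipe_crypt_key function are hypothetical; by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Wipe the dm-crypt volume key of mapping $1 from kernel memory.
# The mapping must be suspended first; it stays unusable afterwards.
wipe_crypt_key() {
    run dmsetup suspend "$1"
    run dmsetup message "$1" 0 key wipe
}
```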
What about remounting read-only? Having to add a ramdisk just to shut down cleanly is a bit severe. Can't remounting read-only before powering off get around the problem? Or is it that dm-crypt still has some unfinished writes hanging about? If so, wouldn't that be a bug?

Jason
Sure, it remounts read-only if it cannot unmount. That is enough for a non-encrypted system, but not for a dm-crypt one: the master key is still in memory after the read-only remount (possible DRAM data-retention attack, etc.). I am not sure whether the dm-crypt internal queue is flushed properly here (but a sync should be enough in the shutdown path, so it's probably not a big problem).
OK, our default setup, which puts swap inside LVM and encrypts the LVM physical volume, works just fine with suspend and hibernate. I'm going to remove this from the blocker list, as it's not really a case our installers will hit.
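In this layout a single LUKS container serves as the LVM physical volume, with root and swap as logical volumes inside it, so one passphrase unlocks everything before LVM activation and resume needs no separate key file. A minimal sketch of creating that layout by hand, with placeholder names: the partition /dev/sda2, the mapping name luks-pv, the VolGroup00/LogVol00/LogVol01 names and the 2G swap size are assumptions. The run helper only prints the commands unless DRY_RUN=0; running them for real destroys the data on that partition.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# LUKS container -> LVM physical volume -> swap LV and root LV.
make_luks_lvm_layout() {
    part=${1:-/dev/sda2}
    run cryptsetup luksFormat "$part"
    run cryptsetup luksOpen "$part" luks-pv
    run pvcreate /dev/mapper/luks-pv
    run vgcreate VolGroup00 /dev/mapper/luks-pv
    run lvcreate -L 2G -n LogVol01 VolGroup00        # swap
    run lvcreate -l 100%FREE -n LogVol00 VolGroup00  # root
    run mkswap /dev/VolGroup00/LogVol01
    run mkfs.ext3 /dev/VolGroup00/LogVol00
}
```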
I talked to the primary authors of dm-crypt this week and they said I should be doing that too. I'll reinstall my laptop next week and put both swap and root into the same LVM. We'll see what that does :-)
I've been running FC8 on an LVM'ed dm-crypt volume, as per the suggestions above, for over 3 weeks now with ZERO problems. That appears to be it! Having separate dm-crypt partitions for swap and root was the problem; putting both on the same dm-crypt partition appears to have solved everything for me :-) Crypt rocks. So yes, I'm now looking forward to FC9 with its supported crypto: no more need for me to create initrds by hand :-) Thanks again!

Jason
Changing version to '9' as part of the upcoming Fedora 9 GA. More information and the reason for this action are here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Hi, I have the same problem, but I use neither LVM nor an encrypted filesystem, and I get a similar error:

kernel: EXT3-fs error (device /dev/sda1): ext3_free_blocks_sb: bit already cleared for block 7907504

It started after updating to kernel 2.6.25.6. After shutting down the computer and starting the system again, I get this error and the whole partition is bad. My root partition is full because /var/log/messages fills the whole root partition, and if I delete this file, the partition is still full. After checking the partition with e2fsck, I get many errors. Now I'm not able to boot the system at all.
The bug reported in comment #38 is related to ext3 corruption, and is certainly not related to volume encryption. I was not able to reproduce it, and several kernel versions (and even Fedora versions) have been released since this bug was opened... Closing this bug. If you still see corruption when using a recent Fedora version and encrypted swap, please open a new bug with the exact kernel version and a description of how to reproduce it. (Because F9 and F10 support encryption in the installer and there have been no bug reports so far, I expect it is fixed...)