Bug 426886 - snapshot creation fails non-deterministically with corrupt metadata in some tmpfs situations
Alias: None
Product: Fedora
Classification: Fedora
Component: device-mapper-obsolete
Version: 8
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2007-12-27 23:48 UTC by Jane Dogalt
Modified: 2009-01-09 05:38 UTC
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2009-01-09 05:38:14 UTC

Attachments

Description Jane Dogalt 2007-12-27 23:48:22 UTC
Description of problem:

The stock F8 live CD uses a dm-snapshot for the root filesystem, created during
the initramfs init. The snapshot's COW (copy-on-write) device is a loop device
backed by a file created with the standard dd if=/dev/null of=/overlay count=0
(or count=1) seek=size invocation.
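A minimal sketch of that dd invocation, for illustration only: the overlay is sparse, because if=/dev/null supplies no data and dd merely extends the output file to seek*bs bytes. The helper name make_sparse_overlay, the temporary directory and the 64 MiB size are assumptions chosen for the example, not taken from the live CD scripts; stat -c is the GNU coreutils syntax.

```shell
#!/bin/sh
# Illustrative sketch: create a sparse overlay file the way the live CD
# init does. make_sparse_overlay and the 64 MiB size are hypothetical.
set -e

make_sparse_overlay() {
    # $1 = overlay path, $2 = size in MiB
    # if=/dev/null provides no data, so dd only extends the file
    # to seek*bs bytes without writing any data blocks.
    dd if=/dev/null of="$1" bs=1024 count=0 seek=$(( $2 * 1024 )) 2>/dev/null
}

overlay_dir=$(mktemp -d)
trap 'rm -rf "$overlay_dir"' EXIT

make_sparse_overlay "$overlay_dir/overlay" 64

# Apparent size in bytes versus blocks actually allocated (GNU stat).
apparent=$(stat -c %s "$overlay_dir/overlay")
allocated=$(stat -c %b "$overlay_dir/overlay")
echo "apparent=$apparent allocated_blocks=$allocated"
```

On tmpfs the allocated block count will typically stay at 0 until something writes into the file.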

For the F8 live CD, this file lives in the root of the initramfs tmpfs, which is
no longer visible after the boot sequence invokes movemount via nash.

This appears to work reliably; otherwise there would be many reports.

When I change this mechanism so that a new tmpfs is mounted and dedicated to
this task, and the snapshot file is created there, I get a _non-deterministic_
failure, with messages saying the snapshot cannot be created because of
corrupt metadata.

I will add more detail to this bug, but I wanted to get something into the
system first. I can provide an initramfs which, booted repeatedly with
qemu -kernel -initrd -append, illustrates both the failure and its
non-determinism.

I have also hit the failure when booting such a custom live CD on real
hardware, so it is not just a qemu thing. And when I simply comment out the
tmpfs mount, so that the file lives on the initramfs tmpfs, I cannot make the
failure happen.

Again, if this bug isn't easy to resolve and sits around for a while, I'll put
together more details (the exact error message) and test cases (an initramfs
for download) as time goes on.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:

Expected results:

Additional info:

Comment 1 Alasdair Kergon 2007-12-28 01:30:04 UTC
Could be several things: if udev is doing anything, that can lead to problems
like this; or a stack of block devices may not be getting flushed (this used to
happen often with loop, but AFAIK it got fixed).

Try putting sleeps and/or syncs (or explicit blockdev --flushbufs) between
different pairs of commands and see if any of them make the problem go away.
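One way to run that experiment systematically is to wrap every step in a helper that syncs, optionally flushes a given block device, and then sleeps. This is only a sketch: run_step, SETTLE_DEV and SETTLE_SLEEP are hypothetical names, and blockdev --flushbufs needs root and a real block device, so the helper skips it otherwise.

```shell
#!/bin/sh
# Hypothetical helper for the "flush between steps" experiment:
# run one command, then sync, optionally flush a block device's
# buffers, and sleep. Names are illustrative, not from any init script.

SETTLE_SLEEP=${SETTLE_SLEEP:-1}

run_step() {
    # Run the step; propagate its exit status on failure.
    "$@" || return $?
    sync
    # Only flush when SETTLE_DEV names an existing block device.
    if [ -n "$SETTLE_DEV" ] && [ -b "$SETTLE_DEV" ]; then
        blockdev --flushbufs "$SETTLE_DEV"
    fi
    sleep "$SETTLE_SLEEP"
}

# Hypothetical usage inside an init script:
#   run_step mount -n -t tmpfs -o mode=0755 none /mnt/.LiveOS/overlayfs
#   SETTLE_DEV=$zyx_root_overlay_loopdev
#   run_step losetup "$SETTLE_DEV" /mnt/.LiveOS/overlayfs/dmoverlay
```

Moving the flush point one step at a time (as the comment suggests) then only means changing which calls are wrapped.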

Comment 2 Milan Broz 2007-12-28 09:22:39 UTC
Some loop-related problems (dirty page handling) were fixed in 2.6.24-rc; maybe
you could also try the rawhide kernel and check whether anything changes (in
any case, using sync during loop operations helps too).

Comment 3 Jane Dogalt 2008-01-08 07:13:14 UTC
I haven't tried 2.6.24-rc, but I just tried an excessive set of syncs and sleeps
between mounting the tmpfs, creating the file, the losetup, and the
dmsetup create. To no avail. I do, however, have a nice reproduction setup: I
will soon post a kernel and initrd (the kernel and the non-init-script files in
the initrd are all F8), plus a qemu command line and a script which, when run,
produce a dozen sets of boot messages, roughly half successes and half failures.

And when you run the qemu command manually, you get dropped to a shell right
after the dmsetup failure.

I've never been much of a rawhide person; any chance this fix will make it into
an F8 updates kernel?

Comment 4 Jane Dogalt 2008-01-08 07:20:26 UTC
It will take a bit more work to package the reproduction setup nicely, but
here is the relevant init script code, FWIW (note: if the mount line is
commented out, it never fails):

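    mkdir -p /mnt/.LiveOS/overlayfs
    zyx_root_overlay_loopdev=$( losetup -f )

    # Dedicated tmpfs for the overlay; with this mount commented out
    # (file stays on the initramfs tmpfs) the failure never occurs.
    # The sleep/sync runs are the debugging attempt from the previous comment.
    sleep 2 ; sync ; sleep 2 ; sync ; sleep 2 ; sync
    mount -n -t tmpfs -o mode=0755 none /mnt/.LiveOS/overlayfs
    sleep 2 ; sync ; sleep 2 ; sync ; sleep 2 ; sync
    # Sparse 1 GiB overlay: if=/dev/null supplies no data, so dd only
    # extends the file to seek*bs bytes.
    dd if=/dev/null of=/mnt/.LiveOS/overlayfs/dmoverlay \
        bs=1024 count=1 seek=$((1*1024*1024)) > /dev/null 2>&1
    sleep 2 ; sync ; sleep 2 ; sync ; sleep 2 ; sync
    losetup $zyx_root_overlay_loopdev /mnt/.LiveOS/overlayfs/dmoverlay
    sleep 2 ; sync ; sleep 2 ; sync ; sleep 2 ; sync

    # Persistent ("p") snapshot of the base device with 8-sector chunks.
    dmsetup create zyx-liveos-rw \
        --table "0 $( blockdev --getsize $zyx_root_base_loopdev ) snapshot $zyx_root_base_loopdev $zyx_root_overlay_loopdev p 8"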
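For reference, the --table argument above follows the dm-snapshot target syntax <start> <length> snapshot <origin> <COW device> <P|N> <chunksize>, with start, length and chunk size counted in 512-byte sectors; "p 8" therefore asks for a persistent exception store with 4 KiB chunks. The sketch below only assembles that string; snapshot_table is a hypothetical helper, and the size and device names are placeholders.

```shell
#!/bin/sh
# Sketch: assemble a dm "snapshot" table line like the one in the
# init script. snapshot_table is a hypothetical helper name.
# Fields: <start> <length> snapshot <origin> <COW> <P|N> <chunksize>
# (start, length and chunksize are in 512-byte sectors).

snapshot_table() {
    # $1 = origin size in sectors, $2 = origin device, $3 = COW device
    printf '0 %s snapshot %s %s p 8\n' "$1" "$2" "$3"
}

# In the real script the size comes from: blockdev --getsize <origin>
table=$(snapshot_table 2097152 /dev/loop0 /dev/loop1)
echo "$table"
```

With N instead of P, dm-snapshot uses a transient exception store, which does not read any header from the COW device.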


And the occasional failure message:

ZyX initramfs: preparing copy-on-write root filesystem
squashfs: version 3.2-r2 (2007/01/15) Phillip Lougher
Clocksource tsc unstable (delta = 97890389 ns)
Time: pit clocksource has been installed.
device-mapper: snapshots: Invalid or corrupt snapshot
device-mapper: table: 253:0: snapshot: Failed to read snapshot metadata
device-mapper: ioctl: error adding target to table
device-mapper: reload ioctl failed: No such device or address
Command failed

ZyX initramfs: /init panic- unexpected problem occurred!
ZyX initramfs: entering debug mode ...
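These messages come from the persistent exception store reading the header in the first chunk of the COW device. As I read the dm-snapshot source of this era (worth double-checking against the running kernel), a header whose magic field is zero is treated as a brand-new snapshot, while any other value that is not the snapshot magic ("SnAp", 0x70416e53) is reported as "Invalid or corrupt snapshot" with ENXIO, which would match the "No such device or address" above. A freshly created sparse file should read back as zeros, so one diagnostic is to dump the start of the overlay (or the loop device) just before dmsetup create. check_cow_header is a hypothetical helper for that.

```shell
#!/bin/sh
# Hypothetical diagnostic: verify that the first sector of a COW file
# or loop device reads back as all zeros before running
# "dmsetup create ... snapshot ... p 8". The persistent store's header
# (starting with its magic field) is assumed to live there.

check_cow_header() {
    # $1 = overlay file or loop device
    # Dump the first 512 bytes as hex, then delete spaces, newlines and
    # every '0' digit; anything left over means a non-zero byte was found.
    # A file shorter than 512 bytes is only checked as far as it goes.
    leftover=$(dd if="$1" bs=512 count=1 2>/dev/null \
        | od -An -v -tx1 | tr -d ' \n0')
    if [ -n "$leftover" ]; then
        echo "check_cow_header: non-zero data at start of $1" >&2
        return 1
    fi
    return 0
}

# Hypothetical usage, just before the dmsetup create line:
#   check_cow_header "$zyx_root_overlay_loopdev" || echo "stale COW header"
```

If the check ever fails on the intermittently failing boots, that would point at stale data being read through the loop device rather than at dm-snapshot itself.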

Comment 5 Bug Zapper 2008-11-26 09:11:32 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 

Comment 6 Bug Zapper 2009-01-09 05:38:14 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
