Red Hat Bugzilla – Bug 1465526
RuntimeError: mount_ro: /dev/cl/root on / (options: 'ro'): mount: mount /dev/mapper/cl-root on /sysroot failed: Structure needs cleaning
Last modified: 2017-07-05 05:15:33 EDT
Description of problem:
Using guestfs python bindings:
r = libguestfsmod.mount_ro(self._o, mountable, mountpoint)
E RuntimeError: mount_ro: /dev/cl/root on / (options: 'ro'): mount: mount
/dev/mapper/cl-root on /sysroot failed: Structure needs cleaning
libguestfs version: 1.36.4
Host is Fedora 26
The guest image is CentOS 7.3
So far I have only noticed it failing once.
The full logs (with libguestfs debug mode) are available at:
I think your filesystem is corrupt. If you look in the full log
just before the error, you'll see lots of lines like:
[ 17.623837] XFS (dm-1): Metadata corruption detected at xfs_inode_buf_verify+0x73/0xf0 [xfs], xfs_inode block 0x509000
[ 17.648406] XFS (dm-1): Unmount and run xfs_repair
[ 17.659452] XFS (dm-1): First 64 bytes of corrupted metadata buffer:
[ 17.674282] ffffa380007a0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 17.694213] ffffa380007a0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 17.714461] ffffa380007a0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 17.734386] ffffa380007a0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 17.754657] XFS (dm-1): metadata I/O error: block 0x509000 ("xlog_recover_do..(read#2)") error 117 numblks 32
[ 17.783402] XFS (dm-1): log mount/recovery failed: error -117
[ 17.797346] XFS (dm-1): log mount failed
Thanks, I missed that. I wonder how the corruption happened: the disk was freshly created at the beginning of the test, and both right before this exception was thrown and right after it, two other tests that used the exact same method passed (only different paths were attempted to be copied). Looking at the libguestfs debug output of the tests that did pass, I don't see those XFS errors.
The host itself is a VM (if that matters).
good output before:
good output after:
In general terms, provided there are no kernel or qemu bugs, libguestfs
guarantees that your changes are written to disk when you call
guestfs_shutdown on the handle, and it looks as if you are doing that.
Is it possible two handles are open read-write on the same disk? That
would cause instant corruption. (Or if something other than libguestfs
has the disk open for writes, e.g. a live VM.)
I looked at the traces you supplied and I cannot see anything bad. You're
using the API correctly as far as I can tell.
I'm not writing anything to the guest with guestfs; it uses the 'mount_ro' call and then attempts to copy a file from the guest to the host (though it failed on the mount_ro before that).
I don't think there are any other handles open; however, the VM is indeed live. That should be safe for 'mount_ro', no?
A few runs have passed and I didn't see it happen again.
The VM may not have flushed all its changes to the disk, so when the filesystem is mounted it is detected as corrupted, and mounting could change its metadata to make it mountable even in read-only mode. If the disk is not opened in read-only mode, this means writing to the same disk used by the running VM -> big problems ahead.
What is the exact add_drive command you are using? Does it include readonly=True? Or are you using add_drive_ro, perhaps?
It is:
add_drive_opts(disk_path, format='qcow2', readonly=1)
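For context, the surrounding read-only flow looks roughly like this. This is a minimal sketch, not the reporter's actual test code: the helper name and paths are hypothetical, while the guestfs calls themselves (add_drive_opts, launch, mount_ro, download, shutdown, close) are the standard Python binding API.

```python
def copy_file_from_guest(disk_path, guest_path, local_path):
    # Hypothetical helper illustrating the read-only flow discussed above.
    # Import deferred so the sketch can be read standalone.
    import guestfs

    g = guestfs.GuestFS(python_return_dict=True)
    # readonly=1 means qemu opens the image read-only, so a live guest
    # using the same disk cannot be corrupted by this handle.
    g.add_drive_opts(disk_path, format='qcow2', readonly=1)
    g.launch()
    # '/dev/cl/root' matches the LV in the error message; adjust per guest.
    g.mount_ro('/dev/cl/root', '/')
    g.download(guest_path, local_path)   # copy the guest file to the host
    g.shutdown()
    g.close()
```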
Right, as Pino says it's safe to use mount_ro on a live VM, but
that doesn't mean you'll get consistent results. The guest can
be in the middle of writing to the disk, and you may see those
writes in any order or not at all, which can confuse the libguestfs
appliance.
Also, qcow2 makes this worse, since you might not only be dealing
with partial guest changes but also partial qcow2 metadata changes.
Raw is a bit better.
You just have to close and retry if this happens.
If you really want a consistent view of a guest then you can get
qemu to export a point-in-time snapshot as NBD (even though the guest
is live and continues running) which libguestfs can read, but it
involves sending commands to the qemu monitor of the guest.
All right, using retries for now; it seems to work.
Thanks for the explanations.
I have set up retries, but at least for the one time I caught the 'mount_ro' operation failing again, all subsequent retries failed as well. This led me to suspect I'm not retrying properly. So my question is:
Is retrying just the 'mount_ro' call enough? Or do I need to shut down the handle, open a new one, and start again from 'add_drive_ro'?
No it's definitely not enough. You must close and reopen the handle.
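In other words, the retry loop has to rebuild the whole handle on each attempt, not just repeat the mount_ro call. A generic sketch of that pattern, assuming a caller-supplied factory (the names `retry_with_fresh_handle`, `make_handle`, and `use_handle` are hypothetical, not part of the libguestfs API):

```python
import time

def retry_with_fresh_handle(make_handle, use_handle, attempts=3, delay=0.0):
    # make_handle() must build a brand-new handle each time (add_drive_ro,
    # launch, mount_ro, ...): a handle that already saw an inconsistent
    # snapshot of the live disk will keep failing if reused.
    last_error = None
    for _ in range(attempts):
        handle = make_handle()
        try:
            return use_handle(handle)
        except RuntimeError as e:
            last_error = e
            handle.close()       # throw the whole handle away
            time.sleep(delay)    # give the guest a chance to settle
    raise last_error
```

With guestfs, `make_handle` would create a `guestfs.GuestFS` object, add the drive read-only, launch, and mount, while `use_handle` would do the download and then shutdown/close on success.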