Bug 617200

Summary: mount operation failed and hung on some images which are running in read-only mode
Product: [Community] Virtualization Tools
Component: libguestfs
Reporter: Richard W.M. Jones <rjones>
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED UPSTREAM
Severity: medium
Priority: low
Version: unspecified
CC: lilu, llim, mbooth, virt-maint
Keywords: RHELNAK
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 617165
Last Closed: 2010-07-22 15:55:11 UTC
Bug Blocks: 617165

Description Richard W.M. Jones 2010-07-22 13:30:21 UTC
+++ This bug was initially created as a clone of Bug #617165 +++

Description of problem:
1- When images are opened in read-only mode, SOME of them cause I/O errors and hang forever during a read-only mount, for example when using "mount ro" in guestfish, or the virt-ls and virt-cat tools, on these images.
2- If the same images are opened in rw mode, the issue does not appear, and it never happens again on an image after one successful rw mount, for example after using "mount rw" or the virt-edit tool on it.
3- When "mount" runs on an affected image, it first prints "INFO: recovery required on readonly filesystem." If the mount is ro, it then prints I/O errors and hangs. If the mount is rw, it succeeds and prints "recovery completed."

Version-Release number of selected component (if applicable):
libguestfs-1.2.7-1.17.el6.x86_64

How reproducible:
always, except after a successful "mount" of the image

Steps to Reproduce:
1. Run guestfish or the virt tools in read-only mode on an image whose filesystem is in the "recovery required" state.
2. In guestfish, use mount ro. With the virt tools, use virt-cat, virt-df, virt-ls, etc.
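
To tell in advance whether an image is in the "recovery required" state, one option is to look for the needs_recovery flag in the ext3 superblock features. The following is a minimal sketch, not part of the original report: the helper name has_needs_recovery is hypothetical, and it assumes the listing on stdin contains a "Filesystem features:" line in the style printed by tune2fs -l.

```shell
#!/bin/sh
# has_needs_recovery (hypothetical helper): reads a tune2fs-style listing
# on stdin and succeeds if the "Filesystem features:" line contains the
# needs_recovery flag, i.e. the journal has not been replayed yet.
has_needs_recovery() {
    grep -E '^Filesystem features:' | grep -qw needs_recovery
}

# Possible use with the image from a reproducer (assumes that the guestfish
# tune2fs-l command prints a "Filesystem features: ..." line):
#
#   guestfish --ro -a /tmp/test.img run : tune2fs-l /dev/sda1 | has_needs_recovery \
#       && echo "journal recovery required"
```

Reading the superblock does not mount the filesystem, so this check should not trigger the hang described above.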

Actual results:
The mount fails with I/O errors (with the virt tools, set LIBGUESTFS_DEBUG=1 in the environment to see them), then hangs.

Expected results:
The mount succeeds and the tool outputs the correct information about the image.

Additional info:
The error output from virt-ls, as an example, is attached.

--- Additional comment from lilu on 2010-07-22 07:25:02 EDT ---

Created an attachment (id=433671)
output info of virt-ls command with an image which has this problem

--- Additional comment from lilu on 2010-07-22 07:35:33 EDT ---

this issue happened on a RHEL5.5_32 image and a RHEL5.4_64 image

--- Additional comment from pm-rhel on 2010-07-22 07:42:44 EDT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from pm-rhel on 2010-07-22 07:57:51 EDT ---

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

--- Additional comment from qwan on 2010-07-22 08:40:39 EDT ---

After checking the images (booting them with qemu-kvm), there was a minor error in the filesystem:
[...]
Mounting root filesystem.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
[...]

libguestfs should handle such cases better (quit after the mount fails):

[host]# pstree -p 706
virt-ls(706)─┬─qemu-kvm(713)───{qemu-kvm}(715)
             └─virt-ls(714)
[host]# strace -p 706
Process 706 attached - interrupt to quit
select(7, [4 6], NULL, NULL, NULL^C <unfinished ...>
Process 706 detached
[host]# strace -p 714
Process 714 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff68c46ac0)       = 0
kill(713, SIG_0)                        = 0
kill(706, SIG_0)                        = 0
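
In the trace above, process 714 checks every 2 seconds whether processes 713 and 706 still exist by sending them signal 0, while process 706 stays blocked in select(). Until the tools give up by themselves after a failed mount, a caller could bound the run from outside with the same kind of liveness check. This is only a sketch: run_with_deadline is a hypothetical helper, not something libguestfs provides, and the exit status 124 is borrowed from the coreutils timeout command as a convention.

```shell
#!/bin/sh
# run_with_deadline SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background, polls it once per
# second with "kill -0" (the shell equivalent of the kill(pid, SIG_0)
# calls in the trace), and terminates it if it is still alive after
# SECONDS. Returns 124 on expiry, otherwise the status of COMMAND.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}

# Possible use (the image path is a placeholder):
#   run_with_deadline 300 virt-ls /path/to/image.img /
```

Where the coreutils timeout command is available, it does the same job more simply.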

--- Additional comment from rjones on 2010-07-22 09:23:56 EDT ---

Reproducer #1:

$ cat make-image.sh 
#!/bin/sh -
guestfish <<EOF
sparse /tmp/test.img 1G
run
part-disk /dev/sda mbr
mkfs ext3 /dev/sda1
mount /dev/sda1 /
touch /hello
sync
sleep 1
kill-subprocess
EOF

$ ./make-image.sh
[ignore the error from the above command]
$ guestfish --ro -a /tmp/test.img -m /dev/sda1 -v

The error you will see is:

  mount -o ro /dev/vda1 /sysroot/
  EXT3-fs: INFO: recovery required on readonly filesystem.
  EXT3-fs: write access unavailable, cannot proceed.

Reproducer #2:


$ cat make-image.sh 
#!/bin/sh -
guestfish <<EOF
sparse /tmp/test.img 1G
run
part-disk /dev/sda mbr
pvcreate /dev/sda1
vgcreate VG /dev/sda1
lvcreate LV VG 128
mkfs ext3 /dev/VG/LV
mount /dev/VG/LV /
touch /hello
sync
sleep 1
kill-subprocess
EOF

$ ./make-image.sh
[ignore the error from the above command]
$ guestfish --ro -a /tmp/test.img -m /dev/VG/LV -v

The error you will see is different from reproducer #1 but
similar to the errors shown in Linglu's comment 1:

  mount -o ro /dev/VG/LV /sysroot/
  EXT3-fs: INFO: recovery required on readonly filesystem.
  EXT3-fs: write access will be enabled during recovery.
  end_request: I/O error, dev vda, sector 389
  Buffer I/O error on device dm-0, logical block 2
  lost page write due to I/O error on dm-0
  [more of these errors or hangs]

--- Additional comment from rjones on 2010-07-22 09:25:52 EDT ---

The solution to this is to remove the code which detects if the
readonly=yes|no option is supported by qemu.  Instead we should
just use snapshots as we do on Fedora.

http://git.annexia.org/?p=libguestfs.git;a=blob;f=src/guestfs.c;h=85a042a0a2eaf6b48d068f8e1d4ee0d854018e2f;hb=HEAD#l852

Since this is also an upstream bug, I will clone this and fix it
upstream first.
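
The difference between the two approaches can be sketched as the value passed to qemu's -drive option. With snapshot=on, qemu sends guest writes to a temporary overlay, so the guest kernel can replay the ext3 journal while the image file itself is left unchanged. The helper below is hypothetical and is not the libguestfs code: the name drive_option and its "ro"/"rw" argument are assumptions made for illustration, and it does not escape commas in file names.

```shell
#!/bin/sh
# drive_option FILE MODE
# Hypothetical helper, not libguestfs code: prints a value for qemu's
# -drive option. MODE is "ro" or "rw". Read-only drives always use
# snapshot=on, with no probing for readonly= support in qemu.
drive_option() {
    file=$1
    mode=$2
    case $mode in
        ro) printf 'file=%s,snapshot=on\n' "$file" ;;
        rw) printf 'file=%s\n' "$file" ;;
        *)  echo "drive_option: unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Possible use:
#   qemu-kvm -drive "$(drive_option /tmp/test.img ro)" ...
```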

Comment 1 Richard W.M. Jones 2010-07-22 14:20:28 UTC
Patch posted upstream:
https://www.redhat.com/archives/libguestfs/2010-July/msg00065.html