Bug 742761

Summary: kvm guest boot failure due to unflushed /boot ext3 log
Product: Red Hat Enterprise Linux 6
Reporter: Moran Goldboim <mgoldboi>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED CANTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.3
CC: acathrow, chellwig, knoel, mkenneth, rhod, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-08 13:20:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Moran Goldboim 2011-10-02 15:35:18 UTC
Description of problem:
Booting a RHEL 6 guest takes forever and consumes a lot of CPU.
Avi debugged the problem and saw that a character in the VM's memory had been replaced by a different one (although it was correct in storage), which prevented the system from continuing to boot ("Vmlinuz" was replaced with "/mlinuz" in the boot command).

Version-Release number of selected component (if applicable):
2.6.32-195.el6.x86_64
qemu-kvm-0.12.1.2-2.184.el6.x86_64

How reproducible:
Always happens with this specific guest.

Steps to Reproduce:
No clear reproducer for reaching this state, but once a guest is in it the failure is consistent.
  
Actual results:
Guest fails to boot and is stuck in an endless loop.

Expected results:


Additional info:

Comment 2 Avi Kivity 2011-10-03 15:16:10 UTC
/boot/grub/grub.conf on disk is truncated:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/mapper/vg_dhcp151128-lv_root
#          initrd /initrd-[generic-]version.img
#boot=/dev/vda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux (2.6.32-131.0.15.el6.x86_64)
	root (hd0,0)
	kernel /vmlinuz-2.6.32-131.0.15.el6.x86_64 ro root=/dev/mapper/vg_dhcp151128-lv_root rd_LVM_LV=vg_dhcp151128/lv_root rd_LVM_LV=vg_dhcp151128/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto rhgb quiet elevator=deadline processor.max_cstate=1
	initrd /


(note initrd ends in /).  The rest of the contents are in the ext3 journal, which grub doesn't replay.

Looks like after the last kernel install the guest rebooted without flushing its journals.
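
For illustration, a minimal sketch of how the "journal not replayed" state described above could be detected from outside the guest. It assumes the standard ext2/3/4 on-disk layout: the primary superblock starts 1024 bytes into the filesystem, `s_magic` sits at offset 0x38, `s_feature_incompat` at offset 0x60, and the `needs_recovery` flag is bit 0x0004. The function names and the device path in the usage note are hypothetical and were not part of the original investigation.

```python
import struct

EXT_MAGIC = 0xEF53          # s_magic value shared by ext2/ext3/ext4
SUPERBLOCK_OFFSET = 1024    # primary superblock starts 1024 bytes into the filesystem
INCOMPAT_RECOVER = 0x0004   # "needs_recovery": the journal still has to be replayed


def needs_journal_recovery(superblock: bytes) -> bool:
    """Return True if the superblock has the needs_recovery flag set."""
    (magic,) = struct.unpack_from("<H", superblock, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (incompat,) = struct.unpack_from("<I", superblock, 0x60)
    return bool(incompat & INCOMPAT_RECOVER)


def read_superblock(device_path: str) -> bytes:
    """Read the 1024-byte primary superblock from a partition or image file."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(1024)
```

A call such as `needs_journal_recovery(read_superblock("/dev/vda1"))` (the partition name is an assumption, and reading it typically requires root) would return True for a /boot filesystem whose journal still holds committed but unreplayed changes, which is the state a boot loader that skips journal replay cannot see past.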

Comment 3 Avi Kivity 2011-10-03 15:36:05 UTC
Guest /etc/fstab:


#
# /etc/fstab
# Created by anaconda on Wed Aug 17 17:56:42 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_dhcp151128-lv_root /                       ext4    defaults        1 1
UUID=af1b9691-9f28-455a-b7ca-1eaa378382ec /boot                   ext4    defaults        1 2
/dev/mapper/vg_dhcp151128-lv_swap swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

Comment 4 Avi Kivity 2011-10-03 15:36:45 UTC
RHEL 6.2 beta host, RHEL 6.1 guest.

Comment 5 Avi Kivity 2011-10-03 15:37:29 UTC
Please detail the storage layout: ide/virtio, qcow2/raw, chained image or standalone image, etc.

Comment 6 Moran Goldboim 2011-10-03 16:02:20 UTC
Storage layout: virtio, qcow2 chained image on FC storage.

Comment 7 Dor Laor 2011-10-04 14:44:48 UTC
Can this be reproduced?
Was the VM shut down gracefully?

Comment 8 Moran Goldboim 2011-10-04 19:05:48 UTC
It happened once, in 1 out of 500 VMs based on the same template. All were treated the same way: not gracefully.

Comment 9 Avi Kivity 2011-10-04 20:04:48 UTC
It happened to me quite a lot on physical hardware, before I learned to sync before rebooting.

What do you mean "not gracefully"? Were the guests rebooted hard?

Was the kernel or grub command line updated after the guests were provisioned from the template?

Comment 10 Avi Kivity 2011-10-04 20:05:20 UTC
I'd also like to look at the template itself.

Comment 11 Moran Goldboim 2011-10-04 20:42:52 UTC
The guests weren't doing much before they were hard rebooted, and the grub command line wasn't updated after provisioning the guests from the template.
I'll provide the template itself as well.

Comment 13 Avi Kivity 2011-10-09 11:01:26 UTC
The template's log is clear.