Bug 1111800 - virt-sparsify clobbered my MBR
Summary: virt-sparsify clobbered my MBR
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: libguestfs
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-21 16:04 UTC by Tom Horsley
Modified: 2014-06-24 21:10 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-06-24 21:10:00 UTC
Type: Bug
Embargoed:


Attachments
debug output from virt-sparsify -v -x (53.76 KB, text/plain)
2014-06-23 21:49 UTC, Tom Horsley
debug output from sparsify that broke booting (53.59 KB, text/plain)
2014-06-24 01:06 UTC, Tom Horsley
Here's what the boot screen looks like. (20.41 KB, image/png)
2014-06-24 01:07 UTC, Tom Horsley

Description Tom Horsley 2014-06-21 16:04:23 UTC
Description of problem:

I ran virt-sparsify on a CentOS 6.5 virtual machine image, and the new sparse image would no longer boot. The BIOS screen said "booting from hard disk..." and then the CPU went to 100% and stayed there for as long as I was willing to wait.
(The original and new images were qcow2; the partition inside the virtual machine was ext4.)

I fixed the problem by connecting a livecd image to the cdrom, booting the virtual machine from the livecd, then chrooting into the disk image and running grub-install. After that it seemed to boot fine, and everything seems to function correctly.


Version-Release number of selected component (if applicable):
libguestfs-tools-c-1.26.3-2.fc20.x86_64


How reproducible:
Only did it once.


Steps to Reproduce:
1. virt-sparsify centinil.qcow2 centinil-base.qcow2

Actual results:
new image won't boot (but is a lot smaller :-).

Expected results:
still able to boot

Additional info:

The history of this centos 6.5 virtual machine is a little weird, and might have had an effect on the MBR clobbering:

I first installed it with /dev/vda1 as swap and /dev/vda2 as ext4 root.

I later realized I'd have a much smaller image to back up if I split the swap out as a separate virtual disk and backed up only the disk image that holds the OS.

I booted the virtual machine with a live CD and ran gparted to delete the /dev/vda1 swap partition and grow the /dev/vda2 / partition as large as possible. This left me with a slightly weird configuration where I only have the /dev/vda2 partition and there isn't a /dev/vda1 partition, but everything seems to work, and I didn't have to fiddle grub or fstab to keep it booting.

This was the state it was in when I ran virt-sparsify and the MBR seemed to get clobbered.

Comment 1 Richard W.M. Jones 2014-06-21 22:07:06 UTC
(In reply to Tom Horsley from comment #0)
> Description of problem:
> 
> I ran virt-sparsify on a centos 6.5 virtual machine image, and the new
> sparse image would no longer boot. The bios screen said booting from hard
> disk... and then the cpu went to 100% and stayed there as long as I was
> willing to wait.
> (original and new images were qcow2, partition inside the virtual machine
> was ext4).
> 
> I fixed the problem by connecting a livecd image to the cdrom, booting the
> virtual machine from the livecd, then chrooting into the disk image and
> running grub-install. After that it seemed to boot fine, and everything
> seems to function correctly.

I would humbly recommend virt-rescue for this.

I will look at the rest of this bug later.

Comment 2 Richard W.M. Jones 2014-06-23 08:11:50 UTC
I would need to see:

- the full debugging output of virt-sparsify, i.e. virt-sparsify -v -x ...

- the contents of the partition table & MBR before and after

Having said that, virt-sparsify almost by design cannot change
the MBR or boot loader of the disk that it sparsifies.  So I
suspect something else might be going on here, such as your
expanded /dev/vda2 overlapping the bootloader (which would mean it's
only a matter of time before writing some file on vda2 would
trash the bootloader).

Comment 3 Tom Horsley 2014-06-23 21:49:54 UTC
Created attachment 911595 [details]
debug output from virt-sparsify -v -x

I'm afraid this isn't very useful. I'm attaching the debug run output, but I didn't have the original image any longer, so I ran sparsify on the already sparsified image in the hopes that it would reproduce the problem. Unfortunately (or fortunately, depending on how you look at it), the image it generated this time was perfectly fine. No boot clobbering happened. For what it is worth, here is the partition table and MBR dump from the sparsified image that I got to boot earlier by re-running grub-install:

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): p

Disk /dev/vda: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00083ba6

   Device Boot      Start         End      Blocks   Id  System
/dev/vda2   *        2048   104857599    52427776   83  Linux

Command (m for help): v
Remaining 2047 unallocated 512-byte sectors
[root@centinil ~]# dd if=/dev/vda of=/root/mbr.dd bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 9.5712e-05 s, 5.3 MB/s
[root@centinil ~]# od -Ad -tx1z /root/mbr.dd
0000000 eb 48 90 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0  >.H..............<
0000016 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00  >...|.........!..<
0000032 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75  >....8.u........u<
0000048 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 03 02  >.........|...t..<
0000064 ff 00 00 20 01 00 00 00 00 02 fa 90 90 f6 c2 80  >... ............<
0000080 75 02 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc  >u....Y|..1......<
0000096 00 20 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80  >. ..@|<.t...R...<
0000112 74 54 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55  >tT.A..U..ZRrI..U<
0000128 aa 75 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66  >.uC.A|..u....t7f<
0000144 8b 4c 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7  >.L...|.D..f..D|.<
0000160 04 10 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00  >....D...f.\..D..<
0000176 70 66 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72  >pf1..D.f.D..B..r<
0000192 05 bb 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f  >...p.}....s.....<
0000208 84 f0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0  >........|.D..f1.<
0000224 88 f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8  >..@f.D.1........<
0000240 88 f4 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04  >..@.D.1......f..<
0000256 66 a1 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2  >f.D|f1.f.4.T.f1.<
0000272 66 f7 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a  >f.t..T..D.;D.}<.<
0000288 54 0d c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a  >T.....L......l.Z<
0000304 8a 74 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72  >.t...p..1......r<
0000320 2a 8c c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6  >*....H|`......1.<
0000336 31 ff fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40  >1.....a.&B|..}.@<
0000352 00 eb 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30  >.....}.8.....}.0<
0000368 00 be 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47  >...}.*...GRUB .G<
0000384 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65  >eom.Hard Disk.Re<
0000400 61 64 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd  >ad. Error.......<
0000416 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00  >..<.u...........<
0000432 00 00 00 00 00 00 00 00 a6 3b 08 00 00 00 00 00  >.........;......<
0000448 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 20  >............... <
0000464 21 00 83 fe ff ff 00 08 00 00 00 f8 3f 06 00 00  >!...........?...<
0000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000496 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa  >..............U.<
0000512

If I'm interpreting the MBR sector dump right, the 1st partition table entry is all zero and the 2nd entry points to my /dev/vda2 partition. Don't know if that all zero entry would confuse anyone.
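
For what it's worth, the partition entries can be decoded mechanically rather than by eye. This is a minimal sketch, assuming the standard MBR layout: four 16-byte entries starting at byte 446, with the type byte at entry offset 4 and the start LBA and sector count stored as little-endian 32-bit values at entry offsets 8 and 12. It works on a saved sector dump such as /root/mbr.dd from above. The function names `le32` and `mbr_entries` are made up for illustration.

```shell
# Read a little-endian 32-bit value from FILE ($1) at byte OFFSET ($2).
le32() {
    # od prints the four bytes as unsigned decimals; word splitting
    # places them in $1..$4 (least significant byte first).
    set -- $(od -An -tu1 -j "$2" -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Print type, start LBA and sector count for the four primary entries
# of the MBR sector stored in FILE ($1).
mbr_entries() {
    i=0
    while [ "$i" -lt 4 ]; do
        off=$((446 + 16 * i))
        ptype=$(od -An -tx1 -j $((off + 4)) -N 1 "$1" | tr -d ' \n')
        echo "entry $((i + 1)): type=0x$ptype start=$(le32 "$1" $((off + 8))) sectors=$(le32 "$1" $((off + 12)))"
        i=$((i + 1))
    done
}

# Example: mbr_entries /root/mbr.dd
```

Run against the dump above, this should report entry 1 as all zero and entry 2 as type 0x83 starting at sector 2048 with 104855552 sectors, which agrees with the fdisk listing (2048 + 104855552 - 1 = 104857599).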

If I get bored sometime, I might try to reproduce the original sequence that caused the problem and record the debug info from that.

Comment 4 Richard W.M. Jones 2014-06-23 22:04:03 UTC
This partition table seems OK.  It would have been a problem
if the other partition table covered the bootloader area
(which is usually "somewhere" between sector 1 & sector 2047,
depending on the guest details.)

If you look at the `libguestfs: trace:' lines in the debug
file, you can see that all virt-sparsify does is to mount
/dev/sda2 and write zeroes to the free space there.  It
doesn't touch the MBR or any other space outside mountable
partitions.
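
One way to check this from inside a guest (or against a raw copy of the disk, not the qcow2 file itself) is to checksum the area between the MBR and the first partition before and after sparsifying. This is a minimal sketch, assuming 512-byte sectors and a first partition starting at sector 2048, as in the fdisk output above. `gap_sum` is a made-up helper name.

```shell
# Checksum sectors 1..2047, i.e. the gap between the MBR (sector 0)
# and a partition starting at sector 2048.
# $1 is a block device or a raw image file.
gap_sum() {
    dd if="$1" bs=512 skip=1 count=2047 2>/dev/null | sha256sum | cut -d ' ' -f 1
}

# Example (inside the guest, before and after): gap_sum /dev/vda
```

If the two checksums match, nothing in that region was modified. Sector 0 is deliberately skipped, so it has to be compared separately.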

Comment 5 Tom Horsley 2014-06-23 23:00:13 UTC
It certainly seems like nothing could go wrong. Maybe it was cosmic rays and it will never happen again :-). Maybe the final qemu-img convert screwed up somehow. I may indeed try to reproduce the original image again just to see if I can make it happen again.

Comment 6 Tom Horsley 2014-06-24 01:06:28 UTC
Created attachment 911610 [details]
debug output from sparsify that broke booting

I did reinstall centos 6.5 from scratch trying to remember to do everything I did the first time, including creating the 12000 MB swap partition in /dev/vda1 then deleting it and resizing /dev/vda2. At this point, the virtual machine still boots with no problem.

I dug up the partition table and MBR sector from that image:

Command (m for help): u
Changing display/entry units to sectors

Command (m for help): p

Disk /dev/vda: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00029e00

   Device Boot      Start         End      Blocks   Id  System
/dev/vda2   *        2048   104857599    52427776   83  Linux

[root@testbreak ~]# dd if=/dev/vda of=/root/mbr.dd bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000104564 s, 4.9 MB/s
[root@testbreak ~]# od -Ad -tx1z /root/mbr.dd
0000000 eb 48 90 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0  >.H..............<
0000016 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00  >...|.........!..<
0000032 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75  >....8.u........u<
0000048 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 03 02  >.........|...t..<
0000064 80 00 00 80 c8 ab 3e 05 00 08 fa 90 90 f6 c2 80  >......>.........<
0000080 75 02 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc  >u....Y|..1......<
0000096 00 20 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80  >. ..@|<.t...R...<
0000112 74 54 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55  >tT.A..U..ZRrI..U<
0000128 aa 75 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66  >.uC.A|..u....t7f<
0000144 8b 4c 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7  >.L...|.D..f..D|.<
0000160 04 10 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00  >....D...f.\..D..<
0000176 70 66 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72  >pf1..D.f.D..B..r<
0000192 05 bb 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f  >...p.}....s.....<
0000208 84 f0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0  >........|.D..f1.<
0000224 88 f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8  >..@f.D.1........<
0000240 88 f4 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04  >..@.D.1......f..<
0000256 66 a1 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2  >f.D|f1.f.4.T.f1.<
0000272 66 f7 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a  >f.t..T..D.;D.}<.<
0000288 54 0d c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a  >T.....L......l.Z<
0000304 8a 74 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72  >.t...p..1......r<
0000320 2a 8c c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6  >*....H|`......1.<
0000336 31 ff fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40  >1.....a.&B|..}.@<
0000352 00 eb 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30  >.....}.8.....}.0<
0000368 00 be 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47  >...}.*...GRUB .G<
0000384 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65  >eom.Hard Disk.Re<
0000400 61 64 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd  >ad. Error.......<
0000416 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00  >..<.u...........<
0000432 00 00 00 00 00 00 00 00 00 9e 02 00 00 00 00 00  >................<
0000448 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 20  >............... <
0000464 21 00 83 fe ff ff 00 08 00 00 00 f8 3f 06 00 00  >!...........?...<
0000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000496 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa  >..............U.<
0000512

I then ran:

export TMPDIR=/huge/tmp
virt-sparsify testbreak.qcow2 broken.qcow2

While that was running, I made a hard link to the qcow2 file it was filling with zeroes in the /huge/tmp directory so I could try booting from it and remove the final qemu-img convert from the equation.

Once it finished the zero filling, I tried booting from the fully expanded qcow2 image, and it would not boot. A message about pxe boot I never saw before came out, then it tries to boot from hard disk (I'll attach the screen shot) and the cpu gets pegged at 100%.

Booting from that broken image with a live CD, I dumped the partition table and MBR sector again, and they appear to be identical:

Command (m for help): u
Changing display/entry units to sectors.

Command (m for help): p
Disk /dev/vda: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00029e00

Device    Boot Start       End   Blocks  Id System
/dev/vda2 *     2048 104857599 52427776  83 Linux

0000000 eb 48 90 10 8e d0 bc 00 b0 b8 00 00 8e d8 8e c0  >.H..............<
0000016 fb be 00 7c bf 00 06 b9 00 02 f3 a4 ea 21 06 00  >...|.........!..<
0000032 00 be be 07 38 04 75 0b 83 c6 10 81 fe fe 07 75  >....8.u........u<
0000048 f3 eb 16 b4 02 b0 01 bb 00 7c b2 80 8a 74 03 02  >.........|...t..<
0000064 80 00 00 80 c8 ab 3e 05 00 08 fa 90 90 f6 c2 80  >......>.........<
0000080 75 02 b2 80 ea 59 7c 00 00 31 c0 8e d8 8e d0 bc  >u....Y|..1......<
0000096 00 20 fb a0 40 7c 3c ff 74 02 88 c2 52 f6 c2 80  >. ..@|<.t...R...<
0000112 74 54 b4 41 bb aa 55 cd 13 5a 52 72 49 81 fb 55  >tT.A..U..ZRrI..U<
0000128 aa 75 43 a0 41 7c 84 c0 75 05 83 e1 01 74 37 66  >.uC.A|..u....t7f<
0000144 8b 4c 10 be 05 7c c6 44 ff 01 66 8b 1e 44 7c c7  >.L...|.D..f..D|.<
0000160 04 10 00 c7 44 02 01 00 66 89 5c 08 c7 44 06 00  >....D...f.\..D..<
0000176 70 66 31 c0 89 44 04 66 89 44 0c b4 42 cd 13 72  >pf1..D.f.D..B..r<
0000192 05 bb 00 70 eb 7d b4 08 cd 13 73 0a f6 c2 80 0f  >...p.}....s.....<
0000208 84 f0 00 e9 8d 00 be 05 7c c6 44 ff 00 66 31 c0  >........|.D..f1.<
0000224 88 f0 40 66 89 44 04 31 d2 88 ca c1 e2 02 88 e8  >..@f.D.1........<
0000240 88 f4 40 89 44 08 31 c0 88 d0 c0 e8 02 66 89 04  >..@.D.1......f..<
0000256 66 a1 44 7c 66 31 d2 66 f7 34 88 54 0a 66 31 d2  >f.D|f1.f.4.T.f1.<
0000272 66 f7 74 04 88 54 0b 89 44 0c 3b 44 08 7d 3c 8a  >f.t..T..D.;D.}<.<
0000288 54 0d c0 e2 06 8a 4c 0a fe c1 08 d1 8a 6c 0c 5a  >T.....L......l.Z<
0000304 8a 74 0b bb 00 70 8e c3 31 db b8 01 02 cd 13 72  >.t...p..1......r<
0000320 2a 8c c3 8e 06 48 7c 60 1e b9 00 01 8e db 31 f6  >*....H|`......1.<
0000336 31 ff fc f3 a5 1f 61 ff 26 42 7c be 7f 7d e8 40  >1.....a.&B|..}.@<
0000352 00 eb 0e be 84 7d e8 38 00 eb 06 be 8e 7d e8 30  >.....}.8.....}.0<
0000368 00 be 93 7d e8 2a 00 eb fe 47 52 55 42 20 00 47  >...}.*...GRUB .G<
0000384 65 6f 6d 00 48 61 72 64 20 44 69 73 6b 00 52 65  >eom.Hard Disk.Re<
0000400 61 64 00 20 45 72 72 6f 72 00 bb 01 00 b4 0e cd  >ad. Error.......<
0000416 10 ac 3c 00 75 f4 c3 00 00 00 00 00 00 00 00 00  >..<.u...........<
0000432 00 00 00 00 00 00 00 00 00 9e 02 00 00 00 00 00  >................<
0000448 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 20  >............... <
0000464 21 00 83 fe ff ff 00 08 00 00 00 f8 3f 06 00 00  >!...........?...<
0000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
0000496 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa  >..............U.<
0000512

So I have no idea what is actually broken, but whatever it is apparently gets fixed by re-running grub-install. I don't know what all magic grub does (and remember centos 6.5 is still using grub version 1), but something broke somewhere.
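
Rather than comparing the two od listings by eye, the saved sectors can be compared directly. This is a small sketch, assuming the two dumps were saved under the hypothetical names before.dd and after.dd. `compare_dumps` is a made-up helper name.

```shell
# Report whether two saved sector dumps are byte-identical. If they
# differ, list each differing byte: 1-based decimal offset followed by
# the octal value in each file (the output format of cmp -l).
compare_dumps() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        cmp -l "$1" "$2" || true
    fi
}

# Example: compare_dumps before.dd after.dd
```

This only covers whatever was dumped. With bs=512 count=1, that is sector 0 alone, so an "identical" result says nothing about the rest of the disk.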

Comment 7 Tom Horsley 2014-06-24 01:07:16 UTC
Created attachment 911611 [details]
Here's what the boot screen looks like.

Comment 8 Richard W.M. Jones 2014-06-24 08:09:02 UTC
(In reply to Tom Horsley from comment #6)
> While that was running, I made a hard link to the qcow2 file it was filling
> with zeroes in the /huge/tmp directory so I could try booting from it and
> remove the final qemu-img convert from the equation.
> 
> Once it finished the zero filling, I tried booting from the fully expanded
> qcow2 image, and it would not boot. A message about pxe boot I never saw
> before came out, then it tries to boot from hard disk (I'll attach the
> screen shot) and the cpu gets pegged at 100%.

Instead of hardlinking to intermediate files that virt-sparsify
generates, which you should never be doing and doesn't prove
anything, what happens when you try to boot from the final image?

If the final image does not boot (but the source image does), then
you'll have to find a way to send me the source image, eg uploading
it to a website so I can download it and see what's going on.

Comment 9 Tom Horsley 2014-06-24 10:17:29 UTC
I should have mentioned that: the final image (broken.qcow2 from the command above) behaves identically to the fully expanded zero-filled temp image. It won't boot, has the same MBR, and has the same partition table listed by fdisk. I just tested the temp image first because it was completed sooner and it would remove the final convert step from the list of things that might break :-).

Comment 10 Richard W.M. Jones 2014-06-24 10:44:05 UTC
If you 'xz --best' compress the source image, how big is that?

The only way I can see to proceed on this bug is if you can send
the source image, and I will take a look at what's going on.  You
could upload it to a file sharing website or with more difficulty
I could arrange some kind of FTP site.  If it contains sensitive
data you could email me the URL rather than posting it on the BZ.

Comment 11 Tom Horsley 2014-06-24 12:25:02 UTC
This is actually looking a lot more like something got screwed up by the gparted operations than by sparsifying. I've just tried two other ways to zero fill the image, one using virt-sparsify --in-place, and the other simply booting the virtual machine and doing a cat /dev/zero > /var/tmp/junk till no space is left. In both cases, the resulting image fails to boot in exactly the same way. I'm not sure it is worth trying to track down exactly what goes wrong (gparted already warns about possible corruption when moving a bootable partition :-).

Comment 12 Tom Horsley 2014-06-24 21:10:00 UTC
As a final nail in the coffin, I ran grub-install *before* doing the sparsify on the image that breaks, and when I sparsify that, it still boots flawlessly.

I know nothing about the innards of grub, but it must stick something in some no man's land area (or maybe just a pointer to something), and when I overwrote it with zeroes after moving the partition, it barfed. I was booting OK before that because I moved the partition far enough that it didn't get overwritten and was still sitting around in the free space.
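
The sector dumps earlier in this report seem consistent with that guess. This is a hedged sketch, assuming the GRUB legacy stage1 layout in which the sector number of the next stage to load is stored as a 32-bit little-endian value at byte offset 0x44 (68) of the MBR. That offset is an assumption taken from GRUB legacy's stage1 source and is not confirmed anywhere in this bug. `next_stage_sector` is a made-up name.

```shell
# Print the sector number that GRUB legacy stage1 loads next.
# Assumption: little-endian 32-bit value at byte offset 68 of the
# MBR sector dump given as $1.
next_stage_sector() {
    # od prints the four bytes as unsigned decimals; word splitting
    # places them in $1..$4 (least significant byte first).
    set -- $(od -An -tu1 -j 68 -N 4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Example: next_stage_sector /root/mbr.dd
```

On the testbreak dump (bytes c8 ab 3e 05 at that offset) this works out to sector 87993288, far inside the partition. On the image repaired with grub-install (bytes 01 00 00 00) it is sector 1. If the assumption holds, the old MBR pointed at an absolute disk location that no longer belonged to any file once the partition had been moved, which would fit the explanation above: zero filling the free space wipes whatever was left there.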

