Bug 1211405
| Summary: | sfdisk dump/restore alters partition table on some disks (at least Fedora cloud images) | | |
| --- | --- | --- | --- |
| Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
| Component: | util-linux | Assignee: | Karel Zak <kzak> |
| Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 26 | CC: | awilliam, dustymabe, jonathan, kzak |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-05-29 11:57:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Adam Williamson
2015-04-13 23:02:13 UTC
The diff I pasted above is actually from a random VM's hard disk, as I ran the test on the wrong disk... but it still illustrates the problem. I confirmed I do see a difference between 'before' and 'after' when running the same test on the right disk (a Cloud image).

(In reply to awilliam from comment #0)
> [root@f22sfdisk ~]# file -s /dev/vda
> /dev/vda: DOS/MBR boot sector; partition 1 : ID=0x83, active, start-CHS
> (0x0,32,33), end-CHS (0x17e,146,20), startsector 2048, 6144000 sectors
>
> # After sfdisk dump/restore
> [root@f22sfdisk ~]# file -s /dev/vda
> /dev/vda: DOS/MBR boot sector; partition 1 : ID=0x83, active, start-CHS
> (0x2,0,33), end-CHS (0x3d1,4,20), startsector 2048, 6144000 sectors

Do you want to say that the difference is in the CHS addresses? Frankly, I don't care about CHS, and the plan is to drop it in the long term. What is important is that the start and size in sectors are the same.

Karel: I've no idea, you're the expert. :) I'm just logging the differences. The practical impact, however, in at least some cases, is serious: it stops the system booting.

Take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1147998#c0 , particularly the two xxd dumps. The first contains some bootloader code whose origins seem slightly mysterious, but we think it might be from parted. That's the boot code you get after running an anaconda install to a completely clean disk with syslinux as the bootloader. Whatever it is, it is capable of booting the system - *until* an sfdisk dump/restore. Once an sfdisk dump/restore has happened, the system no longer boots.

The 'fix' for #1147998 was really more of a workaround: we added lines to the cloud kickstarts' %post sections that run `dd if=/usr/share/syslinux/mbr.bin of=/dev/vda`, which installs boot code provided by syslinux. That's the second dump in https://bugzilla.redhat.com/show_bug.cgi?id=1147998#c0 . That boot code is apparently more sophisticated, and continues to work after whatever sfdisk does to the partition table.
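As an aside on the CHS question above: the two `file -s` outputs are consistent with sfdisk recomputing the CHS triples under a different assumed disk geometry (255 heads before vs. 16 heads after, 63 sectors/track in both cases). Those geometry values are inferred from the reported triples, not confirmed against sfdisk's source; a small sketch reproducing the conversion:

```python
def lba_to_chs(lba, heads, spt):
    """Convert a logical block address to a (cylinder, head, sector) triple.

    Sector numbers are 1-based. The cylinder field in an MBR partition
    entry is only 10 bits wide; the 0x3d1 value in the report suggests
    the cylinder wrapped rather than being clamped at 1023, so we
    truncate to 10 bits here as well.
    """
    cyl, rem = divmod(lba, heads * spt)
    head, sec = divmod(rem, spt)
    return (cyl & 0x3FF, head, sec + 1)

# Geometry apparently assumed before the dump/restore: 255 heads, 63 sectors/track.
# 6146047 is the partition's last sector (2048 + 6144000 - 1).
print(lba_to_chs(2048, 255, 63))     # -> (0, 32, 33)    i.e. start-CHS (0x0,32,33)
print(lba_to_chs(6146047, 255, 63))  # -> (382, 146, 20) i.e. end-CHS (0x17e,146,20)

# Geometry apparently assumed afterwards: 16 heads, 63 sectors/track
print(lba_to_chs(2048, 16, 63))      # -> (2, 0, 33)     i.e. start-CHS (0x2,0,33)
print(lba_to_chs(6146047, 16, 63))   # -> (977, 4, 20)   i.e. end-CHS (0x3d1,4,20)
```

Karel's point holds: the start and size in sectors are untouched. But boot code that addresses the disk via CHS (as the pre-`dd` MBR code plausibly did) would read from the wrong location after the rewrite, which could explain the boot failures described here.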
But it seems reasonable to me to characterize this as a workaround for the sfdisk bug, not really a *fix*. It doesn't seem correct for a straight dump/restore with sfdisk to stop the system booting.

(In reply to awilliam from comment #3)
> The practical impact, however, in at least some cases, is serious: it stops
> the system booting.
>
> Take a look at https://bugzilla.redhat.com/show_bug.cgi?id=1147998#c0 ,

But this is already reported against util-linux as bug #1210428, and it's already fixed by util-linux-2.26.1-1.fc22 (the update fixed two bugs: 1/ the problem with the boot flag; 2/ the problem with the erased boot sector). So, are you reporting something else, or are we duplicating #1210428 here? I'm lost :-)

> The 'fix' for #1147998 was really more of a workaround: we added lines in
> the cloud kickstarts' %post sections that do this:
>
> dd if=/usr/share/syslinux/mbr.bin of=/dev/vda

I guess this workaround is unnecessary with util-linux-2.26.1-1.fc22; try it.

#1210428 was never the same bug as #1147998; they just have similar *effects*. The boot sector deletion and boot flag clearing issues were new in 2.26. The partition table modification issue already existed in 2.25 (and probably earlier).

"I guess this workaround is unnecessary with util-linux-2.26.1-1.fc22, try it."

It is not. We had some trouble getting the images to build, so dgilmore took the 'dd' out of %post, and we found that when he did that, #1147998 immediately came back. There was no issue with the boot flag and the boot sector was not zeroed, but systems still did not reboot. He then managed to get an image to build with the 'dd' line back in %post, and rebooting works. So the cause of #1147998 has remained in util-linux all along, and we still need the dd in %post to work around it.
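For reference, the workaround discussed above would sit in the cloud kickstarts roughly as follows. The `dd` command is quoted verbatim from the comments; the shape of the `%post` stanza is a reconstruction, not a copy of the actual kickstart:

```
%post
# Workaround for bug #1147998 / this bug: reinstall syslinux's generic
# MBR boot code, since the boot code left by the installer stops working
# after an sfdisk dump/restore rewrites the partition table.
# mbr.bin is 440 bytes, so it only overwrites the boot-code area of the MBR.
dd if=/usr/share/syslinux/mbr.bin of=/dev/vda
%end
```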
Still not sure if the issue is sfdisk:

```
# wget https://dl.fedoraproject.org/pub/alt/stage/22_Beta_RC1/Cloud-Images/x86_64/Images/Fedora-Cloud-Base-22_Beta-20150407.x86_64.qcow2
# modprobe nbd
# qemu-nbd -c /dev/nbd0 Fedora-Cloud-Base-22_Beta-20150407.x86_64.qcow2
# xxd -l512 /dev/nbd0 > /tmp/before
# sfdisk --dump /dev/nbd0 > dump
# sfdisk /dev/nbd0 --force < dump
# xxd -l512 /dev/nbd0 > /tmp/after
# md5sum /tmp/before /tmp/after
22444b10a0a07dc7dde3bddd6bb3c1e3  /tmp/before
22444b10a0a07dc7dde3bddd6bb3c1e3  /tmp/after
```

...so no change. I'm able to reproduce the problem only with sfdisk 2.26 (= without the bugfix for bug #1210428). I can try the reproducer from comment #0, but I have real doubts that sfdisk behaves differently within a VM (well, I guess the VM image does not contain the broken v2.26).

Hi Karel, I am using https://kojipkgs.fedoraproject.org//work/tasks/648/9470648/Fedora-Cloud-Base-22_Beta-20150411.x86_64.qcow2 and booting it on openstack. I first disable growpart with the following user-data:

```
#cloud-config
growpart:
  mode: off
  ignore_growroot_disabled: false
```

and then I boot an instance, log in, obtain root, and install the vim-common/file rpms. After that, here are the exact commands I run and the output obtained:

```
[root@f22sfdisk2 ~]# rpm -q util-linux
util-linux-2.26.1-1.fc22.x86_64
[root@f22sfdisk2 ~]#
[root@f22sfdisk2 ~]# xxd -l512 /dev/vda > before
[root@f22sfdisk2 ~]# sfdisk -d /dev/vda > mbr.dump
[root@f22sfdisk2 ~]# file -s /dev/vda
/dev/vda: DOS/MBR boot sector; partition 1 : ID=0x83, active, start-CHS (0x0,32,33), end-CHS (0x17e,146,20), startsector 2048, 6144000 sectors
[root@f22sfdisk2 ~]# sfdisk --force /dev/vda < mbr.dump
Checking that no-one is using this disk right now ... FAILED

This disk is currently in use - repartitioning is probably a bad idea.
Umount all file systems, and swapoff all swap partitions on this disk.
Use the --no-reread flag to suppress this check.

Disk /dev/vda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x28824fd4

Old situation:

Device     Boot Start     End Sectors Size Id Type
/dev/vda1  *     2048 6146047 6144000   3G 83 Linux

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x28824fd4.
Created a new partition 1 of type 'Linux' and of size 3 GiB.
/dev/vda2:
New situation:

Device     Boot Start     End Sectors Size Id Type
/dev/vda1  *     2048 6146047 6144000   3G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy
The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
Syncing disks.
[root@f22sfdisk2 ~]#
[root@f22sfdisk2 ~]# xxd -l512 /dev/vda > after
[root@f22sfdisk2 ~]# diff before after
28,29c28,29
< 00001b0: 0000 0000 0000 0000 d44f 8228 0000 8020  .........O.(...
< 00001c0: 2100 8392 547e 0008 0000 00c0 5d00 0000  !...T~......]...
---
> 00001b0: 0000 0000 0000 0000 d44f 8228 0000 8000  .........O.(....
> 00001c0: 2102 8304 d4d1 0008 0000 00c0 5d00 0000  !...........]...
[root@f22sfdisk2 ~]#
[root@f22sfdisk2 ~]# file -s /dev/vda
/dev/vda: DOS/MBR boot sector; partition 1 : ID=0x83, active, start-CHS (0x2,0,33), end-CHS (0x3d1,4,20), startsector 2048, 6144000 sectors
```

I forgot to say this before: the operation performed in the previous comment renders the instance unbootable (it won't reboot or boot normally).

kzak, have you been able to recreate the issue? Is there anything wrong with my reproducer that you can see? Marking as needinfo.
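The two changed lines in the diff are exactly the first MBR partition entry (the 16 bytes starting at offset 0x1be). Decoding both versions shows that only the packed CHS bytes differ; the bootable flag, partition type, start LBA, and size are identical. A sketch, with the byte strings transcribed from the xxd diff above:

```python
import struct

def decode_entry(entry):
    """Decode a 16-byte MBR partition table entry."""
    (status, sh, ss, sc,
     ptype, eh, es, ec,
     start_lba, sectors) = struct.unpack('<8B2I', entry)

    # The "sector" byte packs a 6-bit sector number plus the top two
    # bits of the 10-bit cylinder number.
    def chs(head, sec_byte, cyl_low):
        cyl = ((sec_byte & 0xC0) << 2) | cyl_low
        return (cyl, head, sec_byte & 0x3F)

    return {
        'bootable': status == 0x80,
        'type': ptype,
        'start_chs': chs(sh, ss, sc),
        'end_chs': chs(eh, es, ec),
        'start_lba': start_lba,
        'sectors': sectors,
    }

# First partition entry before and after the dump/restore,
# transcribed from the diff (offsets 0x1be..0x1cd)
before = bytes.fromhex('80202100 8392547e 00080000 00c05d00')
after  = bytes.fromhex('80002102 8304d4d1 00080000 00c05d00')

print(decode_entry(before))  # start_chs (0, 32, 33),  end_chs (382, 146, 20)
print(decode_entry(after))   # start_chs (2, 0, 33),   end_chs (977, 4, 20)
```

Both entries decode to start_lba 2048 and 6144000 sectors, matching the `file -s` output (0x17e is 382 and 0x3d1 is 977 in decimal); only the CHS fields were rewritten.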
Frankly, "booting it on openstack" is not a very useful reproducer for me ;-) (As a low-level system developer I do not use openstack, and if I need a virtual machine/container I use qemu/libvirt/systemd-nspawn/etc. ...I don't plan to study and debug openstack for this one issue.) It would be nice to have a step-by-step reproducer. Note that all works as expected with qemu-nbd (comment #6).

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

This still looks to be valid. Recently there's been a bug with cloud image composes which prevents %post from being run - so the workaround for this bug (which is *still* in cloud %post) was not getting run. And, indeed, the images don't have the right MBR contents and don't reach a bootloader on boot.

This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle. Changing version to '26'.

This message is a reminder that Fedora 26 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '26'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 26 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 26 changed to end-of-life (EOL) status on 2018-05-29. Fedora 26 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.