Bug 1278192

Summary: RFE: Add GRUB2 menu option to boot into snapshot or mimic functionality of ZFS Root Pools
Product: Red Hat Enterprise Linux 7 Reporter: Matt Ruzicka <mruzicka>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Default / Unclassified QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: low CC: agk, aperotti, b.j.smith, bmr, brsmith, cmarthal, cww, dconsoli, dustymabe, dwysocha, heinzm, hhei, jbrassow, joedward, jstodola, kperrier, lczerner, leiwang, lkuprova, mcsontos, mkardeh, mruzicka, msnitzer, mthacker, ngompa13, okozina, pasik, pbokoc, prajnoha, prockai, swhiteho, thornber, uwe.menges, xuli, yacao, zkabelac
Version: 7.2 Keywords: FutureFeature, Reopened
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: lvm2-2.02.176-2.el7 Doc Type: Enhancement
Doc Text:
New "boom" utility for managing LVM snapshot and image boot entries
This release adds the "boom" command, which you can use to manage additional boot loader entries on the system. You can use it to create, delete, list, and modify auxiliary boot entries for system snapshots and images. The utility provides a single tool for managing boot menu entries for LVM snapshots; therefore you no longer need to manually edit boot loader configuration files and work with detailed kernel parameters. The tool is provided by the _lvm2-python-boom_ package.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 15:18:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1489841    
Bug Blocks: 1203710, 1298243, 1411715, 1477926    

Description Matt Ruzicka 2015-11-04 21:42:57 UTC
1. Account: Details in private comment - Account has TAM and SRM. 
      
2. What is the nature and description of the request?  

    Customer is used to using Solaris Live Update and ZFS Root Pools, which keep a copy-on-write snapshot of the system for easy reversion/recovery of the system. This snapshot is presented as a boot option for easy reversion.  Customer is currently investigating LVM snapshots and yum-plugin-fs-snapshot and is open to moving to BTRFS if it comes out of tech preview, but wants a solution which is more automated/built-in with easy selection during the boot process.
      
3. Why does the customer need this? (List the business requirements here)  

    Due to the customer's internal segmentation/siloing, the Linux/Unix team has extremely limited access to their VMware environment.  They are looking for an easy way to revert to a previous state without the time/disruption necessary for a full restore.

4. How would the customer like to achieve this? (List the functional requirements here)

    They would prefer this mimic the behavior of ZFS Root Pools, but would mostly like the ability to easily boot into a snapshot of a previous version of the file system.  The snapshot could be created at a configurable interval and/or during updates.
      
5. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.  

    The proposed test would be: configure the snapshot interval/enablement, verify a snapshot was created, verify the GRUB2 menu has been updated with a boot-from-snapshot option, make a change to the system that would not trigger a new snapshot, then reboot into the snapshot from GRUB2.
      
6. Is there already an existing RFE upstream or in Red Hat Bugzilla?  

    I was not able to find an existing RFE, but I definitely could have missed it.  ReaR does not fully suit their requirements.
      
7. Does the customer have any specific timeline dependencies and which release would they like to target?  
    
    No specific timelines, but understands this would need to wait for RHEL 7.
  
8. Is the sales team involved in this request and do they have any additional input?  

    Not currently, but the Red Hat on-site Dedicated Engineer is part of this request.
      
9. List any affected packages or components.  

    system-storage-manager, snapper, lvm, btrfs?, xfs?, others?
      
10. Would the customer be able to assist in testing this functionality if implemented?  

    I expect they would.

Comment 4 Lukáš Czerner 2016-02-10 11:48:18 UTC
Unfortunately this RFE is way outside of scope of the system-storage-manager which is designed just to set-up and manage storage, not system boot options, so I am going to close this one.

Thanks!
-Lukas

Comment 5 Matt Ruzicka 2016-02-10 21:51:59 UTC
Hi Lukáš,

Is this a case where this RFE should just be attached to a different component?  Any suggestions on where we can take this next since we obviously have a few strategic customers looking for this functionality. Thank you.

Comment 6 Steve Whitehouse 2016-02-11 14:37:34 UTC
I would not encourage them to move to btrfs. XFS + LVM is a better solution, and one which should work today, without needing anything special to be done.

So I'm slightly confused as to what the request actually is - is there actually a missing feature here, or are they just asking us to make it easier to set up, e.g. via an anaconda feature?

That will help us direct the request to the right place.

Comment 7 Matt Ruzicka 2016-02-12 17:22:46 UTC
It may be an ease of use issue and a lack of deep experience with LVM snapshots and yum-plugin-fs-snapshot.  From what I'm reading (haven't been able to test in a lab yet), you'd config yum-plugin-fs-snapshot and it would create the LVM snapshot during the yum process. This part seems fine.  

The problem is in the restore. From what I'm seeing you'd have to do a 'lvconvert --merge ...' to restore.  This implies you can boot to the CLI to restore, and I would also guess it leaves you in a current-or-restored-only situation.  The customer is looking for a means, at GRUB2 or during the boot process, to select the snapshot as the live system if a) the updated system is totally broken, b) they want to easily flip between the previous and updated system states to compare or test performance/functionality/etc, c) they need to roll back to the previous state even if they can't get to the CLI for some reason, d) probably other stuff I'm not thinking of.
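
For reference, the snapshot/restore cycle described above maps onto two LVM commands. The following is a minimal Python sketch that only builds the argument lists and runs nothing; the helper names, the 1G default size, and the example VG/LV names are illustrative assumptions, not anything taken from the customer's setup.

```python
import shlex


def snapshot_cmd(vg, origin, snap, size="1G"):
    """Build argv for a classic (thick) copy-on-write snapshot of vg/origin."""
    return ["lvcreate", "--snapshot", "--size", size,
            "--name", snap, f"{vg}/{origin}"]


def merge_cmd(vg, snap):
    """Build argv that merges the snapshot back into its origin (the restore)."""
    return ["lvconvert", "--merge", f"{vg}/{snap}"]


if __name__ == "__main__":
    # Hypothetical names: volume group "rhel", root LV "root".
    print(shlex.join(snapshot_cmd("rhel", "root", "root_snap")))
    print(shlex.join(merge_cmd("rhel", "root_snap")))
```

Note that when the origin is open (as a mounted root file system is), LVM defers the merge until the volumes are next activated, so the restore is a one-way operation that takes effect across a reboot rather than a way to flip between states.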

Does this help clarify at all and does XFS+LVM already do this in a way I'm not aware of?

Any other feedback from others who have customers requesting this?

Comment 8 Bryan J Smith 2016-02-24 16:52:47 UTC
Just to second #c5 - if this should be attached to another component, please do so.  E.g., DM-LVM2? GRUB2? Kernel? Other?

Just to second #c7 - repeating from the original, customer case ...

Example requirements (may be incomplete):  
- System boots into an earlier version (e.g., a snapshot) of the system volumes, like root
- A GRUB or other boot menu to initiate the "alternative volume" boot
- One that is periodically updated, such as with a LV snapshot, minimizing consumption of space (and mitigates the need for dumps/tarballs, other than for nominal backup)
- Mode is exactly like the server (e.g., LV snapshots of system volumes), but allows the actual, original volumes to be restored and/or separate from the data volumes (which could be migrated to another server, being on the SAN)

Some plausible ideas/solutions (far from complete) ...
- Monthly (possibly monthly + weekly + daily) snapshot of system volumes
- Script to add "alternative volume(s)" to GRUB entry/initrd for root LV, modify initrd and /etc/fstab for alternative LVs
- Possible YUM Plugin/trigger to run the script to generate the alternative entry/initrd for alternative volume
- Possibly backup /boot that could also be targeted (in case /boot corrupted), if feasible
- Etc...

Comment 9 Ondrej Kozina 2016-03-23 14:24:47 UTC
I'll review what needs to be done from my point of view. Also, I'll write about the lvm2 + xfs (ext4,...) scenario only. To keep it less complex I'll also skip booting from lvm2 thin provisioning atm (a.k.a. /boot on top of a thin provisioned LV). For the purpose of this RFE let's say rootfs is installed on top of a thin LV (such a setup already works and is also supported in Anaconda (?)), but /boot remains a standalone partition.

1) A grub2 plugin/extension that would generate a listing of all rootfs snapshots so that the user can select a specific snapshot to mount during rootfs mount. AFAIK there's no such plugin for grub2 upstream. (There is a patchset for a similar feature to boot btrfs subvolumes/snapshots, but it is also not upstreamed.)

2) Provided the user selected a different volume to mount than the origin (current) rootfs LV, we need to pass that info to dracut (perhaps via kernel boot params). dracut (the lvm2 module) has to ensure the proper LV gets activated in the pre-rootfs phase and all other snapshots/thin LVs with the same fs stay deactivated. This should be quite easy to implement.
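
As an illustration of item 2, here is a minimal Python sketch of the kernel parameters that would select a snapshot LV as the root file system. It mirrors the `options` lines shown in the verification output later in this bug (`root=` plus dracut's `rd.lvm.lv=`); the function name and the default extra arguments are assumptions made for the example, not part of any tool.

```python
def snapshot_boot_options(vg, lv, extra=("rhgb", "quiet")):
    """Return a kernel command line that boots from the LV vg/lv.

    root=       tells the initramfs which device to mount as /
    rd.lvm.lv=  tells dracut's lvm module which LV to activate
    """
    root_dev = f"/dev/{vg}/{lv}"
    return " ".join([f"root={root_dev}", "ro", f"rd.lvm.lv={vg}/{lv}", *extra])


if __name__ == "__main__":
    print(snapshot_boot_options("rhel_mckinley-01", "thick_snap1"))
```

Because only the snapshot LV is named in `rd.lvm.lv=`, the origin and any sibling snapshots are not requested for activation by this command line.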

I'll investigate these two basic requirements and let you know how time demanding it would be to implement both.

I still have some concerns especially with regard to additional features like rollback, how to provide user with some sensible UI, how to link proper kernel to boot with rootfs snapshot, etc but these two steps are required anyway no matter what features we'd eventually add to the rfe.

Also I'd suggest moving this to 7.4 instead of 7.3. There would be quite a few tools involved in the process (lvm2, grub2, dracut, a new/extended yum plugin, maybe also snapper to provide better control over snapshot lifetime and automatic timeline snapshotting).

Comment 10 Alasdair Kergon 2016-03-23 21:21:44 UTC
Not to be confused with bug 998710 which deals with /boot on an lvm LV.

This one does not necessarily need changes to grub itself - it could be resolved through code in the initrd.  (grub passes a 'prompt for which root fs to pivot to' option to the initrd, which pulls up a list, asks the user, then pivots to it)

Comment 17 Mark Thacker 2016-11-30 21:29:40 UTC
Per all the previous comments, I'm marking this as approved, but would like more scoping done on this. Seriously, this could make a marked positive impact for our customers if delivered.

Comment 19 Mark Thacker 2017-02-01 19:08:05 UTC
This is determined to be out of scope for RHEL 7.4. Deferring to 7.5.

Comment 26 Bryn M. Reeves 2017-08-30 16:01:32 UTC
You can download the current design documents from GitHub:

  https://github.com/bmr-cymru/snapshot-boot-docs

There is a link to either the PDF download or the GitHub browser-based PDF viewer in the README displayed on that page.

If you have feedback, or would like to get involved in testing please feel free to get in touch here or on GitHub.

Comment 35 Marian Csontos 2017-11-03 12:39:16 UTC
Version 0.8 merged into the tree.

Comment 37 HuijingHei 2017-11-23 08:53:37 UTC
Hi,

I am Hyper-V QE. After installing RHEL 7.5 on a Hyper-V host, there is a new item "Snapshots" in the grub menu. I am not quite sure how to use it, and when I enter "Snapshots" I am also confused by the error log. Could you help confirm whether this is a bug, or give useful logs when users face the same problem? It would be better if you could give clear steps for verifying this feature. Thanks!

Version:
grub2-efi-x64-2.02-0.65.el7_4.2.x86_64
Host: 2016 WS, gen2

Steps:
1. Install RHEL-7.5-1107.1
2. Boot vm and check grub menu, there is "Snapshots"

Actual results:
1. Select "Snapshots" and press Enter; the following error appears:

error: file /EFI/redhat/x86_64-efi/blscfg.mod not found
error: can't find command bls_import

Press any key to continue...


Additional info:
Check the "Snapshots"

submenu "Snapshots" {
    insmod blscfg
    bls_import
}

Thanks
hhei

Comment 38 Bryn M. Reeves 2017-11-23 10:36:14 UTC
It's for snapshots created with the 'boom' boot manager: I'll look into making the submenu depend on whether there are any boom boot entries configured (which would mean that you don't see the menu at all unless the tool has been used).

If we can't do that for 7.5 it may be best to require users to manually enable it when required.

You can just delete the file /etc/grub.d/42_boom if this is interfering with other work.

I'll also look into why the BLS module does not seem to be available in the EFI build.

Comment 39 HuijingHei 2017-11-24 02:29:13 UTC
(In reply to Bryn M. Reeves from comment #38)
> It's for snapshots created with the 'boom' boot manager: I'll look into
> making the submenu depend on whether there are any boom boot entries
> configured (which would mean that you don't see the menu at all unless the
> tool has been used).
> 
> If we can't do that for 7.5 it may be best to require users to manually
> enable it when required.
> 
> You can just delete the file /etc/grub.d/42_boom if this is interfering with
> other work.
> 
> I'll also look into why the BLS module does not seem to be available in the
> EFI build.

Thanks for your help!

Comment 40 Bryn M. Reeves 2017-12-01 16:12:54 UTC
I've made some changes for the next build of the lvm2-python-boom package that will address some of the concerns here:

 - The Grub2 menu integration is inactive unless at least one boom entry exists.

 - A switch (BOOM_ENABLE_GRUB) is added to /etc/default/boom to allow the admin
   to completely disable the integration script.
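
The two conditions above can be sketched as a single check. This is a hedged Python illustration, not the actual integration script: the accepted values ("yes"/"no"), the default when the switch is absent, and the idea that entries are `.conf` files in one directory are all assumptions made for the example.

```python
import os
import shlex


def read_defaults(text):
    """Parse shell-style KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips surrounding quotes
        values[key.strip()] = parts[0] if parts else ""
    return values


def grub_integration_active(defaults_text, entries_dir):
    """True only if the switch is on AND at least one entry file exists."""
    # Assumption: a missing switch means "enabled".
    switch = read_defaults(defaults_text).get("BOOM_ENABLE_GRUB", "yes")
    if switch.lower() not in ("yes", "1", "true"):
        return False
    try:
        names = os.listdir(entries_dir)
    except FileNotFoundError:
        return False
    # Assumption: every *.conf file in entries_dir is a boom entry.
    return any(name.endswith(".conf") for name in names)
```

With a check like this, a freshly installed system with no entries would not show the submenu at all, which avoids the error reported in comment 37.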

There are also a handful of other improvements, including better examples in the man page, and a fix to call fdatasync() on newly written profile and entry data.

These are in boom-0.8.1 and will hopefully soon be added to the RHEL package.

Comment 47 Corey Marthaler 2018-02-16 20:16:20 UTC
Marking this feature verified with the latest rpms. 

The three items (create, list, delete) mentioned in the qa_ack comment https://bugzilla.redhat.com/show_bug.cgi?id=1278192#c31 work, as well as boot-ability of both thin and thick snapshot volumes. Also we've added regression coverage for boom bugs: bug 1528014, bug 1528018, bug 1540266, bug 1542952, bug 1543188, bug 1543186.


3.10.0-851.el7.x86_64

lvm2-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-libs-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-cluster-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-lockd-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
lvm2-python-boom-0.8.5-4.el7    BUILT: Fri Feb 16 06:37:10 CST 2018
cmirror-2.02.177-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-event-libs-1.02.146-4.el7    BUILT: Fri Feb 16 06:22:31 CST 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 05:07:18 CST 2017


[root@mckinley-01 ~]# lvs -a -o +devices
  LV              VG               Attr       LSize   Pool   Origin Data%  Meta%  Devices          
  home            rhel_mckinley-01 Vwi-aotz-- 402.29g pool00        0.05
  [lvol0_pmspare] rhel_mckinley-01 ewi------- 232.00m                             /dev/sda2(1024)  
  pool00          rhel_mckinley-01 twi-aotz-- 452.29g               0.44   1.32   pool00_tdata(0)  
  [pool00_tdata]  rhel_mckinley-01 Twi-ao---- 452.29g                             /dev/sda2(1082)  
  [pool00_tmeta]  rhel_mckinley-01 ewi-ao---- 232.00m                             /dev/sda2(116869)
  root            rhel_mckinley-01 owi-aotz--  50.00g pool00        3.56
  swap            rhel_mckinley-01 -wi-ao----   4.00g                             /dev/sda2(0)     
  thick_snap1     rhel_mckinley-01 swi-a-s---  40.00m        root   18.77         /dev/sda2(116927)
  thin_snap2      rhel_mckinley-01 Vwi-a-tz--  50.00g pool00 root   3.56

[root@mckinley-01 ~]# boom list
BootID  Version                  Name                            RootDevice                       
acbe6dd 3.10.0-851.el7.x86_64    Red Hat Enterprise Linux Server /dev/rhel_mckinley-01/thick_snap1
ba2a4d1 3.10.0-851.el7.x86_64    Red Hat Enterprise Linux Server /dev/rhel_mckinley-01/thin_snap2 

[root@mckinley-01 ~]# boom show
Boot Entry (boot_id=acbe6dd)
  title thick snapshot 1
  machine-id 842c4d9191274441843ed218350441e5
  version 3.10.0-851.el7.x86_64
  linux /vmlinuz-3.10.0-851.el7.x86_64
  initrd /initramfs-3.10.0-851.el7.x86_64.img
  options root=/dev/rhel_mckinley-01/thick_snap1 ro rd.lvm.lv=rhel_mckinley-01/thick_snap1 rhgb quiet

Boot Entry (boot_id=ba2a4d1)
  title thin snapshot 2
  machine-id 842c4d9191274441843ed218350441e5
  version 3.10.0-851.el7.x86_64
  linux /vmlinuz-3.10.0-851.el7.x86_64
  initrd /initramfs-3.10.0-851.el7.x86_64.img
  options root=/dev/rhel_mckinley-01/thin_snap2 ro rd.lvm.lv=rhel_mckinley-01/thin_snap2 rhgb quiet

# reboot

## Booted thick snap
[root@mckinley-01 ~]# df -h
Filesystem                                 Size  Used Avail Use% Mounted on
/dev/mapper/rhel_mckinley--01-thick_snap1   50G  1.6G   49G   4% /
devtmpfs                                    63G     0   63G   0% /dev
tmpfs                                       63G     0   63G   0% /dev/shm
tmpfs                                       63G  9.7M   63G   1% /run
tmpfs                                       63G     0   63G   0% /sys/fs/cgroup
/dev/sda1                                 1014M  147M  868M  15% /boot
/dev/mapper/rhel_mckinley--01-home         403G   33M  403G   1% /home
tmpfs                                       13G     0   13G   0% /run/user/0
[root@mckinley-01 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-3.10.0-851.el7.x86_64 root=/dev/rhel_mckinley-01/thick_snap1 ro rd.lvm.lv=rhel_mckinley-01/thick_snap1 rhgb quiet

# reboot

## Booted thinp snap
[root@mckinley-01 ~]# df -h
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/rhel_mckinley--01-thin_snap2   50G  1.6G   49G   4% /
devtmpfs                                   63G     0   63G   0% /dev
tmpfs                                      63G     0   63G   0% /dev/shm
tmpfs                                      63G  9.7M   63G   1% /run
tmpfs                                      63G     0   63G   0% /sys/fs/cgroup
/dev/sda1                                1014M  147M  868M  15% /boot
/dev/mapper/rhel_mckinley--01-home        403G   33M  403G   1% /home
tmpfs                                      13G     0   13G   0% /run/user/0
[root@mckinley-01 ~]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-3.10.0-851.el7.x86_64 root=/dev/rhel_mckinley-01/thin_snap2 ro rd.lvm.lv=rhel_mckinley-01/thin_snap2 rhgb quiet

# reboot

## Booted back to original origin volume
[root@mckinley-01 ~]# boom list
BootID  Version                  Name                            RootDevice                       
acbe6dd 3.10.0-851.el7.x86_64    Red Hat Enterprise Linux Server /dev/rhel_mckinley-01/thick_snap1
ba2a4d1 3.10.0-851.el7.x86_64    Red Hat Enterprise Linux Server /dev/rhel_mckinley-01/thin_snap2 

[root@mckinley-01 ~]# boom delete acbe6dd
Deleted 1 entry
[root@mckinley-01 ~]# boom delete ba2a4d1
Deleted 1 entry
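
The manual check above (compare `root=` in /proc/cmdline with the device mounted on / in `df`) can be sketched in Python. The function names are hypothetical; the one non-obvious step is that device-mapper doubles any hyphen inside a VG or LV name, which is why `rhel_mckinley-01` appears as `rhel_mckinley--01` under /dev/mapper.

```python
def dm_name(vg, lv):
    """Device-mapper name for vg/lv: hyphens inside each name are doubled."""
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def root_from_cmdline(cmdline):
    """Return the value of the root= kernel parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


def booted_from(cmdline, mounted_root, vg, lv):
    """True if both the kernel command line and the mounted / point at vg/lv."""
    return (root_from_cmdline(cmdline) == f"/dev/{vg}/{lv}"
            and mounted_root == f"/dev/mapper/{dm_name(vg, lv)}")


if __name__ == "__main__":
    cmdline = ("BOOT_IMAGE=(hd0,msdos1)/vmlinuz-3.10.0-851.el7.x86_64 "
               "root=/dev/rhel_mckinley-01/thick_snap1 ro "
               "rd.lvm.lv=rhel_mckinley-01/thick_snap1 rhgb quiet")
    print(booted_from(cmdline,
                      "/dev/mapper/rhel_mckinley--01-thick_snap1",
                      "rhel_mckinley-01", "thick_snap1"))
```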

Comment 50 errata-xmlrpc 2018-04-10 15:18:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853

Comment 51 Red Hat Bugzilla 2023-09-14 23:58:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days