Bug 1354681 - resize of luks on lvm fails with irrecoverable hard drive
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: system-storage-manager
Version: 24
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jan Tulak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-11 22:49 UTC by morgan read
Modified: 2017-08-08 15:31 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-08 15:31:59 UTC
Type: Bug
Embargoed:


Attachments
full record of commands and output (14.65 KB, text/plain), 2016-07-11 22:49 UTC, morgan read
ssm resize start to finish (3.15 KB, text/plain), 2016-07-12 16:31 UTC, Chris Murphy
strace ssm resize (162.96 KB, text/plain), 2016-07-12 16:32 UTC, Chris Murphy

Description morgan read 2016-07-11 22:49:06 UTC
Created attachment 1178601 [details]
full record of commands and output

Description of problem:
# ssm resize /dev/mapper/luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f
Traceback (most recent call last):
  File "/usr/bin/ssm", line 48, in <module>
    sys.exit(main.main())
  File "/usr/lib/python3.5/site-packages/ssmlib/main.py", line 1875, in main
    args.func(args)
  File "/usr/lib/python3.5/site-packages/ssmlib/main.py", line 1060, in resize
    ret = args.volume['fs_info'].resize()
  File "/usr/lib/python3.5/site-packages/ssmlib/main.py", line 177, in resize
    return self._get_fs_func("resize", *args, **kwargs)
  File "/usr/lib/python3.5/site-packages/ssmlib/main.py", line 167, in _get_fs_func
    return func(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/ssmlib/main.py", line 230, in extN_resize
    new_size < self.data['fs_size']):
TypeError: unorderable types: NoneType() < int()
[root@morgansmachine ~]# 
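
Reading the traceback, the immediate crash is presumably a Python 3 artifact: with no -s argument, new_size seems to stay None, and extN_resize then tries None < int, which Python 3 refuses to order. The mismatch ssm should have been comparing can be checked by hand with standard e2fsprogs/util-linux tools (device path taken from above):

  dumpe2fs -h /dev/mapper/luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f | grep -E 'Block (count|size)'   # the ext4 superblock's idea of its size
  blockdev --getsize64 /dev/mapper/luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f                         # the actual size of the LUKS mapping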

Version-Release number of selected component (if applicable):
Unable to boot the system to check, but probably:
system-storage-manager-0.4-10.fc24

How reproducible:
I'm not planning on risking another hard drive to discover...

Steps to Reproduce:
1. # ssm resize -s-3G /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316
   # ssm resize -s-3G /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6
2. # ssm resize -s-3G /dev/fedora_morgansmachine/home
   # ssm resize -s-3G /dev/fedora_morgansmachine/root
3. # ssm resize -s+3G /dev/fedora_morgansmachine/var
   # ssm resize /dev/mapper/luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f

Actual results:
As above

Expected results:
luks volume expands to consume free space in lvm
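
For reference, the safe grow order for this stack is bottom-up: LV first, then the LUKS mapping, then the filesystem. After step 3 extended the LV, the remaining two steps could also be done by hand, since cryptsetup resize with no --size grows a mapping to fill its backing device and ext4 grows online:

  cryptsetup resize luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f       # grow the LUKS payload to fill the extended LV
  resize2fs /dev/mapper/luks-68a97c0e-00b5-4ab8-8628-2fae2605b35f   # grow ext4 to fill the mapping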

Additional info:
Attached is the full output of my machinations.

Following the boot failure, here is the output of my attempts to correct SSM's media munching from an F24 live boot image:
[root@localhost ~]# e2fsck -fy /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316
e2fsck 1.42.13 (17-May-2015)
The filesystem size (according to the superblock) is 51309056 blocks
The physical size of the device is 50522624 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes

[root@localhost ~]# resize2fs -fp /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316 50522624
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316 to 50522624 (4k) blocks.
resize2fs: Can't read a block bitmap while trying to resize /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316
Please run 'e2fsck -fy /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316' to fix the filesystem
after the aborted resize operation.
[root@localhost ~]# e2fsck -fy /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6
e2fsck 1.42.13 (17-May-2015)
The filesystem size (according to the superblock) is 5242368 blocks
The physical size of the device is 4455936 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes

[root@localhost ~]# resize2fs -fp /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6 4455936
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6 to 4455936 (4k) blocks.
resize2fs: Can't read a block bitmap while trying to resize /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6
Please run 'e2fsck -fy /dev/mapper/luks-d313ea5e-fe14-4967-b11c-ae0e03c348b6' to fix the filesystem
after the aborted resize operation.
[root@localhost ~]#
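
e2fsck and resize2fs are deadlocked here: the superblock claims more blocks than the shrunken device now has, e2fsck aborts on the mismatch, and resize2fs wants a clean fs before it will shrink. One possible escape, assuming the VG still holds the freed space, is to grow the device back to its old size first so e2fsck can run, then shrink in the proper top-down order. A sketch for the first volume (the LV-to-LUKS pairing is assumed, and there is no guarantee about the data: the re-added extents are not necessarily the original ones, so anything in the cut-off tail may already be gone):

  lvextend -L +3G /dev/fedora_morgansmachine/home                            # undo the LV shrink
  cryptsetup resize luks-a69b434b-c409-4612-a51e-4bb0162cb316                # grow the mapping back to fill the LV
  e2fsck -fy /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316           # fsck now sees a device at least as big as the superblock claims
  resize2fs /dev/mapper/luks-a69b434b-c409-4612-a51e-4bb0162cb316 50522624   # then shrink the fs first, if a shrink is still wanted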

Comment 1 morgan read 2016-07-11 22:53:37 UTC
Actually, looking at the definitions, severity should really be urgent, but I thought that was a bit self-indulgent.

Comment 3 Chris Murphy 2016-07-12 15:30:23 UTC
There's a lot more than one bug in here. The most serious part appears to be in the very first step, when the LUKS volume was shrunk: ssm literally shrank only the LUKS volume, not the underlying LV, and not the ext4 file system above it. And that's weird to me, because when I point ssm resize at an LV formatted ext4 with no LUKS in between, I get a message about first having to umount the volume, and when I say yes to that, it shrinks ext4 first, then it shrinks the LV.

So there is most definitely a logical flaw here where ssm did not check that this dmcrypt device is an actively mounted volume before changing the LUKS volume size. It's like it just sees LUKS as separate and not part of the whole stack.

The next bug that I think is bad is when the user chooses to resize the LV: while there are scary warnings that this might cause data loss, those warnings are wrong. It will cause data loss, because the filesystem hasn't been resized yet, and ssm knows this, as the ssm list output shows. So it lets the user nerf the file system, which is just wrong. If a resize operation is going to destroy a file system, it should be flat out disallowed. Only permit file system destruction when deleting/removing a volume, where the warning can be clear that this WILL lead to data loss, rather than the vague warning that it MAY lead to data loss.
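
For reference, a safe manual shrink of such a stack is strictly top-down, and an ext4 shrink requires the filesystem to be unmounted first, which is exactly the check ssm skipped. A sketch with illustrative names and sizes:

  umount /mnt/point                            # ext4 cannot be shrunk while mounted
  e2fsck -f /dev/mapper/cryptvol               # resize2fs insists on a clean filesystem
  resize2fs /dev/mapper/cryptvol 10G           # 1. shrink the filesystem
  cryptsetup resize cryptvol --size 20971520   # 2. shrink the LUKS payload (512-byte sectors; must stay >= the fs)
  lvreduce -L 10.1G VG/LV                      # 3. shrink the LV last (must still hold payload + LUKS header)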

Comment 4 Chris Murphy 2016-07-12 16:14:43 UTC
[root@f24s ~]# cryptsetup status test5crypt
/dev/mapper/test5crypt is active and is in use.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 256 bits
  device:  /dev/mapper/VG-test5
  offset:  4096 sectors
  size:    92270592 sectors
  mode:    read/write
[root@f24s ~]# ssm -v resize -s-3G /dev/mapper/test5crypt
[root@f24s ~]# cryptsetup status test5crypt
/dev/mapper/test5crypt is active and is in use.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 256 bits
  device:  /dev/mapper/VG-test5
  offset:  4096 sectors
  size:    85979136 sectors
  mode:    read/write
[root@f24s ~]# ssm -v resize -s+3G /dev/mapper/test5crypt
SSM Error (2005): There is not enough space in the pool 'none' to grow volume '/dev/mapper/test5crypt' to size 46135296.0 KB!
[root@f24s ~]# 


a. Despite -v, it silently reduces only the LUKS volume, which is an actively mounted ext4.
b. It can't be restored back to the original size.

It's a big bug I think.
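
For what it's worth, since ssm never touched the underlying LV here, the mapping itself can presumably be restored without ssm: cryptsetup resize with no --size grows a mapping to fill its backing device.

  cryptsetup resize test5crypt       # should bring the payload back to 92270592 sectors
  cryptsetup status test5crypt       # verify

Whether the ext4 on top survived the time spent truncated is another matter.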

Comment 5 Chris Murphy 2016-07-12 16:31:29 UTC
Created attachment 1178964 [details]
ssm resize start to finish

This shows creation of LV, LUKS, ext4, mount, ssm resize, umount, then a failed remount. The fs totally gets nerfed as a result.

Comment 6 Chris Murphy 2016-07-12 16:32:44 UTC
Created attachment 1178965 [details]
strace ssm resize

Captured an strace of ssm resize to see why it's completely missing the check for the file system being mounted, since a shrink of a mounted ext4 must fail.

Comment 7 Chris Murphy 2016-07-12 16:38:23 UTC
[root@f24s ~]# mkfs.xfs /dev/mapper/test6crypt
meta-data=/dev/mapper/test6crypt isize=512    agcount=16, agsize=819152 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=13106432, imaxpct=25
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6399, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@f24s ~]# mount /dev/mapper/test6crypt /mnt/0
[root@f24s ~]# ssm -v resize -s-3G /dev/mapper/test6crypt
[root@f24s ~]#

Obviously that's very bad, since XFS doesn't support shrink at all. Clearly no fs mount-state or fs-type check is being done here first. It's really unsafe at any speed when LUKS is involved.
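
The missing guard looks cheap to implement from the outside, too; findmnt answers both questions (is it mounted, and what fs type) in one call:

  findmnt -n -o TARGET,FSTYPE /dev/mapper/test6crypt   # here: '/mnt/0 xfs'; either field is reason to refuse a shrink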

Comment 8 Lukáš Czerner 2016-07-14 09:12:02 UTC
Hello,

I am really sorry about that, but indeed it's due to a bug in ssm when it comes to resizing LUKS volumes. This is definitely something I need to fix.

You're also right that the most problematic step was the first one, when the LUKS volume was resized. At that point it was relatively easy to fix, but the following steps made it much worse.

Still, it should be possible to revert the changes to the lvm volumes by restoring metadata from backups. The problem is that the file system holding the lvm backups was affected by this as well. However, it should still be possible to retrieve the files from the file system without mounting it (in the case of ext4, anyway) by using debugfs (from the e2fsprogs package) to dump the contents of the backup file outside the broken file system. One just has to hope the file does not have any data blocks in the cut-off part of the file system, but debugfs should warn you if that's the case.
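
To make the debugfs route concrete, a sketch with hypothetical paths (the archive file name and the LUKS device holding /etc will differ):

  debugfs -c -R 'dump /etc/lvm/archive/fedora_morgansmachine_00001.vg /tmp/backup.vg' /dev/mapper/luks-XXXX   # -c: catastrophic mode, ignores the unreadable bitmaps
  vgcfgrestore -f /tmp/backup.vg fedora_morgansmachine                                                        # restore VG metadata from the dumped backup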

However, if I understand it correctly, morgan has already given up on retrieving the data?

-Lukas

Comment 9 morgan read 2016-07-14 09:18:36 UTC
Hi, yes - gave up.  Never mind.

Just for the sake of completeness, referencing the ssm-devel list discussion on this:
https://sourceforge.net/p/storagemanager/mailman/storagemanager-devel/?viewmonth=201607&style=threaded

Thanks
Morgan.

Comment 10 Fedora Admin XMLRPC Client 2016-10-24 12:16:02 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 11 Fedora End Of Life 2017-07-25 21:44:41 UTC
This message is a reminder that Fedora 24 is nearing its end of life.
Approximately two weeks from now Fedora will stop maintaining
and issuing updates for Fedora 24. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '24'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 24 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 12 Fedora End Of Life 2017-08-08 15:31:59 UTC
Fedora 24 changed to end-of-life (EOL) status on 2017-08-08. Fedora 24 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

