Bug 2143277

Summary: [storage] Leapp can fail when there are too many LV partitions
Product: Red Hat Enterprise Linux 7
Reporter: Christophe Besson <cbesson>
Component: leapp-repository
Assignee: Leapp Notifications Bot <leapp-notifications-bot>
Status: NEW ---
QA Contact: upgrades-and-conversions
Severity: high
Docs Contact:
Priority: high
Version: 7.9
CC: jcastran, nico.van.roijen, pstodulk
Target Milestone: rc
Keywords: Reproducer
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Christophe Besson 2022-11-16 14:15:41 UTC
Description of problem:
Leapp crashes when there are too many LV partitions.
The customer had at least 55 partitions that were not needed for the in-place upgrade (IPU).
Commenting them out in /etc/fstab allows the upgrade to proceed.

Version-Release number of selected component (if applicable):
leapp-upgrade-el7toel8-0.17.0-1.el7_9.noarch

How reproducible:
Always

Steps to Reproduce:
1/ create a sparse file to use as a temporary block device (here 20GB)
# dd if=/dev/zero of=/root/block seek=40M count=1

2/ create 100 LVs of 100MB each in the VG "test".
# losetup /dev/loop0 /root/block
# pvcreate /dev/loop0
# vgcreate test /dev/loop0
# for i in $(seq 1 100); do lvcreate -L 100M -n lv$i test; done
# vgchange -ay test

3/ format and mount them (here as ext3, like the customer; I guess other filesystem types give the same symptoms)
# for i in $(seq 1 100); do mkfs.ext3 -F /dev/test/lv$i; done
# for i in $(seq 1 100); do mkdir /srv/fs$i; done
# for i in $(seq 1 100); do echo "/dev/test/lv$i /srv/fs$i ext3 defaults 0 0" >> /etc/fstab; done
# mount -a

4/ run a `leapp preupgrade`
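
As a sanity check on the reproducer numbers above, here is a small Python sketch (not part of leapp). It assumes dd's default 512-byte block size, since no bs= is given, and that the M suffix in seek=40M means 1024*1024 blocks. The function names are made up for illustration only.

```python
# Sizing check for the reproducer; names below are illustrative, not leapp code.
DD_BLOCK_SIZE = 512              # dd default block size when bs= is not given
SEEK_BLOCKS = 40 * 1024 * 1024   # seek=40M (the M suffix means 1024*1024)


def sparse_file_size(seek_blocks=SEEK_BLOCKS, block_size=DD_BLOCK_SIZE, count=1):
    """Apparent size in bytes of the file written by dd seek=... count=..."""
    return (seek_blocks + count) * block_size


def fstab_lines(n, vg="test", fstype="ext3"):
    """Build the same fstab entries as the shell loop in step 3."""
    template = "/dev/{vg}/lv{i} /srv/fs{i} {fs} defaults 0 0"
    return [template.format(vg=vg, i=i, fs=fstype) for i in range(1, n + 1)]


if __name__ == "__main__":
    size = sparse_file_size()
    print("sparse file: %d bytes (%.2f GiB)" % (size, size / float(1024 ** 3)))
    print("100 LVs of 100 MiB need %d MiB" % (100 * 100))
    print(fstab_lines(2)[-1])
```

The 100 LVs need about 10000 MiB in total, so they fit comfortably in the 20 GiB sparse file.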


Actual results:
2022-11-16 05:48:15.525 DEBUG    PID: 24941 leapp.workflow.TargetTransactionFactsCollection.target_userspace_creator: External command has started: ['mount', '-t', 'overlay', 'overlay2', '-o', 'lowerdir=/srv/fs45,upperdir=/var/lib/leapp/scratch/mounts/root_srv_fs45/upper,workdir=/var/lib/leapp/scratch/mounts/root_srv_fs45/work', '/var/lib/leapp/scratch/mounts/root_srv_fs45/root_srv_fs45']
2022-11-16 05:48:15.544 DEBUG    PID: 24941 leapp.workflow.TargetTransactionFactsCollection.target_userspace_creator: External command has finished: ['mount', '-t', 'overlay', 'overlay2', '-o', 'lowerdir=/srv/fs45,upperdir=/var/lib/leapp/scratch/mounts/root_srv_fs45/upper,workdir=/var/lib/leapp/scratch/mounts/root_srv_fs45/work', '/var/lib/leapp/scratch/mounts/root_srv_fs45/root_srv_fs45']
2022-11-16 05:48:15.549 DEBUG    PID: 24941 leapp.workflow.TargetTransactionFactsCollection.target_userspace_creator: External command has started: ['rm', '-rf', u'/var/lib/leapp/scratch/mounts/root_/system_overlay/srv/fs45']
Process Process-464:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 72, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/targetuserspacecreator/actor.py", line 52, in process
    userspacegen.perform()
  File "/usr/lib/python2.7/site-packages/leapp/utils/deprecation.py", line 42, in process_wrapper
    return target_item(*args, **kwargs)
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/targetuserspacecreator/libraries/userspacegen.py", line 671, in perform
    xfs_info=indata.xfs_info) as overlay:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/libraries/overlaygen.py", line 229, in create_source_overlay
    cleanup_scratch(scratch_dir, mounts_dir)
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/libraries/overlaygen.py", line 118, in cleanup_scratch
    api.current_logger().debug('Cleaning up mounts')
  File "/usr/lib64/python2.7/logging/__init__.py", line 1137, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1268, in _log
    self.handle(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1278, in handle
    self.callHandlers(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1318, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 749, in handle
    self.emit(record)
  File "/usr/lib/python2.7/site-packages/leapp/logger/__init__.py", line 40, in emit
    self._do_emit(log_data)
  File "/usr/lib/python2.7/site-packages/leapp/logger/__init__.py", line 45, in _do_emit
    Audit(**log_data).store()
  File "/usr/lib/python2.7/site-packages/leapp/utils/audit/__init__.py", line 87, in store
    with get_connection(db) as connection:
  File "/usr/lib/python2.7/site-packages/leapp/utils/audit/__init__.py", line 73, in get_connection
    return create_connection(cfg.get('database', 'path'))
  File "/usr/lib/python2.7/site-packages/leapp/cli/commands/upgrade/util.py", line 26, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/utils/audit/__init__.py", line 60, in create_connection
    return _initialize_database(sqlite3.connect(path))
OperationalError: unable to open database file


=========================================================================================================
Actor target_userspace_creator unexpectedly terminated with exit code: 1 - Please check the above details

Expected results:
No crash.
Maybe an inhibitor telling the user to comment out partitions that are definitely not required during the IPU?
(that would also help in the XFS case where many ext4 images are created to circumvent an old xfs issue)
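
To illustrate the kind of check such an inhibitor could perform, here is a rough Python sketch. It is only an assumption of how it might look: it does not use the leapp actor API, the function names are hypothetical, and the limit is left as a parameter because the actual threshold has not been determined in this report.

```python
# Hypothetical pre-check: count fstab entries of selected filesystem types.
def parse_fstab(text):
    """Return (device, mountpoint, fstype) tuples for non-comment fstab lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line, ignore it in this sketch
        entries.append((fields[0], fields[1], fields[2]))
    return entries


def too_many_mounts(fstab_text, limit, fstypes=("ext3", "ext4", "xfs")):
    """Return (exceeded, mountpoints) for entries whose fstype is in fstypes.

    `limit` is supplied by the caller; the real threshold is unknown.
    """
    mountpoints = [mp for _dev, mp, fs in parse_fstab(fstab_text) if fs in fstypes]
    return len(mountpoints) > limit, mountpoints
```

A caller would read /etc/fstab, pass its content with a chosen limit, and report the returned mount points as candidates to comment out.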

Additional info:
After the crash, the mounts under /var/lib/leapp/scratch are still present.

To clean up the system, lazy-unmount those overlays:
# for mp in `mount | awk '/leapp.scratch/ {print $1}'`; do umount -vl $mp; done
# rm -rf /var/lib/leapp/*
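
To review what is still mounted before running the cleanup above, a small Python sketch like the following could list the leftover mount points. It assumes the /proc/mounts line format (source, target, fstype, options, ...) and, unlike the awk pattern above, matches only on the mount target. The function name is hypothetical and escaped characters in paths (such as \040 for a space) are not decoded.

```python
# Hypothetical helper: list mount points left under the leapp scratch directory.
SCRATCH_DIR = "/var/lib/leapp/scratch"


def leapp_scratch_mountpoints(mounts_text, scratch_dir=SCRATCH_DIR):
    """Return mount targets below scratch_dir, longest path first."""
    prefix = scratch_dir.rstrip("/") + "/"
    targets = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second field of a /proc/mounts line is the mount target.
        if len(fields) >= 2 and fields[1].startswith(prefix):
            targets.append(fields[1])
    # Longest paths first, so nested mounts are listed before their parents.
    return sorted(targets, key=len, reverse=True)


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint in leapp_scratch_mountpoints(f.read()):
            print(mountpoint)
```

The sketch only prints the mount points; unmounting them is left to the commands above.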

Comment 3 Christophe Besson 2022-11-17 10:33:33 UTC
Just wanted to confirm the issue can also be observed with ext4 or xfs.
The issue does not occur with 8 LVs on this simulated block device. I did not determine the exact limit, but it also fails with 60 LVs.

Comment 4 Petr Stodulka 2023-05-15 09:27:01 UTC
This could possibly be handled in the future when we introduce configuration files for leapp actors, so that users could specify which partitions can be ignored for mounting, with the responsibility staying with the user if any further errors occur during the DNF transaction.

Possibly this could also be improved by a check for partitions that do not contain any file tracked by RPM. But we do not want to go this way, as such a check would affect performance significantly, impacting many more users, so the preferred solution is the first one. However, I am not sure the feature will be delivered in RHEL 7. To be honest, IPU 8 -> 9 has a better chance of getting the feature implemented. Keeping this open for RHEL 7 for planning purposes.
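
For illustration only, the RPM-based check mentioned above could be sketched in Python roughly as follows. This is an assumption about how such a check might work, not existing leapp code: the function names are hypothetical, and the list of RPM-owned paths is expected to be gathered elsewhere (for example from the output of `rpm -qal`).

```python
# Hypothetical sketch: find mount points that hold no RPM-owned file.
def owning_mountpoint(path, mountpoints):
    """Return the longest mount point that is a path prefix of `path`."""
    best = None
    for mp in mountpoints:
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def mountpoints_without_rpm_files(mountpoints, rpm_paths):
    """Return the mount points under which no RPM-owned path is located."""
    used = set()
    for path in rpm_paths:  # one iteration per RPM-owned path on the system
        mp = owning_mountpoint(path, mountpoints)
        if mp is not None:
            used.add(mp)
    return [mp for mp in mountpoints if mp not in used]
```

The loop visits every RPM-owned path, which on a typical system is a very large list. That gives a feel for the performance concern raised above.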