Bug 1752902

Summary: [regression] kmod 20-25 runs much slower than 20-23 due to traversing extra folders of non-existent kernels
Product: Red Hat Enterprise Linux 7
Reporter: Oleksandr Natalenko <onatalen>
Component: kmod
Assignee: Yauheni Kaliuta <ykaliuta>
Status: CLOSED ERRATA
QA Contact: Ziqian SUN (Zamir) <zsun>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.7
CC: casl, ian.croft2, skozina
Target Milestone: rc
Keywords: Regression, WorkAround
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1755196
Environment:
Last Closed: 2020-03-31 20:06:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1710953
Attachments:
Description: proposed implementation
Flags: none

Description Oleksandr Natalenko 2019-09-17 14:12:18 UTC
Based on the support case 02464164.

The customer has used the 3rd-party module "openafs" since the early RHEL 7 days. Over time, each update has left its traces under the /lib/modules/`uname -r`/extra folder.

With the kmod 20-25 update (BZ 1643299, I believe), each new kernel update takes an enormous amount of time because weak-modules now traverses all the folders from previously installed kernels, even if those kernels no longer exist.

Practically, what we see is this:

===
…
/lib/modules/3.10.0-123.13.2.el7.x86_64/extra/openafs/openafs.ko
/lib/modules/3.10.0-123.13.2.el7.x86_64/extra/openafs
/lib/modules/3.10.0-123.13.2.el7.x86_64/extra
/lib/modules/3.10.0-123.13.2.el7.x86_64
…
===

(and almost the same for each kernel the customer has ever installed)

And weak-modules gets it all (a brief snippet of debug output):

===
…
weak module for openafs.ko already exists for kernel 3.10.0-1062.el7.x86_64, update case?
Module openafs.ko from kernel 3.10.0-123.8.1.el7.x86_64 is not compatible with kernel 3.10.0-1062.el7.x86_64 in symbols:  truncate_inode_pages __d_drop d_instantiate page_put_link key_alloc noop_fsync lookup_one_len d_make_root invalidate_mapping_pages unregister_key_type d_splice_alias key_instantiate_and_link register_key_type path_put shrink_dcache_parent d_prune_aliases d_drop page_follow_link_light mntput key_put d_move page_readlink d_rehash kern_path key_validate flock_lock_file_wait truncate_setsize generic_read_dir have_submounts keyring_search names_cachep d_find_alias dput test_set_page_writeback dget_parent mntget d_invalidate d_path
Removing compatible module openafs.ko from kernel 3.10.0-1062.el7.x86_64
…
===

This takes over 5 mins to complete. With "empty" folders removed (except the one that contains actual module to be referenced by a weak update mechanism and two of currently installed kernel) it takes only about 30 seconds.

Clearly, weak-modules should avoid touching non-existent kernels. The issue is not "openafs"-specific.
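
A minimal sketch for spotting such leftover directories before a kernel update. It is not part of weak-modules: the function name is hypothetical, and treating a missing /boot/vmlinuz-<version> image as "kernel not installed" is an assumption. It only lists candidates and deletes nothing, since a directory holding the module that weak updates actually point to must be kept.

```shell
# list_stale_extra_dirs [MODULES_ROOT] [BOOT_DIR]   (hypothetical helper)
# Prints every kernel version that has an extra/ directory under MODULES_ROOT
# but no vmlinuz-<version> image in BOOT_DIR (assumed test for "not installed").
list_stale_extra_dirs() (
    modules_root=${1:-/lib/modules}
    boot_dir=${2:-/boot}
    for dir in "$modules_root"/*/; do
        # Skip kernel directories that carry no 3rd-party modules.
        [ -d "${dir}extra" ] || continue
        version=$(basename "$dir")
        [ -e "$boot_dir/vmlinuz-$version" ] || printf '%s\n' "$version"
    done
)
```

Running `list_stale_extra_dirs` with no arguments inspects /lib/modules and /boot; review the output by hand before removing anything.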

This was discussed with ykaliuta via IRC, and now I'm framing our discussion and my investigation with the customer into this BZ.

Comment 2 Oleksandr Natalenko 2019-09-17 14:14:15 UTC
And, oh yeah, kmod 20-26 (BZ 1695763 I suppose) doesn't make any difference for the customer's case.

Comment 4 Ziqian SUN (Zamir) 2019-09-18 11:16:28 UTC
I'm setting qa_ack+ to speed things up before actually reproducing the issue.

Hi Oleksandr,

Is the customer willing to test the fix or providing some info how I can get the openafs module?

Thanks.

Comment 5 Oleksandr Natalenko 2019-09-18 11:26:20 UTC
Hi.

(In reply to Ziqian SUN (Zamir) from comment #4)
> Is the customer willing to test the fix or providing some info how I can get
> the openafs module?

Yes, so far the customer is coöperative, so I'd say they will be able to check the fix.

As I mentioned in the BZ description, the issue is not specific to the openafs module and can be reproduced with any 3rd-party module, as long as it is left behind as residue from previous kernel installations.

If you'd like to stick to openafs specifically, though, it can be taken from here: [1].

[1] https://www.openafs.org/release/openafs-1.6.22.4.html

Comment 7 Yauheni Kaliuta 2019-09-18 20:29:14 UTC
> Clearly, weak-modules should avoid touching non-existent kernels. 

Not exactly. Having an "extra" module compiled for one kernel but providing a weak link to another is a valid use case, even if the original kernel is not in the system (either it is no longer installed, or it never was and the module was installed separately from a package).

But the current logic checks all the extra/ kernels from oldest to newest, trying to install the weak links. As a result, it ends up with the latest compatible weak module installed.

For that use case the logic should be reversed, in the sense that the kernels should be checked from newest to oldest: if a module was already installed from a newer version, it should be skipped (and if a kernel provides only already-installed modules, that kernel should be skipped completely).

Implementing it in bash with the current grouping is a bit tricky for me, but hopefully I won't break other use cases.
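
The newest-to-oldest order described above can be sketched roughly as follows. This is an illustration only, not the attached implementation: the function name is hypothetical, ordering kernels with GNU `sort -V` is an assumption, and the symbol compatibility check that weak-modules performs is left out entirely.

```shell
# pick_module_sources MODULES_ROOT   (hypothetical helper)
# Walks kernel directories from newest to oldest and prints
# "<module><TAB><kernel>" for the newest kernel providing each extra/ module.
# Copies of the same module found in older kernels are skipped.
pick_module_sources() (
    modules_root=$1
    # Newest kernel first; version ordering via GNU sort -V is an assumption.
    ls -1 "$modules_root" | sort -rV | while IFS= read -r version; do
        find "$modules_root/$version/extra" -name '*.ko' 2>/dev/null | sort |
        while IFS= read -r path; do
            printf '%s\t%s\n' "$(basename "$path")" "$version"
        done
    done | awk -F'\t' '!seen[$1]++'   # keep only the first (newest) hit per module
)
```

A kernel whose modules were all claimed by newer kernels contributes no lines, which corresponds to skipping it completely.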

Comment 8 Oleksandr Natalenko 2019-09-19 06:10:54 UTC
> Not exactly

I totally get it. This is why:

> except the one that contains actual module to be referenced by a weak update mechanism and two of currently installed kernel

Comment 10 Yauheni Kaliuta 2019-09-20 16:42:31 UTC
Created attachment 1617277
proposed implementation

I have not tested it too much, but it passes my local testsuite.

Comment 11 Charles Slivkoff 2019-09-20 17:38:02 UTC
This appears to work in my test case.


$ time ./weak-modules --dry-run --add-kernel --verbose
weak module for openafs.ko already exists for kernel 3.10.0-1062.el7.x86_64, update case?
Module openafs.ko from kernel 3.10.0-862.el7.x86_64 is compatible with kernel 3.10.0-1062.el7.x86_64
/sbin/depmod -C /tmp/weak-modules.anp88V/depmod.conf -ae -F /boot/System.map-3.10.0-1062.el7.x86_64 3.10.0-1062.el7.x86_64

real	0m30.828s
user	0m20.119s
sys	0m6.739s

Comment 23 errata-xmlrpc 2020-03-31 20:06:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1142