Red Hat Bugzilla – Bug 1473879
DUD drivers fail to load on mis-matching kernels
Last modified: 2017-09-06 02:12:20 EDT
Created attachment 1302576
Sample DUD for reference if needed.
Here is something that I would like to find out from Red Hat.
When our modules are installed via RPM on a kernel other than the one they were compiled on, the modules are installed into the weak-updates directory as symlinks to the modules in the extra directory.
However, this is not the case with DUDs. For a DUD to work, the kernel version of the modules in the DUD always has to match that of the kernel being installed. This behavior has been the same since day one, but I am wondering whether it should be the case.
If yes, please help me understand why. If not, please help me find a workaround or fix.
Note that by mismatched kernels I do not mean distant kernels. For example, a RHEL 7.4 Snapshot 2 DUD doesn't work on RHEL 7.4 Snapshot 3 or so.
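To make the RPM-side behaviour described above easier to inspect, here is a minimal Python sketch that lists the symlinks found under a kernel's weak-updates directory together with their targets. The helper name weak_update_links and the default root /lib/modules are assumptions made for illustration. This is not part of anaconda or of the weak-modules tooling.

```python
import os


def weak_update_links(release, modules_root="/lib/modules"):
    """Return sorted (symlink path, target) pairs under <root>/<release>/weak-updates.

    Hypothetical helper for inspection only. It reads the filesystem and
    changes nothing.
    """
    weak_dir = os.path.join(modules_root, release, "weak-updates")
    links = []
    # os.walk yields nothing if weak_dir does not exist, so the result is [].
    for dirpath, _dirnames, filenames in os.walk(weak_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append((path, os.readlink(path)))
    return sorted(links)


if __name__ == "__main__":
    # Inspect the currently running kernel.
    for link, target in weak_update_links(os.uname().release):
        print(link, "->", target)
```

On a system where a module RPM built for another kernel has been installed, each printed target would be expected to point into that other kernel's extra directory.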
Well, anaconda's dracut does have its own module compatibility check. It relies on the RPM providing a matching kernel release in its kernel-modules Provides: rather than on a symbol version check. (In your example DUD, the RPM has the following relevant Provides: "kernel-modules >= 3.10.0-685.el7.x86_64", so I assume it can only be used during the installation of Snapshot 5.0 and higher.) If this check succeeds, it is hard-coded to extract the module on the running system and try to load it, so I do not readily expect any other issues.
Note, however, that while the kernel release is not yet fixed (before GA), the checksums of some non-whitelisted symbols might also change. That would prevent a kernel module that uses non-whitelisted symbols from being used with a kernel different from the one it was built with. After GA this is significantly less probable, but it is still possible.
I've just checked the provided DUD against a recent ISO with kernel 3.10.0-693, and anaconda (or, more precisely, anaconda's dracut) did indeed find and load the qed* drivers.
Can you try to build your DUD with an older kernel (for example, the one shipped with Snapshot 4, 3.10.0-681) and use it during the Snapshot 5 installation? Does the issue persist in this case?
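As a rough illustration of the release check described in this comment, the following Python sketch tests whether a running kernel release satisfies a Provides: string such as "kernel-modules >= 3.10.0-685.el7.x86_64". It is not anaconda's or dracut's actual code. The function names are hypothetical, and the comparison is a simplified segment-wise approximation of RPM's version ordering (no epoch, tilde, or caret handling).

```python
import re


def _segments(release):
    # Split into numeric and alphabetic runs; '.', '-', '_' act as separators.
    return re.findall(r"\d+|[A-Za-z]+", release)


def compare_releases(a, b):
    """Return -1, 0 or 1. Simplified stand-in for RPM version comparison."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    if len(sa) == len(sb):
        return 0
    return -1 if len(sa) < len(sb) else 1


def kernel_satisfies(running_release, provides):
    """Check a running kernel release against a 'name op release' string."""
    _name, op, required = provides.split()
    c = compare_releases(running_release, required)
    return {">=": c >= 0, ">": c > 0, "=": c == 0, "<=": c <= 0, "<": c < 0}[op]


if __name__ == "__main__":
    provides = "kernel-modules >= 3.10.0-685.el7.x86_64"
    print(kernel_satisfies("3.10.0-693.el7.x86_64", provides))  # True
    print(kernel_satisfies("3.10.0-681.el7.x86_64", provides))  # False
```

Under these assumptions, a DUD whose RPM carries the Provides: above passes the check on a 3.10.0-693 kernel and fails it on 3.10.0-681. Passing says nothing about symbol checksums, which can still prevent the module from loading.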
We checked the .685 DUD on the .693 kernel and it does indeed work. So that is in line with what you have mentioned.
Our test teams also confirm that this was not the case in the past. Can you help me understand whether this behavior has always been the same for anaconda? And if not, from which version onward was this change included?
(In reply to Sarang Radke from comment #4)
> We checked .685 DUD on .693 kernel and it indeed works. So that is inline
> with what you have mentioned.
> Our test teams also confirm that this was not the case in past. Can you help
> me understand if this behavior was always same for anaconda? And if not,
> which versions onward this change was included.
Sure. Is it possible for you to provide past DUDs and the information regarding which RHEL versions exposed that different behaviour? This would significantly ease the task of locating the possible anaconda change. Meanwhile, I'll try to look through recent anaconda changes in an attempt to guess what may affect the checks applied to DUDs.
We were not able to trace which RHEL distribution exhibited this behavior. The issue is not seen with the current set of builds that we are doing.
You can close this bug. Thanks for all the inputs.
Thank you, Sarang, for the thorough investigation. Closing per comment #7.