Bug 1473879 - DUD drivers fail to load on mis-matching kernels
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: driver-update-program
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Eugene Syromiatnikov
QA Contact: Ziqian SUN (Zamir)
Blocks: 1472889
Reported: 2017-07-21 20:33 EDT by Sarang Radke
Modified: 2017-09-06 02:12 EDT
CC List: 5 users

Last Closed: 2017-09-06 02:12:20 EDT
Type: Bug


Attachments
Sample DUD for reference if needed. (6.04 MB, application/octet-stream)
2017-07-21 20:33 EDT, Sarang Radke

Description Sarang Radke 2017-07-21 20:33:04 EDT
Created attachment 1302576
Sample DUD for reference if needed.

Hi,

Here is something I would like to find out from Red Hat.
 
When our modules are installed via RPM on a kernel other than the one they were compiled against, they are made available through the weak-updates directory: symlinks there point to the modules installed in the extra directory.
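For reference, the weak-updates layout mentioned above can be sketched in Python. This is only an illustration and not the real weak-modules tool. The helper name `weak_update_links` and the example module path are hypothetical. The function computes link and target paths only for kernels the caller has already judged compatible; the real tool also checks symbol compatibility before it creates any link.

```python
from __future__ import annotations

from pathlib import PurePosixPath

MODULES_ROOT = PurePosixPath("/lib/modules")


def weak_update_links(
    module_path: str, compatible_kernels: list[str]
) -> list[tuple[PurePosixPath, PurePosixPath]]:
    """Return (symlink, target) pairs for a module built for another kernel.

    Hypothetical helper: assumes the module lives under
    /lib/modules/<build kernel>/extra/<subpath> and that every kernel in
    compatible_kernels has already passed a symbol-compatibility check.
    """
    target = PurePosixPath(module_path)
    build_kernel, subdir, *rest = target.relative_to(MODULES_ROOT).parts
    if subdir != "extra" or not rest:
        raise ValueError(f"not an extra/ module path: {module_path}")
    links = []
    for kernel in compatible_kernels:
        if kernel == build_kernel:
            continue  # the build kernel loads the module from extra/ directly
        # Same relative subpath, but under the other kernel's weak-updates/
        link = MODULES_ROOT.joinpath(kernel, "weak-updates", *rest)
        links.append((link, target))
    return links


if __name__ == "__main__":
    for link, tgt in weak_update_links(
        "/lib/modules/3.10.0-685.el7.x86_64/extra/qed/qed.ko",
        ["3.10.0-685.el7.x86_64", "3.10.0-693.el7.x86_64"],
    ):
        print(f"{link} -> {tgt}")
```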
 
However, this is not the case with DUDs. For a DUD to work, the kernel version its modules were built against always has to match the kernel being installed. This behavior has been the same since day one, but I am wondering whether it should be.
 
If so, please help me understand why. If not, please help me find a workaround or fix for this.

Note that by mismatched kernels I do not mean distant kernels. For example, a RHEL 7.4 Snapshot 2 DUD does not work on RHEL 7.4 Snapshot 3 or so.
Comment 2 Eugene Syromiatnikov 2017-07-22 17:08:35 EDT
Well, anaconda's dracut does have its own module compatibility check mechanism. That mechanism relies on the RPM declaring a matching kernel release in its kernel-modules Provides: [1], and it does not use a symbol version check. In your example DUD, the RPM has the following relevant Provides: "kernel-modules >= 3.10.0-685.el7.x86_64", so I assume it can only be used when installing Snapshot 5.0 or later. If this check succeeds, dracut is hard-coded to extract the module onto the running system and try to load it [2], so I do not readily expect any other issues.
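The Provides-based check can be illustrated with a short Python sketch that compares the running kernel release against the version in a "kernel-modules >= …" Provides. It assumes a simplified version of RPM's segment-wise version comparison, with no tilde, caret, or epoch handling. The helper names are hypothetical. This is not the code in anaconda's dd_list.c.

```python
from __future__ import annotations

import operator
import re

# Comparison operators that may appear in a versioned Provides
_OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
        ">=": operator.ge, ">": operator.gt}


def rpmvercmp(a: str, b: str) -> int:
    """Simplified RPM-style comparison: 1 if a is newer, -1 if older, 0 if equal."""
    if a == b:
        return 0
    # Split into runs of digits or letters; separators such as . - _ are ignored
    seg_a = re.findall(r"\d+|[A-Za-z]+", a)
    seg_b = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip(seg_a, seg_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):  # numeric segments compare as integers
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            return 1 if x_num else -1  # a numeric segment beats an alphabetic one
        elif x != y:
            return 1 if x > y else -1  # alphabetic segments compare as strings
    # All shared segments equal: the string with extra segments is newer
    if len(seg_a) != len(seg_b):
        return 1 if len(seg_a) > len(seg_b) else -1
    return 0


def satisfies_kernel_modules(provide: str, running_release: str) -> bool:
    """Check whether the running kernel release satisfies e.g.
    'kernel-modules >= 3.10.0-685.el7.x86_64' (hypothetical helper)."""
    name, op, required = provide.split()
    if name != "kernel-modules":
        raise ValueError(f"not a kernel-modules Provides: {provide!r}")
    return _OPS[op](rpmvercmp(running_release, required), 0)


if __name__ == "__main__":
    dud = "kernel-modules >= 3.10.0-685.el7.x86_64"
    for release in ("3.10.0-681.el7.x86_64", "3.10.0-693.el7.x86_64"):
        print(release, satisfies_kernel_modules(dud, release))
```

Under these assumptions, a DUD built against .685 is accepted on .693 but rejected on .681. This matches the behaviour reported later in this thread for the .685 DUD on the .693 kernel.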

Note, however, that while the kernel release is not yet fixed (before GA), the checksums of some non-whitelisted symbols may also change. That would prevent a kernel module that uses non-whitelisted symbols from being used with a kernel other than the one it was built against. After GA this is significantly less likely, but still possible.
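The symbol-checksum concern can be illustrated with a small Python sketch. It assumes Module.symvers-style input lines of the form `<crc>\t<symbol>\t<module>\t<export type>`. The helper names, CRC values, and whitelist contents are hypothetical. In reality the kernel performs this comparison itself at module load time, using the symbol versions recorded in the module.

```python
from __future__ import annotations


def parse_symvers(text: str) -> dict[str, int]:
    """Map symbol name -> CRC from Module.symvers-style lines
    (assumed format: crc, symbol, module, export type; tab-separated)."""
    crcs: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def mismatched_symbols(required: dict[str, int], kernel_crcs: dict[str, int]) -> list[str]:
    """Symbols the module needs whose CRC differs from, or is missing in, the kernel."""
    return sorted(sym for sym, crc in required.items() if kernel_crcs.get(sym) != crc)


def non_whitelisted(required: dict[str, int], whitelist: set[str]) -> list[str]:
    """Symbols the module uses that are outside the (hypothetical) kABI whitelist,
    i.e. the ones whose CRC may change between kernel builds."""
    return sorted(set(required) - whitelist)


if __name__ == "__main__":
    # Hypothetical data: CRCs the module was built with vs. the new kernel's CRCs
    required = {"pci_enable_device": 0x1A2B3C4D, "internal_helper": 0x12345678}
    kernel = parse_symvers(
        "0x1a2b3c4d\tpci_enable_device\tvmlinux\tEXPORT_SYMBOL\n"
        "0x0badf00d\tinternal_helper\tvmlinux\tEXPORT_SYMBOL_GPL\n"
    )
    print(mismatched_symbols(required, kernel))  # ['internal_helper']
    print(non_whitelisted(required, {"pci_enable_device"}))  # ['internal_helper']
```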

[1] https://github.com/rhinstaller/anaconda/blob/master/utils/dd/dd_list.c#L113
[2] https://github.com/rhinstaller/anaconda/blob/master/dracut/driver_updates.py#L350
Comment 3 Eugene Syromiatnikov 2017-07-25 09:49:22 EDT
I've just checked the provided DUD with a recent ISO (kernel 3.10.0-693), and anaconda (or, more precisely, anaconda's dracut) did find and load the qed* drivers.

Can you try building your DUD against an older kernel (for example, the one shipped with Snapshot 4, 3.10.0-681) and using it during the Snapshot 5 installation? Does the issue persist in that case?
Comment 4 Sarang Radke 2017-08-02 14:03:40 EDT
Eugene,

We checked the .685 DUD on the .693 kernel and it does work, so that is in line with what you mentioned.

Our test teams also confirm that this was not the case in the past. Can you help me understand whether anaconda has always behaved this way, and if not, from which version onward this change was included?
Comment 5 Eugene Syromiatnikov 2017-08-02 15:14:21 EDT
(In reply to Sarang Radke from comment #4)
> Eugene,
> 
> We checked the .685 DUD on the .693 kernel and it does work, so that is in
> line with what you mentioned.
> 
> Our test teams also confirm that this was not the case in the past. Can you
> help me understand whether anaconda has always behaved this way, and if not,
> from which version onward this change was included?

Sure. Could you provide the past DUDs, along with information about which RHEL versions showed the different behaviour? That would make it significantly easier to locate the relevant anaconda change. Meanwhile, I'll look through recent anaconda changes to try to identify what may have affected the checks applied to DUDs.
Comment 7 Sarang Radke 2017-09-05 13:22:05 EDT
Hello Eugene/Stanislav,

We were not able to trace which RHEL release exhibited this behavior. The issue is not seen with the current set of builds we are doing.

You can close this bug. Thanks for all the inputs.
Comment 8 Stanislav Kozina 2017-09-06 02:12:20 EDT
Thank you, Sarang, for the thorough investigation. Closing per comment #7.
