Bug 1473879

Summary: DUD drivers fail to load on mis-matching kernels
Product: Red Hat Enterprise Linux 7
Reporter: Sarang Radke <sarang.radke>
Component: driver-update-program
Assignee: Eugene Syromiatnikov <esyr>
Status: CLOSED NOTABUG
QA Contact: Ziqian SUN (Zamir) <zsun>
Severity: unspecified
Priority: unspecified
Version: 7.5
CC: eriley, esyr, sarang.radke, skozina, zsun
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-09-06 06:12:20 UTC
Type: Bug
Bug Blocks: 1472889    
Attachments:
Sample DUD for reference if needed. (Flags: none)

Description Sarang Radke 2017-07-22 00:33:04 UTC
Created attachment 1302576
Sample DUD for reference if needed.

Hi,

Here is something I would like to find out from Red Hat.
 
When our modules are installed via RPM on a kernel other than the one they were compiled against, they are made available through the weak-updates directory, via symlinks pointing to the modules in the extra directory.
 
However, DUDs do not behave this way. For a DUD to work, the kernel version the DUD modules were built for must always match the kernel being installed. It has behaved like this since day one, but I am wondering whether it should.
 
If so, please help me understand why. If not, please help me find a workaround or fix.

Note that by mismatched kernels I do not mean distant kernels. For example, a RHEL 7.4 Snapshot 2 DUD does not work on RHEL 7.4 Snapshot 3 or so.

Comment 2 Eugene Syromiatnikov 2017-07-22 21:08:35 UTC
Well, anaconda's dracut does indeed have its own module compatibility check. Instead of checking symbol versions, it relies on the RPM listing a matching kernel release in its kernel-modules Provides: [1]. In your example DUD, the relevant Provides: of the RPM is "kernel-modules >= 3.10.0-685.el7.x86_64", so I assume it can only be used during the installation of Snapshot 5.0 and later. If this check succeeds, dracut is hard-coded to extract the module onto the running system and try to load it [2], so I do not expect any other issues.
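To illustrate the release-based check, here is a minimal sketch. It assumes the check reduces to an RPM-style comparison of the running kernel release against the version named in the "kernel-modules >= ..." Provides. This is not the code in dd_list.c [1]. The names `rpmvercmp_simple` and `provides_satisfied` are hypothetical, and the comparison is a simplified form of RPM's algorithm that ignores epochs and the `~`/`^` operators. Symbol CRCs play no part in it.

```python
import re


def rpmvercmp_simple(a: str, b: str) -> int:
    """Simplified RPM-style version comparison; returns -1, 0 or 1.

    Hypothetical helper: splits both strings into runs of digits or
    letters (separators such as '.', '-', '_' are ignored) and compares
    them segment by segment.
    """
    if a == b:
        return 0
    seg_a = re.findall(r"\d+|[a-zA-Z]+", a)
    seg_b = re.findall(r"\d+|[a-zA-Z]+", b)
    for x, y in zip(seg_a, seg_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the one with more segments is newer.
    if len(seg_a) != len(seg_b):
        return 1 if len(seg_a) > len(seg_b) else -1
    return 0


def provides_satisfied(provides: str, running_release: str) -> bool:
    """Check a 'kernel-modules >= <release>' Provides against a kernel release."""
    name, op, required = provides.split()
    if name != "kernel-modules" or op != ">=":
        raise ValueError(f"unsupported Provides: {provides!r}")
    return rpmvercmp_simple(running_release, required) >= 0
```

Under this model, a DUD providing "kernel-modules >= 3.10.0-685.el7.x86_64" is accepted on 3.10.0-693.el7.x86_64 and rejected on 3.10.0-681.el7.x86_64.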

Note, however, that while the kernel release is not yet fixed (before GA), the checksums of some non-whitelisted symbols may also change. That would prevent a kernel module that uses non-whitelisted symbols from loading on any kernel other than the one it was built against. After GA this is significantly less likely, but still possible.
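The checksum issue is separate from the release check and is enforced by the kernel's modversions mechanism at load time. The sketch below models it in simplified form. It assumes two inputs: the module's required symbol CRCs as "0x<crc> <symbol>" lines (for example, as printed by `modprobe --dump-modversions`) and the kernel's Module.symvers, whose first two columns are CRC and symbol. `parse_crcs` and `check_module` are hypothetical helpers, and the whitelist argument only labels each symbol. The kernel itself rejects any CRC mismatch, whitelisted or not.

```python
from typing import Dict, Iterable, List


def parse_crcs(text: str) -> Dict[str, int]:
    """Map symbol name -> CRC from lines whose first two fields are '0x<crc> <symbol>'."""
    crcs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def check_module(module_versions: str, kernel_symvers: str,
                 whitelist: Iterable[str] = ()) -> List[str]:
    """Return the symbol problems that would block loading the module.

    Each problem is labelled with whether the symbol is on the (assumed)
    kABI whitelist, i.e. whether its CRC was expected to stay stable.
    """
    required = parse_crcs(module_versions)
    provided = parse_crcs(kernel_symvers)
    allowed = set(whitelist)
    problems: List[str] = []
    for sym in sorted(required):
        tag = "whitelisted" if sym in allowed else "not whitelisted"
        if sym not in provided:
            problems.append(f"{sym} ({tag}): not exported by this kernel")
        elif provided[sym] != required[sym]:
            problems.append(f"{sym} ({tag}): CRC mismatch")
    return problems
```

For example, with made-up CRCs and a hypothetical non-whitelisted symbol `foo_internal` whose CRC changed between kernels, `check_module` reports only that symbol.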

[1] https://github.com/rhinstaller/anaconda/blob/master/utils/dd/dd_list.c#L113
[2] https://github.com/rhinstaller/anaconda/blob/master/dracut/driver_updates.py#L350

Comment 3 Eugene Syromiatnikov 2017-07-25 13:49:22 UTC
I've just checked the provided DUD with a recent ISO (kernel 3.10.0-693), and anaconda (or, more precisely, anaconda's dracut) did find and load the qed* drivers.

Can you try building your DUD against an older kernel (for example, 3.10.0-681, the one shipped with Snapshot 4) and using it during a Snapshot 5 installation? Does the issue persist in that case?

Comment 4 Sarang Radke 2017-08-02 18:03:40 UTC
Eugene,

We checked the .685 DUD on the .693 kernel and it does work, which is in line with what you mentioned.

Our test teams also confirm that this was not the case in the past. Can you help me understand whether anaconda has always behaved this way, and if not, from which version onward this change was included?

Comment 5 Eugene Syromiatnikov 2017-08-02 19:14:21 UTC
(In reply to Sarang Radke from comment #4)
> Eugene,
> 
> We checked .685 DUD on .693 kernel and it indeed works. So that is inline
> with what you have mentioned.
> 
> Our test teams also confirm that this was not the case in past. Can you help
> me understand if this behavior was always same for anaconda? And if not,
> which versions onward this change was included.

Sure. Could you provide past DUDs, along with information on which RHEL versions exhibited the different behaviour? This would make it much easier to locate the possible anaconda change. Meanwhile, I'll look through recent anaconda changes to see what may have affected the checks applied to DUDs.

Comment 7 Sarang Radke 2017-09-05 17:22:05 UTC
Hello Eugene/Stanislav,

We were not able to trace which RHEL release exhibited this behavior. The issue is not seen with our current set of builds.

You can close this bug. Thanks for all the inputs.

Comment 8 Stanislav Kozina 2017-09-06 06:12:20 UTC
Thank you, Sarang, for the thorough investigation. Closing per comment #7.