Bug 868409

Summary: Recent kernel-headers update broke builds against libnl-1.1
Product: [Fedora] Fedora Reporter: Stephen Gallagher <sgallagh>
Component: libnl Assignee: Dan Williams <dcbw>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19 CC: dcbw, gansalmon, itamar, jhrozek, jonathan, kernel-maint, laine, madhu.chinakonda, nhorman, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-02-17 14:31:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
RPM Build log for SSSD none
Mock root log for SSSD none

Description Stephen Gallagher 2012-10-19 18:39:04 UTC
Created attachment 630206 [details]
RPM Build log for SSSD

Description of problem:
I discussed this with the kernel guys during the meeting today in #fedora-meeting. The general sense is that this is fallout from the UAPI work.

I discovered this because our nightly auto-builder for SSSD began to fail today suddenly.

Version-Release number of selected component (if applicable):
kernel-3.7.0-0.rc1.git2.1.fc19

How reproducible:


Steps to Reproduce:
1. Check-out SSSD's git master
2. Install necessary build dependencies (yum-builddep sssd)
3. Run 'make rpms'
  
Actual results:
SSSD build fails with errors trying to include netlink.h

Expected results:
SSSD should build successfully as it does in F18 and lower.

Additional info:

(02:27:11 PM) sgallagh: I think kernel 3.7.0-rc1.git2.1 is breaking builds linking against libnl-1.1
(02:27:13 PM) jwb: it will mean that the modules built via RPM get signed twice, once during modules_install, once after rpm debuginfo has removed that
(02:27:23 PM) pjones: jwb: probably means I should spend some time this afternoon looking at revocation
(02:27:30 PM) jwb: but it still works
(02:27:52 PM) jwb: things are looking good, and we'll likely be carrying no additional modsign patches in rawhide by hopefully -rc2
(02:27:55 PM) jwb: pjones, er... why?
(02:28:03 PM) davej: sgallagh: got a link to a failure?
(02:28:09 PM) pjones: jwb: so we can honor certs in db and dbx?
(02:28:10 PM) pjones: and mok
(02:28:16 PM) jwb: pjones, oh, that part
(02:28:17 PM) sgallagh: davej: http://www.fpaste.org/jtXb/
(02:28:31 PM) jwb: pjones, yeah, sure.  i haven't gotten to the SB aspects of modsign yet
(02:28:56 PM) jwb: sgallagh, that might be related to the UAPI work
(02:29:13 PM) jwb: i think there's a bug open somewhere on fixing kernel-headers to work with UAPI
(02:29:16 PM) sgallagh: jwb: I'm just reporting it. It broke SSSD's nightly auto-build
(02:29:27 PM) davej: yeah, looks like uapi fallout
(02:30:10 PM) davej: I've not looked to check we're actually packaging up include/uapi properly. jforbes ?
(02:30:11 PM) jforbes: There has been a good bit of uapi fallout through the merge window, though I am surprised that more showed up after rc1
(02:30:29 PM) jforbes: I will double check this afternoon
(02:30:38 PM) jwb: jforbes, dhowells is still merging big chunks of it.  per subsystem, per arch, etc
(02:30:52 PM) jwb: davej, that's the bug i mentioned earlier.  saw it go flying by
(02:31:04 PM) davej: ok
(02:31:14 PM) jwb: or one of them anyway
(02:31:54 PM) jforbes: Yeah, I will go through everything this afternoon and make sure we are doing the right thing
(02:32:21 PM) jforbes: sgallagh: mind pinging me again if there are problems after git4?
(02:32:50 PM) sgallagh: jforbes: I was about to file a BZ. Want me to hold off?
(02:32:55 PM) sgallagh: What's the eta on git4?
(02:32:59 PM) jforbes: sgallagh: no, go ahead
(02:33:12 PM) jforbes: sgallagh: git4 will be built either late tonight or tomorrow AM
(02:33:44 PM) sgallagh: ok, I'll try to check on Monday. I'll be traveling this weekend
(02:34:02 PM) jforbes: Oh yeah, it's friday...
(02:34:29 PM) jforbes: So it will likely be rc2 then instead of git4

Comment 1 Stephen Gallagher 2012-10-19 18:39:34 UTC
Created attachment 630216 [details]
Mock root log for SSSD

Comment 2 Jakub Hrozek 2012-10-30 10:10:26 UTC
Hi,

any updates here? This is effectively blocking us from building new SSSD builds in rawhide. I'd have to disable the libnl support to work around the bug.

Comment 3 Thomas Graf 2012-10-30 13:15:05 UTC
The easiest thing here would be to add a small patch to the libnl-1 package that removes the redefinition from the netlink-kernel.h file, and to add a build dependency requiring the new kernel that fails.

Comment 4 Neil Horman 2012-10-30 13:29:12 UTC
reassigning to libnl per tgr's comment #3

Comment 5 Josh Boyer 2012-10-30 14:33:58 UTC
So, basically libnl-1.1 is really old.  I mean, upstream libnl is already on 3.2.10.  Maybe it should be updated.

Anyway, here is what is happening:

The uapi rework moved the userspace portions of <linux/netlink.h> under a new header guard:

#ifndef _UAPI__LINUX_NETLINK_H
#define _UAPI__LINUX_NETLINK_H

That is literally the only difference:

[jwboyer@zod kernel]$ diff -Nup /usr/include/linux/netlink.h netlink.h 
--- /usr/include/linux/netlink.h	2012-10-08 13:41:19.000000000 -0400
+++ netlink.h	2012-10-30 09:53:52.264784878 -0400
@@ -1,5 +1,5 @@
-#ifndef __LINUX_NETLINK_H
-#define __LINUX_NETLINK_H
+#ifndef _UAPI__LINUX_NETLINK_H
+#define _UAPI__LINUX_NETLINK_H
 
 #include <linux/socket.h> /* for __kernel_sa_family_t */
 #include <linux/types.h>
@@ -150,4 +150,4 @@ struct nlattr {
 #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
 
 
-#endif	/* __LINUX_NETLINK_H */
+#endif /* _UAPI__LINUX_NETLINK_H */

Now, libnl-1.1 seems to want to be clever and it provides /usr/include/netlink/netlink-kernel.h which seems to be some kind of stale copy of the in-kernel header.  Why it does this, I have no idea.  Probably because it's old.

Because of the header guard change, it seems to be getting these redefinition errors.  There really isn't much to be done here except to either patch the libnl netlink-kernel.h file for the new guard, or patch the kernel.  We're not going to patch the kernel.

It should be noted that libnl stopped using netlink/netlink-kernel.h 2 years ago with this commit:

https://github.com/tgraf/libnl/commit/82fe78582045d29d3b9a5bd2f5b233814bd23c23

This boils down to an old package carrying around a stale copy of in-kernel headers.  It should be fixed in that package.

Comment 6 Josh Boyer 2012-10-30 14:34:55 UTC
(In reply to comment #3)
> The easiest thing here would be to add a small patch to the libnl-1 patch
> that removes the redefinition from the netlink-kernel.h file and add a build
> dependency requiring the new kernel that fails.

Seems we had a mid-air collision :).

Easiest, sure.  I don't see why libnl can't be updated in rawhide though.

Comment 7 Jakub Hrozek 2012-10-30 16:04:49 UTC
(In reply to comment #5)
> So, basically libnl-1.1 is really old.  I mean, upstream libnl is already on
> 3.2.10.  Maybe it should be updated.

Except the APIs are not compatible, so there needs to be a -compat package to allow packages that use libnl (such as the SSSD) to transition.

Comment 8 Thomas Graf 2012-10-30 16:23:58 UTC
(In reply to comment #5)
> Now, libnl-1.1 seems to want to be clever and it provides
> /usr/include/netlink/netlink-kernel.h which seems to be some kind of stale
> copy of the in-kernel header.  Why it does this, I have no idea.  Probably
> because it's old.

Because back then it was the proper way to do it because the kernel headers provided by user space differed for almost every distribution.

> Because of the header guard change, it seems to be getting these
> redefinition errors.  There really isn't much to be done here except to
> either patch the libnl netlink-kernel.h file for the new guard, or patch the
> kernel.  We're not going to patch the kernel.

Fixing the libnl1 package is definitely the correct thing to do. Simply get rid of netlink-kernel.h. Relying on linux/netlink.h is fine now.

> It should be noted that libnl stopped using netlink/netlink-kernel.h 2 years
> ago with this commit:
> 
> https://github.com/tgraf/libnl/commit/
> 82fe78582045d29d3b9a5bd2f5b233814bd23c23
> 
> This boils down to an old package carrying around a stale copy of in-kernel
> headers.  It should be fixed in that package.

It's true that libnl3 no longer uses netlink-kernel.h but libnl 1.1 still does and since libnl 1.1 is no longer maintained it will not be fixed upstream.

I suggest adding a small patch to remove the stale header copies and use the new UAPI headers.

Comment 9 Thomas Graf 2012-10-30 17:48:09 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > So, basically libnl-1.1 is really old.  I mean, upstream libnl is already on
> > 3.2.10.  Maybe it should be updated.
> 
> Except the APIs are not compatible, so there needs to be a -compat package
> to allow packages that use libnl (such as the SSSD) to transition.

libnl3 is already in Fedora so we can just remove libnl1 as soon as all dependencies have been removed.

It's really not a good idea to keep using libnl1: it is no longer maintained, it does not see any bugfixes, and I have little interest in maintaining it, especially because converting code from libnl1 to libnl3 is usually a relatively simple task.

Comment 10 Stephen Gallagher 2012-10-30 20:37:34 UTC
Part of the reason that SSSD hasn't made the switch to libnl3 is that it needs to support RHEL5 and RHEL6 as well as Fedora. Neither of those OSes currently support libnl3 (and won't ever). So it's a fair amount of ifdefs to get things working and up until now there was no real reason to make the effort.

Comment 11 Jakub Hrozek 2012-10-30 22:36:36 UTC
(In reply to comment #9)
> It's really not a good idea to still use libnl1, it is no longer maintained
> and does not see any bugfixes and I have little interest to do so,
> especially because converting code from libnl1 to libnl3 is usually a
> relatively simple task.

Not that I disagree with the general message, but I really think that when an incompatible library is introduced to Fedora, there should be
a) a loud announcement on the -devel (or one of the announce) lists
b) a compat package to ease the migration path

Providing these two might speed up the migration to libnl3 as it makes clear to the maintainer that libnl1 is no longer a viable option.

Comment 12 Thomas Graf 2012-10-30 22:55:11 UTC
(In reply to comment #10)
> Part of the reason that SSSD hasn't made the switch to libnl3 is that it
> needs to support RHEL5 and RHEL6 as well as Fedora. Neither of those OSes
> currently support libnl3 (and won't ever). So it's a fair amount of ifdefs
> to get things working and up until now there was no real reason to make the
> effort.

Understood, but I don't see a reason why RHEL5 and RHEL6 can't ship libnl3 as well. Both versions of the library can be installed in parallel. Obviously applications can't link to both versions at the same time, but that seems like a manageable problem. Not sure why you rule out this possibility.

(In reply to comment #11)
> Not that I disagree with the general message, but I really think that when
> an incompatible library is introduced to Fedora, there should be
> a) a loud announcement on the -devel (or one of the announce) lists

Sure, no objection to that.

> b) a compat package to ease the migration path
> 
> Providing these two might speed up the migration to libnl3 as it makes clear
> to the maintainer that libnl1 is no longer a viable option.

What would a compat package gain us that keeping libnl1 around as long as needed can't? All I can see is that it brings libnl3 bugfixes to libnl1 users which gives them one more reason to postpone the switch again.

Comment 13 Jakub Hrozek 2012-10-30 23:15:18 UTC
(In reply to comment #12)
> > 
> > Providing these two might speed up the migration to libnl3 as it makes clear
> > to the maintainer that libnl1 is no longer a viable option.
> 
> What would a compat package gain us that keeping libnl1 around as long as
> needed can't? All I can see is that it brings libnl3 bugfixes to libnl1
> users which gives them one more reason to postpone the switch again.

Well, the above announcement could have said that the -compat package would only be maintained for a single Fedora release lifetime, giving it a limited and definitive TTL :-)

Also, if a package suddenly starts requiring a -compat package instead of a "regular" version, that's usually a sign that a change needs to be made.

That said, I've bumped the ticket that was tracking the conversion to libnl3 in the SSSD to be discussed again on our weekly triage meeting.

Comment 14 Fedora End Of Life 2013-04-03 14:26:37 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 15 Fedora End Of Life 2015-01-09 17:26:04 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2015-02-17 14:31:31 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 17 Red Hat Bugzilla 2023-09-14 01:38:07 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days