Bug 2100287

Summary: popt 1.19 seems to break authselect, possibly others
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: authselectAssignee: Tomas Halman <thalman>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhideCC: ffesti, jhrozek, mdomonko, pbrezina, pmatilai, redhat-bugzilla, robatino, sgallagh, thalman
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: openqa
Fixed In Version: authselect-1.4.0-2.fc37 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-18 10:06:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2009537    

Description Adam Williamson 2022-06-23 01:11:19 UTC
openQA testing of the popt-1.19~rc1-1.fc37 update for Rawhide seems to suggest it's badly broken:

https://openqa.stg.fedoraproject.org/tests/overview?groupid=2&build=Update-FEDORA-2022-92edd473b6&version=37&distri=fedora

the tests that fail there are not failing on other updates before or after popt, suggesting that popt is indeed the cause. The freeipa tests seem to be failing because authselect is broken - freeipa tries to run this command:

authselect select sssd with-sudo --force --backup=pre_ipaclient_20220622042949

and it fails with:

[error] Unable to get profile information [2]: No such file or directory

The other tests that fail are tests that build an installer or live image containing the update and then try to install and boot from it. Building both types of image works.

Installing from the network install image works, but logging into the installed system always fails: when the test types a valid username into the console login prompt, it immediately says "Login incorrect" and cycles back to the login prompt. This is likely also caused by the authselect failure - anaconda runs authselect in the installed system root after installing packages, so likely this is going wrong and messing up the auth configuration of the installed system.

Booting the live images fails, showing errors at the dracut stage: it seems /sbin/sysctl is missing from the dracut environment and modprobe can't find a module it's trying to load. I'm not sure yet what's causing this, but it's definitely busted.

I've asked releng to untag the package before the next compose runs, otherwise Rawhide will be quite badly busted I think.

Comment 1 Panu Matilainen 2022-06-23 06:36:51 UTC
Ack. This is quite unexpected of course, there aren't that many changes to popt here, but then you never know.

Thanks for the report, I'll look into it ASAP.

Comment 2 Panu Matilainen 2022-06-23 08:07:02 UTC
Easily reproduced. authconfig appears to be relying on popt maintaining state after poptFreeContext(), in other words relying on popt leaking memory. And yes, popt was leaking memory here, probably for a long time, and this is one of the fixes in 1.19.

A pointer returned by poptGetArg() is only valid as long as the popt context is it was used on is valid. Authconfig needs to either
a) preserve the popt context as long as the data is needed (initialize context early, free just before exit is a good strategy to avoid these sort of issues)
b) make a copy of the data returned (and thus manage the allocation locally)

This is sufficient for fixing the above reproducer, I don't know if there are other cases left. I see a comment about discarding const "because it's not worth it" in cli_tool_common_opts() which may indicate another trouble spot.

diff --git a/src/cli/cli_tool.c b/src/cli/cli_tool.c
index 83bc1ef..4c1fbbe 100644
--- a/src/cli/cli_tool.c
+++ b/src/cli/cli_tool.c
@@ -379,7 +379,7 @@ errno_t cli_tool_popt_ex(struct cli_cmdline *cmdline,
             }
         }
 
-        *_fopt = fopt;
+        *_fopt = strdup(fopt);
     } else if (_fopt == NULL && fopt != NULL) {
         /* Unexpected free argument. */
         fprintf(stderr, _("Unexpected parameter: %s\n\n"), fopt);

This of course leaks memory, only now in authconfig (a cli tool leaking a bit of memory is a much lesser evil than a library doing so in any case). Fixing that is up to authconfig maintainers, I don't know whether a) or b) is preferable to them.

Comment 3 Panu Matilainen 2022-06-23 08:35:48 UTC
Just FWIW, https://github.com/rpm-software-management/popt/commit/7182e4618ad5a0186145fc2aa4a98c2229afdfa8 is what reveals the abuse in authconfig, and where there's one there may be more. We can certainly revert that patch temporarily if necessary, but I'd rather see the users fixed instead.

Comment 4 Pavel Březina 2022-06-23 10:17:04 UTC
Thank you. I will try to fix it soon, but it won't be sooner than the beginning of next month.

Comment 5 Adam Williamson 2022-06-23 19:31:58 UTC
The case which I'm not sure is caused by authconfig is live image boot. If we want to test, we'd need to run a test with the new popt and a patched authconfig and see if there's still a problem.

Panu, are you okay with waiting until Pavel can fix authconfig for popt 1.19 to go into Rawhide?

Comment 6 Adam Williamson 2022-06-23 23:23:29 UTC
Further note on the live image case: the error messages I was seeing about sysctl and modprobe aren't actually the issue. They appear even in boots with popt 1.18 that work fine, they're apparently not fatal problems. One of them is caused by anaconda trying to load cramfs which isn't in the initramfs environment, and the other is caused by a modprobe config file installed by nfs-utils that tries to run sysctl when loading the nfs module; sysctl also isn't in the initramfs environment. I've filed separate issues on those. Whatever causes live images not to boot, it's not those errors. So it may well just be authconfig again.

Comment 7 Panu Matilainen 2022-06-27 05:33:33 UTC
> Panu, are you okay with waiting until Pavel can fix authconfig for popt 1.19 to go into Rawhide?

I'm off to a vacation after this week and will only be back in August. Michal (cc'd), I suppose babysitting the new popt in rawhide and helping out debugging possible other failure cases falls on your lap in the meanwhile. 

I know at least gdisk is affected but that's already tracked (bug 2100391), there may be others.

/me notes s/authconfig/authselect/ in all of the above, old names die hard :D

Comment 8 Stephen Gallagher 2022-06-28 12:07:47 UTC
*** Bug 2101566 has been marked as a duplicate of this bug. ***

Comment 9 Panu Matilainen 2022-06-29 06:18:04 UTC
I'll build popt 1.19 with the memleak fix reverted to get out of the current limbo state - untagging is good for disaster prevention but rawhide + dist-git disagreeing is not a sustainable state.

Comment 10 Panu Matilainen 2022-06-29 08:19:11 UTC
FWIW, popt 1.19 minus the memleak patch is now back in rawhide.

Comment 11 Pavel Březina 2022-06-29 08:45:28 UTC
This one contains the memleak patch so I can use it for testing? https://koji.fedoraproject.org/koji/buildinfo?buildID=1990967

Comment 12 Panu Matilainen 2022-06-29 08:50:45 UTC
(In reply to Pavel Březina from comment #11)
> This one contains the memleak patch so I can use it for testing?
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1990967

Yes.

Comment 13 Pavel Březina 2022-07-08 11:35:12 UTC
Please, test this scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=89233014

Comment 14 Adam Williamson 2022-07-15 22:03:58 UTC
Sorry for the delay: I tested, and it looks good. That authselect build looks like it works OK with the earlier popt build that has the memleak patch.

Comment 15 Adam Williamson 2022-07-18 16:50:46 UTC
I guess we'll leave it to popt maintainers to decide when to go back to an unpatched build. I'll open a new bug if I still see problems if/when that happens.

Comment 16 Panu Matilainen 2022-08-24 06:39:07 UTC
FYI, the temporary revert now undone in popt-1.19~rc1-4