Since util-linux 2.35-0.1 landed in Rawhide, I've noticed the Cloud tests in openQA (which use the test suite formerly run by autocloud) are sometimes failing due to agetty segfaulting. At first I mentioned this in https://bugzilla.redhat.com/show_bug.cgi?id=1783066, but the cause of that turned out to be something else, and with that fixed the agetty segfaults are still happening, so I'm filing this separately. It doesn't always segfault at the same point - it seems like it can happen at almost any time during the test. The test has run ten times since the new util-linux landed (5 in prod, 5 in stg): it failed 4 times due to agetty crashes, each time in a different place, and passed 6 times (I guess it's possible agetty segfaulted in the passed tests too but somehow did not interfere with the test). I backtraced the first crash and will attach that backtrace to the bug; it pointed to "double free or corruption (out)".
Created attachment 1645905: backtrace of the first time the crash happened
I'm not sure if this really qualifies as a release-blocking bug as I guess it's fairly odd to use a local tty on a Cloud image...the typical way to interact with the system would be to ssh into it...
(In reply to Adam Williamson from comment #2)
> I'm not sure if this really qualifies as a release-blocking bug as I guess
> it's fairly odd to use a local tty on a Cloud image...the typical way to
> interact with the system would be to ssh into it...

I guess a good piece of information would be to know if it is something about the way the Cloud image is configured that is causing this, i.e. would/could it happen to Server and Workstation as well, and we're just seeing it here first?
Well, I think the VT consoles on typical Fedora installs are actually run by mingetty, not agetty. I have not looked into it in detail yet, but my hand-wavy assumption so far has been that mingetty is being left out of the Cloud images, either intentionally or by some kind of accident, and something is just falling back on using agetty instead.
In fact, another interesting thing is that the serial console install test started failing in Rawhide around the same time. Now I believe mingetty doesn't support serial lines, so presumably we run something else on serial consoles. I guess it may well be agetty, and so serial consoles may be broken due to this bug as well. I'll have to dig into that, because if that's the case, it *will* be a release blocker.
Hmm. So. We *do* use agetty on serial consoles. However, I can actually launch a serial console install in a local VM manually and have it work OK - it doesn't fail like it does in openQA. I'll have to fiddle with that a bit more.
This is very probably related to /etc/issue or /etc/issue.d (or /run/issue, /run/issue.d, /usr/lib/issue and /usr/lib/issue.d).
OK, I'm able to reproduce this problem. It's related to issue file autoreload -- this is the reason why you see it on some machines where some network stuff (IP/hostname etc.) is updated after agetty starts. In this case, agetty reloads the issue file. Unfortunately, it uses an already freed pointer...
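For anyone following along, here is a minimal sketch of the general pattern that avoids this class of bug: free the cached buffer and reset the pointer to NULL before building the new one. This is not the agetty code and not the upstream patch; `struct issue_state`, `issue_release()` and `issue_reload()` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the state a getty keeps about its issue text. */
struct issue_state {
    char *text;   /* heap copy of the rendered issue text, or NULL */
    size_t len;   /* length of text, excluding the terminating NUL */
};

/* Release the cached text and reset the pointer, so that a later reload
 * cannot free or read the stale buffer a second time. */
static void issue_release(struct issue_state *st)
{
    assert(st != NULL);
    free(st->text);     /* free(NULL) is a no-op, so repeating this is safe */
    st->text = NULL;
    st->len = 0;
}

/* Replace the cached text with a fresh copy of src.
 * Returns 0 on success, -1 if allocation fails (the state is then empty). */
static int issue_reload(struct issue_state *st, const char *src)
{
    assert(st != NULL && src != NULL);
    issue_release(st);  /* drop the old buffer before creating a new one */

    size_t n = strlen(src);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, n + 1);   /* copies the terminating NUL as well */
    st->text = copy;
    st->len = n;
    return 0;
}
```

Without the `st->text = NULL` reset, a second reload or release would pass the stale pointer to free() again, which is the kind of thing glibc reports as "double free or corruption". Starting from a zero-initialised `struct issue_state`, calling `issue_reload()` repeatedly and then `issue_release()` twice in a row is harmless with this layout.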
Fixed by upstream commit 9418ba6d05feed6061f5343741b1bc56e7bde663.
Great, thanks! I owe you several beers for not just telling me to run valgrind on it :P Do you mind if I backport it to Rawhide to see which openQA tests it fixes?
Oh, never mind, I see you did it already! That's great.
OK, the Cloud image test and serial console install both passed with today's Rawhide, so this is looking good. Thanks.
Thanks Adam for chasing this down and thanks Karel for fixing it!