Bug 1908005

Summary: After "dnf upgrade -y" and a reboot, both "sudo bash" and "su -l" dump core, so I lost root access
Product: [Fedora] Fedora
Reporter: David Marceau <uticdmarceau2007>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW ---
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: rawhide
CC: acaringi, adscvr, agurenko, airlied, amatej, boldingd, bskeggs, c.fierst, chplee, dmach, fedora, hdegoede, holger, itamar, jarodwilson, jeremy, jglisse, jmracek, joe, jonathan, josef, jrohel, kernel-maint, lgoncalv, linville, masami256, mblaha, mchehab, mhatina, packaging-team-maint, pkratoch, pmatilai, proletarius101, ptalbert, rpm-software-management, stefano.biagiotti, steved, vmukhame
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description, flags):
* dnf upgrade console output from a similar older drive onto the very same computer but without rebooting it yet. (flags: none)
* Commenter 6 attachment 1 (flags: none)
* Commenter 6 attachment 2 (flags: none)
* Comment 14 attachement (flags: none)

Description David Marceau 2020-12-15 16:33:33 UTC
Created attachment 1739388
dnf upgrade console output from a similar older drive onto the very same computer but without rebooting it yet.

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. dnf upgrade -y
2. reboot
3. log in as non-root user
4. sudo bash

Actual results:
sudo bash core dumped
su -l core dumped

Expected results:
I expected it to bring up a root shell.

Additional info:
It's on Fedora Rawhide, on an internal SATA SK hynix S31 SSD.

The only odd thing I noticed during the dnf upgrade was an output error.

Comment 1 David Marceau 2020-12-15 16:34:11 UTC
  Running scriptlet: kernel-core-5.10.0-0.rc6.20201204git34816d20f173.92.fc34.x86_64                                                                             540/540 
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error

  Running scriptlet: nss-3.59.0-2.fc34.x86_64                                                                                                                    540/540 
  Running scriptlet: rpm-4.16.1-1.fc34.x86_64                                                                                                                    540/540

Comment 2 Panu Matilainen 2020-12-16 07:51:45 UTC
That's unlikely to be dnf's fault; more likely some central component is broken on rawhide and the whole works blew up.

Comment 3 Daniel Mach 2020-12-21 08:06:42 UTC
This really doesn't look like a dnf issue.

I can think of the following root causes:
* other components broken (does rebooting the system help?)
* broken hardware: SSD, RAM (what do `smartctl --health /dev/sdX` or `dmesg` say?)
* broken file system

If your system boots, I recommend running `dnf remove --duplicates` to recover from a potentially inconsistent state.
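
The hardware checks suggested above can be scripted. This is a minimal sketch, assuming a drive whose `smartctl --health` output contains the usual "test result: PASSED" (ATA) or "SMART Health Status: OK" (SCSI) verdict line. The helper names `smart_health_ok` and `filter_disk_errors` and the grep patterns are illustrative assumptions, not something taken from this report:

```shell
# Hypothetical helper: reads `smartctl --health` output on stdin and
# succeeds only if a passing verdict line is present.
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Hypothetical helper: keeps only kernel log lines that look like disk
# trouble. The patterns are a rough guess, not an exhaustive list.
filter_disk_errors() {
    grep -Ei 'i/o error|ata[0-9]+.*(error|failed)|medium error' || true
}

# Usage (needs root; replace /dev/sdX with the real device):
#   smartctl --health /dev/sdX | smart_health_ok && echo "SMART: passed"
#   dmesg | filter_disk_errors
```

An empty result from `filter_disk_errors` does not prove the hardware is fine; it only means none of the guessed patterns appeared in the kernel log.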

Comment 4 Daniel Mach 2020-12-21 12:16:23 UTC
I just realized that you've rebooted already (it's mentioned in the summary).

Could you boot from a live media, mount the system partition and check if the symlinks in /usr/lib64 point to reasonable locations?
Also all the files under /usr/lib64 should have non-zero size.

You could use dnf from the live media to repair the mounted file system by using `dnf --installroot=/path/to/mount ...`
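
The two /usr/lib64 checks requested above could be run from the live media roughly as follows. This is a sketch that assumes GNU find (for `-xtype`) and that the installed system is mounted somewhere such as `/mnt/sysroot`; that path and the function names are hypothetical:

```shell
# Both helpers take the mount point of the installed system as $1
# (hypothetical example: /mnt/sysroot).

# Print symlinks under <root>/usr/lib64 whose target does not resolve.
# Caveat: absolute link targets are resolved against the live system's
# root rather than <root>, so double-check any hit on an absolute link.
list_broken_lib_links() {
    find "$1/usr/lib64" -xtype l -print
}

# Print regular files under <root>/usr/lib64 that are zero bytes long.
list_empty_lib_files() {
    find "$1/usr/lib64" -type f -size 0 -print
}

# Usage:
#   list_broken_lib_links /mnt/sysroot
#   list_empty_lib_files /mnt/sysroot
```

If both commands print nothing, the symlinks resolve and no library file was truncated to zero length, which is the state comment 5 later reports.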

Comment 5 proletarius101 2020-12-26 16:20:02 UTC
Same here. I can boot into the system since I have an older kernel available. There are no dead symlinks in /usr/lib64, and all files there have non-zero size.

Comment 6 CoenFierst 2020-12-27 12:00:34 UTC
I get a similar error on a regular Fedora 33 Workstation update. This command:

sudo dnf distro-sync 2>&1 | tee update.txt

generates these lines:

(...)
  Running scriptlet: kernel-core-5.9.16-200.fc33.x86_64             23/23
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error
(...)

Apparently, the rest of the update continues normally. After a reboot these PCs work just fine. Perhaps it is caused by using fish as a shell.
I'm attaching the DNF output from two different PCs here; hope this helps.

Comment 7 CoenFierst 2020-12-27 12:02:54 UTC
Created attachment 1742258
Commenter 6 attachment 1

Comment 8 CoenFierst 2020-12-27 12:03:38 UTC
Created attachment 1742259
Commenter 6 attachment 2

Comment 9 Stefano Biagiotti 2020-12-29 18:38:27 UTC
Same error here on Fedora 33.

Excerpt from "dnf upgrade" output:

[...]
  Running scriptlet: kernel-core-5.9.16-200.fc33.x86_64       34/34
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error

[...]

Comment 10 Daniel Mach 2021-01-04 12:29:30 UTC
Based on comment#6 and comment#9, the problem might be in the kernel scriptlet. Reassigning.

Comment 11 Daniel Mach 2021-01-04 12:55:18 UTC
*** Bug 1911033 has been marked as a duplicate of this bug. ***

Comment 12 amatej 2021-01-04 13:37:35 UTC
I am not sure, but I think I also had this problem on rawhide (sudo was segfaulting and I couldn't log in). It was caused by broken symlinks in /etc/pam.d/, and I fixed it by selecting a profile with authselect: `authselect select minimal`.

It did happen during a dnf update (most likely some scriptlet) but it was almost a month ago and I no longer have the logs.
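
To check whether a system is in the state described in this comment, dangling symlinks in /etc/pam.d can be listed with plain POSIX shell tests. This is a sketch only; the function name `list_dangling_links` is made up for illustration, and whether this condition matches the original reporter's crash is not established in this bug:

```shell
# Print "link -> target" for every dangling symlink directly inside a
# directory. $1 defaults to /etc/pam.d.
list_dangling_links() {
    dir=${1:-/etc/pam.d}
    for entry in "$dir"/*; do
        # -L: entry is a symlink; ! -e: its target cannot be reached.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}

# Usage:
#   list_dangling_links /etc/pam.d
#   authselect current          # show the active profile, if any
#   authselect select minimal   # the fix reported in this comment
```

Printing the link target as well as the link name shows which file went missing, which may help identify the package or scriptlet responsible.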

Comment 13 Thorsten Leemhuis 2021-01-07 16:51:49 UTC
FWIW: see bug 1911038. It seems this is a problem with dracut, but I wonder if it's somehow related to rpm or the running kernel (it doesn't occur when dracut is called directly).

Bug 1911364 might be a duplicate.

Comment 14 David Bolding 2021-01-16 15:59:37 UTC
I'm also getting this on my Fedora 33 system. I think it's been happening on every kernel update since before Christmas. The system still boots; running authselect didn't do anything, and `dnf remove --duplicates` didn't find anything.

I'll attach the output of me trying to reinstall the kernel. (I'm also using RPMFusion, and that's failing a checksum; is that happening to anyone else with this problem? Could that be related?)

Comment 15 David Bolding 2021-01-16 16:00:57 UTC
Created attachment 1748100
Comment 14 attachement

Comment 16 Thorsten Leemhuis 2021-01-17 14:21:58 UTC
TWIMC: the "broken pipe" messages that some people mentioned here seem to be a bug in kexec-tools. See Bug 1911038 for details.

AFAICS they are unlikely to be causing the "core dumped" problems that led to the creation of this bug report.
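
For readers wondering how harmless "Broken pipe" diagnostics like the ones quoted in this bug can arise at all, here is one general mechanism. It is an illustration only and is not confirmed as the cause here (see Bug 1911038 for the actual analysis). When SIGPIPE is ignored, a writer such as `sort` is not killed silently when its reader exits early; its write fails instead and it reports the error. The function name `demo_broken_pipe` is hypothetical:

```shell
# Hypothetical demo: provoke a sort "Broken pipe" diagnostic on demand.
demo_broken_pipe() {
    (
        # Ignore SIGPIPE; the setting is inherited by the children below.
        trap '' PIPE
        # head exits after one line while sort still has far more output
        # than fits in the pipe, so a later write from sort fails.
        seq 1 200000 | LC_ALL=C sort | head -n 1
    )
}

# Usage:
#   demo_broken_pipe    # the result appears on stdout, the complaint on stderr
```

The pipeline still delivers its result to `head`, which is consistent with the reports above that the update continued and the systems booted normally afterwards.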