Bug 1908005 - After I did a "dnf upgrade -y", and rebooted, I lost access to "sudo bash" core dumped and "su -l" core dumped
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1911033
Depends On:
Blocks:
Reported: 2020-12-15 16:33 UTC by David Marceau
Modified: 2021-01-18 13:26 UTC
CC List: 38 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Embargoed:


Attachments
dnf upgrade console output from a similar older drive onto the very same computer but without rebooting it yet. (310.33 KB, text/plain)
2020-12-15 16:33 UTC, David Marceau
Commenter 6 attachment 1 (5.33 KB, text/plain)
2020-12-27 12:02 UTC, CoenFierst
Commenter 6 attachment 2 (11.39 KB, text/plain)
2020-12-27 12:03 UTC, CoenFierst
Comment 14 attachement (4.16 KB, text/plain)
2021-01-16 16:00 UTC, David Bolding

Description David Marceau 2020-12-15 16:33:33 UTC
Created attachment 1739388
dnf upgrade console output from a similar older drive onto the very same computer but without rebooting it yet.

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. dnf upgrade -y
2. reboot
3. log in as non-root user
4. sudo bash

Actual results:
sudo bash core dumped
su -l core dumped

Expected results:
I expected it to bring up a root shell.

Additional info:
It's on Fedora Rawhide, installed on an internal SATA SK hynix S31 SSD.

The only odd thing I noticed during the dnf upgrade was an output error.

Comment 1 David Marceau 2020-12-15 16:34:11 UTC
  Running scriptlet: kernel-core-5.10.0-0.rc6.20201204git34816d20f173.92.fc34.x86_64                                                                             540/540 
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error

  Running scriptlet: nss-3.59.0-2.fc34.x86_64                                                                                                                    540/540 
  Running scriptlet: rpm-4.16.1-1.fc34.x86_64                                                                                                                    540/540

Comment 2 Panu Matilainen 2020-12-16 07:51:45 UTC
That's unlikely to be dnf's fault, instead some central component is broken on rawhide and the whole works blew up.

Comment 3 Daniel Mach 2020-12-21 08:06:42 UTC
This really doesn't look like a dnf issue.

I can think of the following root causes:
* other components broken (does rebooting the system help?)
* broken hardware: SSD, RAM (what do `smartctl --health /dev/sdX` or `dmesg` say?)
* broken file system

If your system boots, I recommend running `dnf remove --duplicates` to recover from a potentially inconsistent state.

Comment 4 Daniel Mach 2020-12-21 12:16:23 UTC
I just realized that you've rebooted already (it's mentioned in the summary).

Could you boot from live media, mount the system partition and check whether the symlinks in /usr/lib64 point to reasonable locations?
Also, all the files under /usr/lib64 should have non-zero size.

You could use dnf from the live media to repair the mounted file system by using `dnf --installroot=/path/to/mount ...`
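
The two checks above (dangling symlinks and zero-size files) could be automated roughly as follows. This is only a minimal sketch, assuming Python 3 is available on the live media; the mount point /mnt/sysroot and the function name scan are hypothetical and not taken from this report.

```python
#!/usr/bin/env python3
"""Report dangling symlinks and zero-size regular files under a directory."""
import os
import sys


def scan(top):
    """Return (dangling_symlinks, empty_files) found under top, both sorted."""
    dangling, empty = [], []
    for dirpath, dirnames, filenames in os.walk(top):
        # os.walk lists symlinks to directories in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # os.path.exists follows the link; False means the target is missing.
                if not os.path.exists(path):
                    dangling.append(path)
            elif os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(dangling), sorted(empty)


if __name__ == "__main__":
    # /mnt/sysroot is a hypothetical mount point; pass the real path as argv[1].
    top = sys.argv[1] if len(sys.argv) > 1 else "/mnt/sysroot/usr/lib64"
    dangling, empty = scan(top)
    for path in dangling:
        print("dangling symlink:", path)
    for path in empty:
        print("zero-size file:", path)
```

Two caveats: absolute symlinks inside the mounted tree are resolved against the live system's root rather than the mounted one, so they may be misreported, and some packages may legitimately ship empty files, so every hit is only a candidate for manual review.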

Comment 5 proletarius101 2020-12-26 16:20:02 UTC
Same here. I can boot into the system since I have an older kernel available. There are no dead symlinks in /usr/lib64, and all files there have non-zero size.

Comment 6 CoenFierst 2020-12-27 12:00:34 UTC
I get a similar error on a regular Fedora 33 Workstation update. This command:

sudo dnf distro-sync 2>&1 | tee update.txt

generates these lines:

(...)
  Running scriptlet: kernel-core-5.9.16-200.fc33.x86_64                   23/23 
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error
(...)

Apparently, the rest of the update continues normally. After a reboot these PCs work just fine. Perhaps it is caused by using fish as the shell.
I'm attaching the DNF output from two different PCs here; I hope this helps.
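
To see which scriptlet the "Broken pipe" messages follow in a captured log such as update.txt, something like the sketch below might help. It assumes each scriptlet line contains the word "scriptlet" before a colon, as in the excerpts quoted in this report, and that the messages belong to the most recent scriptlet line printed before them; the function name is hypothetical.

```python
import re
import sys

# Assumption: the scriptlet line has the word "scriptlet" before its first
# colon, followed by the package name (true for the excerpts in this report).
SCRIPTLET = re.compile(r"^\s*[^:]*scriptlet[^:]*:\s+(\S+)")


def broken_pipe_by_scriptlet(lines):
    """Map each package to the 'Broken pipe' lines printed after its scriptlet line."""
    hits = {}
    current = None
    for line in lines:
        match = SCRIPTLET.match(line)
        if match:
            current = match.group(1)
        elif "Broken pipe" in line and current is not None:
            hits.setdefault(current, []).append(line.strip())
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py update.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for package, messages in broken_pipe_by_scriptlet(fh).items():
            print(package)
            for message in messages:
                print("   ", message)
```

The grouping is positional only, so it shows where the messages appear in the log, not which process actually wrote them.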

Comment 7 CoenFierst 2020-12-27 12:02:54 UTC
Created attachment 1742258
Commenter 6 attachment 1

Comment 8 CoenFierst 2020-12-27 12:03:38 UTC
Created attachment 1742259
Commenter 6 attachment 2

Comment 9 Stefano Biagiotti 2020-12-29 18:38:27 UTC
Same error here on Fedora 33.

Excerpt from "dnf upgrade" output:

[...]
  Running scriptlet: kernel-core-5.9.16-200.fc33.x86_64                   34/34 
sort: fflush failed: 'standard output': Broken pipe
sort: write error

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
sort: write failed: 'standard output': Broken pipe
sort: write error

[...]

Comment 10 Daniel Mach 2021-01-04 12:29:30 UTC
Based on comment#6 and comment#9, the problem might be in the kernel scriptlet. Reassigning.

Comment 11 Daniel Mach 2021-01-04 12:55:18 UTC
*** Bug 1911033 has been marked as a duplicate of this bug. ***

Comment 12 amatej 2021-01-04 13:37:35 UTC
I am not sure, but I think I also had this problem on rawhide (sudo was segfaulting and I couldn't log in). It was caused by broken symlinks in /etc/pam.d/, and I fixed it by selecting a profile with authselect: authselect select minimal.

It did happen during a dnf update (most likely some scriptlet) but it was almost a month ago and I no longer have the logs.
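
For anyone who wants to check the same thing, here is a rough sketch that lists every symlink directly inside /etc/pam.d, where it points, and whether the target exists. It assumes Python 3 is usable on the affected system (or from a root shell on an older kernel); the function name link_report is hypothetical.

```python
import os


def link_report(directory):
    """Return (name, target, resolves) for each symlink directly inside directory."""
    report = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_symlink():
            target = os.readlink(entry.path)
            # os.path.exists follows the link, so False means the link is broken.
            report.append((entry.name, target, os.path.exists(entry.path)))
    return report


if __name__ == "__main__":
    directory = "/etc/pam.d"
    if os.path.isdir(directory):
        for name, target, resolves in link_report(directory):
            status = "ok    " if resolves else "BROKEN"
            print(status, name, "->", target)
```

Any line marked BROKEN is a candidate for the kind of breakage described above; regular files in the directory are not listed.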

Comment 13 Thorsten Leemhuis 2021-01-07 16:51:49 UTC
FWIW: See bug 1911038, it seems this is a problem with dracut, but I wonder if it's related to rpm or the running kernel somehow (it doesn't occur when dracut is called directly)

Bug 1911364 might be a duplicate.

Comment 14 David Bolding 2021-01-16 15:59:37 UTC
I'm also getting this on my Fedora 33 system.  I think it's been happening on every kernel update since before Christmas.  The system still boots; running authselect didn't do anything, and 'dnf remove --duplicates' didn't find anything.

I'll attach the output of me trying to reinstall the kernel.  (I'm also using RPMFusion, and that's failing a checksum; is that happening to anyone else with this problem?  Could that be related?)

Comment 15 David Bolding 2021-01-16 16:00:57 UTC
Created attachment 1748100
Comment 14 attachement

Comment 16 Thorsten Leemhuis 2021-01-17 14:21:58 UTC
TWIMC: the "broken pipe" messages that some people mentioned here seem to be a bug in kexec-tools. See Bug 1911038 for details.

AFAICS they are unlikely to be causing the "core dumped" problems that led to the creation of this bug report.

