Bug 1754807

Summary: Mock with systemd-nspawn completely hangs system on F31
Product: [Fedora] Fedora Reporter: Artem <ego.cordatus>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: high    
Version: 31 CC: airlied, bskeggs, dan.cermak, elxreno, hdegoede, ichavero, itamar, jarodwilson, jdisnard, jeremy, jglisse, jkeating, john.j5live, jonathan, josef, kernel-maint, linville, lnykryn, masami256, mchehab, mebrown, mjg59, msekleta, msuchy, philip.wyett, praiskup, sanjay.ankur, s, steved, systemd-maint, vascom2, vitaly, williams, zbyszek
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-24 19:21:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1767097    
Attachments:
Description Flags
mock journalctl log none
mock journalctl log #2 none

Description Artem 2019-09-24 06:33:58 UTC
Description of problem:
After upgrading from F30 to F31, mock periodically hangs the system completely, and the only way to recover is to reboot.

Version-Release number of selected component (if applicable):
mock-1.4.19-1.fc31

How reproducible:
Start a mock build (it does not happen every time; roughly 1 in 30 builds).

Actual results:
System hangs.

Expected results:
Mock does not hang the system.

Additional info:
At least three people (including me) have confirmed this issue on Fedora 31.

Comment 1 Vasiliy Glazov 2019-09-24 06:36:03 UTC
I can confirm this.
The problem is present with all mock configs: rawhide, f31, epel7...

Comment 2 Artem 2019-09-24 21:01:48 UTC
I thought the recent systemd update might fix this, but unfortunately it did not. It happened again. This time I noticed that it hangs on:

- Enable HW Info plugin... ☠️

Comment 3 Miroslav Suchý 2019-09-25 09:30:48 UTC
Hmm, I tried to reproduce this by running
  while true; do mock init; done
and did not hit any error.
But it happened to my colleague (praiskup), and in his case it seems to happen when you run mock after you have run podman.
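
Because the hang is intermittent and ends in a hard reset, a bounded loop that records each attempt on disk can make it easier to say how many runs succeeded before the freeze. The following is only a sketch: the helper name repeat_cmd, the REPRO_LOG variable and the default log path are made up for illustration and are not part of mock.

```shell
# Run a command up to N times, stopping at the first failure.
# repeat_cmd, REPRO_LOG and the log path are hypothetical names.
repeat_cmd() {
    max=$1; shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Record the attempt and flush it to disk so the entry
        # survives a hard reset if the system freezes mid-run.
        echo "attempt $i: $*" >> "${REPRO_LOG:-/tmp/mock-repro.log}"
        sync
        "$@" || { echo "failed on attempt $i"; return 1; }
    done
    echo "completed $i attempts"
}

# Example (not executed here):
#   repeat_cmd 50 mock -r fedora-rawhide-x86_64 --init
```

After a freeze and reboot, the last line of the log shows which attempt was running when the system stopped responding.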

Comment 4 Vasiliy Glazov 2019-09-25 09:38:59 UTC
I do not use podman, and it is not installed on my system.

Comment 5 Artem 2019-09-25 12:02:50 UTC
I also noticed that, every time it hung, I had been running mock with these options:

$ mock -n --offline <foo>

Comment 6 Pavel Raiskup 2019-09-25 12:25:01 UTC
In my case, my box doesn't hang, but I cannot run podman anymore.  I suspect
there's something broken with cgroups v2 and systemd-nspawn (which is used by
mock).  Have you tried the --old-chroot option (that turns systemd-nspawn off)?

Anyway, as I'm not able to reproduce this, it would help to have the --verbose
or --verbose + --trace output attached.

Comment 7 Artem 2019-09-25 12:31:02 UTC
Yep, yesterday I started trying with --old-chroot and will keep watching...

Another bug I am struggling with because of cgroups v2 is this:
https://bugzilla.redhat.com/show_bug.cgi?id=1751120

Comment 8 Zbigniew Jędrzejewski-Szmek 2019-10-14 07:55:54 UTC
What does "hangs" mean? A kernel crash, swap frenzy, something else? We need some details.
Please start with an explanation of what "hangs" means and provide the logs (journalctl -b -1) from
around that time.

Comment 9 Artem 2019-10-14 08:23:00 UTC
Created attachment 1625512 [details]
mock journalctl log

> What does "hangs" mean?

When a mock build starts, almost right from the start it silently stops and the system becomes completely unresponsive. There is no I/O activity, and I can't switch to tty3-7 from graphical mode. I literally can't do anything; only pressing the Reset button helps.

* Attached journal.

Note: with --old-chroot there are no hangs. I did tons of builds with it and it never hung.

Comment 10 Pavel Raiskup 2019-10-14 08:47:47 UTC
I suspect this could be a duplicate of bug 1756972, or (unlikely) some glitch caused by
the bug fixed by [1], or it could be related to the cgroups v2 bug [2].

Please try mock from 'dnf copr enable praiskup/mock-fixes' (v1.4.20-1.git.8.2feb615).
If the problem persists, it is likely [2] or even something else.

[1] https://github.com/rpm-software-management/mock/commit/c4eccaed8b41dedf11cc90a94481bcefc4ead2dc
[2] https://github.com/rpm-software-management/mock/issues/374

Comment 11 Miroslav Suchý 2019-10-15 09:32:11 UTC

*** This bug has been marked as a duplicate of bug 1756972 ***

Comment 12 Artem 2019-11-05 17:50:31 UTC
After this update
https://bodhi.fedoraproject.org/updates/FEDORA-2019-755583cbdf

and after ~60 successful builds, mock hung the system again. Another person has confirmed this as well.

Comment 13 Artem 2019-11-05 17:59:50 UTC
Created attachment 1633037 [details]
mock journalctl log #2

Comment 14 Pavel Raiskup 2019-11-05 20:02:50 UTC
Can you please provide more info?

- full mock configuration
- fedora version is 31?
- full mock command-line when it hanged
- what have you built when mock hanged the system?
- is kernel dead, does it react on sysrq?
- can you try with disabled swap?
- your / filesystem?
- are you using tmpfs plugin?

If this is caused by the fact that mock (or anything below) eats too
much RAM, you need to debug _what_ causes that.  It could be anything.

Comment 15 Artem 2019-11-05 20:21:48 UTC
> full mock configuration:
- It is default, except:
  config_opts['rpmbuild_networking'] = True

> fedora version is 31?
- F31. It never happened before; it started right after the upgrade to F31.

> full mock command-line when it hanged
- Command from my attached log:
  /usr/libexec/mock/mock -n -r fedora-rawhide-x86_64 --rebuild --sources /home/tim/rpmbuild/SOURCES/ --spec rust-maildir.spec

> is kernel dead, does it react on sysrq?
- sysrq was disabled. I can test this the next time it happens.

> your / filesystem?
- ext4. Enough free space.

> are you using tmpfs plugin?
- No. But if I remember correctly, I tried with tmpfs and it still hung sometimes.


> If this is caused by the fact that mock (or anything below) eats too
> much RAM, you need to debug _what_ causes that.  It could be anything.

No, I assure you. It can also happen with a tiny project that does not require a lot of RAM.

---

Note: keep in mind that with the --old-chroot workaround everything is fine.

Comment 16 Pavel Raiskup 2019-11-06 08:20:12 UTC
I'm afraid we (mock maintainers) cannot help here.  If this is about the
F30 -> F31 move, it is unlikely to be caused by mock.  We don't have F31
specific patches.  From mock's POV I'd have to close this as INSUFFICIENT_DATA.

Could you try with an up-to-date systemd on F30 to confirm that this
really doesn't happen there?

I'm switching this against systemd, which is more likely to have issues
with cgroups v2.

Comment 17 Zbigniew Jędrzejewski-Szmek 2019-11-06 10:08:51 UTC
> When a mock build starts, almost right from the start it silently stops and the system becomes completely unresponsive. There is no I/O activity, and I can't switch to tty3-7 from graphical mode. I literally can't do anything; only pressing the Reset button helps.

That sounds more like a kernel bug. Or it could simply be a hardware issue,
e.g. bad memory, that just happens to be triggered in a specific scenario.

Right now there simply isn't enough information to figure out what is going on
here.

Comment 18 Artem 2019-11-06 10:25:58 UTC
> Or it could simply be a hardware issue, e.g. bad memory, that just happens to be triggered in a specific scenario.

There have been literally zero issues with this hardware for 2+ years. Only mock (maybe not mock itself but something related to mock/nspawn) hangs the system. Three more people have exactly the same issue. They have not bothered to report it here on RHBZ, but they write about it in the group chat every day.

> is kernel dead, does it react on sysrq?

- Update: the kernel is dead; sysrq does not respond when this happens.

Comment 19 Zbigniew Jędrzejewski-Szmek 2019-11-06 10:30:02 UTC
It would be good to connect a serial console or netconsole to capture some debug messages when this happens.
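
For anyone setting up netconsole for this, the snippet below assembles the module parameter and prints the commands to run. It is a sketch only: the addresses, interface name and MAC are hypothetical placeholders, and the parameter layout (src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac, with 6665 and 6666 as the usual default ports) follows my reading of the kernel's netconsole documentation.

```shell
# Build the netconsole= module parameter from host-specific values.
# Layout: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    src_ip=$1; dev=$2; tgt_ip=$3; tgt_mac=$4
    echo "netconsole=6665@${src_ip}/${dev},6666@${tgt_ip}/${tgt_mac}"
}

# Hypothetical addresses; replace them with your own.
param=$(netconsole_param 192.168.1.10 eth0 192.168.1.20 aa:bb:cc:dd:ee:ff)

# Print the commands instead of running them, since they need root.
echo "On the affected host (as root):"
echo "  sysctl kernel.sysrq=1"
echo "  modprobe netconsole $param"
echo "On the receiving host, listen for UDP on port 6666, e.g.:"
echo "  socat udp-recv:6666 -"
```

Enabling sysrq beforehand matters because it was disabled by default on the reporter's system, so an unanswered sysrq only proves something if the feature was switched on before the hang.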

Comment 20 Christian Dersch 2019-11-06 19:37:37 UTC
I have the same issue here, up to date Fedora 31. Hang happens from time to time, when it happens: Always when systemd gets installed into the chroot.

Comment 21 Pavel Raiskup 2019-11-07 06:26:28 UTC
> when it happens: Always when systemd gets installed into the chroot.

Right after package install?  Or when executing some RPM scriptlet?
I tried to rebuild package with 'BuildRequires: systemd' over night
with about 320 attempts, and no problem appeared.

What systemd is installed inside chroot (what chroot you build against)?

Do your affected boxes have anything in common?

Comment 22 Christian Dersch 2019-11-07 08:15:29 UTC
It happened when building siril for rawhide using fedpkg mockbuild, so fedora-rawhide-x86_64 has been used. The freeze happens in ~1 of 10 build attempts for me. The point of freeze:

  Running scriptlet: systemd-243-4.gitef67743.fc32.x86_64                                                                                            343/373 
  Installing       : systemd-243-4.gitef67743.fc32.x86_64                                                                                            343/373

Version of mock packages:
mock-1.4.21-1.fc31.noarch
mock-core-configs-31.7-1.fc31.noarch

Kernel: 5.3.8-300.fc31.x86_64 x86_64

Host systemd: systemd-243-4.gitef67743.fc31.x86_64

My Fedora installation is Fedora 31, but it is not a fresh installation; I upgraded it from 30 some weeks ago.

Comment 23 Christian Dersch 2019-11-07 09:14:10 UTC
In addition some information about my box:

# inxi -SCI
System:    Host: r2d2 Kernel: 5.3.8-300.fc31.x86_64 x86_64 bits: 64 Console: tty 2 Distro: Fedora release 31 (Thirty One) 
Machine:   Type: Desktop System: MSI product: MS-7798 v: 1.0 serial: N/A 
           Mobo: MSI model: B75MA-P45 (MS-7798) v: 1.0 serial: N/A BIOS: American Megatrends v: 1.3 date: 07/30/2012 
CPU:       Topology: Quad Core model: Intel Core i7-2600K bits: 64 type: MT MCP L2 cache: 8192 KiB 
           Speed: 1596 MHz min/max: 1600/3800 MHz Core speeds (MHz): 1: 1596 2: 1596 3: 1597 4: 1596 5: 1596 6: 1596 7: 1596 
           8: 1596 
Info:      Processes: 214 Uptime: 1h 09m Memory: 15.31 GiB used: 1.57 GiB (10.2%) Shell: bash inxi: 3.0.36

Comment 24 Pavel Raiskup 2019-11-11 13:48:03 UTC
There's a kernel trace in bug 1767097.

Comment 25 Dan Čermák 2019-11-12 22:49:15 UTC
I have observed these hangs too, but they didn't appear to be related to the systemd rpm. Instead they mostly occurred during the launch of the mock root, but sometimes also later on.

Comment 26 Vasiliy Glazov 2019-11-15 09:27:46 UTC
The problem still happens from time to time on my system.

System:    Host: vascom Kernel: 5.3.11-300.fc31.x86_64 x86_64 bits: 64 Desktop: KDE Plasma 5.16.5 
           Distro: Fedora release 31 (Thirty One) 
CPU:       Topology: Quad Core model: Intel Core i7-4770 bits: 64 type: MT MCP L2 cache: 8192 KiB

Comment 27 Michael 2019-11-16 10:25:19 UTC
I have the same problem.

System:    Host: desktop.local Kernel: 5.3.11-300.fc31.x86_64 x86_64 bits: 64 
           Desktop: KDE Plasma 5.16.5 Distro: Fedora release 31 (Thirty One) 
CPU:       Topology: Quad Core model: AMD Phenom II X4 B40 bits: 64 type: MCP L2 cache: 2048 KiB

But if you add systemd.unified_cgroup_hierarchy=0 to the kernel parameters, the problem disappears (at least for me).

Comment 28 Vitaly 2019-11-16 10:47:16 UTC
systemd.unified_cgroup_hierarchy=0 can be used as a workaround until this issue is fixed upstream.
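
To check which cgroup hierarchy a host is currently using before and after applying this parameter, something like the following may help. It is a minimal sketch: the function name cgroup_mode is made up, and it assumes the usual convention that /sys/fs/cgroup is a cgroup2fs mount on the unified (v2) hierarchy and a tmpfs mount on the legacy or hybrid (v1) layouts.

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
# cgroup_mode is a hypothetical helper name.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (v2)" ;;
        tmpfs)     echo "legacy or hybrid (v1)" ;;
        *)         echo "unknown" ;;
    esac
}

fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "cgroup hierarchy: $(cgroup_mode "$fstype")"

# One way to add the parameter persistently on Fedora (requires root,
# takes effect after a reboot; left commented out on purpose):
#   sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
```

If the check still reports the unified hierarchy after rebooting, the parameter was most likely not applied to the running kernel's command line.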

Comment 29 Justin M. Forbes 2020-03-03 16:21:45 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

Comment 30 Artem 2020-03-04 11:32:27 UTC
> Fedora 31 has now been rebased to 5.5.7-200.fc31.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

I just tried the latest 5.5.7 on F31 and unfortunately the problem still persists.

> If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

I'll upgrade to F32 soon and will test this for sure.

Comment 31 Artem 2020-03-21 16:37:26 UTC
Still experiencing this issue on F32.

- kernel version: 5.6.0-0.rc5.git0.2.fc32
- mock version: mock-2.1-1.fc32

Another person said that after reinstalling Fedora completely he no longer has this issue. But I am not sure whether we can consider this a fix.

Comment 32 Ben Cotton 2020-11-03 17:17:27 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 33 Ben Cotton 2020-11-24 19:21:06 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.