Bug 2188811
| Summary: | limits on koji builds of chromium | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Than Ngo <than> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 38 | CC: | fedoraproject, filbranden, kevin, lnykryn, msekleta, ryncsn, systemd-maint, yuwatana, zbyszek |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | aarch64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-09 12:17:25 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
> First, since a long time ago, the koji.service file we are using has:
> TasksMax=infinity
> [but] systemctl show systemd-nspawn@0b3f01a2a8e345a389b30c477812c471
> TasksMax=16384
> Is there yet another layer here that has another limit?

The limit is hierarchical, and it can be set at any level. To check, start some process in mock and recursively check its cgroup limits:

    mock --shell 'sleep 666'

and in another window:

    pid=$(ps aux|grep 'sleep [6]66' | awk '{print $2}'); echo $pid
    cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
    while grep -H . /sys/fs/cgroup/$cgroup/pids.max 2>/dev/null; do cgroup=$(dirname $cgroup); done

The limit for the individual machine-*.scope is completely independent of the limit for kojid.service. (It can be overridden, e.g. by creating a drop-in:

    # /etc/systemd/system/machine-.scope.d/80-tasks.conf
    [Scope]
    TasksMax=unlimited
)

> TasksMax=16384

That is strange. Why would the build system need a number of threads+processes that is so high? Maybe it's leaking threads?

Anyway, I don't think there's a bug in systemd. It seems to be some missing configuration in kojid and the way it starts nspawn, and/or some gross inefficiency in how the chromium build works.

After looking into this, I see that systemd-nspawn doesn't have a config option for TasksMax. If there is interest, we could add a command-line switch and a config option to make this configurable, instead of the drop-in I suggested above. koji and/or mock could even make use of it by default.

(In reply to Zbigniew Jędrzejewski-Szmek from comment #1)
> The limit is hierarchical, and it can be set at any level. To check, start some process in mock and recursively check its cgroup limits:
>     mock --shell 'sleep 666'
> and in another window:
>     pid=$(ps aux|grep 'sleep [6]66' | awk '{print $2}'); echo $pid
>     cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
>     while grep -H . /sys/fs/cgroup/$cgroup/pids.max 2>/dev/null; do cgroup=$(dirname $cgroup); done

    # pid=$(ps aux|grep 'sleep [6]66' | awk '{print $2}'); echo $pid
    2290608
    # cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
    /user.slice/user-0.slice/session-275.scope
    # while grep -H . /sys/fs/cgroup/$cgroup/pids.max 2>/dev/null; do cgroup=$(dirname $cgroup); done
    /sys/fs/cgroup//user.slice/user-0.slice/session-275.scope/pids.max:max
    /sys/fs/cgroup//user.slice/user-0.slice/pids.max:678967
    /sys/fs/cgroup//user.slice/pids.max:max

> The limit for the individual machine-*.scope is completely independent of the limit for kojid.service. (It can be overridden, e.g. by creating a drop-in:
>     # /etc/systemd/system/machine-.scope.d/80-tasks.conf
>     [Scope]
>     TasksMax=unlimited
> )

ok. I have put that in place on the 'heavybuilder' aarch64 builders. Can you try some builds with higher concurrency and see if it helps any?

> > TasksMax=16384
> That is strange. Why would the build system need a number of threads+processes that is so high? Maybe it's leaking threads?

Some builds (like chromium) are crazy. ;)

> Anyway, I don't think there's a bug in systemd. It seems to be some missing configuration in kojid and the way it starts nspawn, and/or some gross inefficiency in how the chromium build works.

Or all the above. :)

> After looking into this, I see that systemd-nspawn doesn't have a config option for TasksMax. If there is interest, we could add a command-line switch and a config option to make this configurable, instead of the drop-in I suggested above. koji and/or mock could even make use of it by default.

Sounds good, let's see if that drop-in works...
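As an editorial aside, not part of the original exchange: one quick way to confirm whether that machine-.scope drop-in actually takes effect is to ask systemd for the limit it applied to a freshly started build scope. A minimal sketch; machine-<id>.scope is a placeholder for whatever name the listing prints:

    # re-read unit files so the manager sees the new drop-in
    systemctl daemon-reload

    # list the scopes of the currently running build containers
    systemctl list-units --type=scope 'machine-*.scope'

    # query the limit systemd applied to one of them (placeholder unit name)
    systemctl show -p TasksMax machine-<id>.scope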
Hi Kevin, should I try to build chromium now?

Yes please. Try some higher concurrency builds and see if they work better now?

Hi Kevin,
it still breaks with errors:
[headless_shell:36851/39818] ACTION //third_party/devtools-frontend/src/front_end/models/formatter:devtools_entrypoint-legacy-bundle(//build/toolchain/linux/unbundle:default)
FAILED: gen/third_party/devtools-frontend/src/front_end/models/formatter/formatter-legacy.js
/usr/bin/python3 ../../third_party/node/node.py ../../third_party/devtools-frontend/src/node_modules/rollup/dist/bin/rollup --silent --config ../../third_party/devtools-frontend/src/scripts/build/rollup.config.js --input gen/third_party/devtools-frontend/src/front_end/models/formatter/formatter-legacy.prebundle.js --file gen/third_party/devtools-frontend/src/front_end/models/formatter/formatter-legacy.js --configDCHECK
Traceback (most recent call last):
File "/builddir/build/BUILD/chromium-114.0.5735.45/out/Headless/../../third_party/node/node.py", line 39, in <module>
RunNode(sys.argv[1:])
File "/builddir/build/BUILD/chromium-114.0.5735.45/out/Headless/../../third_party/node/node.py", line 34, in RunNode
raise RuntimeError('Command \'%s\' failed\n%s' % (' '.join(cmd), err))
RuntimeError: Command '/builddir/build/BUILD/chromium-114.0.5735.45/out/Headless/../../third_party/node/linux/node-linux-x64/bin/node ../../third_party/devtools-frontend/src/node_modules/rollup/dist/bin/rollup --silent --config ../../third_party/devtools-frontend/src/scripts/build/rollup.config.js --input gen/third_party/devtools-frontend/src/front_end/models/formatter/formatter-legacy.prebundle.js --file gen/third_party/devtools-frontend/src/front_end/models/formatter/formatter-legacy.js --configDCHECK' failed
Error: spawn /usr/bin/node-20 EAGAIN
at Process.ChildProcess._handle.onexit (node:internal/child_process:285:19)
at onErrorNT (node:internal/child_process:483:16)
at processTicksAndRejections (node:internal/process/task_queues:82:21)
For more info:
https://kojipkgs.fedoraproject.org//work/tasks/6098/101556098/build.log
https://kojipkgs.fedoraproject.org//work/tasks/6098/101556098/hw_info.log
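An editorial aside, not from the thread: if the node EAGAIN really comes from the pids controller, the rejection should be visible on the builder itself. A rough sketch assuming the cgroup v2 layout shown elsewhere in this report; machine-<id> is a placeholder for the actual container id:

    # current task count and limit of the build container's scope
    cat /sys/fs/cgroup/machine.slice/machine-<id>.scope/pids.current
    cat /sys/fs/cgroup/machine.slice/machine-<id>.scope/pids.max

    # cgroup v2 keeps a counter of forks rejected because the limit was hit
    cat /sys/fs/cgroup/machine.slice/machine-<id>.scope/pids.events

    # the kernel also logs each rejection
    dmesg | grep 'fork rejected by pids controller'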
    [Fri May 26 18:13:44 2023] cgroup: fork rejected by pids controller in /machine.slice/machine-165afb17c920416bbd1f7ca2d7360ec6.scope/payload

    # cat /etc/systemd/system/machine-.scope.d/80-tasks.conf
    [Scope]
    TasksMax=unlimited

The script I pasted above was not reliable and could get the wrong pid. Mock containers are under machine.slice, but comment #2 shows user.slice. Please run the 'while … grep' loop for some process in the mock container.

On my machine I get:

    $ pid=$(ps aux|grep '0 sleep [6]66' | awk '{print $2}'); echo $pid
    1885224
    $ cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
    /machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/payload
    $ while grep -H . /sys/fs/cgroup/$cgroup/pids.max; do cgroup=$(dirname $cgroup); done
    /sys/fs/cgroup//machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/payload/pids.max:max
    /sys/fs/cgroup//machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/pids.max:max
    /sys/fs/cgroup//machine.slice/pids.max:max
    grep: /sys/fs/cgroup///pids.max: No such file or directory

Please note that the limit is hierarchical, i.e. if there's 'max' for machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope but e.g. '16384' for machine.slice, then '16384' is the effective limit. It's not enough to just check one level.

First we need to figure out if there's really a limit in place on the mock machines. If not, then it must be something else.

(In reply to Zbigniew Jędrzejewski-Szmek from comment #7)
> The script I pasted above was not reliable and could get the wrong pid. Mock containers are under machine.slice, but comment #2 shows user.slice. Please run the 'while … grep' loop for some process in the mock container.
>
> On my machine I get:
>     $ pid=$(ps aux|grep '0 sleep [6]66' | awk '{print $2}'); echo $pid
>     1885224
>     $ cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
>     /machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/payload
>     $ while grep -H . /sys/fs/cgroup/$cgroup/pids.max; do cgroup=$(dirname $cgroup); done
>     /sys/fs/cgroup//machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/payload/pids.max:max
>     /sys/fs/cgroup//machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope/pids.max:max
>     /sys/fs/cgroup//machine.slice/pids.max:max
>     grep: /sys/fs/cgroup///pids.max: No such file or directory
>
> Please note that the limit is hierarchical, i.e. if there's 'max' for machine.slice/machine-27ea351882a84842a221381ca2ebdecf.scope but e.g. '16384' for machine.slice, then '16384' is the effective limit. It's not enough to just check one level.
>
> First we need to figure out if there's really a limit in place on the mock machines. If not, then it must be something else.

Hi Kevin, could you please check? Thanks!

    [root@buildhw-a64-19 ~][PROD-IAD2]# pid=$(ps aux|grep '0 sleep [6]66' | awk '{print $2}'); echo $pid
    1267608
    [root@buildhw-a64-19 ~][PROD-IAD2]# cgroup=$(cat /proc/$pid/cgroup | grep '^0:' | cut -d: -f3); echo $cgroup
    /machine.slice/machine-7dd27bc731de4e81ab52b582b14b3ac8.scope/payload
    [root@buildhw-a64-19 ~][PROD-IAD2]# while grep -H . /sys/fs/cgroup/$cgroup/pids.max; do cgroup=$(dirname $cgroup);done
    /sys/fs/cgroup//machine.slice/machine-7dd27bc731de4e81ab52b582b14b3ac8.scope/payload/pids.max:max
    /sys/fs/cgroup//machine.slice/machine-7dd27bc731de4e81ab52b582b14b3ac8.scope/pids.max:16384
    /sys/fs/cgroup//machine.slice/pids.max:max
    grep: /sys/fs/cgroup///pids.max: No such file or directory
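Editorial note, not from the thread: when a drop-in value is silently ignored like this, the journal is usually the quickest place to look, since systemd normally logs a warning when it cannot parse an assignment (the exact message text varies between versions). A sketch:

    # check the drop-in actually contains what you think it does
    cat /etc/systemd/system/machine-.scope.d/80-tasks.conf

    # look for parse warnings mentioning the setting or the drop-in file
    journalctl -b | grep -iE 'tasksmax|80-tasks\.conf'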
Mea culpa, mea maxima culpa. Please s/unlimited/infinity/ in /etc/systemd/system/machine-.scope.d/80-tasks.conf.

(In reply to Zbigniew Jędrzejewski-Szmek from comment #10)
> Mea culpa, mea maxima culpa.
>
> Please s/unlimited/infinity/ in /etc/systemd/system/machine-.scope.d/80-tasks.conf.

Hi Kevin, could you apply the above change so I can give it a try? Thank you!

Sorry for the delay, I was at Flock and then buried. ;( Done now!

Hi, with the above change from comment 10 I successfully built chromium on aarch64 with all CPUs. https://koji.fedoraproject.org/koji/taskinfo?taskID=104577778 Thank you so much!!!

Great, let's close this then.

Awesome. Thanks!
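For readers skimming to the resolution: the working setup described in the comments boils down to one drop-in. This is only a recap of what the thread arrives at, with explanatory comments added:

    # /etc/systemd/system/machine-.scope.d/80-tasks.conf
    # Applies to every machine-*.scope, i.e. to each mock/systemd-nspawn
    # build container registered on the builder.
    [Scope]
    TasksMax=infinity

After writing the file, run systemctl daemon-reload; scopes created for subsequent builds then start without the 16384 task cap.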
Description:

We are running into some annoying limits on koji builds of chromium.

First, since a long time ago, the koji.service file we are using has:

    TasksMax=infinity

But chromium was still failing, seemingly hitting a task limit:

    ninja: fatal: posix_spawn: Resource temporarily unavailable

in the build, and:

    kernel: cgroup: fork rejected by pids controller in /machine.slice/machine-7d12b2e6dcfb4230b04d2c2c0b499171.scope/payload

on the builder.

Investigation and some help from folks in the #devel room (many thanks glb!) showed that the systemd-nspawn container mock started has:

    systemctl show systemd-nspawn@0b3f01a2a8e345a389b30c477812c471
    TasksMax=16384

So, I put in place a /etc/systemd/system/systemd-nspawn@.service.d/override.conf with:

    [Service]
    TasksMax=infinity

and that seemed to be used for the mock systemd-nspawn containers. However, the builds with lots of cpus are now failing later with:

    Error: spawn /usr/bin/node-18 EAGAIN
        at Process.ChildProcess._handle.onexit (node:internal/child_process:283:19)
        at onErrorNT (node:internal/child_process:476:16)
        at processTicksAndRejections (node:internal/process/task_queues:82:21)
    [!] Error: unfinished hook action(s) on exit:

Is there yet another layer here that has another limit? Is there anything here I can set that says "infinity all the way down"? Assistance welcome. I can file a systemd bug, but I am not sure this is a bug so much as a lack of documentation.

Reproducible: Always

Steps to Reproduce:
build chromium with -j${RPM_BUILD_NCPUS} on aarch64 (fedora or epel)

Actual Results:
the builds with lots of cpus fail with:

    Error: spawn /usr/bin/node-18 EAGAIN
        at Process.ChildProcess._handle.onexit (node:internal/child_process:283:19)
        at onErrorNT (node:internal/child_process:476:16)
        at processTicksAndRejections (node:internal/process/task_queues:82:21)
    [!] Error: unfinished hook action(s) on exit:

Expected Results:
chromium should build fine

This issue appears only on aarch64; it works fine on x86_64.
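A closing editorial sketch, not part of the original report: the question of "infinity all the way down" can at least be verified layer by layer. The unit and slice names are the ones appearing in this report (the comments use kojid.service where the description says koji.service), and DefaultTasksMax is the manager-wide fallback that units inherit unless set explicitly:

    # manager-wide default
    systemctl show -p DefaultTasksMax

    # the koji daemon
    systemctl show -p TasksMax kojid.service

    # the slice all build containers run under
    systemctl show -p TasksMax machine.slice

    # the per-container scope; list real names with: systemctl list-units 'machine-*.scope'
    systemctl show -p TasksMax machine-<id>.scope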