Bug 1723106 - Cronie runs at 100% CPU when running in a container
Summary: Cronie runs at 100% CPU when running in a container
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: cronie
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Marcel Plch
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-22 21:35 UTC by sedrubal
Modified: 2020-05-26 18:22 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-26 18:22:42 UTC
Type: Bug
Embargoed:



Description sedrubal 2019-06-22 21:35:21 UTC
Description of problem:
When cronie runs inside a Docker container, it uses 100% of CPU time on all threads.

Version-Release number of selected component (if applicable):
cronie 1.5.4

How reproducible:
Always.

Steps to Reproduce:
1. docker run --rm -it fedora:30 bash
2. dnf -y install cronie
3. (rm -rf /etc/cron*)
4. echo '* * * * * touch /tmp/test' > /root/testcron
5. crontab /root/testcron
6. crontab -l
7. crond (-x ext -n)
8. Wait until the test job runs
9. Outside the container, open htop.

Actual results:
You will see a crond instance running at 100% CPU. It keeps running even after you stop crond inside the container, and even when you use -n to prevent forking.

Expected results:
The job should be executed and crond should idle for one minute.

Additional info:
It also looks like /tmp/test never gets created. I'm not sure whether this is related to this problem or a separate issue.

Comment 1 Tomas Mraz 2019-06-24 06:42:49 UTC
Could you please strace the process to find out where it is stuck? Or if strace does not reveal anything interesting, try to attach to it with gdb and produce a backtrace?

Comment 2 sedrubal 2019-06-25 00:00:28 UTC
Yes, of course. crond seems to fork: one process is sleeping in `nanosleep` while the other one is consuming the CPU.

According to strace, it loops on:

```
close(142431952)                        = -1 EBADF (Bad file descriptor)
```

while the file descriptor number increases with each iteration.

I tried to attach to that process using GDB. The backtrace is:

```
#0  0x00007f9865eb7498 in __GI___close (fd=fd@entry=268692968) at ../sysdeps/unix/sysv/linux/close.c:27
#1  0x000055c8667189bc in child_process (e=e@entry=0x55c866d34fb0, jobenv=0x55c866d44070) at do_command.c:245
#2  0x000055c866719278 in do_command (e=0x55c866d34fb0, u=0x55c866d32a70) at do_command.c:74
#3  0x000055c866718261 in job_runqueue () at job.c:100
#4  0x000055c8667158da in main (argc=<optimized out>, argv=<optimized out>) at cron.c:478
```

GDB behaves a bit strangely, but as far as I understand the code, it tries to close a huge number of file descriptors. After a few seconds, fd in child_process is at 84590153 while fdmax is at 1073741816.

It may be related to SELinux, but I'm not sure...
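
For context, the backtrace points at the common pattern of a child process closing every possible descriptor up to getdtablesize() before exec. A minimal C sketch of that pattern (an illustration only, not the actual cronie code from do_command.c):

```
/* Illustration of the close-all-descriptors pattern implied by the
 * backtrace; not the actual cronie source. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fdmax = getdtablesize();   /* ~1073741816 inside this container */
    printf("fdmax = %d\n", fdmax);
    /* Close everything above stderr; with fdmax near 2^30 this loop
     * alone burns minutes of CPU, matching the 100% usage and the
     * flood of EBADF results seen in strace. */
    for (int fd = 3; fd < fdmax; fd++)
        close(fd);
    return 0;
}
```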

Comment 3 Tomas Mraz 2019-06-25 09:33:56 UTC
What does ulimit -n print outside and inside the container?

The fdmax value is clearly bogus. I could add some guards for it, but there are also other places that depend on getdtablesize() returning something reasonable.
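
One possible shape for such a guard (a sketch only, assuming that clamping fdmax is acceptable, and not necessarily what was later committed upstream):

```
/* Sketch of a guard against a bogus descriptor-table size.
 * SANE_FD_MAX is a hypothetical cap chosen for illustration. */
#include <sys/resource.h>
#include <unistd.h>

#define SANE_FD_MAX 4096

static int guarded_fdmax(void) {
    struct rlimit rl;
    int fdmax = getdtablesize();
    /* Trust the soft RLIMIT_NOFILE if it is finite and smaller. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY
        && rl.rlim_cur < (rlim_t)fdmax)
        fdmax = (int)rl.rlim_cur;
    /* Never iterate over an absurdly large table. */
    if (fdmax > SANE_FD_MAX)
        fdmax = SANE_FD_MAX;
    return fdmax;
}
```

Capping risks leaving descriptors above the cap open, so a more robust approach would be to close only the descriptors that are actually open (for example by walking /proc/self/fd), but even the simple clamp avoids the runaway loop.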

Comment 4 sedrubal 2019-07-08 07:50:36 UTC
- ulimit -n outside the container prints 1024 (on many different systems)
- ulimit -n inside the Docker container prints 1073741816 (also on many different systems; we did not configure any default ulimits for Docker, so this seems to be the default)
- ulimit -n inside a podman container started as a user prints 1024
- ulimit -n inside a podman container started as root prints 1048576

Starting the container with an ulimit of 1024 seems to solve the problem, but I think some guards would still help, because cronie seems to be the only program that has trouble with such a high fdmax value...
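
For reference, the workaround described above amounts to passing an explicit file-descriptor limit when starting the container, for example `docker run --ulimit nofile=1024:1024 --rm -it fedora:30 bash`.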

Comment 5 Tomas Mraz 2019-07-08 13:55:54 UTC
Fixed in upstream git.

Comment 6 Ben Cotton 2020-04-30 20:24:03 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 reaches end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Ben Cotton 2020-05-26 18:22:42 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

