Bug 1112556

Summary: systemctl segfaults when there is too many journal files (more than 240)
Product: [Fedora] Fedora Reporter: Michael S. <misc>
Component: systemdAssignee: systemd-maint
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20CC: johannbg, lnykryn, misc, msekleta, s, systemd-maint, vpavlin, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-06-29 21:17:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michael S. 2014-06-24 08:09:59 UTC
Description of problem:
[root@liliana ovirt_puppet]# LC_ALL=C systemctl status   NetworkManager
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled)
   Active: active (running) since Tue 2014-06-24 09:26:57 CEST; 32min ago
 Main PID: 20474 (NetworkManager)
   CGroup: /system.slice/NetworkManager.service
           |-20474 /usr/sbin/NetworkManager --no-daemon
           |-20578 /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-wlan0.pid -lf /var/lib/NetworkManager/dhclient-5a87d2ee-4a84-4dab...
           |-21138 /usr/libexec/nm-openvpn-service
           |-21157 /usr/sbin/openvpn --remote ovpn-ams2.redhat.com --nobind --dev tun --proto tcp-client --port 443 --cipher AES-256-CBC --auth-nocache --re...
           `-21170 /usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts --bind-interfaces --pid-file=/var/run/NetworkManager/dnsmasq.pid --listen-a...

Jun 24 09:40:30 liliana.example.org dnsmasq[21170]: using nameserver 2002:52ee:dc52:1::1#53
Jun 24 09:40:30 liliana.example.org dnsmasq[21170]: using nameserver 192.168.76.1#53
Jun 24 09:40:30 liliana.example.org dnsmasq[21170]: cleared cache
Jun 24 09:40:30 liliana.example.org NetworkManager[20474]: <info> VPN plugin state changed: started (4)
Erreur de segmentation

I dig a bit with gdb, and the problem seems to be because it try to close a file that as already closed.
(gdb) bt
#0  0x00007ffff70df148 in md_close (a=0x55555585bee0) at md.c:771
#1  0x00007ffff70e0175 in _gcry_md_close (hd=<optimized out>) at md.c:783
#2  0x00007ffff70c98e5 in gcry_md_close (hd=<optimized out>) at visibility.c:783
#3  0x000055555558c10c in journal_file_close (f=0x55555585bce0) at ../src/journal/journal-file.c:159
#4  0x0000555555588ee8 in sd_journal_close (j=0x5555557af7d0) at ../src/journal/sd-journal.c:1826
#5  0x0000555555585af5 in sd_journal_closep (p=0x7fffffffd2c8) at ../src/journal/journal-internal.h:138
#6  show_journal_by_unit (f=0x7ffff6ea5400 <_IO_2_1_stdout_>, unit=<optimized out>, mode=OUTPUT_SHORT, n_columns=0, not_before=39965955434, how_many=10, 
    uid=0, flags=(OUTPUT_WARN_CUTOFF | OUTPUT_COLOR), system=true, ellipsized=0x7fffffffe207) at ../src/shared/logs-show.c:1160
#7  0x00005555555687c9 in print_status_info (ellipsized=0x7fffffffe207, i=0x7fffffffde00) at ../src/systemctl/systemctl.c:2788
#8  show_one (verb=<optimized out>, bus=<optimized out>, path=<optimized out>, show_properties=<optimized out>, new_line=<optimized out>, 
    ellipsized=0x7fffffffe207) at ../src/systemctl/systemctl.c:3456
#9  0x000055555556a547 in show (bus=0x5555557aca30, args=0x7fffffffe4e0) at ../src/systemctl/systemctl.c:3598
#10 0x000055555555bc65 in systemctl_main (error=0x7fffffffe320, argv=0x7fffffffe4d8, argc=<optimized out>, bus=0x5555557aca30)
    at ../src/systemctl/systemctl.c:5881
#11 main (argc=<optimized out>, argv=0x7fffffffe4d8) at ../src/systemctl/systemctl.c:6136

around frame 3 or 4, we give to sd_journal_close a structure which is half empty. (sorry do not have the detail, it was yesterday).


So wondering why, I started to look around and when I do this :

$ strace -e open systemctl status  systemd-journald

it end with :

open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 246
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/85f9000de070d3b1ae7c548b781da507/system", O_RDONLY|O_CLOEXEC) = 247
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/var/log/journal/system", O_RDONLY|O_CLOEXEC) = 15
open("/proc/sys/kernel/random/boot_id", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 4
juin 24 00:58:36 liliana.example.org systemd-journal[18469]: Permanent journal is using 4.0G (max allowed 4.0G, trying to leave 4.0G free of 752.1... 4.0G).
juin 24 00:58:36 liliana.example.org systemd-journal[18469]: Journal started
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f6aec90d000} ---
+++ killed by SIGSEGV +++
Erreur de segmentation


it seems to give back the same file descriptor when there is too much file open, around 246, which translate to roughly 240 files on disk. As soon as I remove enough file to be 240, it does work fine. It doesn't matter which file are removed.

For the record, I removed selinux to see.

Version-Release number of selected component (if applicable):
# rpm -q systemd
systemd-208-17.fc20.x86_64


How reproducible:
each time


Steps to Reproduce:
1. have more than 240 files in /var/log/journal/system@*journal
2. try to see any status
3.

Comment 1 Michael S. 2014-06-24 08:12:55 UTC
Just to clarify, everything should have been in /var/log/journal/85f9000de070d3b1ae7c548b781da507 but due to me trying to see the limit of file, I may have forgot the last part of the path, but it doesn't impact much.

Comment 2 Michael S. 2014-06-28 09:33:46 UTC
I just upgraded to rawhide, and the problem also exist.

Comment 3 Lennart Poettering 2015-01-08 02:25:18 UTC
What is is your "ulimit -n" value?

Comment 4 Michael S. 2015-01-19 22:25:10 UTC
$ ulimit -n
1024

Same for root as well.

Comment 5 Fedora End Of Life 2015-05-29 12:12:21 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Fedora End Of Life 2015-06-29 21:17:53 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.