Bug 1301837 - can't view journal following recovery of fs being out of space
Summary: can't view journal following recovery of fs being out of space
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 23
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-01-26 06:28 UTC by Chris Murphy
Modified: 2016-12-20 18:12 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-20 18:12:39 UTC
Type: Bug


Attachments (Terms of Use)
strace journalctl fail (14.40 KB, text/plain)
2016-01-26 06:28 UTC, Chris Murphy
no flags Details
dmesg (107.99 KB, text/plain)
2016-01-26 06:28 UTC, Chris Murphy
no flags Details
ls -l /var/log/journal/<machineid>/ (656 bytes, text/plain)
2016-01-26 06:32 UTC, Chris Murphy
no flags Details

Description Chris Murphy 2016-01-26 06:28:12 UTC
Description of problem:

File system becomes 100% full, and systemd-journald complaints of journal corruption, and no space left on device. The fs itself has not complained (no kernel messages from fs driver), no other system functions have been compromised or complain. ~33GiB of extraneous files were immediately deleted freeing up plenty of space, but still journalctl results in I/O error.


Version-Release number of selected component (if applicable):


How reproducible:
Happens anytime the fs becomes nearly full and recovers.

Steps to Reproduce:
1. Fill up the file system to 100%, wait for journal corruption warning.
2. journalctl (fails)
3. Free up some space
4. journalctl (still fails)

Actual results:

journalctl fails with message:
Error was encountered while opening journal files: Input/output error


Expected results:

It should recover more gracefully.

Additional info:

dmesg includes:
[28902.287772] systemd-journald[558]: /var/log/journal/358f3fdc5df34832b44a6816f3b04881/system.journal: IO error, rotating.
[28902.437256] systemd-journald[558]: Failed to create new system journal: No space left on device
[28902.607478] systemd-journald[558]: Failed to create new user journal: No space left on device
[28987.503675] virbr0: port 2(vnet0) entered disabled state
[28987.505360] device vnet0 left promiscuous mode
[28987.505406] virbr0: port 2(vnet0) entered disabled state
[28987.581214] systemd-journald[558]: /var/log/journal/358f3fdc5df34832b44a6816f3b04881/user-1000.journal: Journal file corrupted, rotating.

Seems like it only tries once to create new journal files after no space left on device? There's no message of any additional attempts.


[root@f23m ~]# stat -f /
  File: "/"
    ID: f8f841f1d1529a58 Namelen: 255     Type: btrfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 14680064   Free: 8776780    Available: 8631880
Inodes: Total: 0          Free: 0

[root@f23m ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G  552K  3.9G   1% /dev/shm
tmpfs           3.9G  1.8M  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda7        56G   23G   33G  41% /
tmpfs           3.9G  776K  3.9G   1% /tmp
/dev/sda7        56G   23G   33G  41% /boot
/dev/sda4       200M   14M  187M   7% /boot/efi
/dev/sda7        56G   23G   33G  41% /home
tmpfs           790M  8.0K  790M   1% /run/user/42
tmpfs           790M   44K  790M   1% /run/user/1000
/dev/loop0      5.0G  2.3G  2.9G  44% /mnt/btr2
/dev/sda7        56G   23G   33G  41% /mnt/btr

[root@f23m ~]# btrfs fi usage /
Overall:
    Device size:		  56.00GiB
    Device allocated:		  53.97GiB
    Device unallocated:		   2.03GiB
    Device missing:		     0.00B
    Used:			  22.39GiB
    Free (estimated):		  32.93GiB	(min: 32.93GiB)
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		 128.00MiB	(used: 0.00B)

Data,single: Size:52.94GiB, Used:22.04GiB
   /dev/sda7	  52.94GiB

Metadata,single: Size:1.00GiB, Used:363.00MiB
   /dev/sda7	   1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda7	  32.00MiB

Unallocated:
   /dev/sda7	   2.03GiB

Comment 1 Chris Murphy 2016-01-26 06:28:43 UTC
Created attachment 1118358 [details]
strace journalctl fail

Comment 2 Chris Murphy 2016-01-26 06:28:58 UTC
Created attachment 1118359 [details]
dmesg

Comment 3 Chris Murphy 2016-01-26 06:32:05 UTC
Created attachment 1118360 [details]
ls -l /var/log/journal/<machineid>/

Interesting, the system.journal fallocate failed, and is 0 bytes. But systemd-journald doesn't try to rotate it out and create a new one so it's stuck.

Comment 4 Chris Murphy 2016-01-26 06:32:37 UTC
Forgot this
systemd-222-12.fc23.x86_64

Comment 5 Chris Murphy 2016-01-26 06:37:43 UTC
'systemctl restart systemd-journald' appears to fix this. 

So yeah, maybe a minor bug with easy work around, but might be nice if it failed more gracefully and tried to recreate the system and user journals? Some compromise time wise maybe to avoid race, maybe once every 2 minutes or 5 minutes?

Comment 6 Zbigniew Jędrzejewski-Szmek 2016-01-26 13:31:42 UTC
Looks like an bug in the error handling of fallocate.

Comment 7 Edgar Hoch 2016-03-30 19:51:13 UTC
I think my report is about the same bug, so I report my description and solution here.


I found that restarting systemd-journald is not sufficient. It does not remove emtpy *.journal files.

I think this is the same bug as described here, so I report it here.


Description of problem:

journalclt fails with an error message when there are empty files that match
/var/log/journal/*/*.journal

After filesystem /var was out of space (because another log file, or abrtd), then journald has created one or more files in /var/log/journal/SOME_RANDOM_STRING/ that are empty, then journalctl fails with the message

Error was encountered while opening journal files: Input/output error


Solution:

- journalclt should ignore empty journal files in the journal directory.
  It may report that here are empty files.

- These empty files should be deleted automatically.
  It can be done while journald is (re)started, but also when it is running.
  It can be done by journalctl if running as root, automatically or triggerded by an option.


Version-Release number of selected component (if applicable):
systemd-222-14.fc23.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run "journalctl -e"
2. Run "touch /var/log/journal/*/nullfile.journal"
3. Run "journalctl -e"
4. Run "systemctl restart systemd-journald"
5. Run "journalctl -e"
6. Run "/bin/rm /var/log/journal/*/nullfile.journal"
7. Run "journalctl -e"

Actual results:
Step 1: journalctl working normal, printing journal entries.
Step 3: journalctl fails with error message as noted above.
Step 5: Restart of journald doesn't help, journalctl still fails with error message as noted above.
Step 7: journalctl working normal, printing journal entries.


Expected results:
journalctl never fails.
In step 3 it should print a note (on std error) that the file /var/log/journal/*/nullfile.journal exists but is empty.
This empty file should be removed automatically in step 4.

Comment 8 Fedora End Of Life 2016-11-24 15:12:01 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Fedora End Of Life 2016-12-20 18:12:39 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.