Bug 1508984

Summary:	Journal files corrupted after reboot on ext4 file systems
Product:	[Fedora] Fedora	Reporter:	Morten Stevens <ms>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rawhide	CC:	airlied, bskeggs, esandeen, ewk, hdegoede, herrold, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, jsynacek, kernel-maint, linville, lnykryn, mchehab, mjg59, msekleta, ms, rami_adrees, rds1944, ssahani, s, steved, systemd-maint, zbyszek
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-23 20:02:41 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Morten Stevens 2017-11-02 15:35:37 UTC

Description of problem:

I'm not sure if this is a kernel bug with EXT4 or a systemd related issue. I have many F26, F27 and Fedora rawhide virtual machines. With the latest linux kernel (starting from 4.13 on F26) I see corrupted systemd journal files after every reboot. This is reproducible. I have also deleted all in /var/log/journal/* and restarted systemd-journald to have a clean newly created journal directory. But also with a newly created journal directory you will get corrupted journal files after a reboot. I'm also able to reproduce this issue on a fedora rawhide virtual machine.

Version-Release number of selected component (if applicable):

kernel-4.14.0-0.rc7.git0.1.fc28.x86_64
systemd-235-3.fc28.x86_64

How reproducible:

Steps to Reproduce:
1. Install fedora rawhide and use ext4 as root filesystem
2. Create a new clean journal log directory rm -rf /var/log/journal/*
3. systemctl restart systemd-journald
4. Check journal files:
journalctl --verify
PASS: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal

There is only 1 file. all okay.

5) Check inode number

4325488 /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal

system.journal has inode number 4325488

------------------------------------
Until here all works as expected
------------------------------------

6. reboot

7. Check dmesg. You will get two error messages related to inode number 4325488 (/var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal)

[ 3.889532] EXT4-fs (dm-0): Delayed block allocation failed for inode 4325488 at logical offset 649 with max blocks 3 with error 121
[ 3.889571] EXT4-fs (dm-0): This should not happen!! Data will be lost

[ 6.010659] systemd-journald[600]: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal: Journal file corrupted, rotating.

8. Verify journal files and check inode number

journalctl --verify
PASS: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal
2893b0: Invalid object
File corruption detected at /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system:2893b0 (of 8388608 bytes, 31%).
FAIL: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system (Bad message)

ls -i system
4325488 system

Actual results:

[ 6.010659] systemd-journald[600]: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal: Journal file corrupted, rotating.

Expected results:

no corrupted journal files after reboot

Comment 1 Jan Synacek 2017-11-07 07:41:18 UTC

(In reply to Morten Stevens from comment #0)
> Actual results:
> [    3.889532] EXT4-fs (dm-0): Delayed block allocation failed for inode
> 4325488 at logical offset 649 with max blocks 3 with error 121
> [    3.889571] EXT4-fs (dm-0): This should not happen!! Data will be lost

That's a kernel problem.

Comment 2 Rami 2018-01-23 11:57:53 UTC

i have the same problm with fedora 27, not only the Journal files were corrupted after rebooting, at some points system/databse files got corrupted too so the system crashed.

i replaced the SCSI-Controller to LSI Logic Parallel. (i used Paravirtual before). that solve my problem and the systems are stable now.

the Problem still exist when using (Paravirtual or LSI Logic SAS)

Comment 3 Morten Stevens 2018-02-12 16:01:14 UTC

This is also possible... But with the latest 4.14 and 4.15 I'm not able to reproduce it. Maybe, it's already fixed upstream?

@Rami: Are you able to reproduce it after updating to the latest 4.14.18 or 4.15.2 kernel?

Comment 4 Rami 2018-02-14 12:18:14 UTC

@Morten Stevens: Yes i'm still able to reproduce. i didn't check the 4.15.2 only 4.14.18

Kernel 4.14.18-300.fc27.x86_64

rm -rf /var/log/journal/*
systemctl restart systemd-journald
journalctl --verify
PASS: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal

ll -i /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal
531930 -rw-r-----+ 1 root systemd-journal 8388608 Feb 14 12:00 /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal

reboot

dmesg

[Wed Feb 14 12:04:07 2018] EXT4-fs (dm-0): Delayed block allocation failed for inode 531930 at logical offset 291 with max blocks 2 with error 121
[Wed Feb 14 12:04:07 2018] EXT4-fs (dm-0): This should not happen!! Data will be lost

[Wed Feb 14 12:04:16 2018] systemd-journald[536]: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal: Journal file corrupted, rotating.

journalctl --verify
PASS: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal
123868: Invalid object
File corruption detected at /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system:123868 (of 8388608 bytes, 14%).
FAIL: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system (Bad message)

ll -i /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system
531930 -rw-r-----+ 1 root systemd-journal 8388608 Feb 14 12:04 /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system

Comment 5 Morten Stevens 2018-02-20 12:59:35 UTC

@Rami

Very interesting... I'm not able to reproduce it with 4.14.18 or 4.15.x.

Could try to reproduce it with the latest F27 4.15 kernel? And if you don't need  the journal files try to delete the corrupted journal files before updating to 4.15 with rm -rf /var/log/journal/35fd8595287b4e07b59244474a51a1c4/

Comment 6 Guido 2018-04-01 05:40:50 UTC

I, too, have this problem. See Bug 1560149. Also look at Bug 1533620 for xfs installation.

Comment 7 Eric Sandeen 2018-04-01 16:19:21 UTC

@Guido, your issue is different, the xfs issue is different as well.  Those bugs are different issues, related to log replay after a "clean" reboot.

@Rami, can you attach an entire dmesg when this happens?  And what is the state of the filesystem holding the systemd journal, how full is it?

Comment 8 Eric Sandeen 2018-04-01 16:36:02 UTC

Sorry, same question for Morten - entire dmesg when this happens, and state of the filesystem at the time ...

Comment 9 Morten Stevens 2018-04-23 11:09:18 UTC

(In reply to Eric Sandeen from comment #8)
> Sorry, same question for Morten - entire dmesg when this happens, and state
> of the filesystem at the time ...

Hello Eric,

I'm not sure if this was a kernel or systemd related issue. But I'm not able to reproduce it with the linux 4.15+ and the latest systemd update (systemd-234-10.git5f8984e.fc27) for F27.

Changelog for systemd-234-10.git5f8984e.fc27

- various fixes for journalctl leaking file descriptors on very quick file rotation (upstream issues #7998, #8198)

Maybe, this fixed it?

Comment 10 Eric Sandeen 2018-04-23 20:02:41 UTC

AFAICT systemd(?) probably provoked the issue by not cleanly unmounting root.  The kernel is behaving as expected in response.

If you have time to do a little A/B testing w/ the prior version of systemd to confirm that it was in fact the systemd update that fixed it, it'd be helpful.  Otherwise I guess I'll just close this for now as CURRENTRELEASE.

If anyone still sees this with at least the versions Morten noted in comment #9 please re-open with further details.

Thanks,
-Eric