Hide Forgot
Description of problem: I'm not sure if this is a kernel bug with EXT4 or a systemd related issue. I have many F26, F27 and Fedora rawhide virtual machines. With the latest linux kernel (starting from 4.13 on F26) I see corrupted systemd journal files after every reboot. This is reproducible. I have also deleted all in /var/log/journal/* and restarted systemd-journald to have a clean newly created journal directory. But also with a newly created journal directory you will get corrupted journal files after a reboot. I'm also able to reproduce this issue on a fedora rawhide virtual machine. Version-Release number of selected component (if applicable): kernel-4.14.0-0.rc7.git0.1.fc28.x86_64 systemd-235-3.fc28.x86_64 How reproducible: Steps to Reproduce: 1. Install fedora rawhide and use ext4 as root filesystem 2. Create a new clean journal log directory rm -rf /var/log/journal/* 3. systemctl restart systemd-journald 4. Check journal files: journalctl --verify PASS: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal There is only 1 file. all okay. 5) Check inode number 4325488 /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal system.journal has inode number 4325488 ------------------------------------ Until here all works as expected ------------------------------------ 6. reboot 7. Check dmesg. You will get two error messages related to inode number 4325488 (/var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal) [ 3.889532] EXT4-fs (dm-0): Delayed block allocation failed for inode 4325488 at logical offset 649 with max blocks 3 with error 121 [ 3.889571] EXT4-fs (dm-0): This should not happen!! Data will be lost [ 6.010659] systemd-journald[600]: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal: Journal file corrupted, rotating. 8. Verify journal files and check inode number journalctl --verify PASS: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal 2893b0: Invalid object File corruption detected at /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system:2893b0 (of 8388608 bytes, 31%). FAIL: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system (Bad message) ls -i system 4325488 system Actual results: [ 3.889532] EXT4-fs (dm-0): Delayed block allocation failed for inode 4325488 at logical offset 649 with max blocks 3 with error 121 [ 3.889571] EXT4-fs (dm-0): This should not happen!! Data will be lost [ 6.010659] systemd-journald[600]: /var/log/journal/9b68b2f4000f44fba8d932c5df8dde87/system.journal: Journal file corrupted, rotating. Expected results: no corrupted journal files after reboot
(In reply to Morten Stevens from comment #0) > Actual results: > [ 3.889532] EXT4-fs (dm-0): Delayed block allocation failed for inode > 4325488 at logical offset 649 with max blocks 3 with error 121 > [ 3.889571] EXT4-fs (dm-0): This should not happen!! Data will be lost That's a kernel problem.
i have the same problm with fedora 27, not only the Journal files were corrupted after rebooting, at some points system/databse files got corrupted too so the system crashed. i replaced the SCSI-Controller to LSI Logic Parallel. (i used Paravirtual before). that solve my problem and the systems are stable now. the Problem still exist when using (Paravirtual or LSI Logic SAS)
This is also possible... But with the latest 4.14 and 4.15 I'm not able to reproduce it. Maybe, it's already fixed upstream? @Rami: Are you able to reproduce it after updating to the latest 4.14.18 or 4.15.2 kernel?
@Morten Stevens: Yes i'm still able to reproduce. i didn't check the 4.15.2 only 4.14.18 Kernel 4.14.18-300.fc27.x86_64 rm -rf /var/log/journal/* systemctl restart systemd-journald journalctl --verify PASS: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal ll -i /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal 531930 -rw-r-----+ 1 root systemd-journal 8388608 Feb 14 12:00 /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal reboot dmesg [Wed Feb 14 12:04:07 2018] EXT4-fs (dm-0): Delayed block allocation failed for inode 531930 at logical offset 291 with max blocks 2 with error 121 [Wed Feb 14 12:04:07 2018] EXT4-fs (dm-0): This should not happen!! Data will be lost [Wed Feb 14 12:04:16 2018] systemd-journald[536]: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal: Journal file corrupted, rotating. journalctl --verify PASS: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system.journal 123868: Invalid object File corruption detected at /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system:123868 (of 8388608 bytes, 14%). FAIL: /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system (Bad message) ll -i /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system 531930 -rw-r-----+ 1 root systemd-journal 8388608 Feb 14 12:04 /var/log/journal/35fd8595287b4e07b59244474a51a1c4/system
@Rami Very interesting... I'm not able to reproduce it with 4.14.18 or 4.15.x. Could try to reproduce it with the latest F27 4.15 kernel? And if you don't need the journal files try to delete the corrupted journal files before updating to 4.15 with rm -rf /var/log/journal/35fd8595287b4e07b59244474a51a1c4/
I, too, have this problem. See Bug 1560149. Also look at Bug 1533620 for xfs installation.
@Guido, your issue is different, the xfs issue is different as well. Those bugs are different issues, related to log replay after a "clean" reboot. @Rami, can you attach an entire dmesg when this happens? And what is the state of the filesystem holding the systemd journal, how full is it?
Sorry, same question for Morten - entire dmesg when this happens, and state of the filesystem at the time ...
(In reply to Eric Sandeen from comment #8) > Sorry, same question for Morten - entire dmesg when this happens, and state > of the filesystem at the time ... Hello Eric, I'm not sure if this was a kernel or systemd related issue. But I'm not able to reproduce it with the linux 4.15+ and the latest systemd update (systemd-234-10.git5f8984e.fc27) for F27. Changelog for systemd-234-10.git5f8984e.fc27 - various fixes for journalctl leaking file descriptors on very quick file rotation (upstream issues #7998, #8198) Maybe, this fixed it?
AFAICT systemd(?) probably provoked the issue by not cleanly unmounting root. The kernel is behaving as expected in response. If you have time to do a little A/B testing w/ the prior version of systemd to confirm that it was in fact the systemd update that fixed it, it'd be helpful. Otherwise I guess I'll just close this for now as CURRENTRELEASE. If anyone still sees this with at least the versions Morten noted in comment #9 please re-open with further details. Thanks, -Eric