Bug 1105877 - kernel says "warning: checktime reached, running e2fsck is recommended", but e2fsck doesn't actually want to run
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos Maiolino
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-06-08 12:59 UTC by Harald Reindl
Modified: 2019-12-13 14:54 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-13 14:54:23 UTC
Type: Bug
Embargoed:



Description Harald Reindl 2014-06-08 12:59:20 UTC
"touch /forcefsck" results in deprecation warnings, but in fact, at least
on Fedora 19, *you need it*, because otherwise the fsck doesn't happen.

For sure, the last reboot of the machine below complained too,
so why doesn't it happen at boot?
________________________________

Jun  8 14:51:31 localhost systemd-fsck[349]: system: clean, 31614/640848 files, 390558/2559727 blocks
Jun  8 14:51:31 localhost kernel: [    4.855832] EXT4-fs (md127): warning: checktime reached, running e2fsck is recommended
Jun  8 14:51:31 localhost kernel: [    5.975985] EXT4-fs (md126): warning: checktime reached, running e2fsck is recommended
Jun  8 14:51:31 localhost kernel: [    6.632882] EXT4-fs (md125): warning: checktime reached, running e2fsck is recommended
________________________________

tune2fs -l /dev/md127
tune2fs 1.42.7 (21-Jan-2013)
Filesystem volume name:   system
Last mounted on:          /
Filesystem UUID:          22f62744-8fd7-4090-aff8-b35ef38b4b74
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl nobarrier
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              640848
Block count:              2559727
Reserved block count:     25597
Free blocks:              2169169
Free inodes:              609234
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      624
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8112
Inode blocks per group:   507
Flex block group size:    16
Filesystem created:       Mon Mar 19 13:48:13 2012
Last mount time:          Sun Jun  8 14:51:19 2014
Last write time:          Sun Jun  8 14:51:19 2014
Mount count:              21
Maximum mount count:      -1
Last checked:             Mon Nov 25 20:45:49 2013
Check interval:           15552000 (6 months)
Next check after:         Sat May 24 21:45:49 2014
Lifetime writes:          178 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      bea48679-30ba-4a71-82d9-d4b2fe0a80b8
Journal backup:           inode blocks
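To make the "Next check after" arithmetic explicit, here is a minimal Python sketch that parses the relevant lines of the tune2fs -l output above and checks whether the last mount happened after the due date. It assumes the tune2fs output format shown here and a C/English locale for day and month names; the function names are illustrative only. It does naive local-time arithmetic, so it lands one hour earlier than the 21:45:49 that tune2fs prints, presumably because of the CET to CEST switch between November and May.

```python
from datetime import datetime, timedelta

# Excerpt of the `tune2fs -l /dev/md127` output above.
SAMPLE = """\
Last mount time:          Sun Jun  8 14:51:19 2014
Maximum mount count:      -1
Last checked:             Mon Nov 25 20:45:49 2013
Check interval:           15552000 (6 months)
"""

# ctime()-style timestamps as printed by tune2fs (assumes C/English locale).
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_tune2fs(text):
    """Turn 'Key:   value' lines into a dict, splitting only at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def next_check(fields):
    """Return (due time, whether the last mount happened after it)."""
    last_checked = datetime.strptime(fields["Last checked"], TIME_FORMAT)
    interval = timedelta(seconds=int(fields["Check interval"].split()[0]))
    mounted = datetime.strptime(fields["Last mount time"], TIME_FORMAT)
    due = last_checked + interval
    return due, mounted > due


if __name__ == "__main__":
    due, overdue = next_check(parse_tune2fs(SAMPLE))
    print(due, overdue)
```

A "Maximum mount count" of -1, as here, means mount-count-based checks are disabled, so only the time-based interval can trigger a check on this filesystem.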

Comment 1 Fedora End Of Life 2015-01-09 21:24:27 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Fedora End Of Life 2015-02-18 11:43:53 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 3 Zbigniew Jędrzejewski-Szmek 2015-02-18 13:46:32 UTC
We need to deal with this in some way...

Comment 4 Michal Schmidt 2015-03-16 17:36:37 UTC
(In reply to Harald Reindl from comment #0)
> For sure, the last reboot of the machine below complained too,
> so why doesn't it happen at boot?

e2fsck should have said something like "/dev/md127 has gone XY days without being checked, check forced" and performed the check.
I can see two possible reasons why this did not happen. Either e2fsck detects that the system clock is broken (by seeing that the current date is before the year 2010), or it's told to ignore the clock in /etc/e2fsck.conf (broken_system_clock = 1).

Comment 5 Michal Schmidt 2015-03-16 17:38:47 UTC
e2fsck may also defer the check if it detects the system is powered from a battery.

Comment 6 Harald Reindl 2015-03-16 17:41:25 UTC
Neither of those: the system clock is fine, and I have seen that behavior on different machines, including VMware ESXi guests.

cat /etc/e2fsck.conf
[options]
# If set to 1, prevent e2fsck from stopping boot just because the clock is wrong
broken_system_clock = 0

Comment 7 Harald Reindl 2015-03-16 17:42:54 UTC
Also no battery: we are talking about an HP Compaq Elite 8200 in the initial bug report.

Comment 8 Michal Schmidt 2015-03-16 18:01:23 UTC
How exactly are these mounts listed in /etc/fstab?

Comment 9 Harald Reindl 2015-03-16 18:02:44 UTC
UUID=209aeed4-95bd-4eb0-bdfa-fb346b603ce9  /boot                   ext4    defaults                                                                      0 1
UUID=22f62744-8fd7-4090-aff8-b35ef38b4b74  /                       ext4    defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota         0 1
UUID=0b95905b-02c5-444b-af9e-7615cabebb38  /mnt/data               ext4    defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota,nosuid  0 2
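A minimal Python sketch that splits these lines into the six fstab(5) fields, to make the sixth field (fs_passno) explicit. Per fstab(5), 0 or an omitted value means the filesystem is not checked at boot; with systemd, as far as I know, a non-zero value is what becomes a systemd-fsck@.service dependency (compare the FsckPassNo=2 and systemd-fsck@ lines in the generated unit in comment 11). The helper name is hypothetical, and octal escapes such as \040 in paths are not handled.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    spec: str
    mountpoint: str
    fstype: str
    options: List[str]
    passno: int


# The three entries from the comment above.
SAMPLE = """\
UUID=209aeed4-95bd-4eb0-bdfa-fb346b603ce9  /boot      ext4  defaults  0 1
UUID=22f62744-8fd7-4090-aff8-b35ef38b4b74  /          ext4  defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota  0 1
UUID=0b95905b-02c5-444b-af9e-7615cabebb38  /mnt/data  ext4  defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota,nosuid  0 2
"""


def parse_fstab(text: str) -> List[FstabEntry]:
    """Split fstab(5) lines into fields; skip blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed fstab line: {raw!r}")
        fields += ["0"] * (6 - len(fields))  # fs_freq/fs_passno default to 0
        spec, mountpoint, fstype, opts, _freq, passno = fields[:6]
        entries.append(FstabEntry(spec, mountpoint, fstype, opts.split(","), int(passno)))
    return entries


if __name__ == "__main__":
    for e in parse_fstab(SAMPLE):
        action = "fsck at boot" if e.passno > 0 else "no fsck"
        print(f"{e.mountpoint}: passno={e.passno} ({action})")
```

All three entries have a non-zero pass number, so all three are candidates for a boot-time check.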

Comment 10 Michal Schmidt 2015-03-16 18:10:35 UTC
Hm, a similar configuration works fine for me here.

Would you please paste the output from:
journalctl SYSLOG_IDENTIFIER=systemd-fsck -b

Assuming that /mnt/data is one of the filesystems that should have been checked, but wasn't, please also paste:

- the contents of the unit file /run/systemd/generator/mnt-data.mount

- the output from:
  systemctl show /mnt/data

Comment 11 Harald Reindl 2015-03-16 18:16:35 UTC
Sorry, there is no journal data at the moment (the journal is non-persistent), and I can't reboot that machine: it is 300 kilometers away from me and currently acting as a server.

_____________________________________________________

[Unit]
SourcePath=/etc/fstab
Before=local-fs.target
RequiresOverridable=systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service
After=systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service

[Mount]
What=/dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38
Where=/mnt/data
Type=ext4
FsckPassNo=2
Options=defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota,nosuid
_____________________________________________________

Id=mnt-data.mount
Names=mnt-data.mount
Requires=systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service -.mount
RequiresOverridable=systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service
Wants=system.slice
BindsTo=dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.device
RequiredBy=local-fs.target home.mount tmp.mount var-tmp.mount
WantedBy=dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.device
Conflicts=umount.target
Before=local-fs.target umount.target home.mount tmp.mount var-tmp.mount
After=systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service systemd-journald.socket dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.device local-fs-pre.target system.slice -.mount
RequiresMountsFor=/mnt /dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38
Description=/mnt/data
LoadState=loaded
ActiveState=active
SubState=mounted
FragmentPath=/run/systemd/generator/mnt-data.mount
SourcePath=/etc/fstab
InactiveExitTimestamp=Fri 2015-03-13 00:50:29 CET
InactiveExitTimestampMonotonic=6727220
ActiveEnterTimestamp=Fri 2015-03-13 00:50:29 CET
ActiveEnterTimestampMonotonic=6989649
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureIsolate=no
IgnoreOnIsolate=yes
IgnoreOnSnapshot=no
NeedDaemonReload=no
JobTimeoutUSec=0
ConditionTimestamp=Fri 2015-03-13 00:50:29 CET
ConditionTimestampMonotonic=6630851
ConditionResult=yes
Transient=no
Slice=system.slice
ControlGroup=/system.slice/mnt-data.mount
Where=/mnt/data
What=/dev/md2
Options=rw,nosuid,noatime,nodiratime,rw,data=ordered,inode_readahead_blks=128
Type=ext4
TimeoutUSec=1min 30s
ExecMount={ path=/bin/mount ; argv[]=/bin/mount /dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38 /mnt/data -t ext4 -o defaults,commit=5,inode_readahead_blks=128,noatime,nodiratime,noquota,nosuid ; ignore_errors=no ; start_time=[Fri 2015-03-13 00:50:29 CET] ; stop_time=[Fri 2015-03-13 00:50:29 CET] ; pid=606 ; code=exited ; status=0 }
ControlPID=0
DirectoryMode=0755
Result=success
UMask=0022
LimitCPU=18446744073709551615
LimitFSIZE=18446744073709551615
LimitDATA=18446744073709551615
LimitSTACK=18446744073709551615
LimitCORE=18446744073709551615
LimitRSS=18446744073709551615
LimitNOFILE=4096
LimitAS=18446744073709551615
LimitNPROC=31404
LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615
LimitSIGPENDING=31404
LimitMSGQUEUE=819200
LimitNICE=0
LimitRTPRIO=0
LimitRTTIME=18446744073709551615
OOMScoreAdjust=0
Nice=0
IOScheduling=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SecureBits=0
CapabilityBoundingSet=18446744073709551615
MountFlags=0
PrivateTmp=no
PrivateNetwork=no
SameProcessGroup=yes
IgnoreSIGPIPE=yes
NoNewPrivileges=no
KillMode=control-group
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
CPUAccounting=no
CPUShares=1024
BlockIOAccounting=no
BlockIOWeight=1000
MemoryAccounting=no
MemoryLimit=18446744073709551615
DevicePolicy=auto

Comment 12 Michal Schmidt 2015-03-16 18:21:35 UTC
OK, then let's see what systemd itself remembers of the systemd-fsck@... service:

systemctl status 'systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service'
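The odd-looking unit name is the device path run through systemd's path escaping, which is why it must be single-quoted in the shell (the backslashes would otherwise be eaten). Per systemd.unit(5), "/" becomes "-", and any character other than ASCII alphanumerics, ":", "_" and "." becomes a C-style "\xNN" escape, so "-" turns into "\x2d". A minimal Python sketch of that rule, assumed to match what `systemd-escape --path` does for simple paths like these; it does not reject "." or ".." components the way systemd does, and the function names are hypothetical.

```python
import string

# Characters left as-is in an escaped path (see systemd.unit(5)).
SAFE = set(string.ascii_letters + string.digits + ":_.")


def escape_path(path: str) -> str:
    """Approximate `systemd-escape --path`: '/' -> '-', others -> \\xNN."""
    # Drop leading, trailing and duplicate slashes.
    trimmed = "/".join(part for part in path.split("/") if part)
    if not trimmed:
        return "-"  # the root directory, as in "-.mount"
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch in SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.extend(f"\\x{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


def fsck_unit(device: str) -> str:
    """Instance name of systemd-fsck@.service for a device path."""
    return f"systemd-fsck@{escape_path(device)}.service"


if __name__ == "__main__":
    print(fsck_unit("/dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38"))
    print(escape_path("/mnt/data") + ".mount")
```

The same rule explains "mnt-data.mount" for /mnt/data and "-.mount" for the root filesystem in the systemctl show output above.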

Comment 13 Harald Reindl 2015-03-16 18:25:44 UTC
Keep in mind that I reported the issue last year and did a "touch /forcefsck" after that. The fsck itself is started (also in the initial report above), but it is skipped with "filesystem is clean".

systemd-fsck@dev-disk-by\x2duuid-0b95905b\x2d02c5\x2d444b\x2daf9e\x2d7615cabebb38.service - File System Check on /dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38
   Loaded: loaded (/usr/lib/systemd/system/systemd-fsck@.service; static)
   Active: active (exited) since Fri 2015-03-13 00:50:29 CET; 3 days ago
     Docs: man:systemd-fsck@.service(8)
  Process: 573 ExecStart=/usr/lib/systemd/systemd-fsck %f (code=exited, status=0/SUCCESS)
 Main PID: 573 (code=exited, status=0/SUCCESS)

Mar 13 00:50:28 localhost systemd[1]: Starting File System Check on /dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38...
Mar 13 00:50:29 localhost systemd-fsck[573]: data: clean, 189132/60375040 files, 18020079/241497311 blocks
Mar 13 00:50:29 localhost systemd[1]: Started File System Check on /dev/disk/by-uuid/0b95905b-02c5-444b-af9e-7615cabebb38.

Comment 14 Michal Schmidt 2015-03-16 18:41:38 UTC
I guess I was confused by the log in the initial report showing only one systemd-fsck line, but 3 mounts.

So it seems it really must be e2fsck who's deciding it has nothing to do for some reason. Is this system still on F19, or has it been since upgraded? I'd reassign this BZ to e2fsprogs, but I suspect Zbigniew reopened this BZ in order to track the issue that "/forcefsck" is deprecated without an obvious replacement.

Comment 15 Harald Reindl 2015-03-16 18:46:08 UTC
I kept the report as short as possible because it contains all the important information:

* fsck says "clean"
* at the same time "fsck is recommended"

In the meantime, upgrading to F20 and F21 is on hold here because of https://bugzilla.redhat.com/show_bug.cgi?id=1184016

Deprecating "/forcefsck" is ridiculous in general: you can't play around with the boot parameters in every environment, and if you have to do it by editing the boot loader config, that is even more invasive than the simple touch.

Comment 16 Lennart Poettering 2015-04-08 11:12:52 UTC
Reassigning to the kernel, this is not systemd's business.

(And yeah, we only support /forcefsck for compatibility reasons. It's crazy to alter the file system like this to ask for an fsck if you suspect that your file system is already corrupted. If this were a simple flag in the superblock, we could support it, but actually writing files is a really bad idea.)

Comment 17 Harald Reindl 2015-04-08 11:22:15 UTC
Well, you ignore the valid case where you suspect corruption on a data disk, log partition, home partition and so on, which are not affected by touching a file on the rootfs (our last corruption was on the large data RAID, not the rootfs).

The rootfs is typically not written to that often in a server setup.

Comment 18 Harald Reindl 2015-04-08 11:24:08 UTC
And BTW, if you need to force an fsck via a kernel parameter on a remote machine without iLO, it doesn't matter anyway: you change an important file, which is more dangerous than touching a new zero-byte file.

Comment 19 Eric Sandeen 2015-04-09 20:56:30 UTC
o_O this isn't a kernel bug, the kernel does not launch fsck or affect its behavior.

Anyway ...

It sounds like the core issue here is that at mount time, ext4 tells you that the check interval has expired, but e2fsck tells you that it's clean, correct?

This might be related to the mess we went through w.r.t. clock accuracy vs. check intervals; if "broken_system_clock" is in your /etc/e2fsck.conf, can you remove that line and try again?

Thanks,
-Eric

Comment 20 Zbigniew Jędrzejewski-Szmek 2015-04-09 21:07:55 UTC
(In reply to Eric Sandeen from comment #19)
> This might be related to the mess we went through w.r.t. clock accuracy vs.
> check intervals; if "broken_system_clock" is in your /etc/e2fsck.conf, can
> you remove that line and try again?
See comment #6:

> cat /etc/e2fsck.conf
> [options]
> # If set to 1, prevent e2fsck from stopping boot just because the clock is wrong
> broken_system_clock = 0

Comment 21 Eric Sandeen 2015-04-09 21:54:49 UTC
Ok, missed that, sorry.

So, in my testing, a filesystem which is "past due" does get checked by e2fsck, when e2fsck is called with no options.

So, if that's not happening ... hrmph.  

Am I correct that in your case, e2fsck on the commandline does nothing for this filesystem which has warnings when mounted?

Comment 22 Carlos Maiolino 2016-08-24 08:14:32 UTC
Hi,

I'm trying to move this bug forward. But, AFAIK, the last time it was reproduced was on Fedora 19.

Were you able to reproduce it in newer versions?

Comment 23 Harald Reindl 2016-08-27 11:42:40 UTC
At least with up-to-date RHEL/CentOS 7, it still exists:

[root@ingo-home:~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

[    3.031575] EXT4-fs (sda1): barriers disabled
[    3.032252] EXT4-fs (sda1): warning: checktime reached, running e2fsck is recommended
[    3.032528] EXT4-fs (sda1): mounted filesystem with writeback data mode. Opts: commit=180,noquota,inode_readahead_blks=64,discard

Comment 24 Michal Schmidt 2016-09-05 12:50:05 UTC
Could you answer Eric's question from comment #21? That is, have a filesystem that complains about "checktime reached" when mounted, unmount it, then run e2fsck on it. Does e2fsck report it as clean and exit, or does it proceed with the checking?

Comment 25 Harald Reindl 2016-09-05 12:57:14 UTC
You can't unmount the rootfs.

Comment 26 Michal Schmidt 2016-09-05 12:59:02 UTC
Is it always only the rootfs that has this issue? Your original description showed three filesystems reporting the warning (md127, md126, md125).

Comment 27 Harald Reindl 2016-09-05 13:02:36 UTC
Frankly, no, but this time it was, and I was just answering https://bugzilla.redhat.com/show_bug.cgi?id=1105877#c22 to say that it still happens a year later.

It's a bug. I work around it with "touch /forcefsck", where it *does* run the check, and I ignore the idiotic systemd warnings about using a kernel parameter over SSH. So no, I won't spend much time debugging such low-level basic functionality.

Comment 28 Michal Schmidt 2016-09-05 13:11:59 UTC
Nobody else is seeing the problem despite attempts at reproducing it (I tried on Fedora and RHEL 7 by setting the last-checked time into the past with "tune2fs -T"). If you don't want to spend time debugging this, let's close this.
Would you at least attach the output of "journalctl -b"? Thanks.
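Michal's reproduction can be sketched against a throwaway image file instead of a real disk. A minimal Python sketch, assuming e2fsprogs and coreutils are installed and that a scratch file named scratch.img (a hypothetical name) is acceptable; by default it only prints the commands and runs them only when given --run. Per tune2fs(8), -i sets the check interval (here 6 months) and -T takes YYYYMMDD[HH[MM[SS]]] in local time; e2fsck -n keeps the check read-only.

```python
import shlex
import subprocess
import sys
from datetime import datetime, timedelta


def tune2fs_stamp(when: datetime) -> str:
    """Format a datetime as the YYYYMMDDHHMMSS string tune2fs -T accepts."""
    return when.strftime("%Y%m%d%H%M%S")


def repro_commands(image: str, days_ago: int, now: datetime):
    """Commands that push the last-checked time past a 6-month interval."""
    past = now - timedelta(days=days_ago)
    return [
        ["truncate", "-s", "64M", image],
        ["mkfs.ext4", "-q", "-F", image],
        ["tune2fs", "-i", "6m", image],
        ["tune2fs", "-T", tune2fs_stamp(past), image],
        # Without -f, e2fsck itself decides whether the interval forces a check.
        ["e2fsck", "-n", image],
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv
    for cmd in repro_commands("scratch.img", 365, datetime.now()):
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=False)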

Comment 29 Harald Reindl 2016-09-05 13:25:35 UTC
I currently have neither that issue nor a persistent journal, and *I strongly doubt* that nobody is seeing this. In fact, most people don't look at their logs most of the time (which explains package bugs in Fedora where a single look at /var/log/messages would have made bug reports unnecessary).

Or, when somebody does look, they get flooded with other noise (mostly from systemd) and so don't notice anything at all.

Comment 30 Eric Sandeen 2016-09-06 14:44:31 UTC
So, a while ago in a fit of pique, I set broken_system_clock to 1 in e2fsck.conf, because of reports of failure to boot when the clock was wrong.  Automatically setting check times was turned off in e2fsprogs/mkfs long ago, so upstream has generally decided that periodic forced checks are not a good default.

That setting probably makes it ignore check times if they are set, though. 

Comment #6 indicates that broken_system_clock is not set, but, at least for the root fs, that setting probably needs to propagate to the initrd/initramfs to be effective at boot time.  Any idea whether it does?

Comment 31 Harald Reindl 2016-09-06 14:50:46 UTC
On the machine that showed the issue most recently (CentOS 7), it's enabled, and I guess dracut also copies it into the initrd:

cat /etc/e2fsck.conf
[options]
# This will prevent e2fsck from stopping boot just because the clock is wrong
broken_system_clock = 1

Comment 32 Eric Sandeen 2016-09-06 15:00:59 UTC
Try setting it to 0 and rebuilding the initrd, but be warned that if the clock ever goes wonky you might fail to boot ;)
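Because the root filesystem is checked from the initrd, the value that matters there is the copy of e2fsck.conf inside the initramfs, not only the one in /etc. A minimal Python sketch that reads the flag from config text, assuming the simple "[options] key = value" layout shown in this thread; configparser cannot parse the nested "{ }" relations the full e2fsck.conf(5) profile syntax allows, and the function name is hypothetical.

```python
import configparser


def broken_system_clock(conf_text: str) -> bool:
    """Return the [options] broken_system_clock flag, defaulting to False."""
    parser = configparser.ConfigParser()  # '#' lines are comments by default
    parser.read_string(conf_text)
    return parser.getboolean("options", "broken_system_clock", fallback=False)


# The CentOS 7 file from the comment above.
HOST_CONF = """\
[options]
# This will prevent e2fsck from stopping boot just because the clock is wrong
broken_system_clock = 1
"""

if __name__ == "__main__":
    print(broken_system_clock(HOST_CONF))
```

To compare against the initramfs copy, dracut's lsinitrd can print a single file from the image (assumed here: `lsinitrd -f /etc/e2fsck.conf`), and that text can be fed to the same function.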

Comment 33 Harald Reindl 2017-04-18 11:51:29 UTC
What does one thing have to do with the other?

Frankly, even a "touch /forcefsck" on a fully patched CentOS 7 doesn't make the "warning: checktime reached, running e2fsck is recommended" messages go away, which indicates that *not* all filesystems (if any) are checked, even though that was clearly requested and configured.

Comment 34 Eric Sandeen 2017-04-18 14:34:53 UTC
It's mind-boggling that boot-time "fsck" could be so complicated, isn't it?  I guess complexity expands to fill available bugzilla space, or something.

From the original report, on June 8, 2014:

Last checked:             Mon Nov 25 20:45:49 2013
Check interval:           15552000 (6 months)
Next check after:         Sat May 24 21:45:49 2014

We are clearly after the next check time, and yet:

> Jun  8 14:51:31 localhost systemd-fsck[349]: system: clean, 31614/640848 files, 390558/2559727 blocks

e2fsck thinks there is no work to do and it is clean, AND YET:

> Jun  8 14:51:31 localhost kernel: [    4.855832] EXT4-fs (md127): warning: checktime reached, running e2fsck is recommended

The kernel disagrees based on last checktime, at exactly the same timestamp.  So what's going on?

e2fsck has gotten fiendishly complicated; there are many checks that determine whether a full check should get run or skipped.

In 1.42.7, we /will/ perform the check if any of these are true:

* "-E journal_only" was not specified (unknown in your case), and:

* The filesystem is marked with errors (yours is not)
* The filesystem was not cleanly unmounted (yours was)
* The superblock features are inconsistent w/ backups (unknown, presumably not)
* The maximum mount count has been reached (yours is not set)
* broken_system_clock is /not/ set (you show yours is not), and:
  * we have a check interval (you do), and
  * last check time is in the future (yours is not), and
  * we are not on battery (you are not)
* broken_system_clock is /not/ set (you show yours is not), and:
  * we have a check interval (you do), and
  * time since last check has expired (it has), and
  * we are not on battery (you are not), or
    * we have gone past twice the check interval (you have not)

So I'm stumped.  Unless e2fsck was invoked with "-E journal_only" or you are magically on battery and/or have a broken system clock setting without knowing it, check should be running.

It's ridiculous that this is assigned to the kernel, because the kernel does not run e2fsck - it's either the init scripts, or e2fsck which is causing the check to get skipped.  But anyway -

Harald, if you can still reproduce this, let's build a debug e2fsck binary that spews information about all state checked in check_if_skip() and get to the bottom of this once and for all. Sound good?  Let us know what e2fsprogs version to build a test binary for.  You'll need to rebuild the initrd/ramfs to get the debug e2fsck binary into the boot process.
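Eric's checklist can be restated as a small predicate. This is a Python paraphrase of the 1.42.7 conditions listed in this comment, not e2fsck's actual check_if_skip() code: the field names are hypothetical, and the clock-sanity heuristics from comment 4 and the time fudge factor are left out. It makes explicit that every time-based reason for a check is gated on broken_system_clock being unset.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FsState:
    journal_only: bool           # e2fsck was run with "-E journal_only"
    has_errors: bool             # filesystem is marked with errors
    cleanly_unmounted: bool
    features_inconsistent: bool  # superblock features differ from backups
    mount_count: int
    max_mount_count: int         # <= 0 treated as "disabled" (-1 above)
    last_check: int              # seconds since the epoch
    check_interval: int          # seconds; 0 means no time-based checks
    now: int
    on_battery: bool
    broken_system_clock: bool


def must_check(s: FsState) -> bool:
    """Paraphrase of the listed conditions (not e2fsck's real code)."""
    if s.journal_only:
        return False
    if s.has_errors or not s.cleanly_unmounted or s.features_inconsistent:
        return True
    if s.max_mount_count > 0 and s.mount_count >= s.max_mount_count:
        return True
    # Every time-based reason requires broken_system_clock to be unset.
    if s.broken_system_clock or s.check_interval == 0:
        return False
    if s.last_check > s.now:
        return not s.on_battery
    elapsed = s.now - s.last_check
    if elapsed >= s.check_interval:
        return not s.on_battery or elapsed >= 2 * s.check_interval
    return False


# Roughly the CentOS 7 rootfs later in this thread: "now" taken from a debug
# log, last check about 564 days earlier, 12-month interval, flag set.
ROOTFS = FsState(
    journal_only=False, has_errors=False, cleanly_unmounted=True,
    features_inconsistent=False, mount_count=36, max_mount_count=-1,
    last_check=1492537523 - 564 * 86400, check_interval=31104000,
    now=1492537523, on_battery=False, broken_system_clock=True,
)

if __name__ == "__main__":
    print(must_check(ROOTFS))
    print(must_check(replace(ROOTFS, broken_system_clock=False)))
```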

Comment 35 Eric Sandeen 2017-04-18 14:41:08 UTC
I guess the other thing to check is whether any fsck.*= is set on your kernel commandline.
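The options in question are documented in systemd-fsck@.service(8): fsck.mode= (auto, force, skip) and fsck.repair= (preen, yes, no); the exact set may depend on the systemd version. A minimal Python sketch that lists any fsck.* options on a kernel command line, assuming plain whitespace-separated tokens (kernel-style quoting is not handled); the function name is hypothetical.

```python
def fsck_params(cmdline: str) -> dict:
    """Collect fsck.* key=value options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key.startswith("fsck."):
            params[key] = value  # a later occurrence overrides an earlier one
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(fsck_params(f.read()) or "no fsck.* options")
    except FileNotFoundError:
        print("/proc/cmdline not available")
```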

Comment 36 Harald Reindl 2017-04-18 15:00:26 UTC
> there are many checks that determine whether a full check should get run

Well, I get all that, BUT if I do "touch /forcefsck" and reboot, then empty the log with "> /var/log/messages" and reboot again, I get "EXT4-fs (sdb1): warning: checktime reached, running e2fsck is recommended". THIS is ridiculous, because it simply skips / while complaining that it should be checked.

3.10.0-514.16.1.el7.x86_64 #1 SMP Wed Apr 12 15:04:24 UTC 2017
e2fsprogs-1.42.9-9.el7.x86_64
systemd-219-30.el7_3.8.x86_64

And frankly, it is ridiculous too that you can't run fsck on data partitions even after running umount multiple times on the device name: e2fsck complains that it's in use while all the mentioned tools show nothing, and that continues even after "systemctl isolate rescue.target". The only way to run fsck reliably is to not mount a partition at boot, and that problem has existed for years (someone on the kernel list said systemd namespaces are to blame; if so, then frankly give it a --force parameter).
_____________________________________________________

[root@localhost:~]$ cat messages | grep fsck
Apr 18 16:52:47 localhost systemd-fsck: system: clean, 24275/320000 files, 292716/1280000 blocks
Apr 18 16:52:48 localhost kernel: EXT4-fs (sdb1): warning: checktime reached, running e2fsck is recommended
Apr 18 16:52:48 localhost systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 16:52:48 localhost systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 16:52:48 localhost systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 16:52:48 localhost systemd-fsck: boot: 336/128016 files (0.9% non-contiguous), 52471/510976 blocks
Apr 18 16:52:48 localhost systemd-fsck: log: 89/196608 files (21.3% non-contiguous), 40245/786176 blocks
Apr 18 16:52:49 localhost systemd-fsck: dune: 118992/655360 files (0.9% non-contiguous), 1453091/2621184 blocks

[root@localhost:~]$ cat messages | grep fsck
Apr 18 16:53:58 localhost systemd: Removed slice system-systemd\x2dfsck.slice.
Apr 18 16:53:58 localhost systemd: Stopping system-systemd\x2dfsck.slice.
Apr 18 16:54:02 localhost systemd-fsck: system: clean, 24271/320000 files, 292335/1280000 blocks
Apr 18 16:54:03 localhost kernel: EXT4-fs (sdb1): warning: checktime reached, running e2fsck is recommended
Apr 18 16:54:03 localhost systemd-fsck: log: clean, 89/196608 files, 40248/786176 blocks
Apr 18 16:54:03 localhost systemd-fsck: boot: clean, 336/128016 files, 52471/510976 blocks
Apr 18 16:54:03 localhost systemd-fsck: dune: clean, 118992/655360 files, 1453011/2621184 blocks

[root@backup-hosting:~]$ df
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb1      ext4  4.7G  936M  3.8G   20% /
/dev/sdc1      ext4  2.9G   46M  2.9G    2% /var/log
/dev/sda1      ext4  476M   28M  444M    6% /boot
/dev/sdd1      ext4  9.8G  5.3G  4.5G   55% /tmp

Comment 37 Eric Sandeen 2017-04-18 15:06:17 UTC
> and frankly that you can't run fsck on data partitions even after you run mutiple umount

Ok, well, one bug at a time please, and I don't think e2fsck /or/ the kernel can help you with /that/ problem.

I'll build a debug e2fsprogs binary that you can run again, and if you can capture output from fsck when it runs (can you?  I have no idea how systemd logging works) then maybe we'll get a hint about why e2fsck decides not to run a full check.  Sound good?  (find me on irc if you want, sandeen on #fedora-devel today).

Comment 38 Eric Sandeen 2017-04-18 15:08:34 UTC
> Apr 18 16:52:47 localhost systemd-fsck: system: clean, 24275/320000 files, 292716/1280000 blocks

How do we even know which filesystem the above is for?  Can you verify with df (-i) that the above file and block counts really match your root fs, just as a sanity check?

Comment 39 Harald Reindl 2017-04-18 15:14:17 UTC
> How do we even know which filesystem the above is for?

a) by "systemd-fsck: system", where "system" is the e2label
b) by "EXT4-fs (sdb1): warning: checktime reached" at the next boot
c) by the info for /dev/sdb1 in the "df" output

That's an ordinary CentOS 7 with all updates applied, running on VMware ESXi.

Comment 40 Eric Sandeen 2017-04-18 15:17:42 UTC
Thanks, I had missed that "system" was the fs label.

Comment 41 Harald Reindl 2017-04-18 15:20:22 UTC
And note that the rootfs is the only one that has not been checked since 2015. Nothing on that install cares how often you do "touch /forcefsck", although it did in the past; otherwise the last-checked date would also be somewhere in 2013.

[root@localhost:~]$ tune2fs -l /dev/sdb1 | grep "Last checked"
Last checked:             Fri Oct  2 12:34:47 2015
[root@localhost:~]$ tune2fs -l /dev/sda1 | grep "Last checked"
Last checked:             Tue Apr 18 16:52:48 2017
[root@localhost:~]$ tune2fs -l /dev/sdc1 | grep "Last checked"
Last checked:             Tue Apr 18 16:52:48 2017
[root@localhost:~]$ tune2fs -l /dev/sdd1 | grep "Last checked"
Last checked:             Tue Apr 18 16:52:48 2017

Filesystem volume name:   system
Last mounted on:          /
Filesystem UUID:          6273526b-ee27-4b96-bdae-848dcb26e4f0
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    journal_data_writeback user_xattr acl nobarrier
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              320000
Block count:              1280000
Reserved block count:     2
Free blocks:              987665
Free inodes:              295729
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      312
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8000
Inode blocks per group:   500
Flex block group size:    16
Filesystem created:       Thu Dec 12 14:59:58 2013
Last mount time:          Tue Apr 18 16:54:03 2017
Last write time:          Tue Apr 18 16:54:02 2017
Mount count:              36
Maximum mount count:      -1
Last checked:             Fri Oct  2 12:34:47 2015
Check interval:           31104000 (12 months)
Next check after:         Mon Sep 26 12:34:47 2016
Lifetime writes:          154 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      1b01bc5d-ed5f-4571-9947-bfb54c7b3d34
Journal backup:           inode blocks

Comment 42 Eric Sandeen 2017-04-18 17:00:05 UTC
Ok, can you grab:
http://people.redhat.com/esandeen/e2fsprogs-1.42.9-9.el7_3.test1.src.rpm
rebuild it, install it, rebuild your init[rd|ramfs], and try again?  It should print out a bunch of state when it's deciding whether or not to do the checks, e.g.:


e2fsck 1.42.9 (28-Dec-2013)
check_if_skip: batt 0 broken_system_clock 0
check_if_skip: ctx->now 1492534697 lastcheck 0 lastcheck-fudge 4294880896 checkinterval 15552000, fudge 86400
fsfile: clean, 11/65536 files, 12635/262144 blocks
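The "lastcheck-fudge 4294880896" value in this sample (and in the debug output in comment 45) is exactly what 0 minus 86400 gives when the result is held or printed as an unsigned 32-bit integer: 2^32 − 86400 = 4294880896. A quick check in Python, which has unbounded integers and so needs the modulo to emulate the wrap; the debug output alone does not show whether e2fsck actually compares against this wrapped value.

```python
FUDGE = 86400      # "fudge 86400" in the debug line (one day, in seconds)
LASTCHECK = 0      # "lastcheck 0" in the debug line

# Emulate subtraction in an unsigned 32-bit integer.
wrapped = (LASTCHECK - FUDGE) % 2**32
print(wrapped)
```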

Comment 43 Harald Reindl 2017-04-18 17:01:51 UTC
Please provide a binary; I have no build environment for CentOS packages.

Comment 44 Eric Sandeen 2017-04-18 17:31:39 UTC
http://people.redhat.com/esandeen/reindl/

Comment 45 Harald Reindl 2017-04-18 17:46:19 UTC
[root@backup-hosting:~]$ cat messages | grep fsck
Apr 18 19:45:15 backup-hosting systemd: Removed slice system-systemd\x2dfsck.slice.
Apr 18 19:45:23 backup-hosting systemd-fsck: check_if_skip: batt 0 broken_system_clock 1
Apr 18 19:45:23 backup-hosting systemd-fsck: check_if_skip: ctx->now 1492537523 s lastcheck 0 s lastcheck-fudge 4294880896 s checkinterval 31104000, fudge 86400
Apr 18 19:45:23 backup-hosting systemd-fsck: system: clean, 24314/320000 files, 293085/1280000 blocks
Apr 18 19:45:24 backup-hosting kernel: EXT4-fs (sdb1): warning: checktime reached, running e2fsck is recommended
Apr 18 19:45:24 backup-hosting systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 19:45:24 backup-hosting systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 19:45:24 backup-hosting systemd-fsck: Please pass 'fsck.mode=force' on the kernel command line rather than creating /forcefsck on the root file system.
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: batt 0 broken_system_clock 1
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: ctx->now 1492537524 s lastcheck 0 s lastcheck-fudge 4294880896 s checkinterval 31104000, fudge 86400
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: batt 0 broken_system_clock 1
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: ctx->now 1492537524 s lastcheck 0 s lastcheck-fudge 4294880896 s checkinterval 31104000, fudge 86400
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: batt 0 broken_system_clock 1
Apr 18 19:45:24 backup-hosting systemd-fsck: check_if_skip: ctx->now 1492537524 s lastcheck 0 s lastcheck-fudge 4294880896 s checkinterval 31104000, fudge 86400
Apr 18 19:45:24 backup-hosting systemd-fsck: log: 89/196608 files (22.5% non-contiguous), 40270/786176 blocks
Apr 18 19:45:24 backup-hosting systemd-fsck: boot: 336/128016 files (0.9% non-contiguous), 52471/510976 blocks
Apr 18 19:45:25 backup-hosting systemd-fsck: dune: 118992/655360 files (0.9% non-contiguous), 1452925/2621184 blocks

Comment 46 Eric Sandeen 2017-04-18 20:17:16 UTC
"broken_system_clock 1"

That's why the time-based checks are being skipped.

If that's not what you have in /etc/e2fsck.conf, I guess the next step is to figure out where it's coming from. The default value in the code, absent a different setting in e2fsck.conf, is "0", so why is yours showing up as set to 1?
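To show the effect of that setting, here is a minimal sketch of only the time-based part of the skip decision. It is loosely modeled on the check_if_skip debug output, not copied from e2fsprogs. The function name `time_based_reason` is hypothetical, and the real logic also looks at error state, clean-unmount state, mount counts and more. The inputs are the values from the backup-hosting log above, with the printed lastcheck of 0 taken at face value.

```python
from typing import Optional

SECONDS_PER_DAY = 86400


def time_based_reason(now: int, lastcheck: int, checkinterval: int,
                      broken_system_clock: bool) -> Optional[str]:
    """Return why a time-based check is due, or None if it would be skipped.

    Hypothetical sketch of the time-based branch only, not e2fsprogs code.
    """
    # With broken_system_clock set the clock is not trusted, so time-based
    # checks never trigger; a checkinterval of 0 disables them as well.
    if broken_system_clock or checkinterval == 0:
        return None
    if now < lastcheck:
        return "last checked time is in the future"
    elapsed = now - lastcheck
    if elapsed >= checkinterval:
        return f"has gone {elapsed // SECONDS_PER_DAY} days without being checked"
    return None


# Values from the backup-hosting log above.
now, lastcheck, checkinterval = 1492537524, 0, 31104000
print(time_based_reason(now, lastcheck, checkinterval, broken_system_clock=True))
print(time_based_reason(now, lastcheck, checkinterval, broken_system_clock=False))
```

With the flag set, the sketch returns None (skip) even though the interval has long expired. With the flag cleared, it reports a reason to check.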

In comment #6 you confirmed that it was off in the config file.  Is that true for this machine as well?

-Eric

Comment 47 Harald Reindl 2017-04-18 20:22:46 UTC
No. Who TF enables that on RHEL/CentOS, and with what reasoning? And WHY does it only prevent fsck for the rootfs and NOT for other filesystems? That's all a big mess.

cat /etc/e2fsck.conf
[options]
# This will prevent e2fsck from stopping boot just because the clock is wrong
broken_system_clock = 1
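For tracking down where the value comes from, the setting can also be read back programmatically. This is a minimal sketch that assumes the file stays as simple as the one above. Python's configparser stands in for e2fsck's own profile-format parser, so more elaborate e2fsck.conf syntax may not be handled. The hypothetical helper treats a missing setting as 0, the default Eric describes.

```python
import configparser

SAMPLE_CONF = """\
[options]
# This will prevent e2fsck from stopping boot just because the clock is wrong
broken_system_clock = 1
"""


def broken_system_clock_enabled(conf_text: str) -> bool:
    """Return the [options] broken_system_clock value from e2fsck.conf text.

    Sketch only: configparser is a stand-in for e2fsck's profile parser.
    A missing section or key falls back to False (the default of 0).
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getboolean("options", "broken_system_clock", fallback=False)


print(broken_system_clock_enabled(SAMPLE_CONF))  # True
```

The root filesystem is checked from the initramfs, so presumably the copy of e2fsck.conf inside it is what matters there. That is why the instructions in this thread pair any change to this file with an init[rd|ramfs] rebuild.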

Comment 48 Harald Reindl 2017-04-18 20:25:27 UTC
Or, to ask it a different way:

What idiotic logic lets "broken_system_clock" override /forcefsck anyway, when /forcefsck is hardly TIME-BASED, and why only for the rootfs?

Comment 49 Eric Sandeen 2017-04-18 21:15:07 UTC
OK, so set it to 0, and see if it works (even without forcefsck).

/I/ enabled broken_system_clock, because without it we had reports of systems failing to boot: fsck falls over, then systemd falls over, and then nobody knows how to get things going again. Yep, not going there. As of RHEL 7, timed checks are off by default in any case. So if you want to use them, turn them on, then change e2fsck.conf, and continue.

As for anything else, i.e. regarding /forcefsck, those are questions for the systemd folks.

If you touch /forcefsck and/or set fsck.mode=force on the kernel command line and it /still/ doesn't run a real fsck, that's a bug for systemd to solve. Try fsck.mode=force on your kernel command line and see what happens; after that, I could build another debug binary to see whether e2fsck is getting the force flag. (If it is, e2fsck /will/ run a full check.)

Comment 50 Harald Reindl 2017-04-18 21:19:54 UTC
Frankly, I do not care. When I do "touch /forcefsck" I expect ALL filesystems to be checked, and if that's not the case, something is going terribly wrong in Red Hat/Fedora/kernel/systemd land, and I don't give a damn which of the components is guilty. If I did, I would not use a DISTRIBUTION but would build all my components by hand (Linux From Scratch). As long as I don't do that, I expect basic things to work just fine.

Comment 51 Eric Sandeen 2017-04-18 21:29:48 UTC
It's fine if you don't care - but we will need to get the right people involved if you want the bug triaged and solved.

systemd already decided that /forcefsck is deprecated - I have no control over that decision.

I'm asking for your help in triage.  Please set broken_system_clock to 0, rebuild init[rd|ramfs], and see if your time-expired filesystem gets checked as you expect.

Then please set fsck.mode=force on the kernel cmdline, which is the New Systemd Way (tm), and see if it re-checks your filesystems. If so, then I think things are working as documented and advertised, and aside from system configurations that you would rather not make, there's no bug to fix here. If either of the above /doesn't/ work, we'll look further.

Comment 52 Harald Reindl 2017-04-18 22:19:53 UTC
As explained, "broken_system_clock" is the key, even though it has no business mattering when one *forces* the fsck. And the reason "/forcefsck is deprecated" is JUST that, if you assume the filesystem where you do the touch is compromised, the touch itself is a write access (I asked about that on the systemd-devel list); it's just a warning.

So my job is done. The remaining questions are why "/forcefsck" still depends on "broken_system_clock", which is *far* out of my scope, and what fool set this on RHEL7, obviously by default, in a virtual machine, which by definition has no broken clock.

Comment 53 Eric Sandeen 2017-04-19 00:21:40 UTC
/If/ the force options are truly sending the force option ("-f") to e2fsck, then the system clock options are /not even tested/ - full e2fsck will run, and all the other "should we skip?" checks (including the system clock setting) are completely ignored.  I don't know if the force options are sending "-f" to e2fsck, and I'm asking you to test that.
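The open question here is whether the force options reach e2fsck as "-f". As an illustration of what "was -f passed?" means, the hypothetical helper below scans an e2fsck-style argument list the way a getopt-style parser treats short options, including combined flags such as "-pf". The set of options that take an argument is an assumed subset, not e2fsck's real option string. The helper says nothing about what systemd-fsck actually passes; that is what the debug binary is meant to reveal.

```python
from typing import Sequence

# Assumed subset of e2fsck short options that take an argument (e.g. -b, -C);
# an illustration only, not e2fsck's full option string.
OPTS_WITH_ARG = set("bBCEjlL")


def force_requested(argv: Sequence[str]) -> bool:
    """Return True if an e2fsck-style argument list requests -f.

    Handles combined short options such as "-pf" or "-fy", and skips the
    arguments of options like "-C 0" / "-C0" so they are not read as flags.
    """
    i = 0
    while i < len(argv):
        arg = argv[i]
        i += 1
        if arg == "--":
            break  # everything after "--" is a non-option argument
        if not arg.startswith("-") or arg == "-":
            continue  # device name or other operand
        for pos, ch in enumerate(arg[1:], start=1):
            if ch == "f":
                return True
            if ch in OPTS_WITH_ARG:
                # If this option ends the token, its argument is the next
                # token ("-C 0"); otherwise the rest of the token is it ("-C0").
                if pos == len(arg) - 1:
                    i += 1
                break
    return False


print(force_requested(["-a", "-f", "/dev/sdb1"]))  # True
print(force_requested(["-a", "/dev/sdb1"]))        # False
```

Per the explanation above, when this is True the skip checks, broken_system_clock included, are never consulted.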

I understand that you are frustrated.  I'm now actively looking at this bug, and I'd like to resolve it.  I'm asking for your assistance in teasing apart these two behaviors.  As a user of community distros, you have the opportunity to be an active participant in this sort of debugging.  If you're unable to do that, I'll have to assume that the system is simply behaving according to its configuration, and I don't mind closing the bug again if we aren't able to make further progress.

OTOH if you are willing to run another test, this one will also print out whether or not the force flag was passed to e2fsck.

http://people.redhat.com/esandeen/reindl2/

Let me know 1) how the force options were set as well as 2) how the system clock option is set along with the results you get.  We can then see exactly what happens based on the 2 different configuration options.

Thanks,
-Eric

Comment 54 Michal Schmidt 2017-04-19 11:51:59 UTC
We cannot test for the presence of /forcefsck unless we mount / first. But we run systemd-fsck & fsck on it before mounting it. Therefore systemd-fsck does not see /forcefsck at this point.
There is no such chicken&egg problem when using fsck.mode on the kernel command line.
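The ordering problem can be modeled in a few lines. This is a minimal sketch, not systemd code. The function names are hypothetical; the accepted values auto, force and skip are the ones systemd-fsck documents for fsck.mode=. The sketch also assumes that whitespace splitting is enough (no quoted values) and that a later fsck.mode= overrides an earlier one.

```python
# Values systemd-fsck documents for fsck.mode=.
VALID_FSCK_MODES = {"auto", "force", "skip"}


def fsck_mode_from_cmdline(cmdline: str) -> str:
    """Return the fsck.mode= value from a kernel command line string.

    Simplified: splits on whitespace (no quoting support) and lets a later
    fsck.mode= override an earlier one (an assumption for this sketch).
    """
    mode = "auto"
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key == "fsck.mode" and sep and value in VALID_FSCK_MODES:
            mode = value
    return mode


def check_forced(cmdline: str, root_mounted: bool, forcefsck_on_root: bool) -> bool:
    """Would a forced check be requested for the filesystem about to be checked?"""
    if fsck_mode_from_cmdline(cmdline) == "force":
        return True  # the kernel command line is readable before / is mounted
    # /forcefsck lives on the root filesystem, so it is only visible once /
    # is mounted, i.e. when the *other* filesystems are checked.
    return root_mounted and forcefsck_on_root


# The root fsck runs before / is mounted; the others run afterwards.
print(check_forced("ro quiet", root_mounted=False, forcefsck_on_root=True))  # False
print(check_forced("ro quiet", root_mounted=True, forcefsck_on_root=True))   # True
```

This would be consistent with the log in comment 45. There, the root filesystem reported "clean" (skipped), while log, boot and dune printed full-check summaries with fragmentation figures.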

Comment 55 Harald Reindl 2017-04-19 12:20:52 UTC
It is a fact that after "broken_system_clock 0" the rootfs was checked.

For the tests in https://bugzilla.redhat.com/show_bug.cgi?id=1105877#c53, just give me a few days; that's a "half-production" machine running a mysqld slave and hourly backups of the production clone, so it needs some care.

In the meantime, regarding "I'm now actively looking at this bug, and I'd like to resolve it": it's an up-to-date CentOS 7, and maybe it's worth installing such a VM with the partitioning below (separate virtual disks instead of partitions), so you could try things yourself and get faster feedback by skipping the "here's a build, please install, feedback" loop.

[root@backup-hosting:~]$ df
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb1      ext4  4.7G  936M  3.8G  20% /
/dev/sdc1      ext4  2.9G   46M  2.9G   2% /var/log
/dev/sda1      ext4  476M   28M  444M   6% /boot
/dev/sdd1      ext4  9.8G  5.3G  4.5G  55% /tmp

Comment 56 Harald Reindl 2017-04-19 12:25:58 UTC
> There is no such chicken&egg problem when using 
> fsck.mode on the kernel command line

Yeah, but it's not much fun to change the bootloader config, because if you forget to remove it on a machine with large filesystems, the next routine reboot will hurt.

Doing it temporarily in the GRUB menu is not an option for many remote machines, which can't expose their iLO over the WAN, either because they don't have one or because the machine is itself the router in front of the network.

In fact, it needs something else, like "dracut -f --forcefsck", so that the flag file is in the initrd and is ignored when it's older than, say, 30 minutes, to avoid needing another "dracut -f" after it's done.

In short: what's needed is an easy-to-use "check filesystems once during the next reboot".
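The proposal boils down to a one-shot flag with an expiry. The following is a minimal sketch of that idea only. "dracut -f --forcefsck" is the commenter's proposed option, and the function below is a hypothetical check that a boot-time fsck wrapper could apply to such a flag. Nothing like it is claimed to exist in dracut or systemd.

```python
from typing import Optional

# "ignored when it's older than, say, 30 minutes" from the proposal above.
MAX_FLAG_AGE_SECONDS = 30 * 60


def one_shot_force(flag_mtime: Optional[float], boot_time: float,
                   max_age: float = MAX_FLAG_AGE_SECONDS) -> bool:
    """Honor a force-fsck flag only if it was created shortly before this boot.

    flag_mtime is the flag file's modification time (None if absent) and
    boot_time the time of the current boot, both in seconds since the epoch.
    A stale flag left in an old initrd is ignored, so no second rebuild is
    needed to clear it.
    """
    if flag_mtime is None:
        return False
    age = boot_time - flag_mtime
    # A negative age means a flag "from the future" (clock trouble);
    # treat that conservatively as not requested.
    return 0 <= age <= max_age


print(one_shot_force(1000.0, 1000.0 + 600))   # True: 10 minutes old
print(one_shot_force(1000.0, 1000.0 + 3600))  # False: an hour old, stale
```

This design leans on the system clock, the very thing broken_system_clock exists to distrust. A wrapper built this way would need to decide how to behave on machines whose clock is unreliable at boot.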

Comment 57 Eric Sandeen 2017-04-19 14:35:19 UTC
(In reply to Harald Reindl from comment #55)
> it is fact that after "broken_system_clock 0" the rootfs was checked 

Good.  That was my original suspicion.  So: no force options are needed, and modifying e2fsck.conf will allow the check to proceed if time-based fsck periods have been selected and have expired.  I'm pretty sure that resolves the original complaint in this bug.  (You had mentioned /forcefsck, but in fact the problem was an expired FS not getting checked, regardless of /forcefsck).

> for the tests in https://bugzilla.redhat.com/show_bug.cgi?id=1105877#c53
> just give me some days - that's a "half-production" machine running a mysqld
> slave and hourly backups of the production clone and so it needs some care
> 
> in the meantime for "I'm now actively looking at this bug, and I'd like to
> resolve it" it's a up-to-date CentOS7 and maybe it's worth to install such
> VM with the partitioning below (own vdisks instead partitions) so you could
> try some things and get faster feedback be skip the "here build, please
> install, feedback"-loop

Well, as far as /I/ know, and based on testing which I /have/ done, it's all working as designed, so I was curious to know why it wasn't working in /your/ environment, hence the test requests.

I see your point about /forcefsck being a nice, one-shot check method, but I would suggest filing a separate bug for that, as it is quite far outside the scope of this (now very long) bug.

/forcefsck, fsck.mode= and friends have nothing to do with the kernel or e2fsprogs; you'll need to file a bug with systemd/dracut/whatever on that one. You might consider bringing up the issue on the fedora-devel list as a suggestion, and see how a one-shot force might best be implemented.

Comment 58 Eric Sandeen 2019-12-13 14:54:23 UTC
Old bug, seems to have [d]evolved into a problem about /forcefsck vs. fsck.mode= and doesn't seem to be a bug within e2fsprogs itself, closing.

