Bug 743696

Summary: wtmp is being corrupted at shutdown.
Product: [Fedora] Fedora Reporter: Norman Smith <nls1729>
Component: systemdAssignee: systemd-maint
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 15CC: 2mstntn2, brovvnout+rh, elosery, fedoration, harald, johannbg, kay, lpoetter, maurizio.antillon, metherid, mschmidt, notting, plautrba
Target Milestone: ---Keywords: Patch
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-26-14.fc15 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-30 20:58:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
From the test system after first shutdown and re-start.
none
Dump of wtmp entries.
none
wtmp entries F14 compared to F15
none
Proposed patch to utmp-wtmp.c systemd-26-1.fc15.src.rpm
none
Proposed patch to utmp-wtmp.c systemd-26-1.fc15.src.rpm none

Description Norman Smith 2011-10-05 18:47:15 UTC
Created attachment 526558 [details]
From the test system after first shutdown and re-start.

Description of problem:

wtmp is being corrupted at shutdown.

I installed chkrootkit and ran it on three systems: (2) Fedora 14 and (1) Fedora 15.  chkrootkit reported "Searching for Suckit rootkit... Warning: /sbin/init INFECTED" on all three systems.  This appears to be a false positive (I did several verifications that all indicated no Suckit.  I think ckrootkit is not all that up to date).  

On the Fedora 15 system chkrootkit reported MANY "Checking `wted'...'  "5 deletion(s)".  I examined my log files and found that each  Checking `wted'... "5 deletion(s)" date and time corresponded to a shutdown.

I set up a test system and did a fresh install of Fedora 15.  I installed chkrootkit.  I ran chkrootkit and found no "deletion(s)".  I shutdown and rebooted.  chkrootkit reported one "5 deletion(s)"  

The time of the "5 deletion(s)" corresponds to the shutdown time.

From the test system after a fresh install of Fedora 15 and after first normal shutdown.

Checking `wted'... 5 deletion(s) between Wed Oct  5 12:43:57 2011 and Wed Oct  5 12:44:09 2011

[admin@opusrex ~]$ su -c "grep rsyslogd /var/log/messages | grep exiting"
Password: 
Oct  5 12:44:06 opusrex rsyslogd: [origin software="rsyslogd" swVersion="5.8.5" x-pid="1023" x-info="http://www.rsyslog.com"] exiting on signal 15.


Version-Release number of selected component (if applicable):

systemd-26-9.fc15.x86_64
systemd-units-26-9.fc15.x86_64
systemd-sysv-26-9.fc15.x86_64

How reproducible:

Every time for me..

Steps to Reproduce:
1.  Do fresh install of Fedora 15.
2.  Install chkrootkit.
3.  Reboot.
4.  Run chkrootkit.
5.  Check time of rsyslogd exit on shutdown in /var/log/messages with wted
    deletion(s) message from chkrootkit.
  
Actual results:

wtmp has been buggered.  

Expected results:

wtmp should not be buggered.

Additional info:

I suspect wtmp is not being closed properly and flushed to disk at shutdown.  This may not have anything to do with systemd directly but the management of system shutdown is in systemd's area.

Comment 1 Michal Schmidt 2011-10-06 07:34:51 UTC
(In reply to comment #0)
> chkrootkit reported "Searching for Suckit rootkit... Warning:
> /sbin/init INFECTED" on all three systems.  This appears to be a false
> positive
Correct. This part is bug 636231.

Comment 2 Lennart Poettering 2011-10-11 01:05:56 UTC
I am not sure I understand what exactly this bug is about? Are you suggesting systemd deletes data from wtmp?

Well, all systemd does there is call updwtmpx(_PATH_WTMPX, ...), i.e. a libc function.

What is chkconfig actually checking there? How does it detect that wtmp lost entries? I.e. where does that information come from? And what does rsyslog have to do with it? rsyslog is not in the business of interfering with wtmp at all.

Comment 3 Jonathan Wakely 2011-10-11 16:34:15 UTC
chkrootkit comes with a util called chkwtmp, which opens wtmp and does:

while (read (filehandle, (char *) &utmp_ent, sizeof (struct utmp)) > 0) {
    if (utmp_ent.ut_time == 0)
        del_counter++;
    else {
        if (del_counter) {
            printit(del_counter, start_time,
                utmp_ent.ut_time);
            t_del++;
            del_counter=0;
        }
        start_time=utmp_ent.ut_time;
    }
}

Comment 4 Jonathan Wakely 2011-10-11 16:38:13 UTC
rsyslogd is not implicated in the bug, the OP is simply using rsyslog to obtain shutdown times and showing they correlate with the "deleted entries" detected by chkrootkit

Fedora ships chkrootkit, which reports problems possibly indicating a rootkit on a fresh install.  Ignoring the possibility that Fedora installs a roottkit by default, either chkrootkit is buggy and gives false positives, or something in Fedora corrupts wtmp in such a way to make chkrootkit report problems.

Comment 5 Norman Smith 2011-10-11 18:56:57 UTC
RE Comment 3 - This is a very simple check for consistency, no 0 times followed by non 0 times in the utmp entries of wtmp, It is not really a bug but it is not the best check of wtmp. chkrootkit has not been updated recently but is still used. rkhunter does not report the suspected deletion(s).  I haven't looked at rkhunter's sources but it apparently is smart enough to ignore corruption in the file.  It's a value judgement to not report corruption.

RE Comment 4 - Right on. Right on. But I don't think chkrootkit is buggy. It's just more simplistic.

------------
I composed the following before I saw Mr. Wakely's comment 3 and 4 and I hope the duplications are not a problem.
---------

> I am not sure I understand what exactly this bug is about?

wtmp is often targeted by hackers to cover their tracks after an intrusion. An indication wtmp has been changed improperly is a sign the system may have been hacked. The bug report is about the relability of wtmp in F15 (and possibly F16).
  
> Are you suggesting systemd deletes data from wtmp?

No.  

I observed chkrootkit (an old program for checking if a system has been rootkited) reports that apparent deletions from wtmp have occurred in F15.  

F14 does not exhibit this behavior.  F15 leaves wtmp in an improper state after shutdown (atleast as determined by chkrootkit).  Between F14 and F15, there appears to have been changes in the way systemd manages the shutdown process.  

I suspect during the transition from F14 to F15, a needed "sync" has been omitted or is not being executed at the proper time leaving wtmp in an inconsistent state after shutdown.

An unlikely alternative is Fedora 15 has been hacked at the source.  With many eyes looking at Fedora 15 it is not likely the sources have been hacked, but not impossible. I cannot conceive of a hack that only shows up at shutdown.  I think the problem is likely in systemd's handling of shutdown in F15.

> Well, all systemd does there is call updwtmpx(_PATH_WTMPX, ...), i.e. a libc
> function.

systemd may have failed to administer last writes (sync) for wtmp before halting the system.  

I examined wtmp with the last command on four systems, two F14 and two F15. The entires indicate different logging at shutdown between F14 and F15 at normal shutdown.  I have only seen the "down" entry on F15 if you shutdown by pressing the power button. I checked an old ubuntu system and I see "down" on its wtmp log when normal shutdowns are done.  

From F14:
    
Admen    pts/1        :1.0             Sat Oct  8 20:59 - 20:59  (00:00)    
Admen    pts/1        :1.0             Sat Oct  8 18:01 - 18:01  (00:00)    
Admen    tty7         :1               Sat Oct  8 18:00 - down   (02:59)    
bogwan   pts/0        :0.0             Sat Oct  8 12:28 - down   (08:31)    
bogwan   pts/0        :0.0             Sat Oct  8 11:23 - 11:29  (00:05)    
bogwan   tty1         :0               Sat Oct  8 11:23 - down   (09:36)    
reboot   system boot  2.6.35.14-97.fc1 Sat Oct  8 07:57 - 21:00  (13:02)    
Admen    pts/0        :0.0             Fri Oct  7 20:55 - 20:55  (00:00)    
Admen    pts/0        :0.0             Fri Oct  7 18:17 - 18:17  (00:00)    
bogwan   pts/0        opusrex.newbarny Fri Oct  7 13:53 - 13:53  (00:00)    
bogwan   pts/0        opus.newbarnyard Fri Oct  7 13:15 - 13:16  (00:00)    
bogwan   pts/0        opus.newbarnyard Fri Oct  7 13:15 - 13:15  (00:00)    
Admen    pts/0        :0.0             Fri Oct  7 13:11 - 13:12  (00:01)    
Admen    pts/0        :0.0             Fri Oct  7 13:10 - 13:11  (00:00)    
Admen    tty1         :0               Fri Oct  7 13:10 - down   (07:45)    
reboot   system boot  2.6.35.14-96.fc1 Fri Oct  7 11:03 - 20:55  (09:52) 

From F15:

bogwan   pts/0        :0.0             Sat Oct  8 13:53 - 13:54  (00:00)    
bogwan   pts/0        :0.0             Sat Oct  8 10:25 - 12:34  (02:09)    
bogwan   pts/0        :0.0             Sat Oct  8 10:10 - 10:11  (00:01)    
bogwan   pts/0        :0.0             Sat Oct  8 09:29 - 09:35  (00:06)    
bogwan   pts/0        :0.0             Sat Oct  8 09:20 - 09:22  (00:01)    
bogwan   tty1         :0               Sat Oct  8 08:10 - 18:00  (09:49)    
reboot   system boot  2.6.40.4-5.fc15. Sat Oct  8 08:09 - 18:00  (09:50)    
bogwan   pts/1        :0.0             Fri Oct  7 17:49 - 17:49  (00:00)    
bogwan   pts/0        :0.0             Fri Oct  7 15:38 - 18:17  (02:38)    
bogwan   pts/0        :0.0             Fri Oct  7 15:24 - 15:36  (00:12)    
bogwan   pts/0        :0.0             Fri Oct  7 14:13 - 14:40  (00:26)    
bogwan   pts/1        opusrex.newbarny Fri Oct  7 13:52 - 13:53  (00:00)    
bogwan   pts/0        opusrex.newbarny Fri Oct  7 13:41 - 13:53  (00:11)    
bogwan   pts/1        :0.0             Fri Oct  7 12:24 - 13:16  (00:52)    
bogwan   pts/2        :0.0             Fri Oct  7 11:44 - 11:56  (00:12)    
bogwan   pts/1        :0.0             Fri Oct  7 11:41 - 11:56  (00:14)    
bogwan   pts/0        :0.0             Fri Oct  7 11:34 - 13:16  (01:42)    
bogwan   tty1         :0               Fri Oct  7 11:06 - 18:17  (07:10)    
reboot   system boot  2.6.40.4-5.fc15. Fri Oct  7 11:06 - 18:17  (07:11)

> What is  chkconfig  actually checking there?
> How does it detect that wtmp lost entries?
  
chkrootkit runs a series of small programs looking for evidence of hacking or the installation of a root kit.
  
chkrootkit runs the program chkwtmp which reads the wtmp file looking for evidence of deleted utmp entries in wtmp. wtmp has the same structure as utmp.  

I looked at chkwtmp.c. It simply looks for entries that have a 0 in the utmp time entry followed later by a utmp time entry of not 0.  A wtmp file with garbage (un-sync'ed) and re-opened written at bootup could trigger this detection. A consistent wtmp file should have no time entries of 0 or corruption. 

Quoted from an old hacking guide... http://freeworld.thc.org/papers/COVER-1.TXT.

"A small gimmick if you must modify wtmp but can't compile a source and no perl etc. is installed (worked on SCO but not on linux) : Do a uuencode of wtmp. Run vi, scroll down to the end of the file, and and delete the last 4 (!) lines beginning with "M" ... then save+exit, uudecode. Then the last 5 wtmp entries are deleted ;-)"

> And what does rsyslog have to do with it? 
> rsyslog is not in the business of interfering with wtmp at all.

Looking at the rsyslog exit messages in /var/messages gave me a way to correlate the deletion(s) reported by chkrootkit with shutdown.  rsyslog logs an exiting message when it exits at shutdown. The times in the wtmp messages (supposedly deleted) matched the exit times of rsyslog at shutdown.  

Since there was a 100% corretlation of the rsyslog exits and the deletion(s) messages the problem seems to be related to shutdown.
  
systemd manages shutdown. 

There were apparently changes between F14 and F15 in the way that systemd manages the shutdown process. I looked at the files in /etc/init.d and in /lib/systemd and /lib/systemd/system and found considerable differences between F14 and F15 in the shutdown area.  It appears something is messing up the end of wtmp at shutdown. As I said before I suspect there is a failure to sync wtmp before the system is halted, but that is only a guess.

Good Luck.

Comment 6 Aly 2011-10-12 20:54:47 UTC
(In reply to comment #2)
> I am not sure I understand what exactly this bug is about? Are you suggesting
> systemd deletes data from wtmp?
> 
> Well, all systemd does there is call updwtmpx(_PATH_WTMPX, ...), i.e. a libc
> function.
> 
> What is chkconfig actually checking there? How does it detect that wtmp lost
> entries? I.e. where does that information come from? And what does rsyslog have
> to do with it? rsyslog is not in the business of interfering with wtmp at all.

I have the same problem and if I dump the content of wtmp I get lines like

                                                 |Wed Dec 31 17:00:00 1969
                   |                                |8|tty2|    0|

The times before and after these entries coincide exactly with the times of the deletions and they all happen at shutdown. I am curious to know if Norman has something similar. The command I used to dump the wtmp is

/usr/sbin/dump-utmp /var/log/wtmp

Comment 7 Norman Smith 2011-10-12 22:28:46 UTC
Created attachment 527795 [details]
Dump of wtmp entries.

I have found more information about what is happening at shut down.  I modified the chkrootkit program chkwtmp to dump the good wtmp entry and the following five bad wtmp entries and the following good wtmp entry.  From what I see there is no corruption.  There is no sync problem. The wtmp entries appear to be related to the 5 agetty processes.  Something is different between F14 and F15 in handling the wtmp file between shutdown and start-up.  I have attached some of my raw data if anyone is interested.  I still think this is related to systemd but not as I first suspected.  I will continue tomorrow.

Comment 8 Norman Smith 2011-10-12 22:38:42 UTC
I wish I had checked my email.  I didn't read comment 6  until after I had made my attachment (ref comment 7).

from man wtmp

       #define DEAD_PROCESS  8 /* Terminated process */ 

       The first entries ever created result  from  init(8)  processing  init‐
       tab(5).   Before  an entry is processed, though, init(8) cleans up utmp
       by setting ut_type to  DEAD_PROCESS,  clearing  ut_user,  ut_host,  and
       ut_time   with  null  bytes  for  each  record  which  ut_type  is  not
       DEAD_PROCESS or RUN_LVL and where no process with  PID  ut_pid  exists.
       If  no empty record with the needed ut_id can be found, init(8) creates
       a new one.  It sets ut_id from the inittab, ut_pid and ut_time  to  the
       current values, and ut_type to INIT_PROCESS.

I think the init processing may now be different in F15 vs F14.  My wife called me to dinner. This will wait til tomorrow.

Comment 9 Norman Smith 2011-10-13 09:20:32 UTC
Created attachment 527911 [details]
wtmp entries F14 compared to F15

I have compared the wtmp files at shutdown and reboot on an F14 system and an F15 system using /usr/sbin/dump-utmp (Thanks Aly). See attachment.

The wtmp entries that are being found by chkrootkit on F15 are 5 agetty entries with 0 in the time field, ie. Wed Dec 31 19:00:00 1969.  
  
The agetty entries at shutdown for F14 contain a valid time entry.  

Please note that tty1 entry which went from LOGIN_PROCESS (6) to USER_PROCESS  (7) on F15 is handled properly at shutdown.  I have two sets of data for F15.  The first set of F15 data has only tty1 in USER_PROCESS (7) at shutdown. I logged in on tty4 before shutdown for the second set. The second set of F15 data has tty1 and tty4 in USER_PROCESS (7) at shutdown. In the second set tty1 is handled in a similar manner to F14 and has a valid time entry. Strangely tty4 in the second set has a 0 in the time field. In F15 it appears wtmp is not being updated for tty2, tty3, tty4, tty5 and tty6 in the same manner as F14.

wtmp in F15 is handled differently and inconsistently as compared to F14. This still looks like a bug.

Comment 10 Norman Smith 2011-10-13 09:33:57 UTC
Comment on attachment 527911 [details]
wtmp entries F14 compared to F15

Note that the runlevel wtmp entry is missing in F15 at shutdown in the "wtmp entries F14 compared to F15" attachment and the pids are 0 for tty2-6 entries.

Comment 11 Norman Smith 2011-10-14 20:29:57 UTC
This bug is not confined to shut down.  In F15 when a getty dies, init (systemd) is creating a wtmp entry with a ut_type = DEAD_PROCESS (8), ut_pid = 0, ut_line = tty1-n and the (time) ut_tv = 0. 

In FC14 when a getty dies init (upstart) creates a wtmp entry with a ut_type = DEAD_PROCESS (8), ut_pid = the dead getty's pid, the ut_line = tty1-n and the (time) ut_tv = current time.

The behaviour of F15 is definitely inconsistent.  All other F15 wtmp DEAD_PROCESS entries are treated as in F14. But a gettys obituary has no true identity and no time of death.  It seems the philosophy of wtmp logging is to account for a process beginning and a process ending.  If it is reasonable to account for the getty as a LOGIN_PROCESS when it starts it would seem reasonable to account for the getty when it becomes a DEAD_PROCESS.

I have all kinds of data showing any time a getty dies on F15 it's death is entered into wtmp as I have described.

Log in to a virtual terminal and logout.  
kill -s SIGHUP a getty's pid.
shutdown.
reboot.
etc.. etc...

You will get a bad wtmp entry for the DEAD_PROCESS every time.

I think systemd is doing some cool stuff but old init scripts are a lot easier to debug than program source.  I will continue to poke around but I suspect someone who knows the inner workings of systemd will find the problem long before me.

Comment 12 Norman Smith 2011-10-17 02:08:35 UTC
Created attachment 528422 [details]
Proposed patch to utmp-wtmp.c  systemd-26-1.fc15.src.rpm

After many false starts I think I have this problem identitifed correctly and a patch that fixes it.

The updates and appends to utmp and wtmp are performed by systemd in utmp-wtmp.c.  The logic in the original code does not conform to the description in man UTMP(5).  From my research it appears that utmp and wtmp in unix-like systems can pretty much do what they want with these files.  The bug in the systemd code is that it did not do a true update to utmp and it just appended an incomplete utmp entry to wtmp.

I assume that the earlier code in F14 handled utmp and wtmp is a commonly accepted manner.  I did not analyze the code very much in F14 since it is legacy now.  I did try to change the F15 systemd code to perform as described in UTMP(5) and to produce utmp and wtmp entries that correspond to F14 behavior.

I will keep a test system set-up on a hard drive for future testing in case anyone finds a flaw in the patch.  I had more trouble getting a build of systemd from the sources that would boot than any thing else.  I'm sure there is documentation somewhere on how to configure and build systemd from the rpm source but I missed it.  

I finally did a configure:

./configure --with-distro=fedora

Then I changed the Makefile:
 
prefix = /usr/local to /

and

change rootdir = / 

After that I could make, make install, and make uninstall and still have a runable system.

The patch is based on systemd-26-1.fc15.src.rpm.

Comment 13 Norman Smith 2011-10-17 12:46:50 UTC
Created attachment 528527 [details]
Proposed patch to utmp-wtmp.c systemd-26-1.fc15.src.rpm

Properly checked patch as content type.

Comment 14 Michal Schmidt 2011-10-17 14:12:08 UTC
Norman,
thank you for debugging this and for posting the patch.

I have just a couple of technicalities:
 - The content-type of a bugzilla attachment can always be changed. It is not
   necessary to re-upload.
 - It is customary (in Fedora and in almost every open source project nowadays)
   to exchange patches in the unified format (diff -u).

I don't really know much about utmp/wtmp, so I'll let Lennart comment on the patch itself.

Comment 15 Fedora Admin XMLRPC Client 2011-10-20 16:29:09 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 16 Michal Schmidt 2011-11-07 00:15:18 UTC
Applied a modified fix upstream:

http://cgit.freedesktop.org/systemd/commit/?id=4743137a4b7ce6214a06d02872bdfac080b6f131
(+ 3 patches before that)

Comment 17 Fedora Update System 2012-01-11 15:01:12 UTC
systemd-37-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/systemd-37-6.fc16

Comment 18 Fedora Update System 2012-01-11 20:57:17 UTC
Package systemd-37-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-6.fc16
then log in and leave karma (feedback).

Comment 19 Michael Stanton 2012-01-12 20:17:02 UTC
I have this problem in F16 too, Bug 636231 - /sbin/init INFECTED - (systemd links /sbin/init->../bin/systemd) has more information. Perhaps these two bugs could be merged?

Comment 20 Michal Schmidt 2012-01-13 10:55:26 UTC
No, bug 636231 is purely a chkrootkit bug where it reports false positive about "Suckit". It is not related to utmp/wtmp handling in systemd.

Comment 21 Fedora Update System 2012-01-16 02:24:51 UTC
Package systemd-37-7.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-7.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-7.fc16
then log in and leave karma (feedback).

Comment 22 Fedora Update System 2012-01-17 13:13:53 UTC
systemd-26-14.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/systemd-26-14.fc15

Comment 23 Fedora Update System 2012-01-17 20:22:39 UTC
Package systemd-37-8.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-8.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0409/systemd-37-8.fc16
then log in and leave karma (feedback).

Comment 24 Fedora Update System 2012-01-30 20:58:34 UTC
systemd-37-11.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 25 Fedora Update System 2012-01-31 22:05:35 UTC
systemd-26-14.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.