Bug 203818

Summary:

System clock runs slowly or backwards on P4 systems under RHEL4u4.

Product:

Red Hat Enterprise Linux 4

Reporter:

andrew m. boardman <amb>

Component:

kernel

Assignee:

Brian Maly <bmaly>

Status:

CLOSED ERRATA

QA Contact:

Brian Brock <bbrock>

Severity:

high

Docs Contact:

Priority:

medium

Version:

4.4

CC:

andy.wood, johnstul, konradr, paul, rlocke, wdc

Target Milestone:

---

Target Release:

---

Hardware:

i686

OS:

Linux

Whiteboard:

Fixed In Version:

RHBA-2007-0791

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2007-11-15 16:14:58 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
lspci -vvv output from an affected machine	none
lspci -vvv from another affected machine	none
sysreport of system WHILE the clock is running backwards	none
sysreport of same hardware same bad kernel before failure	none
sysreport of same hardware but with kernel that works fine.	none
Sysreport with "clock=tsc"	none
apic=verbose dmesg output	none
messages output from restart with apic=verbose	none
dmesg output for kernel with working clock	none
messages output with working clock pickes up where previous messages leaves off.	none
test case in case its helpfull	none
Output of checktime.sh showing startup, and increasing skew after ~12 hours	none
patch to resolve the issue on ThinkCentre S50	none

Description andrew m. boardman 2006-08-23 20:33:36 UTC

Description: after a number of hours, the system clock stops proceeding at the
normal speed, and instead lurches slowly forward *and* backwards, though it
generally seems to make very slow forward progress.

Version-Release number: kernel versions 2.6.9-42.EL (RHEL 4 update 4) and
2.6.9-42.0.2.EL (the new post-4u4 errata kernel) both do this; earlier ones do not.

How reproducible:
Run RHEL4u4 on various P4-based models of the IBM ThinkCentre S50 p/n 8183
(2.8GHz and 3.2GHz models tested, specifically the 8183-LU3 and 8183-Y2E) and
wait anywhere from a few hours to a day or two after boot.  At some point, the
system clock stops.  The hardware clock is unaffected.

Comment 1 Robert Locke 2006-08-24 16:02:22 UTC

I am seeing the same thing on an IBM NetVista p/n 8311 (2.4Ghz P4).

I do not know if it is related, but the other thing I noticed as different
between the 2.6.9-22.EL kernel that works and the 2.6.9-42.EL and
2.6.9-42.0.2.EL that "stops", is the IRQ routing done by ACPI (is this a
regression caused by bugzilla 184254, perhaps?).  The old kernel was assigning
shared IRQ's of 11,10,9,5, while the new (42 and 42.0.2) kernel is assigning
IRQ's in the high 100's low 200's.  I noticed one line difference that might
pertain from /var/log/messages just before the listing of IRQs assigned:

Old, working  -> PCI: IRQ9 SCI: Level Trigger.
New, stopping -> ENABLING IO-APIC IRQs

Again, I don't know if the IRQ Routing is related to the real problem of the
system clock, but noticed the "difference" when investigating....

Comment 2 Andy Wood 2006-09-29 16:41:42 UTC

I too am seeing this behaviour on an IBM ThinkCentre running RHEL4.

Is this related to Fedora Core bug 127411 and bug 152170?

Comment 3 Andy Wood 2006-10-03 10:12:50 UTC

Specifically this is seen with both 2.6.9-5.EL and 2.6.9-42.0.2.EL kernals on
an IBM ThinkCentre.

cat /proc/cpuinfo gives model name : Intel(R) Pentium(R) 4 CPU 2.80GHz

Comment 4 Andy Wood 2006-10-11 15:17:46 UTC

The issue also occurs with Fedora Core 5 kernels 2.6.15-1.2054_FC5smp and 
2.6.17-1.2187_FC5.

Comment 5 Brian Maly 2006-10-11 15:24:09 UTC

what chipset is in the affected machines? i.e. can we get an 'lspci'?

also, what timesource is being used? HPET, PMTImer, TSC/PIT?

Comment 6 andrew m. boardman 2006-10-11 19:34:18 UTC

Created attachment 138273 [details]
lspci -vvv output from an affected machine

Requested output attached.

I'm not sure how to answer the "what timesource is being used" question.  The
problem manifests in what's reported by the kernel's timekeeping as reported by
gettimeofday().  (Via, say, running "date" repeatedly.)  As to whether the
kernel's using HPET, the PM timer, some combination thereof, or something else
entirely, I'm not sure how to check; no "clock=..." boot paramater was given,
though, so it should be running with default behaviour.  (The hardware clock I
reference that's actually working properly is the RTC.)

Comment 7 Andy Wood 2006-10-25 13:06:55 UTC

Created attachment 139344 [details]
lspci -vvv from another affected machine

This is lspci -vvv from my affected machine.

Comment 8 Andy Wood 2006-11-20 16:27:06 UTC

Is the status of this bug correct - as it still has NEEDINFO from amb - 
even though this has been provided?

Comment 9 Brian Maly 2006-11-20 20:38:39 UTC

Is cpuspeed scaling being used? If so, does the problem still occur with it
disabled.. i.e. if the CPU runs at a fixed frequency?

Also, can someone attach sysreports for both a working and broken kernel?

Comment 10 wdc 2007-01-29 21:01:50 UTC

Created attachment 146867 [details]
sysreport of system WHILE the clock is running backwards

We managed to get a sysreport while the clock was running backwards.

Comment 11 wdc 2007-01-29 21:03:38 UTC

Created attachment 146868 [details]
sysreport of same hardware same bad kernel before failure

This sysreport is taken of the same hardware and the same kernel as 11850,
but before the clock started running backwards.  We presume this sysreport
will be identical to 11850.

Comment 12 wdc 2007-01-29 21:05:49 UTC

Created attachment 146870 [details]
sysreport of same hardware but with kernel that works fine.

This is the sysreport of the same hardware, but running the older kernel
that does not exhibit the "clock runs backwards" failure.  We suspect you
will find this identical to the other two sysreports excepting for the
kernel version, and that pretty much should tell you exactly what the hardware
is, the configuration, and confirm that the kernel is at fault.

Comment 13 wdc 2007-01-29 21:10:12 UTC

Bad user interface!  No biscuit!
How am I suspposed to add in my comments AND say I'm providing the
requested information, when the "create attachment" link below VERY CAREFULLY
DESTROYS my COMMNET??

Anyway, here is your sysreport output!
It would have helped to know that sysreport was a command that existed, but that
we'd need to install the sysreport RPM before we saw it.  We never heard of it
before.

Comment 14 Brian Maly 2007-01-29 21:41:32 UTC

Looks like your using PMtimer with the bad/broken sysreport and TSC in the
good/working sysreport. Can you try booting with "clock=tsc" and see if it makes
a difference? The Intel ICH5 chipset has a few known timer issues.

Also, you might try disabling NTP and see if that make a difference.

Comment 15 wdc 2007-02-05 22:42:50 UTC

Created attachment 147416 [details]
Sysreport with "clock=tsc"

Comment 16 wdc 2007-02-05 22:48:38 UTC

I've just attached a sysreport from the test system running the unhappy kernel,
and with, we believe, the setting clock=tsc.

I've included the sysreport itself in the hopes that you can use it to determine
that we correctly set clock=tsc, because it made NO difference.  i.e. We logged
onto this host and generated the sysreport with the clock running backwards.

I GUESS we could try disabling NTP, but I'd like to know why there is any reason
to suspect that NTP is the cause, when it's the same NTP but a DIFFERENT kernel?

Comment 17 Brian Maly 2007-02-06 17:20:47 UTC

The use of "clock=tsc" was correct... the system booted using TSC as the timesource.

The reason NTP is suspect is due to some timekeeping code changes that relate to
NTP.

Also, if cpuspeed scaling (i.e. speedstep-centrino) is being used, try disabling
this and see if it makes a difference.

We dont seem to have this piece of hardware in-house, but I will look at the
kernel changes between .22 and .42 and see if anything looks suspect. I may need
some help testing in order to isolate which patch caused this regression.

The chipset in this system is known to be problematic, but not usually as broken
as in this case.

Comment 18 wdc 2007-02-12 18:42:17 UTC

FYI:

We were advised by someone to try booting with ACPI and APIC turned off,  to help narrow down 
whether or not the advanced power management functionality in the kernel is causing the variations in 
clock ticks.  

We followed these steps:

1)  Reboot
2)  Catch grub and prevent it from auto booting
3)  Highlight the newest kernel and hit 'e'
4)  Highlight the kernel line and hit 'e'
5)  At the end of the line add " noapic noacpi apic=off acpi=off"
6)  Hit enter, then 'b' to boot

And the result was a system whose clock PROPERLY kept time again.

Comment 19 wdc 2007-02-13 20:50:12 UTC

FYI: I've taken some time to learn about the different clock algorithms mentioned (HPET, PMTImer, TSC/
PIT).

Apparently PMTimer is what you're supposed to use to compensate for stuff happening with ACPI, and 
indeed if one runs TSC with ACPI enabled one should expect to see incorrect time.

We're gonna see what happens with HPET and PIT selected as the clock source, but I think we'll see the 
clock still run backwards because the root cause of this problem is that the code in the PMTimer routine 
no longer properly does the job of properly compensating for action by ACPI.

Since the older kernel was running TSC by default, perhaps the more accurate characterization of the 
situation is:

With the newer kernel, ACPI is enabled by default and the clock algorithm is correctly set to PMTimer to 
deal with the effects of ACPI.  Alas, PMTimer is not properly dealing, and the clock behaves the SAME as 
if TSC is running with no intervention to deal with the ACPI.

Comment 20 Brian Maly 2007-02-16 20:17:49 UTC

Can you narrow down which boot arg was the one that made a difference in Comment
#18. It would help to narrow down which piece of the kernel code is failing.

Comment 21 wdc 2007-02-22 19:02:23 UTC

From my reading, I learned that ACPI code is generally expected to be flaky and harder to get right
than the APIC code.  Figuring, "Ok, let's do a quick sanity check on the code we expect to be good,
let's turn off ACPI and let the good APIC code run and make sure that's not the source of the problem."

Guess what?

We booted with arguments noacpi acpi=off and the clock RAN BACKWARDS!
So what we expected to NOT be a source of problems IS a source.

We are now testing with arguments noapci apci=off in the hope that the clock will run forwards,
and that we've isolated the fault into the unexpected quadrant of APIC support.

Comment 22 wdc 2007-02-23 23:40:20 UTC

I think you will find the following VERY interesting.
Yesterday we booted with noapic apic=off
Today the clock is FINE!

I believe this means we may have isolated the fault to interaction with the APIC code.

Would you please confirm for me that experimentation with other clock algorithms is
probably not appropriate now?

Would you also suggest another step we can take to further help isolate the fault.

Comment 23 Brian Maly 2007-02-27 18:18:30 UTC

The root cause is almost certainly in the APIC code. The clock code is generic
and all systems would have exhibited this same problem (and this certainly would
have been fixed long ago). So can we do a little APIC debugging here?

First "apic=" is only used for debugging and likely isnt needed. Can you just
boot with the "noapic" flag (and no other flags) and see if the system still
keeps time?

Second, can you just boot the system with "apic=debug" or even "apic=verbose".
This should provide some usefull debugging info. Please attach any relevant
output to this Bug.

Third, try booting with "lapic" (and no other flags) instead and see if the
local APIC works better for keeping good time.

Comment 24 wdc 2007-03-01 22:13:11 UTC

Status report:

We ran for a day with just the "noapic" flag.  Clock was fine.
We ran for a day with just the "lapic" flag.  Clock was BAD.

I will start up the system with "apic=debug", and attach output that seems relevant.

Answers to the following clarifying questions would be helpful:

Is is sane to specify BOTH apic=debug and apic=verbose as options?
Over what time period should I plan to gather log output?
Are there any log files besides /var/log/messages that will be relevant?

Comment 25 Brian Maly 2007-03-01 22:41:58 UTC

apic=verbose is apic=debug, just more detailed output. In looking at the code
again here verbose is probably whats needed (instead of just basic debug level).

Gather any usefull log entries from boot until the system is completly up and
running. We are particularly interested in anything that pops up during the
boot/init process.

A copy of /var/log/dmesg is probably sufficient. Its worth setting
/etc/syslog.conf to log kern.* to /var/log/messages and include that file as
well (or just the portion since last boot, since the messages file can be huge).

Comment 26 Brian Maly 2007-03-01 22:54:06 UTC

Just to make sure I was clear enough in Comment #25, just boot with
"apic=verbose". I want to look and see what code path we are going down. A copy
of the same apic=verbose output from a working kernel would be very helpfull as
well (if possible). We can see if the code path differs between working & non
working. There are a few suspect patches I can isolate and remove + build a test
kernel if your willing to help test.

Comment 27 wdc 2007-03-01 23:04:00 UTC

Created attachment 149064 [details]
apic=verbose dmesg output

Comment 28 wdc 2007-03-01 23:04:55 UTC

Created attachment 149065 [details]
messages output from restart with apic=verbose

Comment 29 wdc 2007-03-01 23:07:06 UTC

Thanks very much for quickly answering my clarifying questions.
I was expecting I'd have to run the system over night before giving you the output you required.
Here you ARE!

I didn't muck with syslog.conf because it looked like it was doing the right thing.
What further information can I supply?

Comment 30 wdc 2007-03-01 23:18:25 UTC

Created attachment 149070 [details]
dmesg output for kernel with working clock

Comment 31 wdc 2007-03-01 23:19:09 UTC

Created attachment 149071 [details]
messages output with working clock pickes up where previous messages leaves off.

Comment 32 wdc 2007-03-01 23:25:32 UTC

I see our follow-ups crossed in the ether.
I've added attachments for the .9-34 kernel that works.  (Reminder:  Its the .9-43 kernel that has the 
problem.)

I don't know how much help comparing the working one will be because it looks like that version didn't 
DO any APIC clock work at all.  It used tsc, remember?  Anyway, let me know what else to do to help 
isolate this fault.

Comment 33 Brian Maly 2007-03-02 17:59:41 UTC

Re: Comment #32,

Just want to clear up some potential points of confusion...

The system didnt keep time even when using TSC as the timesource on the broken
kernel correct (see Comment #16)? 

Im assuming here that the broken kernel wont keep time no matter which
timesource is used (HPET, PMTimer, TSC and PIT). Please verify assumption this
is correct.

Im guessing the problem is in the apic code though. Disabling apic resolved the
issue. Worth noting is that when TSC is used as a timesource, its actually using
PIT and TSC (and not pure TSC). PIT is used during the timer interrupt, and the
TSC offset is read during a non-interrupt, etc (i.e. gettimeofday() call). TSC
is used for performance reasons... its quick to read but has but less accurate,
hence the need to use PIT as the base and do a quick TSC read to calculate the
offset. TSC is scalable on MP systems like when you have a database with
thousands of threads all doing a gettimeofday() 



Also, thanks for the apic=verbose outputs.. Ill take a look at them.

Comment 34 Brian Maly 2007-03-02 18:16:09 UTC

Looked at the sysreports a bit more... So booting with "clock=tsc" on a bad
kernel doesnt fix the problem right? We have established so far that using
PMTimer (and apic timer) doesnt work. But what about "clock=pit"?

Im just a bit confused here on what works and what doesnt (gone around in
circles too many times due to the obscurity of this problem). 

Does a broken kernel fail to keep time regardless of which clock is used? Or
does it acutally keep time with TSC?

Comment 35 wdc 2007-03-02 22:50:24 UTC

Booting with "clock=tsc" on a bad kernel does NOT fix the problem.  We tried that early on.
We were going to try other time bases, but were unsure if that mattered.

I will run with "clock=pit" over the weekend and report results on Monday.
Is "clock=hpet" the other option to specify?

Comment 36 Paul J Barrett 2007-03-02 23:59:15 UTC

I am also running a 3.0ghz IBM that is having the same problem. I have only
added the noapic to grub.conf on the kernel lines and that seems to have fixed
it at this point.

I would also add that shutdowns where "VERY" slow before. It would almost appear
to hang. Even just issuing simple commands (top, ls) would lag greatly.

Comment 37 wdc 2007-03-05 20:24:27 UTC

I ran with "clock=pit" over the weekend.  BAD CLOCK! No biscuit!  It still thinks it's friday.
So we now know that the following do NOT keep time with apic enabled:
    pmtimer
    tsc
    pit

I will run with "clock=hpet" and report on those results tomorrow or so.

Comment 38 Brian Maly 2007-03-05 20:39:33 UTC

Thanks for the debugging info. Im able post a test kernel tarball (with suspect
patches removed from the build) Are you able and willing to build and install a
kernel and do some simple testing (boot and check time, etc)? Since this is
almost certainly APIC related, I wanted to determine which APIC patch is causing
the issue. At least then we will know why this is broken and can come up with a fix.

Comment 39 Brian Maly 2007-03-05 20:42:22 UTC

Created attachment 149291 [details]
test case in case its helpfull

Im attaching this script in case its helpfull in detecting the time skew faster
or more easily.

Run 'hwclock -s" to sync RTC and sysclock before running the checktime script.
RTC is used as a reference time source, hence the need to sync.

Comment 40 wdc 2007-03-05 20:55:47 UTC

Thank you for the test script.  It will be helpful in reducing the turn-around time on tests.

I've been running the tests while the engineer who normally builds and installs kernels
has been on vacation.  Since that engineer doesn't actually report to me, I don't know
when he's scheduled to return.

I'm out of practice building kernels, and I don't want to mess up Andrew's infrastructure.
With the caveat that there will be some delay though, YES we are INDEED willing to install,
and if necessary build, kernels with bits and pieces in or out that you supply.

So yes, please do supply your test kernel tarball.

If you have multiple patches for us to fiddle with, please make explicit exactly what you want in/out
and in what order you'd like us to test the various kernels.

Comment 41 wdc 2007-03-05 21:07:13 UTC

I just attempted to use your checktime utility.
Alas, though ksh may be your default shell, and something you assume everyone has installed,
it is not installed on our (usually pretty heavily tricked out) default configuration.

It's inconvenient for me to track down the ksh rpm and install it.
And I'm not up to speed on the differences between ksh and sh or csh.

Would it be terribly inconvenient for you to provide a checktime.sh script based on sh or csh?

Sorry.

Comment 42 Brian Maly 2007-03-05 22:00:22 UTC

ksh is bash compatable. just change line 1 of script to #!/bin/bash and run.

Ill pull some patches and post a kernel tarball and provide the URL in Bugzilla.

Comment 43 wdc 2007-03-06 18:48:17 UTC

Status report:

I ran off yesterday with clock=hpet running but without trying your amended script.  (We'll use that with 
the kernel stuff.)

clock=hpet ALSO FAILS to keep correct time.  (We expected this, and it's the last "due diligence test".)

The engineer who would build kernels is back from vacation.  I'll be meeting with him later today to 
discuss this.

So as soon as you post the kernel tarball, we'll be ready to take the next step.

Comment 44 Brian Maly 2007-03-06 20:56:08 UTC

First thing to try determine is when this regression occured. Comment #1 seems
to indicate this worked in RHEL4U2 but was broken in RHEL4U4. So does the
RHEL4U3 kernel work? I need to determine if it was the U3 or U4 patchset that
broke this.

Can you install this kernel RPM and see if the system keeps time:
http://people.redhat.com/bmaly/kernel-2.6.9-34.EL.i686.rpm

From there, Ill pull suspect patches out and have you build a kernel from a
tarball which Ill post. We can then isolate which patch is the culprit, then
figure out why it broke.

Comment 45 wdc 2007-03-06 22:43:36 UTC

We have installed the kernel RPM you've provided, and are running the checktime utility.
Two things to note:

1. Under bash, numbers with a leading zero are interpreted base 8, so at all times where the
leading zero is present in the output of date, we lose.  The script blows out because the 
comparison is out of range. Andrew fixed the problem by changing
all the arithmetic expressions to include "#10" to force base 10.  (Gotta love those shells,
everyone should write their own shell, and make it as close as possible, but NOT the same
as the existing shells.)

2. The clock failure, in a failing kernel does not happen right away.  Maybe the clocktest script will
notice exactly when it happens, but it DOES take a while before the clock quits working.

At any rate, we're running the kernel you've provided.  Since it's 2.6.9.-34, like the working kernel
we reported on in the sysreports we filed as attachments, we expect this one will keep time just fine.
We'll know for sure tomorrow, I expect.

Comment 46 wdc 2007-03-07 22:17:40 UTC

Status update:

We ran the 2.6.9-34 kernel provided.  As expected it kept time perfectly well.

Additional information:

When running tests, expect that it will be some time before the problem manifests.
It may be that at around midnight, the clock simply STOPS.  Monitoring with the clocktest script
will give us more exact insight.

Comment 47 wdc 2007-03-09 21:02:01 UTC

Last night we ran the 2.6.9-42.0.3 kernel with the checktime program.  I will attach the output of 
checktime.

We rebooted the system on 8 March 2007 at 14:27PM.
The clock ran fine until 9 March 2007 at 04:14:50.
At that point the divergence was monotonically increasing.
Note that 04:15 AM is within half an hour of an uptime of 12 hours.

This suggests a couple questions to me:

Is there something that gets done in advance of 12 hours of uptime?
Is there something that gets run out of cron at around 4:15 AM related to time keeping?

Comment 48 wdc 2007-03-09 21:03:10 UTC

Created attachment 149733 [details]
Output of checktime.sh showing startup, and increasing skew after ~12 hours

Comment 49 Brian Maly 2007-03-26 17:07:15 UTC

This regression first appeared in 2.6.9-34.15 kernel.

I will determine which of the 7 patches that we in .15 that caused the breakage.

Comment 50 Konrad Rzeszutek 2007-04-27 18:15:04 UTC

Can you provide the /proc/interrupts with the kernel that halts and with the
2.6.9-34 one? I am curious to see if the 2.6.9-34 ended up using the LAPIC or
the PIC controller.

Those IBM boxes have a firmware bug where the SMI firmware when exiting screws
up with the timer. One solution is to not use IO APIC, which is what the
2.6.9-34 Uni Processor kernels did, or just disable it using noapic (which
disables the IO APIC init routines).  I've been trying inside IBM to get them to
fix this, but no luck so far.

What I think we can do for RHEL4 U6 is to hard-code the 'noapic' parameter in
the kernel so that when it detects these machines it sets that. For that, we
need the output of 'dmidecode'.


Brian, the possibility of this regression was mentioned by me when submitting
the fix for RHEL4 U4 and that the IBM ThinkCentre will be affected, but that
many other boxes would benefit from this. The solution that Jason agreed to was
to submit code for blacklisting those ThinkCentre to do _no_ IOAPIC
initialization  (which is what 2.6.9-34 did).

Comment 51 Brian Maly 2007-04-27 18:44:26 UTC

Im currently doing some testing, so I cant reboot with a known bad kernel, but
heres the /proc/interrupts for now. I will update this if after booting with a
known bad kernel, if it happens to differ.

           CPU0
  0:    7368715          XT-PIC  timer
  1:         10          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:          0          XT-PIC  uhci_hcd
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  acpi, Intel ICH5, ehci_hcd
 10:          0          XT-PIC  uhci_hcd
 11:      79432          XT-PIC  uhci_hcd, uhci_hcd, eth0, i915@pci:0000:00:02.0
12:         92          XT-PIC  i8042
 14:      10801          XT-PIC  ide0
NMI:          0
ERR:          0

Comment 52 Brian Maly 2007-04-27 18:49:54 UTC

I dont want to go blacklisting systems if possible, its hard to maintain. It may
be a more generic chipset issue which would be more practical to address. So I
will need to determine how the code changed when .config options were modified.
I saw some #ifdef CONFIG_SMP related bits upstream that might need backporting
(to remove bits of code from the UP kernel). And yeah, this is a UP kernel.


I seem to remember CONFIG_X86_UP_IOAPIC and CONFIG_X86_UP_APIC also being
related to some NMI code as well.. wonder if thats related.

Comment 53 Brian Maly 2007-04-28 18:21:27 UTC

Just an update on this issue. The change in the -34.15 kernel that caused this
regression was a change to config-i686. Setting CONFIG_X86_UP_IOAPIC=y breaks
timekeeping on this system, unsetting this config option causes the box to run fine.

I will explore why this change broke timekeeping and if there are any upstream
fixes that can be backported to resolve this issue.

Comment 54 Brian Maly 2007-05-01 15:17:30 UTC

I was able to narrow down which lines of code cause the regression. In
arch/i386/kernel/time.c in function do_timer_interrupt():

#ifdef CONFIG_X86_IO_APIC
        if (timer_ack) {
                /*
                 * Subtle, when I/O APICs are used we have to ack timer IRQ
                 * manually to reset the IRR bit for do_slow_gettimeoffset().
                 * This will also deassert NMI lines for the watchdog if run
                 * on an 82489DX-based system.
                 */
                spin_lock(&i8259A_lock);
                outb(0x0c, PIC_MASTER_OCW3);
                /* Ack the IRQ; AEOI will end it automatically. */
                inb(PIC_MASTER_POLL);
                spin_unlock(&i8259A_lock);
        }
#endif


With this bit of code removed, the regression dissapears. It was enabled with
the config file changes that went into -34.15.

Comment 55 Brian Maly 2007-05-01 15:30:33 UTC

So the regression dissapears when timer_ack=0. timer_ack is set in
arch/i386/kernel/io_apic.c in check_timer(). timer_ack=0 happens if
timer_irq_works() returns 0. So seems like timer_irq_works() might be a place to
fix this possibly. I would like to see if there are other changes in io_apic
that might cause the timer_irq breakage, as thats a better fix if its possible.

Comment 56 Konrad Rzeszutek 2007-05-01 15:48:41 UTC

If I remember correctly, the timer_ack is set when probing the timer via IO APIC
or via the legacy timer, which is determined if the board IRQs are initialized
via APIC or the legacy timer, which in turn is determined by probing the ACPI
tables provided by the BIOS . Or when CONFIG_X86_IO_APIC is not set, then using
the MP tables to set the IRQs and also using the legacy timer instead.

Comment 57 Brian Maly 2007-05-01 16:01:26 UTC

So you think it might be a BIOS issue or perhaps ACPI breakage (wont process the
BIOS info correctly)?

The affected system uses an Intel ICH5 chipset. Its known to be a rather buggy
chipset, but I need to find a different system that has this chipset to test on.
I think we need to figure out if this problem affects all systems with ICH5 or
just this specific model of Dell.

I do see ACPI parser errors on this system for what its worth.

Comment 58 wdc 2007-05-01 17:44:11 UTC

Quick factual correction:  This is NOT a Dell system.
It is an IBM S50 system.

-wdc

Comment 59 Brian Maly 2007-05-01 17:48:58 UTC

Not a Dell. Ok, I must have confused this with a Dell related bug.

But some Dell's do use this chipset, so Ill try and round one up.


Konrad, can we determine if there were any BIOS updates from IBM that might
address this issue? Is this a known issue to IBM, does it affect any other
models too?

Comment 60 Konrad Rzeszutek 2007-05-01 17:52:14 UTC

John,

Is this related to the SMI issues you had seen sometime ago?

Brian,

I believe that there were no updates from IBM for those boxes as they have
reached end-of-life.  This issue does not affect other models (John, please
correct me if I am wrong)

Comment 61 Brian Maly 2007-05-01 18:54:19 UTC

So its my hypothesis that timer interrupts stop because the system hangs in
do_timer_interrupt(). Its either because of a deadlock issue with
spin_lock(&i8259A_lock), or more likely a hang in writing/reading the registers:

outb(0x0c, PIC_MASTER_OCW3);
inb(PIC_MASTER_POLL);

I wonder what happens if a SMI occurs while doing this write/read?

Strangely this code works for about 12 hours before timer interrupts stop. 
Im also exploring the possibility that this is NMI related just to rule that out.

Comment 62 Brian Maly 2007-05-01 19:30:00 UTC

So I have a test running to test the deadlock theory. I will know by tomorrow if
thats the issue. Its easier to test than if a write/read completed.

Comment 63 john stultz 2007-05-02 18:43:47 UTC

(In reply to comment #60)
> Is this related to the SMI issues you had seen sometime ago?

It does look like it. The problem w/ IBM ThinkCentres was a BIOS SMI would muck
up the apic interrupt frequency, so time would move very slowly. A BIOS update
could fix it, but not all affected systems got BIOS updates. I believe booting
w/ noapic is the workaround for systems that do not have a BIOS update available.

There is also a kernel side patch (never accepted into mainline) that can be
found  attached to the following bugme bug:
http://bugzilla.kernel.org/show_bug.cgi?id=6296

The patch has been around a few years and has not gone into mainline, so there
may be issues using it more widely. I'm really not sure.

Comment 64 Brian Maly 2007-05-07 17:33:41 UTC

Any way we could get a dmidecode output for the affected IBM NetVista system?

Comment 66 Brian Maly 2007-05-07 17:37:57 UTC

Created attachment 154286 [details]
patch to resolve the issue on ThinkCentre S50

Comment 67 Brian Maly 2007-05-07 17:39:23 UTC

Putting this back in NEEDINFO state... See Comment #64, need dmi strings for
other affected models.

Comment 69 RHEL Program Management 2007-05-09 16:01:52 UTC

This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 70 wdc 2007-05-09 16:32:43 UTC

I just wanted to say that Brian Maly worked really hard to understand this bug, and exceeded my 
expectations in his effort level and in his effectiveness at moving the issue forward.  It's been a very long 
time since I've seen anybody do this well with a bug.  THank you VERY much Brian!

-Bill Cattey
Linux Platform Coordinator
Massachusetts Institute of Technology
Cambridge MA

Comment 76 Jason Baron 2007-07-29 13:19:16 UTC

committed in stream U6 build 55.25. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 79 errata-xmlrpc 2007-11-15 16:14:58 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html