Bug 156608

Summary:	[RHEL3 U4] The system clock gains much time when netconsole is activated.
Product:	Red Hat Enterprise Linux 3	Reporter:	Issue Tracker <tao>
Component:	kernel	Assignee:	Dave Anderson <anderson>
Status:	CLOSED ERRATA	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.0	CC:	jmoyer, linville, petrides, tao
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	RHSA-2005-663	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-09-28 15:01:29 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	156321

Description Issue Tracker 2005-05-02 14:01:01 UTC

Escalated to Bugzilla from IssueTracker

Comment 9 Dave Anderson 2005-05-02 21:04:50 UTC

Adding Jeff Moyer to cc: list in hopes he can help explain this one.

Jeff, 
  
zap_completion_queue() is called in both the netconsole and netdump
code paths AFAICT.  This code, then, could seemingly wreak havoc,
i.e., the jiffies bump below (and what happens if they set idle_timeout?):

        if (idle_timeout) {
                if (t0) {
                        if (((t1 - t0) >> 20) > mhz_cycles * (unsigned long long)idle_timeout) {
                                t0 = t1;
                                printk("netdump idle timeout - rebooting in 3 seconds.\n");
                                mdelay(3000);
                                machine_restart(NULL);
                        }
                }
        }
        /* maintain jiffies in a polling fashion, based on rdtsc. */
        {
                static unsigned long long prev_tick;

                if (t1 - prev_tick >= jiffy_cycles) {
                        prev_tick += jiffy_cycles;
                        jiffies++;
                }
        }

Since the code has always been like this, what am I missing?

Comment 10 Dave Anderson 2005-05-02 21:10:34 UTC

setting back to kernel...

Comment 11 Dave Anderson 2005-05-02 21:14:00 UTC

Jeff, looks like those two if statements need a netdump_mode check?

Comment 12 Dave Anderson 2005-05-03 12:50:40 UTC

The user-land mhz argument passed to the netconsole module is basically
ignored unless, during module load, the two successive tsc reads taken
with an mdelay() in between happen to straddle a tsc wraparound:

        platform_timestamp(t0);
        mdelay(1);
        platform_timestamp(t1);

In other words, if t1 > 0, mhz is completely ignored.  So let's put
that issue out of the picture.

The question is whether netconsole should be doing anything at all
with jiffies during runtime.  Doing an alt-sysrq-t operation with
thousands of processes, or simply repeated keyboard-generated attempts
(instead of echoing to /proc/sysrq-trigger), is essentially one huge
interrupt handler.  I don't know what the author's intent was -- to
"help" jiffies along, or whether it was meant to only do so in a netdump
operation?

What would happen, say, if a 9600-baud serial console were hooked up,
without netconsole registered, where a single alt-sysrq-t on a system
with thousands of processes could consume several minutes?
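
To put rough numbers on the serial-console case, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per byte); the helper name serial_drain_seconds() and the output size used below are illustrative guesses, not measurements from this bug.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: whole seconds needed to push `bytes` of console
 * output through a serial line running at `baud` bits per second.
 * Assumes 8N1 framing, i.e. start bit + 8 data bits + stop bit = 10 bits
 * per byte.
 */
static uint64_t serial_drain_seconds(uint64_t bytes, uint64_t baud)
{
    assert(baud > 0);           /* avoid dividing by zero */
    return (bytes * 10) / baud; /* integer division, rounds down */
}
```

For example, if the task dump came to about 600,000 bytes (say 3,000 tasks at roughly 200 bytes each, both numbers assumed), serial_drain_seconds(600000, 9600) gives 625 seconds, i.e. a bit over ten minutes.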

It should also be kept in mind that alt-sysrq-t is a debug strategy,
not something that should be done in the normal course of events.
Furthermore, using /proc/sysrq-trigger does the operation in process
context so clock interrupts wouldn't be blocked.

Comment 13 Dave Anderson 2005-05-05 13:28:42 UTC

Ok, the fix will be to make this simple change to zap_completion_queue():

        /* maintain jiffies in a polling fashion, based on rdtsc. */
-       {
+       if (netdump_mode) {
                 static unsigned long long prev_tick;
  
                 if (t1 - prev_tick >= jiffy_cycles) {
                         prev_tick += jiffy_cycles;
                         jiffies++;
                 }
         }

Note that there is no way the idle_timeout check above it can cause a problem,
because t0 can never be set until a netdump operation is set in motion.

netconsole.c has no business mucking around with jiffies during runtime.
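
For illustration only, here is a user-space toy model of how an unguarded polling bump could make the clock run fast. It assumes the regular timer interrupt keeps incrementing jiffies while zap_completion_queue() is also being polled, and that prev_tick starts far behind the TSC. Only jiffies, prev_tick, jiffy_cycles and netdump_mode are names from the code above; struct clock_model, poll_jiffies(), simulate() and all the numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the clock state; not the kernel's data structures. */
struct clock_model {
    uint64_t jiffies;      /* tick counter the system clock is derived from */
    uint64_t prev_tick;    /* TSC value the polling code has accounted for */
    uint64_t jiffy_cycles; /* TSC cycles per jiffy */
    int netdump_mode;      /* nonzero only while a netdump is in progress */
    int guarded;           /* 1 = polling bump is gated on netdump_mode */
};

/* Models the "maintain jiffies in a polling fashion" block. */
static void poll_jiffies(struct clock_model *m, uint64_t t1)
{
    assert(t1 >= m->prev_tick); /* model assumption: no TSC wraparound */
    if (m->guarded && !m->netdump_mode)
        return; /* guarded variant: leave jiffies alone at runtime */
    if (t1 - m->prev_tick >= m->jiffy_cycles) {
        m->prev_tick += m->jiffy_cycles;
        m->jiffies++;
    }
}

/*
 * Run `ticks` timer interrupts with `polls_per_tick` polling calls after
 * each one, outside netdump mode, and return the final jiffies value.
 */
static uint64_t simulate(int guarded, uint64_t ticks, unsigned polls_per_tick)
{
    struct clock_model m = { 0, 0, 1000, 0, guarded };
    uint64_t tsc = 1000000; /* TSC is already far ahead of prev_tick */
    uint64_t i;
    unsigned p;

    for (i = 0; i < ticks; i++) {
        tsc += m.jiffy_cycles; /* one jiffy of real time passes */
        m.jiffies++;           /* the timer interrupt counts it */
        for (p = 0; p < polls_per_tick; p++)
            poll_jiffies(&m, tsc);
    }
    return m.jiffies;
}
```

With these made-up numbers, simulate(0, 100, 3) returns 400 although only 100 ticks of real time elapsed, while simulate(1, 100, 3) returns 100. In the unguarded case every poll finds t1 well ahead of prev_tick and adds an extra jiffy on top of the one the timer interrupt already counted.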

Comment 16 Dave Anderson 2005-05-13 18:54:44 UTC

Should be -- the patch will be posted in conjunction with BZ #157439,
which I'm waiting for IBM to test.

Comment 18 Ernie Petrides 2005-06-09 03:25:09 UTC

A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.7.EL).

Comment 27 Red Hat Bugzilla 2005-09-28 15:01:29 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html