Bug 226701

Summary: netdump client misses COMM_REBOOT with large number of console messages
Product: Red Hat Enterprise Linux 4
Component: netdump
Version: 4.4
Reporter: Bryn M. Reeves <bmr>
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
CC: benl, edamato, konradr, poelstra, tao
Keywords: OtherQA, ZStream
Hardware: All
OS: Linux
Fixed In Version: RHBA-2007-0827
Doc Type: Bug Fix
Last Closed: 2007-11-15 16:11:15 UTC
Bug Blocks: 222397, 239544
Attachments:
- patch changing order of reply and show_state
- change netdump-server reboot timeout & interval
- Make netdump queue packets while dumping state to the console
- netdump kernel patch
- netdump server patch
- revised kernel side patch with balanced parens
- revised kernel patch

Description Bryn M. Reeves 2007-02-01 00:34:37 UTC
Description of problem:
When the netdump client gets a SHOW_STATE command it sends the REPLY_SHOW_STATE
back to the server immediately, before dumping thread & memory information over
netconsole:

        case COMM_SHOW_STATE:
                /* send response first */
                reply.code = REPLY_SHOW_STATE;
                reply.nr = req->nr;
                reply.info = 0;

                send_netdump_msg(&np, tmp, strlen(tmp), &reply);

                netdump_mode = 0;
                if (regs)
                      show_regs(regs);
                show_state();
                show_mem();
                netdump_mode = 1;
                break;


When the server gets this packet it begins sending COMM_REBOOT requests once
every 0.5s for 20 retries before giving up.

If the client has a slow console and/or lots of threads running, it can easily
take > 10s for the show_state/show_mem calls to return, resulting in the
client missing the reboot commands and spinning forever waiting for a new packet
from the server.
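To see why the client misses every request: with the stock server settings
(0.5s interval, 20 retries - the two constants named in comment 2), the entire
COMM_REBOOT window closes 10 seconds after REPLY_SHOW_STATE is sent. A minimal
sketch of the arithmetic, with the constant names taken from the server-side
patch discussed below:

```c
/* Stock netdump-server retry settings (names from the server-side
 * patch discussed in comment 2). */
#define NETDUMP_REBOOT_DELAY_MSEC 500
#define NETDUMP_N_REBOOT_REQUESTS 20

/* Total window (in ms) during which the crashed client can still
 * catch a COMM_REBOOT after sending REPLY_SHOW_STATE.  If the
 * show_state()/show_mem() pass (with netdump_mode == 0) outlives
 * this window, every reboot request is dropped. */
static int reboot_window_ms(void)
{
    return NETDUMP_REBOOT_DELAY_MSEC * NETDUMP_N_REBOOT_REQUESTS;
}
```

A thread dump on a 38400bps console can easily take longer than this, which is
exactly the reproducer described below.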

Version-Release number of selected component (if applicable):
2.6.9-42.EL
netdump-server-0.7.16-2

How reproducible:
100% under the right conditions. This was reproduced reliably on a workstation
with tg3 NIC and 38400bps serial console and a few hundred processes running.
Networks with lots of UDP broadcast also may play a role.

Steps to Reproduce:
1. Configure a client so that the thread dump takes longer than 10s.  

e.g. on the setup above running: 
# for i in {1..400}; do sleep 180 & done

before crashing the client reliably reproduced the problem.

2. Crash the netdump client, e.g.:
# echo c > /proc/sysrq-trigger

3. Wait for task & mem info to finish displaying on console
  
Actual results:
Netdump client appears to hang after show_mem, spinning waiting for the next
command. Netpoll is still working at this point, but all the reboot packets have
already been dropped.

Expected results:
Netdump client gets reboot packet and reboots successfully.

Comment 1 Bryn M. Reeves 2007-02-01 00:48:03 UTC
Created attachment 147064 [details]
patch changing order of reply and show_state

This patch just changes the ordering, moving the reply after the call to
show_mem. Tested locally with the same setup originally used to reproduce and
it didn't cause any problems & allowed a successful reboot.

Comment 2 Bryn M. Reeves 2007-02-01 00:51:58 UTC
Created attachment 147065 [details]
change netdump-server reboot timeout & interval

This patch also avoids the problem, this time on the server side by increasing
NETDUMP_REBOOT_DELAY_MSEC and NETDUMP_N_REBOOT_REQUESTS.

Comment 3 Bryn M. Reeves 2007-02-01 01:16:46 UTC
The patch in comment #1 seems like the "right" way to fix this to me, as it
ensures the client & server are in sync, but I'm assuming there's a reason for
the comment /* send response first */ in netdump.

The second patch works for the test cases I've used so far, but the numbers
could probably do with fine tuning (1000ms between requests and 200 requests).

Comment 4 Neil Horman 2007-02-01 12:59:51 UTC
Neither patch seems correct to me.  We send the REPLY_SHOW_STATE message before
dumping the memory and system state information to avoid a timeout on the server
and the resending of the same message again and again (which your first patch
would seem to cause to happen more often). Further, the fact that we set
netdump_mode to 0 after we send that message should prevent us from actually
receiving the subsequent reboot messages from the server until we are done
sending  the netconsole messages we generate.  Unfortunately we have no real way
to determine when the console is "done", so this is a best effort sort of thing
(and why we abandoned netdump for kdump).  Why don't you try adding
for(i=0;i<1000;i++)
     udelay(1000);
between the call to show_mem and netdump_mode=1 in netpoll_netdump.  That should
give the console time to flush.  If that works perhaps we can work up a patch
that makes that timeout configurable for those individuals that have
sufficiently busy networks to see this problem. 

Comment 5 Bryn M. Reeves 2007-02-01 13:11:22 UTC
Sorry if the point of the patches wasn't clear - they're not really proposed
'fixes' as much as patches I've been using to help debug the problem and figure
out what the real fix is.

I'll test the delay loop this afternoon, but I suspect the reboot commands will
still be long gone by the time we return from show_mem and set netdump_mode=1.

Btw, to clarify - the behavior I seem to be seeing is that when we return from
show_mem the server has already sent the reboot command and it has been dropped
by netpoll. I'm not sure how waiting even longer here is going to help avoid
that (I can reproduce the same behavior by just replacing show_mem/show_state
with a wait of >10s).


Comment 6 Neil Horman 2007-02-01 13:37:20 UTC
>"Sorry if the point of the patches wasn't clear - they're not really proposed
>'fixes'"
Then why did you say "The patch in comment #1 seems like the "right" way to fix
this to me" in comment #3?

>"Btw, to clarify - the behavior"
You don't need to clarify - I think it's pretty clear what's going on.  On a busy
network, transmission delays cause the reboot command from the netdump server
to be received by the client after we re-enable netdump mode but before the NIC
can transmit all the netconsole messages that were generated as a result of the
previous show status command, since we have no way to know when netconsole is
done with the work that we generate for it over any given period.  This doesn't
happen on lightly loaded machines or not-so-busy networks because netconsole
can clear out the tx queue before they get the reboot command.  That's why I'd
like to put a delay in before we re-enable netdump_mode, so that we give the
console time to send all its data before we allow a reboot.

let me know the results of your testing.  Thanks!

Comment 7 Bryn M. Reeves 2007-02-01 22:28:49 UTC
> Then why did you say "The patch in comment #1 seems like the "right" way
> to fix this to me" in comment #3?"

That's what the quotation marks were about - I should have been clearer.
Intuitively, it felt more "right" to move the response after the
show_mem/show_state calls than to hack around with the timeouts on the server.
Doing it server side means we're OK for this case (thread dumps taking 1-2
minutes), but it's a kludge and will fail for cases where the thread dump takes
longer. Again, sorry that wasn't clear - I'll be sure to include "RFC" in the
comment next time ;)

I've tested the delay loop suggestion and this had no impact on the problem for
my test cases. The client still sits spinning forever. Lengthening/shortening
the period from the 1s delay you recommended also didn't help.

I don't see how this delay could help here, as it seems to me it's the delay
while calling show_state/show_mem (while polling the controller) that causes the
problem - replacing those calls with a similar wait/poll loop of >10s is enough
to reproduce this, with no console backlog at all.

> you don't need to clarify, I think its pretty clear whats going on.  On a busy
> network, transmission delays on the network cause the reboot command from the
> netdump server to be received by the client after we re-enable netdump mode but
> before the nic can transmit all the netconsole messages that were generated as a
> result of the previous show status command.

That's not what I described and it doesn't match what I'm seeing.

Can you explain the reasoning behind the packet being received on the client
after we re-enable netdump_mode? This seems to contradict what I'm seeing in my
tests.

I see the reboot packets going to the NIC /while/ netdump_mode == 0, being
handed to netdump_rx and discarded.

Enabling packet dumps inside netpoll seems to confirm this. During
show_mem/show_state, I see exactly 20 UDP packets received and handed through
netpoll_rx -> __netpoll_rx -> netdump_rx and thrown away.

Once show_mem returns & netdump_mode is reasserted I see a tonne of UDP traffic
from a neighbouring cluster, but nothing from the netdump-server is ever passed
to netpoll_rx after re-setting netdump_mode - all twenty packets are long gone
by this stage.

I don't see adding delays to the client code ever preventing this - either the
netdump server needs to send the requests later or over a much longer interval
(to ensure they hit netpoll after netdump_mode=1 and are handed to add_new_req
instead of hitting the floor), or the netdump client needs to delay sending
REPLY_SHOW_STATE so that the server doesn't start sending COMM_REBOOT until
we're ready to receive it.

I will attach some traces showing the packets being handled by netpoll that
shows the behavior I'm describing.



Comment 8 Bryn M. Reeves 2007-02-01 22:34:52 UTC
Created attachment 147163 [details]
netpoll debug log showing reboot messages being dropped while netdump_mode=0

The time spent with netdump_mode==0 is bracketed by ***waiting/***waited, so:

$ awk '/^.*waiting.*$/,/^.*waited.*$/{print}' screenlog.0.dropped_packets2|
grep dropping | wc -l
20

All the packets being logged here after netdump_mode=1 are UDP broadcast
unrelated to netdump ("local ip check fails" in the log). 

Note that the same behavior (no reboot) is seen when no console logging is done
from netpoll (i.e. just sitting in a loop alternately mdelaying/polling).

Comment 9 Bryn M. Reeves 2007-02-01 22:42:26 UTC
Created attachment 147164 [details]
patch to reproduce missed COMM_REBOOT with netpoll logging

Debugging patch - this avoids calling show_state/show_mem and just sits in a
loop:

for(i=0;i<300;i++){
	mdelay(100);
	netpoll_poll(&np);
}

With netpoll debug disabled it is completely silent, with netpoll debugging
enabled, this obviously still sends output to the console.

Comment 10 Bryn M. Reeves 2007-02-01 22:44:16 UTC
Created attachment 147165 [details]
simple netpoll trace patch

Netpoll patch used for debugging.

Comment 11 Neil Horman 2007-02-02 14:28:34 UTC
My bad - I read your description backwards.  I understood the problem to be that
when the kernel receives a SHOW_STATE command it immediately replies and queues
all the netdump information to be sent, then re-enters netdump mode and gets
the server's reboot command before it's finished dumping the show state info to
the log, leading to lost crash data.  But as your last post indicates, the
problem is really that we take too much time dumping the log, so that we time
out sending our reboot requests and leave the system in a crashed state.  That's
easily fixable without any code changes, through the use of the netdump-reboot
script on the netdump server.  Just modify that script to pause with a sleep
command for an amount of time appropriate for the crashing system in question to
complete the show state command, and then exit with a zero return code.  If you
want to add some intelligence, you can even check the destination log file for
continued output to fine-tune the wait period.  Since the netdump-reboot script
executes before we send the reboot command, this can be used to ensure the log
gets flushed before we even start to send out COMM_REBOOT messages.

Comment 12 Bryn M. Reeves 2007-02-02 19:57:08 UTC
That seems like a worse workaround than changing the timeouts/number of reboot
retries on the netdump-server. The server runs scripts in the foreground and
blocks for the duration of the script:

"The four executable scripts are run in the foreground by the netdump-server, so
any time-consuming task done by the scripts will block the netdump-server from
servicing any crash or log requests from any other source.  Therefore it may be
advisable to run any time-consuming tasks in the background, allowing  the 
script to return back to the netdump-server." (man 8 netdump-server)

Kludges to guesstimate the "appropriate" length of time this should wait also
seem suspect - the delay is proportional to the number of threads executing on
the system at the time of the crash, the depth of their backtraces, and the
speed of the console.

How do you envisage approximating that in the scripts?

I'll test this out here, but I'm worried we'll end up missing other clients
netdump requests while the script is waiting.

Comment 13 Bryn M. Reeves 2007-02-02 20:02:00 UTC
Another (probably dumb) question, in netdump_rx, what are we guarding against by
dropping incoming packets while netdump_mode == 0?

void netdump_rx(struct netpoll *np, short source, char *data, int dlen)
{
        req_t *req, *__req = (req_t *)data;
        
        if (!netdump_mode)
                return;

If we could queue the incoming COMM_REBOOT packets while the stack dumps are
happening, the problem goes away - we just peel them off the request list when
the dump finishes.


Comment 14 Neil Horman 2007-02-02 20:46:36 UTC
>"That seems like a worse workaround than changing the timeouts/number of reboot
>retries on the netdump-server. The server runs scripts in the foreground and
>blocks for the duration of the script:"

I disagree. Yes, the script runs in the foreground, as it needs to in order to
prevent the netdump server process from continuing until it completes, but we
shouldn't be talking about a long period of time here in almost all cases.  And
since subsequent dumping clients (which would be blocked by this solution) will
wait indefinitely for the dump server to service them (see netpoll_netdump and
netdump_startup_handshake), I don't really see this as a big deal.  Granted, we
can create a pessimal case (consider a serial console running at 2400 baud,
which would take a large show_state a _very_ long time to complete), but we can
just set a max wait value in a script to avoid blocking forever.  The long and
the short of it is that this seems like a pretty isolated problem to me, and
before we go adding in features for this, it seems like we can do a
site-specific fix.


>"Kludges to guesstimate the "appropriate" length of time this should wait
>also...How do you envisage approximating that in the scripts"

I am assuming that the netdump server is also capturing netconsole output from
the system in question.  If so, then it can periodically check the file that is
being logged to for increases in size.  As long as the file continues to
increase, output continues to flow from the system being captured, and we can be
reasonably sure that netdump_mode is not yet enabled on the client.  Once growth
stops, or a max wait period is reached, we can continue by exiting the script.

>Another (probably dumb) question, in netdump_rx, what are we guarding against
>by dropping incoming packets while netdump_mode == 0?

It's not a dumb question. We do it because we don't write to the netconsole in
crashdump mode (see write_msg() in netconsole.c).  IIRC there is a deadlock
potential if we are in netdump mode and we write something to the netconsole at
the same time.

I'm not saying that we can't increase our request retransmit count or something
similar, but since this is the only report of this I've gotten, I'd really
prefer to do a site-specific fix if we can.



Comment 15 Bryn M. Reeves 2007-02-02 21:29:51 UTC
> I disagree. yes, the script runs in the foreground, as it needs to to prevent
> the   netdump server process from continuing until it completes, but we
> shouldn't be talking about a long period of time here in almost all cases.  

Ok - seems reasonable. I'll test this out & get back to you - thanks for your
advice.

> Its not a dumb question, we do it because we don't write to the netconsole in
> crashdump mode (see write_msg() in netconsole.c).  IIRC there is a deadlock
> potential if we are in netdump mode and we write something to the netconsole 
> at the same time.

I'm being dumb again, but I don't see how that relates to queueing requests
inside netdump_rx - avoiding console output when netdump_mode is true is one
thing, but this is avoiding queuing a request onto netdump's internal
request_list while netdump_mode is false.

The only thing netdump_rx does is a GFP_ATOMIC kmalloc, a handful of Dprintks
that disappear in any normal build anyway, and the
spin_lock_irqsave/spin_unlock_irqrestore of the request_list lock (which doesn't
seem any different from the lock/unlocks in the show_mem/show_state paths).

Since all this is happening in a single thread, I can't see we're going to get
ahead of ourselves and start processing those commands before
show_mem/show_state return, since the netdump poll loop is blocked inside those
functions.

Anyway, there's probably something obvious I'm missing here, but this is another
change that gives reliable reboots for my test cases in the face of slow
consoles & lots of UDP traffic.



Comment 19 Bryn M. Reeves 2007-02-14 20:02:05 UTC
Regarding the question in comment #13 and the answer in comment #14 - I still
don't see what queuing requests when netdump_mode == 0 has to do with performing
console output - there's nothing in that path that would write to the console in
a normal build, and besides, the race with console output you mentioned was with
netdump_mode == 1, right?

Again, sorry if I missed something obvious here.


Comment 21 Bryn M. Reeves 2007-02-20 20:52:50 UTC
Created attachment 148441 [details]
Make netdump queue packets while dumping state to the console

This patch adds a netdump_paused flag. This is set when we clear netdump_mode
to allow console output and is checked inside netdump_rx to decide if we want
to queue the packet (netdump_paused == 1), or discard it because netdump hasn't
been activated (netdump_mode == 0 && netdump_paused == 0).

There is a fixed limit of 20 packets queued during this period. This should be
sufficient to avoid the problem of clients dropping all 20 reboot requests sent
by the server, unless someone is simultaneously sending us lots of UDP traffic
on the netdump port.
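The gating the patch describes could be sketched like this - a userspace
simulation of the decision in netdump_rx, not the literal kernel patch; the
function name and the simplified globals are assumptions:

```c
/* Simulation of the netdump_rx gating added by the queueing patch:
 * drop packets when netdump was never activated, queue up to a fixed
 * limit of 20 while it is paused for console output, handle normally
 * while active. */
#define MAX_PAUSED_PACKETS 20

static int netdump_mode;    /* 1 while netdump is actively handling requests */
static int netdump_paused;  /* 1 while paused for show_state/show_mem output */
static int queued;          /* packets held on the paused queue */

/* Returns 1 if the packet is kept (queued or handled), 0 if dropped. */
static int netdump_rx_sim(void)
{
    if (!netdump_mode) {
        if (!netdump_paused)
            return 0;                    /* netdump not activated: drop */
        if (queued >= MAX_PAUSED_PACKETS)
            return 0;                    /* paused queue full: drop */
        queued++;                        /* paused: queue for later */
        return 1;
    }
    return 1;                            /* active: handle immediately */
}
```

The 20-packet limit matches the number of COMM_REBOOT retries the server
sends, so under this scheme all of them survive the pause unless unrelated UDP
traffic on the netdump port fills the queue first (the caveat noted above).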

Comment 22 Bryn M. Reeves 2007-02-20 22:11:23 UTC
I've given this some basic testing with the customer's reproducer - so far, so
good. Just setting status back to ASSIGNED for you to take a quick look at the
patch. I'll keep testing this here & update as I have more info.

Thanks,


Comment 23 Neil Horman 2007-02-21 14:13:43 UTC
Initially the patch looks good.

I assume your testing involves just crashing a machine that previously failed
to reboot due to the number of tasks it needed to print for the show_state
command?  And with this patch, the system reboots properly, correct?  If
yes, then I think I'm willing to push this.



Comment 24 Bryn M. Reeves 2007-02-21 20:14:32 UTC
Yes - I have several boxes on slow serial consoles that easily demonstrate the
problem. With the queueing patch they've made a couple of dozen successful
reboots without issue.



Comment 25 Dave Anderson 2007-02-22 19:05:59 UTC
> The patch in comment #1 seems like the "right" way to fix this to me, as it
> ensures the client & server are in sync, but I'm assuming there's a reason for
> the comment /* send response first */ in netdump. 

Rather than clutter up rhkernel-list with netdump minutiae, Jeff and I
were just looking back trying to find the original patch that moved the
COMM_SHOW_STATE_REPLY to be done *prior* to the actual work being
done in the client.  But it seems that we must have done it when forward-porting
netdump from AS2.1 to RHEL3, since AS2.1 still does it after the
show_[regs|state|mem]() operations, and RHEL3 has always done it beforehand. 
We *think* we did it because there were always "too many timeouts"
messages on the netdump-server machines.

But now, re-looking at the netdump-server code, it looks like the code
there originally intended to prevent the show-status timeouts, but it
seems that the logic is wrong:

static void
status_timeout (NetdumpClient *client,
                guint32        request)
{
  client->n_timeouts++;

  if (client->n_timeouts <= 1 && !client->got_log_packet)
    status_request (client, FALSE);
  else
    {
      syslog (LOG_WARNING, "Got too many timeouts waiting for SHOW_STATUS for client %s, rebooting it\n", client->ip_addr);
      netdump_client_destroy (client, TRUE);
    }
}

We *believe* that it should be something like:

  if (client->n_timeouts <= MAX_NR_RETRIES || client->got_log_packet)

If a log packet comes in while waiting for the SHOW_STATUS to end,
it should just retry the SHOW_STATUS without complaining, or giving up.
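The corrected retry test Dave proposes could be written as a small predicate -
a sketch only; MAX_NR_RETRIES is the placeholder name from the comment above,
not a constant in the shipped server:

```c
/* Sketch of the corrected server-side decision: keep re-sending
 * SHOW_STATUS while retries remain OR while log packets are still
 * arriving from the client (i.e. the console dump is still running).
 * MAX_NR_RETRIES is hypothetical, per the comment above. */
#define MAX_NR_RETRIES 20

static int should_retry_show_status(int n_timeouts, int got_log_packet)
{
    return n_timeouts <= MAX_NR_RETRIES || got_log_packet;
}
```

Compare this with the shipped `n_timeouts <= 1 && !client->got_log_packet`,
which gives up after a single timeout precisely when log packets *are*
arriving - the inverse of the apparent intent.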

That all being said, it would require a change to both the client and
the server, and wouldn't account for an "old" netdump-server being used
with an updated client, etc.  Jeff feels that this would be the right
way to solve it, i.e., a two-line change to the client and a one-line change
to the netdump-server.  I'm of the opinion that the less that is done the 
better.  You guys, I'm sure, have your own opinions...

In any case, we just wanted to explain why it (the movement of the 
send_netdump_msg() call) was historically changed to be done that way.


Comment 26 Neil Horman 2007-02-23 21:26:59 UTC
Based on feedback, we have a better solution to this problem, which we are
currently implementing.

Comment 27 Neil Horman 2007-02-26 20:15:36 UTC
Created attachment 148827 [details]
netdump kernel patch

Comment 28 Neil Horman 2007-02-26 20:18:09 UTC
Created attachment 148828 [details]
netdump server patch

Bryn, here are the netdump kernel and server patches based on the fallout of
the on-list conversation we all had.  Please test them out and confirm that
they solve the problem appropriately.  Thanks!

Comment 29 Bryn M. Reeves 2007-02-27 11:27:48 UTC
Hi Neil,

Thanks for the patches - looks good. I will get these applied & tested on my
boxes here today or tomorrow.

Thanks!

Comment 30 Bryn M. Reeves 2007-02-28 18:01:43 UTC
Neil, did you actually try to compile that kernel patch?

-       if (unlikely(crashdump_mode()))
+       if (unlikely(!netdump_emit_printks)
                return;



Comment 31 Bryn M. Reeves 2007-02-28 18:04:21 UTC
Created attachment 148947 [details]
revised kernel side patch with balanced parens

Comment 32 Neil Horman 2007-02-28 18:32:50 UTC
No, clearly I didn't test it out myself.  You wrote the first patch; I figured
that you would be able to correct any minor typos you encountered while testing
this one.  Thank you for fixing that.

Comment 33 Bryn M. Reeves 2007-02-28 21:10:21 UTC
Initial tests are looking good. I've tested different combinations of
patched/unpatched server and kernel - the patched kernel makes it through all my
tests now. Running an old kernel with the patched server still hangs, but that's
expected.

I'll keep this in NEEDINFO until I've got a few more runs done but so far so good.

Thanks!




Comment 34 Bryn M. Reeves 2007-03-01 20:57:01 UTC
Hi Neil,

I've had the test boxes run through about 15 runs with the new patches now
without any sign of trouble.

Thanks!


Comment 35 Bryn M. Reeves 2007-03-06 17:18:26 UTC
Created attachment 149359 [details]
revised kernel patch

Neil's revised kernel side patch

Comment 37 Bryn M. Reeves 2007-03-07 12:30:44 UTC
Hi Neil,

The revised kernel patch has now been through ten test runs without incident.

Thanks!


Comment 43 Ludek Smid 2007-05-09 10:57:25 UTC
This bug has been copied as z-stream (EUS) bug #239544 and now must be resolved
in the current update release, set blocker flag.

Comment 45 Jason Baron 2007-06-05 15:43:37 UTC
Committed in stream U6 build 55.6. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 46 Neil Horman 2007-07-19 15:53:48 UTC
*** Bug 248602 has been marked as a duplicate of this bug. ***

Comment 48 John Poelstra 2007-08-28 23:57:24 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 51 errata-xmlrpc 2007-11-15 16:11:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0827.html