Bug 178131 - syslog-only netdump still tries to dump memory
Summary: syslog-only netdump still tries to dump memory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Anderson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
Reported: 2006-01-17 22:07 UTC by Bryan Mason
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:41:47 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2006:0437 | 0 | normal | SHIPPED_LIVE | Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8 | 2006-07-20 13:11:00 UTC

Description Bryan Mason 2006-01-17 22:07:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7

Description of problem:
According to /etc/sysconfig/netdump,

# Alternatively, to merely syslog all messages without doing network
# crash dumps, you can set only SYSLOGADDR and leave NETDUMPADDR unset.
# You can also set both. 

However, with this configuration the crash signature is successfully sent to the netdump-server, but netdump then still appears to attempt a memory dump. The crashed system hangs on the message "< netdump activated - performing handshake with the server . >" and does not reboot.

Version-Release number of selected component (if applicable):
netdump-0.7.7-2 kernel-2.4.21-37.EL

How reproducible:
Always

Steps to Reproduce:
On the server: 
1. Reconfigure syslog to accept remote messages (set SYSLOGD_OPTIONS="-m 0 -r" in /etc/sysconfig/syslog).
2. Restart syslog.
3. Start netdump-server (or not, this seems to make no difference in client behavior).

On the client system to be crashed:
1. Set SYSLOGADDR=<netdump-server-IP> in /etc/sysconfig/netdump.  Do not change anything else from default.
2. Start netdump
3. Crash the client by running (echo "1" > /proc/sys/kernel/sysrq; echo "c" > /proc/sysrq-trigger)  

Actual Results:  Crashed system sends crash signature to netdump-server and then hangs with the message "< netdump activated - performing handshake with the server . >" on the console.

Expected Results:  Crashed system should send crash signature to netdump-server and then reboot.

Additional info:

Additional information from Dave Anderson:

It's a RHEL3 kernel bug.

The netdump operation is kicked off only if the netconsole module has registered its "netconsole_netdump" function in the kernel's "netdump_func" pointer, which it does when the module is loaded.

The module's init_netconsole() function performs this registration regardless of the NETDUMPADDR or SYSLOGADDR arguments passed in.  Here's the end of init_netconsole(); if the platform supports netdump, the hooks are registered unconditionally:

        if (platform_supports_netdump) {
                if (netdump_register_hooks(netconsole_rx,
                                              netconsole_receive_skb,
                                              netconsole_netdump)) {
                        printk("netdump: failed to register hooks.\n");
                }
        }
        netconsole_dev = ndev;
#define STARTUP_MSG "[...network console startup...]\n"
        write_netconsole_msg(NULL, STARTUP_MSG, strlen(STARTUP_MSG));

        register_console(&netconsole);
        printk(KERN_INFO "netlog: network logging started up successfully!\n");
        return 0;
}

It should pass a NULL as the 3rd argument if no netdump target addresses were passed in.
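
A rough userspace model of that suggestion, not the actual RHEL3 patch. It assumes the crash path tests netdump_func before calling it, as described above. The names model_register_hooks(), model_init_netconsole(), model_crash(), have_netdump_target and dump_attempts are hypothetical, and only the third hook argument is modeled:

```c
#include <stddef.h>

/* Userspace model only: the names mirror the bug report, but the bodies
 * are stand-ins and are not taken from the RHEL3 kernel sources. */
struct pt_regs;                               /* opaque in this model */
typedef void (*netdump_fn)(struct pt_regs *);

netdump_fn netdump_func;                      /* starts out life as NULL */
int dump_attempts;                            /* hypothetical counter */

/* Stand-in for the module's netconsole_netdump(): counts invocations. */
void netconsole_netdump(struct pt_regs *regs)
{
        (void)regs;
        dump_attempts++;
}

/* Hypothetical, simplified stand-in for netdump_register_hooks();
 * only the third (dump) hook is modeled here. */
int model_register_hooks(netdump_fn dump_hook)
{
        netdump_func = dump_hook;
        return 0;
}

/* Proposed behavior: hand over the dump hook only when a netdump
 * target address was configured (have_netdump_target is an assumption). */
int model_init_netconsole(int have_netdump_target)
{
        return model_register_hooks(have_netdump_target ? netconsole_netdump
                                                        : NULL);
}

/* Model of the crash path: returns 1 if a dump was attempted. */
int model_crash(void)
{
        if (netdump_func) {
                netdump_func(NULL);
                return 1;
        }
        return 0;
}
```

With have_netdump_target set to 0 (the syslog-only case), netdump_func stays NULL in this model and model_crash() never reaches the dump hook.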

For example, note that the kernel's netdump_func starts out life as a NULL:

# crash
...
crash> p netdump_func
netdump_func = $1 = (void (*)(struct pt_regs *)) 0
crash>

Here's a netdump session after setting only the SYSLOGADDR in /etc/sysconfig/netdump:

# service netdump start
initializing netdump                                       [  OK  ]
# tail -10 /var/log/messages
Jan 17 09:10:59 crash netdump:: inserting netconsole module with arguments magic1=0x11111111 magic2=0x11111111 dev=eth0
source_port=6666 syslog_target_ip=0xAC105012 syslog_target_port=514 syslog_target_eth_byte0=0x00
syslog_target_eth_byte1=0x30 syslog_target_eth_byte2=0x6E syslog_target_eth_byte3=0x1E syslog_target_eth_byte4=0xFE
syslog_target_eth_byte5=0x40
Jan 17 09:10:59 crash kernel: netlog: using network device <eth0>
Jan 17 09:10:59 crash kernel: netlog: using source IP 172.16.80.17
Jan 17 09:10:59 crash kernel: netlog: using source UDP port: 6666
Jan 17 09:10:59 crash kernel: netlog: using syslog target IP 172.16.80.18, port: 514
Jan 17 09:10:59 crash kernel: netlog: using broadcast ethernet frames to send netdump packets.
Jan 17 09:10:59 crash kernel: netlog: using broadcast ethernet frames to send netdump packets.
Jan 17 09:10:59 crash kernel: netlog: using syslog target ethernet address 00:30:6e:1e:fe:40.
Jan 17 09:10:59 crash kernel: netlog: network logging started up successfully!
Jan 17 09:10:59 crash netdump: initializing netdump succeeded
[root@crash root]#

If NETDUMPADDR had been set, you'd see a bunch of "netdump_target_eth_byte*" values above in the "inserting" message.
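
For reading the "inserting" message, here is a small sketch of two helpers. format_target_ip() and has_netdump_target() are hypothetical names and are not part of netdump. The first decodes a value such as syslog_target_ip=0xAC105012 into dotted-quad form, assuming the most significant byte comes first, which is consistent with the 172.16.80.18 logged above. The second checks whether any "netdump_target_" argument is present:

```c
#include <stdio.h>
#include <string.h>

/* Format a 32-bit address (most significant byte first, as it appears in
 * the module's syslog_target_ip argument) as a dotted quad. */
void format_target_ip(unsigned long ip, char *buf, size_t len)
{
        snprintf(buf, len, "%lu.%lu.%lu.%lu",
                 (ip >> 24) & 0xff, (ip >> 16) & 0xff,
                 (ip >> 8) & 0xff, ip & 0xff);
}

/* Return 1 if the module argument string names a netdump target,
 * i.e. contains any "netdump_target_" parameter; otherwise 0. */
int has_netdump_target(const char *module_args)
{
        return strstr(module_args, "netdump_target_") != NULL;
}
```

Passing 0xAC105012 to format_target_ip() gives "172.16.80.18", and has_netdump_target() returns 0 for the argument list shown in the log above, since it contains only syslog_target_* parameters.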

But looking at the kernel's netdump_func pointer, you can see it has been set:

# crash
...
crash> p netdump_func
netdump_func = $1 = (void (*)(struct pt_regs *)) 0xe2d8e710
crash> sym 0xe2d8e710
e2d8e710 (t) netconsole_netdump
crash>

Comment 1 Dave Anderson 2006-01-17 22:36:31 UTC
Changed component to "kernel" from "netdump", as it's a bug with
the netconsole kernel module.

Linda -- can you link this to RHEL3-U8 and give it a devel_ack?

Comment 3 Dave Anderson 2006-01-20 17:05:01 UTC
Please give a qa_ack+ to this BZ.

QA procedure:

1. Set up only SYSLOGADDR in /etc/sysconfig/netdump, pointing to a remote
   system whose /etc/sysconfig/syslog file has the "-r" flag turned on.
2. Do a "service netdump start".
3. Crash the system with alt-sysrq-c or "echo c > /proc/sysrq-trigger".
4. Verify: 
   - the oops message made it to the remote /var/log/messages.
   - the client did not attempt to do a netdump operation.


Comment 4 Ernie Petrides 2006-02-18 00:23:05 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.2.EL).


Comment 8 Red Hat Bugzilla 2006-07-20 13:41:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


