Bug 613807 - kdump appears to not work
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kexec-tools
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-07-12 20:53 UTC by Need Real Name
Modified: 2010-08-26 03:09 UTC (History)
Last Closed: 2010-07-21 11:45:21 UTC


Description Need Real Name 2010-07-12 20:53:44 UTC
I've set up kdump to capture a kernel dump over ssh for bug 613429, which is a hard crash.

Unfortunately kdump doesn't seem to work.

When I trigger a manual crash with an echo to the usual place, I get a blank screen. Numlock still responds. Is this a kernel modesetting problem?

I wait a few minutes, and see no file appear on the kdump server.

Extra info:

The root partition of the crashing box is encrypted - but I am not dumping to a local disk.

My network card is under the control of NetworkManager.

kexec-tools-2.0.0-35.fc13.x86_64
kernel-2.6.33.5-124.fc13.x86_64
kernel-debuginfo-2.6.33.6-147.fc13.x86_64
kernel-debuginfo-common-x86_64-2.6.33.6-147.fc13.x86_64
kernel-devel-2.6.33.5-124.fc13.x86_64
kernel-headers-2.6.33.5-124.fc13.x86_64

Relevant lines from kdump.conf:

net kdump@1.2.3.4
core_collector makedumpfile -c -d 31
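
Before a test crash is triggered, it can help to confirm that a capture kernel is actually armed. The following is only a sketch: it assumes that "the usual place" means the SysRq trigger file, and has_crashkernel is a hypothetical helper, not part of kexec-tools. The commented commands need root, and the last one deliberately crashes the machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line reserves
# memory for a crash (capture) kernel via a crashkernel= argument.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the machine being debugged (run as root):
#   has_crashkernel "$(cat /proc/cmdline)" || echo "no crashkernel= reservation"
#   cat /sys/kernel/kexec_crash_loaded   # 1 means a capture kernel is loaded
#   echo 1 > /proc/sys/kernel/sysrq      # enable all SysRq functions
#   echo c > /proc/sysrq-trigger         # crash the kernel on purpose
```

If the first check fails or kexec_crash_loaded reads 0, no dump can be expected on the server regardless of what the screen shows.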

Comment 1 Qian Cai 2010-07-13 04:50:01 UTC
First, the kexec-tools version you used has lots of known problems, so I am not surprised that you may have hit one of them. It might be worth trying kexec-tools-2.0.0-33.fc14 from koji, which fixes them.

Second, to get meaningful logs for us to debug, try to set up a serial console and paste the output of the kdump process here. Alternatively, try disabling KMS by adding nomodeset to the kernel command line, trigger the crash from a VT (not from X), and attach a screenshot showing the kdump process messages.

Comment 2 Neil Horman 2010-07-13 10:58:34 UTC
Cai, you need to stop telling people that. There's an update working through bodhi with all the missing updates for F-13 that people can use.

Serial logs and a sosreport, as you said, will be helpful here. Disabling KMS isn't going to hurt, but it's not likely to fix this issue. The logs are really what we need.

Comment 3 Need Real Name 2010-07-13 17:33:11 UTC
Okay I will wait for the update.

How can I get you the logs though? This is a desktop, I don't know of any recent desktops that have a serial connection.

The disk is encrypted so I doubt saving there will work.

Should I just wait?

Comment 4 Neil Horman 2010-07-13 17:50:32 UTC
Plug a USB serial dongle into it; that should work just fine.

Comment 5 Need Real Name 2010-07-17 12:24:51 UTC
(In reply to comment #4)
> Plug a USB serial dongle into it; that should work just fine.

How does this work?

I plug a usb to serial converter into my computer. Then where does the serial end go? Into another converter that converts to usb?

Comment 6 Neil Horman 2010-07-19 11:06:15 UTC
You plug in a NULL modem cable and connect it to a second system running kermit, minicom, or another serial communications program. Then you set up the system you're debugging to add a console on ttyS0 so that kernel messages are output to that port. You can use the secondary system to record the kernel messages and attach them here.
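
As a rough illustration of "add a console to ttyS0", the sketch below prints the kernel command-line arguments involved. serial_console_args is a hypothetical helper and 115200 is an assumed speed. With a USB adapter the device is normally ttyUSB0 rather than ttyS0, and kernel console output on it depends on the kernel having USB serial console support, which this thread does not confirm for the Fedora 13 kernel.

```shell
#!/bin/sh
# Hypothetical helper: print kernel command-line arguments that keep the
# normal screen console (tty0) and also send kernel messages to a serial
# device. The last console= listed becomes /dev/console.
serial_console_args() {
    device=${1:-ttyS0}   # serial device name, without the /dev/ prefix
    baud=${2:-115200}    # must match the speed used on the recording side
    printf 'console=tty0 console=%s,%sn8\n' "$device" "$baud"
}

serial_console_args               # console=tty0 console=ttyS0,115200n8
serial_console_args ttyUSB0 9600  # console=tty0 console=ttyUSB0,9600n8
```

The printed arguments would be appended to the kernel line in the boot loader configuration of the system being debugged.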

Comment 7 Need Real Name 2010-07-19 20:20:58 UTC
How can I use a NULL modem cable to do this? Where does ttyS0 come from, I don't have any real serial connection. Please can you give me different instructions?

I have a modern pc with no serial anything that is crashing.
I have a modern laptop with no serial anything.

How do I get a crash dump onto the laptop?

Comment 8 Neil Horman 2010-07-19 20:43:08 UTC
I've told you how.  If you need additional instructions on how to connect a serial console to your pc, you can consult this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/serial-console.txt;h=9a7bc8b3f479b2b82dbfa1056df060366dbafdec;hb=HEAD

It doesn't matter that you have a system with no serial ports; that's why they make these:
http://www.amazon.com/Cables-Unlimited-Serial-Adapter-USB-2920/dp/B0006LSIOI

The bottom line is, if you want to log a boot sequence, this is how you do it, regardless of OS.  Your only other option is to manually record and transcribe what you see on the screen.

Trust me, the serial console is the easy way to go.

Comment 9 Need Real Name 2010-07-19 21:11:03 UTC
I can't see where the serial part comes in. I have:

PC (usb) <---some connection---> LAPTOP (usb)

So I plug one of those USB serial converters into my PC, and I end up with:

PC (usb:serial) <--- serial connection ---> LAPTOP

What happens on the laptop side? Do you want me to buy a second usb:serial connector? i.e.

PC (usb:serial) <--- serial connection ---> LAPTOP (serial:usb)

Comment 10 Neil Horman 2010-07-20 12:26:31 UTC
You're making this far more difficult than it needs to be.

You're having trouble diagnosing a problem with a kernel boot, and the problem is so early in the boot cycle that the system can't record or log messages on its own to any recordable media, so you need to attach an external recording device. Since you don't have a system with a built-in management board that's capable of performing this function, you need a secondary system and a way for the two of them to communicate. That method is a serial NULL modem connection.

Even if you're using USB, the communication protocol is going to be RS232 serial.

Now, to use RS232 serial communications on a system with no real serial ports, you'll have to make a serial port. On systems with USB, that's easy to do. You just buy a USB to serial converter for each system that you want to use a serial port on.


Then you buy a NULL modem cable to connect the two serial ports together:
http://en.wikipedia.org/wiki/Null_modem

From there you configure the system that's crashing to send boot messages to the serial port (using the serial-console.txt document I linked in comment 8), and then you use some serial comms software (like minicom or c-kermit) to watch and record those messages on the system that you're using to monitor.
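
On the monitoring side, the recording step could look roughly like the sketch below. It assumes the adapter on the monitoring system shows up as /dev/ttyUSB0 and that both ends use 115200 baud. record_serial is a hypothetical wrapper and the log file name is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: open a serial device in minicom and capture
# everything received on it to a log file.
record_serial() {
    dev=$1    # serial device on the monitoring system, e.g. /dev/ttyUSB0
    baud=$2   # must match the console= speed on the crashing system
    log=$3    # file that will receive the captured kernel messages
    if [ ! -c "$dev" ]; then
        echo "serial device $dev not found" >&2
        return 1
    fi
    # -D selects the device, -b the baud rate, -C the capture file.
    minicom -D "$dev" -b "$baud" -C "$log"
}

# Example (on the monitoring system):
#   record_serial /dev/ttyUSB0 115200 kdump-console.log
```

The resulting log file is what would be attached to the bug.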

Comment 11 Need Real Name 2010-07-20 21:24:48 UTC
I understand all of what you wrote from the first time you wrote it.

The problem was that you seemed to assume that I have a serial connection to a machine with a serial connector on it. I don't have such a thing.

> Now, to use RS232 serial communications on a system with no real serial
> ports, you'll have to make a serial port.  On systems with USB, that's easy to
> do.  You just buy a USB to serial converter for each system that you want to
> use a serial port on.

I can't justify buying two USB to serial connectors just to get a kernel error, apologies.

Comment 12 Need Real Name 2010-07-20 21:29:25 UTC
Sorry, I should add: thanks very much for taking the time to answer my questions.

Comment 13 Neil Horman 2010-07-21 11:45:21 UTC
Let me know if anything changes. I can help you fix this if you manage to obtain the funds to purchase the debug equipment.

Comment 14 Qian Cai 2010-08-24 16:01:30 UTC
Another possibility for capturing the second kernel's messages is netconsole, if you don't want to buy serial dongles. Some information is available here:
http://sarah.thesharps.us/2009-02-22-09-00
http://sarah.thesharps.us/2010-03-26-09-41
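
For orientation, a netconsole setup might be sketched as below. netconsole_param is a hypothetical helper, and every port, address, interface name, and MAC address is a placeholder. Whether netconsole comes up early enough inside the kdump kernel on this release is an open question that this thread does not answer.

```shell
#!/bin/sh
# Hypothetical helper: build a netconsole= parameter in the format given in
# the kernel's netconsole documentation:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    src_port=$1   # UDP source port on the crashing machine
    src_ip=$2     # IP address of the crashing machine
    dev=$3        # network interface to send from
    tgt_port=$4   # UDP port the receiver listens on
    tgt_ip=$5     # IP address of the receiving machine
    tgt_mac=$6    # MAC address of the receiver (or of the gateway)
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# On the crashing PC (as root), with placeholder values:
#   modprobe netconsole "$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)"
# On the laptop, listen for the UDP packets, for example:
#   nc -u -l 6666      # some netcat variants need: nc -u -l -p 6666
```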

Comment 15 Qian Cai 2010-08-24 16:08:22 UTC
(In reply to comment #0)
> The root partition of the crashing box is encrypted - but I am not dumping to a
> local disk.
From my observation, this might be problematic even if you don't mount it. Can you try without encrypted rootfs? We are working to get the doc in place to write up the limitation of encrypted rootfs with kdump.

Comment 16 Need Real Name 2010-08-24 19:01:04 UTC
(In reply to comment #15)
> Can
> you try without encrypted rootfs? We are working to get the doc in place to
> write up the limitation of encrypted rootfs with kdump.

I can't reinstall the whole box without encryption. Sorry about that.

I should be able to use netconsole on this box, but in my experience it hardly ever works :/

Comment 17 Qian Cai 2010-08-26 03:09:35 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > Can
> > you try without encrypted rootfs? We are working to get the doc in place to
> > write up the limitation of encrypted rootfs with kdump.
> 
> I can't reinstall the whole box without encryption. Sorry about that.
Can you try adding rd_NO_LUKS to your kdump kernel command line like this to skip encrypted fs detection?
# cat /etc/sysconfig/kdump
KDUMP_COMMANDLINE_APPEND="rd_NO_LUKS ...
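
One way to make that change without editing the file by hand is sketched below. append_kdump_option is a hypothetical helper, not part of kexec-tools. It assumes GNU sed and a double-quoted KDUMP_COMMANDLINE_APPEND line, and it leaves the existing options untouched. The restart command assumes the SysV kdump service used on this release.

```shell
#!/bin/sh
# Hypothetical helper: append OPTION to the double-quoted value of
# KDUMP_COMMANDLINE_APPEND in FILE, unless the option is already present.
append_kdump_option() {
    file=$1
    option=$2
    if grep -q "^KDUMP_COMMANDLINE_APPEND=\".*$option" "$file"; then
        return 0   # already present, nothing to do
    fi
    # Keep the current value (\1) and add the new option before the closing quote.
    sed -i "s/^KDUMP_COMMANDLINE_APPEND=\"\(.*\)\"/KDUMP_COMMANDLINE_APPEND=\"\1 $option\"/" "$file"
}

# Example (as root), followed by a reload of the capture kernel:
#   append_kdump_option /etc/sysconfig/kdump rd_NO_LUKS
#   service kdump restart
```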

