Bug 739348 - Kernel bug dereferencing null pointer during shutdown hangs system
Summary: Kernel bug dereferencing null pointer during shutdown hangs system
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Paul Moore
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-18 01:10 UTC by Konstantin Boyandin
Modified: 2012-09-04 13:49 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-04 13:49:07 UTC


Attachments (Terms of Use)
A screenshot made at the last system hang mentioned. (977.80 KB, image/jpeg)
2011-09-18 01:10 UTC, Konstantin Boyandin
no flags
Potential AF_UNIX socket fix (1.47 KB, patch)
2011-09-21 20:00 UTC, Paul Moore
no flags

Description Konstantin Boyandin 2011-09-18 01:10:44 UTC
Created attachment 523731
A screenshot made at the last system hang mentioned.

Description of problem:

With KVM (libvirtd) installed, when kernel 2.6.40.4-5 is used and the NetworkManager service is disabled, the system hangs during shutdown. The trace message starts with:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000228

The trace visible on screen mentions dnsmasq, which is used by libvirtd.

If the system is shut down immediately after a reboot in this situation, Linux shuts down normally.

If NetworkManager is running, Fedora 15 shuts down normally. NetworkManager is disabled in the above configuration because it can't handle bridged networks: it incorrectly reports them as disconnected, causing multiple problems.

Nothing about this is written to the system logs, since the filesystems are already unmounted by the time the bug strikes.

Version-Release number of selected component (if applicable):
kernel-2.6.40.4-5
libvirt-0.8.8-7
NetworkManager-0.9.0-1

How reproducible:
Every shutdown of Fedora 15 with the above setup.


Steps to Reproduce:
1. Install libvirtd as part of KVM/QEMU and set it to start automatically.
2. Boot Fedora 15 with the above components and use KVM virtual machines.
3. Shut down the system.

Actual results:
The system hangs during shutdown, printing a kernel trace on the screen.

Expected results:
The system shuts down and powers off the computer.

Additional info:
Screenshot added.

Comment 1 Chuck Ebbert 2011-09-19 18:25:29 UTC
static int selinux_socket_unix_may_send(struct socket *sock,
                                        struct socket *other)
{
        struct sk_security_struct *ssec = sock->sk->sk_security;
==>     struct sk_security_struct *osec = other->sk->sk_security;

other->sk is NULL, so we fault when following that pointer to read sk_security.

Not sure what this means. Did the socket at the other end disconnect?
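The fault address fits this reading. When a struct pointer is NULL, reading one of its members touches address 0 plus that member's offset, so a fault at 0x228 is what one would expect if sk_security sits 0x228 bytes into struct sock in this kernel build. That offset is an assumption inferred from the oops, not checked against the 2.6.40 sources; the sketch below uses a hypothetical struct fake_sock padded to reproduce it, and computes the address arithmetically rather than actually dereferencing NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock. The padding is chosen only so
 * that sk_security lands at offset 0x228, matching the faulting address
 * in the oops; the real struct sock layout depends on version and config. */
struct fake_sock {
    unsigned char other_fields[0x228];
    void *sk_security;
};

/* Address the CPU would compute for other->sk->sk_security when
 * other->sk is NULL: base address 0 plus the member offset. Done with
 * plain arithmetic, since dereferencing NULL is undefined behavior. */
uintptr_t fault_address_for_null_base(void)
{
    uintptr_t base = 0; /* other->sk == NULL */
    return base + offsetof(struct fake_sock, sk_security);
}
```

With this layout, fault_address_for_null_base() evaluates to 0x228, the address in the BUG line.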

Comment 2 Paul Moore 2011-09-19 19:25:38 UTC
Not sure what this means either, but I'm installing an F15 system right now to find out.

Comment 3 Paul Moore 2011-09-20 16:42:51 UTC
Unfortunately, I'm not able to reproduce the problem on my test system; does this happen every time you shut down the system, or is it sporadic?

I'll go wander through the UNIX socket code now to see if anything jumps out at me.

Comment 4 Paul Moore 2011-09-21 19:59:08 UTC
(In reply to comment #3)
> Unfortunately, I'm not able to reproduce the problem on my test system; does
> this happen every time you shutdown the system or is it sporadic?
> 
> I'll go wander through the UNIX socket code now to see if anything jumps out at
> me.

Nothing looks obviously broken to me in net/af_unix.c, but I'm not exactly a UNIX socket expert. The only thing that gave me some pause is a lack of checking in unix_release_sock(), but I'm not sure if that is critical. I'll attach a simple patch that adds some additional checking, but until I can recreate the problem I have no idea whether this patch is needed.

Konstantin, can you please reply to my questions in comment #3? Also, any chance you can try the attached patch?

Comment 5 Paul Moore 2011-09-21 20:00:55 UTC
Created attachment 524271
Potential AF_UNIX socket fix

Adds some additional checking in unix_release_sock()
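The attached patch is not reproduced here, and per the description above it changes unix_release_sock(). Purely as a hypothetical illustration of the general kind of defensive check under discussion, the sketch below guards a user-space mock of the may_send hook quoted in comment #1 so that it refuses the send instead of following a NULL other->sk. The mock structs, the name mock_unix_may_send() and the choice of -ECONNREFUSED are all assumptions; in the kernel a bare NULL check would also need suitable locking to avoid racing with the release path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct sk_security_struct { unsigned int sid; };
struct sock { struct sk_security_struct *sk_security; };
struct socket { struct sock *sk; };

/* Hypothetical guarded version of the check from comment #1: reject the
 * send when either side's sock is already gone rather than dereferencing
 * NULL. -ECONNREFUSED is an illustrative choice of error code. */
int mock_unix_may_send(const struct socket *sock, const struct socket *other)
{
    if (sock == NULL || sock->sk == NULL)
        return -EINVAL;
    if (other == NULL || other->sk == NULL)
        return -ECONNREFUSED;          /* peer already released */

    /* Only now is it safe to follow the pointers to the security blobs. */
    const struct sk_security_struct *ssec = sock->sk->sk_security;
    const struct sk_security_struct *osec = other->sk->sk_security;
    if (ssec == NULL || osec == NULL)
        return -EINVAL;

    /* The real hook would consult SELinux policy using ssec->sid and
     * osec->sid here; this sketch simply allows the send. */
    return 0;
}
```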

Comment 6 Konstantin Boyandin 2011-09-22 02:59:26 UTC
Paul, answering the questions in #3: if I

- use the system for more than a few minutes (I suspect 3-5 minutes are enough to have all the services started, libvirtd included)
- do NOT stop libvirtd explicitly before shutdown

then the above crash occurs.

I will try the patch on the weekend, since this is a computer I use heavily for business.

Thank you.

Comment 7 Paul Moore 2011-09-22 12:39:53 UTC
Thanks for the information on reproducing the problem. Unfortunately, that matches what I've been trying, and I've still not seen the problem. I suspect the problem you are seeing is a timing/race issue, which can be very tricky to diagnose. I appreciate your willingness to help try the patch and debug the problem.

Let me know what happens with the patch.

Thanks.

Comment 8 Paul Moore 2011-10-03 19:38:55 UTC
Hi Konstantin, any updates?

Comment 9 Konstantin Boyandin 2011-10-04 05:01:10 UTC
Hi Paul, I had no chance to handle that last weekend; I'll try again the following weekend.

I noticed that the only way to shut down the system normally is to stop libvirtd and wait at least 30 seconds before issuing the shutdown command.

Comment 10 Paul Moore 2011-10-04 11:40:45 UTC
Okay, thanks for the update.

I suspect the problem is a socket created by dnsmasq, which is started by libvirtd; when you shut down libvirtd, I suspect it also stops dnsmasq. Now, why does the problem only occur when you shut down the entire system? I suspect it is a very narrow race condition on socket close/destroy that only happens on your particular system during system shutdown.
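The suspected race can be pictured as two events on the same peer socket: the closing side detaching the sock, and the sending side's may_send check reading other->sk. The hypothetical single-threaded model below (not kernel code; the struct layouts and the names release_peer(), send_would_oops() and replay() are illustrative) replays the two events in both orders. The same pair of operations is harmless in one order and yields the NULL dereference in the other, which is why such a bug can surface only under particular timing, such as during system shutdown. Whether this ordering is what actually happens here remains unconfirmed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; the real kernel structures differ. */
struct sock { int id; };
struct socket { struct sock *sk; };

/* Closing side: detach the sock from its socket (hypothetical stand-in
 * for the socket release path). */
static void release_peer(struct socket *peer)
{
    peer->sk = NULL;
}

/* Sending side: the may_send hook reads other->sk and then follows it.
 * Returns true if that read would find NULL, i.e. the reported oops. */
static bool send_would_oops(const struct socket *other)
{
    return other->sk == NULL;
}

/* Replay both events on one peer socket in the chosen order. */
bool replay(bool release_first)
{
    struct sock sk = { .id = 1 };
    struct socket peer = { .sk = &sk };
    bool oops;

    if (release_first) {
        release_peer(&peer);
        oops = send_would_oops(&peer);
    } else {
        oops = send_would_oops(&peer);
        release_peer(&peer);
    }
    return oops;
}
```

replay(true) reports the oops and replay(false) does not; on a real system the order is decided by scheduling, not by a parameter.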

Comment 11 Konstantin Boyandin 2011-11-15 04:42:51 UTC
Sorry for the long silence, Paul.

I have this problem only at shutdown. I will also redirect other people having the same problem here, in case they can provide more details.

Comment 12 Paul Moore 2011-11-18 16:16:06 UTC
No worries on the delay.

Have you had a chance to try the patch in comment #5?  Would it help if I built you a kernel RPM with the patch included?  If so, let me know what kernel you are currently using (you mention 2.6.40.4-5 but that was over two months ago).

Comment 13 Konstantin Boyandin 2012-04-09 22:52:22 UTC
Now that I'm running the latest Fedora 16 x86_64 updates, the system hangs every time I try to shut it down or reboot it.

Stopping libvirtd doesn't help.

Although patching a live system that is heavily used daily isn't very encouraging, I'd like to ask you for directions on how to test whatever patch I could try.

Otherwise, the only option I have is to stop using Fedora altogether, since it's simply unsafe: I have to press the power/reset buttons to halt/reboot the computer, with obvious consequences for the file systems.

Comment 14 Paul Moore 2012-04-10 13:57:40 UTC
(In reply to comment #13)
> Although patching live system which is heavily used daily isn't too
> encouraging, I'd like to ask you for directions on how to test whatever patch I
> could try.

You would simply apply the patch attached to this bug to the kernel sources, build the kernel, and reboot the system using the patched kernel. If the patched kernel solves the hang on shutdown, then we have our fix; if not, we can try something else.

Are you able to patch the kernel yourself or do you need a pre-built kernel RPM?

Comment 15 Konstantin Boyandin 2012-04-11 02:22:20 UTC
To Paul Moore: I will be able to patch; I just need instructions on building a custom kernel.

Comment 16 Steve Grubb 2012-04-19 17:05:04 UTC
I just saw what appears to be the same problem with the new f16 kernel, 3.3.2-1.fc16.x86_64. It was a NULL ptr deref at 0x0000228 and EIP was at selinux_socket_unix_may_send+0x33/0x90.
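The location in this report uses the kernel's symbol+offset/length notation: the faulting instruction was 0x33 bytes into selinux_socket_unix_may_send, a function 0x90 bytes long, which plausibly matches the early dereference highlighted in comment #1. The helper below, parse_oops_location(), is a hypothetical name for a small sketch that splits such a string into its three parts.

```c
#include <stdio.h>
#include <string.h>

/* Split an oops location of the form "symbol+0xOFF/0xLEN" into the
 * function name, the byte offset of the faulting instruction within it,
 * and the function's total length. Returns 0 on success, -1 otherwise. */
int parse_oops_location(const char *loc, char *sym, size_t symlen,
                        unsigned long *off, unsigned long *len)
{
    const char *plus = strrchr(loc, '+');
    if (plus == NULL || plus == loc)
        return -1;                       /* no "+offset" part */

    size_t n = (size_t)(plus - loc);
    if (n >= symlen)
        return -1;                       /* name does not fit in sym */
    memcpy(sym, loc, n);
    sym[n] = '\0';

    /* %lx accepts an optional 0x prefix, like strtoul with base 16. */
    if (sscanf(plus + 1, "%lx/%lx", off, len) != 2)
        return -1;
    return 0;
}
```

For example, "selinux_socket_unix_may_send+0x33/0x90" splits into the name selinux_socket_unix_may_send, offset 0x33 and length 0x90.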

Comment 17 Paul Moore 2012-04-23 14:35:07 UTC
Thanks for the additional information.

Unfortunately, I'm still unable to reproduce the problem on my test system.  I'm going to build a test kernel RPM with the patch from comment #5 applied and you guys can try it out to see if it solves the problem.

Comment 18 Paul Moore 2012-04-23 16:26:03 UTC
The test kernel RPM is at the URL below, please give it a try and let me know if it solves the kernel panic/oops/hang at shutdown.

http://people.redhat.com/~pmoore/bz/739348/kernel-3.3.2-1pm739348.fc16.x86_64.rpm

Comment 19 Paul Moore 2012-05-17 18:46:27 UTC
Any updates?

