Bug 280431 - ip_tables reference count will underflow occasionally
Summary: ip_tables reference count will underflow occasionally
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
medium
low
Target Milestone: ---
: ---
Assignee: Neil Horman
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-09-06 13:43 UTC by Neil Horman
Modified: 2008-07-24 19:16 UTC (History)
4 users (show)

Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 19:16:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch to avoid races on proc file reads (3.89 KB, patch)
2007-09-06 23:45 UTC, Neil Horman
no flags Details | Diff
cleaned up version of the patch (4.32 KB, patch)
2007-09-07 19:49 UTC, Neil Horman
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2008:0665 0 normal SHIPPED_LIVE Moderate: Updated kernel packages for Red Hat Enterprise Linux 4.7 2008-07-24 16:41:06 UTC

Description Neil Horman 2007-09-06 13:43:43 UTC
Description of problem:
On the latest rhel4 kernel, the addition of the last patch in bz 212922, which
corrects a race condition on the removal of ip_tables modules during socket
option deregistration has revealed a module reference counting imbalance.  I
beleive that prior to the aforementioned patch the ref count imbalance was
hidden by the deadlock that the patch corrects

Version-Release number of selected component (if applicable):


How reproducible:
consistent

Steps to Reproduce:
Run these two scripts:
The scripts are: 
  #!/bin/bash
  i=0
  while [ -z "" ]; do
  echo "Restarting iptables $i"
  /etc/init.d/iptables restart
  i=$(($i+1))
  done

and 

  #!/bin/bash
  i=0
  while [ -z "" ]; do
  echo "Getting iptables status $i"
  /etc/init.d/iptables status
  i=$(($i+1))
  done

With this iptables config
# Generated by iptables-save v1.2.11 on Thu Aug 30 15:20:14 2007
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [167:29794]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p ipv6-crypt -j ACCEPT
-A RH-Firewall-1-INPUT -p ipv6-auth -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.22.242.8 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 25 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-port-unreachable
COMMIT
# Completed on Thu Aug 30 15:20:14 2007
# Generated by iptables-save v1.2.11 on Thu Aug 30 15:20:14 2007
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [3:240]
:OUTPUT ACCEPT [3:240]
COMMIT
# Completed on Thu Aug 30 15:20:14 2007

Actual results:
iptables module reference count will underflow, leading to a reference count of
2^32 approximately, whcih can result in non-removable modules, or in oopses that
are caused by data access to modules that had incorrect refcounts which occured
during actual module removal.

Expected results:
removable modules

Additional info:

Comment 1 Neil Horman 2007-09-06 17:54:48 UTC
grumble.  Under the latest kernel, plus the patch from bz 212922, the problem
seems far less reproducible (if at all).  continuing to test....

Comment 2 Neil Horman 2007-09-06 19:44:37 UTC
Ok, its reproduced now, and I have a theory as to whats going on here.  If I'm
right it may well be upstream and really nasty to fix, but this theory explains
both the module refcnt underflow and the proc_inode_delete oops I've recently
observed in relation to this bug.

it relates to the order in which we register proc files and assign a module
owner.  Currently we register proc files in net/* using proc_net_create, or some
variants thereof.  None of these registration apis allow us to specify a module
owner during registration.  Such assignment is handled in the module init
routine after registration is complete.  There is a race here.  It is possible
for another process to read the registered file prior to the module_init routine
getting completed.  the functions proc_get_inode and proc_delete_inode are
responsible for filling out inode information for the given proc file and
incidentally handling module reference incrementing/decrementing during proc
file reads.  This to me means that the following race is possible:

1) Process A (modprobe) inserts a new module
2) Process A in the modules init routine registers a proc file
3) Process B attempts to read the proc file created in (2).  in so doing it   
attempts to call try_module_get on de->owner, which is still NULL, which by
definition in try_module_get returns success.
4) Process A finishes its init routine, and before it returns, sets
proc_file->owner to THIS_MODULE
5) process B completes the read of the proc file it started in (3).  On return
to user space proc_delete_inode is called and module_put is called therein. 
Since de->owner was set to THIS_MODULE in (4) module put preforms  a real
decrement on the module structure, leading to an imbalance in reference counting.

I'm trying to write a patch for this, by moving the module ref counting for the
proc file to proc_read_file and proc_write_file.  In and of itself that won't
solve teh problem but it will give modules an opportunity to solve the issue by
using file permissions to prevent proc file access until they have completed any
 needed data manipulations.  i'll post results as soon as I have them.

Comment 3 Neil Horman 2007-09-06 23:45:59 UTC
Created attachment 189351 [details]
patch to avoid races on proc file reads

So this patch seems to be a winner.  Running a few hours now on the reproducer
without fault, whereas we get a oops/refcnt underflow within minutes w/o the
patch.	It still needs some work, since the permissions setting isn't quite
write, and we need to wrap up registration so module conversion isn't so hard.

Comment 4 Neil Horman 2007-09-07 14:36:13 UTC
Looks like this problem is _not_ upstream.  the xt_* interface is used to
register proc files on behalf of the ip_tables module (as well as most others)
which is able to atomically set module ownership

Comment 5 Neil Horman 2007-09-07 19:49:41 UTC
Created attachment 190381 [details]
cleaned up version of the patch

Comment 7 RHEL Program Management 2008-03-06 05:40:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Vivek Goyal 2008-03-31 14:00:43 UTC
Committed in 68.28.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 10 Vivek Goyal 2008-04-05 15:26:57 UTC
This patch introduced lots of compilation warnings. Putting another small patch
to fix the warnings. Putting the bug back to POST state.

Comment 11 Vivek Goyal 2008-04-07 21:45:22 UTC
Committed in 68.31.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 15 errata-xmlrpc 2008-07-24 19:16:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html


Note You need to log in before you can comment on or make changes to this bug.