Description of problem:
On the latest RHEL4 kernel, the last patch in bz 212922, which corrects a race condition on the removal of ip_tables modules during socket option deregistration, has revealed a module reference counting imbalance. I believe that prior to that patch the refcount imbalance was hidden by the deadlock that the patch corrects.

Version-Release number of selected component (if applicable):

How reproducible:
consistent

Steps to Reproduce:
Run these two scripts:

#!/bin/bash
i=0
while [ -z "" ]; do
    echo "Restarting iptables $i"
    /etc/init.d/iptables restart
    i=$(($i+1))
done

and

#!/bin/bash
i=0
while [ -z "" ]; do
    echo "Getting iptables status $i"
    /etc/init.d/iptables status
    i=$(($i+1))
done

With this iptables config:

# Generated by iptables-save v1.2.11 on Thu Aug 30 15:20:14 2007
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [167:29794]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p ipv6-crypt -j ACCEPT
-A RH-Firewall-1-INPUT -p ipv6-auth -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.22.242.8 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 25 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-port-unreachable
COMMIT
# Completed on Thu Aug 30 15:20:14 2007
# Generated by iptables-save v1.2.11 on Thu Aug 30 15:20:14 2007
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [3:240]
:OUTPUT ACCEPT [3:240]
COMMIT
# Completed on Thu Aug 30 15:20:14 2007

Actual results:
The iptables module reference count underflows, wrapping to a value of approximately 2^32. This can result in non-removable modules, or in oopses caused by data accesses to modules whose refcounts were incorrect at the time of module removal.

Expected results:
removable modules

Additional info:
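For reference, the "approximately 2^32" figure is what a single unbalanced decrement produces when a count of zero is treated as an unsigned 32-bit value. The Python snippet below is only an arithmetic illustration, assuming the count is reported as an unsigned 32-bit number (as the 2^32 figure suggests); `dec_u32` is a hypothetical helper, not a kernel function.

```python
# Arithmetic model of an unsigned 32-bit counter; not kernel code.
MASK_32 = 0xFFFFFFFF


def dec_u32(count: int) -> int:
    """Decrement a value and wrap it the way a 32-bit unsigned counter would."""
    return (count - 1) & MASK_32


wrapped = dec_u32(0)
print(wrapped)               # 4294967295
print(wrapped == 2**32 - 1)  # True
```

A count that has wrapped like this never returns to zero through normal use, which would be consistent with the module becoming non-removable.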
Grumble. Under the latest kernel, plus the patch from bz 212922, the problem seems far less reproducible (if at all). Continuing to test...
OK, it's reproduced now, and I have a theory as to what's going on here. If I'm right it may well be upstream and really nasty to fix, but this theory explains both the module refcnt underflow and the proc_delete_inode oops I've recently observed in relation to this bug.

It relates to the order in which we register proc files and assign a module owner. Currently we register proc files in net/* using proc_net_create, or some variant thereof. None of these registration APIs allow us to specify a module owner during registration. That assignment is handled in the module init routine after registration is complete. There is a race here: it is possible for another process to read the registered file before the module_init routine completes. The functions proc_get_inode and proc_delete_inode are responsible for filling out inode information for the given proc file and, incidentally, for incrementing/decrementing the module reference count during proc file reads. To me this means that the following race is possible:

1) Process A (modprobe) inserts a new module.
2) Process A, in the module's init routine, registers a proc file.
3) Process B attempts to read the proc file created in (2). In so doing it calls try_module_get on de->owner, which is still NULL, and try_module_get by definition returns success for a NULL module without taking a reference.
4) Process A finishes its init routine and, before it returns, sets proc_file->owner to THIS_MODULE.
5) Process B completes the read of the proc file it started in (3). On return to user space proc_delete_inode is called, and module_put is called therein. Since de->owner was set to THIS_MODULE in (4), module_put performs a real decrement on the module structure, leading to an imbalance in reference counting.

I'm trying to write a patch for this by moving the module ref counting for the proc file to proc_read_file and proc_write_file. In and of itself that won't solve the problem, but it will give modules an opportunity to solve the issue by using file permissions to prevent proc file access until they have completed any needed data manipulations. I'll post results as soon as I have them.
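The five-step race above can be replayed deterministically in a small userspace model. The Python sketch below is not kernel code: the function names mirror the kernel identifiers mentioned in this comment, but their bodies are simplified assumptions that keep only the NULL-owner behaviour described in steps (3) and (5), and the proc entry name is illustrative.

```python
# Simplified Python model of the race described above; not kernel code.
# Names mirror kernel identifiers, but the bodies are illustrative assumptions.


class Module:
    def __init__(self, name):
        self.name = name
        self.refcnt = 0


class ProcEntry:
    def __init__(self, name):
        self.name = name
        self.owner = None  # owner is not assigned at registration time


def try_module_get(owner):
    """A missing owner counts as success and takes no reference."""
    if owner is not None:
        owner.refcnt += 1
    return True


def module_put(owner):
    """Drop a reference only when an owner is set."""
    if owner is not None:
        owner.refcnt -= 1


def proc_get_inode(de):
    return try_module_get(de.owner)


def proc_delete_inode(de):
    module_put(de.owner)


def run_race():
    mod = Module("ip_tables")          # (1) modprobe inserts the module
    de = ProcEntry("ip_tables_names")  # (2) init routine registers a proc file
    proc_get_inode(de)                 # (3) reader starts; owner is still unset
    de.owner = mod                     # (4) init routine sets the owner
    proc_delete_inode(de)              # (5) reader finishes; real decrement
    return mod.refcnt


print(run_race())  # -1: one more put than get
```

The model ends with one more put than get. In a counter reported as an unsigned 32-bit value, that -1 would show up as roughly 2^32.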
Created attachment 189351
patch to avoid races on proc file reads

So this patch seems to be a winner. It has been running for a few hours now on the reproducer without fault, whereas we get an oops/refcnt underflow within minutes without the patch. It still needs some work, since the permissions setting isn't quite right, and we need to wrap up registration so module conversion isn't so hard.
Looks like this problem is _not_ upstream. There, the xt_* interface is used to register proc files on behalf of the ip_tables module (as well as most others), and it is able to set module ownership atomically.
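To illustrate why setting ownership atomically avoids the imbalance, the Python model below hands the owner to the registration call, so a reader can never observe an entry without an owner. This is a sketch under assumptions: `register_proc_with_owner` is a hypothetical helper invented for the example and does not represent the xt_* API, and the get/put bodies are simplified.

```python
# Userspace model only; register_proc_with_owner is a hypothetical helper.


class Module:
    def __init__(self, name):
        self.name = name
        self.refcnt = 0


class ProcEntry:
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner  # owner is fixed before the entry becomes visible


def register_proc_with_owner(name, owner):
    """Create the entry with its owner already set (assumed atomic)."""
    return ProcEntry(name, owner)


def proc_get_inode(de):
    if de.owner is not None:
        de.owner.refcnt += 1


def proc_delete_inode(de):
    if de.owner is not None:
        de.owner.refcnt -= 1


def run_balanced():
    mod = Module("ip_tables")
    de = register_proc_with_owner("ip_tables_names", mod)
    proc_get_inode(de)         # reader starts: a real reference is taken
    during = mod.refcnt
    proc_delete_inode(de)      # reader finishes: the same reference is dropped
    return during, mod.refcnt


print(run_balanced())  # (1, 0): get and put are paired
```

Because the get and the put both see the same owner, the count returns to zero after the read.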
Created attachment 190381
cleaned up version of the patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in 68.28.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
This patch introduced lots of compilation warnings. Putting another small patch to fix the warnings. Putting the bug back to POST state.
Committed in 68.31.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html