Escalated to Bugzilla from IssueTracker
Event posted on 12-11-2009 06:18pm BRST by dduval
Scenario (from the client)
<snip>
Host1 : RHEL5
Host2 : Some other host, shouldn't matter what it is
TCP connection is established from Host1 to Host2. Connection is up and fine. I now want to introduce a rule on host1 so that the next time there is some data flow from Host1 to Host2 on this connection, Host1 process should get a tcp rst.
I use the following rule on Host1 :
iptables -A OUTPUT -p tcp --destination <host2ipaddr> --dport 9004 -j REJECT --reject-with tcp-reset
The above rule works just fine on Fedora Core. But it doesn't do the trick on RHEL5. I looked at the rule counters (iptables -L -v) to make sure that I am indeed hitting this rule. It appears as if the iptables subsystem on RHEL5 is not understanding the reject-with command and is just dropping the packet without sending a RST to the host1 process.
</snip>
I did a bit of research and it turns out this was reported shortly after 2.6.19 in the upstream kernel and fixed in 2.6.19.3:
http://bugzilla.kernel.org/show_bug.cgi?id=7716
Client is running this under 2.6.18-149.el5. I reproduced the problem under 2.6.18-164.6.1.el5.
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-14-2009 07:52pm BRST by fhirtz
In 5.4, the --reject-with tcp-reset option to REJECT is acting like DROP:
<snip>
[root@dl385g2 ~]# iptables -F
[root@dl385g2 ~]# iptables -A OUTPUT -p tcp --dport 2500 -j DROP
[root@dl385g2 ~]# time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500
Trying 10.10.56.246...
telnet: connect to address 10.10.56.246: Connection timed out
telnet: Unable to connect to remote host: Connection timed out
real 3m9.012s
user 0m0.001s
sys 0m0.003s
</snip>
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-14-2009 08:37pm BRST by fhirtz
5.3 works:
<snip>
[root@dl385g2 ~]# uname -a
Linux dl385g2.gsslab.rdu.redhat.com 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@dl385g2 ~]# iptables -F
[root@dl385g2 ~]# iptables -A OUTPUT -p tcp --dport 2500 -j REJECT
[root@dl385g2 ~]# time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500
Trying 10.10.56.246...
telnet: connect to address 10.10.56.246: Connection refused
telnet: Unable to connect to remote host: Connection refused
real 0m3.045s
user 0m0.000s
sys 0m0.005s
[root@dl385g2 ~]# iptables -F
[root@dl385g2 ~]# iptables -A OUTPUT -p tcp --dport 2500 -j REJECT --reject-with tcp-reset
[root@dl385g2 ~]# time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500
Trying 10.10.56.246...
telnet: connect to address 10.10.56.246: Connection refused
telnet: Unable to connect to remote host: Connection refused
real 0m0.010s
user 0m0.002s
sys 0m0.002s
</snip>
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-15-2009 01:44am BRST by fhirtz
It breaks between 129 and 132 (there are no archived builds for 130-131 to test). Based on this and the changelog, my initial suspicion is the innocuous-looking patch which was added in 132 in this area:
- [net] ipt_REJECT: properly handle IP options (Ivan Vecera) [473504]
RHkernel list: [RHEL5.4 PATCH] [net] ipt_REJECT: properly handle IP options
The patch turned out to be quite a bit more involved than the small one that I had here earlier. This is the patchset that is causing the problem, however.
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-15-2009 03:07pm BRST by fhirtz http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9ba99b0d3f45d0aedeafce1cfa4f720b19d04477 This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-16-2009 03:57am BRST by fhirtz
Success with a simple patch on current 5.4:
<snip>
[root@dl385g2 ~]# uname -a
Linux dl385g2.gsslab.rdu.redhat.com 2.6.18-164.9.1.it375421.el5 #1 SMP Tue Dec 15 16:37:39 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@dl385g2 ~]# iptables -F
[root@dl385g2 ~]# iptables -A OUTPUT -p tcp --dport 2500 -j REJECT
[root@dl385g2 ~]# time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500
Trying 10.10.56.246...
telnet: connect to address 10.10.56.246: Connection refused
telnet: Unable to connect to remote host: Connection refused
real 0m3.042s
user 0m0.002s
sys 0m0.002s
[root@dl385g2 ~]# iptables -F
[root@dl385g2 ~]# iptables -A OUTPUT -p tcp --dport 2500 -j REJECT --reject-with tcp-reset
[root@dl385g2 ~]# time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500
Trying 10.10.56.246...
telnet: connect to address 10.10.56.246: Connection refused
telnet: Unable to connect to remote host: Connection refused
real 0m0.010s
user 0m0.003s
sys 0m0.001s
</snip>
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
Event posted on 12-16-2009 03:59am BRST by fhirtz File uploaded: netfilter-it375421.patch This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421 it_file 282224
Event posted on 12-16-2009 04:27am BRST by fhirtz
SEG Escalation Template
All Issues: Problem Description
---------------------------------------------------
1. Time and date of problem:
Ongoing
2. System architecture(s):
Any
3. Provide a clear and concise problem description as it is understood at the time of escalation. Please be as specific as possible in your description. Do not use the generic term "hang", as that can mean many things.
Observed behavior: With 5.4, the iptables option "--reject-with tcp-reset" doesn't work.
Desired behavior: That it work.
4. Specific action requested of SEG:
Submit fix for inclusion in RHEL5.4
5. Is a defect (bug) in the product suspected? yes/no
Bugzilla number (if one already exists):
Yes.
6. Does a proposed patch exist? yes/no
Yes.
If yes, attach patch, making sure it is in unified diff format (diff -pruN)
Done.
7. What is the impact to the customer when they experience this problem? This is especially important for severity one and two issues:
Example: "This system houses our accounts payable database. When the system crashes we are unable to process payroll, and other payable functions. This is especially critical as we approach end of our quarter."
The client uses iptables to force resets on client transmit at times. With this being broken, the rule now acts as "DROP" which means that we have a 3 min delay.
Regressions
-----------
1. Last known working version (specific name-version-release) of the package:
kernel 2.6.18-129
2. Specific name-version-release of the package where the customer discovered the regression:
kernel 2.6.18-164
3. Difference in the changelog between the two versions (do not provide the entire changelog contents). Run rpm -qp [packagename] --changelog to get the information from each package, then paste or attach the differences:
Pertinent info noted in ticket.
4. Have you been able to narrow the regression down to a specific version? yes/no
Yes.
If yes, provide the version:
kernel 2.6.18-132
5. Have you been able to narrow the regression down to a specific patch committed? yes/no
yes.
If yes, provide the version noted.
6. Provide the exact steps to reproduce the regression so SEG/engineering can narrow down where the regression was introduced if not already known and/or analyse how the regression is occurring.
1) On arbitrary server, open the port: nc -l 2500 (nc -l -p 2500 on RHEL4)
2) On client to be tested, reset iptables: 'iptables -F'
3) Add iptables rule for the port: 'iptables -A OUTPUT -p tcp --dport 2500 -j REJECT --reject-with tcp-reset'
4) time a telnet connection attempt: 'time telnet hp-dl385g5-1.gsslab.rdu.redhat.com 2500'
6) Profit.
If it's working, the connection attempt should fail essentially immediately. If it's broken, it takes around 3 mins to close.
The patch here is a very minor adaptation to an upstream patch mainly to include it in the right place, since we're using a "compat" function for the relevant code instead of the shipped default version.
Issue escalated to Support Engineering Group by: fhirtz.
Internal Status set to 'Waiting on SEG'
Severity set to: Medium
Priority set to: 2
This event sent from IssueTracker by fbl [Support Engineering Group] issue 375421
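For anyone who wants to script the reproduction steps above rather than eyeball telnet timings, here is a rough sketch. The function names classify_connect and reproduce are made up for this note and are not part of the ticket or the patch. It assumes bash with /dev/tcp support and coreutils timeout(1) on the client, and it caps the wait (30 seconds here, an arbitrary choice) instead of sitting through the full ~3 minute SYN timeout.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not from the ticket.

# Attempt a TCP connect and print one of: connected, refused, timeout, error.
# A quick "refused" suggests the REJECT rule delivered a reset to the caller;
# "timeout" suggests the SYN was silently dropped.
classify_connect() {
    local host=$1 port=$2 limit=${3:-10} err rc
    # bash's /dev/tcp redirection performs the connect(); timeout(1) caps the
    # wait so a DROP-like rule does not stall the test for minutes.
    err=$(LC_ALL=C timeout "$limit" bash -c "exec 3<>/dev/tcp/$host/$port" 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo connected
    elif [ "$rc" -eq 124 ]; then
        # 124 is the exit status timeout(1) uses when the limit expires.
        echo timeout
    elif printf '%s' "$err" | grep -q 'Connection refused'; then
        echo refused
    else
        echo error
    fi
}

# Steps 2-4 from the ticket. Needs root, and 'iptables -F' flushes ALL
# existing filter rules, so only run this on a scratch test box.
reproduce() {
    local server=$1 port=${2:-2500}
    iptables -F
    iptables -A OUTPUT -p tcp --dport "$port" -j REJECT --reject-with tcp-reset
    time classify_connect "$server" "$port" 30
}
```

Usage would be something like 'reproduce hp-dl385g5-1.gsslab.rdu.redhat.com 2500' as root on the client under test, with the listener from step 1 already running on the server. Going by the behavior described in this ticket, a good kernel should report "refused" almost at once and an affected kernel should report "timeout" when the cap expires.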
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html