Bug 609343 - pluto crashes when removing logical interface
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: openswan
Version: 5.5
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Avesh Agarwal
QA Contact: Aleš Mareček
Depends On:
Blocks: 749605
Reported: 2010-06-29 23:10 EDT by Issue Tracker
Modified: 2015-06-09 11:18 EDT (History)

See Also:
Fixed In Version: openswan-2.6.32-3.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 749605 (view as bug list)
Environment:
Last Closed: 2012-02-21 00:58:46 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
proposed patch (1.04 KB, application/octet-stream)
2010-06-29 23:13 EDT, wmg


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2012:0211
Priority: normal
Status: SHIPPED_LIVE
Summary: openswan bug fix and enhancement update
Last Updated: 2012-02-20 10:08:08 EST

Description Issue Tracker 2010-06-29 23:10:35 EDT
Escalated to Bugzilla from IssueTracker
Comment 2 Issue Tracker 2010-06-29 23:10:39 EDT
Event posted on 03-30-2010 11:36pm CST by smayhew

Okay, I logged in and reproduced this manually before I saw your notes
about the scripts.  Here's the sequence of events:

node1 # ip addr add 192.168.122.22/24 dev eth0
node1 # arping -c 1 -U -I eth0 192.168.122.22
node1 # service ipsec start
node2 # service ipsec start
node1 # ipsec auto --up conn3
node2 # ipsec auto --up conn3
node1 # ip addr del 192.168.122.22/24 dev eth0
node2 # ip addr add 192.168.122.22/24 dev eth0
node2 # arping -c 1 -U -I eth0 192.168.122.22
node2 # ipsec auto --ready
node2 # ipsec auto --up conn2
node1 # ipsec auto --ready
node1 # ipsec auto --up conn2
// wait for pluto to crash

I then went and modified your reproducer script and it worked exactly
once.  I think it might be a problem with the order in which processes are
being fired off in the background.  It might be easier to just do
everything in a sequential fashion, with a few seconds in between each
step.  I'll keep playing around with it to see if I can come up with
anything.


This event sent from IssueTracker by wezhang  [Support Engineering Group]
 issue 597033
Comment 3 Issue Tracker 2010-06-29 23:10:41 EDT
Event posted on 06-29-2010 02:42pm CST by wezhang


This is caused by processing queued events for a connection that has
already gone down (conn3 here).
The following are the connection IP addresses we defined in mesh.conf:

for conn3:
192.168.122.33(node2)
192.168.122.22(floating ip)

for conn2:
192.168.122.56(node1)
192.168.122.22(floating ip)


The bug occurs once the following two events have happened in sequence:
1. the floating ip got moved from node1 to node2
2. ipsec whack --listen

On node2, 'ipsec whack --listen' (which is what 'ipsec auto --ready' runs)
triggers the whack_process function. The source code responsible for the
'--listen' request looks like:

void whack_process(int whackfd, struct whack_message msg)
{
    const struct osw_conf_options *oco = osw_init_options();
   
...
    if (msg.whack_listen)
    {
        fflush(stderr);
        fflush(stdout);
        close_peerlog();    /* close any open per-peer logs */
        openswan_log("listening for IKE messages");
        listening = TRUE;
        daily_log_reset();
        reset_adns_restart_count();
        set_myFQDN();
        find_ifaces(); 
        load_preshared_secrets(NULL_FD);
        load_groups();
    }
...
}

The call path for the interface check is then:
whack_process -> find_ifaces -> free_dead_ifaces

the following code is from free_dead_ifaces:
static void
free_dead_ifaces(void)
{
    struct iface_port *p;
    bool some_dead = FALSE
        , some_new = FALSE;
   
    for (p = interfaces; p != NULL; p = p->next)
    {  
        if (p->change == IFN_DELETE)
        {  
            openswan_log("shutting down interface %s/%s %s:%d"
                         , p->ip_dev->id_vname
                         , p->ip_dev->id_rname
                         , ip_str(&p->ip_addr), p->port);
            some_dead = TRUE;
        }
        else if (p->change == IFN_ADD)
        {  
            some_new = TRUE;
        }
    }
...
    /* this must be done after the release_dead_interfaces
     * in case some to the newly unoriented connections can
     * become oriented here.
     */
    if (some_dead || some_new)
        check_orientations();
}

Because 192.168.122.22 (the floating IP for conn2) has already been moved
to this node, some_new = TRUE, which triggers the following piece of code
in check_orientations():
void
check_orientations(void)
{
...
    
    /* Check that no oriented connection has become double-oriented.
     * In other words, the far side must not match one of our new interfaces.
     */
    {   
        struct iface_port *i;

        for (i = interfaces; i != NULL; i = i->next)
        {
            /* The floating IP for conn2 was added on this node,
             * so this condition is true. */
            if (i->change == IFN_ADD)
            {
                struct host_pair *hp;
               
                for (hp = host_pairs; hp != NULL; hp = hp->next)
                {  
                    if (sameaddr(&hp->him.addr, &i->ip_addr)
                        && (kern_interface!=NO_KERNEL || hp->him.host_port == pluto_port))
                    {  
                        /* bad news: the whole chain of connections
                         * hanging off this host pair has both sides
                         * matching an interface.
                         * We'll get rid of them, using orient and
                         * connect_to_host_pair.  But we'll be lazy
                         * and not ditch the host_pair itself (the
                         * cost of leaving it is slight and cannot
                         * be induced by a foe).
                         */
                        struct connection *c = hp->connections;
                       
                        hp->connections = NULL;
                        while (c != NULL)
                        {  
                            struct connection *nxt = c->hp_next;
                           
                            c->interface = NULL;
                            /* Re-orienting conn3 with these two calls
                             * triggers this issue. */
                            (void)orient(c);
                            connect_to_host_pair(c);
                            c = nxt;
                        }
                    }
                }
            }
        }
    }
}
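
As a rough illustration of the double-orientation check quoted above, here is a minimal, self-contained sketch. It is not pluto's code: struct iface, struct peer_pair and count_double_oriented() are hypothetical stand-ins, addresses are compared as plain strings instead of with sameaddr(), and the port check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for pluto's structures. */
enum iface_change { IFN_KEEP, IFN_ADD, IFN_DELETE };

struct iface {
    const char *addr;           /* textual address, for illustration only */
    enum iface_change change;
    struct iface *next;
};

struct peer_pair {
    const char *him;            /* remote end of the host pair */
    struct peer_pair *next;
};

/* Count host pairs whose remote address matches a newly added local
 * interface, i.e. pairs that have become double-oriented. */
static int count_double_oriented(const struct iface *ifaces,
                                 const struct peer_pair *pairs)
{
    int hits = 0;

    for (const struct iface *i = ifaces; i != NULL; i = i->next) {
        assert(i->addr != NULL);
        if (i->change != IFN_ADD)
            continue;           /* only newly added interfaces matter */
        for (const struct peer_pair *hp = pairs; hp != NULL; hp = hp->next) {
            if (strcmp(hp->him, i->addr) == 0)
                hits++;
        }
    }
    return hits;
}
```

With node2's own address marked IFN_KEEP and the floating IP 192.168.122.22 marked IFN_ADD, a pair whose remote end is 192.168.122.22 (conn3 as seen from node2) is counted, while a pair whose remote end is 192.168.122.56 is not.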
Now let's look at the orient() and connect_to_host_pair() functions:
bool
orient(struct connection *c)
{   
    struct spd_route *sr;
	...
                for (;;)
                {   
                    /* check if this interface matches this end */
                    if (sameaddr(&sr->this.host_addr, &p->ip_addr)
                        && (kern_interface != NO_KERNEL
                            || sr->this.host_port == pluto_port))
                    {   
                        if (oriented(*c))
                        {   
                            if (c->interface->ip_dev == p->ip_dev)
                                loglog(RC_LOG_SERIOUS
                                       , "both sides of \"%s\" are our interface %s!"
                                       , c->name, p->ip_dev->id_rname);
                            else
                                loglog(RC_LOG_SERIOUS, "two interfaces match \"%s\" (%s, %s)"
                                       , c->name, c->interface->ip_dev->id_rname, p->ip_dev->id_rname);
                            /* For conn3, both the left and right end IPs are
                             * on node2, so ->interface is set to NULL here. */
                            c->interface = NULL;        /* withdraw orientation */
                            return FALSE;
                        }
                        c->interface = p;
                    }
	...
}

void
connect_to_host_pair(struct connection *c)
{
    if (oriented(*c))
    {  
	...
    else
    {  
        /* since this connection isn't oriented, we place it
         * in the unoriented_connections list instead.
         */
        c->host_pair = NULL;    /* host_pair becomes NULL */
        c->hp_next = unoriented_connections;
        unoriented_connections = c;
    }
}
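
The orient() behaviour quoted above can be modelled in a few lines. This is only a sketch under simplifying assumptions: struct conn_model and orient_model() are hypothetical names, local addresses are a plain string array, and the real function's port checks and logging are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a connection with two end addresses. */
struct conn_model {
    const char *this_addr;
    const char *that_addr;
    const char *interface;      /* local address in use, NULL if unoriented */
};

/* Orient the connection against the list of local addresses.
 * Succeeds only if exactly one end is local; if both ends are local
 * (the conn3 case on node2), the orientation is withdrawn. */
static bool orient_model(struct conn_model *c,
                         const char *const *local, size_t n_local)
{
    assert(c != NULL && local != NULL);
    bool this_local = false, that_local = false;

    c->interface = NULL;
    for (size_t k = 0; k < n_local; k++) {
        if (strcmp(c->this_addr, local[k]) == 0)
            this_local = true;
        if (strcmp(c->that_addr, local[k]) == 0)
            that_local = true;
    }
    if (this_local == that_local)
        return false;           /* neither end is ours, or both are */
    if (that_local) {
        /* swap ends so that "this" is always the local side */
        const char *tmp = c->this_addr;
        c->this_addr = c->that_addr;
        c->that_addr = tmp;
    }
    c->interface = c->this_addr;
    return true;
}
```

Once node2 holds both 192.168.122.33 and 192.168.122.22, conn3 fails to orient and its interface stays NULL, whereas conn2 (192.168.122.56 and 192.168.122.22) orients on the floating IP.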
Because host_pair is now NULL, any already-queued event that needs to
dereference ->host_pair (for example host_pair_enqueue_pending,
pending_check_timeout or unpend) crashes the program.
A similar situation occurs on node1, but there it is caused by an
interface being removed (the floating IP for conn3).
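
The contents of the attached patch are not shown in this report, so the following is not that patch. It is only a minimal sketch of the general defensive pattern the analysis points at: checking ->host_pair for NULL before walking anything that hangs off it. The types and the functions make_unoriented() and count_pending() are hypothetical, pared-down models.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down versions of pluto's structures. */
struct pending {
    int id;
    struct pending *next;
};

struct host_pair {
    struct pending *pending;    /* queued work for this pair */
};

struct connection {
    const char *name;
    struct host_pair *host_pair;    /* NULL once the connection is unoriented */
};

/* Models the unoriented branch of connect_to_host_pair(). */
static void make_unoriented(struct connection *c)
{
    assert(c != NULL);
    c->host_pair = NULL;
}

/* Guarded walk of the pending queue: an unoriented connection is
 * treated as having nothing pending instead of being dereferenced. */
static int count_pending(const struct connection *c)
{
    assert(c != NULL);
    if (c->host_pair == NULL)
        return 0;

    int n = 0;
    for (const struct pending *p = c->host_pair->pending; p != NULL; p = p->next)
        n++;
    return n;
}
```

In this model, a queued event that reaches count_pending() after the connection has been made unoriented sees an empty queue rather than crashing.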




This event sent from IssueTracker by wezhang  [Support Engineering Group]
 issue 597033
Comment 4 wmg 2010-06-29 23:13:21 EDT
Created attachment 427849 [details]
proposed patch
Comment 6 RHEL Product and Program Management 2010-08-09 15:46:57 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 8 RHEL Product and Program Management 2011-05-31 11:33:50 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.
Comment 10 Paul Wouters 2011-10-27 12:15:03 EDT
This issue applies to current openswan, and so to RHEL6 as well. We are going to check this fix and get back to you. Sorry this one slipped through.
Comment 11 Avesh Agarwal 2011-10-27 12:25:01 EDT
Paul, are you using the same patch that is attached here or a different patch?
Comment 12 Paul Wouters 2011-10-27 16:02:36 EDT
I have not had time to look into the correctness of the patch yet. We have no patch at the moment.
Comment 15 errata-xmlrpc 2012-02-21 00:58:46 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0211.html
