
Bug 1787318

Summary: [OVN] ovn-controller: crash due to use after free in I-P engine
Product: Red Hat Enterprise Linux Fast Datapath Reporter: Dumitru Ceara <dceara>
Component: ovn2.12 Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA QA Contact: Jianlin Shi <jishi>
Severity: high Docs Contact:
Priority: unspecified    
Version: FDP 20.A CC: ctrautma, jishi, mmichels, ralongi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1787319 Environment:
Last Closed: 2020-03-10 10:08:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1787319, 1802325, 1802716    
Attachments:
Description Flags
NB database for replicating the issue. none

Description Dumitru Ceara 2020-01-02 11:36:39 UTC
Created attachment 1649164
NB database for replicating the issue.

Description of problem:
With the attached scaled configuration, if logical switches are deleted, ovn-controller might access freed memory and crash.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start ovn-northd and point it to the attached northbound db (ovnnb_db.db).
2. Start ovn-controller.
3. Start OVS and bind the logical_switch_ports locally:

for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    vm=$(echo $i | cut -f 1 -d "-")
    ovs-vsctl add-port br-int $vm -- set interface $vm type=internal
    ovs-vsctl set interface $vm external_ids:iface-id=$i
done

4. Delete all logical switches:

for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do
    ovn-nbctl ls-del $s
done
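The text parsing used in steps 3 and 4 can be checked offline, without a running OVN deployment. The following is a minimal sketch: the function names and the sample port and switch names are hypothetical (they are not taken from the attached database), and it assumes port names of the form <vm>-<suffix> and one `name : "..."` line per switch in the `ovn-nbctl list` output.

```shell
#!/bin/bash
# Derive the OVS interface name from a logical switch port name by keeping
# everything before the first "-" (same cut call as in step 3).
iface_for_port() {
    echo "$1" | cut -f 1 -d "-"
}

# Extract switch names from `ovn-nbctl list logical_switch`-style text read
# from stdin (same pipeline as in step 4).
switch_names() {
    grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'
}

# Hypothetical sample input; real names come from the attached NB database.
iface_for_port "vm1-port0"
printf 'name                : "sw1"\nports               : []\nname                : "sw2"\n' | switch_names
```

If the names printed here do not match what is expected for the real database, the loops in steps 3 and 4 would bind or delete the wrong objects, so this is worth checking before running them at scale.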

Actual results:
ovn-controller might crash:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004b8f47 in hmap_first_with_hash (hmap=hmap@entry=0x91da08, hmap=hmap@entry=0x91da08, hash=2346380341)
    at ./include/openvswitch/hmap.h:328
328         return hmap_next_with_hash__(hmap->buckets[hash & hmap->mask], hash);


Expected results:
ovn-controller shouldn't use memory after it was freed.

Additional info:
Fixed upstream by commits:
2a4965c0e187db0c4218556ed9b06f988e88cb62: ovn-controller: Refactor I-P engine_run() tracking.
5ed53faecef12c09330ced445418c961cb1f8caf: ovn-controller: Add per node states to I-P engine.
2117ba0a91f36206d0f3665e8680c15f1f6fa0a0: ovn-controller: Add separate I-P engine node for processing ct-zones.
94cbc59dc0f1cb56e56d1551956efe5824561864: ovn-controller: Fix use of dangling pointers in I-P runtime_data.

Comment 3 Jianlin Shi 2020-02-04 09:25:24 UTC
Reproduced on ovn2.12-2.12.0-19 with the steps in the description:

#!/bin/bash

systemctl restart openvswitch
systemctl restart ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642

ovs-vsctl set open . external-ids:system_id=hv1 external-ids:ovn-remote=tcp:20.0.30.25:6642 external-ids:ovn-encap-type=geneve external-ids:ovn-encap-ip=20.0.30.25

systemctl restart ovn-controller

cp ovnnb_db.db /var/lib/ovn -f
systemctl restart ovn-northd

for i in $(ovn-nbctl --bare --columns name find logical_switch_port type=\"\"); do
    vm=$(echo $i | cut -f 1 -d "-")
    ovs-vsctl add-port br-int $vm -- set interface $vm type=internal
    ovs-vsctl set interface $vm external_ids:iface-id=$i
done

for s in $(ovn-nbctl list logical_switch | grep -E "^name" | cut -f 2 -d ':' | cut -f 2 -d '"'); do ovn-nbctl ls-del $s; done

[root@dell-per740-12 bz1787318]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-19.el7fdp.x86_64
ovn2.12-host-2.12.0-19.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
ovn2.12-central-2.12.0-19.el7fdp.x86_64

log in /var/log/messages:

Feb  4 04:17:12 dell-per740-12 kernel: ovn-controller[109991]: segfault at 45ed761a8 ip 000056012d39b027 sp 00007fff94ebd890 error 4 in ovn-controller[56012d2b4000+23d000]
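The kernel record above carries enough information to locate the faulting instruction relative to the mapping it ran from. The sketch below (the function name is made up for illustration) subtracts the mapping base in `[base+size]` from the `ip` value. It assumes the line has no trailing whitespace. Turning the resulting offset into a source line additionally requires matching debuginfo and knowledge of how the binary was mapped, which the report does not provide.

```shell
#!/bin/bash
# Print the offset of the faulting instruction inside the mapping reported by
# the kernel, given a "segfault at ... ip ... in name[base+size]" log line.
fault_offset() {
    local line=$1 ip base
    # Instruction pointer: the hex value following " ip ".
    ip=$(echo "$line" | sed -E 's/.* ip ([0-9a-f]+) .*/\1/')
    # Mapping base: the hex value before "+" in the trailing [base+size].
    base=$(echo "$line" | sed -E 's/.*\[([0-9a-f]+)\+[0-9a-f]+\]$/\1/')
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# The record from this comment; prints 0xe7027.
fault_offset 'Feb  4 04:17:12 dell-per740-12 kernel: ovn-controller[109991]: segfault at 45ed761a8 ip 000056012d39b027 sp 00007fff94ebd890 error 4 in ovn-controller[56012d2b4000+23d000]'
```

The `error 4` field in the same record is the x86 page-fault code for a user-mode read of a page that is not mapped, which is consistent with dereferencing a stale pointer.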


Verified on ovn2.12-2.12.0-26:

[root@dell-per740-12 bz1787318]# rpm -qa | grep -E "openvswitch|ovn"
openvswitch2.12-2.12.0-21.el7fdp.x86_64
ovn2.12-2.12.0-26.el7fdp.x86_64
ovn2.12-central-2.12.0-26.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
ovn2.12-host-2.12.0-26.el7fdp.x86_64
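The two package lists give two data points: the crash reproduces on 2.12.0-19 and is not seen on 2.12.0-26. A rough sketch for classifying some other build against those points follows. The function names are hypothetical, and it assumes that GNU `sort -V` ordering is close enough to RPM's version comparison for strings of this shape. Builds between -19 and -26 are reported as unknown because this report does not say which build first carried the fix, and treating builds older than -19 as affected is likewise an assumption.

```shell
#!/bin/bash
# Succeeds when version-release $1 is >= $2 under GNU sort's version ordering
# (an approximation of RPM's comparison, not a replacement for it).
vr_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Classify a build relative to the two builds tested in this comment.
classify_ovn_build() {
    if vr_at_least "$1" "2.12.0-26.el7fdp"; then
        echo "verified-or-later"
    elif vr_at_least "2.12.0-19.el7fdp" "$1"; then
        # Assumption: builds at or before the reproducing build are affected.
        echo "reproduced-or-earlier"
    else
        echo "unknown"
    fi
}

classify_ovn_build "2.12.0-26.el7fdp"
classify_ovn_build "2.12.0-19.el7fdp"
```

On a live system the argument could come from `rpm -q --qf '%{VERSION}-%{RELEASE}\n' ovn2.12`.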

no segfault error in /var/log/messages.
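The log check can be made repeatable with a small helper. This is a sketch only: the function name is hypothetical, and the pattern is modelled on the single kernel record quoted earlier in this comment, so other kernels or syslog formats may need a different expression.

```shell
#!/bin/bash
# Count kernel segfault records for ovn-controller in a syslog-style file.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the helper
# usable under "set -e" while still printing 0.
count_ovn_controller_segfaults() {
    grep -cE 'ovn-controller\[[0-9]+\]: segfault at ' "$1" || true
}

# Example: a count of 0 for /var/log/messages corresponds to the result above.
# count_ovn_controller_segfaults /var/log/messages
```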

Comment 5 errata-xmlrpc 2020-03-10 10:08:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0752