Bug 864978

Summary: cura-networking: incorrect locking
Product: [Fedora] Fedora Reporter: Florian Weimer <fweimer>
Component: cura-networkingAssignee: Radek Novacek <rnovacek>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: ovasik, rnovacek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-11-21 14:00:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 864846    

Description Florian Weimer 2012-10-10 14:37:30 UTC
This code in src/network.c is incorrect:

Network *network_ref(const CMPIBroker *broker)
{
    if (_network == NULL) {
        _network = network_new(broker);
    }
    while (_network->loaded != (NETWORK_DEVICES_LOADED | NETWORK_SETTINGS_LOADED)) {
        // We need to wait for other thread to fetch network devices and settings
        // TODO: don't use polling
        usleep(10000);
    }
    MUTEX_LOCK(_network);
    _network->ref_count++;
    MUTEX_UNLOCK(_network);
    return _network;
}

You need to protect _network with a lock, and perform the initialization and reading under it.  If locking on every read is problematic, you could use the double-checked locking idiom, but I'm not sure if it's valid under the memory model implemented in GCC.  Using pthread_once could be an option as well.

In the network_unref function, the assignment to _network must be performed under a lock:

void network_unref(Network *network)
{
    if (network == NULL || _network == NULL) {
        network = _network = NULL;
        return;
    }

    MUTEX_LOCK(network);
    network->ref_count--;
    MUTEX_UNLOCK(network);

    if (network->ref_count <= 0) {
        network_free(network);
        network = _network = NULL;
    }
}

The value of network->ref_count must be stored in a local variable under the lock, reading it outside the lock can lead to parallel calls to network_free.  (It is safe to call network_free outside the lock when the reference count has reached zero because there is only one thread in which this can happen.  In fact, you cannot call network_free on a locked object.)

The following is just an example, there are several similar cases with the same problem.  In the network_device_removed_cb function in src/network_nm.c file, network->ports is accessed outside the lock:

void network_device_removed_cb(NMClient *nm_client, NMDevice *device, Network *network)
{
    fprintf(stderr, "Device: %s - removed\n", nm_device_get_iface(device));
    int i, len = ports_length(network->ports);
    for (i = 0; i < len; i++) {
        if (strcmp(ports_index(network->ports, i)->id, nm_device_get_iface(device)) == 0) {
            break;
        }
    }
    if (i < len) {
        MUTEX_LOCK(network);
        port_free(ports_pop(network->ports, i));
        MUTEX_UNLOCK(network);
    }
}

This read has to happen under the lock.

As a rule of thumb, it is best to assume that writes are invisible to reads unless both happen while the same lock is held.  In other words, you must lock while reading *and* writing.

Comment 1 Radek Novacek 2012-11-21 14:00:14 UTC
Thanks for the report. Locking has been fixed by explicit locking of each usage of the Network struct. Initialization of the Network struct is now done using pthead_once.

Upstream commit:
http://git.fedorahosted.org/cgit/openlmi-networking.git/commit/?id=c0328abc0977920e3fd0cbfa90566583a00ec72c