Bug 864978 - cura-networking: incorrect locking
cura-networking: incorrect locking
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: cura-networking (Show other bugs)
rawhide
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Radek Novacek
Fedora Extras Quality Assurance
:
Depends On:
Blocks: 864846
  Show dependency treegraph
 
Reported: 2012-10-10 10:37 EDT by Florian Weimer
Modified: 2016-11-30 19:31 EST (History)
2 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-21 09:00:14 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Florian Weimer 2012-10-10 10:37:30 EDT
This code in src/network.c is incorrect:

Network *network_ref(const CMPIBroker *broker)
{
    if (_network == NULL) {
        _network = network_new(broker);
    }
    while (_network->loaded != (NETWORK_DEVICES_LOADED | NETWORK_SETTINGS_LOADED)) {
        // We need to wait for other thread to fetch network devices and settings
        // TODO: don't use polling
        usleep(10000);
    }
    MUTEX_LOCK(_network);
    _network->ref_count++;
    MUTEX_UNLOCK(_network);
    return _network;
}

You need to protect _network with a lock, and perform the initialization and reading under it.  If locking on every read is problematic, you could use the double-checked locking idiom, but I'm not sure if it's valid under the memory model implemented in GCC.  Using pthread_once could be an option as well.

In the network_unref function, the assignment to _network must be performed under a lock:

void network_unref(Network *network)
{
    if (network == NULL || _network == NULL) {
        network = _network = NULL;
        return;
    }

    MUTEX_LOCK(network);
    network->ref_count--;
    MUTEX_UNLOCK(network);

    if (network->ref_count <= 0) {
        network_free(network);
        network = _network = NULL;
    }
}

The value of network->ref_count must be stored in a local variable under the lock, reading it outside the lock can lead to parallel calls to network_free.  (It is safe to call network_free outside the lock when the reference count has reached zero because there is only one thread in which this can happen.  In fact, you cannot call network_free on a locked object.)

The following is just an example, there are several similar cases with the same problem.  In the network_device_removed_cb function in src/network_nm.c file, network->ports is accessed outside the lock:

void network_device_removed_cb(NMClient *nm_client, NMDevice *device, Network *network)
{
    fprintf(stderr, "Device: %s - removed\n", nm_device_get_iface(device));
    int i, len = ports_length(network->ports);
    for (i = 0; i < len; i++) {
        if (strcmp(ports_index(network->ports, i)->id, nm_device_get_iface(device)) == 0) {
            break;
        }
    }
    if (i < len) {
        MUTEX_LOCK(network);
        port_free(ports_pop(network->ports, i));
        MUTEX_UNLOCK(network);
    }
}

This read has to happen under the lock.

As a rule of thumb, it is best to assume that writes are invisible to reads unless both happen while the same lock is held.  In other words, you must lock while reading *and* writing.
Comment 1 Radek Novacek 2012-11-21 09:00:14 EST
Thanks for the report. Locking has been fixed by explicit locking of each usage of the Network struct. Initialization of the Network struct is now done using pthead_once.

Upstream commit:
http://git.fedorahosted.org/cgit/openlmi-networking.git/commit/?id=c0328abc0977920e3fd0cbfa90566583a00ec72c

Note You need to log in before you can comment on or make changes to this bug.