Bug 1581824

Summary: Malformed lock file found: /var/cache/dnf/metadata_lock.pid.
Product: [Fedora] Fedora Reporter: Brian J. Murrell <brian.murrell>
Component: dnfAssignee: Jaroslav Rohel <jrohel>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rawhideCC: dmach, jberan, jmracek, jrohel, mblaha, mhatina, ncross, packaging-team-maint, pkratoch, rpm-software-management, vmukhame
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: dnf-4.2.7-2.fc30 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-30 01:14:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Brian J. Murrell 2018-05-23 17:11:15 UTC
Description of problem:
We seem to be hitting occurrences of:

# dnf install --allowerasing -y <package list>

Malformed lock file found: /var/cache/dnf/metadata_lock.pid.
Ensure no other dnf process is running and remove the lock file manually or run systemd-tmpfiles --remove dnf.conf.

This seems to happen most often when a machine loses power unexpectedly.  This might seem like it should be a rare event but it happens more frequently in HA configurations where nodes need to be STONITHed for the integrity of the cluster.

I suspect that what happens is that the lock is taken by a running dnf process but before the data (i.e. the pid) can be flushed to the file, the machine is powered off and on again.  Once booted up, the filesystem journal is replayed creating the file but without it's contents.

What results is an empty lock file which DNF simply refuses to handle (https://github.com/rpm-software-management/dnf/blob/e4f3e91480647146b3bce542cccd2f675a5bc8e2/dnf/lock.py#L86-L104):

    def _try_read_lock(self):
        try:
            with open(self.target, 'r') as f:
                return int(f.readline())
        except IOError:
            return -1
        except ValueError:
            time.sleep(2)
            try:
                with open(self.target, 'r') as f:
                    return int(f.readline())
            except IOError:
                return -1
            except ValueError:
                msg = _('Malformed lock file found: %s.\n'
                        'Ensure no other dnf process is running and '
                        'remove the lock file manually or run '
                        'systemd-tmpfiles --remove dnf.conf.' % (self.target))
                raise LockError(msg)

I wonder if this is the best we can do.  Perhaps a system boot job to remove any stale locks is needed?

Any other ideas?

Comment 1 Jan Beran 2019-05-07 11:09:59 UTC
PR: https://github.com/rpm-software-management/dnf/pull/1393

The patch ensures that the empty lock file is not considered as the malformed lock file, and handles the situation as the process is not locked, i.e. writes the actual pid in the lock file.

Comment 2 Jan Beran 2019-05-16 09:28:36 UTC
Jaroslav continues to make a locking improvement, assigned to him.

Comment 3 Jaroslav Rohel 2019-05-29 11:51:36 UTC
Fixed in upstream. PR: https://github.com/rpm-software-management/dnf/pull/1398

Comment 4 Fedora Update System 2019-07-04 13:50:24 UTC
FEDORA-2019-58c2d3f1aa has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-58c2d3f1aa

Comment 5 Fedora Update System 2019-07-05 00:45:57 UTC
dnf-4.2.7-1.fc30, libdnf-0.35.1-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-58c2d3f1aa

Comment 6 Fedora Update System 2019-07-23 07:21:20 UTC
FEDORA-2019-672a74d688 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

Comment 7 Fedora Update System 2019-07-24 01:44:15 UTC
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-672a74d688

Comment 8 Fedora Update System 2019-07-30 01:14:46 UTC
dnf-4.2.7-2.fc30, libdnf-0.35.1-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.