Bug 1960948

Summary: Error refreshing container XXX: error acquiring lock 0 for container

Product: Red Hat Enterprise Linux 8
Reporter: Evgeni Golov <egolov>
Component: podman
Assignee: Jindrich Novy <jnovy>
Status: CLOSED ERRATA
QA Contact: Alex Jia <ajia>
Severity: high
Docs Contact:
Priority: unspecified
Version: CentOS Stream
CC: bbaude, bstinson, dwalsh, jligon, jnovy, jwboyer, lsm5, mheon, pthomas, tsweeney, umohnani, vrothber, ypu
Target Milestone: beta
Keywords: Reopened, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: podman-3.3.0-0.11.el8 or newer
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-09 17:37:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1897579, 1989481
Bug Blocks:

Description Evgeni Golov 2021-05-16 18:04:34 UTC
Description of problem:
Ohai,

this is the same issue as reported in #1888988, which was closed as CURRENTRELEASE because podman now ships a tmpfiles.d config with the following content:
# cat /usr/lib/tmpfiles.d/podman.conf
# /tmp/podman-run-* directory can contain content for Podman containers that have run
# for many days. This following line prevents systemd from removing this content.
x /tmp/podman-run-*
x /tmp/containers-user-*
D! /run/podman 0700 root root
D! /var/lib/cni/networks

However, podman in CentOS 8 Stream still creates files under /tmp/run-* for me. That path is not covered by the exclude list above, so systemd wipes it after 10 days, resulting in the error in the summary.
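
A quick way to confirm where the 10-day cleanup comes from and that /tmp/run-* is not excluded (a sketch; the exact rule types and ages vary between installs):

$ # age-based cleanup rule for /tmp, typically something like "q /tmp 1777 root root 10d"
$ grep '/tmp ' /usr/lib/tmpfiles.d/tmp.conf
$ # dump the merged tmpfiles.d configuration and look for rules touching run-* paths
$ systemd-tmpfiles --cat-config | grep 'run-'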

Version-Release number of selected component (if applicable):
podman-3.1.0-0.13.module_el8.5.0+733+9bb5dffa.x86_64

How reproducible:
100%

Steps to Reproduce:
1. start a container as a user

Actual results:
$ find /tmp/run-1001/libpod/
/tmp/run-1001/libpod/
/tmp/run-1001/libpod/tmp
/tmp/run-1001/libpod/tmp/events
/tmp/run-1001/libpod/tmp/events/events.log
/tmp/run-1001/libpod/tmp/events/events.log.lock
/tmp/run-1001/libpod/tmp/exits
/tmp/run-1001/libpod/tmp/exits/b1298e65541a70d7e8419adfcc38d608f674a0db1e3dd8221f72d5870754e150
/tmp/run-1001/libpod/tmp/exits/eb31e91d069d717cde2c7341bb2929b01ce3a9d2f64008570a0075fe31718e0e
/tmp/run-1001/libpod/tmp/socket
/tmp/run-1001/libpod/tmp/socket/3e219820f36f7111183ca7fc2fc1b432cb16057587d1c6e856f9df4f3fad0883
/tmp/run-1001/libpod/tmp/socket/b1298e65541a70d7e8419adfcc38d608f674a0db1e3dd8221f72d5870754e150
/tmp/run-1001/libpod/tmp/socket/1b4c4445c86e1996afd54a004081561e8be76e126624b3256c356f949073de4f
/tmp/run-1001/libpod/tmp/socket/1f3747805b1a2a147ccb53beda3040ef74f15a98e909a04be082b480dc23a40a
/tmp/run-1001/libpod/tmp/pause.pid
/tmp/run-1001/libpod/tmp/alive.lck
/tmp/run-1001/libpod/tmp/alive
/tmp/run-1001/libpod/tmp/rootlessport545139436
/tmp/run-1001/libpod/tmp/rootlessport545139436/.bp-ready.pipe
/tmp/run-1001/libpod/tmp/rootlessport545139436/.bp.sock

Expected results:
All podman-related files are under /tmp/podman-run-1001/

Additional info:
/tmp/podman-run-1001/libpod exists, but is empty.
Other folders in /tmp/podman-run-1001/ are not empty.

Comment 2 Evgeni Golov 2021-05-17 08:09:37 UTC
Aha!

$ podman info --log-level=DEBUG
…
DEBU[0000] Initializing boltdb state at /home/container/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Overriding run root "/run/user/1001/containers" with "/tmp/podman-run-1001/containers" from database 
DEBU[0000] Overriding tmp dir "/run/user/1001/libpod/tmp" with "/tmp/run-1001/libpod/tmp" from database 
…
  runRoot: /tmp/podman-run-1001/containers

So it seems something also needs to "migrate" the database?
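
A quicker way to check the effective paths without reading the full debug log (a sketch; the Go template field names may differ slightly between podman versions):

$ # effective run root, matching the runRoot value shown above
$ podman info --format '{{.Store.RunRoot}}'
$ # the cached database overrides only show up at debug level
$ podman info --log-level=debug 2>&1 | grep -i 'overriding'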

Comment 3 Matthew Heon 2021-05-17 13:09:25 UTC
Unfortunately, the only way to migrate this is by complete removal of the DB (and all your containers and pods with it).

Probably easier for us just to throw paths of that format in our tmpfiles.d upstream (you can fix locally by doing so now, until it gets picked up in a shipping package).
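
A sketch of that local workaround, assuming a drop-in under /etc/tmpfiles.d/ (the file name here is arbitrary; 'x' lines exclude the matching paths from cleanup no matter which config file declares them):

# cat /etc/tmpfiles.d/podman-legacy-paths.conf
# keep systemd-tmpfiles away from the old rootless podman paths
x /tmp/run-*

A manual cleanup pass should then leave /tmp/run-<UID> untouched:

# systemd-tmpfiles --clean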

Comment 4 Evgeni Golov 2021-05-17 13:49:31 UTC
(In reply to Matthew Heon from comment #3)
> Unfortunately, the only way to migrate this is by complete removal of the DB
> (and all your containers and pods with it).

"migrate" :)
But yeah, I can do that.

Wonder what triggered it being set to those paths to begin with.

> Probably easier for us just to throw paths of that format in our tmpfiles.d
> upstream (you can fix locally by doing so now, until it gets picked up in a
> shipping package).

The original fix did change the paths in https://github.com/containers/podman/pull/8241 from run-* to podman-run-*, but seems not to have caught all the occurrences of "run-UID".

And yeah, I did the local workaround for now here.

Comment 5 Tom Sweeney 2021-05-17 17:19:19 UTC
Evgeni, given the discussion, this BZ looks like it can be closed as current release.  Do you concur?

Comment 6 Matthew Heon 2021-05-17 17:37:00 UTC
Disagree. We need to update our tmpfiles.d to ignore the new pattern.

Comment 7 Evgeni Golov 2021-05-17 17:38:15 UTC
No. At least with my limited knowledge, I see the following:

* podman ships /usr/lib/tmpfiles.d/podman.conf that excludes /tmp/podman-run-* and /tmp/containers-user-* from being erased by systemd-tmpfiles
* podman creates files in /tmp/podman-run-* *AND* in /tmp/run-* (the old path, pre https://github.com/containers/podman/pull/8241)
* systemd-tmpfiles still removes files in /tmp/run-*, breaking podman

We need to either
* also exclude /tmp/run-*
OR
* make podman not ever use /tmp/run-*

Comment 8 Daniel Walsh 2021-06-11 14:03:19 UTC
Podman 3.2 will no longer use /tmp/run, so this issue should be fixed with this release.

Comment 9 Tom Sweeney 2021-06-11 14:08:40 UTC
Fixed in Podman v3.2 and RHEL 8.4.0.2 and higher.

Assigning to Jindrich for any BZ/packaging needs.

Comment 10 Matthew Heon 2021-06-11 14:44:44 UTC
Negative. *New* Podman installs will not use `/tmp/run`. Existing installs will still use cached DB paths, which may include these paths.

Comment 12 Matthew Heon 2021-06-11 19:03:50 UTC
Needs to be moved back to Assigned

Comment 13 Daniel Walsh 2021-06-12 10:12:22 UTC
Why? I think we can just tell users with older containers to update them. This is rare, and will not happen going forward with new containers (I believe).

We can close it WONTFIX, or just state that these containers need to be updated.  But I don't see us doing more engineering on it.

Comment 14 Jindrich Novy 2021-06-14 12:15:09 UTC
Closing WONTFIX as per comment #13.

Comment 15 Evgeni Golov 2021-06-14 18:23:26 UTC
(In reply to Daniel Walsh from comment #13)
> Why, I think we can just tell older containers to be updated.  This is rare,
> and will not happen going forward with new containers. (I believe).
> 
> We can close it WONTFIX, or just state that these containers need to be
> updated.  But I don't see us doing more engineering on it.

Okay, but *how*? By wiping ~/.local/share/containers? At least that's how I read https://bugzilla.redhat.com/show_bug.cgi?id=1960948#c3

Comment 16 Matthew Heon 2021-06-14 18:30:06 UTC
There is no way to update old containers. The only solution is to start from scratch with `podman system reset`.

Because of this I am not in favor of closing.
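
For anyone hitting this, a sketch of that reset path (destructive: it removes all of the user's containers, pods, images and volumes, so save anything needed first):

$ podman system reset
$ # after the reset, a debug run should no longer show "Overriding run root" /
$ # "Overriding tmp dir" messages pointing at /tmp/run-<UID>
$ podman info --log-level=debug 2>&1 | grep -i 'overriding'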

Comment 17 Tom Sweeney 2021-06-14 23:58:00 UTC
Based on the last comment, reopening and assigning back to Matt.

Comment 19 Daniel Walsh 2021-06-16 13:36:06 UTC
Matt do you have a better idea of how to handle this?

Comment 20 Matthew Heon 2021-06-16 14:27:03 UTC
I need to check how versatile the systemd tmpfiles syntax is - I want to say we can get away with adding `x /tmp/run-*/libpod` to our tmpfiles entry, but I'm not sure it supports wildcards anywhere other than the last character.

Comment 21 Tom Sweeney 2021-06-16 15:50:14 UTC
Adding Valentin to the cc in case he has a thought or two.

Comment 22 Matthew Heon 2021-06-18 20:36:29 UTC
https://github.com/containers/podman/pull/10727 should fix this, but I would appreciate further testing if anyone can help. I verified that systemd-tmpfiles accepted the config, but I'm not sure how to force it to reap directories, so I haven't verified that it resolves this issue.
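
One way to force a reaping pass on a throwaway test box (a sketch: lower the age field of the /tmp line in /usr/lib/tmpfiles.d/tmp.conf to something short such as 1m, wait past that age, then run a manual clean and check that the libpod directory survives):

# systemd-tmpfiles --clean
# ls -d /tmp/run-*/libpod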

Comment 24 Alex Jia 2021-06-30 11:58:55 UTC
This bug has been verified on podman-3.3.0-0.11.module+el8.5.0+11598+600219b6.

[test@kvm-02-guest05 ~]$ rpm -q podman crun kernel
podman-3.3.0-0.11.module+el8.5.0+11598+600219b6.x86_64
crun-0.20.1-1.module+el8.5.0+11621+c130ef1a.x86_64
kernel-4.18.0-316.el8.x86_64

[test@kvm-02-guest05 ~]$ grep libpod /usr/lib/tmpfiles.d/podman.conf
x /tmp/run-*/libpod

[test@kvm-02-guest05 ~]$ ls /tmp
run-1000  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-chronyd.service-ktw18f  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-postfix.service-wVWD8h

[test@kvm-02-guest05 ~]$ podman run -d registry.access.redhat.com/rhel7 sleep infinity
4714d6d59160b1523fab3ea1bf4f898ab3b6dacfdab5c19049bb23e94a98ad3f

[test@kvm-02-guest05 ~]$ podman ps
CONTAINER ID  IMAGE                                    COMMAND         CREATED        STATUS            PORTS       NAMES
4714d6d59160  registry.access.redhat.com/rhel7:latest  sleep infinity  6 seconds ago  Up 7 seconds ago              modest_engelbart

[test@kvm-02-guest05 ~]$ ls /tmp
run-1000  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-chronyd.service-ktw18f  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-postfix.service-wVWD8h

[test@kvm-02-guest05 ~]$ sudo /usr/bin/systemd-tmpfiles --clean
[test@kvm-02-guest05 ~]$ ls /tmp
run-1000  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-chronyd.service-ktw18f  systemd-private-14b25a9cb2d14ef7bcc24bfdbafc8df5-postfix.service-wVWD8h

[test@kvm-02-guest05 ~]$ podman ps
CONTAINER ID  IMAGE                                    COMMAND         CREATED         STATUS             PORTS       NAMES
4714d6d59160  registry.access.redhat.com/rhel7:latest  sleep infinity  29 seconds ago  Up 30 seconds ago              modest_engelbart

Comment 25 Alex Jia 2021-07-01 02:36:20 UTC
(In reply to Alex Jia from comment #24)

With the /tmp cleanup age lowered to 1m for this test, run-1000 still survives a manual clean:

[test@kvm-02-guest05 ~]$ grep 1m /usr/lib/tmpfiles.d/tmp.conf
v /tmp 1777 root root 1m

> [test@kvm-02-guest05 ~]$ sudo /usr/bin/systemd-tmpfiles --clean
> [test@kvm-02-guest05 ~]$ ls /tmp
> run-1000

Comment 28 Alex Jia 2021-08-06 10:00:35 UTC
This bug has been verified on podman-3.3.0-2.module+el8.5.0+12136+c1ac9593
w/ runc-1.0.1-4.module+el8.5.0+12048+8939a3ea.

Comment 30 errata-xmlrpc 2021-11-09 17:37:47 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4154