Bug 1758500

Summary: Restarting the crio service removes config.json for podman containers.
Product: OpenShift Container Platform Reporter: Praveen Kumar <prkumar>
Component: Containers Assignee: Peter Hunt <pehunt>
Status: CLOSED ERRATA QA Contact: weiwei jiang <wjiang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.2.0 CC: aos-bugs, bbaude, cfergeau, dwalsh, jokerman, mheon, pehunt, scuppett
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: CRI-O did not properly filter podman containers on a restore (i.e. a stop and start of CRI-O).
Consequence: On startup, CRI-O saw podman containers. Because podman containers don't have CRI-O specific metadata, CRI-O mistook them for incorrectly created CRI-O containers and asked the storage library to delete them.
Fix: CRI-O now properly filters podman containers on restore.
Result: podman containers are no longer deleted from storage as a consequence of CRI-O starting.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-13 21:26:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Praveen Kumar 2019-10-04 10:24:45 UTC
Description of problem:

On a deployed node or a standalone RHEL-8.0 machine where crio and podman are installed, first run any container using podman and then start the crio service. Starting crio removes the config.json for the podman container.

Version-Release number of selected component (if applicable):

[root@dhcp130-191 ~]# podman version
Version:            1.4.2-stable2
RemoteAPI Version:  1
Go Version:         go1.12.8
OS/Arch:            linux/amd64
[root@dhcp130-191 ~]# rpm -aq podman
podman-1.4.2-5.el8.x86_64
[root@dhcp130-191 ~]# rpm -aq cri-o
cri-o-1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8.x86_64


How reproducible:


Steps to Reproduce:
```
[root@dhcp130-191 ~]# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: https://github.com/cri-o/cri-o

# podman run -d -p 8000:80 httpd:latest
Trying to pull docker.io/library/httpd:latest...Getting image source signatures
Copying blob 1919d4fbf9e1 done
Copying blob 2c31b9311798 done
Copying blob 7422a3cdf4e3 done
Copying blob 60812fa1ab4c done
Copying blob b8f262c62ec6 done
Copying config 19459a8721 done
Writing manifest to image destination
Storing signatures
436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052

[root@dhcp130-191 ~]# podman ps
CONTAINER ID  IMAGE                           COMMAND           CREATED        STATUS            PORTS                 NAMES
436df58ec26e  docker.io/library/httpd:latest  httpd-foreground  3 seconds ago  Up 2 seconds ago  0.0.0.0:8000->80/tcp  hardcore_proskuriakova
[root@dhcp130-191 ~]# podman inspect 436df58ec26e
[
    {
        "Id": "436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052",
        "Created": "2019-10-04T06:15:44.387856646-04:00",
        "Path": "httpd-foreground",
        "Args": [
            "httpd-foreground"
        ],
[...]

[root@dhcp130-191 ~]# systemctl start crio
[root@dhcp130-191 ~]# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-10-04 06:16:00 EDT; 4min 30s ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 24473 (crio)

[root@dhcp130-191 ~]# podman ps
CONTAINER ID  IMAGE                           COMMAND           CREATED        STATUS            PORTS                 NAMES
436df58ec26e  docker.io/library/httpd:latest  httpd-foreground  5 minutes ago  Up 5 minutes ago  0.0.0.0:8000->80/tcp  hardcore_proskuriakova

[root@dhcp130-191 ~]# podman inspect 436df58ec26e
Error: error getting libpod container inspect data 436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052: error getting container from store "436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052": container not known

[root@dhcp130-191 ~]# runc list
ID                                                                 PID         STATUS      BUNDLE                                                                                                                     CREATED                          OWNER
436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052   24342       running     /var/lib/containers/storage/overlay-containers/436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052/userdata   2019-10-04T10:15:45.027825532Z   root

[root@dhcp130-191 ~]# runc exec -t 436df58ec26ebb735bbada139e18b83441530d4b1e4c44306251014f49cd1052 /bin/bash
exec failed: JSON specification file config.json not found
```
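The symptom can also be confirmed without going through runc, by looking for config.json in the container's bundle directory. The following is a minimal sketch, assuming the bundle layout shown in the `runc list` output above (`<GraphRoot>/overlay-containers/<id>/userdata`). The function name `bundle_config_status` and its optional second argument are illustrative, not part of podman or CRI-O.

```shell
#!/bin/sh
# Sketch: report whether a container's OCI bundle still has its config.json.
# Usage: bundle_config_status CONTAINER_ID [GRAPH_ROOT]
# GRAPH_ROOT defaults to the GraphRoot reported by `podman info` in this report.
bundle_config_status() {
    id="$1"
    root="${2:-/var/lib/containers/storage}"
    bundle="$root/overlay-containers/$id/userdata"

    if [ -f "$bundle/config.json" ]; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

On an affected node, calling it with the full ID of the podman container after `systemctl start crio` would be expected to print `missing`, matching the runc error above.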


Actual results:
If a container is already running under podman, starting the crio service (if it was not running) causes the container's config.json to disappear, which makes that container useless :(


Expected results:
The crio service should not interfere with podman in any way.


Additional info:

$ sudo podman info
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.2-5.el8.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 0.3.0, commit: unknown'
  Distribution:
    distribution: '"rhcos"'
    version: "4.2"
  MemFree: 9147961344
  MemTotal: 16814522368
  OCIRuntime:
    package: runc-1.0.0-61.rc8.rhaos4.2.git3cbe540.el8.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 4
  hostname: ip-10-0-155-146
  kernel: 4.18.0-80.11.2.el8_0.x86_64
  os: linux
  rootless: false
  uptime: 1h 31m 23.49s (Approximately 0.04 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 132
  GraphDriverName: overlay
  GraphOptions: null
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 35
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Comment 1 Matthew Heon 2019-10-04 17:00:33 UTC
I think this might be related to CRI-O wipe.

Comment 2 Peter Hunt 2019-10-04 18:36:47 UTC
Ah, while it may seem as though it could be crio-wipe, it's actually crio failing to restore the container, and deciding to remove it (despite not really owning it). This is definitely a bug. Looking into a fix now.
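
The decision described here (leave alone a container that CRI-O does not own, rather than remove it when the restore fails) can be modelled roughly as below. This is only an illustrative sketch, not the actual CRI-O change: it assumes each container record is available as an `ID<TAB>METADATA` line, and it treats the presence of a `"pod-id"` key in the metadata as a stand-in for "has CRI-O specific metadata". Both the input format and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify storage containers before a restore.
# ASSUMPTION: a "pod-id" key in the metadata marks a CRI-O container.

# classify_container METADATA -> prints "restore" or "skip"
classify_container() {
    case "$1" in
        *'"pod-id"'*) echo "restore" ;;  # looks like a CRI-O container
        *)            echo "skip" ;;     # foreign (e.g. podman): do not touch
    esac
}

# plan_restore reads "ID<TAB>METADATA" lines on stdin, prints "ID ACTION".
plan_restore() {
    while IFS="$(printf '\t')" read -r id metadata; do
        [ -n "$id" ] || continue
        printf '%s %s\n' "$id" "$(classify_container "$metadata")"
    done
}
```

The point of the sketch is that a container without CRI-O metadata ends up as `skip` and never reaches a delete path, which is the behaviour the bug lacked.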

Comment 7 weiwei jiang 2019-10-15 08:59:15 UTC
Checked with the following version; the issue is fixed now.

[root@qe-wj-6lx69-worker-g7pvh core]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:376b65e856d3e939fbe6f4440ad3d670ca9b5a9f90483e25f2baf7b0095da68f
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.80.20191014.2 (2019-10-14T23:57:48Z)

[root@qe-wj-6lx69-worker-g7pvh core]# rpm -qa|grep -i -E "cri-o|podman"
podman-manpages-1.6.2-0.3.gita8993ba.el8.noarch
podman-1.6.2-0.3.gita8993ba.el8.x86_64
cri-o-1.14.11-0.23.dev.rhaos4.2.gitc41de67.el8.x86_64


[root@qe-wj-6lx69-worker-g7pvh core]# podman run -d httpd:alpine                                                                        
Trying to pull registry.access.redhat.com/httpd:alpine...   
  name unknown: Repo not found                        
Trying to pull docker.io/library/httpd:alpine...                                                                                                                                                                                                                                
Getting image source signatures                                                                                                                                                                                                                                                 
Copying blob 9d48c3bd43c5 done                                                                                                                                                                                                                                                  
Copying blob d3565940ff69 done                                                                                                                                                                                                                                                  
Copying blob 17877ce0de23 done                                                                                                                                                                                                                                                  
Copying blob 1cc6c921162a done              
Copying blob 4e10ed3cf6fc done                                                                                                                                                                                                                                                  
Copying config 141bb8d01f done                                                                                                                                                                                                                                                  
Writing manifest to image destination                                                                                                   
Storing signatures                             
01747085de95d1d28e0ddd873d84c523f9e1f0b25b364bc5232cd0ce24ac84b5
[root@qe-wj-6lx69-worker-g7pvh core]# podman ps 
CONTAINER ID  IMAGE                           COMMAND           CREATED        STATUS           PORTS  NAMES
01747085de95  docker.io/library/httpd:alpine  httpd-foreground  2 seconds ago  Up 1 second ago         brave_gould
[root@qe-wj-6lx69-worker-g7pvh core]# systemctl restart crio
[root@qe-wj-6lx69-worker-g7pvh core]# systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf
   Active: active (running) since Tue 2019-10-15 08:47:05 UTC; 14s ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 173974 (crio)
    Tasks: 20
   Memory: 56.1M
      CPU: 1.460s
   CGroup: /system.slice/crio.service
           └─173974 /usr/bin/crio --enable-metrics=true --metrics-port=9537

Oct 15 08:47:04 qe-wj-6lx69-worker-g7pvh systemd[1]: Starting Open Container Initiative Daemon...
Oct 15 08:47:05 qe-wj-6lx69-worker-g7pvh systemd[1]: Started Open Container Initiative Daemon.
[root@qe-wj-6lx69-worker-g7pvh core]# podman ps 
CONTAINER ID  IMAGE                           COMMAND           CREATED         STATUS             PORTS  NAMES
01747085de95  docker.io/library/httpd:alpine  httpd-foreground  38 seconds ago  Up 38 seconds ago         brave_gould
[root@qe-wj-6lx69-worker-g7pvh core]# podman inspect 01747085de95
[
    {
        "Id": "01747085de95d1d28e0ddd873d84c523f9e1f0b25b364bc5232cd0ce24ac84b5",
        "Created": "2019-10-15T08:46:53.934441274Z",
        "Path": "httpd-foreground",
        "Args": [
            "httpd-foreground"
        ],
        "State": {
            "OciVersion": "1.0.1-dev",
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
[...] 


Also checked with a 4.2 version; the issue is fixed there as well.
[root@preserve-42stg-5wt9r-worker-zmgnz core]# rpm -qa|grep -i -E "cri-o|podman"
podman-manpages-1.4.2-5.el8.noarch
cri-o-1.14.11-0.23.dev.rhaos4.2.gitc41de67.el8.x86_64
podman-1.4.2-5.el8.x86_64
[root@preserve-42stg-5wt9r-worker-zmgnz core]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:db2b9ac6cd5ae6eb30b1b2c5f9739734edc7b628862072fb7399b4377684265b
              CustomOrigin: Managed by machine-config-operator
                   Version: 42.80.20191010.0 (2019-10-10T20:18:10Z)

[root@preserve-42stg-5wt9r-worker-zmgnz core]# podman run -d httpd:alpine
Trying to pull registry.access.redhat.com/httpd:alpine...ERRO[0000] Error pulling image ref //registry.access.redhat.com/httpd:alpine: Error initializing source docker://registry.access.redhat.com/httpd:alpine: Error reading manifest alpine in registry.access.redhat.com/httpd: name unknown: Repo not found
Failed                                             
Trying to pull docker.io/library/httpd:alpine...Getting image source signatures
Copying blob d3565940ff69 done                                                                                                          
Copying blob 4e10ed3cf6fc done                                                                                                          
Copying blob 1cc6c921162a done                        
Copying blob 9d48c3bd43c5 done
Copying blob 17877ce0de23 done                                                                                                                                                                                                                                                  
Copying config 141bb8d01f done                                                                                                                                                                                                                                                  
Writing manifest to image destination                                                                                                                                                                                                                                           
Storing signatures                                                                                                                                                                                                                                                              
73da05b0be6142883f89ad770de50a8317352b0ae4de91acb1f79e7d0646a6f1                                                                                                                                                                                                                                 
[root@preserve-42stg-5wt9r-worker-zmgnz core]# podman ps           
CONTAINER ID  IMAGE                           COMMAND           CREATED        STATUS            PORTS  NAMES
73da05b0be61  docker.io/library/httpd:alpine  httpd-foreground  7 seconds ago  Up 6 seconds ago         hungry_kepler
[root@preserve-42stg-5wt9r-worker-zmgnz core]# systemctl restart crio
[root@preserve-42stg-5wt9r-worker-zmgnz core]# systemctl status !$ 
systemctl status crio 
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf
   Active: active (running) since Tue 2019-10-15 08:56:15 UTC; 15s ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 67998 (crio)
    Tasks: 21
   Memory: 61.1M
      CPU: 1.777s
   CGroup: /system.slice/crio.service
           └─67998 /usr/bin/crio --enable-metrics=true --metrics-port=9537

Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]:  with error: exit status 1]"
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]: time="2019-10-15 08:56:15.449802346Z" level=error msg="error loading cached network config: network "multus-cni-network" not found in CNI cache"
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]: time="2019-10-15 08:56:15.451606368Z" level=error msg="Error while checking pod to CNI network "multus-cni-network": neither IPv4 nor IPv6 found when retrieving network status: [Unexpected command output nsen>
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]:  with error: exit status 1 Unexpected command output nsenter: cannot open /proc/4527/ns/net: No such file or directory
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]:  with error: exit status 1]"
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]: time="2019-10-15 08:56:15.452316959Z" level=error msg="error loading cached network config: network "multus-cni-network" not found in CNI cache"
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]: time="2019-10-15 08:56:15.454277401Z" level=error msg="Error while checking pod to CNI network "multus-cni-network": neither IPv4 nor IPv6 found when retrieving network status: [Unexpected command output nsen>
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]:  with error: exit status 1 Unexpected command output nsenter: cannot open /proc/4485/ns/net: No such file or directory
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz crio[67998]:  with error: exit status 1]"
Oct 15 08:56:15 preserve-42stg-5wt9r-worker-zmgnz systemd[1]: Started Open Container Initiative Daemon.
[root@preserve-42stg-5wt9r-worker-zmgnz core]# podman ps 
CONTAINER ID  IMAGE                           COMMAND           CREATED         STATUS             PORTS  NAMES
73da05b0be61  docker.io/library/httpd:alpine  httpd-foreground  37 seconds ago  Up 36 seconds ago         hungry_kepler
[root@preserve-42stg-5wt9r-worker-zmgnz core]# podman inspect 73da05b0be61
[
    {
        "Id": "73da05b0be6142883f89ad770de50a8317352b0ae4de91acb1f79e7d0646a6f1",
        "Created": "2019-10-15T08:55:59.385370338Z",
        "Path": "httpd-foreground",
        "Args": [
            "httpd-foreground"
        ],
        "State": {
            "OciVersion": "1.0.1-dev",
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
[...]
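
The manual verification above could be scripted along these lines. This is a sketch only, using the same `podman run -d`, `systemctl restart crio` and `podman inspect` commands as the transcripts. The function name `verify_podman_survives_crio` is made up for illustration, and the script assumes the container ID is the last line `podman run -d` writes to stdout.

```shell
#!/bin/sh
# Sketch: check that a podman container is still known after crio restarts.
# Usage: verify_podman_survives_crio [IMAGE]   (run as root on a test node)
verify_podman_survives_crio() {
    image="${1:-httpd:alpine}"

    # Start a detached container and keep its ID.
    cid="$(podman run -d "$image" | tail -n 1)"
    [ -n "$cid" ] || return 2

    # Restart CRI-O, which triggers its restore logic.
    systemctl restart crio || return 2

    # Before the fix, inspect failed with "container not known".
    if podman inspect "$cid" >/dev/null 2>&1; then
        echo "PASS: $cid still known to podman"
    else
        echo "FAIL: $cid lost after crio restart"
        return 1
    fi
}
```

The test container is left running so it can be inspected further; remove it with `podman rm -f` afterwards.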

Comment 9 errata-xmlrpc 2020-05-13 21:26:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062