Bug 1915318 - [Metal] bareMetal IPI - cannot interact with toolbox container after first execution only in parallel from different connection
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Timothée Ravier
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-12 12:37 UTC by Elena German
Modified: 2021-02-24 15:52 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:52:05 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:52:28 UTC

Description Elena German 2021-01-12 12:37:27 UTC
Description of problem:
The platform is bare-metal IPI.
Cannot interact with the toolbox container: the session hangs and accepts no input.
The only way to access the container is to open another session and connect from there.

Version-Release number of selected component (if applicable):
Cluster version is 4.7.0-0.nightly-2021-01-10-070949
toolbox-0.0.8-1.rhaos4.7.el8.noarch

How reproducible:
always

Steps to Reproduce:
1. oc debug node/<node-name>
2. chroot /host
3. toolbox

Actual results:
Stuck on:
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...
Getting image source signatures
Copying blob d9e72d058dc5 skipped: already exists  
Copying blob cca21acb641a skipped: already exists  
Copying blob 5ee83610639d done  
Copying config be1f7079a9 done  
Writing manifest to image destination
Storing signatures
be1f7079a938a4ab5c1f8b4c7d2dc82b8c60598bb1e248438ced576829f9638

Expected results:
sh-4.4# toolbox
[root@toolbox /]#

Additional info:
On the first attempt the session stays stuck until a new oc debug session is opened and toolbox is run again:
[kni@provisionhost-0-0 ~]$ oc debug node/master-0-1
Starting pod/master-0-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.123.148
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# rpm -q toolbox 
toolbox-0.0.8-1.rhaos4.7.el8.noarch
sh-4.4# toolbox
Error: error creating container storage: the container name "support-tools" is already in use by "e3801dcb314833f3ff7d0db68585b9f9be5d9c9f2bb097d23d4269f75a0bbf3a". You have to remove that container to be able to reuse that name.: that name is already in use
Error: `/proc/self/exe run -it --name support-tools --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=support-tools -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest` failed: exit status 125
Spawning a container 'toolbox-' with image 'registry.redhat.io/rhel8/support-tools'
[root@toolbox /]#
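
The "name is already in use" error above indicates that a container named support-tools was left behind by the earlier, stuck run. Below is a minimal shell sketch of a guard that could be run on the host before retrying. It assumes the installed podman supports `podman container exists` and `podman rm -f`; the function name `reset_toolbox_container` is hypothetical, and this is not what the toolbox script itself does:

```shell
#!/bin/sh
# Hypothetical helper: remove a leftover container so its name can be reused.
# Assumes the podman on the host supports `container exists` and `rm -f`.
reset_toolbox_container() {
    name="${1:-support-tools}"   # default name taken from the error message above
    if podman container exists "$name"; then
        # -f also stops the container if it is still running
        podman rm -f "$name" >/dev/null && echo "removed $name"
    else
        echo "no container named $name"
    fi
}
```

Calling `reset_toolbox_container` with no argument targets `support-tools`; pass another name (for example `toolbox-`) to target a different container.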

Comment 2 Micah Abbott 2021-01-12 22:03:22 UTC
The toolbox package is provided by the container-tools module, which RHCOS consumes as part of our OS manifest.

Moving to the container-tools component for triage

Comment 3 Elena German 2021-01-13 12:48:00 UTC
sure

Comment 4 Derrick Ornelas 2021-01-13 15:10:10 UTC
(In reply to Micah Abbott from comment #2)
> The toolbox package is provided by the container-tools module, which RHCOS
> consumes as part of our OS manifest.
> 
> Moving to the container-tools component for triage

Toolbox is actually maintained and packaged separately for OCP.  OCP 4.7 looks like it will ship with toolbox-0.0.8-1.rhaos4.7.el8.  RHEL 8 currently ships toolbox-0.0.4-1.module+el8.1.1+4407+ac444e5d as part of the container-tools module.  


I tried this on an OCP 4.6 cluster with toolbox-0.0.8-1.rhaos4.6.el8, which should be the same toolbox code that 4.7 has.  I wasn't able to reproduce the issue:

# ./oc debug node/worker0
Starting pod/worker0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.130.20
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...
Getting image source signatures
Copying blob 5ee83610639d done  
Copying blob cca21acb641a done  
Copying blob d9e72d058dc5 done  
Copying config be1f7079a9 done  
Writing manifest to image destination
Storing signatures
be1f7079a938a4ab5c1f8b4c7d2dc82b8c60598bb1e248438ced576829f96389
Spawning a container 'toolbox-' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest
[root@worker0 /]# 
[root@worker0 /]# exit
exit
sh-4.4# toolbox
Container 'toolbox-' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-')
toolbox-
Container started successfully. To exit, type 'exit'.
[root@worker0 /]# 
[root@worker0 /]# rpm -q sosreport
package sosreport is not installed
[root@worker0 /]# exit
exit


So I suspect the cause may be a difference between podman-1.9.3-3.rhaos4.6.el8, included in 4.6, and podman-2.0.5-5.module+el8.3.0+8221+97165c3f, included in 4.7.
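
To check the suspicion above on a given node, the package versions can be compared directly. Here is a small sketch, assuming package strings follow the name-version-release layout that `rpm -q` prints; `pkg_version` is a hypothetical helper, not part of any tool mentioned in this report:

```shell
#!/bin/sh
# Hypothetical helper: extract the version field from a name-version-release
# string such as the ones `rpm -q podman` prints.
pkg_version() {
    nvr="$1"
    nv="${nvr%-*}"              # drop the trailing "-release" part
    printf '%s\n' "${nv##*-}"   # keep what follows the last remaining "-"
}

pkg_version "podman-1.9.3-3.rhaos4.6.el8"                   # prints 1.9.3
pkg_version "podman-2.0.5-5.module+el8.3.0+8221+97165c3f"   # prints 2.0.5
```

On a node, after `chroot /host`, something like `pkg_version "$(rpm -q podman)"` would print the installed podman version for comparison between clusters.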

Comment 5 Micah Abbott 2021-01-14 15:55:13 UTC
After discussing with Debarshi and other members of the Desktop team, we are going to move RHCOS-related `toolbox` BZs back to the RHCOS component.

Comment 6 Timothée Ravier 2021-01-15 12:34:11 UTC
This one should be fixed with https://github.com/coreos/toolbox/pull/67

Comment 7 Timothée Ravier 2021-01-15 15:30:57 UTC
Will likely be fixed in upcoming sprint (needs code review & packaging).

Comment 8 Timothée Ravier 2021-01-22 17:38:04 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1877186

Comment 11 Michael Nguyen 2021-01-25 22:04:38 UTC
Verified on RHCOS 47.83.202101251242-0, which is part of 4.7.0-0.nightly-2021-01-25-160335


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-25-160335   True        False         35m     Cluster version is 4.7.0-0.nightly-2021-01-25-160335


$ oc debug node/ip-10-0-154-51.us-west-2.compute.internal
Starting pod/ip-10-0-154-51us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# toolbox
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
[root@ip-10-0-154-51 /]# exit
exit
sh-4.4# toolbox
Container 'toolbox-root' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root')
toolbox-root
Container started successfully. To exit, type 'exit'.
bash-4.2# exit
exit
sh-4.4# toolbox
Container 'toolbox-root' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root')
toolbox-root
Container started successfully. To exit, type 'exit'.
bash-4.2# exit
exit
sh-4.4# toolbox
Container 'toolbox-root' already exists. Trying to start...
(To remove the container and start with a fresh toolbox, run: sudo podman rm 'toolbox-root')
toolbox-root
Container started successfully. To exit, type 'exit'.
bash-4.2# exit
exit
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
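
The container is named 'toolbox-' in the earlier transcripts and 'toolbox-root' in this one, which suggests the suffix is a user name that was previously empty. The sketch below illustrates how an empty $USER could produce the bare prefix and how a fallback avoids it. It assumes the name is built as toolbox-<user>; this report does not show the script, so that is an assumption, and `toolbox_name` is a hypothetical function:

```shell
#!/bin/sh
# Hypothetical illustration; the real toolbox script is not shown in this report.
toolbox_name() {
    # Fall back to `id -un` when USER is unset or empty, which may be the
    # case in a minimal shell such as the one reached via `chroot /host`.
    printf 'toolbox-%s\n' "${USER:-$(id -un)}"
}

# Without a fallback, an empty USER yields the bare prefix.
USER=""
printf 'toolbox-%s\n' "$USER"   # prints "toolbox-"
toolbox_name
```

When run as root with USER empty, `toolbox_name` falls back to the name reported by `id -un`, so the container name stays stable across sessions.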

Comment 17 errata-xmlrpc 2021-02-24 15:52:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

