Bug 1299813 - "atomic run" is erratic when used multiple times in a row
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: atomic
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Lokesh Mandvekar
QA Contact: atomic-bugs@redhat.com
Depends On:
Blocks:
Reported: 2016-01-19 05:24 EST by Marius Vollmer
Modified: 2016-06-22 10:13 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-03 15:56:22 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Marius Vollmer 2016-01-19 05:24:21 EST
Description of problem:

Executing

    atomic run rhel7/rhel-tools sosreport --batch

multiple times back to back shows erratic behavior.  Sometimes "atomic run" terminates early and sosreport keeps running in the container.  Sometimes sosreport itself terminates early.

Version-Release number of selected component (if applicable):
atomic-1.6-6.gitca1e384.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
0. docker stop rhel-tools
1. docker rm rhel-tools
2. atomic run rhel7/rhel-tools sosreport --batch
   (finishes as expected)
3. atomic run rhel7/rhel-tools sosreport --batch
   (atomic returns early)
4. docker ps -a
   (sosreport still running)
5. atomic run rhel7/rhel-tools sosreport --batch
   (immediately after 4 while the container is still running)
   (sosreport gets killed half way through)
 
Actual results:
See steps.

Expected results:
Every "atomic run" invocation completes normally.

Additional info:

I think this depends on whether or not the rhel-tools container already exists, and whether or not it is already running.

When the rhel-tools container doesn't exist (and "atomic run" creates it), execution happens normally.

When the rhel-tools container exists but is stopped, atomic run terminates early (when sosreport has written "Setting up archive") and sosreport continues to run in the container.

When the rhel-tools container is running, atomic tracks sosreport correctly, but sosreport seems to get killed when the command already running in the container exits and the container is stopped.
Comment 2 Daniel Walsh 2016-01-27 10:30:53 EST
As you have determined:

The first time, it executes a docker create followed by a docker run executing the specified command.

If the rheltools container exists but is not running, it will execute a docker start of the container followed by a docker exec of the specified command.

If the container is running, it will just execute docker exec.

I believe the problem of a docker exec into a running container killing the container and the exec should be fixed in docker-1.9.

This could be causing the problem in the second case: if the container started by docker start exits before the docker exec has finished, then sosreport will fail.
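
The three cases described in this comment can be sketched as a small dispatch function. This is only an illustration of the description, not the code atomic actually runs: the function name plan_atomic_run and the state names are hypothetical, the function only prints the docker steps instead of calling docker, and it leaves out any options atomic passes to docker.

```shell
# Hypothetical helper: prints the docker steps described above for each
# container state. It only echoes commands; it never calls docker.
plan_atomic_run() {
    state=$1; shift              # one of: missing, stopped, running
    name=rhel-tools
    image=rhel7/rhel-tools
    case "$state" in
        missing)
            # First use: the container is created with the given command,
            # which becomes its recorded command.
            echo "docker run --name $name $image $*"
            ;;
        stopped)
            # Container exists: start it (this re-runs the recorded
            # command), then exec the new command into it.
            echo "docker start $name"
            echo "docker exec $name $*"
            ;;
        running)
            echo "docker exec $name $*"
            ;;
        *)
            echo "unknown state: $state" >&2
            return 1
            ;;
    esac
}
```

For example, `plan_atomic_run stopped sosreport --batch` prints the start step followed by the exec step, which is the case where the exec can outlive or be cut short by the container's main process.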
Comment 3 Daniel Walsh 2016-01-27 10:32:49 EST
It probably matters how the rheltools container was originally run.

What does the [cmd] of rheltools container look like?
Comment 4 Marius Vollmer 2016-01-28 07:23:35 EST
(In reply to Daniel Walsh from comment #3)
> It probably matters how the rheltools container was originally run.
> 
> What does the [cmd] of rheltools container look like?

After these steps

 0. docker stop rhel-tools
 1. docker rm rhel-tools
 2. atomic run rhel7/rhel-tools sosreport --batch

I get this output

# docker ps -a
CONTAINER ID        IMAGE               COMMAND               CREATED              STATUS                      PORTS               NAMES
4c8749f7fdf9        rhel7/rhel-tools    "sosreport --batch"   About a minute ago   Exited (0) 40 seconds ago                       rhel-tools

And after that:

# atomic run rhel7/rhel-tools bash
[root@localhost /]# pstree
systemd─┬─NetworkManager─┬─dhclient
        │                └─2*[{NetworkManager}]
        ├─agetty
        ├─chronyd
        ├─crond
        ├─dbus-daemon───{dbus-daemon}
        ├─dmeventd───2*[{dmeventd}]
        ├─docker─┬─bash───pstree
        │        ├─sosreport───timeout───stap-report───stap───make───make
        │        └─9*[{docker}]
        ├─gssproxy───5*[{gssproxy}]
        ├─login───bash───atomic───docker───4*[{docker}]
        ├─lvmetad───{lvmetad}
        ├─polkitd───5*[{polkitd}]
        ├─rhsmcertd
        ├─sshd
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-udevd
        ├─tuned───4*[{tuned}]
        └─wpa_supplicant
[root@localhost /]# 
-bash-4.2#

Thus, starting the container will start another sosreport run, and as soon as that ends, the container is killed, and with it the bash process.
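
One way to check which command a container will re-run on "docker start" is to read it back with docker inspect. The sketch below assumes the recorded command is exposed as .Config.Cmd; the helper name recorded_cmd and the DOCKER variable are hypothetical and exist only so the docker binary can be swapped out.

```shell
# Hypothetical wrapper; DOCKER lets a different binary be substituted.
DOCKER=${DOCKER:-docker}

# Print the command recorded for a container when it was created.
# "docker start" re-runs this command, not the one given to a later
# "atomic run".
recorded_cmd() {
    "$DOCKER" inspect --format '{{.Config.Cmd}}' "$1"
}
```

Running `recorded_cmd rhel-tools` after step 2 above should show the sosreport invocation rather than a shell, matching the COMMAND column of the docker ps output.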
Comment 5 Daniel Walsh 2016-01-28 08:15:01 EST
# docker stop rhel-tools
# docker rm rhel-tools
# atomic run rhel7/rhel-tools bash
# ^d
# atomic run rhel7/rhel-tools sosreport --batch

Does everything work correctly?

If I then do

# atomic run rhel7/rhel-tools bash
#

In one terminal and do 

# atomic run rhel7/rhel-tools sosreport --batch

In another, does everything finish successfully?
Comment 6 Marius Vollmer 2016-01-29 03:05:04 EST
(In reply to Daniel Walsh from comment #5)
> # docker stop rhel-tools
> # docker rm rhel-tools
> # atomic run rhel7/rhel-tools bash
> # ^d
> # atomic run rhel7/rhel-tools sosreport --batch
> 
> Does everything work correctly?

Yes.

> If I then do
> 
> # atomic run rhel7/rhel-tools bash
> #
> 
> In one terminal and do 
> 
> # atomic run rhel7/rhel-tools sosreport --batch
> 
> In another, does everything finish successfully?

Yes.


What about always letting "atomic run CMD" run rhel-tools with "sleep 100000000000d" as its command and then doing exec CMD, including with the default CMD=bash?
Comment 7 Marius Vollmer 2016-01-29 03:06:31 EST
> "sleep 100000000000d"

I am guessing that a real init is not necessary/doesn't work because this is a privileged container without its own PID namespace, right?
Comment 8 Marius Vollmer 2016-01-29 03:15:16 EST
> "sleep 100000000000d"

Heh, "sleep infinity" actually seems to work.  Let me wait and see whether it actually sleeps forever, and then I'll get back to you.
Comment 9 Stephen Tweedie 2016-01-29 06:00:02 EST
(In reply to Marius Vollmer from comment #6)

> What about always letting "atomic run CMD" run rhel-tools with "sleep
> 100000000000d" as its command and then doing exec CMD, including with the
> default CMD=bash?

btw, that's what the sadc image does today.  It sleeps the main container in the background, so that cron jobs can run a docker exec to do hourly/nightly stats gathering.

The actual command used there is

# while loop checks exit code to avoid exiting on signals other than SIGTERM
while [ $? != 143 ] ; do sleep 999999999999 ; done

so in theory it's more robust against things like job control signals, if we ever have a use for those.
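
The value 143 in that loop is 128 + 15, the exit status a shell reports for a process killed by SIGTERM (signal 15). A minimal way to see this, using a short sleep as a stand-in for the container's main process:

```shell
# Start a background sleep, terminate it with SIGTERM, and collect the
# exit status the shell reports for it.
sleep 30 &
pid=$!
kill -TERM "$pid"

status=0
wait "$pid" || status=$?

# 128 + 15 (SIGTERM)
echo "exit status: $status"
```

Any other signal gives a different status (128 plus that signal's number), so the loop above goes around again and starts a new sleep instead of letting the container exit.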
Comment 10 Marius Vollmer 2016-01-29 06:17:13 EST
(In reply to Stephen Tweedie from comment #9)
> # while loop checks exit code to avoid exiting on signals other than SIGTERM
> while [ $? != 143 ] ; do sleep 999999999999 ; done

Nice, thanks for sharing.
Comment 11 Daniel Walsh 2016-01-29 08:43:10 EST
The problem here is that the command the user gives when first running the rheltools container gets recorded in the docker database, so the next time the container "starts", the original command gets executed.

For most containers this is probably fine, but if you first run the rhel tools container in a way other than the default

atomic run rhel7tools

it will record that, and the start command will fail.

atomic run rhel7tools man docker

as the first command would create a container that would just execute the man docker command. I am not a huge fan of making the atomic command understand the rhel7tools container, but maybe we could make a label that tells atomic to just install the container without user commands.  I.e., make

atomic run rhel7tools sosreport --batch

do the equivalent of

atomic run rhel7tools
atomic run rhel7tools sosreport --batch

where it would create the container, start it in the background, and then take the user-specified command and exec it into the container.
Comment 12 Marius Vollmer 2016-02-12 09:45:35 EST
> IE Make
> 
> atomic run rhel7tools sosreport --batch
> 
> Do the equivalent of
>
> atomic run rhel7tools
> atomic run rhel7tools sosreport --batch
> where it would create the container and start it in background and then take the user specified command and exec it into the container.

An alternative might be to never re-use an existing container with "atomic run", but to start a new one every time (with a new unique name).
Comment 13 Daniel Walsh 2016-02-12 09:52:00 EST
That is not the way that atomic-run was designed.  It was really designed to handle the RHELtools container mode: basically, to launch an "Admin" shell.
Comment 14 Marius Vollmer 2016-06-06 03:05:34 EDT
Just to be clear: Is 

    atomic run rhel7/rhel-tools sosreport --batch

a supported way of running sosreport on RHEL Atomic?  If not, how should one run it instead?

What is the "command" parameter of "atomic run" good for, then?  For specifying an alternate interactive shell?  Maybe it is better to remove it, or output a warning when people use it.

Would it make sense to add "atomic exec IMAGE COMMAND", which would be a combination of "atomic run" and "docker exec"?  A container would be created or started if necessary, and then the command would be executed in it via "docker exec".
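
A rough sketch of what such an "atomic exec" could do, written as a shell function. Everything here is an assumption for illustration: atomic_exec and the DOCKER variable are hypothetical names, "sleep infinity" is borrowed from comment 8 as the long-running main process, and the sketch leaves out any options atomic normally passes to docker for this image.

```shell
# Hypothetical wrapper; DOCKER lets a different binary be substituted.
DOCKER=${DOCKER:-docker}

# atomic_exec NAME IMAGE COMMAND...
# Make sure a container called NAME is running, then exec COMMAND in it.
atomic_exec() {
    name=$1; image=$2; shift 2

    # docker inspect fails if the container does not exist.
    running=$("$DOCKER" inspect --format '{{.State.Running}}' "$name" 2>/dev/null) \
        || running=missing

    case "$running" in
        true)
            ;;  # already running, nothing to do
        false)
            # Exists but stopped: start it again.
            "$DOCKER" start "$name" >/dev/null
            ;;
        *)
            # Does not exist: create it detached with a main process
            # that just sleeps, so user commands never become the
            # container's recorded command.
            "$DOCKER" run -d --name "$name" "$image" sleep infinity >/dev/null
            ;;
    esac

    "$DOCKER" exec "$name" "$@"
}
```

A call would look like `atomic_exec rhel-tools rhel7/rhel-tools sosreport --batch`. Note that the "stopped" branch only behaves well if the container was originally created by this helper; a container created earlier with "sosreport --batch" as its command would still re-run sosreport on start.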
Comment 15 Marius Vollmer 2016-06-21 07:19:49 EDT
Something has changed and executing

    atomic run rhel7/rhel-tools sosreport --batch

multiple times in a row now works as expected.

However, it still interferes with other uses of the rhel-tools container.

Just to summarize:

- If the "rhel-tools" container does not yet exist, it will be created and will
  remember "sosreport --batch" as its default command, and every subsequent
  execution of just

    atomic run rhel7/rhel-tools

  will not give you the expected shell, but will run sosreport again.

- If the container exists but is stopped, it will start, run sosreport,
  and stop when sosreport is done.  Any other "atomic run" while sosreport
  runs will be killed when sosreport is done.

- Likewise, if the container is already running some other command,
  sosreport will start executing but will be killed when the other command is
  done.

I think "atomic run" should never reuse an existing container.  It should always create a new one with a unique name, and remove it when it stops.
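
In plain docker terms, that suggestion would look roughly like the following. This is a sketch under assumptions, not a proposed patch: run_once and the DOCKER variable are hypothetical names, the name scheme is only illustrative, and the options atomic normally passes to docker for this image are left out.

```shell
# Hypothetical wrapper; DOCKER lets a different binary be substituted.
DOCKER=${DOCKER:-docker}

# run_once IMAGE COMMAND...
# Run COMMAND in a fresh container and let docker remove it afterwards.
run_once() {
    image=$1; shift
    # Shell PID plus timestamp: good enough for a sketch, but not
    # guaranteed to be unique.
    name="rhel-tools-$$-$(date +%s)"
    "$DOCKER" run --rm --name "$name" "$image" "$@"
}
```

With this approach, `run_once rhel7/rhel-tools sosreport --batch` never touches an existing "rhel-tools" container, so no command is recorded for later runs and concurrent invocations do not share a main process.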
Comment 16 Marius Vollmer 2016-06-21 09:53:26 EDT
See https://github.com/projectatomic/atomic/pull/429
Comment 17 Daniel Walsh 2016-06-21 14:01:23 EDT
This is the way the tool was designed, basically it is going to execute a "Pet" container by default.

atomic run rhel7/rhel-tools 

Should create a container, and all future executions should run that container.  I guess we could change the behavior a little to remove the container if a user specified a command and the container does not currently exist.

atomic run rhel7/rhel-tools

would create a container, while

atomic run rhel7/rhel-tools COMMAND

would do a run-once if the container does not exist. If the container does exist but is stopped, it would start the container and then execute the command inside it. If the container is already running, it would just execute the command within the container.

That could fix this issue without fundamentally changing the way atomic run was designed to run.
Comment 18 Marius Vollmer 2016-06-22 03:03:59 EDT
(In reply to Daniel Walsh from comment #17)
> This is the way the tool was designed, basically it is going to execute a
> "Pet" container by default.

I see.

I think your proposal still has trouble with concurrent uses of "atomic run".  Doing "atomic run rhel7/rhel-tools" in two terminals at the same time still behaves incorrectly: if you exit the first one, the second one will be killed as well.


If you want a pet container, I propose that it gets an init-like main process (see comment 10), and everything (including the default bash) is always started via "docker exec".
Comment 19 Daniel Walsh 2016-06-22 08:27:00 EDT
Ok that might work.  

We could always run the first container with -d and then docker exec into the container.
Comment 20 Marius Vollmer 2016-06-22 08:48:28 EDT
> We could always run the first container with -d and then docker exec into the container.

All this needs changes in the rhel7/rhel-tools container, no?
Comment 21 Daniel Walsh 2016-06-22 10:13:25 EDT
I would prefer not to have to change the rhel7/rhel-tools container, but to have a standard model where the pet container would just work.
