Bug 1431747 - After katello-agent installation goferd is not started
Summary: After katello-agent installation goferd is not started
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: katello-agent
Version: 6.2.8
Hardware: Unspecified
OS: Unspecified
high
medium vote
Target Milestone: Unspecified
Assignee: Eric Helms
QA Contact: Lukas Pramuk
URL:
Whiteboard:
Depends On:
Blocks: 1353215 1399395
TreeView+ depends on / blocked
 
Reported: 2017-03-13 16:58 UTC by Lukas Pramuk
Modified: 2020-06-11 13:24 UTC (History)
14 users (show)

Fixed In Version: katello-agent-2.9.0-2, gofer-2.7.7-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1465398 (view as bug list)
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 19038 0 High Closed After katello-agent installation goferd is not started 2020-12-15 21:07:53 UTC

Description Lukas Pramuk 2017-03-13 16:58:17 UTC
Description of problem:
After katello-agent installation goferd is not started

Version-Release number of selected component (if applicable):
@Satellite6.2.8
katello-agent-2.9.0-1.el7sat
gofer-2.7.6-1.el7sat

How reproducible:
100%

Steps to Reproduce:
1. # yum install katello-agent
2. # systemctl status goferd
● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Actual results:
goferd service is not started

Expected results:
goferd service is started


Additional info:

# rpm -q --scripts katello-agent
postinstall scriptlet (using /bin/sh):
chkconfig goferd on
touch /tmp/katello-agent-restart
exit 0

Comment 1 Lukas Pramuk 2017-03-13 17:02:28 UTC
And it is regression!
With katello-agent-2.5.0-5.el7sat it is working as it explicitly restarts goferd.

# systemctl status goferd
● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-03-13 16:46:31 UTC; 26s ago
 Main PID: 26453 (python)
   CGroup: /system.slice/goferd.service
           └─26453 python /usr/bin/goferd --foreground

# rpm -q --scripts katello-agentpostinstall scriptlet (using /bin/sh):
chkconfig goferd on
service goferd restart > /dev/null 2>&1
exit 0

Comment 4 Eric Helms 2017-03-27 14:56:41 UTC
Created redmine issue http://projects.theforeman.org/issues/19038 from this bug

Comment 5 Satellite Program 2017-03-27 22:16:58 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue http://projects.theforeman.org/issues/19038 has been resolved.

Comment 8 Lukas Pramuk 2017-04-04 11:35:30 UTC
VERIFIED.

katello-agent-2.9.0-2.el7sat.noarch

by manual reproducer in comment#0

1. # yum install katello-agent

2. # systemctl status goferd
● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-04-04 11:25:16 UTC; 22s ago
 Main PID: 10548 (python)
   CGroup: /system.slice/goferd.service
           └─10548 python /usr/bin/goferd --foreground


>>> after katello-agent installation goferd is running

Comment 9 Lukas Pramuk 2017-04-04 13:00:18 UTC
FailedQA.

katello-agent-2.9.0-2.el6sat.noarch

# yum install katello-agent

# service goferd status
goferd is not running.

>>> failed, lets see postinstall scriptlet

# rpm -q --scripts katello-agent
postinstall scriptlet (using /bin/sh):
chkconfig goferd on
  /sbin/service goferd start > /dev/null 2>&1 || :
touch /tmp/katello-agent-restart
exit 0

>>> if goferd is not running then /tmp/katello-agent-restart should be still present

# ll /tmp/katello-agent-restart
ls: cannot access /tmp/katello-agent-restart: No such file or directory

>>> File is not present. Did postinstall run at all? Before starting goferd is just "chkconfig goferd on"

# which chkconfig
/sbin/chkconfig

>>> oops, chkconfig on rhel6 is not in path but in "/sbin" while @rhel7 it is in path

(@rhel7) # which chkconfig  
/usr/sbin/chkconfig

>>> on rhel6 we have to have "/sbin/chkconfig goferd on" instead

Comment 10 Lukas Pramuk 2017-04-05 14:28:08 UTC
I wasn't right, %postinstall script is run both in OS and in provision time:

1. One command scenario 

# yum install gofer
# chkconfig goferd --list
goferd         	0:off	1:off	2:off	3:on	4:on	5:on	6:off

# chkconfig goferd on
# chkconfig goferd --list
goferd         	0:off	1:off	2:on	3:on	4:on	5:on	6:off

>>> "chkconfig goferd on" command enables also runlevel 2, this command is in %postinstall scriptlet so we can check if it is run 


2. katello-agent scenario in booted OS

# yum install gofer
# chkconfig goferd --list
goferd         	0:off	1:off	2:off	3:on	4:on	5:on	6:off

# yum install katello-agent
# chkconfig goferd --list
goferd         	0:off	1:off	2:on	3:on	4:on	5:on	6:off
# service goferd status
goferd is not running.

>>>  runlevel 2 got enabled the same as in previous scenario -  %postinstall script is run however goferd is not started


3. katello-agent scenario in provision time (chrooted post script)
# yum -y install gofer

# chkconfig goferd --list
goferd          0:off   1:off   2:off   3:on    4:on    5:on    6:off

# yum -y install katello-agent

# chkconfig goferd --list
goferd          0:off   1:off   2:on    3:on    4:on    5:on    6:off

# service goferd status
goferd is not running.

# service goferd start
Starting goferdTraceback (most recent call last):
  File "/usr/bin/goferd", line 17, in <module>
    from gofer.agent.main import main
  File "/usr/lib/python2.6/site-packages/gofer/agent/main.py", line 25, in <module>
    LogHandler.install()
  File "/usr/lib/python2.6/site-packages/gofer/agent/logutil.py", line 45, in install
    handler = LogHandler(address='/dev/log', facility=SysLogHandler.LOG_DAEMON)
  File "/usr/lib64/python2.6/logging/handlers.py", line 721, in __init__
    self._connect_unixsocket(address)
  File "/usr/lib64/python2.6/logging/handlers.py", line 737, in _connect_unixsocket
    self.socket.connect(address)
  File "<string>", line 1, in connect
socket.error: [Errno 111] Connection refused

>>> runlevel 2 got enabled the same - %postinstall script is run and goferd fails to start with traceback [Errno 111] Connection refused

Comment 11 Lukas Pramuk 2017-04-05 21:12:27 UTC
Zach,

starting rsyslog (when testing in chrooted postinstall environment) resolved the issue. goferd depends on rsyslog to be running otherwise it wont start.

Either goferd should try to setup logging and continue upon this exception.

Running "chkconfig goferd on" (in kickstart or in postinstall of katello-agent) worsens this situation as it enables goferd for runlevel 2 and it is too early and it may happen that rsyslogd is not running. That why after bootup it can be started manually (rsyslogd is running already)

Lets avoid using generic "chkconfig goferd on" in favor of "chkconfig goferd --levels 345 on" (but such command is run in postinstall of goferd rpm so it is not needed).

Anyway the issue is unrelated to this BZ. Lets file another BZ for it and close/verify this one. Do you agree on it?

Comment 12 Lukas Pramuk 2017-04-06 10:41:12 UTC
goferd is not running only after first reboot. After next reboot it is ok.
It seems that rsyslog during first reboot starts longer (due to some initialization?) 

With next reboots rsyslog is fast enough to be running when goferd starts up.

Also numbers ordering service startup are ok already, rsyslog having 12 while goferd 97
# ll /etc/rc3.d/*{goferd,rsyslog}
lrwxrwxrwx. 1 root root 17 Apr  6 09:33 /etc/rc3.d/S12rsyslog -> ../init.d/rsyslog
lrwxrwxrwx. 1 root root 16 Apr  6 09:35 /etc/rc3.d/S97goferd -> ../init.d/goferd

Comment 13 Zach Huntington-Meath 2017-04-06 15:19:57 UTC
I've been investigating this issue with Jeff Ortel and while the chkconfig is probably a separate problem, another issue is when the /tmp/katello-aget-restart file is created katello-agent tries to restart goferd (itself). Upon initiating the restart command the process tries to stop itself and then start itself again. First it sends the stop signal, but the way it stops allows for the process to finish what its doing first. So it continues the restart process by trying to start goferd again. When it does this it recognizes that it's already running and fails. With the way we have the spec file configured it is automatically creating the katello-agent-restart file on creation which is why it's not starting properly at first.

We are working on a fix for that right now to figure out a way to have this properly work on el6. Should we go ahead and make two separate BZs for the chkconfig issue and the fail on self-restart issue?

Comment 14 Lukas Pramuk 2017-04-06 15:32:00 UTC
I think the issue with goferd startup on rhel6 also belongs to this BZ. 

These are the changes brought in by %postinstall scriptlet in katello-agent 2.9.0
causing regression against katello-agent 2.5.0

The changes (removal of goferd startup and introducing signal file /tmp/katello-agent-restart) are related so lets track the final adjustment for rhel6 also here.

Comment 15 Jeff Ortel 2017-04-07 19:36:33 UTC
Problem:

The failure to restart on EL6 has to do with initd vs systemd.  When services are restarted using the 'service' command, the init.d script is executed within the calling process.  In the case of the katello-agent restart, the goferd service restart command is executed by a child of the goferd process.  The child process inherits the file open descriptor to /var/run/goferd.pid.  This means the newly started goferd cannot acquire the exclusive lock and refuses to start thinking a goferd is already running.  This works on Fedora and EL7 because the restart is executed by systemd.

Solution:

The init.d/gofer init script in 2.7.6 uses 'kill' whereas the 2.8+ uses 'killproc' from init.d/funcitions.  One of the advantages to using 'killproc' is that it removes the service's /var/run/<name>.pid file.  I suggest we backport the init.d/goferd script from gofer-2.11 to gofer-2.7.7 upstream.  This will be very low risk and (almost) only contains the change needed.

[jortel@localhost init.d]$ git diff gofer-2.7 master goferd
diff --git a/etc/init.d/goferd b/etc/init.d/goferd
index 5f47bcb..f8a6563 100755
--- a/etc/init.d/goferd
+++ b/etc/init.d/goferd
@@ -46,18 +46,10 @@ start() {
 
 stop() {
   echo -n "Stopping $PROG"
-  if [ -e $PID ]; then
-    pid=$(cat $PID)
-  fi
-  kill -9 $pid >/dev/null 2>&1
+  killproc -p $PID $PROG
   RETVAL=$?
-  if [ $RETVAL -eq "0" ]; then
-    rm -f $LOCK
-    success
-  else
-    failure
-  fi
   echo
+  rm -f $LOCK
   return $RETVAL
 }

Comment 18 Lukas Pramuk 2017-04-13 13:47:31 UTC
Unfortunately,

I'm not able to verify Tools for RHEL5 (though RHEL6 seems OK now)

# rpm -q katello-agent gofer
katello-agent-2.9.0-1.el5
gofer-2.7.6-1.el5

Comment 19 Lukas Pramuk 2017-04-20 17:25:14 UTC
VERIFIED.

katello-agent-2.9.0-2.el5
gofer-2.7.7-2.el5

by manual reproducer in comment#0

# yum -y install katello-agent ; service goferd status ; sleep 2 ; service goferd status ; sleep 2 ; service goferd status


RHEL7:

● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-04-20 17:11:35 UTC; 1s ago
 Main PID: 10687 (python)
...
● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-04-20 17:11:35 UTC; 3s ago
 Main PID: 10687 (python)
...
● goferd.service - Gofer Agent
   Loaded: loaded (/usr/lib/systemd/system/goferd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-04-20 17:11:35 UTC; 5s ago
 Main PID: 10687 (python)
...


RHEL6:
...
Complete!
goferd is not running.
goferd (2204) is running.
goferd (2204) is running.


RHEL5:
...
Complete!
goferd (2973) is running.
goferd is not running.
goferd (3038) is running.


>>> after katello-agent installation goferd is started and *keeps running* (though there is small difference in behavior across RHELs)

Comment 20 Bryan Kearney 2017-05-01 14:28:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191

Comment 21 Bryan Kearney 2017-05-01 14:29:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1191


Note You need to log in before you can comment on or make changes to this bug.