Description of problem:
rhcert listener daemon does not always start

Version-Release number of selected component (if applicable):
redhat-certification-1.0-20150109.1.el7.noarch and earlier

How reproducible:
No reproducible case yet, so it appears nearly random. Often the error will occur several times in a row, and seems to clear up by itself.

Steps to Reproduce:
1. install redhat-certification
2. run `rhcert-backend server start`
3. check logs

Actual results (first results from a fresh install):

# rhcert-backend server start
registered Test from rhcert.test
Starting rhcert daemon
Starting rhcert listener

followed immediately by checking status

# rhcert-backend server status
registered Test from rhcert.test
The rhcert daemon is running
The rhcert listener is NOT running

there's also a traceback in the logs

Expected results:
listener starts instead of resulting in a traceback.

Additional info:

[ /var/log/rhcert/RedHatCertificationListener.log ]
registered Test from rhcert.test
The rhcert daemon is already started
The rhcert listener is running
Stopping rhcert listener
Starting listener
Traceback (most recent call last):
  File "/usr/bin/rhcert-backend", line 37, in <module>
    success = rhcertBackend.do(args)
  File "/usr/lib/python2.7/site-packages/rhcert/client/backend.py", line 162, in do
    return self.doServer(args)
  File "/usr/lib/python2.7/site-packages/rhcert/client/backend.py", line 205, in doServer
    return listener.run()
  File "/usr/lib/python2.7/site-packages/rhcert/listener/listen.py", line 89, in run
    allow_none=True)
  File "/usr/lib64/python2.7/SimpleXMLRPCServer.py", line 593, in __init__
    SocketServer.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use

[ /var/log/rhcert/RedHatCertDaemon.log ]
registered Test from rhcert.test
The rhcert listener is already started
The rhcert daemon is running
Stopping rhcert daemon
Starting daemon
Several minutes later, without running any other commands on the system:

# rhcert-backend server start
registered Test from rhcert.test
The rhcert daemon is already started
Starting rhcert listener

# rhcert-backend server status
registered Test from rhcert.test
The rhcert daemon is running
The rhcert listener is running
The first time the server is started, it always gives the error. If followed immediately with `rhcert-backend server start` (a 2nd time overall), the listener daemon will launch. This occurs each time the server is stopped. After `rhcert-backend server stop`, the next start will not successfully start the listener daemon and instead gives the error above.
I'm seeing an almost identical traceback on another system. I was about to dismiss the issue as this is a notoriously unstable box, but maybe there's more to it. I have to give this machine back (it's an Intel IoT test system), so I will likely not be able to deliver it for further testing.

Installed versions
------------------
redhat-certification-hardware-1.7.1-20150304.el7.noarch
redhat-certification-1.0-20150505.el7.noarch
redhat-certification-information-1.7.1-20150304.el7.noarch

Contents of /var/log/rhcert/RedHatCertificationListener.log
-----------------------------------------------------------
The rhcert daemon is already started
The rhcert listener is running
Stopping rhcert listener
Starting listener
Traceback (most recent call last):
  File "/bin/rhcert-backend", line 37, in <module>
    success = rhcertBackend.do(args)
  File "/usr/lib/python2.7/site-packages/rhcert/client/backend.py", line 165, in do
    return self.doServer(args)
  File "/usr/lib/python2.7/site-packages/rhcert/client/backend.py", line 209, in doServer
    return listener.run()
  File "/usr/lib/python2.7/site-packages/rhcert/listener/listen.py", line 102, in run
    allow_none=True)
  File "/usr/lib64/python2.7/SimpleXMLRPCServer.py", line 593, in __init__
    SocketServer.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)
  File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
    self.server_bind()
  File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
Observations: the error comes from trying to start the listener on port 8009 when the port has not been freed properly. You can see that the port is in use with `lsof -i | grep 8009`. Stopping both the rhcert server (`rhcert-backend server stop`) and httpd (`service httpd stop`) frees port 8009. Running `rhcert-backend server start` again then appears to work and shows no issues. However, `rhcert-backend server status` then reports that the listener is running but the daemon is not, even though the start command reported that the daemon was already started.
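For anyone who wants to check the port from a script rather than with lsof, here is a minimal sketch. It is not part of rhcert; `port_in_use` is a hypothetical helper name, and the default of probing all interfaces is an assumption, since the address the listener binds to is not shown in this report. It simply attempts the same bind() that fails in the traceback and reports whether the kernel answers with errno 98 (EADDRINUSE).

```python
import errno
import socket


def port_in_use(port, host=""):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    host="" means all interfaces; that default is an assumption, not
    something taken from the rhcert source.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Same call that raises in SocketServer.server_bind() in the traceback.
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other failure, e.g. permission denied
    else:
        return False
    finally:
        probe.close()
```

Calling `port_in_use(8009)` right after `rhcert-backend server stop` would show whether the bind would still fail at that moment. The probe deliberately does not set SO_REUSEADDR, so a port whose only remaining sockets are in TIME_WAIT may also be reported as in use.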
The status of port 8009 reported by `netstat -vatn` is TIME_WAIT.
I mean, after calling rhcert-backend server stop.
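Whether TIME_WAIT can block the bind depends on whether the listening socket sets SO_REUSEADDR. The sketch below is a rough way to check that for a stock XML-RPC server. It assumes Python 3's `xmlrpc.server.SimpleXMLRPCServer` is a fair stand-in for the Python 2.7 `SimpleXMLRPCServer` module in the traceback, and it binds to port 0 (any free port) so it does not touch 8009. The helper names are hypothetical.

```python
import socket
from xmlrpc.server import SimpleXMLRPCServer


def reuseaddr_enabled(server):
    """Return True if SO_REUSEADDR is set on the server's listening socket."""
    value = server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    return value != 0


def check_default_server():
    """Create a throwaway XML-RPC server and report its SO_REUSEADDR setting."""
    # Port 0 asks the kernel for any free port; allow_none=True mirrors the
    # keyword argument visible in the traceback.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True,
                                logRequests=False)
    try:
        return reuseaddr_enabled(server)
    finally:
        server.server_close()
```

If `check_default_server()` reports that the option is set, a socket left in TIME_WAIT alone would not normally be expected to make bind() fail on Linux, which would point towards a process that is still listening on the port. That is an inference, not something confirmed in this report.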
I am still getting the "Address already in use" error mentioned above, even though I:
1) ran rhcert-backend server stop
2) ran service httpd stop
3) confirmed that lsof -t -i:8009 has no results
4) waited until netstat -vatn | grep 8009 has no results
However, after the first start fails and writes the error to the log file, running rhcert-backend server start again immediately afterwards seems to start the listener consistently.
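The pattern of "first start fails, immediate second start works" would be consistent with a race between stopping the old listener and binding the new one, although nothing in this report confirms that. As an illustration only, a bind that tolerates such a race could look like the sketch below. `bind_with_retry` and its parameters are hypothetical and this is not the rhcert code or the fix that shipped in the errata.

```python
import errno
import socket
import time


def bind_with_retry(address, attempts=5, delay=1.0):
    """Bind and listen on address, retrying while the kernel reports EADDRINUSE.

    Returns the listening socket, or re-raises the last EADDRINUSE error
    once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Lets the bind succeed when only TIME_WAIT sockets remain on the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind(address)
            sock.listen(5)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # not the error from this bug, do not hide it
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # give the previous listener time to exit
    raise last_error
```

A caller would use it as `sock = bind_with_retry(("", 8009))`. Retrying only on EADDRINUSE keeps other failures, such as a permissions problem, visible straight away.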
This seems to happen more on el6.
*** Bug 1228240 has been marked as a duplicate of this bug. ***
In my case there were no open sockets. The following workaround worked for me:

# /usr/bin/rhcert-backend server stop
# nohup /usr/bin/python /usr/bin/rhcert-backend server listener &
I do consider this issue a bug, so if you need logs or a remote session to debug it, I will be glad to cooperate. My workaround works for me, so this is not urgent.
Verified in:
redhat-certification-2.0-20150916.el7.noarch
redhat-certification-backend-2.0-20150916.el7.noarch
*** Bug 1260866 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2479.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days