Bug 1948456

Summary: qpid-stat crashes when running in the loop
Product: Red Hat Enterprise MRG
Reporter: Barbora Vassova <bvassova>
Component: qpid-tools
Assignee: messaging-bugs <messaging-bugs>
Status: NEW
QA Contact: Messaging QE <messaging-qe-bugs>
Severity: unspecified
Priority: unspecified
Version: 3.2
CC: ataylor, daduval, jdanek, jross, messaging-bugs
Target Milestone: ---
Flags: jdanek: needinfo? (mcressma)
       ataylor: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Barbora Vassova 2021-04-12 08:39:44 UTC
Description of problem:
When qpid-stat is run repeatedly in a loop, with a sleep between iterations, it intermittently throws the following exception while the process is exiting (the traceback starts in atexit._run_exitfuncs):

Traceback (most recent call last):
  File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/site-packages/qpid/selector.py", line 185, in stop
    self.wakeup()
  File "/usr/lib/python2.6/site-packages/qpid/selector.py", line 97, in wakeup
    self.waiter.wakeup()
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 119, in wakeup
    self._do_write()
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 200, in _do_write
    os.write(self.write_fd, "\0")
OSError: [Errno 9] Bad file descriptor
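The traceback shows that the failure happens inside an atexit handler. Selector.stop() calls wakeup(), which ends in os.write(self.write_fd, "\0"). EBADF means write_fd was no longer an open descriptor at that point. The following is a minimal sketch of just that final call, run outside qpid. Closing the descriptor first is an assumption that stands in for whatever closed it in qpid, which this report does not establish. The function name write_to_closed_pipe is hypothetical.

```python
import errno
import os


def write_to_closed_pipe():
    """Mimic the failing call in qpid/compat.py (_do_write) in isolation:
    write one byte to the write end of a pipe that has already been closed.
    Returns the symbolic errno name, or None if the write succeeds."""
    read_fd, write_fd = os.pipe()
    # Assumption: the descriptor is closed before the exit handler writes to it.
    os.close(write_fd)
    try:
        os.write(write_fd, b"\0")
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.close(read_fd)
    return None
```

On Linux, write_to_closed_pipe() returns "EBADF", which is the same error (Errno 9) reported in the traceback.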

Version-Release number of selected component (if applicable):
qpid-tools-1.36.0-22+hf5.el6_10.noarch


How reproducible:
Not always. While I was reproducing the issue, the script sometimes completed without error.

Steps to Reproduce:
1. Save the following script as reproducer.sh:

#!/bin/bash

if [ $# -ne 2 ]; then
  echo "$0 <port_number> <sleep_seconds>"
  exit 1
fi

PORT=$1
SLEEP_AMT=$2

echo "started at: $(date)"
for i in $(seq 1 50000); do
  # Mark the start and end of each iteration in syslog
  echo "${0} ITER: ${i} $(date '+%T.%N')" | logger -t BEGIN
  qpid-stat -q -b "localhost:${PORT}" -S queue -L 50000 | grep -v Response >> "$0.log"
  echo "${0} ITER: ${i} $(date '+%T.%N')" | logger -t END
  # Stop as soon as an error (e.g. the traceback) shows up in nohup.out (see step 2)
  grep -qi error nohup.out && kill -9 $$
  echo "$(date +%c.%N) stayin alive, stayin alive..."
  sleep "${SLEEP_AMT}"
done
echo "finished at: $(date)"

2. Run it with:

# nohup ./reproducer.sh 5672 1 > nohup.out &

3. After approximately 2 hours, observe the error.
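The loop above detects a failure by grepping the cumulative nohup.out for "error". Below is a sketch of the same loop in Python. Instead of grepping a shared file, it inspects the stderr of each qpid-stat run, so the failing iteration is identified directly. The qpid-stat arguments are copied from the script. The port, the function name run_until_failure and the marker string are assumptions. It uses subprocess.Popen/communicate so that it also runs on the Python 2.6 shown in the traceback.

```python
import subprocess
import sys
import time

# Arguments copied from reproducer.sh; the port is an assumption (use your broker's).
QPID_STAT_CMD = ["qpid-stat", "-q", "-b", "localhost:5672",
                 "-S", "queue", "-L", "50000"]


def run_until_failure(cmd, iterations, sleep_seconds,
                      marker="Bad file descriptor"):
    """Run cmd repeatedly, sleeping between runs. Return the 1-based iteration
    whose stderr contains marker, or None if every run stays clean."""
    for i in range(1, iterations + 1):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        _out, err = proc.communicate()  # stdout is discarded in this sketch
        if not isinstance(err, str):  # bytes on Python 3
            err = err.decode("utf-8", "replace")
        if marker in err:
            sys.stderr.write("iteration %d failed (exit code %d):\n%s"
                             % (i, proc.returncode, err))
            return i
        time.sleep(sleep_seconds)
    return None
```

Calling run_until_failure(QPID_STAT_CMD, 50000, 1) corresponds to ./reproducer.sh 5672 1.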


Actual results:
The script stops after qpid-stat fails with the "Bad file descriptor" error shown above.

Expected results:
The script should complete all iterations without error and print "finished at: <date>".


Additional info:
The customer reports that the issue can be avoided by applying the following patch (output from the patch file):

--- /usr/lib/python2.6/site-packages/qpid/selector.py	2017-08-15 19:38:01.000000000 +0000
+++ ./selector.py	2021-02-15 20:58:05.164764081 +0000
@@ -79,6 +79,7 @@
         atexit.register(sel.stop)
         Selector.DEFAULT = sel
         Selector._current_pid = os.getpid()
+        #os.system("echo 'pid: " + str(Selector._current_pid) + "' | logger -t SHAWNDEBUG")
       return Selector.DEFAULT
     finally:
       Selector.lock.release()
@@ -93,8 +94,14 @@
     self.exception = None

   def wakeup(self):
-    _check(self.exception)
-    self.waiter.wakeup()
+    Selector.lock.acquire()
+    try:
+      _check(self.exception)
+      self.waiter.wakeup()
+    except:
+      pass
+    finally:
+      Selector.lock.release()

   def register(self, selectable):
     self.selectables.add(selectable)
@@ -182,13 +189,18 @@
     """Stop the selector and wait for it's thread to exit. It cannot be re-started"""
     if self.thread and not self.stopped:
       self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
+
+      #os.system("echo 'calling wakeup' | logger -t SHAWNDEBUG")
       self.wakeup()
+
+      #os.system("echo 'calling thread join' | logger -t SHAWNDEBUG")
       self.thread.join(timeout)

   def dead(self, e):
     """Mark the Selector as dead if it is stopped for any reason.  Ensure there any future
     attempt to use the selector or any of its connections will throw an exception.
     """
+    #os.system("echo 'we b dead' | logger -t SHAWNDEBUG")
     self.exception = e
     try:
       for sel in self.selectables.copy():
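The customer's patch serializes Selector.wakeup() on the class-level Selector.lock and discards any exception with a bare "except: pass". The following minimal sketch models that change using hypothetical, heavily reduced classes. PipeWaiter and MiniSelector are not qpid code, and the _check(self.exception) call is omitted. As a deliberate variation on the patch, the sketch ignores only the EBADF error from this report rather than every exception.

```python
import errno
import os
import threading


class PipeWaiter(object):
    """Hypothetical stand-in for the waiter in qpid/compat.py: wakeup()
    writes one byte to a pipe so a thread blocked in select() returns."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()

    def wakeup(self):
        os.write(self.write_fd, b"\0")


class MiniSelector(object):
    """Hypothetical, reduced selector showing a patched wakeup()."""

    lock = threading.Lock()  # class-level, like Selector.lock in the patch

    def __init__(self):
        self.waiter = PipeWaiter()

    def wakeup(self):
        # Like the customer's patch, hold the class-level lock while waking up...
        with MiniSelector.lock:
            try:
                self.waiter.wakeup()
            except OSError as e:
                # ...but only ignore EBADF (the error in this report), instead
                # of a bare "except: pass" that also hides unrelated errors.
                if e.errno != errno.EBADF:
                    raise
```

Narrowing the handler keeps unrelated errors visible. This report does not establish whether it is harmless to skip the wakeup during shutdown.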

With this patch applied, the exception is no longer thrown. Instead, the following message is printed at the point where the exception would have appeared:
No handlers could be found for logger "qpid.messaging"
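On Python 2, logging prints "No handlers could be found for logger ..." when a record reaches a logger that has no handler configured anywhere in its hierarchy. The record itself is not output. This suggests that, in the patched run, qpid.messaging logged something that was never displayed. Below is a minimal sketch for capturing such records. It assumes the code runs in the same process before qpid.messaging is used, for example in a hypothetical test driver. ListHandler and capture_logger are illustrative names.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log records in memory instead of dropping them."""

    def __init__(self):
        logging.Handler.__init__(self)  # works on Python 2.6 and 3
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def capture_logger(name="qpid.messaging", level=logging.DEBUG):
    """Attach a ListHandler to the named logger. Records from child loggers
    that propagate to it are captured as well."""
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

For example, after handler = capture_logger(), calling logging.getLogger("qpid.messaging").warning("example") leaves "qpid.messaging WARNING example" in handler.messages. In practice, logging.StreamHandler(sys.stderr) would print the records directly. The in-memory list only makes them easy to inspect.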