Bug 1948456 - qpid-stat crashes when running in the loop
Summary: qpid-stat crashes when running in the loop
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-tools
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: messaging-bugs
QA Contact: Messaging QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-12 08:39 UTC by Barbora Vassova
Modified: 2023-06-14 21:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
jdanek: needinfo? (mcressma)
ataylor: needinfo-



Description Barbora Vassova 2021-04-12 08:39:44 UTC
Description of problem:
When qpid-stat is run in a loop with a sleep between iterations, it intermittently fails at interpreter exit (in the atexit handler) with the following exception:

Traceback (most recent call last):
  File "/usr/lib64/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.6/site-packages/qpid/selector.py", line 185, in stop
    self.wakeup()
  File "/usr/lib/python2.6/site-packages/qpid/selector.py", line 97, in wakeup
    self.waiter.wakeup()
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 119, in wakeup
    self._do_write()
  File "/usr/lib/python2.6/site-packages/qpid/compat.py", line 200, in _do_write
    os.write(self.write_fd, "\0")
OSError: [Errno 9] Bad file descriptor
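
Errno 9 (EBADF) means that self.write_fd was not a valid open descriptor at the moment the atexit handler called os.write(). The traceback does not show why it was invalid. The following minimal sketch is not the qpid code: it reproduces the same error in isolation with a plain os.pipe(), and the helper names wakeup and wakeup_errno are made up for illustration.

```python
import errno
import os


def wakeup(write_fd):
    """Write one byte to a wakeup pipe, similar to _do_write() in the traceback."""
    os.write(write_fd, b"\0")


def wakeup_errno(write_fd):
    """Return None if the write succeeded, otherwise the errno that was raised."""
    try:
        wakeup(write_fd)
    except OSError as exc:
        return exc.errno
    return None


read_fd, write_fd = os.pipe()
first = wakeup_errno(write_fd)   # pipe still open: the write succeeds
os.close(write_fd)               # simulate the descriptor being closed early
second = wakeup_errno(write_fd)  # the same call now fails with EBADF
os.close(read_fd)

print(first, second == errno.EBADF)
```

The second call fails in the same way as the last frame of the traceback, which suggests (but does not prove) that the wakeup pipe had already been closed when the atexit handler ran.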

Version-Release number of selected component (if applicable):
qpid-tools-1.36.0-22+hf5.el6_10.noarch


How reproducible:
Not always. While I was reproducing the issue, the script sometimes ran to completion without error.

Steps to Reproduce:
1. Use the following script (saved as reproducer.sh):

#!/bin/bash

if [ $# -ne 2 ]; then
  echo "$0 <port_number> <sleep_seconds>"
  exit 1
fi

PORT=$1
SLEEP_AMT=$2

echo "started at: $(date)"
for i in $(seq 1 50000); do
  echo "${0} ITER: ${i} $(date '+%T.%N')" | logger -t BEGIN
  qpid-stat -q -b localhost:${PORT} -S queue -L 50000 | grep -v Response >> $0.log
  echo "${0} ITER: ${i} $(date '+%T.%N')" | logger -t END
  grep -i error nohup.out >/dev/null 2>&1 && kill -9 $$
  echo "$(date +%c.%N) stayin alive, stayin alive..."
  sleep ${SLEEP_AMT}
done
echo "finished at: $(date)"

2. Run it with:

# nohup ./reproducer.sh 5672 1 > nohup.out &

3. After about 2 hours, observe the error.


Actual results:
qpid-stat fails with "OSError: [Errno 9] Bad file descriptor", and the script then kills itself when it finds the error in nohup.out.

Expected results:
The script should complete all 50000 iterations and finish with "finished at: $(date)".


Additional info:
The customer reports that the following patch avoids the exception:

(output from patch file):

--- /usr/lib/python2.6/site-packages/qpid/selector.py	2017-08-15 19:38:01.000000000 +0000
+++ ./selector.py	2021-02-15 20:58:05.164764081 +0000
@@ -79,6 +79,7 @@
         atexit.register(sel.stop)
         Selector.DEFAULT = sel
         Selector._current_pid = os.getpid()
+        #os.system("echo 'pid: " + str(Selector._current_pid) + "' | logger -t SHAWNDEBUG")
       return Selector.DEFAULT
     finally:
       Selector.lock.release()
@@ -93,8 +94,14 @@
     self.exception = None

   def wakeup(self):
-    _check(self.exception)
-    self.waiter.wakeup()
+    Selector.lock.acquire()
+    try:
+      _check(self.exception)
+      self.waiter.wakeup()
+    except:
+      pass
+    finally:
+      Selector.lock.release()

   def register(self, selectable):
     self.selectables.add(selectable)
@@ -182,13 +189,18 @@
     """Stop the selector and wait for it's thread to exit. It cannot be re-started"""
     if self.thread and not self.stopped:
       self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
+
+      #os.system("echo 'calling wakeup' | logger -t SHAWNDEBUG")
       self.wakeup()
+
+      #os.system("echo 'calling thread join' | logger -t SHAWNDEBUG")
       self.thread.join(timeout)

   def dead(self, e):
     """Mark the Selector as dead if it is stopped for any reason.  Ensure there any future
     attempt to use the selector or any of its connections will throw an exception.
     """
+    #os.system("echo 'we b dead' | logger -t SHAWNDEBUG")
     self.exception = e
     try:
       for sel in self.selectables.copy():
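
The patched wakeup() in the diff above uses a bare "except: pass", which also hides any error raised by _check(self.exception). As a point of comparison only, here is a standalone sketch of a narrower variant that ignores just EBADF and lets everything else propagate. PipeWaiter and SketchSelector are hypothetical stand-ins, not the classes from qpid/selector.py or qpid/compat.py, and the _check() call is left out.

```python
import errno
import os
import threading


class PipeWaiter(object):
    """Hypothetical stand-in for the waiter object: a simple self-pipe."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()

    def wakeup(self):
        os.write(self.write_fd, b"\0")

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)


class SketchSelector(object):
    """Hypothetical stand-in for Selector; only wakeup() is modelled."""

    lock = threading.Lock()

    def __init__(self, waiter):
        self.waiter = waiter

    def wakeup(self):
        """Return True if the byte was written, False if the pipe was already closed."""
        with SketchSelector.lock:
            try:
                self.waiter.wakeup()
            except OSError as exc:
                if exc.errno != errno.EBADF:
                    raise  # anything other than a closed descriptor still propagates
                return False
        return True


waiter = PipeWaiter()
selector = SketchSelector(waiter)
before_close = selector.wakeup()  # pipe open: the write succeeds
waiter.close()
after_close = selector.wakeup()   # pipe closed: EBADF is swallowed

print(before_close, after_close)
```

Whether swallowing EBADF at this point is actually safe in the real Selector has not been verified here. The sketch only shows how the scope of the exception handler could be reduced.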

With the customer's patch applied, the exception is no longer raised. At the point where it would have occurred, the following message appears instead:
No handlers could be found for logger "qpid.messaging"
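
Python 2's logging module prints this message when a record is emitted on a logger that has no handler configured, so something was logged on "qpid.messaging" but its text was lost. A minimal sketch of attaching a handler to that logger so the record becomes visible is shown below. It assumes a handler can be installed before the record is emitted, and the logged text is a placeholder, because the real message is not known from this report.

```python
import io
import logging

# Capture into a buffer so the sketch is self-contained; a real diagnostic run
# would more likely use logging.basicConfig() or a StreamHandler on stderr.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

log = logging.getLogger("qpid.messaging")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the record out of any root-logger handlers

# Placeholder record; the real text logged by qpid is not shown in this report.
log.error("placeholder message")

captured = buffer.getvalue()
print(captured, end="")
```

With a handler in place, the record that previously triggered the "No handlers could be found" notice would be written out in full, which may help identify what the patched code path is reporting.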

