Bug 1890080

Summary: subscription-manager failing with IOError: [Errno 2] No such file or directory: '/proc/<pid>/status' [rhel-7.9.z]
Product: Red Hat Enterprise Linux 7 Reporter: Kenny Tordeurs <ktordeur>
Component: subscription-managerAssignee: candlepin-bugs
Status: CLOSED ERRATA QA Contact: Red Hat subscription-manager QE Team <rhsm-qe>
Severity: low Docs Contact:
Priority: unspecified    
Version: 7.9CC: ajambhul, dhjoshi, hmatsumo, hyu, jjansky, jreznik, jsefler, pdwyer, redakkan, wpoteat
Target Milestone: rcKeywords: EasyFix, Triaged, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: subscription-manager-1.24.45-1.el7_9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-15 11:20:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Kenny Tordeurs 2020-10-21 10:53:04 UTC
Description of problem:

subscription-manager is logging errors to messages like:
~~~
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: Traceback (most recent call last):
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: File "/usr/lib64/python2.7/site-packages/subscription_manager/scripts/rhsm_d.py", line 189, i
n timeout_cb
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: if is_rhsm_icon_running():
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: File "/usr/lib64/python2.7/site-packages/subscription_manager/scripts/rhsm_d.py", line 229, i
n is_rhsm_icon_running
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: ret = is_process_running('rhsm-icon')
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: File "/usr/lib64/python2.7/site-packages/subscription_manager/utils.py", line 643, in is_proc
ess_running
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: for process_name in get_process_names():
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: File "/usr/lib64/python2.7/site-packages/subscription_manager/utils.py", line 632, in get_pro
cess_names
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: with open(process_status_file_path) as status:
Oct 19 12:03:04 hostname com.redhat.SubscriptionManager: IOError: [Errno 2] No such file or directory: '/proc/74913/status'
~~~

This only happens from time to time and we are unsure what is the root cause currently.

Version-Release number of selected component (if applicable):
RHEL 7.9
subscription-manager-1.24.42-1.el7.x86_64                   Mon Oct 12 08:44:37 2020
subscription-manager-rhsm-1.24.42-1.el7.x86_64              Mon Oct 12 08:44:37 2020


How reproducible:
Currently we have not been able to reproduce the exact scenario that causes this error, we have however received multiple incoming cases from customers with the exact same behavior.

Steps to Reproduce:
1.
2.
3.

Actual results:
IOError: [Errno 2] No such file or directory: '/proc/<pid>/status'

Expected results:
No errors

Additional info:

Comment 9 John Sefler 2020-12-01 17:04:30 UTC
Thank you Hao Chang Yu and Jiri Hnidek for your help troubleshooting and creating this reproducer...

Reproducing this bug on the following RHEL version with the help of a small bash script (helper-for-1890080.sh) that runs short-lived processes:

[root@ibm-x3650m4-01-vm-12 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
[root@ibm-x3650m4-01-vm-12 ~]# rpm -q --whatprovides /usr/libexec/rhsmd
subscription-manager-1.24.42-1.el7.x86_64


Reproducer:
In one terminal I will run the following script which runs indefinitely printing dots until I kill it...
[root@ibm-x3650m4-01-vm-12 ~]# cat helper-for-1890080.sh 
#!/bin/bash
while true
do
    # Start not blocking sleep process
    sleep 0.1 &
    # Start blocking sleep process
    sleep 0.05
	echo -n '.'
done
[root@ibm-x3650m4-01-vm-12 ~]# chmod +x helper-for-1890080.sh 
[root@ibm-x3650m4-01-vm-12 ~]# ./helper-for-1890080.sh 
.................

In a second terminal I will run rhsmd with a 5 second timeout to produce the traceback with a high (but not 100%) probability...
[root@ibm-x3650m4-01-vm-12 ~]# subscription-manager config --rhsmd.processtimeout=5
[root@ibm-x3650m4-01-vm-12 ~]# 
[root@ibm-x3650m4-01-vm-12 ~]# /usr/libexec/rhsmd --debug
Trying to get processTimeout option from rhsm.conf
New timeout is set according rhsmd.processTimeout: 5
Timeout reached. Checking reason to continue ...
Checking if rhsm-icon process is running
Process rhsm-icon is not running
No reason found. Terminating
[root@ibm-x3650m4-01-vm-12 ~]# /usr/libexec/rhsmd --debug
Trying to get processTimeout option from rhsm.conf
New timeout is set according rhsmd.processTimeout: 5
Timeout reached. Checking reason to continue ...
Checking if rhsm-icon process is running
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/subscription_manager/scripts/rhsm_d.py", line 189, in timeout_cb
    if is_rhsm_icon_running():
  File "/usr/lib64/python2.7/site-packages/subscription_manager/scripts/rhsm_d.py", line 229, in is_rhsm_icon_running
    ret = is_process_running('rhsm-icon')
  File "/usr/lib64/python2.7/site-packages/subscription_manager/utils.py", line 643, in is_process_running
    for process_name in get_process_names():
  File "/usr/lib64/python2.7/site-packages/subscription_manager/utils.py", line 632, in get_process_names
    with open(process_status_file_path) as status:
IOError: [Errno 2] No such file or directory: '/proc/24056/status'

BANG! The traceback has been reproduced on the second attempt.

Comment 10 John Sefler 2020-12-01 17:06:45 UTC
On the same system used to reproduce the traceback in the comment above, I have upgraded the version of subscription-manager and will re-test (while the helper-for-1890080.sh is still running!)...

[root@ibm-x3650m4-01-vm-12 ~]# rpm -q subscription-manager
subscription-manager-1.24.45-1.el7_9.x86_64
[root@ibm-x3650m4-01-vm-12 ~]# rpm -q subscription-manager --changelog | grep 1890080
- 1890080: Handle IOErrors and Exceptions when looking for process names
[root@ibm-x3650m4-01-vm-12 ~]# 
[root@ibm-x3650m4-01-vm-12 ~]# 
[root@ibm-x3650m4-01-vm-12 ~]# /usr/libexec/rhsmd --debug
Trying to get processTimeout option from rhsm.conf
New timeout is set according rhsmd.processTimeout: 5
Timeout reached. Checking reason to continue ...
Checking if rhsm-icon process is running
Process rhsm-icon is not running
No reason found. Terminating
[root@ibm-x3650m4-01-vm-12 ~]# /usr/libexec/rhsmd --debug
Trying to get processTimeout option from rhsm.conf
New timeout is set according rhsmd.processTimeout: 5
Timeout reached. Checking reason to continue ...
Checking if rhsm-icon process is running
Process rhsm-icon is not running
No reason found. Terminating
[root@ibm-x3650m4-01-vm-12 ~]# /usr/libexec/rhsmd --debug
Trying to get processTimeout option from rhsm.conf
New timeout is set according rhsmd.processTimeout: 5
Timeout reached. Checking reason to continue ...
Checking if rhsm-icon process is running
Process rhsm-icon is not running
No reason found. Terminating
[root@ibm-x3650m4-01-vm-12 ~]# 

VERIFIED: The rhsmd tool no longer throws a traceback no matter how many times I try (while helper-for-1890080.sh is still running on the same machine to create short-lived processes).

Comment 11 John Sefler 2020-12-01 17:33:46 UTC
I should append one more comment to the verification comment above...
Although a traceback is no longer encountered by /usr/libexec/rhsmd, you may find the following debug comments logged in rhsm.log indicating that "A process has likely ended before it's status could be read" as shown in the example below.  This new log entry is benign...

[root@ibm-x3650m4-01-vm-12 ~]# subscription-manager config --logging.default_log_level=DEBUG
[root@ibm-x3650m4-01-vm-12 ~]# tail -f /var/log/rhsm/rhsm.log 
2020-12-01 12:24:18,942 [DEBUG] rhsmd:12318:MainThread @https.py:56 - Using standard libs to provide httplib and ssl
2020-12-01 12:24:19,179 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:109 - rhsmd started
2020-12-01 12:24:19,180 [INFO] rhsmd:12318:MainThread @rhsm_d.py:382 - D-Bus API: com.redhat.SubscriptionManager provided by rhsmd is deprecated
2020-12-01 12:24:19,180 [INFO] rhsmd:12318:MainThread @rhsm_d.py:383 - Consider using D-Bus API: com.redhat.RHSM1 provided by rhsm.service
2020-12-01 12:24:19,184 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - Trying to get processTimeout option from rhsm.conf
2020-12-01 12:24:19,185 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - New timeout is set according rhsmd.processTimeout: 5
2020-12-01 12:24:24,803 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - Timeout reached. Checking reason to continue ...
2020-12-01 12:24:24,803 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - Checking if rhsm-icon process is running
2020-12-01 12:24:24,819 [DEBUG] rhsmd:12318:MainThread @utils.py:643 - A process has likely ended before it's status could be read for 12547 : [Errno 2] No such file or directory: '/proc/12547/status'
2020-12-01 12:24:24,820 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - Process rhsm-icon is not running
2020-12-01 12:24:24,820 [DEBUG] rhsmd:12318:MainThread @rhsm_d.py:106 - No reason found. Terminating

Comment 15 errata-xmlrpc 2020-12-15 11:20:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (subscription-manager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5452