Bug 1411407 - Automated installation hangs at times - patch submitted
Summary: Automated installation hangs at times - patch submitted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: anaconda
Version: 7.3
Hardware: x86_64
OS: All
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Anaconda Maintenance Team
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 1420851
Reported: 2017-01-09 16:13 UTC by Daniele
Modified: 2021-03-11 14:53 UTC (History)

Fixed In Version: anaconda-21.48.22.109-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 08:53:21 UTC
Target Upstream Version:


Attachments
script to help reproducing (1013 bytes, text/plain)
2017-01-09 16:13 UTC, Daniele
backtrace of subprocess (44.93 KB, text/plain)
2017-03-24 12:17 UTC, Radek Vykydal
backtrace of hanging anaconda process (210.78 KB, text/plain)
2017-03-24 12:17 UTC, Radek Vykydal


Links
System: Red Hat Product Errata
ID: RHBA-2017:2293
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: anaconda bug fix and enhancement update
Last Updated: 2017-08-01 12:39:44 UTC

Description Daniele 2017-01-09 16:13:59 UTC
Created attachment 1238818 [details]
script to help reproducing

Description of problem:
Some unattended installations hang intermittently.
The customer has investigated extensively and found a probable root cause, a way to reproduce the hang, and a possible patch. They would like to know our take on it and our plan for including it.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
[Customer comment:]

Based on backtraces collected with GDB and on the Anaconda source code, I wrote a minimal working example which demonstrates a bug in Python.
Could you confirm that you can reproduce it in your setup?
Reproduction:
1. pkill -9 anaconda
2. Copy mwe.py from the attachment to the VM.
3. Execute:
while true; do python mwe.py ; done
After a few iterations the script should hang.
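
The shell loop above stops only when someone notices that it is stuck. As a rough sketch, the same loop can be driven from Python with a per-run timeout so that a hang is reported automatically. The function name run_until_hang, the 30-second timeout and the run limit are assumptions made for illustration; the contents of mwe.py are not reproduced here. The sketch targets Python 3 (subprocess.run).

```python
import subprocess
import sys


def run_until_hang(argv, timeout=30.0, max_runs=100):
    """Run argv repeatedly.

    Returns the 1-based number of the first run that exceeded `timeout`
    seconds, or None if every run finished in time.
    """
    for run in range(1, max_runs + 1):
        try:
            # subprocess.run() kills the child when the timeout expires.
            subprocess.run(argv, timeout=timeout)
        except subprocess.TimeoutExpired:
            return run
    return None


# Hypothetical usage, with mwe.py from the attachment in the current directory:
#     hung_on = run_until_hang([sys.executable, "mwe.py"])
```

Because the timed-out child is killed, this helper is only suitable for measuring how often the hang occurs. To attach GDB to the hung process, the plain shell loop is the better choice.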

With minor modifications I can also reproduce it with Python 2.7.12 from the latest Fedora.
I also wrote and tested a workaround for anaconda.

In util.py, add the following lines after line 180, immediately before 'return subprocess.Popen(argv,':
    from distutils import spawn
    argv[0] = spawn.find_executable(argv[0])
	
With this patch anaconda doesn't hang in my tests.
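
The idea behind the workaround is to resolve the program name to a full path in the parent, so that the child makes a single exec attempt on a file that exists. A minimal standalone sketch of that idea follows. It is not the anaconda code: resolve_argv and start_program are hypothetical names, and shutil.which is swapped in for distutils.spawn.find_executable because distutils is no longer part of recent Python 3 releases.

```python
import errno
import os
import shutil
import subprocess


def resolve_argv(argv, path=None):
    """Return a copy of argv whose first element is an absolute path.

    The lookup runs in the parent process, before any fork(). `path` is a
    PATH-style string; None means the PATH of the current environment.
    """
    resolved = shutil.which(argv[0], path=path)
    if resolved is None:
        # Fail early in the parent instead of after fork() in the child.
        raise OSError(errno.ENOENT, "command not found on PATH", argv[0])
    return [os.path.abspath(resolved)] + list(argv[1:])


def start_program(argv, **popen_kwargs):
    """Start argv with the executable resolved up front (hypothetical helper)."""
    return subprocess.Popen(resolve_argv(argv), **popen_kwargs)
```

This narrows the window rather than closing it: an exec of a resolved path can still fail for other reasons (for example a permission change between lookup and exec), and the child would then take the same error path.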

Actual results:
Installation does not complete in certain scenarios

Expected results:
Installation always completes

Additional info:
Steps the customer took before arriving at the patch:

I found a reproduction scenario and a possible root cause of the issue.
IMO the root cause is the implementation of subprocess.Popen, which is not safe in multithreaded programs.
Python should not call strerror() after fork(), because another thread may hold __libc_setlocale_lock when fork() is called.

AnaStorageThread uses the pyudev module, which calls __wcsmbs_load_conv, which takes __libc_setlocale_lock (please look at the attached backtrace).
After that, the context switches back to the main thread.
The main thread creates AnaTimeInit; this thread calls subprocess.Popen, which calls fork().
The child process has its own copy of __libc_setlocale_lock, and that copy is locked. Python tries execve(), which fails, and then calls strerror(), which hangs waiting for __libc_setlocale_lock.
__libc_setlocale_lock will never be unlocked in the child, because AnaStorageThread, which holds it, and the child of AnaTimeInit are separate processes.
Please look at the attached illustration.
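
The mechanism can be illustrated with a small analogy in Python 3. This sketch does not touch glibc: an ordinary threading.Lock stands in for __libc_setlocale_lock, and a helper thread stands in for AnaStorageThread. All names are made up for illustration. The child uses a timeout so the sketch terminates, whereas the real child blocks forever.

```python
import os
import threading


def child_can_take_lock(timeout=0.5):
    """Fork while another thread holds a lock.

    Returns True if the forked child managed to acquire the lock within
    `timeout` seconds, False otherwise.
    """
    lock = threading.Lock()          # stand-in for __libc_setlocale_lock
    held = threading.Event()
    release = threading.Event()

    def holder():                    # plays the role of AnaStorageThread
        with lock:
            held.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()                      # the other thread now holds the lock

    pid = os.fork()                  # only the calling thread exists in the child
    if pid == 0:
        # Child: the lock was copied in its locked state, and the thread
        # that could release it does not exist in this process.
        got_it = lock.acquire(timeout=timeout)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    release.set()                    # releasing in the parent does not help the child
    thread.join()
    return os.WEXITSTATUS(status) == 0
```

On a POSIX system this is expected to return False: the child times out on a lock that nothing in its own process can release. Recent Python 3 releases may also print a DeprecationWarning about calling fork() in a multithreaded process, which is a warning about this same class of problem.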

Steps to reproduce:
1. Create a VM with only 1 vCPU (I didn't test with more than one vCPU).
2. Start the installation in text mode with sshd enabled.
3. pkill -9 anaconda
4. Edit /sbin/anaconda:
 a) comment out line 1271
#    atexit.register(exitHandler, ksdata.reboot, anaconda.storage, anaconda.payload)
 b) comment out lines 1314 and 1315
#    anaconda._intf.setup(ksdata)
#    anaconda._intf.run()
 c) add line 1316
    threadMgr.wait_all()
5. Copy and install the debuginfo packages:
anaconda-debuginfo-21.48.22.93-1.el7.x86_64.rpm
glib2-debuginfo-2.46.2-4.el7.x86_64.rpm
glibc-debuginfo-2.17-157.el7.x86_64.rpm
glibc-debuginfo-common-2.17-157.el7.x86_64.rpm
gobject-introspection-debuginfo-1.42.0-1.el7.x86_64.rpm
pygobject2-debuginfo-2.28.6-11.el7.x86_64.rpm
python-debuginfo-2.7.5-48.el7.x86_64.rpm

6. Copy ks.cfg from the attachment to /root/ks.cfg.
7. Copy the file anadbg from the attachment to /root/anadbg.
8. Run gdb -P /root/anadbg
After a few minutes this command should either fail or hang; re-execute it until it hangs. On my setup, 4 of 10 attempts hang.
The backtrace of the hung process is the same as the backtraces of the previously captured processes.

Comment 15 Radek Vykydal 2017-03-24 12:17:00 UTC
Created attachment 1266040 [details]
backtrace of subprocess

I seem to have just hit a similar issue (strerror() called from Popen). This time the program to exec is multipath, located at /usr/sbin/multipath, while the PATH in the environment starts with /usr/bin.
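
To the best of my understanding of Python 2.7, a bare program name passed to subprocess.Popen is resolved in the forked child by trying each PATH entry in order, and every miss produces an OSError, which is where strerror() gets called. The sketch below only lists which candidates would be tried and missed before the hit. It runs no exec, and exec_candidates is a hypothetical helper name.

```python
import os


def exec_candidates(program, path):
    """Walk a PATH-style string in order.

    Returns (misses, hit): the candidate paths that do not name an
    executable file, followed by the first one that does (or None).
    """
    misses = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return misses, candidate
        misses.append(candidate)
    return misses, None


# Hypothetical usage on an installed system:
#     exec_candidates("multipath", "/usr/bin:/usr/sbin")
# Every entry in `misses` corresponds to one failed exec attempt in the child.
```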

http://download.eng.brq.redhat.com/pub/rhel/nightly/RHEL-7.4-20170323.n.0/compose/Server/x86_64/os/
python-2.7.5-54.el7.x86_64.rpm.

I'll also attach a backtrace of the anaconda process, which is stuck in read() on the subprocess.

Comment 16 Radek Vykydal 2017-03-24 12:17:58 UTC
Created attachment 1266042 [details]
backtrace of hanging anaconda process

Comment 26 errata-xmlrpc 2017-08-01 08:53:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2293

