Bug 1566983

Summary: Polkitd: The utils_spawn_data_free reap timeout subprocess did not work resulting in a large number of zombie processes
Product: [Fedora] Fedora Reporter: Li Ning <lining916740672>
Component: polkit Assignee: Jan Rybar <jrybar>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 28 CC: lining916740672
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-05-28 19:23:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
0001-add-child-reaper-thread-to-fix-zombies none
0001-polkitd-make-sure-child-process-exits-will-be-proces.patch none
polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch none

Description Li Ning 2018-04-13 09:11:30 UTC
Created attachment 1421280 [details]
0001-add-child-reaper-thread-to-fix-zombies

Description of problem:

When a subprocess spawned from a rule runs past the timeout, it becomes a zombie.

utils_spawn_data_free is supposed to reap a timed-out subprocess, but it does not, which results in a large number of zombie processes. utils_spawn_data_free sends SIGTERM to the timed-out subprocess and sets up a child watch source to reap it, but that source never runs because its main loop and context are released by the caller right afterwards.

The key code is pasted here.
static void 
js_polkit_spawn()
{
  ...
 out:
  g_strfreev (argv);
  g_free (standard_output);
  g_free (standard_error);
  g_clear_object (&data.res); // triggers utils_spawn_data_free, which sets the child watch source
  if (loop != NULL)
    g_main_loop_unref (loop); // destroys the loop
  if (context != NULL)
    g_main_context_unref (context); // destroys the context
  return ret;
}

Once the loop and context have been destroyed, the child watch source can never be dispatched. The subprocess exits and becomes a zombie.
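To illustrate the underlying rule, here is a minimal POSIX sketch, not polkit code: sending SIGTERM only terminates the child, and the entry stays in the process table as <defunct> until the parent actually waits for it. The helper names spawn_sleeper and terminate_and_reap are hypothetical, and plain waitpid() stands in for the GLib child watch source that polkitd uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that outlives any reasonable timeout,
 * standing in for a spawned helper such as "sleep 15". */
static pid_t spawn_sleeper(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(30);
        _exit(0);
    }
    return pid; /* -1 if fork() failed */
}

/* Hypothetical helper: terminate the child and reap it.
 * Returns the number of the signal that killed the child, 0 if it exited
 * on its own, or -1 on error. */
static int terminate_and_reap(pid_t pid)
{
    int status;
    pid_t r;

    assert(pid > 0);
    if (kill(pid, SIGTERM) != 0)
        return -1;

    /* This wait is the step that removes the zombie. If the parent never
     * gets here (for example because the event loop that would have called
     * it is already gone), the child stays <defunct>. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling terminate_and_reap(spawn_sleeper()) leaves no defunct entry behind. Dropping the waitpid() loop reproduces the symptom shown in the ps output of this report.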


Version-Release number of selected component (if applicable):
All polkitd versions

How reproducible:
100%

Steps to Reproduce:
1.  Add a debug rule. This rule spawns a process that runs for more than 10s, which results in a timeout.
[root@localhost ~]# cat /etc/polkit-1/rules.d/01-test.rules   
 polkit.addRule(function(action, subject) {             
        polkit.log("debug start")                       
         try {                                          
             polkit.spawn(["/usr/bin/sleep", "15"]);       
         } catch (error) {                              
             //    polkit.log(error)                    
         }                                              
 });                                                    
2. Make polkitd evaluate the rule, for example by triggering an authorization check.

Actual results:
The subprocess becomes a zombie.

Expected results:
no zombies

Additional info:

[root@localhost ~]# ps -ef |grep polkit |grep -v grep
polkitd   1501     1  0 Mar31 ?        00:02:51 /usr/lib/polkit-1/polkitd --no-debug
polkitd   5060  1501  0 12:37 ?        00:00:00 [sleep] <defunct>
polkitd   5367  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd   5631  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd   5915  1501  0 12:38 ?        00:00:00 [sleep] <defunct>
polkitd  14052  1501  0 12:42 ?        00:00:00 sleep 15

[root@localhost ~]# journalctl -fu polkit
-- Logs begin at Sat 2018-03-31 14:36:03 CST. --
Apr 03 12:39:11 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:39:21 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:40:11 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)
Apr 03 12:40:21 2-3 polkitd[1501]: /etc/polkit-1/rules.d/01-test.rules:5: Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark, 24)

Comment 1 Li Ning 2018-04-13 09:12:55 UTC
I made a patch to fix this issue. I have done some testing and it fixes the problem.

Comment 2 Li Ning 2018-05-08 08:27:22 UTC
Created attachment 1433062 [details]
0001-polkitd-make-sure-child-process-exits-will-be-proces.patch

I made a better and simpler patch to make sure that child process exits are processed.

This patch adds 3 timeout sources.
The 1st one sends SIGTERM at 10s,
the 2nd one sends SIGKILL at 15s,
and the last one quits the main loop.
Once the child process exits and the child watch source has been dispatched, the main loop quits. Otherwise we quit the main loop at 20s.

Timer1: 10s, send SIGTERM
Timer2: 15s, send SIGKILL
Timer3: 20s, quit the main loop

0  ~ 10s: child exits normally
10 ~ 15s: child exits by SIGTERM
15 ~ 20s: child exits by SIGKILL
20s ~   : child seems to be abnormal; we quit the main loop.
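
The escalation schedule above can be sketched in plain POSIX C. This is only an illustration under stated assumptions, not the attached patch: it polls with waitpid(WNOHANG) instead of using GLib timeout and child watch sources, the deadlines are parameters in milliseconds so they can be scaled down from 10s/15s/20s, and the names spawn_sleeper, wait_with_escalation and enum reap_result are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum reap_result { EXITED_ON_ITS_OWN, EXITED_BY_SIGTERM, EXITED_BY_SIGKILL, GAVE_UP };

/* Hypothetical helper: fork a long-running child; optionally ignore SIGTERM
 * so that only SIGKILL can stop it. */
static pid_t spawn_sleeper(int ignore_sigterm)
{
    pid_t pid = fork();
    if (pid == 0) {
        if (ignore_sigterm)
            signal(SIGTERM, SIG_IGN);
        sleep(30);
        _exit(0);
    }
    return pid; /* -1 if fork() failed */
}

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Wait for the child, sending SIGTERM at term_ms and SIGKILL at kill_ms,
 * and stop waiting at quit_ms. The child is reaped in every case except
 * GAVE_UP. */
static enum reap_result wait_with_escalation(pid_t pid, long term_ms,
                                             long kill_ms, long quit_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* poll every 10 ms */
    struct timespec start;
    int sent_term = 0, sent_kill = 0, status;

    assert(pid > 0 && term_ms <= kill_ms && kill_ms <= quit_ms);
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) { /* child is gone and has now been reaped */
            if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
                return EXITED_BY_SIGKILL;
            if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
                return EXITED_BY_SIGTERM;
            return EXITED_ON_ITS_OWN;
        }
        if (r == -1 && errno != EINTR)
            return GAVE_UP;

        long t = elapsed_ms(&start);
        if (t >= quit_ms)
            return GAVE_UP; /* corresponds to Timer3 */
        if (t >= kill_ms && !sent_kill) {
            kill(pid, SIGKILL); /* corresponds to Timer2 */
            sent_kill = 1;
        } else if (t >= term_ms && !sent_term) {
            kill(pid, SIGTERM); /* corresponds to Timer1 */
            sent_term = 1;
        }
        nanosleep(&tick, NULL);
    }
}
```

With the schedule from this comment the call would be wait_with_escalation(pid, 10000, 15000, 20000). A child that ignores SIGTERM is expected to end up in the EXITED_BY_SIGKILL case.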

Comment 3 Li Ning 2018-05-10 09:34:00 UTC
Created attachment 1434285 [details]
polkitd-fix-zombie-not-reaped-when-js-spawned-proces.patch

This patch seems to be much better and simpler.

Comment 4 Ben Cotton 2019-05-02 21:11:43 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Ben Cotton 2019-05-28 19:23:55 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.