Bug 1549199

Summary: sudo hangs when configuring sudo-io
Product: Red Hat Enterprise Linux 7 Reporter: Renaud Métrich <rmetrich>
Component: sudo Assignee: Radovan Sroka <rsroka>
Status: CLOSED DUPLICATE QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.4 CC: jvymazal, mthacker
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1549635 Environment:
Last Closed: 2018-06-29 12:40:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1549635    
Attachments:
Description Flags
sudo debug output of a working run none
sudo debug output of a non-working run none

Description Renaud Métrich 2018-02-26 16:46:02 UTC
Description of problem:

When using sudo-io with sudo to log the input ("Defaults         log_input" in /etc/sudoers), "sudo" may hang from time to time.
In such a case, the admin needs to "kill -9" the sudo command.
We can see that the parent "sudo" is already a zombie while the child is not yet:

# pstree -a -p 2884
bash,2884
  └─sudo,17948 -n -- systemctl --no-pager list-unit-files
      └─(sudo,17949)

# ps -eaf | grep sudo
root     17948  2884  0 17:34 pts/0    00:00:00 sudo -n -- systemctl --no-pager list-unit-files
root     17949 17948  0 17:34 ?        00:00:00 [sudo] <defunct>
root     17963  2503  0 17:34 pts/2    00:00:00 grep --color=auto sudo


Version-Release number of selected component (if applicable):

sudo-1.8.19p2-11.el7_4.x86_64


How reproducible:

Sometimes


Steps to Reproduce:
1. Enable sudo-io

  # visudo
  (terminal opens)
  Defaults         log_input

2. Run a command in loop using sudo (having output seems to help)

  # while true; do date; sudo -n -- systemctl --no-pager list-unit-files; done
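
  The loop above never ends once sudo hangs. A bounded variant could stop at the first hang instead. This is only a sketch: run_until_hang is a hypothetical helper, not part of sudo, and it assumes GNU coreutils "timeout" is installed. Since the hang does not reproduce under strace, wrapping sudo in timeout may also change the timing enough to hide it.

```shell
#!/bin/sh
# run_until_hang MAX LIMIT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX times, allowing LIMIT
# seconds per run, and stops at the first run that had to be killed.
run_until_hang() {
    ruh_max=$1
    ruh_limit=$2
    shift 2
    ruh_i=1
    while [ "$ruh_i" -le "$ruh_max" ]; do
        # --foreground keeps COMMAND attached to the terminal;
        # -s KILL because the hung sudo reportedly needs "kill -9".
        timeout --foreground -s KILL "$ruh_limit" "$@"
        ruh_rc=$?
        # Depending on the coreutils version, a timed-out run is
        # reported as 124 or as 137 (128 + SIGKILL).
        if [ "$ruh_rc" -eq 124 ] || [ "$ruh_rc" -eq 137 ]; then
            echo "hang at iteration $ruh_i"
            return 1
        fi
        ruh_i=$((ruh_i + 1))
    done
    echo "no hang in $ruh_max iterations"
    return 0
}
```

  Example call, reusing the command from step 2 with an assumed limit of 30 seconds and 200 iterations:

  # run_until_hang 200 30 sudo -n -- systemctl --no-pager list-unit-files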


Actual results:

From time to time (roughly 1 in every 30 attempts), the command hangs


Expected results:

No hang


Additional info:

There is likely a race condition within signal handling. I cannot reproduce when stracing the parent shell or sudo command.
To reproduce in my VM, I had to allocate 4 CPUs.

Comment 2 Renaud Métrich 2018-02-27 10:50:06 UTC
EDIT: the child becomes a zombie, while the parent waits forever.
It looks like the SIGCHLD was missed by the parent.
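
One way to test the "missed SIGCHLD" guess on a hung parent is to read its signal masks from /proc/PID/status. The sketch below is not from the sudo sources: mask_has_signal and sigchld_state are hypothetical helper names, and the default signal number 17 assumes SIGCHLD on Linux x86. If SIGCHLD shows as set in SigPnd or ShdPnd and in SigBlk, the signal has arrived but is blocked. If it is clear in both pending masks and set in SigCgt, the handler has most likely already run.

```shell
#!/bin/sh
# mask_has_signal HEXMASK [SIGNUM]
# Succeeds if the bit for SIGNUM (default 17, SIGCHLD on Linux x86,
# an assumption) is set in a hex mask as printed in /proc/PID/status.
mask_has_signal() {
    mhs_sig=${2:-17}
    [ $(( (0x$1 >> ($mhs_sig - 1)) & 1 )) -eq 1 ]
}

# sigchld_state PID
# Prints, for each signal mask of PID, whether SIGCHLD is set in it.
sigchld_state() {
    while read -r scs_field scs_mask; do
        case $scs_field in
            SigPnd:|ShdPnd:|SigBlk:|SigCgt:)
                if mask_has_signal "$scs_mask"; then
                    echo "$scs_field SIGCHLD set"
                else
                    echo "$scs_field SIGCHLD clear"
                fi
                ;;
        esac
    done < "/proc/$1/status"
}
```

For the hang shown in the Description, the parent would be inspected with:

# sigchld_state 17948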

Note: almost 100% reproducible on Fedora 26.

Comment 3 Renaud Métrich 2018-02-27 15:01:56 UTC
Created attachment 1401396
sudo debug output of a working run

Comment 4 Renaud Métrich 2018-02-27 15:02:23 UTC
Created attachment 1401397
sudo debug output of a non-working run

Comment 5 Renaud Métrich 2018-02-27 15:05:16 UTC
From the sudo_debug outputs, we can see that the SIGCHLD is caught correctly.
The difference between the working (ok) and non-working (ko) runs seems to be a write to the tty that happens only in the ko run, causing the FD to remain alive:

In sudo_debug.ko:

687 sudo[parent] sudo_ev_add_v1: adding event 0x55727331d830 to base 0x5572733176b0, fd 10, events 4
688 sudo[parent] -> sudo_ev_add_impl @ ./event_poll.c:77
689 sudo[parent] <- sudo_ev_add_impl @ ./event_poll.c:118 := 0
690 sudo[parent] <- sudo_ev_add_v1 @ ./event.c:214 := 0
691 sudo[parent] -> sudo_ev_loop_v1 @ ./event.c:281
692 sudo[parent] -> sudo_ev_scan_impl @ ./event_poll.c:142
693 sudo[parent] sudo_ev_scan_impl: 1 fds ready
694 sudo[parent] sudo_ev_scan_impl: polled fd 10, events 4, activating 0x55727331d830
695 sudo[parent] <- sudo_ev_scan_impl @ ./event_poll.c:183 := 1
696 sudo[parent] -> sudo_ev_del_v1 @ ./event.c:220
697 sudo[parent] sudo_ev_del_v1: removing event 0x55727331d830 from base 0x5572733176b0, fd 10, events 4
698 sudo[parent] -> sudo_ev_del_impl @ ./event_poll.c:124
699 sudo[parent] <- sudo_ev_del_impl @ ./event_poll.c:133 := 0
700 sudo[parent] <- sudo_ev_del_v1 @ ./event.c:268 := 0
701 sudo[parent] -> write_callback @ ./exec_pty.c:605
702 sudo[parent] wrote 3258 bytes to fd 10

This sequence is not found in sudo_debug.ok.
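
A comparison like this can be scripted rather than done by eye. The sketch below lists functions that are entered in one debug log but never in the other, relying only on the "-> function @ file:line" trace format visible in the excerpt above. entered_functions and only_in_first are hypothetical helper names, and the file names are the ones used in this comment.

```shell
#!/bin/sh
# entered_functions FILE
# Prints the unique function names from "-> name @ file:line" lines.
entered_functions() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "->" && $(i + 2) == "@")
                print $(i + 1)
    }' "$1" | sort -u
}

# only_in_first A B
# Prints the functions entered in log A but never in log B.
only_in_first() {
    oif_a=$(mktemp) || return 1
    oif_b=$(mktemp) || { rm -f "$oif_a"; return 1; }
    entered_functions "$1" > "$oif_a"
    entered_functions "$2" > "$oif_b"
    # comm -23 keeps the lines that are unique to the first file
    comm -23 "$oif_a" "$oif_b"
    rm -f "$oif_a" "$oif_b"
}
```

Usage with the two attachments:

# only_in_first sudo_debug.ko sudo_debug.ok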

Comment 6 Radovan Sroka 2018-06-20 12:36:35 UTC
I cannot reproduce the issue. Can you provide a more reliable reproducer?

Comment 7 Renaud Métrich 2018-06-20 12:53:54 UTC
I reproduced it easily on the latest RHEL 7.5 using the reproducer in the Description.

After some iterations, the command shows this:


-------- 8< ---------------- 8< ---------------- 8< --------
...
systemd-readahead-done.timer                  indirect
systemd-tmpfiles-clean.timer                  static  

262 unit files listed.
-------- 8< ---------------- 8< ---------------- 8< --------

At this point the date from the next iteration should be printed, but it is not.

The sudo child is defunct:

-------- 8< ---------------- 8< ---------------- 8< --------
# ps -eaf | grep sudo
root      3201  2193  0 14:51 pts/0    00:00:00 sudo -n -- systemctl --no-pager list-unit-files
root      3202  3201  0 14:51 ?        00:00:00 [sudo] <defunct>
root      3212  1154  0 14:51 ttyS0    00:00:00 grep --color=auto sudo
-------- 8< ---------------- 8< ---------------- 8< --------

Comment 8 Radovan Sroka 2018-06-29 12:40:50 UTC

*** This bug has been marked as a duplicate of bug 1560657 ***