Bug 806048

Summary: "dhclient -r" may not terminate running dhclient process; may kill wrong PID
Product: [Fedora] Fedora
Component: dhcp
Version: 16
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: John Florian <john.florian>
Assignee: Jiri Popelka <jpopelka>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: jpopelka
Doc Type: Bug Fix
Last Closed: 2012-03-23 12:15:30 EDT

Description John Florian 2012-03-22 14:16:58 EDT
Description of problem:
Per dhclient(8):

"""
-r     Release the current lease and stop the running DHCP client as previously recorded in the PID file.
"""

However, this does not always work.  Generally it seems to work the first time, but if the interface is brought up again and the operation repeated, it seems to fail consistently thereafter.

Version-Release number of selected component (if applicable):
dhcp-4.2.3-6.P2.fc16.src.rpm

How reproducible:
Always, or nearly so, after one successful operation.

Steps to Reproduce:
1. Boot system with a DHCP-managed interface, e.g., eth0.
2. dhclient -r eth0  # probably will succeed; process will disappear
3. ifup eth0
4. dhclient -r eth0  # probably will fail; process will continue running
  
Expected results:
Lease should be freed (and perhaps actually is) AND the running dhclient process that is managing the interface should terminate as stated in the man page.

Additional info:
After running "dhclient -r", I've noticed that /var/run/dhclient-eth0.pid has two PIDs recorded in it.  I'm guessing that the 2nd PID is that of the temporary instance started with the -r arg.  Perhaps it's trying to terminate the 2nd PID rather than the 1st, which is the long-running instance.  An strace of the process seems to confirm that, more or less.  From what I can see, the kill is going to the PID of the prior "dhclient -r" process.  That could be quite bad, since that PID may since have been reused by an unrelated process!  The bug appears to be that "dhclient -r" records its own PID.
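
To check which of the PIDs recorded in a PID file still belong to a live process, something like the following could be used. This is only a sketch: describe_pid_file is a hypothetical helper, not part of dhclient, and it assumes it is run as root, because "kill -0" also fails when the caller lacks permission to signal the target process.

```shell
# Hypothetical helper (not part of dhclient): print every PID recorded in a
# PID file together with whether a process with that PID currently exists.
describe_pid_file() {
    pid_file=$1
    [ -r "$pid_file" ] || { echo "unreadable: $pid_file" >&2; return 1; }
    # Put each whitespace-separated token on its own line, so PIDs written
    # one per line or on a single line are handled the same way.
    tr -s '[:space:]' '\n' < "$pid_file" | while read -r pid; do
        case $pid in
            ''|*[!0-9]*) continue ;;    # skip empty or non-numeric tokens
        esac
        # Signal 0 delivers nothing; it only tests whether the PID can be
        # signalled. Assumption: run as root, otherwise a live process owned
        # by another user is reported as not-running.
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid running"
        else
            echo "$pid not-running"
        fi
    done
}
```

Calling describe_pid_file /var/run/dhclient-eth0.pid after step 4 would show whether the first recorded PID, presumably the long-running client, is still alive.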

Comment 1 Jiri Popelka 2012-03-23 09:57:12 EDT
(In reply to comment #0)
> Description of problem:
> Per dhclient(8):
> 
> """
> -r     Release the current lease and stop the running DHCP client as
> previously recorded in the PID file.
> """

Therefore you need to specify the PID file (otherwise it'll use the default /var/run/dhclient.pid).

> Steps to Reproduce:
> 1. Boot system with a DHCP-managed interface, e.g., eth0.

Check how dhclient was started (ps aux | grep dhclient).
You should see that it uses the /var/run/dhclient-eth0.pid file.

> 2. dhclient -r eth0  # probably will succeed; process will disappear

Use dhclient -r -pf /var/run/dhclient-eth0.pid eth0

See also:
grep "dhclient -r" /etc/sysconfig/network-scripts/ifdown-eth
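
The two steps above (find the PID file the running client was started with, then pass the same file to the release) could be combined roughly as follows. This is a sketch only: pf_from_cmdline is a hypothetical helper, and the sample command line is illustrative rather than captured from a real system. The function relies on plain word splitting, so it assumes the path contains no spaces or glob characters.

```shell
# Hypothetical helper: print the argument that follows -pf in a dhclient
# command line, e.g. one line of "ps -o args=" output.
pf_from_cmdline() {
    # Deliberately unquoted: split the command line into words.
    # Assumption: no argument contains spaces or glob characters.
    set -- $1
    while [ $# -gt 0 ]; do
        if [ "$1" = "-pf" ] && [ $# -ge 2 ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    return 1    # no -pf given; dhclient would fall back to its default file
}

# Illustrative command line, not taken from the reporter's system.
sample='/sbin/dhclient -1 -q -pf /var/run/dhclient-eth0.pid eth0'
pf=$(pf_from_cmdline "$sample")
# Print the release command instead of running it.
echo "dhclient -r -pf $pf eth0"
# prints: dhclient -r -pf /var/run/dhclient-eth0.pid eth0
```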

Does it solve your problem?

Comment 2 John Florian 2012-03-23 11:53:31 EDT
Jiri, you are correct and that does resolve my problem.  I probably should have looked at /etc/sysconfig/network-scripts/ifdown-eth, but when I saw the behavior described under "Additional info", I somehow convinced myself there was a genuine bug.

Something still seems fishy when the -pf option is not used (for the release) because dhclient does indeed still find the /var/run/dhclient-eth0.pid file; the strace clearly showed that.  Nonetheless, you've given me a workable solution so thank you for that.

Comment 3 Jiri Popelka 2012-03-23 12:15:30 EDT
You're welcome.