Bug 1188259

Summary: [RFE][nova] Improve dnsmasq --dhcp-script input parsing and error handling
Product: Red Hat Enterprise Linux 7 Reporter: Eduard Barrera <ebarrera>
Component: dnsmasqAssignee: Petr Menšík <pemensik>
Status: CLOSED ERRATA QA Contact: Daniel Rusek <drusek>
Severity: medium Docs Contact: Ioanna Gkioka <igkioka>
Priority: high    
Version: 7.0CC: cww, drusek, ebarrera, ihrachys, pasik, pemensik, psklenar, salmy, srevivo, thozza, vdanek
Target Milestone: rcKeywords: FutureFeature
Target Release: 7.4   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: dnsmasq-2.76-3.el7 Doc Type: Enhancement
Doc Text:
Error handling in the output of the *dhcp-script* has been improved Previously, any error in the output of the *dhcp-script* was ignored. With this update the output of the script is logged on the `add`, `old`, `del`, `arp-add`, `arp-del`, `tftp` actions. As a result, errors are displayed while *dnsmasq* is running. Note that the lease-init action happens only at a start of *Dnsmasq*. With this update, only a summary of the output is logged and not the standard error output, which passes to the *systemd* service for logging.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 11:46:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 1298243, 1400961, 1420851, 1472751, 1380362, 1393869, 1465887, 1465928    
Attachments:
Description Flags
proposed fix to dhcp-script error logging
none
talking-script for garbage handling tests
none
updated fix to dhcp-script error logging none

Description Eduard Barrera 2015-02-02 13:16:31 UTC
Description of problem:

dnsmasq is not giving a reasonable log out when it fails to parse the input of -dhcp-script, for example giving that command:

#dnsmasq --strict-order --bind-interfaces --conf-file= --pid-file=/var/lib/nova/networks/nova-br100.pid --listen-address=192.168.12.1 --except-interface=lo --dhcp-range=set:novanetwork,192.168.12.2,static,255.255.252.0,120s --dhcp-lease-max=1024 --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro --domain=novalocal
 
and a script giving that output:
2015-01-31 12:59:03.499 12398 DEBUG nova.openstack.common.lockutils [req-8cb351cc-a2fe-4bf6-8801-acaa0a6b424c None None] Got semaphore / lock \"__get_backend\" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py

dnsmasq crash with a broken pipe:

write(1, "2015-01-31 12:59:03.499 12398 DEBUG nova.openstack.common.lockutils [req-8cb351cc-a2fe-4bf6-8801-acaa0a6b424c None None] Got semaphore / lock \"__get_backend\" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:245\n", 236) = -1 EPIPE (Broken pipe)
12398 --- SIGPIPE (Broken pipe) @ 0 (0) ---

nova-dhcpbridge return that output because it was in debug mode, but this is another bugzilla

but dnsmasq only throws the following entries:

Jan 30 20:36:38 dddm0996a dnsmasq[15337]: lease-init script returned exit code 1
Jan 30 20:36:38 dddm0996a dnsmasq[15337]: FAILED to start up


Version-Release number of selected component (if applicable):
dnsmasq-2.48-14.el6.x86_64 


How reproducible:
not sure


Steps to Reproduce:
1. - create a bash script ( for example /tmp/myscript.sh ) returning the following output:
2015-01-31 12:59:03.499 12398 DEBUG nova.openstack.common.lockutils [req-8cb351cc-a2fe-4bf6-8801-acaa0a6b424c None None] Got semaphore / lock \"__get_backend\" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py

2. - execute dnsmasq with --dhcp-script=/tmp/myscript.sh
#dnsmasq --strict-order --bind-interfaces --conf-file= --pid-file=/var/lib/nova/networks/nova-br100.pid --listen-address=192.168.12.1 --except-interface=lo --dhcp-range=set:novanetwork,192.168.12.2,static,255.255.252.0,120s --dhcp-lease-max=1024 --dhcp-hostsfile=/var/lib/nova/networks/nova-br100.conf --dhcp-script=/tmp/myscript.sh --leasefile-ro --domain=novalocal

Actual results:
Broken pipe if we strace dnsmasq

write(1, "2015-01-31 12:59:03.499 12398 DEBUG nova.openstack.common.lockutils [req-8cb351cc-a2fe-4bf6-8801-acaa0a6b424c None None] Got semaphore / lock \"__get_backend\" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:245\n", 236) = -1 EPIPE (Broken pipe)
12398 --- SIGPIPE (Broken pipe) @ 0 (0) ---

these log entries in /var/log/messages
Jan 30 20:36:38 dddm0996a dnsmasq[15337]: lease-init script returned exit code 1
Jan 30 20:36:38 dddm0996a dnsmasq[15337]: FAILED to start up

Expected results:
We could have a log entry like 

Jan 30 20:36:38 dddm0996a dnsmasq[15337]: error parsing lease-init output: <some piece of the information parsed>

This small improvement can help us to determine faster what is the real problem

Additional info:

Comment 3 Tomáš Hozza 🤓 2015-02-03 07:59:49 UTC
Reporter: In the future, please file bugs for the platform product. There is no such thing as separate dnsmasq component in RHOS. Every update to dnsmasq has to go through regular RHEL update process.

Comment 5 Tomáš Hozza 🤓 2015-02-03 08:06:11 UTC
Please attach the script /usr/bin/nova-dhcpbridge with which the issue is reproducibe (without setting up OpenStack).

Comment 11 Petr Menšík 2017-03-24 17:54:54 UTC
Created attachment 1266224 [details]
proposed fix to dhcp-script error logging

I have prepared a patch to log output of dhcp-script. It logs all actions but init, that would end only with SIGPIPE before.

Comment 12 Petr Menšík 2017-03-24 18:00:20 UTC
It does not handle possible errors from script with init action. It is started differently and should not prevent script from continuing. It reads stdout of script already, but does not touch stderr. It would require to change parsing of leases from pull approach to asynchronous push approach. It seems to me that part should be easy to debug in dnsmasq debug mode (-d). All other actions are handled well by an implemented pipe.

Comment 16 Petr Menšík 2017-04-25 13:55:04 UTC
Upstream merged both logging of unusual output from dhcp-script for all actions but init. For lease init action, it does handle only stdout of script. It can detect that there is something wrong and print few words what appeared.

It does one new thing however. If there is failure in dhcp-script leases init, it will die. It will not start with empty leases, as it did before. Please let me know if this behaviour if such behaviour is not appreciated.

Comment 17 Petr Menšík 2017-05-02 09:18:46 UTC
Is it ok, if now lease-init action will prevent dnsmasq from starting not only in case of exit status not equal to zero? If format produced by script is not recognized or is no longer supported, it would require manual action from administrator to be able to start again. It silently skipped reading any leases on the first format error before. Are you ok with that?

That will be true only for dhcp-script, leases from plain file will be silently rewritten. It is considered feature required on plain file. Required for skipping unsupported IPv6 leases if IPv6 support was disabled, but I doubt we will ever need it.

Comment 18 Petr Menšík 2017-05-02 09:23:25 UTC
Link to mailing list:

http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2017q2/011417.html

Comment 19 Petr Menšík 2017-05-10 13:15:49 UTC
Accepted upstream fixes (so far):
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=c77fb9d8f09d136fa71bde2469c4fd11cefa6f4a
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=3a8b0f6fccf464b1ec6d24c0e00e540ab2b17705
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=efff74c1aea14757ce074db28e02671c7f7bb5f5

We have agreed with Tomáš Hozza that lease syntax parsing should not be fatal. If script needs to prevent dnsmasq from starting, it has to return nonzero error status. That change is not yet accepted upstream

Comment 20 Petr Menšík 2017-05-10 15:05:21 UTC
Created attachment 1277639 [details]
talking-script for garbage handling tests

I used this script when testing input into dnsmasq. I used libvirt dnsmasq setup from systemctl show libvirtd, but replaced dhcp-script to point to this script. Dnsmasq has to be killed manually before starting it again. Then commented and uncommented some lines in talking-script to produce different output.

Comment 25 Petr Menšík 2017-05-11 11:30:09 UTC
Created attachment 1277832 [details]
updated fix to dhcp-script error logging

Comment 35 Petr Menšík 2017-12-04 11:25:34 UTC
Hello Ioanna,

the package name is dnsmasq. "Masq" comes from word masquerade. I admit the name is strange, but it is the name. While dnsmasq provides dns and dhcp services in a single process, *dhcp* cannot be running. I replaced it with dhcp-script, because it is the script running on every dhcp request. I keep dash, because that is name of the configuration option. Space can be used, but it changes *dhcp script*, not *dhcp*. I removed also format errors. It logs just any error the script will produce. Format error or different kind (resource limits failure, error in script itself) are logged. Otherwise I am ok with that.

Comment 39 errata-xmlrpc 2018-04-10 11:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0733