Bug 1470975 - Neutron dhcp agent rootwrap config breaks in corner cases
Keywords:
Status: CLOSED EOL
Alias: None
Product: RDO
Classification: Community
Component: openstack-neutron
Version: unspecified
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: trunk
Assignee: Assaf Muller
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-14 07:34 UTC by kalle.happonen
Modified: 2020-08-18 19:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-18 19:24:29 UTC
Embargoed:



Description kalle.happonen 2017-07-14 07:34:56 UTC
Description of problem:

Ok, this is going to be a bit long-winded, but it's a corner case. I did not submit this upstream, since the bug is basically fixed in trunk with the move from neutron-ns-metadata-proxy to haproxy. However, it affects current versions, including Newton, Ocata and possibly Pike.

Rootwrap filters are used so that the OpenStack components may issue specific sudo commands. Basically they call "sudo rootwrapper command", and the "rootwrapper" program checks that the command is ok before running it.

Each program has so-called "KillFilters", i.e. a list of which programs it is allowed to send which signals to. In the neutron-metadata case, there are KillFilters for "python" and "python2.7" in /usr/share/neutron/rootwrap/dhcp.filters. The rootwrap filter checks whether the process can be killed by checking where /proc/<pid>/exe links to. Normally it looks something like this for the metadata proxy:
lrwxrwxrwx 1 root root 0 Jul 13 16:15 /proc/16403/exe -> /usr/bin/python2.7
This is ok, since it's "python2.7" which neutron is allowed to kill.

However, if you update the python binary, the RPM first renames the old binary, installs the new one, then removes the old one. I assume that this is to recover from problems mid-process. Afterwards, though, the link looks like this:
lrwxrwxrwx 1 root root 0 Feb 14 23:42 /proc/32206/exe -> /usr/bin/python2.7;591bebf9 (deleted)
Since the binary isn't "python2.7" any more, it misses the first check.

Rootwrapper got a patch to handle this:
https://bugs.launchpad.net/oslo.rootwrap/+bug/1482316

First, it removes "(deleted)" from the end of the link target. That alone would be enough if the file were simply deleted, but since the RPM renames the file before deleting it, we end up with "/usr/bin/python2.7;591bebf9", not "/usr/bin/python2.7". That's ok, since rootwrapper handles this situation too: if the result doesn't resolve to an existing binary, rootwrapper falls back to checking /proc/<pid>/cmdline, which tells how the program was invoked.

Now here's the problem. RDO calls the programs specifically with "python2" instead of "python". This means that the /proc/<pid>/cmdline file starts with "/usr/bin/python2" which is *not* on the allowed kill list.
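
To make the chain of checks above concrete, here is a minimal sketch of the lookup as described in this report. It is a simplified illustration, not the actual oslo.rootwrap code: the function names, the `existing_files` stand-in for a real file-existence check, and the metadata proxy path in the example cmdline are assumptions made for the example.

```python
import os

DELETED_SUFFIX = " (deleted)"


def resolve_program(exe_target, cmdline, existing_files):
    """Pick the path that gets compared against the KillFilter entries.

    exe_target:     text of the /proc/<pid>/exe link
    cmdline:        raw text of /proc/<pid>/cmdline (NUL-separated)
    existing_files: paths that currently exist on disk (a stand-in for
                    a real file check, so the sketch runs anywhere)
    """
    command = exe_target
    if command.endswith(DELETED_SUFFIX):
        command = command[:-len(DELETED_SUFFIX)]
    if command in existing_files:
        return command
    # The exe link no longer names a real file: fall back to how the
    # process was invoked, i.e. the first field of cmdline.
    invoked = cmdline.partition("\0")[0]
    return invoked if invoked in existing_files else command


def kill_allowed(program, allowed_names):
    """True if the basename of the program is on the kill list."""
    return os.path.basename(program) in allowed_names


# Hypothetical host state after a python upgrade.
EXISTING = {"/usr/bin/python", "/usr/bin/python2", "/usr/bin/python2.7"}
CMDLINE = "/usr/bin/python2\0/usr/bin/neutron-ns-metadata-proxy\0"
STOCK_FILTERS = {"python", "python2.7"}

before = resolve_program("/usr/bin/python2.7", CMDLINE, EXISTING)
after = resolve_program(
    "/usr/bin/python2.7;591bebf9 (deleted)", CMDLINE, EXISTING)

print(before, kill_allowed(before, STOCK_FILTERS))
# /usr/bin/python2.7 True
print(after, kill_allowed(after, STOCK_FILTERS))
# /usr/bin/python2 False
print(after, kill_allowed(after, STOCK_FILTERS | {"python2"}))
# /usr/bin/python2 True
```

The last line shows why adding "python2" to the kill list is enough: the fallback path is the only one that ever yields "python2".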

The neutron-dhcp-agent KillFilter is the only affected filter that I could find, and it will be fixed in the newest versions of OpenStack, but currently it's a bug.

Version-Release number of selected component (if applicable):

Tested in Newton

How reproducible:
Difficult

Steps to Reproduce:
1. Make sure the DHCP agent can launch metadata servers with
enable_isolated_metadata = True
in the config file. (should be applicable to l3-routers too, but this has been tested)
2. Create a new network, verify that it gets its DHCP agents and a running metadata service
3. The hard part. Upgrade the python package
4. For the pid of the metadata service, verify that its exe link looks something like this
lrwxrwxrwx 1 root root 0 Feb 14 23:42 /proc/32206/exe -> /usr/bin/python2.7;591bebf9 (deleted)
5. Delete the network
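
For step 4, checking every process by hand is tedious. The following is a small helper sketch (the function name and the `proc_root` parameter are made up for this example) that lists all processes whose exe link is marked as deleted:

```python
import os


def find_deleted_exe(proc_root="/proc"):
    """Return {pid: link_text} for processes whose exe link is marked deleted."""
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            # Process exited, kernel thread, or insufficient permissions.
            continue
        if target.endswith(" (deleted)"):
            found[int(entry)] = target
    return found
```

Called as root on the DHCP agent host after the python upgrade, `find_deleted_exe()` should include the pid of the metadata service from step 4. Without root, links of other users' processes cannot be read and are silently skipped.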

Actual results:
In dhcp-agent.log you get tons of messages like the following, and the metadata agent is not killed.
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent raise convert_to_error(kind, result)
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent RemoteError:
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent ---------------------------------------------------------------------------
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent Unserializable message: ('#ERROR', ValueError('I/O operation on closed file',))
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent ---------------------------------------------------------------------------
2017-07-13 08:35:50.478 25972 ERROR neutron.agent.dhcp.agent 

Expected results:
No errors in the logs, the metadata agent is killed.

Additional info:
The suggested fix is simply to add python2 to the allowed KillFilters for RDO.

Comment 1 Christopher Brown 2017-08-25 16:10:59 UTC
Hello,

Thanks for the excellent bug report!

I'm not sure what to suggest here. I'd be inclined to manually carry a rootwrap patch for a couple of cycles, as the chances of getting a backport merged upstream for something that is possibly RDO-specific(?) and is going away anyway are probably slim.

If you have any suggestions, I'm more than happy to help get trivial patches merged, either upstream or in RDO.

Comment 2 kalle.happonen 2017-08-29 07:42:35 UTC
We're committed to carrying our rootwrap patch for now. I mainly wanted to report this since it probably affects other people too.

This has a very trivial patch. I don't see any real security impact of the patch either, so I think it could be implemented in RDO packaging.

Our fix for this is to add a file (the directory needs to be created too)

/etc/neutron/rootwrap.d/dhcp-python2.filter

The /etc/neutron/rootwrap.d/ directory should automatically be included in the rootwrap filters by

/etc/neutron/rootwrap.conf

if we're not carrying some patch I have forgotten.

We have the following contents in
/etc/neutron/rootwrap.d/dhcp-python2.filter

"""
[Filters]
# metadata proxy
# RHEL invocation of the metadata proxy will report /usr/bin/python2
kill_metadata2: KillFilter, root, python2, -9
"""
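
As a quick sanity check that the file above is well-formed, it can be read with Python's standard configparser. This is only a sketch that assumes the INI-style layout shown above with comma-separated fields (class name, run-as user, program, signals); it is not the loader rootwrap itself uses, and `kill_filters` is a name made up for this example.

```python
import configparser

FILTER_TEXT = """\
[Filters]
# metadata proxy
# RHEL invocation of the metadata proxy will report /usr/bin/python2
kill_metadata2: KillFilter, root, python2, -9
"""


def kill_filters(text):
    """Return {filter_name: (program, [signals])} for KillFilter entries."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    result = {}
    for name, value in parser.items("Filters"):
        fields = [field.strip() for field in value.split(",")]
        if fields[0] == "KillFilter":
            # fields: class name, run-as user, program, allowed signals...
            result[name] = (fields[2], fields[3:])
    return result


print(kill_filters(FILTER_TEXT))
# {'kill_metadata2': ('python2', ['-9'])}
```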

Another option is to directly update
/usr/share/neutron/rootwrap/dhcp.filters
and add the kill_metadata2 filter there in addition to kill_metadata and kill_metadata7. This might be a better option packaging-wise.


I hope this suggestion helps. I don't think this would go over well with OpenStack upstream, since it's fixed there. I've never made a patch for RDO, but if you point me to the docs I can certainly try.

Kind regards,
Kalle Happonen

