Bug 1803920

Summary: ovs-appctl fails "read-only" operations if it doesn't have write access to ovs-vswitchd pid file
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch
Sub component: daemons and tools
Reporter: Dan Winship <danw>
Assignee: Timothy Redaelli <tredaelli>
QA Contact: qding
CC: ctrautma, jhsiao, surya
Status: ASSIGNED
Severity: low
Priority: low
Version: FDP 20.A
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Dan Winship 2020-02-17 17:03:58 UTC
sh-4.2# ovs-appctl ofproto/trace br0 in_port=32,tcp6,ipv6_src=fd01:0:0:2::1f,ipv6_dst=fd01:0:0:2::1e
2020-02-17T17:01:37Z|00001|daemon_unix|WARN|/var/run/openvswitch/ovs-vswitchd.pid: open: Read-only file system
ovs-appctl: cannot read pidfile "/var/run/openvswitch/ovs-vswitchd.pid" (Read-only file system)


If you manually specify the ctl file with "-t /var/run/openvswitch/ovs-vswitchd.4034.ctl", it works fine.

The problem seems to be that ovs-appctl is using a helper function to read the pid file that is also used by code that needs read/write access to the pid file...

Comment 1 Timothy Redaelli 2020-03-10 17:59:15 UTC
Hi,

I have a patch that I'd like to send upstream, but I'd like to know when it's useful,
so I can put that in the commit message and know what to reply if they ask me about it.

I mean, the .ctl file and the .pid file are in the same directory and owned by the same user,
so if you have write access to the .ctl file you should also have write access to the .pid file.

Can you please explain in more detail?

Thank you

Comment 2 Dan Winship 2020-03-10 19:01:47 UTC
I don't have write access to the .ctl file; you don't need write access to a socket to be able to connect to it.

You can reproduce it yourself by doing something like:

    sudo podman run -it --privileged --mount type=bind,src=/var/run/openvswitch,dst=/var/run/openvswitch,ro=true fedora:31 /bin/bash

then inside the container:

    # dnf install -y openvswitch
    ...

    # ovs-appctl ofproto/trace br0 in_port=32,tcp6,ipv6_src=fd01:0:0:2::1f,ipv6_dst=fd01:0:0:2::1e
    ovs-appctl: cannot read pidfile "/var/run/openvswitch/ovs-vswitchd.pid" (Read-only file system)

    # ovs-appctl -t /var/run/openvswitch/ovs-vswitchd.$(cat /var/run/openvswitch/ovs-vswitchd.pid).ctl ofproto/trace br0 in_port=32,tcp6,ipv6_src=fd01:0:0:2::1f,ipv6_dst=fd01:0:0:2::1e
    br0: unknown bridge
    ovs-appctl: /var/run/openvswitch/ovs-vswitchd.1159105.ctl: server returned an error

(which is expected since I don't actually have a br0, but it shows that it was actually talking to vswitchd)

Comment 3 Dan Winship 2020-03-10 19:07:50 UTC
Oh, as for "when it's useful": in OpenShift, the pod running openshift-sdn needs to be able to make OVS-related calls, so it mounts the host's /var/run/openvswitch into its pod. But just as a general "minimum privilege" sort of thing, it mounts it read-only, since it only needs read access.

openshift-sdn never calls ovs-appctl, so it doesn't actually hit this bug itself. I was just trying to debug a problem on the node, and was doing it from the context of the openshift-sdn pod, and was surprised to discover that certain commands that I would have expected to work actually didn't work.

Comment 4 Dan Winship 2020-05-08 13:07:25 UTC
Related problem: in ovn-kubernetes, where we do use ovs-appctl/ovn-appctl, I discovered a bunch of functions like this:

	pid, err := ioutil.ReadFile(runner.ovnRunDir + "ovn-northd.pid")
	if err != nil {
		return "", "", fmt.Errorf("failed to run the command since failed to get ovn-northd's pid: %v", err)
	}

	cmdArgs = []string{
		"-t",
		runner.ovnRunDir + fmt.Sprintf("ovn-northd.%s.ctl", strings.TrimSpace(string(pid))),
	}
	cmdArgs = append(cmdArgs, args...)

I pointed out that this was silly; you should be able to just say "ovn-appctl -t ovn-northd", and ovn-appctl will read the pid file itself and find the correct control socket. But in fact, that only works if you're in the same PID namespace as the process you're trying to connect to. If you're in a different PID namespace, then read_pidfile() screws things up:

    $ ovn-appctl -t ovn-controller list-commands
    2020-05-06T20:42:07Z|00001|daemon_unix|WARN|/var/run/ovn/ovn-controller.pid: stale pidfile for pid 60
 being deleted by pid 0
    ovn-appctl: cannot read pidfile "/var/run/ovn/ovn-controller.pid" (No such process)

If it just read the pidfile and used the pid to find the right socket file, then everything would work. But because it tries to do irrelevant "clever" things, it screws everything up.
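
To illustrate the "just read the pidfile" behavior described above, here is a minimal shell sketch. The function name ctl_socket_for is hypothetical and not part of OVS or OVN. It only assumes the <rundir>/<target>.<pid>.ctl naming seen in the commands earlier in this bug:

```shell
# Hypothetical helper (not part of OVS/OVN): build the control socket path
# from a daemon's pidfile by reading the file only. It never locks or
# deletes the pidfile and never checks that the pid is alive, so it needs
# only read access and does not depend on the caller's PID namespace.
ctl_socket_for() {
    rundir=$1    # e.g. /var/run/openvswitch
    target=$2    # e.g. ovs-vswitchd

    # Read the pid and strip the trailing newline and any other whitespace.
    pid=$(tr -d '[:space:]' < "$rundir/$target.pid") || return 1

    # An empty pidfile gives us nothing to build a path from.
    [ -n "$pid" ] || return 1

    printf '%s/%s.%s.ctl\n' "$rundir" "$target" "$pid"
}
```

It would be used the same way as the manual workaround from comment 2, for example:

    # ovs-appctl -t "$(ctl_socket_for /var/run/openvswitch ovs-vswitchd)" list-commands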

Comment 5 Surya Seetharaman 2021-08-04 10:25:10 UTC
I saw the same errors:

sh-4.4# ovn-appctl -t ovnsb_db vlog/set dbg
2021-08-04T10:24:14Z|00001|daemon_unix|WARN|/var/run/ovn/ovnsb_db.pid: stale pidfile for pid 80
 being deleted by pid 0
ovn-appctl: cannot read pidfile "/var/run/ovn/ovnsb_db.pid" (No such process)

while it clearly exists.