Bug 1792197 - cockpit update using dnf takes network down
Summary: cockpit update using dnf takes network down
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: cockpit
Version: 31
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Martin Pitt
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-17 09:40 UTC by Milan Votava
Modified: 2020-01-22 05:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-19 14:50:01 UTC
Type: Bug
Embargoed:


Attachments
interrupted ssh session (278.16 KB, image/png)
2020-01-17 09:40 UTC, Milan Votava

Description Milan Votava 2020-01-17 09:40:19 UTC
Created attachment 1652988
interrupted ssh session

Description of problem:

If connected remotely to a Fedora 31 box through ssh and updating the system via 'dnf update', the network is taken down when the list of updated packages contains the 'cockpit-ws' package. The system has to be rebooted in order to get networking up again. This problem was not present in earlier Fedora versions (30 and older).


Version-Release number of selected component (if applicable): 

cockpit-ws-210-1.fc31.x86_64


How reproducible:
log in to Fedora 31 remotely and install the 'cockpit-ws' package update using 'dnf update'


Steps to Reproduce:
1. log in remotely (ssh)
2. when updated 'cockpit-ws' package is available, execute 'dnf update'
3. wait till the ssh connection is interrupted

Actual results:
networking is down, you can't connect to the box through ssh again


Expected results:
networking state should be preserved or restored

Additional info:
there are probably more packages "suffering" from the same problem

Comment 1 Martin Pitt 2020-01-17 11:37:07 UTC
Just about the only thing in cockpit's %post script that could do this is

  # firewalld only partially picks up changes to its services files without this
  test -f /usr/bin/firewall-cmd && firewall-cmd --reload --quiet || true

Do you get the same effect with just calling "firewall-cmd --reload"? If so, then you apparently made some runtime modifications to your firewall state that aren't reflected in its configuration.
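
For reference, here is a minimal sketch of how the `test ... && ... || true` chain in that scriptlet line behaves. `fake_reload` and `scriptlet_status` are hypothetical stand-ins made up for this illustration (they are not part of cockpit's packaging); `fake_reload` plays the role of a `firewall-cmd --reload --quiet` call that fails. It suggests that the line exits 0 whether the reload is skipped or fails, so a broken reload would not show up as a scriptlet error during 'dnf update'.

```shell
#!/bin/sh
# Hypothetical stand-in for a "firewall-cmd --reload --quiet" call that fails.
fake_reload() { return 1; }

# Same shape as the %post line: guard && action || true
scriptlet_status() {
    # $1: path whose existence gates the reload
    test -f "$1" && fake_reload || true
    echo $?
}

# Guard passes (assuming /bin/sh is a regular file), the reload fails,
# and "|| true" masks the failure.
status_present=$(scriptlet_status /bin/sh)
# Guard fails (no such file), the reload is skipped, "|| true" still yields 0.
status_absent=$(scriptlet_status /nonexistent/firewall-cmd)

echo "present: $status_present, absent: $status_absent"
```

Both values come out as 0, which is presumably the intent: a firewall hiccup should not fail the package transaction.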

Comment 2 Milan Votava 2020-01-17 12:00:08 UTC
I'm definitely NOT doing any runtime modifications to the firewall state. All firewall settings are persisted using firewall-cmd. Once I have physical access to the affected Fedora 31 server (later today) I will log in using the console and check the networking / firewall state.

Comment 3 Milan Votava 2020-01-17 15:50:08 UTC
(followup)

I have logged in on the affected server and I tried several commands:

# firewall-cmd --reload

firewalld[1547]: WARNING: ZONE_ALREADY_SET: 'enp6s0f0' already bound to 'external'


# firewall-cmd --complete-reload


I had to restart firewalld to "fix" the firewall state:

# systemctl restart firewalld.service

then the machine started to accept packets on the relevant ports again...


current firewall config:

# firewall-cmd --get-active-zones
external
  interfaces: enp6s0f0
trusted
  interfaces: enp6s0f1 tun0

# firewall-cmd --info-zone external
external (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp6s0f0
  sources:
  services: cockpit http openvpn plexmediaserver smtp ssh
  ports: 8080/tcp 6036/tcp
  protocols:
  masquerade: yes
  forward-ports: port=8080:proto=tcp:toport=80:toaddr=192.168.1.105
        port=6036:proto=tcp:toport=:toaddr=192.168.1.105
  source-ports:
  icmp-blocks:
  rich rules:

# cat /etc/firewalld/zones/external.xml
<?xml version="1.0" encoding="utf-8"?>
<zone>
  <short>External</short>
  <description>For use on external networks. You do not trust the other computers on networks to not harm your computer. Only selected incoming connections are accepted.</description>
  <interface name="enp6s0f0"/>
  <service name="ssh"/>
  <service name="smtp"/>
  <service name="cockpit"/>
  <service name="http"/>
  <service name="openvpn"/>
  <service name="plexmediaserver"/>
  <port port="8080" protocol="tcp"/>
  <port port="6036" protocol="tcp"/>
  <masquerade/>
  <forward-port port="8080" protocol="tcp" to-port="80" to-addr="192.168.1.105"/>
  <forward-port port="6036" protocol="tcp" to-addr="192.168.1.105"/>
</zone>


I can't see anything wrong with my firewalld config...
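
One way to double-check that the runtime state matches the permanent configuration, rather than comparing the two listings by eye, is to diff the runtime and `--permanent` views of a zone. The following is only a sketch: `zone_drift` and the `FIREWALL_CMD` variable are hypothetical names introduced here, the commands need root, and matching listings do not prove that the kernel rules were actually applied.

```shell
#!/bin/sh
# Sketch: show differences between the runtime and permanent view of a zone.
# FIREWALL_CMD is a hypothetical override so the helper can be pointed at a stub.
FIREWALL_CMD=${FIREWALL_CMD:-firewall-cmd}

# zone_drift ZONE: prints a unified diff; returns 0 if both views match,
# 1 if they differ.
zone_drift() {
    zone=$1
    runtime=$(mktemp)
    permanent=$(mktemp)
    # Skip the header line: the runtime view marks the zone as "(active)".
    "$FIREWALL_CMD" --zone="$zone" --list-all | tail -n +2 > "$runtime"
    "$FIREWALL_CMD" --permanent --zone="$zone" --list-all | tail -n +2 > "$permanent"
    diff -u "$permanent" "$runtime"
    rc=$?
    rm -f "$runtime" "$permanent"
    return $rc
}

# Only query a firewalld that is installed and running.
if command -v "$FIREWALL_CMD" >/dev/null 2>&1 &&
   "$FIREWALL_CMD" --state >/dev/null 2>&1; then
    zone_drift external && echo "runtime matches permanent"
fi
```

An empty diff would support the statement that no runtime-only changes are in play; any lines in the diff are settings that a reload would drop or add.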

Comment 4 Milan Votava 2020-01-17 16:38:09 UTC
more info:

if the firewall config is reloaded completely, then the firewall is reconfigured properly and networking is working fine:

# firewall-cmd --reload
Warning: ZONE_ALREADY_SET: 'enp6s0f0' already bound to 'external'
success

firewall state not configured properly
 
# firewall-cmd --complete-reload
Warning: ZONE_ALREADY_SET: 'enp6s0f0' already bound to 'external'
success

firewall state is ok


from firewall-cmd manpage:

       --reload
           Reload firewall rules and keep state information. Current permanent
           configuration will become new runtime configuration, i.e. all
           runtime only changes done until reload are lost with reload if they
           have not been also in permanent configuration.

           Note: Runtime changes applied via the direct interface are not
           affected and will therefore stay in place until firewalld daemon is
           restarted completely.

       --complete-reload
           Reload firewall completely, even netfilter kernel modules. This
           will most likely terminate active connections, because state
           information is lost. This option should only be used in case of
           severe firewall problems. For example if there are state
           information problems that no connection can be established with
           correct firewall rules.

           Note: Runtime changes applied via the direct interface are not
           affected and will therefore stay in place until firewalld daemon is
           restarted completely.

I'm no expert here so I can't tell if --reload or --complete-reload should be used in cockpit-ws's %post scriptlet
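
When trying either reload variant over ssh, a timed fallback can avoid a trip to the console. The sketch below arms a transient systemd timer that restarts firewalld (the step that restored connectivity above) unless it is cancelled. `DRY_RUN`, `UNIT`, `DELAY`, `run` and `guarded_reload` are hypothetical names for this example; it defaults to printing the commands only, and a real run needs root.

```shell
#!/bin/sh
# Sketch: reload firewalld over ssh with a timed fallback restart.
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands (default)
UNIT=${UNIT:-fw-safety-net}    # name of the transient timer and service
DELAY=${DELAY:-120}            # seconds until the fallback fires

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

guarded_reload() {
    # 1. Arm a transient timer that restarts firewalld after $DELAY seconds.
    run systemd-run --unit="$UNIT" --on-active="$DELAY" \
        systemctl restart firewalld.service || return 1
    # 2. Do the reload that might cut off the session.
    run firewall-cmd --reload
    # 3. If this line is still readable, the session survived: disarm the timer.
    echo "still connected? cancel with: systemctl stop $UNIT.timer"
}

plan=$(guarded_reload)
echo "$plan"
```

Set `DRY_RUN=0` to execute the commands. If the reload locks the session out, the timer fires and restarts firewalld; if the session survives, stopping the timer unit cancels the fallback.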

Comment 5 Martin Pitt 2020-01-19 13:25:15 UTC
I ssh'ed into a Fedora 31 VM, installed cockpit 210, downgraded (with dnf) to 209, and upgraded back to 210.

Can you try this in isolation?

  dnf install https://kojipkgs.fedoraproject.org//packages/cockpit/209/1.fc31/x86_64/cockpit-ws-209-1.fc31.x86_64.rpm
  dnf update cockpit-ws

Your previous update included a lot of other packages which might cause trouble. Does this reproduce the hang?

As it wasn't the firewall reloading, the only other thing that it does is

  systemctl try-restart cockpit.socket cockpit.service

I wouldn't know how that could kill the network, but let's cover all bases.

After it hangs, can you please ssh in again and grab a recent journal (journalctl --since '10 minutes ago'), and paste it here?

Comment 6 Milan Votava 2020-01-19 14:03:27 UTC
It looks like "my" trouble is with "firewall-cmd --reload" used in the %post script of the cockpit-ws package, not the package itself. If I run "firewall-cmd --reload" alone on my Fedora 31 box, I can't connect to it by ssh anymore; all ports (ssh, dhcp, ...) are disabled. The only remedy is to log in using the console and execute "firewall-cmd --complete-reload" or "systemctl restart firewalld.service"; then the firewall is restored to its permanent configuration. Maybe we should close the issue, as cockpit is not the primary cause of the problem...

Comment 7 Milan Votava 2020-01-19 14:50:01 UTC
I found the real problem by googling for "Warning: ZONE_ALREADY_SET: 'enp6s0f0' already bound to 'external'":

https://access.redhat.com/solutions/4586771

"firewall-cmd --reload" now works as it should (maybe the command shouldn't print "success" in the first place)

My problem is solved now, I'm closing the bug, thank you for your time Martin ;-)

