Bug 2189437

Summary: A wrong state file for the nmstate service caused loss of network connection
Product: Red Hat Enterprise Linux 9 Reporter: Mingyu Shi <mshi>
Component: nmstate Assignee: Gris Ge <fge>
Status: VERIFIED --- QA Contact: Mingyu Shi <mshi>
Severity: high Docs Contact:
Priority: medium    
Version: 9.3 CC: ferferna, jiji, jishi, network-qe, sfaye, till
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nmstate-2.2.11-1.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mingyu Shi 2023-04-25 08:50:52 UTC
Description of problem:
If a wrong state file is placed in /etc/nmstate and the system is rebooted, the default gateway interface is deactivated after the reboot and the network connection is lost.

Version-Release number of selected component (if applicable):
nmstate-2.2.9-1.el9.x86_64
nispor-1.2.10-1.el9.x86_64
NetworkManager-1.43.6-1.el9.x86_64
DISTRO=RHEL-9.3.0-20230423.30

How reproducible:
100%

Steps to Reproduce:
cat << EOF > /etc/nmstate/veth0.yml
dns-resolver: {}
route-rules: {}
routes:
  running:
  - destination: 2620:52:0:49c2::/64
    next-hop-interface: veth0
    next-hop-address: '::'
    metric: 100
    table-id: 254
  - destination: ::/0
    next-hop-interface: veth0
    next-hop-address: fe80::ee3e:f702:1d90:80a1
    metric: 100
    table-id: 254
  - destination: 0.0.0.0/0
    next-hop-interface: veth0
    next-hop-address: 10.73.195.254
    metric: 100
    table-id: 254
interfaces:
- name: veth0
  type: veth
  state: up
  veth:
    peer: veth0_p
  mac-address: 14:18:77:6D:51:08
  mtu: 8500
  min-mtu: 60
  max-mtu: 9000
  wait-ip: any
  ipv4:
    enabled: true
    dhcp: true
    address:
    - ip: 192.168.199.1
      prefix-length: 23
    auto-dns: true
    auto-gateway: true
    auto-routes: true
    auto-route-table-id: 0
  ipv6:
    enabled: true
    dhcp: true
    autoconf: true
    address:
    - ip: 2620:52:0:49c2:1618:77ff:fe6d:5107
      prefix-length: 64
    - ip: fe80::1618:77ff:fe6d:5107
      prefix-length: 64
    auto-dns: true
    auto-gateway: true
    auto-routes: true
    auto-route-table-id: 0
    addr-gen-mode: eui64
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  lldp:
    enabled: false
EOF

systemctl enable nmstate
reboot

# Minutes later, I could not log in to the server via ssh. After logging in via the console, I saw that only the `lo` interface was active in `nmcli con`.

Actual results:
The network connection is lost (in this example, the default gateway interface `eno1` is deactivated).

Expected results:
The failure of applying /etc/nmstate/veth0.yml shouldn't impact the rest of the network.

Additional info:
This is the correct state before rebooting:
# nmcli c
NAME      UUID                                  TYPE      DEVICE 
eno1      79a38ec3-db52-49cd-b172-85fc7f59cff9  ethernet  eno1   
lo        52b1f734-246f-4032-8fcf-8c017ee65949  loopback  lo     
eno2      5ca4cb28-18bd-4493-9390-906024a2a123  ethernet  --     
eno3      1b5661b6-b42e-4f56-b816-20e7f4349c5e  ethernet  --     
eno4      d1f9d3af-87f1-468d-af04-c9347e61d367  ethernet  --     
enp6s0f0  c486a668-2d7d-4c17-8ad3-342d82c84483  ethernet  --     
enp6s0f1  82b43fd1-01a5-4e11-9f37-55ccdfc6da6c  ethernet  --     
   
# nms show eno1
dns-resolver: {}
route-rules: {}
routes:
  running:
  - destination: 2620:52:0:49c2::/64
    next-hop-interface: eno1
    next-hop-address: '::'
    metric: 100
    table-id: 254
  - destination: ::/0
    next-hop-interface: eno1
    next-hop-address: fe80::ee3e:f702:1d90:80a1
    metric: 100
    table-id: 254
  - destination: 0.0.0.0/0
    next-hop-interface: eno1
    next-hop-address: 10.73.195.254
    metric: 100
    table-id: 254
interfaces:
- name: eno1
  type: ethernet
  state: up
  mac-address: 14:18:77:6D:51:07
  mtu: 1500
  min-mtu: 60
  max-mtu: 9000
  wait-ip: any
  ipv4:
    enabled: true
    dhcp: true
    address:
    - ip: 10.73.194.142
      prefix-length: 23
    auto-dns: true
    auto-gateway: true
    auto-routes: true
    auto-route-table-id: 0
  ipv6:
    enabled: true
    dhcp: true
    autoconf: true
    address:
    - ip: 2620:52:0:49c2:1618:77ff:fe6d:5107
      prefix-length: 64
    - ip: fe80::1618:77ff:fe6d:5107
      prefix-length: 64
    auto-dns: true
    auto-gateway: true
    auto-routes: true
    auto-route-table-id: 0
    addr-gen-mode: eui64
  mptcp:
    address-flags: []
  accept-all-mac-addresses: false
  lldp:
    enabled: false
  ethtool:
    pause:
      rx: true
      tx: true
      autoneg: true
    feature:
      tx-tcp-mangleid-segmentation: false
      tx-nocache-copy: false
      tx-tcp6-segmentation: true
      rx-gro-list: false
      rx-udp-gro-forwarding: false
      highdma: true
      tx-checksum-ipv4: true
      rx-gro: true
      tx-generic-segmentation: true
      tx-tcp-ecn-segmentation: true
      rx-checksum: true
      tx-tcp-segmentation: true
      tx-checksum-ipv6: true
    coalesce:
      rx-frames: 5
      rx-usecs: 20
      rx-usecs-irq: 0
      stats-block-usecs: 0
      tx-frames: 53
      tx-frames-irq: 5
      tx-usecs: 72
      tx-usecs-irq: 0
    ring:
      rx: 200
      rx-max: 2047
      tx: 511
      tx-max: 511
  ethernet:
    auto-negotiation: true
    speed: 1000
    duplex: full
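
The before/after comparison described above (only `lo` active after the reboot, while `eno1` was active before) can be checked mechanically. The following is a minimal sketch, not part of nmstate: it assumes the default tabular `nmcli c` layout shown earlier (NAME, UUID, TYPE, DEVICE) and connection names without spaces. The function names and the `AFTER` listing are hypothetical; the report only states that `lo` was the sole active connection after the reboot.

```python
def active_devices(nmcli_output):
    """Map connection NAME -> DEVICE for connections that are active.

    Assumes the default `nmcli c` table (NAME UUID TYPE DEVICE) and
    connection names that contain no spaces.
    """
    active = {}
    for line in nmcli_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything not 4 columns wide.
        if len(fields) != 4 or fields[0] == "NAME":
            continue
        name, _uuid, _type, device = fields
        if device != "--":  # nmcli prints "--" when no device is attached
            active[name] = device
    return active


def lost_connections(before, after):
    """Return names of connections active in `before` but not in `after`."""
    was = active_devices(before)
    now = active_devices(after)
    return sorted(name for name in was if name not in now)


# Excerpt of the listing captured before the reboot (from the report).
BEFORE = """\
NAME      UUID                                  TYPE      DEVICE
eno1      79a38ec3-db52-49cd-b172-85fc7f59cff9  ethernet  eno1
lo        52b1f734-246f-4032-8fcf-8c017ee65949  loopback  lo
eno2      5ca4cb28-18bd-4493-9390-906024a2a123  ethernet  --
"""

# Hypothetical listing after the reboot: only `lo` is active.
AFTER = """\
NAME      UUID                                  TYPE      DEVICE
lo        52b1f734-246f-4032-8fcf-8c017ee65949  loopback  lo
eno1      79a38ec3-db52-49cd-b172-85fc7f59cff9  ethernet  --
eno2      5ca4cb28-18bd-4493-9390-906024a2a123  ethernet  --
"""

print(lost_connections(BEFORE, AFTER))  # prints ['eno1']
```

A non-empty result after a failed apply would indicate that the rollback affected connections unrelated to the state file.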

Comment 1 Mingyu Shi 2023-04-25 08:52:09 UTC
journalctl -u nmstate.service:

Apr 24 16:58:54 localhost.localdomain systemd[1]: Starting Apply nmstate on-disk state...
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::query_apply::net_state] Created checkpoint /org/freedesktop/NetworkManager/Checkpoint/1
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z WARN  nmstate::ip] Static addresses [InterfaceIpAddr { ip: 192.168.199.1, prefix_length: 23, mptcp_flags: None }] are ignored when dynamic IP is enabled
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z WARN  nmstate::ip] Static addresses [InterfaceIpAddr { ip: 2620:52:0:49c2:1618:77ff:fe6d:5107, prefix_length: 64, mptcp_flags: None }, InterfaceIpAddr { ip: fe80::1618:77ff:fe6d:5107, prefix_length: 64, mptcp_flags: None }] are ignored when dynamic IP is enabled
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::settings::connection] Creating veth peer profile veth0_p for veth0
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Creating connection UUID Some("2790387d-3c0d-4249-b572-45dc0747a821"), ID Some("veth0"), type Some("veth") name Some("veth0")
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Creating connection UUID Some("1b538c0c-8bb0-4a08-bf0d-07d901ee6c27"), ID Some("veth0_p"), type Some("veth") name Some("veth0_p")
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Activating connection 2790387d-3c0d-4249-b572-45dc0747a821: veth0/veth
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Activating connection 1b538c0c-8bb0-4a08-bf0d-07d901ee6c27: veth0_p/veth
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownConnection): Connection 'veth0_p' is not available on device veth0_p because device is strictly unmanaged
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::nm::query_apply::profile] Will retry activation 2 seconds
Apr 24 16:58:58 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:58Z INFO  nmstate::nm::query_apply::profile] Activating connection 1b538c0c-8bb0-4a08-bf0d-07d901ee6c27: veth0_p/veth
Apr 24 16:58:59 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:59Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:00 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:00Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:01 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:01Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:02 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:02Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:03 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:03Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::settings::connection] Creating veth peer profile veth0_p for veth0
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Deactivating connection 2790387d-3c0d-4249-b572-45dc0747a821: veth0/veth
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Modifying connection UUID Some("2790387d-3c0d-4249-b572-45dc0747a821"), ID Some("veth0"), type Some("veth") name Some("veth0")
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Modifying connection UUID Some("1b538c0c-8bb0-4a08-bf0d-07d901ee6c27"), ID Some("veth0_p"), type Some("veth") name Some("veth0_p")
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Activating connection 2790387d-3c0d-4249-b572-45dc0747a821: veth0/veth
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Activating connection 1b538c0c-8bb0-4a08-bf0d-07d901ee6c27: veth0_p/veth
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Got activation failure Bug: Manager(UnknownConnection): Connection 'veth0_p' is not available on device veth0_p because device is strictly unmanaged
Apr 24 16:59:05 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:05Z INFO  nmstate::nm::query_apply::profile] Will retry activation 2 seconds
Apr 24 16:59:07 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:07Z INFO  nmstate::nm::query_apply::profile] Activating connection 1b538c0c-8bb0-4a08-bf0d-07d901ee6c27: veth0_p/veth
Apr 24 16:59:07 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:07Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:08 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:08Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:09 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:09Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:10 dell-per730-20.rhts.eng.pek2.redhat.com nmstatectl[1468]: [2023-04-24T08:59:10Z INFO  nmstate::query_apply::net_state] Retrying on: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:11 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:11Z INFO  nmstate::query_apply::net_state] Rollbacked to checkpoint /org/freedesktop/NetworkManager/Checkpoint/1
Apr 24 16:59:11 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:11Z ERROR nmstatectl::service] Failed to apply state file /etc/nmstate/veth0.yml: NmstateError: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
Apr 24 16:59:11 localhost.localdomain systemd[1]: Finished Apply nmstate on-disk state.
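
The log above is long, but the relevant facts are few: a checkpoint was rolled back and one state file failed verification. A small sketch for condensing such journal output is shown below. It assumes the bracketed `[timestamp LEVEL module] message` layout seen in these lines; the function name `summarize` and the regular expressions are illustrative, not part of nmstate.

```python
import re
from collections import Counter

# Matches the nmstate log prefix, e.g.
# [2023-04-24T08:59:11Z ERROR nmstatectl::service] Failed to apply ...
LOG_RE = re.compile(
    r"\[(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<module>\S+)\]\s+(?P<msg>.*)"
)
FAILED_RE = re.compile(
    r"Failed to apply state file (?P<path>\S+): (?P<reason>.*)"
)


def summarize(journal_text):
    """Count log levels, detect a rollback, and collect failed state files."""
    levels = Counter()
    failures = []
    rolled_back = False
    for line in journal_text.splitlines():
        match = LOG_RE.search(line)
        if match is None:
            continue  # systemd's own lines carry no nmstate prefix
        levels[match["level"]] += 1
        msg = match["msg"]
        if msg.startswith("Rollbacked to checkpoint"):
            rolled_back = True
        failed = FAILED_RE.match(msg)
        if failed is not None:
            failures.append((failed["path"], failed["reason"]))
    return {
        "levels": dict(levels),
        "rolled_back": rolled_back,
        "failures": failures,
    }


# Four lines copied from the journal output above.
SAMPLE = """\
Apr 24 16:58:54 localhost.localdomain systemd[1]: Starting Apply nmstate on-disk state...
Apr 24 16:58:56 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:58:56Z INFO  nmstate::query_apply::net_state] Created checkpoint /org/freedesktop/NetworkManager/Checkpoint/1
Apr 24 16:59:11 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:11Z INFO  nmstate::query_apply::net_state] Rollbacked to checkpoint /org/freedesktop/NetworkManager/Checkpoint/1
Apr 24 16:59:11 localhost.localdomain nmstatectl[1468]: [2023-04-24T08:59:11Z ERROR nmstatectl::service] Failed to apply state file /etc/nmstate/veth0.yml: NmstateError: VerificationError: Verification failure: veth0.interface.mptcp desire '{"address-flags":[]}', current 'null'
"""

summary = summarize(SAMPLE)
print(summary["rolled_back"], summary["failures"][0][0])
```

Note that systemd still reports the unit as finished even though the apply failed, so the ERROR line is the only place where the failure is visible.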

Comment 2 Gris Ge 2023-05-15 12:09:35 UTC
Hi Mingyu,

I tried in my VM using nmstate-2.2.10-3.el9, and it works well even with an incorrect YML file.

Can you try again?

Comment 3 Mingyu Shi 2023-05-23 06:20:04 UTC
Looks good now:
nmstate-2.2.10-3.el9.x86_64
nispor-1.2.10-1.el9.x86_64
NetworkManager-1.43.8-1.el9.x86_64

BTW, the incorrect state in #comment0 can be applied successfully now, so I used another incorrect state (a veth without a peer) instead:
interfaces:
- name: veth10
  type: veth
  state: up

The rollback did not cause any other issues.
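
For test setups that need a deliberately invalid state like the one above, the "veth without peer" condition can be detected before applying. This is a rough sketch under stated assumptions: the state is already loaded into a Python dict (for example from YAML), and `veths_missing_peer` is a hypothetical helper, not an nmstate API. Whether nmstate rejects such a state may depend on the version and on whether the interface already exists.

```python
def veths_missing_peer(state):
    """Return names of `type: veth`, `state: up` interfaces without veth.peer."""
    missing = []
    for iface in state.get("interfaces") or []:
        if iface.get("type") != "veth" or iface.get("state") != "up":
            continue
        peer = (iface.get("veth") or {}).get("peer")
        if not peer:
            missing.append(iface.get("name"))
    return missing


# The incorrect state used in this comment: no `veth.peer` is given.
BAD = {"interfaces": [{"name": "veth10", "type": "veth", "state": "up"}]}

# Trimmed version of the state from the original description.
GOOD = {
    "interfaces": [
        {
            "name": "veth0",
            "type": "veth",
            "state": "up",
            "veth": {"peer": "veth0_p"},
        }
    ]
}

print(veths_missing_peer(BAD))   # prints ['veth10']
print(veths_missing_peer(GOOD))  # prints []
```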

Comment 6 Mingyu Shi 2023-06-01 08:42:29 UTC
Verified with:
nmstate-2.2.11-1.el9.x86_64
nispor-1.2.10-1.el9.x86_64
NetworkManager-1.43.8-1.el9.x86_64