Description of problem:

Currently must-gather performs the following tasks to pick up the OVN or vswitchd database:
- store the size of the .db file
- compact the .db
- store the size of the .db file
- copy the .db file

This has some problems:
- copying the .db file might pick partial data
- even after compacting, it's not guaranteed that there are no transactions (which could happen after compacting and before copying the file). This makes it difficult for parsers other than ovsdb-server, e.g. insights.
- compacting the db while ovsdb-server is running can cause db corruption

The way I see it, I think we would achieve the same result by using "ovsdb-client backup" or even "ovsdb-client dump".

Raising this BZ to understand if there is a strong reason why "compact" + "cp" was added in the first place that I might be missing before I send a PR to use "backup".
OK, I thought it was using ovsdb-tool to compact, but it's using ovsdb-appctl, so at least there is no risk of corruption.
Sending to ovn team who owns this bit.
There is no special reason to use "compact" + "cp", and we agree that "ovsdb-client backup" will solve some of the issues with copying. The only downside is that it will leave out ephemeral columns (see https://github.com/ovn-org/ovn/blob/master/ovn-nb.ovsschema#L467), and you may want to leave some comments on it.
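For reference, the "backup" approach discussed above could look roughly like the sketch below. It is not the change that was merged. The helper name backup_cmd and the socket path are assumptions made for illustration, while "ovsdb-client backup" and the database name OVN_Northbound are taken from this thread. The helper only prints the command line, so it can be inspected without a running ovsdb-server.

```shell
#!/usr/bin/env bash
# Sketch only: print the command that would take a snapshot of one OVN
# database with "ovsdb-client backup" instead of "compact" + "cp".
# ASSUMPTION: backup_cmd is a hypothetical helper, and the socket path
# passed to it below is a guess; check the real location inside the
# nbdb/sbdb container before using it.
backup_cmd() {
  local sock="$1" db_name="$2"
  printf 'ovsdb-client backup unix:%s %s\n' "$sock" "$db_name"
}

# Example: command line for the northbound database.
backup_cmd /var/run/ovn/ovnnb_db.sock OVN_Northbound
```

"ovsdb-client backup" writes the snapshot to stdout, so the printed command would be run inside the database container with its output redirected to a file under must-gather/network_logs.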
I created a PR for this bug, https://github.com/openshift/must-gather/pull/245. Please take a look.
Thanks Nadia, I will follow the discussion on your PR.
Verified on 4.9.0-0.ci-2021-07-16-112407

[must-gather-fhkq5] POD 2021-07-16T21:24:17.494562344Z + for OVNKUBE_MASTER_POD in ${OVNKUBE_MASTER_PODS[@]}
[must-gather-fhkq5] POD 2021-07-16T21:24:17.494673097Z + oc cp openshift-ovn-kubernetes/ovnkube-master-c78z2:/etc/ovn/ovnnb_db.db -c nbdb must-gather/network_logs/ovnkube-master-c78z2_nbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:17.495197064Z + oc -n openshift-ovn-kubernetes exec -c ovnkube-master ovnkube-master-c78z2 -- bash -c 'ovn-nbctl --db=ssl:10.0.223.231:9641,ssl:10.0.133.148:9641,ssl:10.0.163.124:9641 -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt list Logical_Switch_Port'
[must-gather-fhkq5] POD 2021-07-16T21:24:17.496050302Z + oc -n openshift-ovn-kubernetes exec -c ovnkube-master ovnkube-master-c78z2 -- bash -c 'ovn-nbctl --db=ssl:10.0.223.231:9641,ssl:10.0.133.148:9641,ssl:10.0.163.124:9641 -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt list Load_Balancer'
[must-gather-fhkq5] POD 2021-07-16T21:24:17.496763670Z + oc -n openshift-ovn-kubernetes exec -c ovnkube-master ovnkube-master-c78z2 -- bash -c 'ovn-sbctl --db=ssl:10.0.223.231:9642,ssl:10.0.133.148:9642,ssl:10.0.163.124:9642 -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt show'
[must-gather-fhkq5] POD 2021-07-16T21:24:18.053535936Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:18.071258828Z + oc cp openshift-ovn-kubernetes/ovnkube-master-c78z2:/etc/ovn/ovnsb_db.db -c sbdb must-gather/network_logs/ovnkube-master-c78z2_sbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:18.382628779Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:18.406525928Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:18.406583250Z + gzip must-gather/network_logs/ovnkube-master-c78z2_nbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:18.409488930Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-c78z2 -c sbdb -- bash -c 'ps -eo nlwp'
[must-gather-fhkq5] POD 2021-07-16T21:24:18.409981461Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:18.410068395Z + for OVNKUBE_MASTER_POD in ${OVNKUBE_MASTER_PODS[@]}
[must-gather-fhkq5] POD 2021-07-16T21:24:18.410143784Z + oc cp openshift-ovn-kubernetes/ovnkube-master-f5s48:/etc/ovn/ovnnb_db.db -c nbdb must-gather/network_logs/ovnkube-master-f5s48_nbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:18.411006709Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-c78z2 -c sbdb -- bash -c 'cat /proc/sys/kernel/threads-max'
[must-gather-fhkq5] POD 2021-07-16T21:24:18.411006709Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-c78z2 -c nbdb -- bash -c 'ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound'
[must-gather-fhkq5] POD 2021-07-16T21:24:18.428092560Z + gzip must-gather/network_logs/ovnkube-master-c78z2_sbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:18.428452496Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-c78z2 -c sbdb -- bash -c 'ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound'
[must-gather-fhkq5] POD 2021-07-16T21:24:18.688684104Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:18.701251236Z + oc cp openshift-ovn-kubernetes/ovnkube-master-f5s48:/etc/ovn/ovnsb_db.db -c sbdb must-gather/network_logs/ovnkube-master-f5s48_sbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:19.215677683Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:19.233398062Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:19.233484741Z + gzip must-gather/network_logs/ovnkube-master-f5s48_nbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:19.235252831Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:19.235252831Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-f5s48 -c nbdb -- bash -c 'ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound'
[must-gather-fhkq5] POD 2021-07-16T21:24:19.235252831Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:19.235252831Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-f5s48 -c sbdb -- bash -c 'ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound'
[must-gather-fhkq5] POD 2021-07-16T21:24:19.236019961Z + PIDS+=($!)
[must-gather-fhkq5] POD 2021-07-16T21:24:19.236019961Z + for OVNKUBE_MASTER_POD in ${OVNKUBE_MASTER_PODS[@]}
[must-gather-fhkq5] POD 2021-07-16T21:24:19.236019961Z + oc cp openshift-ovn-kubernetes/ovnkube-master-xcwts:/etc/ovn/ovnnb_db.db -c nbdb must-gather/network_logs/ovnkube-master-xcwts_nbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:19.236755881Z + gzip must-gather/network_logs/ovnkube-master-f5s48_sbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:19.237195971Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-f5s48 -c sbdb -- bash -c 'cat /proc/sys/kernel/threads-max'
[must-gather-fhkq5] POD 2021-07-16T21:24:19.241021634Z + oc exec -n openshift-ovn-kubernetes ovnkube-master-f5s48 -c sbdb -- bash -c 'ps -eo nlwp'
[must-gather-fhkq5] POD 2021-07-16T21:24:19.614633161Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:19.631991798Z + oc cp openshift-ovn-kubernetes/ovnkube-master-xcwts:/etc/ovn/ovnsb_db.db -c sbdb must-gather/network_logs/ovnkube-master-xcwts_sbdb
[must-gather-fhkq5] POD 2021-07-16T21:24:19.823091605Z tar: Removing leading `/' from member names
[must-gather-fhkq5] POD 2021-07-16T21:24:19.846578224Z + PIDS+=($!)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759