Cause: VNID allow rules were being removed before they were actually unused. It appears that container startup errors could cause the tracking of VNID usage to get out of sync.
Consequence: The rules that allowed communication within a namespace were removed too early, so any pods still in that namespace on the node could not communicate with one another.
Fix: The way VNID usage is tracked was changed to avoid the edge cases around pod creation and deletion failures.
Result: VNID tracking no longer gets out of sync, so traffic between pods in the same namespace flows as expected.
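
To illustrate the kind of failure described above, here is a minimal sketch of one way usage tracking can be made robust to failed pod setup. It is not the actual OpenShift SDN code: the type vnidTracker and the functions newVNIDTracker, addPod, and removePod are hypothetical names, and the approach (a set of pod IDs per VNID instead of a plain counter) is an assumption chosen for illustration. With a set, removing a pod that was never recorded, or recording the same pod twice, leaves the state unchanged, so the allow rule is only dropped when the last known pod goes away.

```go
package main

import "fmt"

// vnidTracker is a hypothetical illustration, not the real OpenShift SDN type.
// It records which pods use each VNID as a set, so repeated or unmatched
// add/remove calls cannot push the usage state out of sync.
type vnidTracker struct {
	pods map[uint32]map[string]struct{}
}

func newVNIDTracker() *vnidTracker {
	return &vnidTracker{pods: make(map[uint32]map[string]struct{})}
}

// addPod records podID as a user of vnid. It returns true only when the VNID
// was previously unused, which is when an allow rule would need installing.
func (t *vnidTracker) addPod(vnid uint32, podID string) bool {
	set, inUse := t.pods[vnid]
	if !inUse {
		set = make(map[string]struct{})
		t.pods[vnid] = set
	}
	set[podID] = struct{}{}
	return !inUse
}

// removePod forgets podID. It returns true only when that removal left the
// VNID with no pods, which is when the allow rule could safely be removed.
// Removing a pod that was never recorded is a no-op.
func (t *vnidTracker) removePod(vnid uint32, podID string) bool {
	set, inUse := t.pods[vnid]
	if !inUse {
		return false
	}
	if _, present := set[podID]; !present {
		return false
	}
	delete(set, podID)
	if len(set) > 0 {
		return false
	}
	delete(t.pods, vnid)
	return true
}

func main() {
	tr := newVNIDTracker()
	fmt.Println("install rule:", tr.addPod(42, "pod-a"))
	// A pod whose startup failed is torn down without ever being recorded.
	fmt.Println("remove rule:", tr.removePod(42, "pod-failed"))
	fmt.Println("remove rule:", tr.removePod(42, "pod-a"))
}
```

A plain integer counter would have been decremented by the teardown of the failed pod in this sequence, which is the sort of early rule removal the Consequence field describes.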