It is important to know that the separate placement is not smart: it doesn't attempt any balancing, and if it cannot find a completely empty physical card, it falls back to the compact placement. For example, with two physical cards, the first vGPU is placed on one of them, the second vGPU on the other one, and all additional vGPUs may be placed on either of them arbitrarily (assuming nothing is detached in the meantime). According to the provided nvidia-smi output, there is something running on each of the cards, which is consistent with the separate placement. The provided Vdsm log snippet doesn't contain the VM start; I'd expect to see "Separate mdev placement failed, trying compact placement." messages during VM startups once all the available cards have some vGPU assigned. Frank, can you confirm whether this is the case (and thus not a bug)?
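To illustrate the behavior described above, here is a minimal sketch of the placement fallback; the function and variable names are hypothetical and do not reflect Vdsm's actual internals:

```python
def place_vgpu(cards):
    """Pick a physical card for a new vGPU.

    `cards` maps a card id to the number of vGPUs already placed on it.
    Separate placement simply takes any completely empty card; it does
    no balancing among the non-empty ones. (Illustrative sketch only,
    not Vdsm's real implementation.)
    """
    for card, count in cards.items():
        if count == 0:
            # Separate placement succeeded: a free card was found.
            return card
    # No empty card left: fall back to compact placement, here modeled
    # as packing onto the most-used card.
    return max(cards, key=cards.get)


# With one empty card available, separate placement picks it:
print(place_vgpu({"gpu0": 1, "gpu1": 0}))  # gpu1
# With no empty card, any card may be chosen (compact fallback):
print(place_vgpu({"gpu0": 2, "gpu1": 1}))  # gpu0
```

This matches the scenario in the report: once every card hosts at least one vGPU, the separate strategy can no longer apply, and the fallback is expected rather than a bug.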
I cannot reproduce the problem, and QE confirms they have run it many times without experiencing any such problem. Frank, we need the Vdsm logs from starting the VMs. Even better, it would be great to run a VM, with Vdsm debug logging enabled, in a situation where a free physical card (i.e. one with nothing running on it) is available but is not used for the VM's vGPU. The Vdsm log would then very likely point us to the particular problem. (With the default logging level, we can only see whether Vdsm switches to the compact placement, but if it does, we won't know why.)
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords, to raise its PM_Score above the verification threshold (1000).
The bug is not reproducible and the customer case was closed, so I'm closing the bug. Please reopen if it still needs attention, providing the additional information requested in Comment 6.