Description of problem:

In BZ1654058, we added handling for the case where the vCenter REST API throws a 400 error for more than 1000 VMs.

Today while looking for something else in the API doc, I noticed that both the limitation and the error code have changed.
- Now, the limitation is 4000 VMs instead of 1000 VMs.
- Now, the error code for "more than 4000 VMs" is either 507 or 500 (I'll post more details on the discrepancy below).
- Now, error 400 only means "com.vmware.vapi.std.errors.invalid_argument : if the VM.FilterSpec.power-states field contains a value that is not supported by the server."

The API documentation at https://developer.vmware.com/docs/vsphere-automation/latest/vcenter/rest/vcenter/vm/get/ (which I accessed via https://developer.vmware.com/) makes no mention of which API version or vSphere version introduced this change. It only says "latest" for the API version in the URL. I see no ChangeLog, no way to choose which version of the API to view docs for, and no display of what the current latest version is.

Via a search engine, I found an alternative API documentation tree, which shows through comparison that the change was introduced in vSphere 7.0.

Here's the more extensive doc tree with version choices:
- https://code.vmware.com/web/sdk/6.7/vsphere-automation-rest
- https://code.vmware.com/web/sdk/7.0/vsphere-automation-rest

The info about the old vSphere 6.7 interface for vcenter/vm/list can be found here:
- vSphere Automation API 6.7 (https://code.vmware.com/apis/366/vsphere-automation)
- vSphere Automation API Reference 6.7 U1 (https://vmware.github.io/vsphere-automation-sdk-rest/6.7.1/operations/com/vmware/vcenter/vm.list-operation.html)
- Note: Both of the above (6.7 and 6.7 U1) agree on the details of the vcenter/vm list operation.

There are two conflicting documents for the vSphere 7.0 API, and both are linked from the doc tree that I provided above.
- vSphere Automation API 7.0 (https://code.vmware.com/apis/991/vsphere-automation)
  - This redirects to the developer.vmware.com documentation that I linked near the top.
  - This says that the limit is now 4000 VMs, and that the GET call for the list operation returns error **507** if there are more than 4000 VMs.
- vSphere Automation API 7.0U1 (https://code.vmware.com/apis/1119/vsphere-automation)
  - This says that the limit is now 4000 VMs, and that the GET call for the list operation returns error **500** if there are more than 4000 VMs.

The fence_vmware_rest fence agent uses the following exception handling logic:
~~~
    try:
        command = "vcenter/vm"
        if "--filter" in options:
            command = command + "?" + options["--filter"]
        res = send_command(conn, command)
    except Exception as e:
        logging.debug("Failed: {}".format(e))
        if str(e).startswith("400"):
            if options.get("--original-action") == "monitor":
                return outlets
            else:
                logging.error("More than 1000 VMs returned. Use --filter parameter to limit which VMs to list.")
                fail(EC_STATUS)
        else:
            fail(EC_STATUS)
~~~

So there are at least two changes we need to make.
- We need to look for 507 or 500 as the "too many VMs" error code to avoid failing with a generic EC_STATUS.
- We need to modify the error message to say "More than 4000 VMs".

A related problem that you can see in the vSphere 6.7 API doc that I linked above: Error code 400 can mean two different things.

    400 invalid_argument if the vcenter.VM.filter_spec.power_states field contains a value that is not supported by the server.
    400 unable_to_allocate_resource if more than 1000 virtual machines match the vcenter.VM.filter_spec.

So on vSphere 6.7 we can't tell which of the two is meant just by checking the error code. On vSphere 7.0, code 507 or 500 is used for "too many VMs", so the two cases have distinct error codes.

-----

Version-Release number of selected component (if applicable):

fence-agents-vmware-rest-4.2.1-53.el8_3.1

-----

How reproducible:

Presumably always

-----

Steps to Reproduce:
1. Set up more than 4000 VMs on a vCenter running version 7.0 or greater.
2. Run `fence_vmware_rest <options> -o list`.

-----

Actual results:

Based on the current agent code, the agent will fail with only "Unable to connect/login to fencing device".

-----

Expected results:

The agent fails with "More than 4000 VMs returned. Use --filter parameter to limit which VMs to list" in addition to the "Unable to connect/login to fencing device" error.

-----

Additional info:

We should coordinate with VMware here to verify which of the 500 vs. 507 error codes is correct, or when the change from one to the other was introduced.

There is no functional impact, as this only affects logging. However, if the agent interprets error codes incorrectly because their meanings have changed, it can make troubleshooting more difficult.
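The two changes listed in the description could be sketched roughly as follows. This is an illustration only, not the agent's actual code: the helper names (`too_many_vms_limit`, `too_many_vms_message`) and the lookup table are hypothetical, and it assumes, as the existing handler does, that the exception text starts with the HTTP status code. Note that 400 on vSphere 6.7 and 500 in general can also have other causes, so the resulting message is a best guess rather than a certain diagnosis.

```python
import re

# Hypothetical table for illustration. Codes and limits come from the API
# docs cited in this bug, not from testing against a live vCenter.
TOO_MANY_VMS_CODES = {
    "400": 1000,  # vSphere 6.7 (400 is also used for invalid_argument)
    "500": 4000,  # vSphere 7.0, per the 7.0U1 doc
    "507": 4000,  # vSphere 7.0, per the 7.0 doc
}


def too_many_vms_limit(error_text):
    """Return the documented VM limit if error_text starts with a status
    code that may mean "too many VMs", otherwise None."""
    match = re.match(r"\s*(\d{3})\b", str(error_text))
    if match is None:
        return None
    return TOO_MANY_VMS_CODES.get(match.group(1))


def too_many_vms_message(error_text):
    """Build the log message for a possible "too many VMs" error, or
    return None if the status code is not one of the known candidates."""
    limit = too_many_vms_limit(error_text)
    if limit is None:
        return None
    return ("More than {} VMs returned. "
            "Use --filter parameter to limit which VMs to list.".format(limit))
```

In the handler quoted above, the `str(e).startswith("400")` check could then become a call to `too_many_vms_message(e)`, logging the returned message when it is not `None`.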
It's been reported that the error code 507 for too many VMs is not an issue in current testing. I looked into that, and here's what I found.

In comment 0 I said:
> - vSphere Automation API 7.0 (https://code.vmware.com/apis/991/vsphere-automation)
>   - This redirects to the developer.vmware.com documentation that I linked near the top.
>   - This says that the limit is now 4000 VMs, and that the GET call for the list operation returns error **507** if there are more than 4000 VMs.

If you go to that URL now and follow it to the vcenter/vm GET endpoint page, the page now says error code **500** for too many VMs.

The other (direct) link I provided, https://developer.vmware.com/docs/vsphere-automation/latest/vcenter/rest/vcenter/vm/get/, also says error code **500** for too many VMs.

So the documentation has changed between the day I filed this bug and today. It looks as if VMware has reverted the error code from **507** to **500** since then.

-----

I want to point something else out: fence_vmware_rest connects to "{api_host}/rest". This is deprecated as of v7.0 U2. The new REST APIs are served under "{api_host}/api".

VMware says: "There is no immediate impact out of this change as the old REST APIs will continue to work. We are not removing the old REST APIs or their support. We intend to remove it only after 2 major vSphere releases, also subject to customer feedback."
- https://core.vmware.com/blog/vsphere-7-update-2-rest-api-modernization

So that's good news for us. However, it is something we need to be aware of. At some point we'll want to update fence_vmware_rest to make "/api" the default api_path; users on legacy vSphere deployments can still override this by setting api_path="/rest".

**Eventually**, using "/rest" may break. We could either change our default in RHEL 9 (or 10, since we've probably got a while); or just wait until we're working on officially supporting vSphere 9. Either way, that's a separate BZ.
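To illustrate the api_path idea, here is a minimal sketch of how a base URL with "/api" as the default and "/rest" as an override might be assembled. The function `build_base_url` and its parameters are hypothetical and are not taken from the agent; the sketch only covers the URL prefix and does not address any differences between endpoints under the two paths.

```python
def build_base_url(host, api_path="/api", ssl=True, port=None):
    """Hypothetical helper: join scheme, host, optional port and api_path.

    "/api" is the assumed new default; legacy deployments would pass
    api_path="/rest" explicitly.
    """
    scheme = "https" if ssl else "http"
    netloc = host if port is None else "{}:{}".format(host, port)
    # Normalise the path so "api", "/api" and "/api/" all behave the same.
    path = "/" + api_path.strip("/")
    return "{}://{}{}".format(scheme, netloc, path)
```

For example, `build_base_url("vcenter.example.com", api_path="/rest")` would keep a legacy deployment on the old prefix while everyone else picks up the new default.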
Nice to know. Do you know how far back support for /api is available? If it's only 7.0+
(In reply to Oyvind Albrigtsen from comment #8)
> Do you know how far back support for /api is available? If it's only 7.0+

All I can do is read the VMware announcement that's linked above:

    All REST APIs from 6.0 to 6.7 were served under /rest and referred to as old REST APIs. Starting from vSphere 7, REST APIs are served under /api and referred to as new REST APIs. With the release of vSphere 7 Update 2, VMware announces the deprecation of old REST APIs.

"/api" didn't appear in the docs I was looking through until 7.0 Update 2.
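If we ever wanted to pick the prefix from a known vSphere version, the rule in the announcement could be sketched like this. The function `default_api_path` is hypothetical, and the threshold of major version 7 is an assumption based only on the announcement text; given that "/api" did not show up in the docs before 7.0 Update 2, the real cutoff may be later and would need to be verified.

```python
def default_api_path(vsphere_version):
    """Hypothetical: return "/api" for vSphere 7 and later, else "/rest".

    The cutoff at major version 7 follows the VMware announcement and
    has not been verified against real deployments.
    """
    major = int(str(vsphere_version).split(".")[0])
    return "/api" if major >= 7 else "/rest"
```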
Closing since 500/507 issue has been fixed in the API.