Created attachment 1877972
openshift_install.log for the cluster name ipitest

Version:
$ openshift-install version
./openshift-install unreleased-master-5680-gb9faa56e1a63d5aac107d4f059d30cc25702be93
built from commit b9faa56e1a63d5aac107d4f059d30cc25702be93
release image registry.ci.openshift.org/origin/release:4.10
release architecture amd64

Platform:
ibmcloud

* IPI

What happened?
IPI cluster creation on platform ibmcloud fails, with a different error each time. I have therefore attached logs from multiple attempts.

What did you expect to happen?
Successful creation of an IPI cluster.

How to reproduce it (as minimally and precisely as possible)?
Follow this doc for IPI cluster creation: https://deploy-preview-39767--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_ibm_cloud_public/installing-ibm-cloud-customizations.html#installing-ibm-cloud-customizations
Created attachment 1877991
Cluster creation failed - attached logs

Adding one more log. This was observed during another cluster creation attempt.
Adding one more console log (the cluster has been cleaned up, so the .openshift_install.log has been lost):

```
INFO Obtaining RHCOS image file from 'https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases/rhcos-4.10/410.84.202201251210-0/x86_64/rhcos-410.84.202201251210-0-ibmcloud.x86_64.qcow2.gz?sha256=8fc2f8c99b6fc4766907f0e793bdf6ce7d0e0160f5a8296e4b0c3bb05bb57f1d'
INFO Creating infrastructure resources...
ERROR
ERROR Error: Invalid function argument
ERROR
ERROR   on ../../../../var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112/main.tf line 34, in resource "ibm_is_instance" "bootstrap_node":
ERROR   34:   user_data = templatefile("${path.module}/templates/bootstrap.ign", {
ERROR   35:     HOSTNAME = ibm_cos_bucket.bootstrap_ignition.s3_endpoint_public
ERROR   36:     BUCKET_NAME = ibm_cos_bucket.bootstrap_ignition.bucket_name
ERROR   37:     OBJECT_NAME = ibm_cos_bucket_object.bootstrap_ignition.key
ERROR   38:     IAM_TOKEN = data.ibm_iam_auth_token.iam_token.iam_access_token
ERROR   39:   })
ERROR     |----------------
ERROR     | path.module is "../../../../var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112"
ERROR
ERROR Invalid value for "path" parameter: no file exists at
ERROR ../../../../var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112/templates/bootstrap.ign;
ERROR this function works only with files that are distributed as part of the
ERROR configuration source code, so if this file will be created by a resource in
ERROR this configuration you must instead obtain this result from an attribute of
ERROR that resource.
ERROR
ERROR Failed to read tfstate: open /var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112/terraform.bootstrap.tfstate: no such file or directory
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change
ambikanair@Ambika-Nairs-MacBook-Pro openshift-install-mac-4.10.0-rc.8 %
```
I am making this BZ public so people in IBM can access it. Logs from CI are public anyway.
It looks like in the first two attachments (attachment 1877972 and attachment 1877991), the error was due to the user or system cancelling the operation via an interrupt:

time="2022-04-19T16:35:19+05:30" level=debug msg="module.vpc.ibm_is_lb_pool.kubernetes_api_public[0]: Still creating... [30s elapsed]"
time="2022-04-19T16:35:21+05:30" level=debug msg="module.vpc.ibm_is_lb_pool.machine_config: Still creating... [1m0s elapsed]"
time="2022-04-19T16:35:21+05:30" level=debug msg="module.vpc.ibm_is_lb_pool.kubernetes_api_private: Still creating... [1m0s elapsed]"
time="2022-04-19T16:35:27+05:30" level=debug msg="Interrupt received."
time="2022-04-19T16:35:27+05:30" level=debug msg="Please wait for Terraform to exit or data loss may occur."
time="2022-04-19T16:35:27+05:30" level=debug msg="Gracefully shutting down..."
time="2022-04-19T16:35:27+05:30" level=debug msg="Stopping operation..."
time="2022-04-19T16:35:27+05:30" level=error msg="Two interrupts received. Exiting immediately. Note that data"
time="2022-04-19T16:35:27+05:30" level=error msg="loss may have occurred."
time="2022-04-19T16:35:27+05:30" level=error
time="2022-04-19T16:35:27+05:30" level=error msg="Error: operation canceled"

I would make sure the user or system performing the IPI deployment is not interrupted (the machine going to sleep, powering down, etc.).
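To tell an interrupted run apart from a genuine provisioning failure, one option is to search the install log for the interrupt messages quoted above. The following is a minimal shell sketch, assuming the log is the `.openshift_install.log` written to the install directory; the helper name `was_interrupted` and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given openshift-install log
# contains the Terraform interrupt messages, i.e. the run was cancelled
# rather than failing on its own.
was_interrupted() {
    log_file="$1"
    # -q: report only via exit status; -F: fixed strings, not regexes;
    # each -e pattern is matched independently.
    grep -qF -e 'Interrupt received.' -e 'Error: operation canceled' "$log_file"
}

# Example usage (hypothetical path):
# if was_interrupted ./ipitest/.openshift_install.log; then
#     echo "install was interrupted; rerun it without sleep or power-down"
# fi
```

On macOS, running the installer under `caffeinate -i` (for example `caffeinate -i openshift-install create cluster --dir <install_dir>`) keeps the machine from idle-sleeping while the command runs; it does not protect against a manual shutdown or a double Ctrl+C, which is what "Two interrupts received" indicates.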
As for the other failure:

ERROR Invalid value for "path" parameter: no file exists at
ERROR ../../../../var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112/templates/bootstrap.ign;
ERROR this function works only with files that are distributed as part of the
ERROR configuration source code, so if this file will be created by a resource in
ERROR this configuration you must instead obtain this result from an attribute of
ERROR that resource.
ERROR
ERROR Failed to read tfstate: open /var/folders/65/fc3xk3hj4b9c7754yqtb5qkw0000gn/T/openshift-install-bootstrap-3569125112/terraform.bootstrap.tfstate: no such file or directory
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

This looks like a potential path-handling issue in Terraform, or perhaps one specific to macOS, since I do not see it on a fresh Ubuntu container. However, I'm not familiar enough with Terraform to determine the root cause or the expected behavior. I would expect the path to be <cluster-dir>/terraform.bootstrap.tfstate. Hopefully someone with more Terraform knowledge can offer better insight.
(In reply to Guna K Kambalimath from comment #6)
> Update:
>
> Tried to create cluster inside a linux env (inside docker container). I could see some progress. But, I still see the following error.
>
> OS:
> root@9e6e50387922:/cco_credentials_mount# cat /etc/os-release
> NAME="Ubuntu"
> VERSION="18.04.6 LTS (Bionic Beaver)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 18.04.6 LTS"
> VERSION_ID="18.04"
> HOME_URL="https://www.ubuntu.com/"
> SUPPORT_URL="https://help.ubuntu.com/"
> BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> VERSION_CODENAME=bionic
> UBUNTU_CODENAME=bionic
>
> Logs:
> INFO Cluster operator cloud-controller-manager CloudControllerOwner is True with AsExpected: Cluster Cloud Controller Manager Operator owns cloud controllers at 4.10.0-rc.8
> INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted:
> INFO Cluster operator image-registry Available is False with DeploymentNotFound: Available: The deployment does not exist
> INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas
> INFO ImagePrunerAvailable: Pruner CronJob has been created
> INFO Cluster operator image-registry Progressing is True with Error: Progressing: Unable to apply resources: unable to apply objects: failed to create object *v1.Secret, Namespace=openshift-image-registry, Name=image-registry-private-configuration: specified resource key credentials does not contain HMAC keys
> ERROR Cluster operator image-registry Degraded is True with Unavailable: Degraded: The deployment does not exist
> INFO Cluster operator insights Disabled is False with AsExpected:
> INFO Cluster operator insights SCANotAvailable is True with NotFound: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 404: {"code":"ACCT-MGMT-7","href":"/api/accounts_mgmt/v1/errors/7","id":"7","kind":"Error","operation_id":"290a4842-1b16-4fc6-98d2-d7e3a13bc93c","reason":"The organization (id= 25yGxao0DHRsqcIax7rSwDLAmjQ) does not have any certificate of type sca. Enable SCA at https://access.redhat.com/management."}
> INFO Cluster operator network ManagementStateDegraded is False with :
> ERROR Cluster initialization failed because one or more operators are not functioning properly.
> ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
> ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
> ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
> FATAL failed to initialize the cluster: Cluster operator image-registry is not available
>
> Installer used: - openshift-install-linux-4.10.0-rc.8
It appears you are affected by the bug from last month, and likely by several related bugs as well.
https://bugzilla.redhat.com/show_bug.cgi?id=2082492

> INFO Cluster operator image-registry Available is False with DeploymentNotFound: Available: The deployment does not exist
> INFO NodeCADaemonAvailable: The daemon set node-ca has available replicas
> INFO ImagePrunerAvailable: Pruner CronJob has been created
> INFO Cluster operator image-registry Progressing is True with Error: Progressing: Unable to apply resources: unable to apply objects: failed to create object *v1.Secret, Namespace=openshift-image-registry, Name=image-registry-private-configuration: specified resource key credentials does not contain HMAC keys

I recommend using OCP 4.10.15 or later, or a very recent 4.11 CI/nightly build, which include the fixes for these known bugs.

Please also keep in mind that the fix for the `image-registry` bug requires you to extract the CredentialsRequests again, because the fix changes the IRO CredentialsRequest.
https://coreos.slack.com/archives/C01U40AM37F/p1652196272952479?thread_ts=1652074667.322439&cid=C01U40AM37F

The important part is Step 4, where the CredentialsRequests are extracted, and the steps that follow, which use these new CredentialsRequests:

oc adm release extract --cloud=ibmcloud --credentials-requests $RELEASE_IMAGE \
    --to=<path_to_credential_requests_directory>

Other bugs that affected 4.11 CI/nightly releases in early May:
https://bugzilla.redhat.com/show_bug.cgi?id=2082604
https://bugzilla.redhat.com/show_bug.cgi?id=2082687
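Whether a given installer binary is new enough to include these fixes can be checked by comparing its version against 4.10.15. The following is a minimal shell sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils) and assuming the first line of `openshift-install version` has the form `openshift-install <version>`, as it does for released builds; the helper name `version_at_least` is hypothetical. Unreleased builds, such as the unreleased-master binary reported at the top of this bug, do not carry a version string that can be compared this way.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
version_at_least() {
    # Version-sort both strings; the lower version comes first.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    # If the lower one is $2, then $1 is greater than or equal to $2.
    [ "$lowest" = "$2" ]
}

# Example usage (assumes released-build output "openshift-install <version>"):
# ver=$(openshift-install version | awk 'NR == 1 { print $2 }')
# version_at_least "$ver" 4.10.15 || echo "use 4.10.15 or later"
```

Version sort compares each dot-separated number numerically, so 4.10.9 correctly sorts below 4.10.15, which a plain lexical comparison would get wrong.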
I'm marking this as not a blocker because it does not clearly identify an issue. I'm interested in the error in https://bugzilla.redhat.com/show_bug.cgi?id=2083006#c5 if you are able to reproduce it. We have seen errors like this when the installer is run on a symlink directory at a different level in the hierarchy. That issue was previously fixed. Please let me know (and file a bz) if you can reproduce that issue.
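For anyone trying to reproduce the symlink case, one way to check whether a directory (for example, the install directory) is reached through a symlink is to compare its logical path with its physical path. The following is a minimal shell sketch; the helper name `is_symlinked_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if directory $1 is reached through
# a symlink, i.e. its logical path (symlinks kept) differs from its
# physical path (symlinks resolved). Returns 2 if $1 cannot be entered.
is_symlinked_dir() {
    logical=$(cd "$1" && pwd -L) || return 2
    physical=$(cd "$1" && pwd -P) || return 2
    [ "$logical" != "$physical" ]
}

# Example usage (hypothetical install directory):
# if is_symlinked_dir ./ipitest; then
#     echo "install dir is behind a symlink"
# fi
```

If it reports a symlink, one option is to change into the resolved path first (`cd "$(pwd -P)"`) before running the installer, so the installer never sees the symlinked path.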
Tested with 4.10.18: the IBM Cloud IPI install succeeded.
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9261