Description of problem:
Upgraded from 4.4.9 to 4.5.24 using local mirror. Process appeared to run successfully, however current configuration reports:

Failed to resync 4.5.24 because: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: true total: 3, ready 3, updated: 3, unavailable: 0)

Version-Release number of selected component (if applicable):
4.5.24

How reproducible:
not very

Steps to Reproduce:
1.
2.
3.

Actual results:
upgrade reports success but master MCP reports a degraded status

Expected results:
upgrade succeeds and MCP is healthy

Additional info:
@jerzhang, the customer is hitting RBAC issues collecting both the must-gather and the inspection. will update the BZ when either becomes available.
@jerzhang, can you yank the file-bundle in supportshell? that would be quicker than me downloading/splitting/attaching it to this ticket. also, the RBAC errors that i posted in the BZ previously are what led me to believe that the CM corruption is preventing them from getting a must-gather (i can be wrong, i frequently am); and yes, this is the result of an upgrade.
@jerzhang, must-gather has been executed and attached to the ticket.
the error msg for the master MCP has changed ever-so-slightly:

- lastTransitionTime: "2021-04-29T16:46:04Z"
  message: |-
    Failed to render configuration for pool master: parsing Ignition config failed with error: config is not valid
    Report: error at line 1, column 1178
    1: {"ignition":{"config":{},"security":{"tls":{}},"timeouts":{},"version":"2.2.0"},"networkd":{},"passwd":{},"storage":{"files":[{"contents":{"source":"data:text/plain;charset=utf-8;base64,IyBTcGVjaWZ5IHRpbWUgc291cmNlcy4Kc2VydmVyICAgbnRwMmEubWwuY29tCnNlcnZlciAgIG50 cDJiLm1sLmNvbQpzZXJ2ZXIgICBudHAyYy5tbC5jb20Kc2VydmVyICAgbnRwMmQubWwuY29tCgoj IFJlY29yZCB0aGUgcmF0ZSBhdCB3aGljaCB0aGUgc3lzdGVtIGNsb2NrIGdhaW5zL2xvc3NlcyB0 aW1lLgpkcmlmdGZpbGUgL3Zhci9saWIvY2hyb255L2RyaWZ0CgojIEFsbG93IHRoZSBzeXN0ZW0g Y2xvY2sgdG8gYmUgc3RlcHBlZCBpbiB0aGUgZmlyc3QgdGhyZWUgdXBkYXRlcwojIGlmIGl0cyBv ZmZzZXQgaXMgbGFyZ2VyIHRoYW4gMSBzZWNvbmQuCm1ha2VzdGVwIDEuMCAzCgojIEVuYWJsZSBr ZXJuZWwgc3luY2hyb25pemF0aW9uIG9mIHRoZSByZWFsLXRpbWUgY2xvY2sgKFJUQykuCnJ0Y3N5 bmMKCiMgSW5jcmVhc2UgdGhlIG1pbmltdW0gbnVtYmVyIG9mIHNlbGVjdGFibGUgc291cmNlcyBy ZXF1aXJlZCB0byBhZGp1c3QKIyB0aGUgc3lzdGVtIGNsb2NrLgptaW5zb3VyY2VzIDIKCiMgU3Bl Y2lmeSBmaWxlIGNvbnRhaW5pbmcga2V5cyBmb3IgTlRQIGF1dGhlbnRpY2F0aW9uLgprZXlmaWxl IC9ldGMvY2hyb255LmtleXMKCiMgR2V0IFRBSS1VVEMgb2Zmc2V0IGFuZCBsZWFwIHNlY29uZHMg ZnJvbSB0aGUgc3lzdGVtIHR6IGRhdGFiYXNlLgpsZWFwc2VjdHogcmlnaHQvVVRDCgojIFNwZWNp ZnkgZGlyZWN0b3J5IGZvciBsb2cgZmlsZXMuCmxvZ2RpciAvdmFyL2xvZy9jaHJvbnkK
    ^ invalid data character
  reason: ""
  status: "True"
  type: RenderDegraded

it could just be formatting, but i dont think so
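
The "invalid data character" report above points into the base64 payload of the file's data URL. As a minimal sketch (not the parser the MCO or Ignition actually uses), the Python below scans a `source` value for the first character outside the standard base64 alphabet. The helper name `first_invalid_base64_char` and the shortened sample string are assumptions made for illustration.

```python
import string

# Characters permitted in standard base64 (RFC 4648), including '=' padding.
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def first_invalid_base64_char(source):
    """Return (offset, char) for the first non-base64 character in the
    payload of a base64 data URL, or None if the payload looks clean.

    Hypothetical helper: it only checks the alphabet, not padding rules.
    """
    marker = ";base64,"
    start = source.find(marker)
    if start == -1:
        raise ValueError("not a base64 data URL")
    payload_start = start + len(marker)
    for offset, ch in enumerate(source[payload_start:], start=payload_start):
        if ch not in BASE64_ALPHABET:
            return offset, ch
    return None


# Shortened, illustrative sample with one embedded space in the payload.
sample = "data:text/plain;charset=utf-8;base64,IyBTcGVj aWZ5"
hit = first_invalid_base64_char(sample)
if hit is not None:
    print("first invalid character %r at offset %d" % (hit[1], hit[0]))
```

Running the same check against the full `source` string from the MachineConfig would show whether the spaces are really in the stored config or are only an artifact of how the message was pasted here.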
@jerzhang, the cu sent this comment to the ticket:

The base64 data in that yaml was from an attempt to update the ntp server configuration to conform to bank standards. The base64 text had embedded spaces, which I missed when creating it. No cr/lf characters, but there were bad characters in the encoding. I've fixed that, and the ntp.conf files are now correct, meaning the patch apply was successful. About 15 minutes later, the operators recovered and the cluster now seems to be clear.

I knew there had been a problem with the ntp change apply, but I didn't expect it to cause this level of error with no clear explanation of why it happened.

In any case, thanks for the help, I will leave the cluster as it is for a while, and if it remains normally operational, I will attempt the next stage of the update to 4.6.

they were able to correct the MC and proceed to their final upgrade of the cluster. the support ticket has been closed, if you would like to close this BZ.
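
For anyone who hits the same failure: the root cause reported above was whitespace embedded in the base64 payload of the MachineConfig file source. A minimal sketch, assuming the file content is available as plain text, of producing the data URL without any wrapping and checking it before it goes into a MachineConfig. The helper names `to_data_url` and `from_data_url` and the `ntp.example.com` server are hypothetical, not taken from the customer's configuration.

```python
import base64


def to_data_url(text):
    """Encode text as a single-line base64 data URL.

    base64.b64encode never inserts line breaks or spaces, so the payload
    can be pasted into the MachineConfig `source` field as-is.
    """
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "data:text/plain;charset=utf-8;base64," + encoded


def from_data_url(url):
    """Strictly decode a base64 data URL.

    validate=True makes b64decode raise binascii.Error (a ValueError)
    on any character outside the base64 alphabet, including spaces.
    """
    payload = url.split(";base64,", 1)[1]
    return base64.b64decode(payload, validate=True).decode("utf-8")


# Hypothetical chrony configuration used only for illustration.
chrony_conf = "server ntp.example.com iburst\ndriftfile /var/lib/chrony/drift\n"
url = to_data_url(chrony_conf)

# Round-trip check before applying: a payload with embedded spaces fails here.
assert from_data_url(url) == chrony_conf
print(url)
```

The strict round trip is the useful part: a lenient decoder silently skips stray characters, so a payload that decodes fine on a workstation can still be rejected when the pool configuration is rendered.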
Ok, thank you for the update. Closing this bug.