See https://coreos.slack.com/archives/CBN38N3MW/p1608140054245400 for the full discussion. The basic idea is that boskos can drive a level-based workflow: we add a 'dirty' state into which used leases go. A level-driven controller (the level being a resource in the `dirty` state) runs periodically, scans for leaked clusters, and attempts to clean them up. If it succeeds, it transitions the lease to the 'free' state; if it fails, the resource remains 'dirty', and boskos triggers a new cleanup task on its next poll.
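A minimal sketch of that level-driven loop in Go (boskos's language). The `resource` struct and `cleanup` function here are stand-ins, not the real boskos API; the point is only the retry semantics, where a failed cleanup leaves the resource dirty for the next poll.

```go
package main

import (
	"errors"
	"fmt"
)

// Lease states, mirroring the lifecycle described above.
const (
	stateDirty = "dirty"
	stateFree  = "free"
)

type resource struct {
	name  string
	state string
}

// cleanup is a stand-in for the real janitor script. Here it simulates a
// failure on one slice to show that failed resources stay dirty.
func cleanup(r *resource) error {
	if r.name == "libvirt-ppc64le-0-1" {
		return errors.New("simulated cleanup failure")
	}
	return nil
}

// reconcile is one pass of the level-driven loop: scan for dirty resources,
// try to clean each one, and transition to free only on success. A resource
// whose cleanup fails remains dirty, so the next poll retries it.
func reconcile(pool []*resource) {
	for _, r := range pool {
		if r.state != stateDirty {
			continue
		}
		if err := cleanup(r); err != nil {
			continue // leave it dirty; the next poll picks it up again
		}
		r.state = stateFree
	}
}

func main() {
	pool := []*resource{
		{name: "libvirt-ppc64le-0-0", state: stateDirty},
		{name: "libvirt-ppc64le-0-1", state: stateDirty},
	}
	reconcile(pool)
	for _, r := range pool {
		fmt.Printf("%s: %s\n", r.name, r.state)
	}
	// → libvirt-ppc64le-0-0: free
	// → libvirt-ppc64le-0-1: dirty
}
```

Because the controller is level-driven rather than edge-driven, a cleanup failure needs no special error path: the dirty state itself is the signal, and the periodic scan provides the retry.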
As part of bug triage, I'm changing the status to "Assigned" as I see that the bug is currently assigned to Deep.
Hi Deep, do you think this bug will be resolved before the end of this Sprint (January 16th)? If not, can we add "UpcomingSprint"?
Hi Deep, do you know if this bug will be resolved before the end of this sprint (Feb. 6th)? If not, can we set the "Reviewed-In-Sprint" flag to "+"?
At the moment we need more input from the testplatform team, so this bug will not be resolved in this sprint.
Hi Deep, do you think this bug will be resolved by the end of this sprint (Feb 27th)? If not, can we set "Reviewed-in-Sprint"?
Hi Deep, do you think this bug will be resolved by the end of this sprint (Mar 20th)? If not, can we set "Reviewed-in-Sprint"?
Hi Deep, do you think this bug will be resolved by the end of this sprint (Apr 10th)? If not, can we set "Reviewed-in-Sprint"?
Hi Deep, do you think this bug will be resolved by the end of this sprint (May 1st)? If not, can we set "Reviewed-in-Sprint"?
Some progress has been made after the initial investigation. @steve kuz, can you provide any info on how we can test the controller locally? cc @mhamzy
cc @skuznets
https://coreos.slack.com/archives/CBN38N3MW/p1625672208051900
More discussion on the progress https://coreos.slack.com/archives/CBN38N3MW/p1627399369095300
https://github.com/kubernetes-sigs/boskos/pull/97
Hi Deep, do you think this bug will be resolved before the end of the current sprint (Sep 24th)? If not, can we add "reviewed-in-sprint" flag?
Hi Deep, do you think this bug will be resolved before the end of the current sprint (Nov 27th)? If not, can we set the "reviewed-in-sprint" flag?
Hi Deep, do you think this bug will be resolved before the end of the current sprint (January 8th)? If not, can we set "reviewed-in-sprint"?
Hi Deep, it was mentioned during backlog refinement that the assignee for this bug might change. Can we update the assignee to the person actually working on this bug?
Hi Basava, do you think this bug would be resolved before the end of the current sprint (January 29th)? If not, can we set the "reviewed-in-Sprint" flag to indicate that we have looked at the bug?
Adding reviewed-in-sprint, as it was mentioned during yesterday's sprint planning that Basava will continue to work on this bug.
Hi Basava, do you think this bug would be resolved before the end of the current sprint (February 19th)? If not, can we set the "reviewed-in-Sprint" flag to indicate that we have looked at the bug and will continue to work on it?
Basava indicated that he will continue to work on this in the next sprint. So setting the flag.
Chatted with Basava - this bug will continue in the next sprint. Keeping the "reviewed-in-sprint+" label
Hi Basava, do you know if this bug will be resolved before the end of the current sprint (April 23rd)? If not, can we set the "reviewed-in-sprint" flag?
Chatted with Basava and found out that this is in QA testing. Marking the status as ON_QA
Basava's latest results: in recent testing the janitor failed to delete the resources due to missing libvirt binaries:

{"component":"janitor","error":"Post \"http://boskos.test-pods.svc.cluster.local./acquire?dest=cleaning\u0026owner=Janitor\u0026state=dirty\u0026type=libvirt-ppc64le-quota-slice\": dial tcp: lookup boskos.test-pods.svc.cluster.local.: no such host","file":"/go/src/app/cmd/janitor/janitor.go:137","func":"main.run","level":"info","msg":"no available resource libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:09:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:146","func":"main.run","level":"info","msg":"Acquired resources libvirt-ppc64le-0-2 of type libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:101","func":"main.janitorClean","level":"info","msg":"executing janitor: /root/libvirt-ppc64le-janitor.sh --slice=libvirt-ppc64le-0-2 --hours=0","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:146","func":"main.run","level":"info","msg":"Acquired resources libvirt-ppc64le-0-0 of type libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:101","func":"main.janitorClean","level":"info","msg":"executing janitor: /root/libvirt-ppc64le-janitor.sh --slice=libvirt-ppc64le-0-0 --hours=0","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:146","func":"main.run","level":"info","msg":"Acquired resources libvirt-ppc64le-1-0 of type libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:101","func":"main.janitorClean","level":"info","msg":"executing janitor: /root/libvirt-ppc64le-janitor.sh --slice=libvirt-ppc64le-1-0 --hours=0","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:146","func":"main.run","level":"info","msg":"Acquired resources libvirt-ppc64le-0-1 of type libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","file":"/go/src/app/cmd/janitor/janitor.go:101","func":"main.janitorClean","level":"info","msg":"executing janitor: /root/libvirt-ppc64le-janitor.sh --slice=libvirt-ppc64le-0-1 --hours=0","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","error":"resources not found","file":"/go/src/app/cmd/janitor/janitor.go:137","func":"main.run","level":"info","msg":"no available resource libvirt-ppc64le-quota-slice","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","error":"exit status 127","file":"/go/src/app/cmd/janitor/janitor.go:105","func":"main.janitorClean","level":"info","msg":"failed to clean up project libvirt-ppc64le-0-1, error info: libvirtcli command not found, installing it.\nlibvirtcli: error while loading shared libraries: libvirt-lxc.so.0: cannot open shared object file: No such file or directory\n","severity":"info","time":"2022-05-06T07:10:55Z"}
{"component":"janitor","error":"exit status 1","file":"/go/src/app/cmd/janitor/janitor.go:105","func":"main.janitorClean","level":"info","msg":"failed to clean up project libvirt-ppc64le-0-0, error info: libvirtcli command not found, installing it.\nmv: cannot stat './libvirtcli': No such file or directory\n","severity":"info","time":"2022-05-06T07:10:56Z"}
{"component":"janitor","error":"exit status 1","file":"/go/src/app/cmd/janitor/janitor.go:105","func":"main.janitorClean","level":"info","msg":"failed to clean up project libvirt-ppc64le-0-2, error info: libvirtcli command not found, installing it.\nmv: cannot stat './libvirtcli': No such file or directory\n","severity":"info","time":"2022-05-06T07:10:56Z"}
{"component":"janitor","error":"exit status 127","file":"/go/src/app/cmd/janitor/janitor.go:105","func":"main.janitorClean","level":"info","msg":"failed to clean up project libvirt-ppc64le-1-0, error info: libvirtcli command not found, installing it.\nlibvirtcli: error while loading shared libraries: libvirt-lxc.so.0: cannot open shared object file: No such file or directory\n","severity":"info","time":"2022-05-06T07:10:56Z"}

Resources are marked dirty:

root@basavarg-boskos-testing:~/dev/test-infra/config/prow/cluster# kubectl get resources -n test-pods
NAME                  TYPE                          STATE   OWNER   LAST-UPDATED
libvirt-ppc64le-0-0   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-0-1   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-0-2   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-0-3   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-1-0   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-1-1   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-1-2   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-2-0   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-2-1   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-2-2   libvirt-ppc64le-quota-slice   dirty           3s
libvirt-ppc64le-2-3   libvirt-ppc64le-quota-slice   dirty           3s

Working on fixing the shared library issue. The libvirt client is currently in my repo: https://github.com/Basavaraju-G/janitor
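The "exit status 127" above means the shell could not run the command at all; the log shows two distinct causes (binary not found vs. an unresolvable shared library, `libvirt-lxc.so.0`). A quick way to tell them apart is sketched below; `check_bin` is a hypothetical helper, not part of the janitor script, and `libvirtcli` will only resolve on a host where the libvirt client libraries are installed.

```shell
#!/bin/sh
# Distinguish "binary missing from PATH" from "binary present but its
# shared libraries are unresolvable" (both produce exit status 127).
check_bin() {
  bin="$1"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "$bin: not on PATH"
    return 127
  fi
  # ldd reports "not found" for any shared object the dynamic linker
  # cannot resolve, as with libvirt-lxc.so.0 in the janitor logs above.
  if ldd "$(command -v "$bin")" 2>/dev/null | grep -q "not found"; then
    echo "$bin: missing shared libraries"
    return 127
  fi
  echo "$bin: ok"
}

check_bin ls               # a binary that should always resolve → "ls: ok"
check_bin libvirtcli || true  # expected to fail unless libvirt client libs are installed
```

Running `check_bin` against the janitor's `libvirtcli` inside the actual janitor image would show whether the fix is packaging the binary or also installing its libvirt runtime dependencies.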
https://github.com/multi-arch/ocp-remote-ci/pull/26
Talked to Deep and Florian; this bug has been verified and can be closed.