Description of problem:
Users with appropriate roles repeatedly hit `[security_exception] no permissions for [indices:data/read/field_caps]` errors when accessing Kibana. Deleting and restoring the Kibana user index resolves the issue temporarily, but it reoccurs repeatedly.
Version-Release number of selected component (if applicable):
How reproducible:
Customer is able to reproduce easily
Steps to Reproduce:
1. User with appropriate roles gets `[security_exception] no permissions for [indices:data/read/field_caps]` message when accessing Kibana
2. Affected user's Kibana index is deleted and restored
3. Affected user is able to access Kibana without error
4. Days later, the issue occurs again; repeat the steps
User indices continue to need to be deleted and restored, even when the cluster is in green status, in order for users to access Kibana without issues.
The Kibana user indices should not need to be deleted and restored in the first place, let alone multiple times, only for the issue to keep reoccurring.
Adding customer-specific details as private notes in BZ.
Consider asking them to try to access Kibana again and, within 60s, run the following script. Clone the entire repo and execute it as someone who has permissions to access Elasticsearch. This will show you the state of all permissions.
* Is the user trying to access anything other than the Discover tab?
* What is their role? Can they see the default namespace?
What exactly is the user doing when they see this error?
Are they trying to access their Kibana index?
Are they trying to query logs using an index pattern? If so, please list the index pattern.
What is the user's role?
Following is a diagram of how a user's role is determined and how the permissions are seeded:
You might temporarily work around this issue by:
1. `oc exec` into an ES pod
2. Edit `sgconfig/sg_action_groups.yaml` and add `indices:data/read/field_caps` to the list for `INDEX_KIBANA`
3. Reseed the permissions by running `es_seed_acl`
If this resolves the issue, we can modify it for the release.
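As a sketch, the edited action group in `sgconfig/sg_action_groups.yaml` might look like the following; the surrounding entries are illustrative and will vary by Elasticsearch/Search Guard version, so only the added line should be taken literally:

```yaml
# sgconfig/sg_action_groups.yaml (excerpt; existing entries are illustrative)
INDEX_KIBANA:
  - indices:data/read*
  - indices:data/write*
  - indices:data/read/field_caps   # added to work around the security_exception
```

After saving the file, rerunning `es_seed_acl` inside the pod pushes the updated permissions into the cluster.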
Received update from customer:
After working with Engineering, we shortened retention times and fine-tuned some settings, and did not see issues immediately after.
When updating our environment, all changes were reset and the issue was observed again.
We need a solution/process to correct this that will either persist through OCP version updates, or that can be automated against our environment after an OCP version update.
"Working with Engineering" is referring to c#17. This worked until an update, when everything was reset. Is there a way to achieve this goal that will persist through upgrades?
(In reply to Greg Rodriguez II from comment #19)
> Received update from customer:
> "Working with Engineering" is referring to c#17. This worked, until an
> update, and everything was reset. Is there a way to achieve this goal that
> will persist through upgrades?
You say these issues occur after an update, which is when all permissions are reseeded.
Did #c9 resolve the problem, and was it only an issue again after the upgrade? That would make sense, because the upgrade reverts what is allowed, and we should update to make the change permanent. It's easy enough to test: apply #c9, reseed, and test; then revert #c9, reseed, and test; the problem should disappear and come back. Can they confirm?
They might also consider accessing Kibana in a "private" browser window to ensure they are not experiencing issues related to caching.
The output doesn't answer my question from #c20. Does making the change and reseeding the permissions resolve the issue until the pods get restarted?
Customer is requesting update. Has there been any further progress at this time?
Are there any updates that can be provided regarding this issue?
Customer is pushing for movement on this issue. Are there any updates that can be provided?
The issue appears to have hit a standstill. Are there any updates in this case? The customer is still affected and requesting to move forward.
Customer is still impacted by this issue. Is there anything whatsoever that can be communicated to the customer regarding the status of this issue or workarounds?
I'm not exactly sure where the idea originated that a user's Kibana index is "corrupted" and needs to be removed and replaced in order to fix the problem. Ultimately I believe the issue relates to a user's OAuth token expiring, possibly combined with an unexpired cookie in the user's browser, and the interplay between the two. We are currently working through some odd behavior for another customer related to OAuth tokens, cookie expiration, and sign-off, which I believe may mitigate some of the frustrations users experience.
I have captured screenshots of the behavior I see when my OAuth token expires on a 4.4 cluster, which runs the same logging stack. The first issue occurs by using the Discover tab and allowing the browser to refresh until the token expires. The second issue occurs when navigating to the Management tab, choosing an index pattern, and then clicking the refresh button. You will note the last error is the same, though it lists the Kibana certs instead of the user. I was able to resolve these issues by clicking the logout link in the upper right corner and then signing back in to get a new OAuth token.
* What is the user doing to retrieve logs? When they come back to Kibana at a "later" time are they asked to re-enter their credentials?
* How much "later" do they see the error message occur?
* What is the expiration time for tokens for the ocp cluster?
* Have they tried accessing Kibana in a private browser to see if there is still an issue?
* Have they tried to delete their browser cookies and do they still see the issue?
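To answer the token-expiration question above, the cluster's configured token lifetime can be inspected along these lines (OCP 4.x; the `kibana-proxy` client name assumes the standard cluster-logging deployment, and an empty result means the default lifetime applies):

```shell
# Cluster-wide access token lifetime, if explicitly set
oc get oauth cluster -o jsonpath='{.spec.tokenConfig.accessTokenMaxAgeSeconds}'

# Per-client override on the OAuth client Kibana's proxy uses (name is an assumption)
oc get oauthclient kibana-proxy -o jsonpath='{.accessTokenMaxAgeSeconds}'
```

Comparing this lifetime against "how much later" users see the error would help confirm or rule out the token-expiry theory.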
I also experimented with closing the window after my token expired, opening a new one, and navigating to Kibana. I hit the same field_caps error reported here, and I resolved it by signing out and then signing back in to refresh my OAuth token. I'm asking our docs team to document this as a known issue, as we are unlikely to resolve it because of the nature of Kibana. If signing back in resolves the error, and/or along with #c29, I am inclined to close this issue as WONTFIX.
Several IBM Cloud Openshift customers are impacted by defect https://bugzilla.redhat.com/show_bug.cgi?id=1835396.
But 1835396 has been closed as a duplicate of this issue.
So would someone be able to give a status on this issue please?
Thanks in advance. Brian. IBM Cloud Support.
Please ref https://bugzilla.redhat.com/show_bug.cgi?id=1791837#c29 . I don't believe this is a bug, but rather users needing to clear their browser cache.
@Jeff Cantrell is it just me, or can I not see c29 even after logging in? Can you help me find that comment?
So, to be clear, is this just a Kibana issue? Have we confirmed that all log entries from fluentd are being stored in logstash?
@Jeff Cantrill Hello Jeff, one of our customers is also having the same issue on IBM Cloud IKS version 4.3.12_1520_openshift. Could you please investigate this further?
1. Configured the ES, but it still failed.
The procedure was as follows:
a. `oc exec` into an ES pod
b. Edit `sgconfig/sg_action_groups.yaml` and add `indices:data/read/field_caps` to the list for `INDEX_KIBANA`
c. Reseed the permissions by running `es_seed_acl`
2. Cleared the browser cache and logged in to Kibana again, but it still failed.
Please see my comment #c33. I made the referenced comment public and don't believe this to be a bug. Moving to UpcomingSprint.
Following comment 29, I can fix the Kibana permissions error by logging out and logging back in, so I am moving the bug to Verified.
Note: the logout navigates to https://kibana.example.com/oauth/sign_in, but you cannot log in via https://kibana.example.com/oauth/sign_in; you must log in to Kibana via https://kibana.example.com.
@grodrigu @brian_mckeown.com, feel free to re-open this bug if you hit this again, and please provide the oauthaccesstoken too. You can get the token with `oc get oauthaccesstoken | grep $username`.
(In reply to Devendra Kulkarni from comment #39)
> Hello @Jeff and @Anping,
> For case 02629833, the issue still persists and reoccurs once in a week and
> the only way to recover is to delete the kibana user index and then re-login.
Please verify the version and SHA of the images the customer tested. I don't see how they would have had access to the image verified by QE, as it was not added to the errata until after they tested.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.