Created attachment 880306 [details]
Simple bash script to create an index of retrace tasks by caseno and finished_time

Description of problem:
Often we get cases where customers submit multiple vmcores. After about 1 or 2 vmcores it gets difficult to keep track of the task IDs of the vmcores on the case. It's not a ton of time, but you have to search the case comments, etc. People usually start with the case they are working on, so they know the case number.

This RFE is to add a 'core-by-caseno' index subdirectory, and create a symlink to the retrace task with a meaningful name (I picked 'finished_time' in my index, but we could use something else, such as the info inside crash 'sys' output, e.g. the 'DATE' and 'NODENAME').

$ ls -l 001050838/
total 0
lrwxrwxrwx. 1 dwysocha dwysocha 30 Mar 30 07:28 2014-Mar-06-1255-443952481 -> /cores/retrace/tasks/443952481
lrwxrwxrwx. 1 dwysocha dwysocha 30 Mar 30 07:28 2014-Mar-06-1259-864010766 -> /cores/retrace/tasks/864010766
lrwxrwxrwx. 1 dwysocha dwysocha 30 Mar 30 07:28 2014-Mar-12-1504-177582185 -> /cores/retrace/tasks/177582185
...
lrwxrwxrwx. 1 dwysocha dwysocha 30 Mar 30 07:28 2014-Mar-25-0853-229467397 -> /cores/retrace/tasks/229467397
lrwxrwxrwx. 1 dwysocha dwysocha 30 Mar 30 07:28 2014-Mar-26-1333-660655087 -> /cores/retrace/tasks/660655087

Version-Release number of selected component (if applicable):
retrace-server-1.11-1.el6.noarch

Details of the RFE:
1. Create a subdirectory underneath /cores/retrace/ where the index will go (perhaps /cores/retrace/tasks-by-caseno)
2. When a task completes, check for the presence of a 'caseno' file in the task directory.
3. If the caseno file exists, cat the file, and based on the contents, check whether the /cores/retrace/tasks-by-caseno/$caseno directory already exists in the index; if not, mkdir /cores/retrace/tasks-by-caseno/$caseno.
4. Once /cores/retrace/tasks-by-caseno/$caseno exists, create the symlink to the retrace task, with a human-readable / identifying name, such as the following:
   2014-Mar-25-0853-229467397 -> /cores/retrace/tasks/229467397
5. Details of the symlink name are TBD
   * The format of the symlink name needs some thought. For example, we may want to use ISO format for the date (e.g. 201403061255), but personally I liked something more readable such as the above. We could also use the info inside the 'sys' output of crash, since it's often more meaningful to think about the task in terms of the date/time the crash occurred, rather than the time the retrace task completed, and perhaps the name of the machine that crashed.
6. It may be better to have a small python tool to do this rather than adding a couple of lines to the existing retrace server.

Additional info:
A number of months back I had put this on the list of retrace RFEs and forgot about it until I got a case with 20+ vmcores on it. So I created a simple script to index it and asked for feedback on the sbr-kernel list (http://post-office.corp.redhat.com/archives/sbr-kernel-list/2014-March/msg00025.html). Feedback was all positive that it would be a useful thing to have.

This would probably be better implemented inside retrace, since retrace could also clean up the symlink when the task is deleted. It is probably very easy to do inside retrace too. Note there are a few pitfalls to watch out for, and I think I've covered most of them inside the attached script.
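For illustration only, steps 1-4 of the RFE could be sketched in Python roughly as below. This is not the attached bash script. The function name index_task is hypothetical, and the sketch assumes that the 'finished_time' file in the task directory holds a Unix timestamp (it falls back to the directory mtime if the file is missing or unparsable) and that a case number consists only of digits.

```python
import time
from pathlib import Path


def index_task(task_dir, index_root):
    """Symlink a finished task into index_root/<caseno>/.

    Returns the symlink path, or None when the task has no usable caseno.
    Hypothetical helper: file formats are assumptions, see the lead-in.
    """
    task_dir = Path(task_dir).resolve()
    caseno_file = task_dir / "caseno"
    if not caseno_file.is_file():
        return None  # step 2: caseno is optional, nothing to index

    caseno = caseno_file.read_text().strip()
    if not caseno.isdigit():
        return None  # guard against empty or path-like contents

    # Assumption: finished_time contains epoch seconds.
    try:
        finished = int((task_dir / "finished_time").read_text().strip())
    except (OSError, ValueError):
        finished = int(task_dir.stat().st_mtime)
    stamp = time.strftime("%Y-%b-%d-%H%M", time.localtime(finished))

    # Step 3: create the per-case directory if it is not there yet.
    case_dir = Path(index_root) / caseno
    case_dir.mkdir(parents=True, exist_ok=True)

    # Step 4: readable name ending in the task id, pointing at the task.
    link = case_dir / f"{stamp}-{task_dir.name}"
    if not link.is_symlink():
        link.symlink_to(task_dir)
    return link
```

Calling index_task("/cores/retrace/tasks/443952481", "/cores/retrace/tasks-by-caseno") a second time for the same task is a no-op, so it could be run from a periodic job as well as at task completion.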
I think a case-based scheme should replace the current one; if that is not feasible, we are better off with post-analysis and pre-cleanup hooks that would execute scripts.

In case hooks get implemented, retrace should provide config options roughly as follows:

HookUser = blah
HookGroup = blah
HookPostAnalysis = /some/dir
HookPreCleanup = /some/other/dir

Then it would execute each file in the respective directories as HookUser:HookGroup, in alphabetical order.

Scripts would receive 2 arguments: task id and task directory (yes, I know the task id can currently be obtained from the path, but this does not look like a good thing to depend on).
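A rough sketch of how such a hook runner might look, in Python. Nothing here is existing retrace-server code: run_hooks and its parameters are hypothetical names, and the user/group switch relies on the `user=` and `group=` arguments of subprocess.run (Python 3.9+), which only work when the caller has the privileges to change identity. Hook failures are recorded rather than raised, so a broken hook does not take the caller down with it.

```python
import os
import subprocess
from pathlib import Path


def run_hooks(hook_dir, task_id, task_dir, user=None, group=None):
    """Run every executable file in hook_dir in alphabetical order.

    Each hook gets two arguments: the task id and the task directory.
    Returns a list of (hook name, exit code); the exit code is None
    when the hook could not be started at all.
    """
    extra = {}
    if user is not None:
        extra["user"] = user    # would come from HookUser
    if group is not None:
        extra["group"] = group  # would come from HookGroup

    results = []
    for script in sorted(Path(hook_dir).iterdir(), key=lambda p: p.name):
        if not script.is_file() or not os.access(script, os.X_OK):
            continue  # skip subdirectories and non-executable files
        try:
            proc = subprocess.run(
                [str(script), str(task_id), str(task_dir)],
                check=False,
                **extra,
            )
            results.append((script.name, proc.returncode))
        except OSError:
            results.append((script.name, None))
    return results
```

With this shape, HookPostAnalysis and HookPreCleanup would simply be two different hook_dir values passed at the two points in the task lifecycle.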
Created attachment 880444 [details]
Simple bash script to create an index of retrace tasks by caseno and finished_time

Fixed 'find' in script - otherwise we get a lot of these, I think from the 32-bit vmcores:

find: `/cores/retrace/tasks/610785335-kernel': Permission denied
(In reply to Mateusz Guzik from comment #1)
> I think case-based scheme should replace current one, if that is not
> feasible we are better off with post-analysis and pre-cleanup hooks that
> would execute scripts.
>

Not sure what you mean - are you saying replace the taskid with something like '/cores/retrace/tasks/<caseno>/...'?

If so, I'm not sure about it since it's a bigger change. Right now 'caseno' is optional too.

> In case hooks get implemented, retrace should provide config options +/-
> follows:
> HookUser = blah
> HookGroup = blah
> HookPostAnalysis = /some/dir
> HookPreCleanup = /some/other/dir
>
> Then it would execute each file in respective directories as
> HookUser:HookGroup, in alphabetical order.

I think the hooks should be a different bz - feel free to file it or I will do it soon. I agree we need these if there is not already something in retrace. We need to think about it to make sure existing retrace functionality is not destabilized by some post-analysis hook.
(In reply to Dave Wysochanski from comment #4)
> (In reply to Mateusz Guzik from comment #1)
> > I think case-based scheme should replace current one, if that is not
> > feasible we are better off with post-analysis and pre-cleanup hooks that
> > would execute scripts.
> >
> Not sure what you mean - are you saying replace the taskid with something
> like '/cores/retrace/tasks/<caseno>/..."?
>
> If so I'm not sure about it since it's a bigger change. Right now 'caseno'
> is optional too.
>

Well, not requiring a case number when submitting a vmcore reduces the usefulness of this feature in the first place. So the case number should become mandatory (and maybe we can work something out to make it easier to submit it).

With case-related ids the current id scheme becomes redundant, thus it makes sense to get rid of it.

However, if that cannot be done for now, it will be easier for everyone to have custom scripts doing the job.

On the other hand, we would need hooks even if the current id scheme goes away, so I agree a separate bz is needed - https://bugzilla.redhat.com/show_bug.cgi?id=1082376

That said, this bz should be: replace the current scheme with a case number based one; if that is not going to work, we should close it.
(In reply to Mateusz Guzik from comment #5)
> (In reply to Dave Wysochanski from comment #4)
> > (In reply to Mateusz Guzik from comment #1)
> > > I think case-based scheme should replace current one, if that is not
> > > feasible we are better off with post-analysis and pre-cleanup hooks that
> > > would execute scripts.
> > >
> > Not sure what you mean - are you saying replace the taskid with something
> > like '/cores/retrace/tasks/<caseno>/..."?
> >
> > If so I'm not sure about it since it's a bigger change. Right now 'caseno'
> > is optional too.
> >
>
> Well, not requiring case number when submitting a vmcore reduces usefulness
> of this feature in the first place. So case number should become mandatory
> (and maybe we can work something out to make it easier to submit it).
>

True, but assuming people working on cases are submitting the cores, there should be incentives to submit the case # with the core. That said, if we can make it easier we should do it. I already filed one bz in this regard and it's been addressed (https://bugzilla.redhat.com/show_bug.cgi?id=999643)

Maybe someone needs to work out a portal app to take a vmcore name and submit from the case, then the case # would be there. After all, the customer just gives us the tarball name of the vmcore they uploaded and that is how it is identified through case updates. It then makes sense to have some way directly from the case to submit to retrace-server, and we should then have most vmcores with a caseno file.

> With case-related numbers ids current id scheme become redundant, thus it
> makes sense to get rid of it.
>

Not really. Why would you think everyone using retrace-server for vmcores would have a case#? Remember people can use it outside of case work. That is just our primary use case. However, I think we're agreeing on the general idea that task#'s are just numbers, and they don't add value to a user of retrace - they kinda get in the way a bit, but I understand why we have them.

On a slightly related topic, it may be nice to be able to load crash just based on the vmcore tarball, or perhaps we need tools to look it up (kinda what this bz is). I do agree the 'taskid' is just something people need to remember which is really unrelated to the important information which identifies a vmcore (case#, vmcore tarball name, machine name, timestamp of the crash, customer name, bugzilla if associated, etc). So it may be we need to think about a better interface into retrace rather than using the taskid, which I think is really the overarching reason for this bz.

> However, if that cannot be done for now, it will be easier for everyone to
> have custom scripts doing the job.
>

Ok, so you'd have this index outside of the retrace-server package then? I didn't like the idea of custom scripts the more I thought about it, but yeah, it can be done. We'd need to have a cleanup script too, which is not too hard.

> On the other hand we would need hooks even if current id scheme goes away,
> so I agree separate bz is needed -
> https://bugzilla.redhat.com/show_bug.cgi?id=1082376
>
> That said, this bz should be: replace current scheme with case number based
> one, if that is not going to work we should close it.

I don't think I agree with the above, but I'll have to take a crack at a patch and/or see what Michal thinks. Maybe you're right and we should just close the bz and add scripts in some other package, which would go on top of the existing retrace-server taskid based design to make the user interface a bit more "workflow friendly".
(In reply to Mateusz Guzik from comment #5)
> That said, this bz should be: replace current scheme with case number based
> one, if that is not going to work we should close it.

One thing we always need is some disambiguation for cores with the same characteristics (same case#, submitted at same time, tarball same name, etc). If we ditch taskid, we'd have to come up with some scheme for the uniqueness problem (such as time of submission to retrace-server, if it can be guaranteed unique). So I think the existing 'taskid' type approach is fine for the uniqueness problem and as the base of retrace-server.
(In reply to Dave Wysochanski from comment #6)
> (In reply to Mateusz Guzik from comment #5)
> > (In reply to Dave Wysochanski from comment #4)
> > > (In reply to Mateusz Guzik from comment #1)
> > > > I think case-based scheme should replace current one, if that is not
> > > > feasible we are better off with post-analysis and pre-cleanup hooks that
> > > > would execute scripts.
> > > >
> > > Not sure what you mean - are you saying replace the taskid with something
> > > like '/cores/retrace/tasks/<caseno>/..."?
> > >
> > > If so I'm not sure about it since it's a bigger change. Right now 'caseno'
> > > is optional too.
> > >
>
> > Well, not requiring case number when submitting a vmcore reduces usefulness
> > of this feature in the first place. So case number should become mandatory
> > (and maybe we can work something out to make it easier to submit it).
> >
> True but assuming people working on cases are submitting the cores there
> should be incentives to submit the case # with the core. That said, if we
> can make it easier we should do it. I already filed one bz in this regard
> and it's been addressed (https://bugzilla.redhat.com/show_bug.cgi?id=999643)
>
> Maybe someone needs to work out a portal app to take a vmcore name and
> submit from the case, then the case # would be there. After all, the
> customer just gives us the tarball name of the vmcore they uploaded and that
> is how it is identified through case updates. It then makes sense to have
> some way directly from the case to submit to retrace-server and we should
> have most vmcores with caseno file.

This sounds like a good idea. The case can then maintain its own list of vmcores submitted (and possibly include details from the 'sys' command so we can distinguish different cores).
> > Well, not requiring case number when submitting a vmcore reduces usefulness
> > of this feature in the first place. So case number should become mandatory
> > (and maybe we can work something out to make it easier to submit it).
> >
> True but assuming people working on cases are submitting the cores there
> should be incentives to submit the case # with the core. That said, if we
> can make it easier we should do it. I already filed one bz in this regard
> and it's been addressed (https://bugzilla.redhat.com/show_bug.cgi?id=999643)
>
> Maybe someone needs to work out a portal app to take a vmcore name and
> submit from the case, then the case # would be there. After all, the
> customer just gives us the tarball name of the vmcore they uploaded and that
> is how it is identified through case updates. It then makes sense to have
> some way directly from the case to submit to retrace-server and we should
> have most vmcores with caseno file.
>

That sounds reasonable for the short term.

I think the real problem is that customers upload files to an anonymous server. Instead, an account should be created along with the case in sfdc and you would get credentials in return. Files uploaded with such credentials end up in a case-specific directory, thus we get stuff automagically classified and can have it acted upon in a similar manner.

> > With case-related numbers ids current id scheme become redundant, thus it
> > makes sense to get rid of it.
> >
> Not really. Why would you think everyone using retrace-server for vmcores
> would have a case#? Remember people can use it outside of case work. That
> is just our primary use case. However, I think we're agreeing on the
> general idea that task#'s are just numbers, and they don't add value to a
> user of retrace - they kinda get in the way a bit but I understand why we
> have them.
>

I don't really see why you would use retrace for something other than case work, but such usage only strengthens my argument. :)

As was noted, the problem is that we just have a directory with semi-random numbers known as task ids. Nobody knows what a core is based on the directory content unless it happens to have a caseno file.

That said, some outside-of-sfdc usage with no additional information would only contribute to the mess. Instead, all cores should be "resolvable" to a case number, and if there is none we have to know who owns the core - so it needs some user-specific id, username or something else. I don't see why this could not be included in the task id.

> On a slightly related topic, it may be nice to be able to load crash just
> based on the vmcore tarball, or perhaps we need tools to look it up (kinda
> what this bz is). I do agree the 'taskid' is just something people need to
> remember which is really unrelated to the important information which
> identifies a vmcore (case#, vmcore tarball name, machine name timestamp of
> the crash, customer name, bugzilla if associated, etc). So it may be we
> need to think about a better interface into retrace rather than using the
> taskid, which is really what is the overarching reason for this bz I think.
>

Just the core name will lead to ambiguity (read: problems) in the long run; the case number is needed. For instance, if per-case dirs are created we can suddenly get files with the same name from different cases (and the one you want is not submitted yet, or maybe just got removed, while the other one is ready). You don't want to open a different core by accident, do you? :->

... and if you allow people to look up the core based only on the name, this will happen in the future. But if we have links/dirs with cases (and we will), this loses most of its value.

>
> > However, if that cannot be done for now, it will be easier for everyone to
> > have custom scripts doing the job.
> >
> Ok so you'd have this index outside of retrace-server package then?
> I didn't like the idea of custom scripts the more I thought about it but
> yeah it can be done. We'd need to have cleanup script too which is not too
> hard.
>

Not sure what you don't like. With custom scripts we have more flexibility to do various stuff.

>
> > On the other hand we would need hooks even if current id scheme goes away,
> > so I agree separate bz is needed -
> > https://bugzilla.redhat.com/show_bug.cgi?id=1082376
> >
> > That said, this bz should be: replace current scheme with case number based
> > one, if that is not going to work we should close it.
>
> I don't think I agree with the above but I'll have to take a crack at a
> patch and/or see what Michal thinks. Maybe you're right and we should just
> close the bz and add scripts in some other package, which would go on top of
> existing retrace-server taskid based design to make the user interface a bit
> more "workflow friendly"

Again, custom scripts = more flexibility. On the other hand, replacing the current scheme with a case-based one is cleaner and would have to happen in retrace.

Leaving the scheme as it is and implementing this feature in retrace gives us the worst of both worlds. :)
(In reply to Dave Wysochanski from comment #7)
> (In reply to Mateusz Guzik from comment #5)
> > That said, this bz should be: replace current scheme with case number based
> > one, if that is not going to work we should close it.
>
> One thing we always need is some disambiguation for cores with the same
> characteristics (same case#, submitted at same time, tarball same name,
> etc). If we ditch taskid, we'd have to come up with some scheme for
> uniqueness problem (such as time of submission to retrace-server if it can
> be guaranteed unique). So I think the existing 'taskid' type approach is
> fine for uniqueness problem and the base of retrace-server.

Uniqueness is not a problem. We can have: caseno/coreno, caseno-coreno or some other variant where coreno is a counter.
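To make the caseno/coreno idea concrete, here is a small Python sketch of one way the counter could be allocated. The function name allocate_core_dir and the directory layout are assumptions for illustration, not anything retrace-server does today. It leans on the fact that creating a directory fails if the name already exists, so two submissions for the same case cannot end up with the same coreno.

```python
from pathlib import Path


def allocate_core_dir(root, caseno):
    """Create and return root/<caseno>/<coreno> with the lowest free coreno.

    Hypothetical helper: coreno is a per-case counter starting at 1.
    """
    case_dir = Path(root) / str(caseno)
    case_dir.mkdir(parents=True, exist_ok=True)

    coreno = 1
    while True:
        candidate = case_dir / str(coreno)
        try:
            # mkdir either creates the directory or raises, so a
            # concurrent submitter can never be handed the same number.
            candidate.mkdir()
            return candidate
        except FileExistsError:
            coreno += 1
```

Note that with this approach a number freed by deleting a core directory would be reused by the next submission; if that is undesirable, the counter would have to be stored separately.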
(In reply to Mateusz Guzik from comment #10)
> (In reply to Dave Wysochanski from comment #7)
> > (In reply to Mateusz Guzik from comment #5)
> > > That said, this bz should be: replace current scheme with case number based
> > > one, if that is not going to work we should close it.
> >
> > One thing we always need is some disambiguation for cores with the same
> > characteristics (same case#, submitted at same time, tarball same name,
> > etc). If we ditch taskid, we'd have to come up with some scheme for
> > uniqueness problem (such as time of submission to retrace-server if it can
> > be guaranteed unique). So I think the existing 'taskid' type approach is
> > fine for uniqueness problem and the base of retrace-server.
>
> Uniqueness is not a problem. We can have: caseno/coreno, caseno-coreno or
> some other variant where coreno is a counter.

Yeah, that's a good idea. But again you're assuming we mandate the case#. I'm not sure about this since not everyone that uses retrace has 'case numbers'. For us it would be great though.

I am also concerned about instability should we go the route of moving away from the taskid as the basis. Recently we've addressed a lot of problems but we've still got multiple issues in production. It's one thing to add an index, but another thing to change the design. We need Michal to weigh in on it, someone to create a patch, and probably some help with testing. It doesn't seem like it would be too bad, but we won't know until someone does a patch.
I already noted that in #9:

"That said, some outside-of-sfdc usage with no additional information would only contribute to the mess. Instead all cores should be "resolvable" to a case number, and if there is none we have to know who owns the core - so it needs some user-specific id, username or something else. I don't see why this could not be included in task id."

Replace caseno with ident, where ident is either a case number or user login.
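As a tiny illustration of the "ident" idea in Python (the function name task_ident and its parameters are made up for this sketch): the case number is preferred when there is one, the submitter's login is used otherwise, and a core with neither is rejected so that every core stays resolvable to an owner.

```python
def task_ident(caseno=None, login=None):
    """Return the owner identifier for a core.

    Hypothetical helper: prefers the case number, falls back to the
    user login, and refuses cores that have neither.
    """
    if caseno:
        return str(caseno)
    if login:
        return str(login)
    raise ValueError("a core needs either a case number or a user login")
```

The result could then take the place of caseno in a caseno/coreno style layout, giving ident/coreno.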
This message is a reminder that EPEL 6 is nearing its end of life. Fedora will stop maintaining and issuing updates for EPEL 6 on 2020-11-30. It is our policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of 'el6'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later EPEL version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before EPEL 6 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.
EPEL el6 changed to end-of-life (EOL) status on 2020-11-30. EPEL el6 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of EPEL please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.