Bug 712390

Summary: Integrate "How can I use the GFS2 tracepoints and debugfs glocks file in RHEL6?" into docs
Product: Red Hat Enterprise Linux 6
Reporter: Allison Matlack <amatlack>
Component: doc-Cluster_Administration
Assignee: Steven J. Levine <slevine>
Status: CLOSED NEXTRELEASE
QA Contact: ecs-bugs
Severity: low
Docs Contact:
Priority: medium
Version: 6.3
CC: jha, jskeoch, sfolkwil, slevine, ssaha, swhiteho
Target Milestone: rc
Keywords: Documentation
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-11-15 18:27:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Allison Matlack 2011-06-10 13:13:19 UTC
Need to integrate the kbase article "How can I use the GFS2 tracepoints and debugfs glocks file in RHEL6?" into official product docs.

Link to kbase article: https://access.redhat.com/kb/docs/DOC-41624

Comment 2 Steve Whitehouse 2011-06-10 14:29:15 UTC
That sounds ok to me.

Comment 3 Steven J. Levine 2011-06-14 14:47:48 UTC
Assigning to me, since it's my area, but I thought we'd already had an extensive discussion that concluded that this did not belong in the user administration guide, that the kBase article was the place for information of this sort. I'll dig out that exchange and revisit this issue.

Comment 4 Allison Matlack 2011-06-14 14:54:36 UTC
We're in the process of trying to clear out the kbase of everything that is not a KCS article. KCS articles are generally specific bullet-point lists that address customer issues; the new tech briefs provide more detailed information that drills down into specific features or applications; and product docs seem to be the place for general how-to kinds of things.

I'm the new tech writer for the portal, and I had a long meeting with Sam, Perry, and Sayandeb where we went through all the cluster docs in the kbase to identify what needs to go where. I have filed bugs for everything identified to be integrated into the docs.

You can find my tracking list here: https://docspace.corp.redhat.com/docs/DOC-67420

Comment 5 Steven J. Levine 2011-06-14 17:10:48 UTC
But this is not a how-to sort of thing. This information is for file system developers, not for system administrators.

Comment 6 Steven J. Levine 2011-06-14 17:33:38 UTC
For related information, there is BZ#579598.

But that split off into private email.

-----------------
Message-ID: <4BF6DDA6.5030304>
Date: Fri, 21 May 2010 14:23:18 -0500
From: Steven Levine <slevine>
To: Steven Whitehouse <swhiteho>, Perry Myers <pmyers>,
        Ric Wheeler <rwheeler>, Steven Levine <slevine>,
        Nathan Straz <nstraz>, David Teigland <teigland>,
        Bob Peterson <rpeterso>, Abhijith Das <adas>,
        Bob Peterson <rpeterso>
CC: Subhendu Ghosh <sghosh>, "Michael H. Smith" <mhideo>,
        filesystem-list <filesystem-dept-list>
Subject: GFS2 glocks and tracepoints: Looking for Feedback on Where to Document
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11

Summary: I'm looking for some feedback on where/how to document GFS2
troubleshooting -- particularly some clarification on who the audience
is for this information

For the RHEL 6 release, I have been working with Steve Whitehouse on
documenting glocks and the new trace points in the GFS2 document.

Much of our discussion can be found here:

https://bugzilla.redhat.com/show_bug.cgi?id=579598

In that discussion, I suggest that we include information about trace points in an appendix -- on the theory that we have the information, it's GFS2-specific, people might want it, and there's no general "tracepoints" documentation otherwise.

As I delve deeper into the actual administrative operation of trace points, however, I'm starting to question who the audience is for this information. It doesn't seem to be GFS administrators -- which is the audience for the book itself. Do Red Hat customers use this information?

It doesn't seem as if it will really hurt anything if I provide the trace points information in the GFS2 manual -- that makes them available and easy to find for anybody doing development, even if that isn't really the defined audience for this document -- but I wanted to run this issue by this list to get some feedback on whether I'm confusing things for our customers if I do.

On a related note, several weeks ago Steve provided me with some nice information about GFS node locking, and how that relates to performance tuning and troubleshooting. That information can be found in the RHEL 6 GFS2 draft here:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-ov-lockbounce.html

And then it continues here, with a section on troubleshooting:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/gfs2_performance_troubleshoot.html

But as I look at the trace points info, I have the same questions about this troubleshooting section. Who is the audience that will be troubleshooting GFS2?  I'm not sure it is the same audience that is using the rest of the document.

I'd appreciate any thoughts you have about what I should do with this information -- or if the direction I'm heading (leave the GFS troubleshooting section as is, put the trace points in an appendix) is ok.

Thanks,

-Steven

-------------------

From Dave Chinner in response:

Not so much a GFS2 point of view, but I would not expect anyone who
is not familiar with the code to understand what the output of
tracepoints really means. They might be useful to support engineers
that have been trained to understand them, but I doubt that
customers want to dig that deeply into the inner workings of the
system - that's why they pay for support....

Cheers,

Dave.
--
Dave Chinner
dchinner
-----------------------

From Steve Whitehouse:

Yes, I'd tend to at least partially agree here. On the other hand, given
suitable basic information about the internals, I hope that customers
would gain a better understanding of the principles behind the
filesystem's operation and that should reduce the number of support
calls we get relating to poor performance due to cache bouncing between
nodes.

We've had a lot of those reports from customers in the past, and
anything we can do to enhance understanding and reduce the number of
calls that support have to deal with, the better.

Also, this has come about because I wrote a brief comment aimed at the
release notes, to indicate that we had a new feature - gfs2 tracepoints.
That addition to the release notes was rejected on the basis that the
feature should have more extensive documentation, and that resulted in
the bz now opened to cover this documentation issue.

It might be worthwhile though to ask support about the best way to
tackle this particular subject in case there are any items they'd like
to add in/leave out specifically,

Steve.

-------------
From Perry Myers:

If we're not sure about this material being pertinent for the formal docs,
why not just put the info in a public kbase?  That would make it available
for advanced users, w/o exploding the formal docs w/ potentially too much
information.  A kbase would also make it easy for support folks to find it.

Perry

----------------
From Steve Whitehouse:

That seems reasonable to me. The only question then is, should we
release note the new feature? My feeling is yes, but I'll need to
convince Ryan of the merits,

Steve.

----------------
From Perry:

I think a comment in the 'Technical Notes' section and in the errata
should be sufficient.  If someone creates the kbase prior to GA, we could
even reference the kbase itself in the Technical Notes.

Perry

---------------
From Steve:

There are no technical notes for .0 releases, but the release notes
should be ok I think. It doesn't need much of a mention - just something
to say that it exists. Otherwise I think that sounds good,

Steve


---------------------

Other than that, it's documented in BZ#579598.

Comment 7 Steven J. Levine 2011-06-14 17:56:59 UTC
Allison:

Based on the definition in Comment 4, why isn't this suitable for a tech brief? It's a very specific feature of the application, and not at all an end-user how-to. 

In fact, the glock information which I wound up putting in the GFS2 manual should probably also be a tech brief, by this definition.

-Steven

Comment 8 Allison Matlack 2011-06-14 18:06:18 UTC
Steven,

Per our discussion about these articles, it was decided that all of the debug information should be included in product docs. I think a description of the glocks and tracepoints should definitely be in the product docs. Perry Myers and/or Sam Folk-Williams could probably give you more information.

-Allison

Comment 9 Steven J. Levine 2011-06-14 18:42:49 UTC
Perry: I don't understand this. Why are we putting file system internal debugging information for file system/kernel developers into an end-user administration document rather than a technical brief? I can see why this is not a good fit for kbase, from this description, but for similar reasons it is not a good fit for the administration manual.

In Comment 6, above, I reproduce something Steve W. wrote last year about customers needing to understand some issues that might cause poor file system performance, but that's addressed in the existing section that he provided on node locking (which already seems pretty advanced, but at least it gives specific user-level advice on what to look for and how you might address it):

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-ov-lockbounce.html

But, as Dave Chinner noted at the time, tracepoints won't mean much unless you are familiar with the actual file system code.

This redefines the audience for this document. If we have defined a place for information of this sort -- in this case technical briefs -- why put it in administration documentation? Back when we first had the discussion there was a bit of logic of "Well, it has to go somewhere and there's no good place for it". In fact, I may have said that myself. But is that what technical briefs are for?

We are looking at creating a new tuning document as part of the documentation overhaul, but even a cluster tuning document doesn't seem to encompass file system development and debugging. What is it that tracepoints are used for, for somebody setting up a system? If the logic here is that "debugging" goes into the product document, what is it that's being debugged?  (That is: How the system is configured, or the file system code itself?) This seems to be kernel stuff.

As always, I could be misinterpreting what this is for and who will use it. Also I'm thinking that GFS2 may be its own species, regarding how to organize the documentation. Still, this seems kernel-level to me. Am I wrong about that?


-Steven

Comment 10 Perry Myers 2011-06-14 18:57:03 UTC
slevine: To be clear... I don't care where this information goes, so long as it is easily accessible and searchable by the customers.  If you think a Tech Brief is a better place for this particular documentation, please coordinate with Allison and work out where this ultimately should live.

Comment 11 Sam Knuth 2011-06-15 12:21:32 UTC
All,

I think it would be helpful here to identify the audience and the purpose of these documents, as Steve is suggesting. I'm totally happy to have them be tech briefs. Tech Briefs cover a variety of areas, including specific use cases, setting up whole solutions, performance tuning, best practices, etc. Personally, I think having a comprehensive debugging guide in the product docs could be most appropriate. I definitely see Steve's point about confusing the audience of the administration guide. 

The trade-off for tech briefs is that they are not systematically maintained for the life of the product, unlike the official documentation. On the other hand, we can produce them more rapidly and in an agile fashion. 

Perry/Sayan - what is the demand for these documents? Do we have a lot of customers asking for this information? Is this something that needs to be updated with every release, or is it OK to have it updated on demand? 

Sam

Comment 12 Perry Myers 2011-06-15 12:34:10 UTC
@Sam: Ric and SteveW would be better to ask, since this is specific to GFS which falls outside of my product (although it's very intertwined).  I've set needinfo on them.

Comment 13 Steve Whitehouse 2011-06-15 12:56:59 UTC
Let me see if I can answer some of the comment #11 questions...

The demand is basically so that our more technical customers can solve some of their own performance (and other) issues, which will hopefully reduce the load on support.

The customers who ask for the information tend to do so indirectly - by asking for debugging info. At least to the best of my knowledge.

It will have to be updated from time to time since the tracepoints are not an API and we don't guarantee to keep them stable. Of course we won't change them if we don't have to, and we will try to keep them stable whenever possible. There will need to be an update for 6.2, for example, but there was no change from 6.0 to 6.1.

Comment 14 Sam Knuth 2011-06-15 16:03:50 UTC
OK, I discussed with Steven L and a few other folks. Based on customer demand and the direction from Sayan and Perry, we agree this should be included in product documentation. However, it will be difficult to integrate it into the existing admin guide. Therefore we think including it as an appendix targeted at developers is probably the best way to go. 

Steven L will review the guide and the references it points to and evaluate further. 

Other options would be to create a new guide for developers, or to do significant rewrites/expansion of the material so that it can be integrated with the existing guide. The problem here is resources, so an appendix seems like the right compromise. 
-Sam

Comment 15 Steve Whitehouse 2011-06-16 14:38:46 UTC
That seems like a reasonable plan to me.

Is any more info required? I'm assuming that all is now settled at this point.

Comment 16 Steven J. Levine 2011-06-16 19:02:06 UTC
Steve: When I add this to the document I will need to be in touch with you about any edits or modifications I make, but my first-pass plan is to leave it in its current technical paper form, with a brief introduction that puts it in context and indicates who it is for. I'll be asking for your advice/review when I put that together. I may have to do some reworking, but what I see here is that this will remain as is, self-contained, almost as if we are publishing a standalone tech brief as an appendix to the document.

To expand on Sam's summary, and to keep my reasoning on record in this bug:

I am still not happy putting file system internals in an administration manual, but there is pretty much zero chance there will be an actual internals manual (that would require your/Bob's full-time work for a while, I think). Here I quote one of the system administrators who responded to my informal attempt last year to get a sense of this issue. Mind you, this wasn't about GFS in particular, this was not a Red Hat customer, and it was a completely informal comment, but it's still interesting: "If the system administrator needs to be explicitly concerned with file locks you have problems beyond what the documentation can solve." Meaning that this is below-the-covers stuff.

But if I understand your Comment 13, there are GFS customers who are not your more standard system administrators and who are working at the level of tracepoints and examining file system locks to debug issues with file systems. Debugging file system performance? Or other sorts of problems? (That's the sort of question I will be asking you when I write up the introduction to this planned appendix.)

In any case, as noted, the information has to go somewhere. While I do think making it a tech brief would cleanly solve the differing-audience issue (which I think is a major issue), I wasn't clear on the maintenance issue you note in Comment 13, which is that the information is subject to change on a point-release basis. I think maintaining a tech brief on the customer portal site at each point release is something that could easily get lost, but I update the GFS2 manual on that basis as a matter of course, and it's easy for you to do exactly as you have always done here: file a bug or let me know when there's something that's changing for a release, and it gets monitored from there as part of our standard Bugzilla procedure. It's the maintenance issue that convinces me that, of our options, putting it as an appendix in the admin manual is probably the easiest course.

In sum: Nothing more is required from you now, but I will need your approval and review, and perhaps some more contextual information, when I move the information to the GFS2 manual.

Comment 17 Steve Whitehouse 2011-07-04 10:49:58 UTC
Plan sounds ok to me, let me know if you need anything from engineering at this stage.

Comment 18 Steven J. Levine 2011-07-29 15:10:09 UTC
This comment is a status update to this BZ.

I have added the tracepoints article as an appendix to my current working copy of the 6.2 gfs2 manual, reformatting accordingly. I sent Steve Whitehouse a list of small questions, to clarify and discuss a few things, and he has responded and provided information about a new supported tracepoint to add. I will incorporate his comments and then send the reformatted appendix for review in the RHEL 6.2 timeframe.

Comment 19 Steven J. Levine 2011-08-12 14:39:29 UTC
Status update: I have edited and formatted Steve Whitehouse's article on tracepoints into an appendix, which is in the current review draft of the GFS2 manual. I have sent Steve a note asking him to look this over. Once he gives his approval I will move this BZ to MODIFIED. I think we're pretty safe for a 6.2 release of the material in the document.