Bug 1564149 - Agree upon a coding standard, and automate check for this in smoke
Summary: Agree upon a coding standard, and automate check for this in smoke
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: project-infrastructure
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard: Process-Automation
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-05 13:30 UTC by Amar Tumballi
Modified: 2019-03-25 16:30 UTC (History)
11 users (show)

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-23 15:06:59 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description Amar Tumballi 2018-04-05 13:30:31 UTC
# Description of problem:

Today, people spend time reviewing the patches for coding standard too. Which is IMHO a waste of time for all, but is critical to keep the readability in some consistency for the project.

With the availability of tools like `clang-format` etc, we should move toward automated check for patch.

# Additional Info:

It means we may have to format the whole project base again with 'clang-format' first. Also we have to come up with a agreed format spec file.

Comment 1 Jeff Darcy 2018-04-05 14:58:50 UTC
If we're going to make a big-bang format change, I'd like to propose that we also move toward shorter indentation. The standard at Facebook is two spaces, which my old eyes consider too narrow, but eight is unnecessarily wide. In particular, eight makes the 80-column rule (which I think is important for all sorts of ergonomic reasons) a bit awkward. Would anyone object to four?

Comment 2 Amar Tumballi 2018-04-05 15:13:58 UTC
How about keeping tabs, and making the 80line check on 4space for tab config? Also provide relevant vim and emacs config to people (may be in extras).

Comment 3 Xavi Hernandez 2018-04-05 15:49:04 UTC
I think it's ok to use 4 spaces for indentation. It helps prevent ugly wraps on some function calls with long names and/or arguments when they are indented.

Regarding the use of tabs, I would prefer to still use spaces instead of tabs. The traditional tab size is 8 spaces and this is used in (almost ?) all editors, so to prevent the code from being unreadable if someone uses an unconfigured editor, we should still use spaces, specially if we change to a 4 spaces indentation.

Comment 4 Shyamsundar 2018-04-05 19:07:25 UTC
Overall ack to the proposal.

Nice call-out by Jeff on any other change requiring a big-bang addressing to be a part of the same.

I can live with 4 spaces :), but no tabs (way too much discrepancy across editors as pointed out).

Comment 5 Jeff Darcy 2018-04-06 01:09:02 UTC
I think we're all in agreement on spaces vs. tabs. Tabs are eight spaces wide, full stop, end of story. It's kind of survivable with eight-space indents, but any other indent width becomes a total nightmare if tabs are involved.

Comment 6 Shreyas Siravara 2018-04-06 17:58:13 UTC
Whoever came up with this idea is a genius, and I totally ack it :)

Comment 7 Nigel Babu 2018-04-17 06:55:22 UTC
Alright. So we have agreement that this is a good idea. How do we want to implement this? As a check or as a pre-commit hook?

Comment 8 Amar Tumballi 2018-04-17 07:06:01 UTC
(In reply to Shreyas Siravara from comment #6)
> Whoever came up with this idea is a genius, and I totally ack it :)

All credits for this to Shreyas Siravara! As the original idea was proposed while we were walking for dinner on Boston's streets.

Comment 9 Amar Tumballi 2018-04-17 07:11:38 UTC
(In reply to Nigel Babu from comment #7)
> Alright. So we have agreement that this is a good idea. How do we want to
> implement this? As a check or as a pre-commit hook?

This is a multi-step process!

0. Team agrees on a style and a config file representing the style.
1. Commit the coding style guide to codebase and make changes in rfc.sh to use it.
2. 'gluster-ant' commits a single large patch for whole codebase with a standard clang-format style. (This should be only changes which happened due to clang-format, and no other changes should be in the patch. This can crash gerrit if we send it to review. 
  -> NOTE: This change can be as big as moving gluster repo from tla to git, as we have now 2 repositories, 'historic' and 'glusterfs' to understand the actual source of a line, if one needs 'git blame'.

3. Have the job ready to check the patch with the config file, on the server side (along with a pre-check in rfc.sh to warn people), this should be a Voting job in smoke.

4. We all live happily ever after.

Comment 10 Amar Tumballi 2018-04-17 09:00:49 UTC
(Update from Nigel)

Which format should we start with as base? Google/LLVM/Mozilla/Webkit/Chromium
       * [Chromium](https://gist.github.com/nigelbabu/667f3b626779bb14858aa4b520e67c18#file-chromium-c) 
        * [Mozilla](https://gist.github.com/nigelbabu/667f3b626779bb14858aa4b520e67c18#file-mozilla-c)
        * [LLVM](https://gist.github.com/nigelbabu/667f3b626779bb14858aa4b520e67c18#file-llvm-c)
        * [Webkit](https://gist.github.com/nigelbabu/667f3b626779bb14858aa4b520e67c18#file-webkit-c)
        * [Google](https://gist.github.com/nigelbabu/667f3b626779bb14858aa4b520e67c18#file-google-c)

When do we want to make this change? Before the 4.1 branching seems like a good time to make vast changes.

Comment 11 Jeff Darcy 2018-04-17 16:25:23 UTC
Whatever we pick is going to require some tweaking. Some of the things I notice:

 * All use two-space indents, which I personally find atrocious.

 * All put a function's type on the same line as its name.

 * All put the opening brace for a function on the last line of the arguments, instead of at the beginning of a separate line.

 * Mozilla, Webkit, and LLVM indent 'case' from 'switch'

 * None enforce a space between a function name and opening paren in calls.

 * Mozilla, LLVM, and Google all express pointers as "type* name" instead of the more familiar (in our code) "type *name"

So Chromium seems closest, but none are particularly close. Looking at the clang-format pages, here are some options I'd suggest for compatibility with our existing standard:

 !AllowShortIfStatementsOnASingleLine
 !AllowShortLoopsOnASingleLine
 BraceWrapping(!AfterControlStatement)
 BraceWrapping(AfterFunction)
 BraceWrapping(!BeforeElse)
 ColumnLimit(80)
 !IndentCaseLabels
 IndentWidth(8) or IndentWidth(4) depending on outcome of that conversation
 PointerAlignment(PAS_Right)
 SpaceBeforeParens(SBPO_Always)
 TabWidth(8)
 UseTab(UT_Never)

This fixes most of the warts, though I don't see an option to control function type and name on the same line and I didn't expect one to enforce braces around single statements (which would be nice).

BTW, yes, I know that our coding style is rather archaic. The fact that all of Clang's defined styles are so different than ours is testament to that. There are some who would argue that any "big bang" reformatting should modernize as many aspects as possible, including even function opening braces and (*shudder*) two-wide indents. I'm not necessarily opposed to such changes, but I think they need to be considered *individually* and not just adopted by accident. We already did that once, when we imported a check script from the Linux kernel without anything resembling proper review and it turned out to be wildly incompatible with our existing code. Let's not repeat that mistake.

Comment 12 Xavi Hernandez 2018-04-17 21:02:47 UTC
Using Chromium as base, basically the changes you propose seem ok. I would like to propose these additional changes:

* I prefer to use 4 spaces for indent. 2 seems too few to me.

* IndentCaseLabels = true. If we use 4 spaces for indent, we could consider indenting case labels to make them more readable and similar to other code blocks enclosed between braces.

* SpaceBeforeParens = ControlStatements. I think the additional space before parens on function calls doesn't provide any advantage (in fact none of the default configurations uses it).

* BinPackParameters = true.

* AlwaysBreakAfterReturnType = true. I think that separating return type from function name is useful in some places that use long function names or argument types.

* AlignEscapedNewLinesLeft=false. Otherwise it fills files with a lot of unnecessary spaces.

* AlignConsecutiveDeclarations=true. Using this option has the side effect that pointers are not correctly aligned as we currently do. Pointers will be rendered like this:

        int32_t var1 = 1;
        void *  var2 = NULL;

* AlignConsecutiveAssignments=true.

I'm not sure if the last two options provide any interesting advantage. So, if the pointer alignment is more important, we can disable them.

Comment 13 Jeff Darcy 2018-04-17 22:02:43 UTC
IndentCaseLabels: personally I don't like indenting case labels because they create a gap with no close brace - unless people put braces around cases, which is a bad habit for other reasons - but I know I'm probably outvoted on that one.

SpaceBeforeParens: I like the space between function name and arguments. The fact that none of the "modern" styles is not very compelling by itself (appeal to popularity) but I'm probably outvoted there too.

AlwaysBreakAfterReturnType: Yes. That's the option I was looking for. For people who use IDEs it probably doesn't matter, but having the function name start at the beginning of the line is very handy for those using non-IDE editors like vi(m). It also leaves more space for parameters, making long parameter lists - which we tend to have - more readable.

I almost don't have an opinion on the alignment options. For much of my career I would group the * with the type rather than the variable (i.e. PAS_Left) because it made more intuitive sense. On one hand, it seems like most of the world now agrees. On the other, it's not the way things are in Gluster now and hence requires a change in habits.

Comment 14 Amar Tumballi 2018-04-18 02:51:30 UTC
> On the other, it's not the way things are in Gluster now and hence requires a change in habits.

The whole point of this exercise is that we don't need to change habit, tool does it. Well, you have to get used to the change while reading code.

All the different formats, with 4 space indent is at https://github.com/nigelbabu/clang-format-sample 

I will request Nigel to run a sample with above argument before maintainers meeting today to have them compared to the same file.

Comment 15 Xavi Hernandez 2018-04-18 06:13:43 UTC
(In reply to Jeff Darcy from comment #13)
> IndentCaseLabels: personally I don't like indenting case labels because they
> create a gap with no close brace - unless people put braces around cases,
> which is a bad habit for other reasons - but I know I'm probably outvoted on
> that one.

The reason I proposed it is because if we don't indent labels, we have a brace after 'switch' which is not followed by indentation, as it normally happens, but we'll still have indentation of the code inside the case label without braces (I also don't like the braces inside cases):

    switch (select) {
    case VALUE1:
            some_code();
            break;
    }

(Visually, the closing brace seems to come from 'case' and not from 'switch')

I think it's visually simpler to indent the cases to make it clear they belong to the switch case. This way it's easy to see where the switch starts and ends.

    switch (select) {
        case VALUE1:
            some_code();
            break;
    }

Note that with the reduction of tabs from 8 to 4, the code is at the same indentation, so the gap with the closing brace is not big. The only change is the case label itself.

> I almost don't have an opinion on the alignment options. For much of my
> career I would group the * with the type rather than the variable (i.e.
> PAS_Left) because it made more intuitive sense. On one hand, it seems like
> most of the world now agrees. On the other, it's not the way things are in
> Gluster now and hence requires a change in habits.

I don't have a strong preference here either. I've never used the PAS_Right alignment before working on gluster, so I can adapt to whatever alignment is preferred. I only wanted to point out that if we want to go with PAS_Right, it currently has an issue in combination with AlignConsecutiveDeclarations.

Comment 16 Jeff Darcy 2018-04-18 15:06:36 UTC
FWIW, since this came up in the maintainers' meeting...

The inconsistency of style has had a measurable negative impact on ramp-up time for new developers at Facebook. I'm sure the same has been true of new developers at Red Hat and elsewhere. This impact manifests in several ways:

(1) Every developer has to spend time adjusting editor settings to have code not seem randomly indented in one set of files, then again for another set of files which violate our existing standard in a different way. Readability matters. Also, when they're *writing* code, each developer has to tweak settings some more because our standards are so unlike any other.

(2) Many aspects of our old standard (to which most code still conforms) are simply alien to someone who learned C in the last ten years. Again, readability matters. It might not seem like a big deal to read code that's formatted "strangely" and indeed it's a good skill for all developers to have, but on top of everything else it slows down new contributors.

(3) Time is wasted addressing these issues in code reviews, especially when a patch is submitted to code that was already in violation of all standards and common sense.

Lastly, I'll add that the combination of eight-column indents, 80-column limit, and gigantic functions nested ten levels deep has led to a lot of code wedged up against the right margin. Relaxing the 80-column limit would be ill advised in light of real human-factors studies showing that anything more than 80-100 columns (results vary) makes it harder to scan from one line to the next. Of the other two factors, the indent width can be changed with a reformat. The gigantic functions and deep nesting are terrible habits, but those habits aren't likely to change any time soon. Even if the only change was to reduce indentation and spurious line breaks, I think a reformat would be justified.

Comment 17 Amar Tumballi 2018-04-19 06:39:04 UTC
Thanks for these pointers Jeff, makes the case even more critical to work towards.

Shyam kindly agreed to make this a blocker for 4.1 branching out, which means, we have to come to agreement soon.

While many standards followed by gluster was due to having a look at some of the projects which were active then, I am fine with options which makes the project get closer to more generally adopted standards, while keeping some of our developer's concerns (if anyone shows any). Till now, I see Xavi and you showing interest in this.

Can we plan to have a meeting/call on Monday morning EST/Evening IST? And come to agreement?

@Nigel, meantime we need a job to sanitize the patch with the same file.

Comment 18 Jeff Darcy 2018-04-19 13:59:03 UTC
Monday morning EST would be fine.

Comment 19 Xavi Hernandez 2018-04-20 11:43:41 UTC
I made a mistake on one of the proposed values:

AlignEscapedNewLinesLeft=false

What I wanted to propose was to set it to 'true' (which matches the explanation I gave).

Comment 20 Jeff Darcy 2018-04-24 14:54:53 UTC
FWIW, I've posted a sample of the result for my team at Facebook, and will forward any feedback.

Comment 21 Jeff Darcy 2018-04-24 17:16:38 UTC
The feedback so far has been strongly positive, but one issue has been raised. As Xavi mentioned, the PointerAlignment doesn't seem to work properly on lines affected by AlignConsecutive*. AFAICT it's trying to center the asterisk, perhaps using PointerAlignment to decide left/right bias in case of a tie, but still leaving undesirable spaces between the asterisk and the variable name. Awkward. If these options are incompatible, which do *we* believe should take precedence?

Comment 22 Xavi Hernandez 2018-04-24 17:49:17 UTC
Some issues I've seen:

* Some initializations are formatted like this:

        char      gfid_local[GF_UUID_BUF_SIZE] = {
            0,
        };

I've tried to change AllowShortBlocksOnASingleLine=true, but no change here.

* Braces after functions are not put into a new line. It seems that BreakBeforeBraces=Attach has precedence to BraceWrapping.AfterFunction=true.

* AlignConsecutiveAssignments=true aligns some assignments inside a function (not only variables initialization at the beginning). I think this is not necessary.

* Some assignments are rendered like this:

    meta_dst->size =
        hton64(ntoh64(meta_dst->size) + ntoh64(meta_src->size));
    meta_dst->file_count =
        hton64(ntoh64(meta_dst->file_count) + ntoh64(meta_src->file_count));

I think we should increase PenaltyBreakAssignment to prevent this. I've chosen a random value (200) and this is the result:

    meta_dst->size = hton64(ntoh64(meta_dst->size) +
                            ntoh64(meta_src->size));
    meta_dst->file_count = hton64(ntoh64(meta_dst->file_count) +
                                  ntoh64(meta_src->file_count));

I'll create a new pull request with these changes.

Comment 23 Xavi Hernandez 2018-04-24 17:52:17 UTC
The change will also include some of the controversial options:

IndentCaseLabels=true
SpaceBeforeParens=ControlStatements

What should we do with them ?

Comment 24 Jeff Darcy 2018-04-25 00:30:14 UTC
Personally I'm willing to go with the flow on those. I've laid out my reasoning, other people seem less inclined to compromise, I think we all have better things to do than turn it into a pitched battle.

Comment 25 Xavi Hernandez 2018-04-25 03:47:52 UTC
I don't want to start any battle. It was requested to create pull requests with the proposed changes and that's what I've done. I've not hidden that some of the changes are debatable so that everyone can give his point of view. If they are not accepted it's ok for me.

Comment 26 Aravinda VK 2018-04-25 04:12:20 UTC
(In reply to Xavi Hernandez from comment #23)
> The change will also include some of the controversial options:
> 
> IndentCaseLabels=true



> SpaceBeforeParens=ControlStatements

+1 for only ControlStatements

> 
> What should we do with them ?

Comment 27 Aravinda VK 2018-04-25 04:21:06 UTC
(In reply to Xavi Hernandez from comment #19)
> I made a mistake on one of the proposed values:
> 
> AlignEscapedNewLinesLeft=false
> 
> What I wanted to propose was to set it to 'true' (which matches the
> explanation I gave).

I prefer Right(`AlignEscapedNewlines=Right`), since Patch will have minimum number of lines change on modification. For example,

#define gf_log(dom, levl, fmt...) do {                          \
                FMT_WARN (fmt);                                 \
                _gf_log (dom, __FILE__, __FUNCTION__, __LINE__, \
                         levl, ##fmt);                          \
        } while (0)

If longest line is changed then patch diff shows all of them as changed lines in case of `AlignEscapedNewLines=Left`

Comment 28 Xavi Hernandez 2018-04-25 04:27:12 UTC
(In reply to Aravinda VK from comment #27)
> I prefer Right(`AlignEscapedNewlines=Right`), since Patch will have minimum
> number of lines change on modification. For example,
> 
> #define gf_log(dom, levl, fmt...) do {                          \
>                 FMT_WARN (fmt);                                 \
>                 _gf_log (dom, __FILE__, __FUNCTION__, __LINE__, \
>                          levl, ##fmt);                          \
>         } while (0)
> 
> If longest line is changed then patch diff shows all of them as changed
> lines in case of `AlignEscapedNewLines=Left`

That makes sense. It's ok for me.

Comment 29 Nigel Babu 2018-04-25 06:04:00 UTC
Um, so I've been accepting the pull requests that come in more or less mostly so we can see what the output looks like. I'm going to wait for 24h for each of the remaining pull requests for comments.

Comment 30 Xavi Hernandez 2018-04-27 06:41:38 UTC
(In reply to Jeff Darcy from comment #24)
> Personally I'm willing to go with the flow on those. I've laid out my
> reasoning, other people seem less inclined to compromise, I think we all
> have better things to do than turn it into a pitched battle.

Besides IndentCaseLabels and SpaceBeforeParens, do you agree with the other changes ? If you are ok with them, I'll update the pull request without the conflicting ones and, if necessary, push them in a future change.

We also need to decide if we prefer right pointer alignment or declarations alignment, since they seem to be incompatible.

Comment 31 Yaniv Kaul 2018-06-26 14:38:04 UTC
(In reply to Nigel Babu from comment #29)
> Um, so I've been accepting the pull requests that come in more or less
> mostly so we can see what the output looks like. I'm going to wait for 24h
> for each of the remaining pull requests for comments.

Nigel, what's the latest on this?

Comment 32 Nigel Babu 2018-06-26 15:10:51 UTC
We were blocked on Fedora builders. We procured them 2 weeks ago and they've had some time to settle in. Infra-wise, we're nearly ready to go. I have a job half complete. I'll try to get the Jenkins job complete this week. Then it's waiting on us agreeing on the exact config file on https://github.com/nigelbabu/clang-format-sample that needs to be then checked in and used in the Jenkins jobs.

Comment 33 Yaniv Kaul 2018-06-27 07:32:05 UTC
(In reply to Nigel Babu from comment #32)
> We were blocked on Fedora builders. We procured them 2 weeks ago and they've
> had some time to settle in. Infra-wise, we're nearly ready to go. I have a
> job half complete. I'll try to get the Jenkins job complete this week. Then
> it's waiting on us agreeing on the exact config file on
> https://github.com/nigelbabu/clang-format-sample that needs to be then
> checked in and used in the Jenkins jobs.

Static code analysis can run on VMs even, but I'm glad to see where are good with resources.

As for agreeing - let's go with small, incremental wins - we can get in whatever everyone (or majority) seem to agree with, and incrementally add more checks in steps.

Comment 34 Nigel Babu 2018-06-28 11:21:41 UTC
Jenkins job is ready: https://review.gluster.org/#/c/20418/

Cannot merge this until we have a config file as it's a voting job.

Yaniv, just to be clear, these machines are still VMs :) We only have 4 physical machines. We spin up VMs on them.

Comment 35 Worker Ant 2018-08-22 07:03:29 UTC
REVIEW: https://review.gluster.org/20892 (clang-format: add the config file) posted (#1) for review on master by Amar Tumballi

Comment 36 Worker Ant 2018-09-12 11:49:17 UTC
COMMIT: https://review.gluster.org/20892 committed in master by "Nigel Babu" <nigelb> with a commit message- clang-format: add the config file

Also update the required documents and scripts to enable clang-format

Change-Id: I73aae6db06c2f732a1779d59a73bc05e28beafba
updates: bz#1564149
Signed-off-by: Amar Tumballi <amarts>

Comment 37 Worker Ant 2018-09-12 13:08:43 UTC
REVIEW: https://review.gluster.org/21164 (template files: revert clang) posted (#1) for review on master by Amar Tumballi

Comment 38 Worker Ant 2018-09-12 13:36:58 UTC
COMMIT: https://review.gluster.org/21164 committed in master by "Amar Tumballi" <amarts> with a commit message- template files: revert clang

Change-Id: If3925191d23afe83cbbdbc3cf0554c0a9c76d043
updates: bz#1564149
Signed-off-by: Amar Tumballi <amarts>

Comment 39 Amar Tumballi 2018-09-12 13:47:58 UTC
This still needs work:

1. Need to identify and rename the 'template' files, which are not directly compiled by gcc. It shouldn't have `.c` as the type. (or `.h` for that matter).

2. clang-format job should still validate only files which are ! contrib.

3. ./rfc.sh changes makes the change on top most patch locally, but doesn't commit it if there are any changes after clang-format. Need to make it commit it automatically before submit.

Comment 40 Worker Ant 2018-09-17 11:51:50 UTC
REVIEW: https://review.gluster.org/21193 (xlators/experimental: move template files to '.c.in' type) posted (#1) for review on master by Amar Tumballi

Comment 41 Nithya Balachandran 2018-09-18 08:42:36 UTC
Is there a clang-format package for RHEL?

Comment 42 Worker Ant 2018-10-05 03:41:19 UTC
REVIEW: https://review.gluster.org/21351 (rfc.sh: fix the error in case clang-format is not present) posted (#1) for review on master by Amar Tumballi

Comment 43 Worker Ant 2018-10-05 10:13:09 UTC
REVIEW: https://review.gluster.org/21193 (xlators/experimental: move template files to '.c.in' type) posted (#5) for review on master by Xavi Hernandez

Comment 44 Shyamsundar 2018-10-23 15:06:59 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 45 Shyamsundar 2019-03-25 16:30:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/


Note You need to log in before you can comment on or make changes to this bug.