You are looking at the user documentation for the most recent master branch of RepoSense (not yet released to the public). The documentation for the latest public release is available separately.
Given below are the details of the various config files used by RepoSense.
RepoSense ignores the order of column headers in the CSV config files. However, the column headers must be spelled exactly as specified; otherwise, RepoSense ignores the column.
Values in a config file are optional unless they are specified as mandatory.
repo-config.csv
The repo-config.csv file contains repo-level config data. Each row represents a repository's configuration (example).
Column Name | Explanation |
---|---|
Repository's Location (mandatory) | The Remote Repo URL or Disk Path to the Git repository, e.g., https://github.com/foo/bar.git or C:\Users\user\Desktop\GitHub\foo\bar |
Branch | The branch to analyze in the target repository, e.g., master. Default: the default branch of the repo |
File formats*+ | The file extensions to analyze. Binary file formats, such as png and jpg, will be automatically labelled as the file type binary in the generated report. Default: all file formats |
Find Previous Authors | Enter yes to use Git blame's ignore-revisions functionality. RepoSense will then attempt to attribute the lines changed by commits in the ignore commit list to the previous authors who altered those lines (if available). |
Ignore Glob List*+ | The list of file path globs to ignore during analysis for each author, e.g., test/**;temp/**. Refer to the glob format for the path glob syntax. |
Ignore standalone config | To ignore the standalone config file (if any) in the target repository, enter yes. If the cell is empty, the standalone config file in the repo (if any) takes precedence over configurations provided in the CSV files. |
Ignore Commits List*+ | The list of commits to ignore during analysis. For accurate results, provide the commits with their full hash. A range of commits can also be specified using the .. notation, e.g., abc123..def456 (both ends inclusive). |
Ignore Authors List*+ | The list of authors to ignore during analysis. Authors should be specified by their Git Author Name. |
Shallow Cloning | Enter yes to clone the repository using Git's shallow cloning functionality. This option can significantly reduce the time taken to clone large repositories. However, it should ideally be left disabled for smaller repositories whose .git directory is smaller than 500 MB, as it would create overhead. |
File Size Limit+ | Enter a file size limit for the repository in bytes, as a single number without units (e.g., for a size limit of 1 MB, enter 1000000). This limit overrides the default file size limit (500 KB). Files exceeding the limit will be marked as ignored, and only their file name and line count will be reflected in the report. |
Ignore File Size Limit | Enter yes to ignore both the default file size limit and any file size limit set by the user in repo-config.csv. |
Skip Ignored File Analysis | Enter yes to skip the analysis of files exceeding the file size limit entirely. If file analysis is skipped, all information about these files will be omitted from the generated report. This option can significantly improve report generation time. |
* Multi-value column: multiple values can be entered in this column using a semicolon (;) as the separator.
+ Overridable column: prepend the entry with override: to use the entered value(s) instead of the value(s) from the standalone config.
When a standalone config is used (i.e., it is not ignored), you can override specific values from it by prepending the prefix override: to the entered value.
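To illustrate the two footnotes and the file size columns above, here is a minimal Python sketch. It is not RepoSense's implementation. The helper names (read_rows, parse_cell, file_treatment) and the sample CSV are hypothetical. The sketch also rests on three assumptions: override: is checked once at the start of the whole cell, the default 500 KB limit means 500,000 bytes (matching the 1 MB = 1000000 convention), and a file "exceeds" the limit only when it is strictly larger.

```python
import csv
import io

OVERRIDE_PREFIX = "override:"
DEFAULT_FILE_SIZE_LIMIT = 500_000  # assumption: 500 KB read as 500,000 bytes

# Hypothetical repo-config.csv content; header order is deliberately shuffled.
SAMPLE_CSV = """Branch,Repository's Location,Ignore Glob List,File formats
master,https://github.com/foo/bar.git,override:test/**;temp/**,java;md
"""


def read_rows(text):
    """Return the rows as dicts keyed by header, so column order does not matter."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_cell(cell):
    """Split a multi-value cell on ';' and report whether it carries override:."""
    cell = (cell or "").strip()
    is_override = cell.startswith(OVERRIDE_PREFIX)
    if is_override:
        cell = cell[len(OVERRIDE_PREFIX):]
    values = [v.strip() for v in cell.split(";") if v.strip()]
    return values, is_override


def file_treatment(size_bytes, limit=None, ignore_limit=False, skip_ignored=False):
    """Classify a file per the File Size Limit / Ignore / Skip columns (sketch)."""
    if ignore_limit:
        return "analyzed"  # no limit applies at all
    effective_limit = DEFAULT_FILE_SIZE_LIMIT if limit is None else limit
    if size_bytes <= effective_limit:
        return "analyzed"
    # Over the limit: dropped from the report entirely, or kept as name + line count.
    return "omitted" if skip_ignored else "ignored"
```

For the sample row, parse_cell(row["Ignore Glob List"]) yields (["test/**", "temp/**"], True), and parse_cell(row["File formats"]) yields (["java", "md"], False).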
author-config.csv
Optionally, you can use an author-config.csv file (which should be in the same directory as the repo-config.csv file) to provide more details about the authors to analyze (example). It should contain the following columns:
Column Name | Explanation |
---|---|
Repository's Location* | Same as repo-config.csv. Default: all the repos in repo-config.csv. |
Branch | The branch to analyze for this author, e.g., master. Default: the author will be bound to all the repos in repo-config.csv that have the same repo's location, regardless of branch. |
Author's Git Host ID# (mandatory) | Username of the target author's profile on GitHub, GitLab or Bitbucket, e.g., JohnDoe. |
Author's Emails* | Associated emails of the author. For GitHub users, these can be found in your GitHub settings. |
Author's Display Name | The name to display for the author. Default: author's username. |
Author's Git Author Name* | The meaning of Git Author Name is explained in A note about Git author name. |
Ignore Glob List* | Files to ignore for this author, in addition to files ignored by the patterns specified in repo-config.csv. The path glob syntax is the same as that of Ignore Glob List in repo-config.csv. |
* Multi-value column: multiple values can be entered in this column using a semicolon (;) as the separator.
# For backward compatibility, Author's GitHub ID is still accepted as the header in place of Author's Git Host ID.
If author-config.csv is not given and the repo has not provided author details in a standalone config file, all the authors of the repositories within the specified date range (if any) will be analyzed.
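To show how the author-config.csv columns relate to one another, here is a minimal Python sketch. It maps every email and Git Author Name of an author to that author's Git host ID. The helper build_author_index and the sample rows are hypothetical, and RepoSense's actual matching logic may differ. The sketch only follows the defaults stated above: the display name falls back to the username, and the username is assumed to double as a Git Author Name. It also accepts the older Author's GitHub ID header.

```python
import csv
import io

# Hypothetical author-config.csv using the legacy "Author's GitHub ID" header.
SAMPLE_CSV = """Author's GitHub ID,Author's Emails,Author's Display Name,Author's Git Author Name
alice,alice@example.com;alicet@example.com,Alice T.,AT;A
bob,,,
"""


def split_multi(cell):
    """Split a semicolon-separated multi-value cell, dropping blank entries."""
    return [v.strip() for v in (cell or "").split(";") if v.strip()]


def build_author_index(text):
    """Return (identity -> Git host ID, Git host ID -> display name)."""
    index, display = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        # Prefer the current header, falling back to the legacy one.
        git_id = (row.get("Author's Git Host ID") or row.get("Author's GitHub ID") or "").strip()
        if not git_id:
            raise ValueError("Author's Git Host ID is mandatory")
        # Default display name: the author's username.
        display[git_id] = (row.get("Author's Display Name") or "").strip() or git_id
        index[git_id] = git_id  # assumption: username doubles as a Git Author Name
        identities = split_multi(row.get("Author's Emails")) + split_multi(
            row.get("Author's Git Author Name")
        )
        for identity in identities:
            index[identity] = git_id
    return index, display
```

With the sample rows, a commit authored as AT or from alicet@example.com resolves to alice, and bob is displayed under his username.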
author-config.csv has some advanced syntax that can help reduce duplication of an author's information. The syntax guide is given below:
group-config.csv
Optionally, you can provide a group-config.csv file (which should be in the same directory as the repo-config.csv file) to provide details on any custom groupings for files in specified repositories (example). It should contain the following columns:
Column Name | Explanation |
---|---|
Repository's Location | Same as repo-config.csv. Default: all the repos in repo-config.csv. |
Group Name (mandatory) | Name of the group, e.g., test. |
Globs* (mandatory) | The list of file path globs to include in the specified group, e.g., **/test/*;**.java. |
* Multi-value column: multiple values can be entered in this column using a semicolon (;) as the separator.
Note that a file in a given repository should be tagged to only one group. For example, example.java in example-repo can be in either the test group or the code group, but not in both. If multiple groups are specified for a given file, the latter group (i.e., the code group) is set for the file.
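The "latter group wins" rule can be sketched in Python as follows. This is a simplified illustration, not RepoSense's matcher. The hypothetical helper glob_to_regex only approximates the glob format: ** may cross / while * and ? may not. The sketch also assumes that groups are checked in the order their rows appear in group-config.csv.

```python
import re


def glob_to_regex(glob):
    """Simplified glob translation (assumption): ** crosses '/', * and ? do not."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            out.append(".*")
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def assign_group(path, groups):
    """Return the last group (in row order) whose globs match path, else None."""
    assigned = None
    for name, globs in groups:
        if any(glob_to_regex(g).match(path) for g in globs):
            assigned = name  # a later match overwrites an earlier one
    return assigned
```

With groups = [("test", ["**/test/*"]), ("code", ["**.java"])], the file src/test/Example.java matches both groups and ends up in code, while src/test/data.txt stays in test.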
report-config.json
You can optionally use report-config.json to customize report generation by providing the following information (example).
Fields to provide:
title: Title of the generated report, which is also the title of the deployed dashboard. Default: "RepoSense Report"
config.json (standalone config file)
Repo owners can provide the following additional information to RepoSense using a config file that we call the standalone config file:
To use this feature, add a _reposense/config.json file to the root of your repo using the format in the example below (another example) and commit it (reason: RepoSense can see committed code only):
{
  "ignoreGlobList": ["about-us/**", "**index.html"],
  "formats": ["html", "css"],
  "ignoreCommitList": ["90018e49f129ce7e0abdc8b18e91c9813588c601", "67890def", "abc123..def456"],
  "ignoreAuthorList": ["charlie"],
  "authors": [
    {
      "gitId": "alice",
      "emails": ["alice@example.com", "alicet@example.com"],
      "displayName": "Alice T.",
      "authorNames": ["AT", "A"],
      "ignoreGlobList": ["**.css"]
    },
    {
      "gitId": "bob"
    }
  ]
}
Note: all fields are optional unless specified otherwise.
Fields to provide repository-level info:
ignoreGlobList: Folders/files to ignore, specified using the glob format.
formats: File formats to analyze. Binary file formats, such as png and jpg, will be automatically labelled as the file type binary in the generated report. Default: all file formats
ignoreCommitList: The list of commits to ignore during analysis. For accurate results, the commits should be provided with their full hash. A range of commits can also be specified using the .. notation, e.g., abc123..def456 (both ends inclusive).
ignoreAuthorList: The list of authors to ignore during analysis. Authors specified in the authors field or in author-config.csv will also be omitted if they are in this list. Authors should be specified by their Git Author Name.
fileSizeLimit: A file size limit for the repository in bytes, as a single number without units, that overrides the default file size limit. If not specified, the default file size limit will continue to be used.
Fields to provide author-level info:
Note: the authors field should contain all authors that should be captured in the analysis.
gitId: Username of the author. Mandatory field.
emails: Associated Git emails of the author. For GitHub, these can be found in your GitHub settings.
displayName: Name to display on the report for this author.
authorNames: Git Author Name(s) used in the author's commits. By default, RepoSense assumes that an author uses their remote Git Host username as their Git Author Name too. The meaning of Git Author Name is explained in A note about Git author name.
ignoreGlobList: Additional (i.e., on top of the repo-level ignoreGlobList) folders/files to ignore for a specific author. The path glob syntax is specified by the glob format. In the example above, the actual ignoreGlobList for alice would be ["about-us/**", "**index.html", "**.css"].
To verify that your standalone configuration is as intended, add the _reposense/config.json file to your local copy of the repo and run RepoSense against it as follows:
java -jar RepoSense.jar --repo LOCAL_REPO_LOCATION
java -jar RepoSense.jar --repo c:/myRepose/foo/bar
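The effect of the repo-level and author-level fields in the example config.json can be reproduced with the minimal Python sketch below. The function names are hypothetical. The sketch assumes that an author is dropped when any of their Git Author Names appears in ignoreAuthorList. It also treats the gitId as one of those names, in line with the default described under authorNames. RepoSense's own handling may differ in detail.

```python
import json

# The example _reposense/config.json above, trimmed to the fields used here.
CONFIG = json.loads("""
{
  "ignoreGlobList": ["about-us/**", "**index.html"],
  "ignoreAuthorList": ["charlie"],
  "authors": [
    {"gitId": "alice", "authorNames": ["AT", "A"], "ignoreGlobList": ["**.css"]},
    {"gitId": "bob"}
  ]
}
""")


def effective_ignore_globs(config):
    """Map each gitId to the repo-level globs followed by the author's own globs."""
    repo_globs = config.get("ignoreGlobList", [])
    result = {}
    for author in config.get("authors", []):
        if "gitId" not in author:
            raise ValueError("gitId is a mandatory field")
        result[author["gitId"]] = repo_globs + author.get("ignoreGlobList", [])
    return result


def analyzed_authors(config):
    """Return gitIds of authors none of whose Git Author Names are ignored."""
    ignored = set(config.get("ignoreAuthorList", []))
    kept = []
    for author in config.get("authors", []):
        # Assumption: the gitId counts as a Git Author Name alongside authorNames.
        names = [author["gitId"]] + author.get("authorNames", [])
        if ignored.isdisjoint(names):
            kept.append(author["gitId"])
    return kept
```

For the example, effective_ignore_globs(CONFIG)["alice"] reproduces the list stated above. Both alice and bob survive because charlie is not among their names.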
A note about Git author name
Git Author Name refers to the customizable author name set in the local .gitconfig file. For example, in the Git log output:
...
commit cd7f610e0becbdf331d5231887d8010a689f87c7
Author: ConfiguredAuthorName <author@example.com>
Date: Fri Feb 9 19:14:41 2018 +0800
Make some changes to show my new author's name
commit e3f699fd4ef128eebce98d5b4e5b3bb06a512f49
Author: ActualGitHostId <author@example.com>
Date: Fri Feb 9 19:13:13 2018 +0800
Initial commit
...
ActualGitHostId and ConfiguredAuthorName are both Git Author Names of the same author.
To find the author name that you are currently using for your current Git repository, run the following command within your Git repository:
git config user.name
To set the author name to the value you want (e.g., your GitHub username) for your current Git repository, you can use the following command (more info):
git config user.name "YOUR_AUTHOR_NAME"
To set a default author name for all your Git repositories (a repository-level name set with the previous command takes precedence), you can use the following command:
git config --global user.name "YOUR_AUTHOR_NAME"
RepoSense expects the Git Author Name to be the same as the author's username on the Git hosting platform (GitHub, GitLab, Bitbucket). If an author's Git Author Name is different from their username on the Git hosting platform, the Git Author Name needs to be specified in the standalone config file (or in author-config.csv). If the author has more than one Git Author Name, multiple values can be entered.
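To find every Git Author Name (and email) that has been used in a repository, a small Python sketch can run git log and de-duplicate its output. You can then list the results under authorNames or emails. The helper names are hypothetical. Only the git log --format placeholders are standard Git: %an (author name), %ae (author email) and %x09 (a tab character).

```python
import subprocess


def parse_identities(log_output):
    """Turn 'name<TAB>email' lines into a sorted list of unique (name, email) pairs."""
    pairs = set()
    for line in log_output.splitlines():
        if "\t" in line:
            name, email = line.split("\t", 1)
            pairs.add((name, email))
    return sorted(pairs)


def author_identities(repo_path="."):
    """Run git log in repo_path and return the distinct author identities it used."""
    result = subprocess.run(
        ["git", "log", "--format=%an%x09%ae"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_identities(result.stdout)
```

For the Git log shown earlier, this would list both ActualGitHostId and ConfiguredAuthorName with author@example.com. Both names would then go into that author's authorNames.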
Note: Symbols such as ", ! and / in your author name will be omitted, which may reduce the accuracy of the analysis if two names in the repository are very similar.
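The accuracy concern can be seen with a small Python sketch. The exact set of symbols RepoSense omits is not listed here, so the sketch assumes only the three examples given (", ! and /). normalize_author_name and colliding_names are hypothetical helpers.

```python
# Assumption: only the symbols named in the note are stripped.
OMITTED_SYMBOLS = '"!/'


def normalize_author_name(name):
    """Drop the assumed symbols from a Git Author Name."""
    return "".join(ch for ch in name if ch not in OMITTED_SYMBOLS)


def colliding_names(names):
    """Group distinct raw names that become identical once symbols are omitted."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_author_name(name), set()).add(name)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}
```

For instance, John/Doe and JohnDoe both normalize to JohnDoe, so their commits could no longer be told apart.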
blurbs.md
You can optionally use blurbs.md to add blurbs in Markdown syntax for repository branches. These blurbs will be seen when grouping by Repo/Branch (example).
Format:
Delimiter: <!--repo-->. Everything on the line after the delimiter will be ignored.