Figure 1. Overall architecture of RepoSense


Parser contains three components:

  • ArgsParser: Parses the user-supplied command line arguments into a CliArguments object. RunConfigurationDecider then selects the RunConfiguration appropriate for the CliArguments, and that RunConfiguration generates the corresponding config files.
  • CsvParser: Abstract generic class for CSV parsing functionality. The following three classes extend CsvParser.
    • AuthorConfigCsvParser: Parses the author-config.csv config file into a list of AuthorConfiguration for each repository to analyze.
    • GroupConfigCsvParser: Parses the group-config.csv config file into a list of GroupConfiguration for each repository to analyze.
    • RepoConfigCsvParser: Parses the repo-config.csv config file into a list of RepoConfiguration for each repository to analyze.
  • JsonParser: Abstract generic class for JSON parsing functionality. The following class extends JsonParser:

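As a rough illustration of the abstract generic CsvParser design described above, the sketch below uses the template-method pattern: the base class walks the rows and each subclass only maps one row to one configuration object. All names here (CsvParserSketch, RepoConfigSketch, RepoConfigCsvParserSketch, processLine) are hypothetical, the two-column layout is an assumption, and the naive comma split does not handle quoted fields, so this is not RepoSense's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an abstract generic CSV parser (template method):
// the base class iterates over rows; subclasses map one row to one object.
abstract class CsvParserSketch<T> {

    // Parses every non-blank line after the header row into a T.
    public List<T> parse(List<String> lines) {
        List<T> results = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header row
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            // Naive split: does not handle commas inside quoted fields.
            String[] fields = line.split(",", -1);
            results.add(processLine(fields));
        }
        return results;
    }

    // Each concrete parser decides how the columns of one row are interpreted.
    protected abstract T processLine(String[] fields);
}

// Hypothetical two-column configuration: repository location and branch.
record RepoConfigSketch(String location, String branch) {}

class RepoConfigCsvParserSketch extends CsvParserSketch<RepoConfigSketch> {
    @Override
    protected RepoConfigSketch processLine(String[] fields) {
        // An empty string means "branch not specified" in this sketch.
        String branch = fields.length > 1 ? fields[1].trim() : "";
        return new RepoConfigSketch(fields[0].trim(), branch);
    }
}
```

Adding support for another CSV file in this style only requires a new subclass with its own processLine; the row iteration stays in one place.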

The Git package contains wrapper classes for the respective git commands.

  • GitBlame: Wrapper class for git blame functionality. Traces the revision and author that last modified each line of a file.
  • GitBranch: Wrapper class for git branch functionality. Gets the name of the working branch of the target repo.
  • GitCatFile: Wrapper class for git cat-file functionality. Obtains the hash of the parent commit of the commit identified by the given commit hash.
  • GitCheckout: Wrapper class for git checkout functionality. Checks out the repository by branch name or commit hash.
  • GitClone: Wrapper class for git clone functionality. Clones the repository from the given URL or local directory into a temporary folder in order to run the analysis.
  • GitDiff: Wrapper class for git diff functionality. Obtains the changes between commits.
  • GitLog: Wrapper class for git log functionality. Obtains the commit logs and the authors' info.
  • GitRevList: Wrapper class for git rev-list functionality. Retrieves the commit objects in reverse chronological order.
  • GitRevParse: Wrapper class for git rev-parse functionality. Ensures that the branch of the repo to be analyzed exists.
  • GitShortlog: Wrapper class for git shortlog functionality. Obtains the list of authors who have contributed to the target repo.
  • GitShow: Wrapper class for git show functionality. Gets the date of the commit with the commit hash.
  • GitUtil: Contains helper functions used by the other Git classes above.
  • GitVersion: Wrapper class for git --version functionality. Obtains the current Git version of the environment that RepoSense is being run on.
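The wrappers above share one pattern: each owns a single git command and converts that command's raw text output into Java values. A minimal sketch of the pattern for `git --version` follows. The class name, the regular expression, and the `isAtLeast` helper are hypothetical, and the sketch only parses an output string rather than running git.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper sketch: owns one git command and parses its output.
final class GitVersionSketch {
    static final String COMMAND = "git --version";

    // Matches e.g. "git version 2.39.2 (Apple Git-143)" or "git version 2.41.0.windows.1".
    private static final Pattern VERSION = Pattern.compile("git version (\\d+)\\.(\\d+)\\.(\\d+)");

    private GitVersionSketch() {}

    // Returns {major, minor, patch}; throws if the output is not recognised.
    static int[] parse(String output) {
        Matcher m = VERSION.matcher(output.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised git --version output: " + output);
        }
        return new int[] {
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))
        };
    }

    // True if the reported version is at least major.minor.
    static boolean isAtLeast(String output, int major, int minor) {
        int[] v = parse(output);
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }
}
```

A caller could run COMMAND, pass its output to `isAtLeast`, and stop early on an unsupported installation; the minimum version to check against would be whatever the tool actually requires, which this sketch does not state.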

Note that when constructing new commands containing path arguments, use the StringsUtil::addQuotesForFilePath method to safely convert a Java string into an equivalent Bash/CMD argument.
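One common way to do this kind of quoting is sketched below. It is a hypothetical stand-in, not the actual body of StringsUtil::addQuotesForFilePath: for Bash it wraps the path in single quotes and rewrites each embedded single quote as `'\''`; for CMD it wraps the path in double quotes, relying on the fact that Windows file names cannot contain `"`.

```java
// Hypothetical sketch, not RepoSense's StringsUtil::addQuotesForFilePath.
final class QuoteSketch {
    private QuoteSketch() {}

    static String quoteForBash(String path) {
        // Inside single quotes Bash treats every character literally, so only the
        // single quote itself needs care: close the string, emit an escaped quote,
        // and reopen it ('\'').
        return "'" + path.replace("'", "'\\''") + "'";
    }

    static String quoteForCmd(String path) {
        // Windows file names cannot contain double quotes, so wrapping the path in
        // them protects spaces and characters such as & and |. This sketch does not
        // handle %VAR% expansion, which cmd.exe performs even inside quotes.
        return "\"" + path + "\"";
    }

    static String quote(String path, boolean isWindows) {
        return isWindows ? quoteForCmd(path) : quoteForBash(path);
    }
}
```

For example, `"git blame " + QuoteSketch.quote(path, isWindows)` keeps a path such as `it's here.txt` as a single argument on both shells.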


CommitsReporter is responsible for analyzing the commit history and generating a CommitContributionSummary for each repository. CommitContributionSummary contains information such as each author's daily and weekly contributions and the variance of those contributions. CommitsReporter:

  1. uses CommitInfoExtractor to run the git log command, which generates each commit's statistics within the date range.
  2. generates a CommitInfo for each commit, which contains the infoLine and statLine.
  3. uses CommitInfoAnalyzer to extract the relevant data from each CommitInfo, such as the number of line insertions and deletions in the commit and the author of the commit, into a CommitResult.
  4. uses CommitResultAggregator to aggregate all CommitResult into a CommitContributionSummary.
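The aggregation in step 4 can be pictured with the following sketch, which sums insertions and deletions per author per day and computes the variance of those daily totals. CommitResultSketch, CommitAggregatorSketch, and their methods are hypothetical simplifications (the real CommitResult carries an Author object rather than a string), and the exact contribution and variance formulas RepoSense uses are not stated in this section.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for CommitResult.
record CommitResultSketch(String author, LocalDate date, int insertions, int deletions) {}

final class CommitAggregatorSketch {
    private CommitAggregatorSketch() {}

    // Sums insertions + deletions per author per day (author -> date -> lines).
    static Map<String, Map<LocalDate, Integer>> dailyContributions(List<CommitResultSketch> commits) {
        Map<String, Map<LocalDate, Integer>> result = new TreeMap<>();
        for (CommitResultSketch c : commits) {
            result.computeIfAbsent(c.author(), a -> new TreeMap<>())
                    .merge(c.date(), c.insertions() + c.deletions(), Integer::sum);
        }
        return result;
    }

    // Population variance of one author's daily totals; days without commits
    // are not counted in this sketch.
    static double variance(Map<LocalDate, Integer> daily) {
        double mean = daily.values().stream().mapToInt(Integer::intValue).average().orElse(0);
        return daily.values().stream().mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
    }
}
```

Weekly totals could be derived the same way by keying on the start of each week instead of on the commit date.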


AuthorshipReporter is responsible for analyzing the whitelisted files, tracing the original author of each line of text/code, and generating an AuthorshipSummary for each repository. AuthorshipSummary contains the analysis results of the whitelisted files and the number of line contributions each author made. AuthorshipReporter:

  1. uses FileInfoExtractor to traverse the repository to find all relevant files.
  2. generates a FileInfo for each relevant file, which contains the path to the file and a list of LineInfo representing each line of the file.
  3. uses FileInfoAnalyzer to analyze each file, using git blame or annotations, and to find the Author of each LineInfo.
  4. generates a FileResult for each file, which consolidates the authorship results into a Map of each author's line contribution to the file.
  5. uses FileResultAggregator to aggregate all FileResult into an AuthorshipSummary.
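Steps 4 and 5 amount to counting lines per author within each file and then summing those counts across files. A minimal sketch, assuming each line's author has already been resolved (for example by git blame); FileResultSketch and FileResultAggregatorSketch are hypothetical names and use plain strings instead of Author objects:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FileResult: a path plus author -> lines owned.
record FileResultSketch(String path, Map<String, Integer> authorLineCount) {}

final class FileResultAggregatorSketch {
    private FileResultAggregatorSketch() {}

    // Step 4: turns the per-line authors of one file into per-author line counts.
    static FileResultSketch fromLineAuthors(String path, List<String> lineAuthors) {
        Map<String, Integer> counts = new HashMap<>();
        for (String author : lineAuthors) {
            counts.merge(author, 1, Integer::sum);
        }
        return new FileResultSketch(path, counts);
    }

    // Step 5: sums each author's line contributions across all files of a repository.
    static Map<String, Integer> aggregate(List<FileResultSketch> files) {
        Map<String, Integer> totals = new HashMap<>();
        for (FileResultSketch file : files) {
            file.authorLineCount().forEach((author, lines) -> totals.merge(author, lines, Integer::sum));
        }
        return totals;
    }
}
```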


ReportGenerator:

  1. clones repositories using the GitClone API in a multi-threaded fashion.
    • By default, 4 threads are used for cloning; the number of threads can be specified using the CLI argument --cloning-threads <threads>.
  2. analyzes the repositories using CommitsReporter and AuthorshipReporter in a multi-threaded fashion.
    • First, copies the template files into the designated output directory.
    • Then, uses CommitsReporter and AuthorshipReporter to produce the commit and authorship summaries, respectively.
    • By default, the number of threads used for analysis is equal to the number of CPU cores available; the number of threads can be specified using the CLI argument --analysis-threads <threads>.
  3. generates the JSON files needed to generate the HTML report.
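The two multi-threaded phases can be sketched with fixed-size thread pools from java.util.concurrent, using the defaults described above: 4 threads for cloning and one analysis thread per available processor. The class and method names, and the cloneRepo/analyzeRepo functions passed in, are hypothetical; this sketch also runs the phases strictly one after the other, which may not match how RepoSense actually schedules them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the clone-then-analyze flow with fixed thread pools.
final class TwoPhasePipelineSketch {
    // Defaults mirroring the text: 4 cloning threads, one analysis thread per core.
    static final int DEFAULT_CLONING_THREADS = 4;
    static final int DEFAULT_ANALYSIS_THREADS = Runtime.getRuntime().availableProcessors();

    private TwoPhasePipelineSketch() {}

    // Applies task to every input on a pool of `threads` threads; results keep input order.
    static <I, O> List<O> runInPool(List<I> inputs, int threads, Function<I, O> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(pool.submit(() -> task.apply(input)));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for a task", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("A task failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // lets submitted tasks finish, accepts no new ones
        }
    }

    // Phase 1 clones every repository; phase 2 analyzes the cloned directories.
    static <R> List<R> cloneThenAnalyze(List<String> repoUrls, int cloningThreads, int analysisThreads,
            Function<String, String> cloneRepo, Function<String, R> analyzeRepo) {
        List<String> clonedDirs = runInPool(repoUrls, cloningThreads, cloneRepo);
        return runInPool(clonedDirs, analysisThreads, analyzeRepo);
    }
}
```

Keeping the two pool sizes separate reflects the different bottlenecks: cloning is mostly network-bound, so a small fixed pool suffices, while analysis is CPU-bound and scales with the number of cores.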


System contains the classes that interact with the operating system and external processes.

  • CommandRunner creates processes that execute commands on the terminal. Many of the commands it executes are git commands.
  • LogsManager uses the java.util.logging package for logging. The LogsManager class manages the logging levels and logging destinations. Log messages are output to the console and to a .log file.
  • ReportServer starts a server to display the report in the browser. It depends on the net.freeutils.httpserver package.
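A minimal sketch of what a CommandRunner-style helper can look like with java.lang.ProcessBuilder. The class and method names are hypothetical, and running commands through `sh -c` on Unix-like systems and `cmd.exe /c` on Windows is an assumption about the shell, not a description of RepoSense's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical sketch of a command runner built on ProcessBuilder.
final class CommandRunnerSketch {
    private static final boolean IS_WINDOWS =
            System.getProperty("os.name").toLowerCase().contains("win");

    private CommandRunnerSketch() {}

    // Runs one command line in workingDir and returns its combined output.
    static String runCommand(Path workingDir, String command) {
        ProcessBuilder pb = IS_WINDOWS
                ? new ProcessBuilder("cmd.exe", "/c", command)
                : new ProcessBuilder("sh", "-c", command);
        pb.directory(workingDir.toFile());
        // Merge stderr into stdout so a single read drains both and the child
        // cannot block on a full stderr pipe.
        pb.redirectErrorStream(true);
        try {
            Process process = pb.start();
            String output;
            try (InputStream in = process.getInputStream()) {
                output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Command failed (" + exitCode + "): " + command + "\n" + output);
            }
            return output;
        } catch (IOException e) {
            throw new IllegalStateException("Could not start: " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted: " + command, e);
        }
    }
}
```

Merging the streams keeps the sketch short; a runner whose output is parsed (for example git blame output) would more likely read stderr separately so that warnings do not end up in the parsed text.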


Model holds the data structures that are commonly used by the different aspects of RepoSense.

  • Author stores the Git ID of an author. Any contributions or commits made by the author, using his/her Git ID or aliases, will be attributed to the same Author object. AuthorshipReporter and CommitsReporter use it to attribute the commit and file contributions to the respective authors.
  • CliArguments stores the parsed command-line arguments supplied by the user. It contains the configuration settings such as the CSV config file to read from, the directory to output the report to, and the date range of commits to analyze. These configuration settings are passed into RepoConfiguration.
  • FileTypeManager stores the file formats to be analyzed and the custom groups specified by the user for any repository.
  • RepoConfiguration stores the configuration information from the CSV config file for a single repository: the repository's organization, name, branch, and list of authors to analyze, together with the date range, taken from CliArguments, within which commits and files are analyzed. This configuration information is used by:
    • GitClone to determine the location to clone the repository from and the branch to check out.
    • AuthorshipReporter and CommitsReporter to determine the range of commits and files to analyze.
    • ReportGenerator to determine the directory to output the report.
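The alias behaviour described for Author can be sketched as a lookup table in which the Git ID and every alias map to one shared object, so contributions made under any of those names are credited to the same author. AuthorSketch and AuthorRegistrySketch are hypothetical names, and exact, case-sensitive matching is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified Author: a canonical Git ID plus its aliases.
final class AuthorSketch {
    private final String gitId;
    private final List<String> aliases;

    AuthorSketch(String gitId, List<String> aliases) {
        this.gitId = gitId;
        this.aliases = List.copyOf(aliases);
    }

    String getGitId() {
        return gitId;
    }

    List<String> getAliases() {
        return aliases;
    }
}

// Maps every known name (Git ID or alias) to one shared AuthorSketch instance.
final class AuthorRegistrySketch {
    private final Map<String, AuthorSketch> byName = new HashMap<>();

    void register(AuthorSketch author) {
        byName.put(author.getGitId(), author);
        for (String alias : author.getAliases()) {
            byName.put(alias, author);
        }
    }

    // Exact, case-sensitive lookup; empty if the name belongs to no registered author.
    Optional<AuthorSketch> resolve(String nameSeenInGit) {
        return Optional.ofNullable(byName.get(nameSeenInGit));
    }
}
```

Because every name resolves to the same instance, both reporters can key their per-author counts on the AuthorSketch object itself.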