repo_stats API

citation_metrics

class repo_stats.citation_metrics.ADSCitations(token, cache_dir)[source]

Class for getting, processing and aggregating citation data from the NASA ADS database for a given set of papers.

Parameters:
  • token (str) – Authorization token for ADS queries

  • cache_dir (str, default=None) – Path to directory that will be populated with caches of citation data

aggregate_citations(bibcode, metric='bibcode, pubdate, pub, author, title')[source]

Get, process and aggregate citation data in ‘metric’ for all papers in ‘bibcode’.

Parameters:
  • bibcode (str or list of str) – Bibcode identifier(s) of the paper(s) being cited, e.g., “2013A&A...558A..33A”

  • metric (str, default="bibcode, pubdate, pub, author, title") – Metrics to return for each citation

Returns:

all_stats – Individual and aggregated citation statistics across all papers in ‘bibcode’

Return type:

dict

get_citations(bib, metric)[source]

Get citation data for a paper with the identifier ‘bib’ by querying the ADS API.

Parameters:
  • bib (str) – Bibcode identifier of the paper being cited, e.g., “2013A&A...558A..33A”

  • metric (str) – Metrics to return for each citation to the paper, e.g. “bibcode, pubdate, pub, author, title”

Returns:

all_cites – For each citation to the paper ‘bib’, a dictionary of ‘metric’ data

Return type:

list of dict
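As a rough illustration of the kind of request involved, the sketch below builds (but does not send) a search query against the public ADS API. It is not taken from the repo_stats source: the helper name build_citation_query, the rows and start paging arguments, and the token placeholder are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public ADS search endpoint.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def build_citation_query(bib, metric, token, rows=200, start=0):
    """Return (url, headers) for one page of citations to the paper `bib`.

    Hypothetical helper: the name and the paging defaults are assumptions.
    """
    # 'metric' arrives as "bibcode, pubdate, ..."; the API expects no spaces.
    fields = ",".join(m.strip() for m in metric.split(","))
    params = {
        "q": f'citations(bibcode:"{bib}")',  # papers that cite `bib`
        "fl": fields,                        # fields returned per citation
        "rows": rows,
        "start": start,
    }
    url = f"{ADS_SEARCH_URL}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


url, headers = build_citation_query(
    "2013A&A...558A..33A",
    "bibcode, pubdate, pub, author, title",
    token="MY_ADS_TOKEN",  # placeholder, not a real token
)
```

The returned pair can be passed to any HTTP client; paging through results would repeat the call with an increasing start value.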

process_citations(citations)[source]

Process (obtain statistics for) citation data in ‘citations’.

Parameters:

citations (list of dict) – Dictionary of data for each citation to the reference paper

Returns:

stats

Citation statistics:
  • ‘cite_all’: total number of citations

  • ‘cite_year’: citations in current year

  • ‘cite_month’: citations in previous month

  • ‘cite_per_year’: citations per year

  • ‘cite_bibcodes’: bibcodes of all citations

Return type:

dict
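The statistics listed above could be computed along the following lines. This is a minimal sketch, not the library's implementation: it assumes each citation dictionary carries ‘bibcode’ and a ‘pubdate’ string beginning with “YYYY-MM”, it interprets ‘cite_per_year’ as a count keyed by year, and the function name summarize_citations and the today argument are hypothetical. The records are made-up placeholders.

```python
from datetime import date


def summarize_citations(citations, today=None):
    """Summarize a list of citation dicts (hypothetical helper)."""
    today = today or date.today()
    years = [int(c["pubdate"][:4]) for c in citations]
    months = [c["pubdate"][:7] for c in citations]

    # Label of the previous calendar month, e.g. "2024-02".
    if today.month == 1:
        prev_month = f"{today.year - 1}-12"
    else:
        prev_month = f"{today.year}-{today.month - 1:02d}"

    per_year = {}
    for year in years:
        per_year[year] = per_year.get(year, 0) + 1

    return {
        "cite_all": len(citations),
        "cite_year": sum(y == today.year for y in years),
        "cite_month": months.count(prev_month),
        "cite_per_year": dict(sorted(per_year.items())),
        "cite_bibcodes": [c["bibcode"] for c in citations],
    }


# Placeholder records, not real ADS data.
example = [
    {"bibcode": "paper-A", "pubdate": "2023-11-00"},
    {"bibcode": "paper-B", "pubdate": "2024-01-00"},
    {"bibcode": "paper-C", "pubdate": "2024-02-00"},
    {"bibcode": "paper-D", "pubdate": "2024-02-00"},
]
stats = summarize_citations(example, today=date(2024, 3, 15))
```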

git_metrics

class repo_stats.git_metrics.GitMetrics(token, repo_owner, repo_name, cache_dir)[source]

Class for getting and processing repository data (commit history, issues, pull requests, contributors) from GitHub for a given repository.

Parameters:
  • token (str) – Authorization token for GitHub queries

  • repo_owner (str) – Owner (or organization) of repository on GitHub

  • repo_name (str) – Name of repository on GitHub

  • cache_dir (str, default=None) – Path to directory that will be populated with caches of git data

get_age(date)[source]

Get the ‘datetime’ age of a string ‘date’.

Parameters:

date (str) – Date string with assumed format “2024-01-01…”

Returns:

age – Age of the item (int if ‘days_since’ is True)

Return type:

‘datetime.timedelta’ instance or int
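A minimal sketch of such an age calculation is shown below. It assumes only the leading “YYYY-MM-DD” part of the string matters, and it exposes ‘days_since’ as a keyword argument, which is a guess based on the return description rather than the documented signature. The name age_of and the now argument are hypothetical.

```python
from datetime import datetime, timedelta


def age_of(date_string, now=None, days_since=False):
    """Return the age of `date_string` as a timedelta, or as whole days."""
    # Keep only "YYYY-MM-DD"; any trailing time or zone text is ignored.
    created = datetime.strptime(date_string[:10], "%Y-%m-%d")
    now = now or datetime.now()
    age = now - created
    return age.days if days_since else age


reference = datetime(2024, 1, 31)
as_delta = age_of("2024-01-01T12:00:00Z", now=reference)
as_days = age_of("2024-01-01T12:00:00Z", now=reference, days_since=True)
```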

get_commits()[source]

Obtain the commit history for a repository with ‘git log’, and parse the output.

Returns:

all_items – A dictionary entry for each commit in the history, including the identifiers below in ‘query’

Return type:

list of dict

get_commits_via_git_log(repo_local_path)[source]

Obtain the commit history for a repository with ‘git log’ and a local copy of the repository; and parse the output.

Parameters:

repo_local_path (str) – Path to local copy of repository

Returns:

  • dates (list of str) – Date of each commit

  • author_commits (dict) – Keys are the authors and the value is a list of the commits they have contributed
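To illustrate the shape of the two return values, the sketch below groups already-captured log lines. It assumes one commit per line in the form “hash|date|author” (for example, as produced by a ‘git log’ pretty format such as “%H|%aI|%an”) and stores commit hashes per author. The line format, the stored values, and the name collect_commits are assumptions for this example, not the library's documented behavior.

```python
def collect_commits(log_lines, sep="|"):
    """Group 'hash|date|author' lines into (dates, author_commits)."""
    dates = []
    author_commits = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log output
        # maxsplit=2 keeps any separator inside the author name intact.
        commit_hash, commit_date, author = line.split(sep, 2)
        dates.append(commit_date)
        author_commits.setdefault(author, []).append(commit_hash)
    return dates, author_commits


# Placeholder log lines, not from a real repository.
lines = [
    "a1b2c3|2024-01-05T09:00:00+00:00|Ada",
    "d4e5f6|2024-01-06T10:30:00+00:00|Bob",
    "",
    "0718ab|2024-02-01T08:15:00+00:00|Ada",
]
dates, author_commits = collect_commits(lines)
```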

get_issues_prs(item_type)[source]

Obtain the issue or pull request history for a GitHub repository by querying the GraphQL API.

Parameters:

item_type (str) – One of [‘issues’, ‘pullRequests’] to obtain the corresponding history

Returns:

all_items – A dictionary entry for each issue or pull request in the history, including the identifiers below in ‘query’

Return type:

list of dict
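For orientation, a paginated GraphQL query for either item type might be assembled as in the sketch below. The selected fields (number, state, createdAt, closedAt, labels) exist on GitHub issues and pull requests, but which fields repo_stats actually requests is not stated here, so this selection, the page size, and the name build_query are assumptions.

```python
def build_query(repo_owner, repo_name, item_type, cursor=None, page_size=100):
    """Return a GraphQL query string for one page of issues or pull requests."""
    if item_type not in ("issues", "pullRequests"):
        raise ValueError("item_type must be 'issues' or 'pullRequests'")
    # Pass the previous page's endCursor to continue where it stopped.
    after = f', after: "{cursor}"' if cursor else ""
    return f"""
    query {{
      repository(owner: "{repo_owner}", name: "{repo_name}") {{
        {item_type}(first: {page_size}{after}) {{
          pageInfo {{ hasNextPage endCursor }}
          nodes {{
            number
            state
            createdAt
            closedAt
            labels(first: 10) {{ nodes {{ name }} }}
          }}
        }}
      }}
    }}
    """


first_page = build_query("some-owner", "some-repo", "pullRequests")
next_page = build_query("some-owner", "some-repo", "issues", cursor="abc")
```

The query string would be sent as the "query" field of a JSON POST to https://api.github.com/graphql with a bearer token, repeating until hasNextPage is false.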

parse_log_line(line)[source]

Break an individual ‘git log’ line ‘line’ into its component parts (commit hash, date, author).

Parameters:

line (str) – A single line of ‘git log’ output

Returns:

parsed – The commit’s hash, date, author

Return type:

list of str
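A minimal sketch of splitting one log line into its parts follows. The “hash|date|author” layout and the ‘|’ separator are assumptions made for illustration; the format that repo_stats actually requests from ‘git log’ is not given here, and the name parse_line is hypothetical.

```python
def parse_line(line, sep="|"):
    """Split one 'hash|date|author' log line into [hash, date, author]."""
    # Limit to two splits so a separator inside the author name survives.
    commit_hash, commit_date, author = line.strip().split(sep, 2)
    return [commit_hash, commit_date, author]


parsed = parse_line("a1b2c3|2024-01-05T09:00:00+00:00|Ada Lovelace\n")
odd_name = parse_line("d4e5f6|2024-01-06|Team|Bot")
```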

process_commits(results, age_recent=90)[source]

Process (obtain statistics for) git commit data.

Parameters:
  • results (list of dict) – A dictionary entry for each commit in the history (see git_metrics.GitMetrics.get_commits)

  • age_recent (int, default=90) – Days before present used to categorize recent commit statistics

Returns:

stats

Commit statistics:
  • ‘age_recent_commit’: the input arg ‘age_recent’

  • ‘unique_authors’: each commit author, their number of commits and index of first commit

  • ‘new_authors’: list of authors with their first commit in ‘age_recent’

  • ‘n_recent_authors’: number of authors with commits in ‘age_recent’

  • ‘authors_per_month’: number of commit authors per month, over time

  • ‘new_authors_per_month’: number of new commit authors per month, over time

  • ‘multi_authors_per_month’: number of commit authors per month with >1 commit that month, over time

Return type:

dict
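The three per-month statistics can be sketched as below. This is an illustration under assumptions, not the library's code: each commit is taken to be a dictionary with hypothetical keys ‘date’ (ISO string) and ‘author’, and the results are returned as plain dictionaries keyed by “YYYY-MM”. The name authors_by_month is also hypothetical.

```python
from collections import defaultdict


def authors_by_month(commits):
    """Count authors, new authors, and multi-commit authors per month."""
    counts = defaultdict(lambda: defaultdict(int))  # month -> author -> commits
    first_month = {}                                # author -> month of first commit
    # ISO date strings sort chronologically when sorted as text.
    for commit in sorted(commits, key=lambda c: c["date"]):
        month = commit["date"][:7]
        counts[month][commit["author"]] += 1
        first_month.setdefault(commit["author"], month)

    months = sorted(counts)
    return {
        "authors_per_month": {m: len(counts[m]) for m in months},
        "new_authors_per_month": {
            m: sum(first == m for first in first_month.values()) for m in months
        },
        "multi_authors_per_month": {
            m: sum(n > 1 for n in counts[m].values()) for m in months
        },
    }


# Placeholder commits, not from a real repository.
commits = [
    {"date": "2024-01-03", "author": "ada"},
    {"date": "2024-01-20", "author": "ada"},
    {"date": "2024-01-25", "author": "bob"},
    {"date": "2024-02-02", "author": "bob"},
    {"date": "2024-02-10", "author": "cy"},
]
monthly = authors_by_month(commits)
```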

process_issues_prs(results, items, labels, age_recent=90)[source]

Process (obtain statistics for) and aggregate issue and pull request data in ‘results’.

Parameters:
  • results (list of dict) – A dictionary entry for each issue or pull request in the history (see git_metrics.GitMetrics.get_issues_prs)

  • items (list of str) – Names for the dictionary entries in the return ‘issues_prs’

  • labels (list of str) – GitHub labels (those added to an issue or pull request) to obtain additional statistics

  • age_recent (int, default=90) – Days before present used to categorize recent issue and pull request statistics

Returns:

issues_prs

Statistics for issues and separately for pull requests:
  • ‘age_recent’: the input arg ‘age_recent’

  • ‘recent_open’: number of items (issues or pull requests) opened in ‘age_recent’

  • ‘recent_close’: number of items closed in ‘age_recent’

  • ‘open_per_month’: number of items opened per month, over time

  • ‘close_per_month’: number of items closed per month, over time

  • ‘label_open’: the input arg ‘labels’ and the number of currently open items with each label

Return type:

list of dict
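As a sketch of how the recent and per-label counts might be derived for one item type, consider the function below. It assumes each item is a dictionary with ‘createdAt’, ‘closedAt’ (None while open) and a flat ‘labels’ list of names; these key names, the flattened label layout, the now argument, and the function name summarize_items are assumptions for the example.

```python
from datetime import datetime, timedelta


def summarize_items(items, labels, age_recent=90, now=None):
    """Count recently opened/closed items and open items per label."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=age_recent)

    def parse(timestamp):
        # Only the leading "YYYY-MM-DD" is used.
        return datetime.strptime(timestamp[:10], "%Y-%m-%d")

    opened = [parse(i["createdAt"]) for i in items]
    closed = [parse(i["closedAt"]) for i in items if i["closedAt"]]
    still_open = [i for i in items if not i["closedAt"]]
    return {
        "age_recent": age_recent,
        "recent_open": sum(d >= cutoff for d in opened),
        "recent_close": sum(d >= cutoff for d in closed),
        "label_open": {
            lab: sum(lab in i["labels"] for i in still_open) for lab in labels
        },
    }


# Placeholder items, not from a real repository.
items = [
    {"createdAt": "2024-05-01T00:00:00Z", "closedAt": None, "labels": ["bug"]},
    {"createdAt": "2023-01-01T00:00:00Z", "closedAt": "2024-04-15T00:00:00Z", "labels": ["bug"]},
    {"createdAt": "2024-04-01T00:00:00Z", "closedAt": "2024-04-02T00:00:00Z", "labels": []},
    {"createdAt": "2022-01-01T00:00:00Z", "closedAt": None, "labels": ["docs"]},
]
summary = summarize_items(items, ["bug", "docs"], now=datetime(2024, 6, 1))
```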

plot

repo_stats.plot.author_time_plot(commit_stats, repo_owner, repo_name, cache_dir, window_avg=7)[source]

Plot repository commit authors over time.

Parameters:
  • commit_stats (dict) – Dictionary including commit statistics. See git_metrics.GitMetrics.process_commits()

  • repo_owner (str) – Owner of repository (for labels)

  • repo_name (str) – Name of repository (for labels and figure savename)

  • cache_dir (str) – Name of directory in which to cache figure

  • window_avg (int, default=7) – Number of months for rolling average of commit data. Enforced to be odd.

Returns:

fig – The generated figure

Return type:

plt.figure instance

repo_stats.plot.citation_plot(cite_stats, repo_name, cache_dir, names=None)[source]

Plot citations to referenced papers over time.

Parameters:
  • cite_stats (dict) – Dictionary including citation statistics. See citation_metrics.ADSCitations.aggregate_citations()

  • repo_name (str) – Name of repository (for labels and figure savename)

  • cache_dir (str) – Name of directory in which to cache figure

  • names (list of str, optional) – Name of referenced papers (for plot legend)

Returns:

fig – The generated figure

Return type:

plt.figure instance

repo_stats.plot.open_issue_pr_plot(issue_pr_stats, repo_name, cache_dir)[source]

Plot a bar chart of a repository’s currently open issues and pull requests.

Parameters:
  • issue_pr_stats (list of dict) – Statistics for issues and pull requests (see git_metrics.GitMetrics.process_issues_prs)

  • repo_name (str) – Name of repository (for labels and figure savename)

  • cache_dir (str) – Name of directory in which to cache figure

Returns:

fig – The generated figure

Return type:

plt.figure instance

repo_stats.plot.issue_pr_time_plot(issue_pr_stats, repo_owner, repo_name, cache_dir, window_avg=7)[source]

Plot a repository’s number of issues and pull requests open and closed over time.

Parameters:
  • issue_pr_stats (list of dict) – Statistics for issues and pull requests (see git_metrics.GitMetrics.process_issues_prs)

  • repo_owner (str) – Owner of repository (for labels)

  • repo_name (str) – Name of repository (for labels and figure savename)

  • cache_dir (str) – Name of directory in which to cache figure

  • window_avg (int, default=7) – Number of months for rolling average of issue and pull request data. Enforced to be odd.

Returns:

fig – The generated figure

Return type:

plt.figure instance

runner

repo_stats.runner.parse_parameters(*args)[source]

Read the repository and citation targets and the analysis parameters from a .json parameter file.

Parameters:

*args (list of str) – Simulates the command line arguments

Returns:

params – Parameters used by the analysis

Return type:

dict

repo_stats.runner.main(*args)[source]

Run the citation and repository statistics analysis.

Parameters:

*args (list of str) – Simulates the command line arguments

user_stats

class repo_stats.user_stats.StatsImage(template_image, font)[source]

Class for updating a template image (e.g. to be displayed in a GitHub README) with repository and citation statistics.

Parameters:
  • template_image (str) – Template image to be updated

  • font (str) – Font file (.ttf) to be used

draw_text(coords, text, text_color=None, font=None, **kwargs)[source]

Convenience wrapper for ‘PIL.ImageDraw.Draw’.

Parameters:
  • coords (tuple of int) – (x,y) coordinates of text location

  • text (str) – Text to be drawn

  • text_color (str, default=None) – Text color

  • font ('PIL.ImageFont' instance, default=None) – Text font

update_image(stats, repo_name, cache_dir)[source]

Update the provided template image with text summarizing repository and citation statistics.

Parameters:
  • stats (dict) – Dictionary of repository and citation statistics to be drawn on the image

  • repo_name (str) – Name of repository on GitHub (for drawn text)

  • cache_dir (str) – Name of directory in which to cache updated image

Returns:

self.img – The provided template image updated with text

Return type:

‘PIL.Image’ instance

utilities

repo_stats.utilities.fill_missed_months(unique_output)[source]

For an output of ‘np.unique(x, return_counts=True)’ where ‘x’ is a list of dates of the format ‘2024-01’, fill in months missing in this list and set their count to 0.

Parameters:

unique_output (tuple of array) – Output of ‘np.unique’

Returns:

unique_output – The input updated with inserted entries for missing months

Return type:

list of array
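The gap-filling step can be sketched in plain Python as follows. Here ordinary lists stand in for the arrays returned by ‘np.unique’, and the name fill_months is hypothetical; the library's own handling of array types may differ.

```python
def fill_months(months, counts):
    """Insert missing 'YYYY-MM' months between the first and last, with count 0."""
    lookup = dict(zip(months, counts))
    year, month = map(int, min(months).split("-"))
    end_year, end_month = map(int, max(months).split("-"))

    filled_months, filled_counts = [], []
    while (year, month) <= (end_year, end_month):
        key = f"{year:04d}-{month:02d}"
        filled_months.append(key)
        filled_counts.append(int(lookup.get(key, 0)))
        # Step to the next calendar month, rolling over the year.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return [filled_months, filled_counts]


filled = fill_months(["2023-11", "2024-02"], [3, 1])
```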

repo_stats.utilities.rolling_average(unaveraged, window)[source]

Obtain a rolling average of ‘unaveraged’ data in a sliding window of index length ‘window’.

Parameters:
  • unaveraged (list) – Data to be averaged

  • window (int) – Width (in indices) of sliding window. Enforced to be odd

Returns:

  • roll_avg (array) – Averaged data

  • window (int) – The input ‘window’, potentially decreased by 1 to make odd
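A minimal sketch of a centered rolling average with the odd-window rule is given below. How the library treats the ends of the series is not documented here, so shrinking the window at the edges is an assumption, as is the name rolling_mean.

```python
def rolling_mean(unaveraged, window):
    """Centered rolling average; an even `window` is decreased by 1."""
    if window % 2 == 0:
        window -= 1  # enforce an odd width so the window has a center
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2

    averaged = []
    for i in range(len(unaveraged)):
        # Near the edges the slice is shorter; average what is available.
        chunk = unaveraged[max(0, i - half): i + half + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged, window


roll_avg, used_window = rolling_mean([1, 2, 3, 4, 5], 4)
```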

repo_stats.utilities.update_cache(cache_file, old_items, new_items)[source]

Update ‘cache_file’ with ‘new_items’ entries, one per line.

Parameters:
  • cache_file (str) – Path to existing ASCII cache file

  • old_items (str or list of str) – Existing cache entries

  • new_items (str or list of str) – New cache entries

Returns:

all_items – Combined ‘old_items’ and ‘new_items’

Return type:

list of str
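The caching behavior described above might look like the sketch below, which appends each new entry on its own line and returns the combined list. Appending rather than rewriting the file is an assumption, and the name append_to_cache is hypothetical.

```python
import os
import tempfile


def append_to_cache(cache_file, old_items, new_items):
    """Append `new_items` to `cache_file`, one per line; return all items."""
    # Accept a single string as a one-element list.
    if isinstance(old_items, str):
        old_items = [old_items]
    if isinstance(new_items, str):
        new_items = [new_items]
    with open(cache_file, "a", encoding="utf-8") as handle:
        for item in new_items:
            handle.write(f"{item}\n")
    return list(old_items) + list(new_items)


# Usage with a throwaway cache file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cache.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first\n")
    all_items = append_to_cache(path, ["first"], "second")
    with open(path, encoding="utf-8") as handle:
        cache_text = handle.read()
```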