Scripts as modules

When I write R or bash scripts I usually write them following the “scripts as modules” idiom. This technique is very common in Python, but I rarely see it in R or bash.

The main advantages of writing scripts as modules are that it makes code reuse, testing and development easier. It also encourages organizing your script into functions, which pays off in maintainability.

Script or module? Both!

As the name suggests, the idiom lets you write a script file that also works as a module. By module I mean a file that can be loaded (imported, source()-d, etc.) into an interpreter as a collection of functions, classes, etc. without running the actual script.

I write the script as a collection of functions. The functions include a main() function. main() parses the command line arguments and then carries out the work the file is supposed to do when running as a script. After the functions I include a small piece of code that checks if the file is running as a script or not. If yes, then I call the main() function.

If this file is loaded (imported, sourced, etc.) into a REPL or another script, then main() does not run, but all functions and classes defined in the file are available to use.

Some history

I learned this idiom from Python, where it is commonly used and is called the __name__ == "__main__" idiom, because the extra code at the end of the file that checks whether it is being run as a script looks like this:

    if __name__ == "__main__":
        main()

It looks slightly different in other languages. Some examples from a Stack Overflow question:

Ruby
    if __FILE__ == $0
      main()
    end

Perl
    main() unless caller;

It turns out that the idea is at least 20 years old, possibly older.

Shell scripts

In bash, there is no clear consensus about the best way to decide whether a file is being source-d or run as a script. There are a lot of good alternatives. The simplest one is perhaps:

Bash
    if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
        main "$@"
    fi
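To make the pieces concrete, here is a minimal sketch of a bash script laid out this way. The file name greet.sh and the greet function are made up for illustration. The snippet writes the script to a temporary directory, then uses it once as a script and once as a sourced module.

```shell
# Write a hypothetical script, greet.sh, to a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/greet.sh" <<'EOF'
#!/usr/bin/env bash

# A reusable function, available to anything that sources this file.
greet() {
    printf 'Hello, %s!\n' "${1:-world}"
}

# Entry point, used only when the file runs as a script.
main() {
    greet "$@"
}

# Call main only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi
EOF

# 1. Run it as a script: main is called with the command line arguments.
run_output=$(bash "$tmpdir/greet.sh" script)

# 2. Source it as a module: main does not run, so we call greet ourselves.
sourced_output=$(bash -c 'source "$1"; greet module' _ "$tmpdir/greet.sh")

echo "$run_output"      # Hello, script!
echo "$sourced_output"  # Hello, module!
rm -rf "$tmpdir"
```

In the second case $0 is the placeholder "_" while BASH_SOURCE[0] is the path of greet.sh, so the comparison fails and main is skipped.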

On the other hand, if I am already writing a shell script, I’ll try to make it more portable, so it works on other shells as well, or at least zsh, which is the default shell in macOS nowadays. A solution that works on most shells is much trickier, unfortunately.

I have been using this version for a while and it seems to work well (in bash, dash, and zsh at least):

Shell scripts (source: Stack Overflow)
    sourced=0
    if [ -n "$ZSH_VERSION" ]; then
      case $ZSH_EVAL_CONTEXT in *:file) sourced=1;; esac
    elif [ -n "$KSH_VERSION" ]; then
      [ "$(cd -- "$(dirname -- "$0")" && pwd -P)/$(basename -- "$0")" != "$(cd -- "$(dirname -- "${.sh.file}")" && pwd -P)/$(basename -- "${.sh.file}")" ] && sourced=1
    elif [ -n "$BASH_VERSION" ]; then
      (return 0 2>/dev/null) && sourced=1
    else # All other shells: examine $0 for known shell binary filenames.
      # Detects `sh` and `dash`; add additional shell filenames as needed.
      case ${0##*/} in sh|-sh|dash|-dash) sourced=1;; esac
    fi

    # ...

    if [ "$sourced" = "0" ]; then
      # "$@" passes each argument separately; "$*" would join them into one.
      main "$@"
    fi

Here is an example script that uses this idiom and a bats test file that runs tests for the functions of the script after source-ing the script.
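The example script and bats file are not reproduced here. As a rough stand-in, the sketch below shows the same testing pattern with plain shell assertions instead of bats. Everything in it is hypothetical: csv-name.sh defines a csv_name helper and a main function, and test.sh sources it and checks csv_name without ever triggering main. It uses the simple bash check shown earlier, so it assumes bash.

```shell
tmpdir=$(mktemp -d)

# Hypothetical script under test.
cat > "$tmpdir/csv-name.sh" <<'EOF'
#!/usr/bin/env bash

# Map a Parquet file name to the matching CSV file name.
csv_name() {
    printf '%s.csv\n' "${1%.parquet}"
}

main() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: csv-name.sh <parquet-file>" >&2
        return 2
    fi
    csv_name "$1"
}

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi
EOF

# Hypothetical test file: plain assertions standing in for bats tests.
cat > "$tmpdir/test.sh" <<'EOF'
# Sourcing defines csv_name and main, but does not call main.
source "$(dirname "${BASH_SOURCE[0]}")/csv-name.sh"

failures=0
check() {  # check <description> <expected> <actual>
    if [ "$2" = "$3" ]; then
        echo "ok - $1"
    else
        echo "not ok - $1 (expected '$2', got '$3')"
        failures=$((failures + 1))
    fi
}

check "replaces the extension" "mtcars.csv" "$(csv_name mtcars.parquet)"
check "keeps the directory" "data/a.csv" "$(csv_name data/a.parquet)"

# The exit status of the test file reflects whether any check failed.
[ "$failures" -eq 0 ]
EOF

test_output=$(bash "$tmpdir/test.sh")
test_status=$?
echo "$test_output"
echo "exit status: $test_status"
rm -rf "$tmpdir"
```

Because main would print a usage message and fail when called without arguments, a clean run of the test file also shows that sourcing did not call it.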

R scripts

Luckily, it is much simpler to implement the idiom in R. sys.call() returns NULL at the top level, i.e. when an R file is running as a script, but not when the file is being source()-d. So I put this at the end of the R script file:

    if (is.null(sys.call())) {
      main(commandArgs(TRUE))
    }

For a complete example, here is an R script that converts a Parquet file to a CSV file using the nanoparquet package:

    #!/usr/bin/env Rscript

    # Read a Parquet file and write it out as CSV.
    # The default csv = "" makes write.csv() print to standard output.
    parquet_to_csv <- function(parquet, csv = "") {
      df <- nanoparquet::read_parquet(parquet)
      utils::write.csv(df, csv, row.names = FALSE)
    }

    usage <- function() {
      message("Usage: parquet-to-csv <parquet-file> [ <csv-file> ]")
    }

    # Entry point when running as a script: validate arguments, then convert.
    main <- function(args) {
      if (length(args) < 1) {
        usage()
        stop("<parquet-file> is missing", call. = FALSE)
      }
      if (length(args) > 2) {
        usage()
        stop("Extra arguments.", call. = FALSE)
      }
      parquet_to_csv(args[[1]], if (length(args) == 2) args[[2]] else "")
    }

    # Run main() only at the top level, i.e. not when source()-d.
    if (is.null(sys.call())) {
      main(commandArgs(TRUE))
    }

If you save this code as a file called parquet-to-csv and make it executable, you can run it as a script, either with Rscript or, on Unix, directly:

Run with Rscript
    Rscript parquet-to-csv mtcars.parquet mtcars.csv

Run directly on Unix
    parquet-to-csv mtcars.parquet mtcars.csv

You can also source() it into other scripts or for interactive use:

Use interactively
    source("parquet-to-csv")
    parquet_to_csv("mtcars.parquet")