When I write R or bash scripts I usually write them following the “scripts as modules” idiom. This technique is very common in Python, but I rarely see it in R or bash.
The main advantage of writing scripts as modules is that it makes code reuse, testing, and development easier. It also encourages organizing your script into functions, which pays off in maintainability.
Script or module? Both!
As the name suggests, the idiom allows writing a script file
that also works as a module. By module I mean a file that can be loaded
(imported, source()-d, etc.) into an interpreter as a collection of
functions, classes, etc. without running the actual script.
I write the script as a collection of functions. The functions include a
main() function. main() parses the command line arguments and then
carries out the work the file is supposed to do when running as a script.
After the functions I include a small piece of code that checks if the
file is running as a script or not. If yes, then I call the main() function.
If this file is loaded (imported, sourced, etc.) into a REPL or another
script, then main() does not run, but all functions and classes defined
in the file are available to use.
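As a rough illustration, the layout described above might look like this in bash. This is a minimal sketch, not the author's script: greet() and its output are hypothetical, and the check in the last lines is bash-specific.

```shell
#!/usr/bin/env bash
# Hypothetical example of the "scripts as modules" layout.

# An ordinary function that other scripts or tests can reuse.
greet() {
  printf 'Hello, %s!\n' "$1"
}

# Entry point: handle the command line arguments, then do the work.
main() {
  local name="${1:-world}"
  greet "$name"
}

# Call main() only when the file is executed, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  main "$@"
fi
```

Executing the file calls main(), while sourcing it only defines greet() and main().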
Some history
I learned this idiom from Python, where it is commonly used and is known as the __name__ == "__main__" idiom, because the extra
code at the end of the file that checks for being run as a script looks
like this:
    if __name__ == "__main__":
        main()
It looks slightly different in other languages. Some examples from a Stack Overflow question are, in Ruby:
    if __FILE__ == $0
and in Perl:
    main() unless caller;
It turns out that the idea is at least 20 years old, possibly older.
Shell scripts
In bash, there is no clear consensus about the best way of deciding if a file
is being source-d or run as a script. There are a lot of good
alternatives. The simplest one is perhaps:
    if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
      main "$@"
    fi
On the other hand, if I am already writing a shell script, I’ll try to make it more portable, so it works on other shells as well, or at least zsh, which is the default shell in macOS nowadays. A solution that works on most shells is much trickier, unfortunately.
I have been using this version for a while and it seems to work well (in bash, dash, and zsh at least):
    sourced=0
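For readers who want something concrete to start from, here is a minimal sketch of a check along these lines. It is an assumption based on widely used techniques, not necessarily the version mentioned above: zsh exposes ZSH_EVAL_CONTEXT, bash lets return succeed outside a function only while a file is being sourced, and for other shells the sketch falls back to a heuristic on $0. The main() body is a placeholder.

```shell
# Sketch of a portable "is this file being sourced?" check.
# Assumption: not the author's exact code; covers zsh, bash, and sh/dash only.
sourced=0
if [ -n "${ZSH_VERSION:-}" ]; then
  # zsh: at the top level of a sourced file the context ends in ":file".
  case ${ZSH_EVAL_CONTEXT:-} in *:file) sourced=1 ;; esac
elif [ -n "${BASH_VERSION:-}" ]; then
  # bash: outside a function, `return` only succeeds while sourcing.
  (return 0 2>/dev/null) && sourced=1
else
  # Other shells: heuristic, $0 names the shell itself when sourced.
  case ${0##*/} in sh|-sh|dash|-dash) sourced=1 ;; esac
fi

# Placeholder entry point.
main() {
  echo "running as a script"
}

if [ "$sourced" -eq 0 ]; then
  main "$@"
fi
```

The fallback branch is only a heuristic: it can misfire when $0 happens to be the shell's own name for other reasons, for example under sh -c.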
Here is an example script that uses this idiom and a bats test file that runs tests
for the functions of the script after source-ing the script.
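To illustrate the workflow without bats, here is a sketch that uses plain shell assertions instead. The file name upper.sh and the to_upper() function are hypothetical, and the script is written to a temporary directory only to keep the example self-contained.

```shell
# Write a hypothetical "script as module" to a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/upper.sh" <<'EOF'
# Print the first argument in upper case.
to_upper() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

main() {
  to_upper "${1:-hello}"
}

if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  main "$@"
fi
EOF

# Running the file as a script calls main().
script_output=$(bash "$tmpdir/upper.sh" module)

# Sourcing the file must not call main(), so nothing is printed.
# (`.` is the portable spelling of `source`.)
source_output=$(. "$tmpdir/upper.sh")

# After sourcing, each function can be tested on its own.
. "$tmpdir/upper.sh"
if [ "$(to_upper abc)" = "ABC" ]; then
  test_result="ok"
else
  test_result="not ok"
fi
echo "to_upper: $test_result"

rm -rf "$tmpdir"
```

A real test suite would keep the script in its own file and let a test runner such as bats do the sourcing and reporting.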
R scripts
Luckily, it is much simpler to implement the idiom in R. sys.call()
returns NULL at the top level, i.e. when an R file is running as a
script. So I put this at the end of the R script file:
    if (is.null(sys.call())) {
      main()
    }
For a complete example, here is an R script that converts a Parquet file to a CSV file using the nanoparquet package:
    #! Rscript
If you save this code as a file called parquet-to-csv and make it executable,
then you can run it as a script, either with Rscript or, on Unix, directly:
    Rscript parquet-to-csv mtcars.parquet mtcars.csv
    parquet-to-csv mtcars.parquet mtcars.csv
You can also source() it into other scripts or for interactive use:
    source("parquet-to-csv")