gh 1.6.0: recover from interruption

Last week I published an update to gh, our very minimal GitHub API client.

The release was triggered by GitHub changing the format of some of their authentication tokens.

This was also a good occasion to fix all outstanding gh issues, and I added lots of tests.

Do not fail on unknown token formats

Previous versions of gh validated the user’s authentication token and failed if it did not match one of the known token formats. This made sense (to us) when we first added the check: all tokens had the same format, and people supplied their GitHub password where a token was expected often enough to cause issues.

Later GitHub introduced more token formats, and this strict check became very brittle because each new token format broke gh. This time I missed the blog post about the new token format, so gh simply stopped working with it. Not great.

The new version of gh does not fail if it sees an unknown token format; it merely issues a warning. This is reasonable: the warning might still be helpful occasionally, but it is not going to break any workflows.
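To make the idea concrete, here is a minimal sketch of a warn-instead-of-fail check. It is not gh's actual code: `check_token_format()` is a hypothetical helper, and the two patterns are only illustrative stand-ins for "known formats".

```r
# Hypothetical helper, not gh's implementation: warn on an unrecognized
# token format instead of stopping.
check_token_format <- function(token) {
  # Illustrative patterns only (an assumption): a 40-character hex string
  # and a prefixed token such as "ghp_...". The real list may differ.
  known_formats <- c(
    "^[[:xdigit:]]{40}$",
    "^gh[pousr]_[A-Za-z0-9_]{36,}$"
  )
  is_known <- any(vapply(known_formats, grepl, logical(1), x = token))
  if (!is_known) {
    # A warning is visible to the user but does not abort the call.
    warning("Token does not match any known format", call. = FALSE)
  }
  invisible(token)
}
```

With a check like this, a brand new token format produces a warning and the request still goes ahead.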

Interrupts

If you do interactive data analysis, you’ll remember the case when you start a long iterative computation, e.g. downloading a bunch of pages from an API, and after a minute you realize that it is going to take longer than you expected. If you are lucky then you’ll have a progress bar, so at least you have an estimate of how long it will take. Even if you do, you may want to back out instead of waiting for hours. But then you’d lose the partial results that you already have.

I (and others) wanted a solution that lets me interrupt a paginated gh API call, without losing the partial results. The new version of gh saves the partial results on an interrupt to the same place that tidyverse packages use to save the last error message, so it is accessible using rlang::last_error() after the interruption. It works like this:

> gh::gh("/users", per_page = 10, .limit = Inf)
50 items, page 5 | 1.4s

This is the list of all GitHub users, 10 at a time, so it would take a very long time (and very high rate limits) to finish. Now if I interrupt it I get:

! `gh()` interrupted after fetching 50 records.
ℹ Partial results are available in `rlang::last_error()$gh_result`.

Indeed:

> rlang::last_error()$gh_result[[1]][1:3]
$login
[1] "mojombo"

$id
[1] 1

$node_id
[1] "MDQ6VXNlcjE="

It is also possible to handle the partial results programmatically, by catching the condition classed as gh_interrupt and then looking at the gh_result entry of the condition object.

If the interrupt condition is not caught, then it is saved so that rlang::last_error() can access it.
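Here is a minimal sketch of the programmatic route. To keep it runnable offline, a hypothetical stand-in, `fake_paginated_call()`, signals a condition with the `gh_interrupt` class and a `gh_result` field; with the real package the `gh::gh()` call would go in its place. The rest of the condition's class vector is an assumption.

```r
# Hypothetical stand-in for an interrupted gh::gh() call. It signals a
# condition classed "gh_interrupt" that carries the records fetched so far.
fake_paginated_call <- function() {
  partial <- list(list(login = "mojombo", id = 1L))
  cnd <- structure(
    list(message = "interrupted", call = NULL, gh_result = partial),
    class = c("gh_interrupt", "condition")  # assumed class vector
  )
  stop(cnd)
}

# Catch the condition and keep whatever was downloaded before the interrupt.
users <- tryCatch(
  fake_paginated_call(),
  gh_interrupt = function(cnd) cnd$gh_result
)
length(users)
```

Because the handler returns `cnd$gh_result`, `users` holds the partial results whether the call finished or was interrupted halfway.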

Tests!!! Fake HTTP apps!!!

I have always wanted to add better test cases to gh with webfakes. The major blocker was that writing a custom webfakes app for a single package took considerable effort. Every time I did it, it was well worth the effort. But for an established package without any major changes, like gh, I could not justify the time to do it.

Nowadays this is different. When starting to work on gh I realized (again) how annoying the (few!) existing live gh tests were, and how even more annoying the lack of more gh tests was. Then with Claude I wrote a webfakes app for gh in a matter of minutes.

Claude is great at writing webfakes apps. webfakes follows the regular structure and terminology of web apps, with path matching, a handler stack and middleware, and LLMs are very good at HTTP as well.
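For readers who have not seen one, here is a minimal sketch of a webfakes app with a single path-matched route. It is not the app from gh's test suite; the route and the payload are made up for illustration, and the sketch assumes the webfakes and jsonlite packages are installed.

```r
library(webfakes)

# A tiny fake of one GitHub-like endpoint (illustrative route and payload).
app <- new_app()
app$get("/users/:login", function(req, res) {
  # `:login` is matched from the path and exposed in req$params.
  res$send_json(
    list(login = req$params$login, id = 1L),
    auto_unbox = TRUE
  )
})

# Serve the app from a background process and query it over HTTP.
web <- new_app_process(app, start = TRUE)
body <- jsonlite::fromJSON(web$url("/users/mojombo"))
web$stop()
```

A test can then point the HTTP client at `web$url()` instead of the live API and assert on the parsed response.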

I used to worry about not having live tests for an HTTP client, i.e. running all tests through webfakes and not touching GitHub at all. I don’t really mind this any more. It probably would not take much to change the gh tests to be able to run against live GitHub instead of webfakes and run the test suite (say) once a week. In practice I don’t think this is really important. If some change in the GitHub API breaks gh, we’ll know about it soon enough, because users will let us know before the weekly or even daily live test would run.

In summary, gh has 100% test coverage now.