API for discovering changes to archive.org items
On this page
API for discovering changes to archive.org items#
At the core of the archive.org content management system there is a task manager which changes the items in the archive.
This API provides access to the list of identifiers which have changed since a particular time.
This API also supports a cold-start mode, which will enumerate all the items in the archive, in addition to providing the change stream.
Changes API#
URL
https://be-api.us.archive.org/changes/v1
Method
POST
Arguments:
access
S3 access key from https://archive.org/account/s3.phpsecret
S3 secret key from https://archive.org/account/s3.phpstart_date
if present sets the beginning point for the list of changed identifiers.a date stamp of the form
YYYYMMDD
0
to indicate a cold start, which will cause all identifiers to be returned, followed by any changes.
token
optional, a value from thenext_token
field from a previous call to this API. When present, the API returns the next set of results in the sequence of changes to the petabox.
In order to access the API, the account holding the secret
and access
keys requires /
privs, or the see_all_catalog_changes
privilege. Please contact a member of the collections team to arrange this access. Access control is required because the changes API reveals the presence of noindex
items, which are otherwise unknown to the general public.
Starting from now, see the changes in the archive#
ACCESS=<ias3 access key>
SECRET=<ias3 secret key>
curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \
https://be-api.us.archive.org/changes/v1 | jq .
Returns:
{
"estimated_distance_from_head": 0,
"do_sleep_before_returning": true,
"changes": [],
"next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOjUzMjI3NjI3MH0="
}
Then to see the next set of changes, make the request again with the next_token
:
curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \
--data-urlencode token="eyJmaW5pc2hlZF9tYXJrZXIiOjUzMjI3NjI3MH0=" \
https://archive.org/services/changes.php | jq .
Returns:
{
"estimated_distance_from_head": 0,
"do_sleep_before_returning": true,
"changes": [
{
"identifier": "TELECONGO_20171026_223000"
},
...
{
"identifier": "ARCHIVEIT-6334-CDL-20140408-00002"
}
],
"next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOjUzMjI3Nzg2M30="
}
Each time, the call is made with the next_token
, the next batch of results is returned. The fields returned are:
estimated_distance_from_head
This value does not always appear, when it does, it is the number of tasks on the done list, beyond the end of the current data set.changes
A list of records of items which have changed. This may contain more data about the item and change in the future.next_token
The token which, when used on a subsequent call will return the batch of changes following the ones given in thechanges
list.do_sleep_before_returning
A boolean which advises the caller to either take a break before the next call to the endpoint (because this data is very near the most recent possible), or if there is still much more change data to be consumed after this batch.
The api supports two other starting modes.
Starting from a date, see changes made after midnight on that UTC day#
curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \
--data-urlencode start_date=20170401
https://be-api.us.archive.org/changes/v1 | jq .
Returns:
{
"estimated_distance_from_head": 90828632,
"do_sleep_before_returning": false,
"changes": [
{
"identifier": "in.ernet.dli.2015.550031"
},
...
{
"identifier": "jamaica-gleaner.com-20160320-230158"
}
],
"next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOjQ0MTQ1Mjc2Nn0="
}
To fetch more results, make a request using the next_token
, like so:
curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \
--data-urlencode token="eyJmaW5pc2hlZF9tYXJrZXIiOjQ0MTQ1Mjc2Nn0=" \
https://be-api.us.archive.org/changes/v1 | jq .
Returns:
{
"estimated_distance_from_head": 90779995,
"do_sleep_before_returning": false,
"changes": [
{
"identifier": "china.org.cn-20170101-092922"
},
...
{
"identifier": "GurudevQuotes"
}
],
"next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOjQ0MTUwMjc2Nn0="
}
Starting from nothing, load all the identifiers, then pick up changes#
This API supports a cold start mode where all items are dumped out, then changes since the full item dump time are sent. This is done by setting start_date
to 0
.
For example:
curl --data-urlencode access="$ACCESS" --data-urlencode secret="$SECRET" \
--data-urlencode start_date=0
https://be-api.us.archive.org/changes/v1 | jq .
Returns:
{
"do_sleep_before_returning": false,
"changes": [
{
"identifier": "0----------"
},
...
{
"identifier": "008MRAnonymous"
}
],
"next_token": "eyJmaW5pc2hlZF9tYXJrZXIiOiI1MzIyODQzOTAiLCJzY2FuX3N0YXJ0IjoiMDA4TVJBbm9ueW1vdXMifQ=="
}
Continuing calls with token
set to next_token
will eventually return all changes up until the present.