The next step is to upload a document. We will upload the following sample document and then write queries against it.
curl --config redshred-auth \
-X POST \
-F 'file=@Honda-GX270-short.pdf' \
https://api.redshred.com/v2/collections/my-test-collection/documents/
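The same upload can likely be done from Python with the redshred client:
from redshred import SimpleAPI

api = SimpleAPI(raw_responses=True)
# Assumption: files takes a requests-style mapping; "file" matches the form field used in the curl call.
with open("Honda-GX270-short.pdf", "rb") as pdf:
    response = api.post("/collections/my-test-collection/documents/", files={"file": pdf})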
Once the upload is complete, you’ll get back a response similar to:
{
"collection_link": "https://api.redshred.com/v2/collections/my-test-collection",
"collection_slug": "my-test-collection",
"config": null,
"content_hash": "8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
"created_at": "2022-09-04T21:37:19.250049Z",
"created_by": "jmk+admin@redshred.com",
"csv_metadata": null,
"description": null,
"document_segment_link": null,
"errors": null,
"file_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf?name=Honda-GX270-short.pdf",
"file_size": 234614,
"id": "3aLf3arG3LmWxzBKYFNzqc",
"index": 1,
"metadata": null,
"n_pages": null,
"name": "Honda-GX270-short.pdf",
"original_name": "Honda-GX270-short.pdf",
"pages_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/pages",
"pdf_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf",
"perspectives_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/perspectives",
"read_state": "queued",
"read_state_updated_at": "2022-09-04T21:37:19.397325Z",
"region": {
"coordinates": [
[
[
0,
0
],
[
1,
0
],
[
1,
1
],
[
0,
1
],
[
0,
0
]
]
],
"type": "Polygon"
},
"segments_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/segments",
"self_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc",
"slug": "honda-gx270-shortpdf",
"source": "file",
"summary": null,
"text": null,
"uniqueness_id": "s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
"updated_at": "2022-09-04T21:37:19.276674Z",
"updated_by": "jmk+admin@redshred.com",
"user_data": null,
"warnings": null
}
There’s a lot going on here, so let’s unpack a few key pieces.
Note the read_state: queued attribute. This means the document has been accepted and queued for processing by RedShred. RedShred is asynchronous; uploads return successfully before reading is actually complete. This allows you to batch uploads while RedShred scales to read them concurrently, even under high load.
When a document is finished reading, its read_state will move to "read". If something fails during reading, its read_state will transition to "crashed".
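Because uploads return before reading finishes, a client usually has to poll the document until its read_state settles. The following is a minimal sketch of that loop, not part of the RedShred client: wait_until_read and its fetch_document argument are hypothetical names, and fetch_document stands for any zero-argument callable that returns the document's current JSON as a dict (for example, a GET on the document's self_link using whatever authenticated client you already have).

```python
import time

# States named in the text; "read" and "crashed" are treated as final.
TERMINAL_STATES = {"read", "crashed"}


def wait_until_read(fetch_document, poll_interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the document's read_state is final, then return the document.

    fetch_document is a hypothetical zero-argument callable that returns the
    document's current JSON as a dict.
    """
    deadline = clock() + timeout
    while True:
        document = fetch_document()
        if document["read_state"] in TERMINAL_STATES:
            return document
        if clock() >= deadline:
            raise TimeoutError(
                f"document still {document['read_state']!r} after {timeout} seconds"
            )
        sleep(poll_interval)
```

The caller still has to check whether the returned document ended in "read" or "crashed". The sleep and clock parameters exist only so the loop can be exercised without real waiting.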
self_link: used to make calls specific to this document or any of the content within it.
updated_at: a date that lets you know when the document was last read or modified.
original_name and name: original_name is the name of the file that was uploaded, while name is user-settable.
read_state: tells the status of this document. When documents are uploaded they are "queued" to be read using the collection’s current configuration. From there they transition into "reading" and then to "read" when the read process is complete. If an error occurs at read time, the document moves to the "crashed" state.
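To make the fields above concrete, here is a small sketch that pulls them out of a document response that has already been parsed into a dict. The helper name summarize_document, the display_name key, and the fallback from name to original_name are illustrative choices, not part of the RedShred API; the sample values are copied from the response shown earlier.

```python
from datetime import datetime


def summarize_document(document):
    """Pick out the fields discussed above from a document response dict."""
    # Timestamps in the sample response end in "Z" (UTC). Older versions of
    # datetime.fromisoformat() reject "Z", so swap in an explicit offset.
    updated_at = datetime.fromisoformat(document["updated_at"].replace("Z", "+00:00"))
    state = document["read_state"]
    return {
        "self_link": document["self_link"],
        # Illustrative choice: prefer the user-settable name when it is set.
        "display_name": document["name"] or document["original_name"],
        "updated_at": updated_at,
        "read_state": state,
        # "read" and "crashed" are the two final states described in the text.
        "finished": state in ("read", "crashed"),
    }


sample = {
    "self_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc",
    "name": "Honda-GX270-short.pdf",
    "original_name": "Honda-GX270-short.pdf",
    "updated_at": "2022-09-04T21:37:19.276674Z",
    "read_state": "queued",
}
summary = summarize_document(sample)
print(summary["display_name"], summary["read_state"], summary["finished"])
```

For the freshly uploaded sample, finished is False because the document is still queued.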