Uploading Documents

The next step is to upload a document. We will upload the following sample document and then write queries against it.

Perform the upload

curl --config redshred-auth \
    -X POST \
    -F 'file=@Honda-GX270-short.pdf'  \
    https://api.redshred.com/v2/collections/my-test-collection/documents/
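
The same upload from Python, as a sketch. The files argument is assumed to follow the requests-style mapping of form field name to an open file; this has not been confirmed against the RedShred client.

from redshred import SimpleAPI

api = SimpleAPI(raw_responses=True)

# Send the PDF as the "file" form field, mirroring -F 'file=@Honda-GX270-short.pdf' above.
# Assumption: api.post accepts a requests-style files mapping.
with open("Honda-GX270-short.pdf", "rb") as pdf:
    response = api.post("/collections/my-test-collection/documents/", files={"file": pdf})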

Once the upload is complete, you’ll get back a response similar to:

{
  "collection_link": "https://api.redshred.com/v2/collections/my-test-collection",
  "collection_slug": "my-test-collection",
  "config": null,
  "content_hash": "8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
  "created_at": "2022-09-04T21:37:19.250049Z",
  "created_by": "jmk+admin@redshred.com",
  "csv_metadata": null,
  "description": null,
  "document_segment_link": null,
  "errors": null,
  "file_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf?name=Honda-GX270-short.pdf",
  "file_size": 234614,
  "id": "3aLf3arG3LmWxzBKYFNzqc",
  "index": 1,
  "metadata": null,
  "n_pages": null,
  "name": "Honda-GX270-short.pdf",
  "original_name": "Honda-GX270-short.pdf",
  "pages_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/pages",
  "pdf_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf",
  "perspectives_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/perspectives",
  "read_state": "queued",
  "read_state_updated_at": "2022-09-04T21:37:19.397325Z",
  "region": {
    "coordinates": [
      [
        [
          0,
          0
        ],
        [
          1,
          0
        ],
        [
          1,
          1
        ],
        [
          0,
          1
        ],
        [
          0,
          0
        ]
      ]
    ],
    "type": "Polygon"
  },
  "segments_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/segments",
  "self_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc",
  "slug": "honda-gx270-shortpdf",
  "source": "file",
  "summary": null,
  "text": null,
  "uniqueness_id": "s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
  "updated_at": "2022-09-04T21:37:19.276674Z",
  "updated_by": "jmk+admin@redshred.com",
  "user_data": null,
  "warnings": null
}

There’s a lot going on here, so let’s unpack a few key pieces.

Reading is asynchronous

Note the read_state: queued attribute. It means the document has been accepted and queued for processing by RedShred. Reading is asynchronous: an upload returns successfully before reading is actually complete. This lets you batch uploads while RedShred scales to read them concurrently, even under high load.

When a document has finished reading, its read_state moves to read. If something fails during reading, its read_state transitions to crashed.

Other key details:

  1. Each document is represented by a URL, self_link. Use it to make calls specific to this document or to any of the content within it.
  2. Documents have an updated_at timestamp that tells you when they were last read or modified.
  3. They also include original_name (the name of the file that was uploaded) and name, which is user-settable.
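
The fields listed above can be pulled out of a decoded response with the standard library alone. The following is a minimal sketch, assuming the response body has already been parsed into a dict; summarize_document is a hypothetical helper name, not part of the RedShred client, and the timestamp format is inferred from the sample response rather than from a published specification.

```python
from datetime import datetime, timezone


def summarize_document(doc: dict) -> dict:
    """Pick out the identifying fields of a document response (hypothetical helper)."""
    # Timestamps in the sample response look like 2022-09-04T21:37:19.276674Z;
    # the trailing Z is treated as UTC.
    updated_at = datetime.strptime(
        doc["updated_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "self_link": doc["self_link"],
        "updated_at": updated_at,
        "original_name": doc["original_name"],
        "name": doc["name"],
    }


# Values copied from the sample response shown earlier.
sample = {
    "self_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc",
    "updated_at": "2022-09-04T21:37:19.276674Z",
    "original_name": "Honda-GX270-short.pdf",
    "name": "Honda-GX270-short.pdf",
}

summary = summarize_document(sample)
print(summary["self_link"])
print(summary["updated_at"].isoformat())
```

Keeping self_link around is the useful part: every later call about this document or its contents starts from that URL.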

Checking on status

The read_state field reports the status of a document. When documents are uploaded, they are queued to be read using the collection’s current configuration. From there they transition to reading and then to read when the read process is complete. If an error occurs at read time, the document moves to the crashed state.
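
Because reading is asynchronous, a client typically polls the document until it reaches read or crashed. The sketch below shows one way to do that. It is not taken from the RedShred client: wait_until_read, its parameters, and the default interval and timeout are all assumptions, and fetch_document stands in for whatever call you use to GET the document's self_link and decode the JSON.

```python
import time
from typing import Callable

# States after which read_state no longer changes, per the description above.
TERMINAL_STATES = {"read", "crashed"}


def wait_until_read(
    fetch_document: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until read_state is read or crashed, then return the document.

    fetch_document is a caller-supplied function (hypothetical) that returns
    the latest decoded document response.
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch_document()
        if doc["read_state"] in TERMINAL_STATES:
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document still {doc['read_state']!r} after {timeout} seconds"
            )
        sleep(interval)


# Demonstration with a stand-in fetcher that walks through the documented states.
states = iter(["queued", "reading", "read"])
final = wait_until_read(lambda: {"read_state": next(states)}, sleep=lambda _: None)
print(final["read_state"])  # read
```

Passing the fetcher in as a function keeps the loop independent of how you authenticate, and lets you exercise it without touching the network. Callers should check whether the returned read_state is crashed before using the document.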