Uploading Documents

The next step is to upload a document. We will upload a sample document, Honda-GX270-short.pdf, and then write queries against it.

Perform the upload

curl --config redshred-auth \
    -X POST \
    -F 'file=@Honda-GX270-short.pdf'  \
    https://api.redshred.com/v2/collections/my-test-collection/documents/

Or, using the RedShred Python client:

from redshred import RedShredClient

# Create a client locally using your API token and the API host
client = RedShredClient(token="YOUR_USER_API_TOKEN", host="https://api.redshred.com")

# Retrieve an existing collection object
collection = client.collection("my-test-collection")

# Upload a local file to that collection
document = collection.upload_file("Honda-GX270-short.pdf")
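For reference, curl's -F option sends the file as a multipart/form-data POST. The following is a minimal standard-library sketch of an equivalent request. It assumes the endpoint accepts a single form field named file, as in the curl command. build_upload_request is a hypothetical helper. Because the contents of the redshred-auth config file are not shown here, any authentication header has to be passed through the headers argument, and its exact name and format are an assumption you should copy from your own config.

```python
import uuid
import urllib.request
from typing import Dict, Optional


def build_upload_request(
    upload_url: str,
    filename: str,
    data: bytes,
    content_type: str = "application/pdf",
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one "file" field."""
    # A random boundary string separates the parts of the form body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(upload_url, data=head + data + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    # Authentication is not built in: pass whatever header your redshred-auth
    # curl config sets (its exact name and format are an assumption here).
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request
```

To send it, read the file with pathlib.Path("Honda-GX270-short.pdf").read_bytes(), pass the built request to urllib.request.urlopen, and decode the body with json.load. The result should resemble the response shown below.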

Once the upload is complete, you’ll get back a response similar to:

{
  "collection_link": "https://api.redshred.com/v2/collections/my-test-collection",
  "collection_slug": "my-test-collection",
  "config": null,
  "content_hash": "8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
  "created_at": "2022-09-04T21:37:19.250049Z",
  "created_by": "jmk+admin@redshred.com",
  "csv_metadata": null,
  "description": null,
  "document_segment_link": null,
  "errors": null,
  "file_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf?name=Honda-GX270-short.pdf",
  "file_size": 234614,
  "id": "3aLf3arG3LmWxzBKYFNzqc",
  "index": 1,
  "metadata": null,
  "n_pages": null,
  "name": "Honda-GX270-short.pdf",
  "original_name": "Honda-GX270-short.pdf",
  "pages_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/pages",
  "pdf_link": "https://api.redshred.com/v2/files/my-test-collection/s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa.pdf",
  "perspectives_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/perspectives",
  "read_state": "queued",
  "read_state_updated_at": "2022-09-04T21:37:19.397325Z",
  "region": {
    "coordinates": [
      [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
    ],
    "type": "Polygon"
  },
  "segments_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc/segments",
  "self_link": "https://api.redshred.com/v2/collections/my-test-collection/documents/3aLf3arG3LmWxzBKYFNzqc",
  "slug": "honda-gx270-shortpdf",
  "source": "file",
  "summary": null,
  "text": null,
  "uniqueness_id": "s8b702b30121677f5bd91c07ca00c2a187fdde555d3a1b0cfb40a206f1ad68dfa",
  "updated_at": "2022-09-04T21:37:19.276674Z",
  "updated_by": "jmk+admin@redshred.com",
  "user_data": null,
  "warnings": null
}

There’s a lot going on here, so let’s unpack a few key pieces.

Reading is asynchronous

Note the "read_state": "queued" attribute. It means RedShred has accepted the document and queued it for processing. RedShred is asynchronous: an upload returns successfully before reading is complete. This lets you upload documents in batches while RedShred scales to read them concurrently, even under high load.
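Because each upload returns as soon as the file is accepted, a batch of uploads does not have to wait on reading. A minimal sketch, assuming a hypothetical upload callable that sends one file and returns the upload response as a dict shaped like the one above (for example, a small wrapper around the curl request or Python client shown earlier):

```python
from typing import Callable, Dict, Iterable


def upload_batch(paths: Iterable[str], upload: Callable[[str], dict]) -> Dict[str, str]:
    """Send every file in `paths` and return {path: self_link} without waiting for reads.

    `upload` is a hypothetical callable: it uploads one file and returns the
    JSON upload response as a dict.
    """
    queued: Dict[str, str] = {}
    for path in paths:
        response = upload(path)
        # The call returns once the file is accepted; reading happens later,
        # so read_state is usually still "queued" at this point.
        queued[path] = response["self_link"]
    return queued
```

Keeping the returned links lets you check on each document later instead of alternating between uploading a file and waiting for it to be read.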

Checking on status

The read_state field reports the status of the document. Uploaded documents start in the queued state and wait to be read using the collection’s current configuration. From there they move to reading, and then to read once the read process is complete. If an error occurs at read time, the document moves to the crashed state.
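A client that needs the finished read can poll read_state until it reaches read or crashed. The sketch below assumes a hypothetical fetch_state callable that returns the document's current read_state, for example by re-fetching the document and reading that field from the JSON. The timeout and interval defaults are arbitrary choices, not RedShred recommendations.

```python
import time
from typing import Callable


def wait_until_read(
    fetch_state: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Poll until read_state is "read"; raise if it crashes or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state == "read":
            return state
        if state == "crashed":
            raise RuntimeError("document could not be read (read_state: crashed)")
        # Any other state (e.g. "queued" or "reading") means: keep waiting.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {state!r} after {timeout} seconds")
        time.sleep(interval)
```

Treating only read and crashed as final states means any state not covered in this section simply keeps the loop waiting until the timeout.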

Other key details:

  1. Each document is identified by a URL, its self_link. Use this URL to make calls specific to the document or to any of the content within it.
  2. Documents have an updated_at timestamp that tells you when they were last read or modified.
  3. They also include original_name (the name of the file that was uploaded) and name, which is user-settable.
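These fields can be pulled into a small typed record. A minimal sketch using only the Python standard library; DocumentSummary, parse_timestamp, and parse_document are hypothetical names, and the field names are taken from the sample response above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocumentSummary:
    self_link: str        # URL used for calls about this document
    name: str             # user-settable display name
    original_name: str    # name of the file as it was uploaded
    updated_at: datetime  # last read or modification time (timezone-aware)


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so swap it for an explicit UTC offset to support older versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_document(response: dict) -> DocumentSummary:
    """Pick the key fields out of a document response like the one above."""
    return DocumentSummary(
        self_link=response["self_link"],
        name=response["name"],
        original_name=response["original_name"],
        updated_at=parse_timestamp(response["updated_at"]),
    )
```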