Document-Level Queries

While RSQL is most commonly used for segment queries, it can also be applied to document and perspective instances. This page explains how to use RSQL to query documents directly. The perspective endpoint operates in much the same way as the document endpoint.

Field Access and Dot Notation

The query language exposes fields on the top-level object and subfields via dot notation. This is the same pattern used in segment queries (e.g., perspective.name), but applied to document attributes.

Query Endpoints

Document-level queries can be used with the documents endpoint:

https://api.redshred.com/v2/collections/{collection_name}/documents/

Query Styles

Basic Document Filtering

Querying Documents by File Size

file_size <= 102000

This query finds all documents whose file size is less than or equal to 102,000 bytes (102 KB). It demonstrates how RSQL inherits functionality from Django’s Q objects to support numeric comparisons on document attributes.

You can use this query in two ways:

  1. As a direct query parameter to the documents endpoint:

    /v2/collections/research-reading/documents/?q=file_size%20%3C=%20102000&fields=self_link,name,file_size
    
  2. As a field-specific filter parameter (Django-style):

    /v2/collections/research-reading/documents/?file_size__lte=102000&fields=self_link,name,file_size
    

Both approaches achieve the same result, with the first using RSQL syntax and the second using Django’s field lookup syntax.
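The two request styles above can be sketched in Python as follows. This is a minimal illustration that only builds the URLs; the helper names rsql_url and lookup_url are hypothetical, and the base URL is taken from the Query Endpoints section.

```python
from urllib.parse import quote, urlencode

# Base URL as given in the Query Endpoints section.
BASE_URL = "https://api.redshred.com/v2/collections/{collection_name}/documents/"


def rsql_url(collection_name, rsql, fields):
    """Build a documents URL that passes an RSQL expression in the q parameter."""
    params = {"q": rsql, "fields": ",".join(fields)}
    # quote_via=quote encodes spaces as %20; "," and "=" are left unescaped
    # so the result matches the URL style used on this page.
    query = urlencode(params, quote_via=quote, safe=",=")
    return BASE_URL.format(collection_name=collection_name) + "?" + query


def lookup_url(collection_name, lookups, fields):
    """Build a documents URL that uses Django-style field lookups."""
    params = dict(lookups)
    params["fields"] = ",".join(fields)
    query = urlencode(params, quote_via=quote, safe=",")
    return BASE_URL.format(collection_name=collection_name) + "?" + query


fields = ["self_link", "name", "file_size"]
print(rsql_url("research-reading", "file_size <= 102000", fields))
print(lookup_url("research-reading", {"file_size__lte": 102000}, fields))
```

Building the query string with urlencode rather than by hand keeps the percent-encoding of characters such as < and spaces consistent.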

Document Name Filtering

name ~ /report/i

This finds documents whose name contains “report” (case-insensitive).

Date Filtering

created_at >= "2023-01-01"

This finds documents created on or after January 1, 2023.
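When the cutoff date comes from application code, the condition can be generated rather than typed by hand. The sketch below assumes the quoted YYYY-MM-DD form shown above is what the endpoint expects; created_since is a hypothetical helper name.

```python
from datetime import date


def created_since(day):
    """Return an RSQL condition matching documents created on or after `day`."""
    # date.isoformat() yields YYYY-MM-DD, the form used in the example above.
    return f'created_at >= "{day.isoformat()}"'


print(created_since(date(2023, 1, 1)))
```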

Combining Conditions

file_size <= 102000 and name ~ /report/i

This query finds documents that are no larger than 102 KB (102,000 bytes) and have “report” in their name.
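A combined query like the one above can be assembled from individual conditions. This is a minimal sketch: all_of is a hypothetical helper, and it relies only on the and keyword shown on this page. Grouping and other operators are not covered here.

```python
def all_of(*conditions):
    """Join RSQL conditions so that every one of them must match."""
    if not conditions:
        raise ValueError("at least one condition is required")
    return " and ".join(conditions)


query = all_of("file_size <= 102000", "name ~ /report/i")
print(query)
```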

Common Document Attributes

Documents have several attributes that can be queried:

Attribute     Description              Example Query
name          Document filename        name = "quarterly_report.pdf"
file_size     Size in bytes            file_size <= 1024000
created_at    Creation timestamp       created_at >= "2023-01-01"
updated_at    Last update timestamp    updated_at >= "2023-01-01"
mime_type     MIME type                mime_type = "application/pdf"
status        Processing status        status = "complete"

API Response Fields

When querying documents, you can specify which fields to include in the response using the fields parameter:

https://api.redshred.com/v2/collections/research-reading/documents/?q=file_size%20%3C=%20102000&fields=self_link,name,file_size,created_at

This returns only the specified fields for each matching document, which can improve performance for large result sets.

Performance Considerations

For optimal performance when querying documents:

  1. Be as specific as possible with your queries
  2. Use the fields parameter to limit the returned data
  3. Consider using pagination parameters (limit and offset) for large result sets
  4. For complex queries, the Django-style field lookups may offer better performance in some cases
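The pagination advice in item 3 can be sketched as a loop that advances offset until a page comes back shorter than limit. The function fetch_all and its fetch_page callback are hypothetical; the real response shape is not described on this page, so the callback is assumed to return a plain list of documents for a given limit and offset.

```python
def fetch_all(fetch_page, limit=100):
    """Collect every result by stepping offset until a short page is returned.

    fetch_page is any callable accepting limit and offset keyword arguments
    and returning a list (an assumption; adapt it to the real response).
    """
    results = []
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        results.extend(page)
        if len(page) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += limit
    return results


# Stand-in for a real API call, used only to exercise the loop.
documents = [{"name": f"doc_{i}.pdf"} for i in range(250)]


def fake_fetch_page(limit, offset):
    return documents[offset:offset + limit]


all_documents = fetch_all(fake_fetch_page, limit=100)
print(len(all_documents))
```

Passing the page fetcher in as a callback keeps the loop independent of any particular HTTP client.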