Making Queries

Intro to the Query Language

RedShred provides a full-featured query language that lets you say things like “all the steps within this procedure” or “all the headings within this document”.

The query language provides:

  • Queries for objects and attributes
  • String, substring, and regular expression based matches
  • Spatial queries based on the arrangement of elements in the document
  • Boolean connectives (AND, OR, and NOT) that combine subclauses to express complex constraints

Making Queries

To execute a query, send a q= URL parameter to a /segments/ endpoint. These endpoints appear in several places in the list of endpoints, most notably under /collections/$COLLECTION/segments/ and /documents/$DOCUMENT_ID/segments/. In each case the query is implicitly scoped to the prefix of the URL: the whole collection in the first case, and only the specified document in the second. Because the expression travels in the URL, percent-encode it if it contains spaces or other reserved characters.

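# -G sends the --data-urlencode value as a URL-encoded query string on a GET request.
curl -G -H "Authorization: Token $REDSHRED_AUTH_TOKEN" \
     -H "Content-Type: application/json" \
     --data-urlencode "q=$EXPRESSION" \
     "https://api.redshred.com/v2/collections/$COLLECTION/documents/$DOCUMENT_ID/segments/"
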
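# Python (requests)
import os

import requests

REDSHRED_AUTH_TOKEN = os.getenv('REDSHRED_AUTH_TOKEN')
COLLECTION = os.getenv('COLLECTION')
DOCUMENT_ID = os.getenv('DOCUMENT_ID')
EXPRESSION = os.getenv('EXPRESSION')

headers = {
    'Authorization': f"Token {REDSHRED_AUTH_TOKEN}",
    'Content-Type': 'application/json',
}

# Python does not expand $VARIABLES inside strings, so build the URL explicitly.
url = f"https://api.redshred.com/v2/collections/{COLLECTION}/documents/{DOCUMENT_ID}/segments/"
# Passing the expression via params lets requests URL-encode it.
response = requests.get(url, headers=headers, params={'q': EXPRESSION})
print(response.text)
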
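package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"os"
)

func main() {
	// Go does not expand $VARIABLES inside string literals, so read them
	// from the environment and URL-encode the query expression.
	endpoint := fmt.Sprintf(
		"https://api.redshred.com/v2/collections/%s/documents/%s/segments/?q=%s",
		os.Getenv("COLLECTION"),
		os.Getenv("DOCUMENT_ID"),
		url.QueryEscape(os.Getenv("EXPRESSION")),
	)

	client := &http.Client{}
	req, err := http.NewRequest("GET", endpoint, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Token "+os.Getenv("REDSHRED_AUTH_TOKEN"))
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16 and later).
	bodyText, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", bodyText)
}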
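
Query expressions can contain spaces, quotes, or other characters that are not safe to place in a URL as-is, so it can help to build the request URL with a small helper. The sketch below uses only the Python standard library. The helper name segments_url and the sample expression are illustrative assumptions, not part of the RedShred API or its query syntax; the path shapes follow the request examples above.

```python
from typing import Optional
from urllib.parse import quote, urlencode

API_ROOT = "https://api.redshred.com/v2"


def segments_url(collection: str, expression: str, document_id: Optional[str] = None) -> str:
    """Build a /segments/ URL scoped to a collection, or to one document in it.

    Hypothetical helper: the path layout mirrors the request examples above.
    """
    path = f"/collections/{quote(collection, safe='')}"
    if document_id is not None:
        path += f"/documents/{quote(document_id, safe='')}"
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    query = urlencode({"q": expression}, quote_via=quote)
    return f"{API_ROOT}{path}/segments/?{query}"


# Placeholder expression, NOT real query syntax; it only exercises the encoding.
url = segments_url("my-collection", 'a b & "c"', document_id="doc-1")
collection_url = segments_url("my-collection", "x")
print(url)
print(collection_url)
```

Leaving out document_id yields the collection-scoped URL, which matches the scoping rule described earlier: the query applies to whatever the URL prefix names.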

Search Results

You’ll receive a payload in reply that includes a list of search results of the form:

{
  "next": "...",
  "previous": "...",
  "count": 123,
  "results": [ "..." ]  // a list of segment instances
}

next and previous are hyperlinks to the adjacent pages when those pages exist, and null otherwise. count is the total number of results across all pages, and results is the list of segment instances on the current page.
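
Because results arrive one page at a time, collecting every match means following next until it is null. The sketch below shows one way to do that. The function name iter_results and the fetch callable are assumptions made for illustration; only the next and results fields come from the payload described above. The demonstration runs against two fake in-memory pages rather than the live API.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(first_url: str, fetch: Callable[[str], Page]) -> Iterator[Any]:
    """Yield every result across pages by following `next` until it is null."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")


# Offline demonstration: two fake pages stand in for API responses.
fake_pages = {
    "page-1": {"next": "page-2", "previous": None, "count": 3, "results": ["a", "b"]},
    "page-2": {"next": None, "previous": "page-1", "count": 3, "results": ["c"]},
}
collected = list(iter_results("page-1", fake_pages.__getitem__))
print(collected)
```

Against the real service, fetch could be something like lambda url: requests.get(url, headers=headers).json(), reusing the headers from the request example. Taking the fetch step as a parameter keeps the paging logic testable without network access.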

Segment Basics

Segments contain, at a minimum:

  • Metadata about their creation (who created them, when, which enrichment process)
  • text - the text under this segment on the page.
  • regions - the unique region of the collection’s space occupied by this segment
  • enrichment_data - the metadata attached by this segment to the document at this region. The details of this vary from enrichment to enrichment.
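
As a rough illustration of working with these fields, the sketch below reads text, regions, and enrichment_data from a segment dictionary. The helper name summarize_segment and the example segment are made up for illustration. The sketch assumes enrichment_data is a JSON object and makes no assumption about the shape of regions beyond whether it is present.

```python
from typing import Any, Dict


def summarize_segment(segment: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the three fields named above, tolerating missing keys."""
    text = segment.get("text") or ""
    enrichment = segment.get("enrichment_data") or {}
    return {
        "text_preview": text[:40],
        "has_regions": bool(segment.get("regions")),
        # Assumes enrichment_data is a JSON object (a dict once parsed).
        "enrichment_keys": sorted(enrichment.keys()),
    }


# Made-up segment for illustration; real payloads carry more fields.
example = {
    "text": "Step 1: Remove the access panel.",
    "regions": ["..."],
    "enrichment_data": {"kind": "step"},
}
summary = summarize_segment(example)
print(summary)
```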

The query language lets you query these data in two ways: as structured data, using tests such as less-than, greater-than, and equality, and spatially, searching for content based on where it appears on the page and how it relates to its surrounding content.

On the next pages let’s take a look at some example queries.