RedShred provides a full-featured query language that lets you say things like “all the steps within this procedure” or “all the headings within this document”.
The query language provides AND, OR, and NOT operators for joining subclauses together to express complex constraints.
The basics of executing a query are to send a q= URL parameter to a /segments/ endpoint. These endpoints appear in several places in the list of endpoints, most notably at /collections/$COLLECTION/segments/ and /documents/$DOCUMENT_ID/segments/. In each case, the query is implicitly scoped to the prefix of the URL: in the first case, the collection, and in the second case, just the specified document.
curl -G -H "Authorization: Token $REDSHRED_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  --data-urlencode "q=$EXPRESSION" \
  "https://api.redshred.com/v2/collections/$COLLECTION/documents/$DOCUMENT_ID/segments/"
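import os
import requests
# Python does not expand $VARS inside strings, so read each value from the environment.
REDSHRED_AUTH_TOKEN = os.getenv('REDSHRED_AUTH_TOKEN')
COLLECTION = os.getenv('COLLECTION')
DOCUMENT_ID = os.getenv('DOCUMENT_ID')
EXPRESSION = os.getenv('EXPRESSION')
headers = {
    'Authorization': f"Token {REDSHRED_AUTH_TOKEN}",
    'Content-Type': 'application/json',
}
url = f"https://api.redshred.com/v2/collections/{COLLECTION}/documents/{DOCUMENT_ID}/segments/"
# requests URL-encodes the expression passed via params.
response = requests.get(url, params={'q': EXPRESSION}, headers=headers)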
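package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"os"
)

func main() {
	// Go does not expand $VARS in string literals, so read each value from the environment.
	endpoint := fmt.Sprintf("https://api.redshred.com/v2/collections/%s/documents/%s/segments/",
		os.Getenv("COLLECTION"), os.Getenv("DOCUMENT_ID"))
	// URL-encode the query expression before appending it as the q parameter.
	query := url.Values{"q": {os.Getenv("EXPRESSION")}}
	client := &http.Client{}
	req, err := http.NewRequest("GET", endpoint+"?"+query.Encode(), nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Token "+os.Getenv("REDSHRED_AUTH_TOKEN"))
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// Read the whole response body and print it as text.
	bodyText, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", bodyText)
}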
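Because the scope of a query comes from the URL prefix, it can help to build the URL in one place. The following is a minimal sketch, not part of any RedShred client library: the helper name segments_url is hypothetical, the base URL is copied from the examples above, and it assumes the bare /documents/ form hangs off the same base. The expression is URL-encoded so that spaces and special characters survive the trip.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.redshred.com/v2"  # base copied from the examples above


def segments_url(expression, collection=None, document_id=None):
    """Build a /segments/ URL; the prefix decides the scope of the query.

    Hypothetical helper: it only assembles the URL forms mentioned in the text.
    """
    if collection is not None and document_id is not None:
        prefix = f"/collections/{collection}/documents/{document_id}"
    elif collection is not None:
        prefix = f"/collections/{collection}"
    elif document_id is not None:
        # Assumption: the document-only form lives under the same base URL.
        prefix = f"/documents/{document_id}"
    else:
        raise ValueError("need a collection, a document_id, or both")
    # urlencode escapes the expression so it is safe inside a query string.
    return f"{BASE_URL}{prefix}/segments/?{urlencode({'q': expression})}"


# "a b" is a placeholder, not real query syntax.
print(segments_url("a b", collection="my-collection"))
```

Passing only a collection scopes the query to the whole collection; adding a document_id narrows it to that document.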
You’ll receive a payload in reply that includes a list of search results of the form:
{
  "next": "...",
  "previous": "...",
  "count": 123,
  "results": ["..."]
}
next and previous are hyperlinks if a next or previous page exists, and null otherwise. count is the total number of results across all pages, and results is a list of segment instances.
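Since results are paged, collecting every match means following next until it is null. The sketch below shows one way to do that, under the assumption that each page has the shape described above. The names iter_segments and fetch_page are hypothetical, and the two fake pages stand in for real API responses so the example runs without network access.

```python
def iter_segments(first_url, fetch_page):
    """Yield every segment by following `next` links until it is null.

    fetch_page is any callable that takes a URL and returns the decoded page.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (JSON null) on the last page


# Stand-in for the API: two fake pages keyed by URL (hypothetical data).
FAKE_PAGES = {
    "page-1": {"next": "page-2", "previous": None, "count": 3,
               "results": [{"text": "a"}, {"text": "b"}]},
    "page-2": {"next": None, "previous": "page-1", "count": 3,
               "results": [{"text": "c"}]},
}

texts = [segment["text"] for segment in iter_segments("page-1", FAKE_PAGES.__getitem__)]
print(texts)  # ['a', 'b', 'c']
```

Against the real service, fetch_page would wrap an authenticated GET request and return the parsed JSON body.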
Segments contain, at a minimum:
text - the text under this segment on the page.
regions - the unique region of the collection’s space occupied by this segment.
enrichment_data - the metadata attached by this segment to the document at this region. The details of this vary from enrichment to enrichment.
The query language lets you query these data both as structured data, using tests like less-than, greater-than, and equality, and through spatial queries that let you search for content based on where it appears on the page and how it relates to its surrounding content.
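To make the minimum fields concrete, here is a small sketch that reads them defensively from one segment. The function name summarize_segment and the example segment are hypothetical. The shapes of regions and enrichment_data are not specified here, so the code only checks for their presence and lists the top-level enrichment keys.

```python
def summarize_segment(segment, preview_chars=40):
    """Return a small summary of the minimum fields a segment carries."""
    text = segment.get("text") or ""
    enrichment = segment.get("enrichment_data") or {}
    return {
        "preview": text[:preview_chars],
        "has_regions": bool(segment.get("regions")),
        # enrichment_data varies by enrichment, so only report its keys.
        "enrichment_keys": sorted(enrichment) if isinstance(enrichment, dict) else [],
    }


# Hypothetical segment; the real region and enrichment shapes are not shown here.
example = {
    "text": "Step 1: Remove the access panel.",
    "regions": ["..."],
    "enrichment_data": {"kind": "step"},
}
print(summarize_segment(example))
```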
On the next pages let’s take a look at some example queries.