Language is hard. RSQL is here to help. RSQL is your interface for accessing document data. Connecting RSQL to your mental model of document data is key to making the most of this documentation.
Below are examples of queries that illustrate the features of the query language.
Query | Description |
---|---|
"2.4.2." in text | Searches for text containing “2.4.2.” |
"2.4.2." in text and segment_type = "procedure" | Combines text search with segment type filter |
within ("2.4.2." in text and segment_type = "procedure") | Finds segments within the physical extent of another segment |
within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text)) | Uses nested within clauses for hierarchical constraints |
perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6 | Uses numeric comparison with field access |
text ~ /GX\d*/ | Uses regular expression pattern matching |
text ~ /(?i)bucket/ | Uses case-insensitive regular expression matching |
id = "Dzu727yGWuX9FC4cBX2dcK" | Queries by exact segment ID |
The query `"2.4.2." in text` searches for any segment where “2.4.2.” appears anywhere in the `text` field. This is the most basic form of search.
"2.4.2." in text and segment_type = "procedure"
"2.4.2." in text or segment_type = "procedure"
These two queries demonstrate how to combine multiple conditions. The first uses the `and` keyword to find segments that both contain “2.4.2.” in their text field and have a segment type of “procedure”. Alternatively, use `or`, as in the second query, to match segments that satisfy either condition.
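To make the difference between `and` and `or` concrete, here is a minimal Python sketch that applies the same two conditions to a small in-memory list. The segment dictionaries and their values are made up for illustration; this models the boolean logic only, not how RSQL evaluates queries internally.

```python
# Hypothetical segments; field names mirror those used in the queries above.
segments = [
    {"id": "a", "segment_type": "procedure", "text": "2.4.2. Remove the panel"},
    {"id": "b", "segment_type": "paragraph", "text": "See 2.4.2. for details"},
    {"id": "c", "segment_type": "procedure", "text": "3.1.0. Install the panel"},
]


def has_text(seg):
    """Models: "2.4.2." in text"""
    return "2.4.2." in seg["text"]


def is_procedure(seg):
    """Models: segment_type = "procedure" """
    return seg["segment_type"] == "procedure"


# `and` keeps segments satisfying both conditions; `or` keeps those satisfying either.
and_ids = [s["id"] for s in segments if has_text(s) and is_procedure(s)]
or_ids = [s["id"] for s in segments if has_text(s) or is_procedure(s)]

print(and_ids)  # ['a']
print(or_ids)   # ['a', 'b', 'c']
```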
Queries to the `/segments/` endpoint evaluate to lists of segments. Because each segment has `regions` in the landscape of the collection, we can constrain queries to be relative to those segments.
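As a rough illustration of sending a query to the `/segments/` endpoint, the sketch below URL-encodes an RSQL string into a request URL. The host and the `query` parameter name are assumptions made for this example and are not taken from this documentation; check the API reference for the actual request format.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameter name, used for illustration only.
BASE_URL = "https://example.invalid/api"
QUERY_PARAM = "query"


def build_segments_url(rsql, base_url=BASE_URL, param=QUERY_PARAM):
    """Return a /segments/ URL with the RSQL query safely percent-encoded."""
    return f"{base_url}/segments/?{urlencode({param: rsql})}"


rsql = '"2.4.2." in text and segment_type = "procedure"'
url = build_segments_url(rsql)
print(url)
```

Encoding matters because RSQL queries routinely contain quotes, spaces, and `=` characters that are not safe to place in a URL unescaped.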
The query `within ("2.4.2." in text and segment_type = "procedure")` finds not the segment with “2.4.2.” in its text field, but the segments within the physical extent of that segment. Paragraphs, actions, diagrams, etc. can be constrained this way.
The query `within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text))` shows how to apply constraints in the context of a boolean subclause. It finds any segments that are within a “procedure-step” within the regions of procedure 2.4.2.
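The containment idea behind `within` can be sketched in Python as follows. This is a simplified model under stated assumptions: each segment gets a hypothetical one-dimensional extent (`start`, `end`) in place of real regions, a segment is not counted as being within itself, and `within` is assumed to bind to the condition that immediately follows it, so the two clauses are combined with `and`.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    id: str
    segment_type: str
    text: str
    start: int  # hypothetical 1-D extent standing in for a real region
    end: int


def within(segments, containers):
    """Segments whose extent lies inside the extent of any container."""
    return [
        s for s in segments
        if any(c.id != s.id and c.start <= s.start and s.end <= c.end
               for c in containers)
    ]


segments = [
    Segment("p1", "procedure", "2-4-2. Replace filter", 0, 100),
    Segment("s1", "procedure-step", "Step 1", 10, 40),
    Segment("a1", "action", "Loosen the clamp", 12, 20),
    Segment("p2", "procedure", "2-4-3. Inspect seal", 101, 200),
    Segment("s2", "procedure-step", "Step 1", 110, 140),
    Segment("a2", "action", "Check for cracks", 112, 120),
]

# within segment_type = "procedure-step"
steps = [s for s in segments if s.segment_type == "procedure-step"]
in_any_step = within(segments, steps)

# within (segment_type = "procedure" and "2-4-2." in text)
procedures = [s for s in segments
              if s.segment_type == "procedure" and "2-4-2." in s.text]
in_procedure = within(segments, procedures)

# `and` of the two clauses: keep segments present in both result lists.
procedure_ids = {s.id for s in in_procedure}
result = [s.id for s in in_any_step if s.id in procedure_ids]
print(result)  # ['a1']
```

The action `a2` sits inside a procedure step but belongs to a different procedure, so the second clause excludes it.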
(segment_type = "actor" or segment_type = "action") and (within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text)))
This query demonstrates filtering for specific types of segments in a region. In this case, we’re looking for segments that are either “actor” or “action” types within the specified nested constraints.
perspective.name = "jobguide_steps" and (within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text)))
This query filters on the `name` attribute of the segment’s `perspective` attribute. Subfields are accessed with the `.` operator, which separates a field from its parent.
perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6
This query illustrates using numeric fields with comparison operators. It returns segments from the “tech_manuals” perspective that have a confidence score of 0.6 (60%) or lower.
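Dotted field access combined with a numeric comparison can be modeled in a few lines of Python. The nested dictionaries below are a hypothetical stand-in for segment records, and `get_field` is a helper invented for this sketch, not part of RSQL.

```python
from functools import reduce


def get_field(segment, dotted_path):
    """Follow a dotted path such as 'perspective.name' through nested dicts.

    Returns None when any part of the path is missing.
    """
    def step(node, key):
        return node.get(key) if isinstance(node, dict) else None

    return reduce(step, dotted_path.split("."), segment)


# Hypothetical segment records for illustration.
segments = [
    {"id": "x", "perspective": {"name": "tech_manuals"},
     "enrichment_data": {"confidence_score": 0.55}},
    {"id": "y", "perspective": {"name": "tech_manuals"},
     "enrichment_data": {"confidence_score": 0.91}},
    {"id": "z", "perspective": {"name": "jobguide_steps"},
     "enrichment_data": {"confidence_score": 0.40}},
]

# Models: perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6
low_confidence = [
    s["id"] for s in segments
    if get_field(s, "perspective.name") == "tech_manuals"
    and (get_field(s, "enrichment_data.confidence_score") or 0.0) <= 0.6
]
print(low_confidence)  # ['x']
```

In this sketch a missing score is treated as 0.0; how RSQL handles missing fields is not specified here.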
The query `text ~ /GX\d*/` showcases using regular expressions for search. Regular expressions are a powerful syntax for string matching that is widely supported across many applications. This example matches values of the `text` field against the pattern “GX” followed by zero or more digits from 0 to 9.
A detailed introduction to regular expressions is beyond the scope of this tutorial, but sites like https://regex101.com/ are a good place to start.
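To experiment with the pattern outside RSQL, the following sketch uses Python's `re` module as a stand-in. The regular expression engine behind RSQL may differ in details, and the sample strings are invented for illustration.

```python
import re

pattern = re.compile(r"GX\d*")

samples = [
    "Install GX204 bracket",  # "GX" followed by digits
    "GX series overview",     # "GX" followed by zero digits still matches
    "gx204 lowercase",        # no match: the pattern is case-sensitive
    "G-X204",                 # no match: "G" and "X" are not adjacent
]

# search() looks for the pattern anywhere in the string.
results = {s: bool(pattern.search(s)) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")

matched_text = pattern.search("Install GX204 bracket").group()
print(matched_text)  # GX204
```

Because `\d*` allows zero digits, the pattern matches any text containing “GX”; use `\d+` to require at least one digit.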
text ~ /(?i)bucket/
This query demonstrates case-insensitive matching with regular expressions. With the `(?i)` flag set in the regular expression, it matches “bucket” in any case combination (e.g., “bucket”, “Bucket”, “bUCket”, “buckeT”).
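The effect of the `(?i)` flag can be checked with Python's `re` module, which supports the same inline flag syntax. This only illustrates the flag's behavior; RSQL's own engine is not shown here.

```python
import re

case_insensitive = re.compile(r"(?i)bucket")
case_sensitive = re.compile(r"bucket")

variants = ["bucket", "Bucket", "bUCket", "buckeT"]

# With (?i), every case variant matches; without it, only the exact lowercase form does.
insensitive_hits = [v for v in variants if case_insensitive.search(v)]
sensitive_hits = [v for v in variants if case_sensitive.search(v)]

print(insensitive_hits)  # ['bucket', 'Bucket', 'bUCket', 'buckeT']
print(sensitive_hits)    # ['bucket']
```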
The query `id = "Dzu727yGWuX9FC4cBX2dcK"` shows how to query for a segment by its specific `segment_id`, which is useful when you need to retrieve a particular segment directly.
perspective.id="32ccPw6ZA8ZQtPvvVytJzB" and id="NbRvgPy5ymA9CKEcBS9uTN"
This query retrieves a specific segment by its ID within a specific perspective. While the segment ID alone is usually sufficient to uniquely identify a segment, adding the perspective ID may improve query performance in some cases by narrowing the search scope.
enrichment_name="typography" and max_y=1 and segment_type = "page"
This query finds the first page of all documents in a collection. It works by filtering for segments of type `page` that have the `typography` enrichment and a `max_y` value of 1 (indicating they’re at the top of the document).
(text ~ /^Note/ or text ~ /^NOTE/) and segment_type = "text-line"
This query finds all segments of type `text-line` that have a text field starting with either `Note` or `NOTE`. The `^` character in the regular expression indicates the start of the text.
segment_type = "list-item" and text ~ "^•"
This query retrieves all segments of type `list-item` that begin with a bullet point character (•). This is useful for identifying bulleted lists in documents.
enrichment_name = "enrichment_model_adapter" and text ~ /^\d+-\d+-\d+\s+[A-Z\s,]+\.$/
This query identifies segments from the `enrichment_model_adapter` perspective that follow a specific format: three hyphen-separated numbers, followed by whitespace, then one or more uppercase letters (with possible spaces or commas), ending with a period. For example: “20-20-93 AIRCRAFT MANUAL.”
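The pattern can be tried against sample headings with Python's `re` module, used here as a stand-in for whatever engine RSQL runs. Apart from the example quoted above, the test strings are invented.

```python
import re

# Same pattern as in the query: numbers-numbers-numbers, whitespace,
# uppercase letters with optional spaces or commas, then a literal period.
heading = re.compile(r"^\d+-\d+-\d+\s+[A-Z\s,]+\.$")

samples = [
    "20-20-93 AIRCRAFT MANUAL.",   # matches the described format
    "20-20-93 Aircraft manual.",   # lowercase letters break the match
    "20-20 AIRCRAFT MANUAL.",      # only two numeric groups
    "20-20-93 AIRCRAFT MANUAL",    # missing the final period
]

results = {s: bool(heading.search(s)) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

The `^` and `$` anchors force the whole text to follow the format, which is why a missing period or stray lowercase letter rejects the line.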
Note: When using this query in the collection config editor or in certain API calls, the escape characters (backslashes) are required as shown above. However, when using the same query in the top search bar of the viewer, the escape backslashes aren’t needed:
enrichment_name = "enrichment_model_adapter" and text ~ /^\d+-\d+-\d+\s+[A-Z\s,]+.$/
text ~ /^Checklist Requirements:/ and enrichment_name = "enrichment_model_adapter"
This query finds all segments from the `enrichment_model_adapter` perspective that begin with the exact string “Checklist Requirements:”. This is useful for identifying specific section headers in technical documents.