Example Queries

Language is hard. RSQL is here to help. RSQL is your interface for accessing document data. To make the most of this documentation, map each RSQL feature onto your own mental model of document data.

Below are examples of queries that illustrate the features of the query language.

Query	Description
`"2.4.2." in text`	Searches for text containing “2.4.2.”
`"2.4.2." in text and segment_type = "procedure"`	Combines text search with segment type filter
`within ("2.4.2." in text and segment_type = "procedure")`	Finds segments within the physical extent of another segment
`within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text))`	Uses nested within clauses for hierarchical constraints
`perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6`	Uses numeric comparison with field access
`text ~ /GX\d*/`	Uses regular expression pattern matching
`text ~ /(?i)bucket/`	Uses case-insensitive regular expression matching
`id = "Dzu727yGWuX9FC4cBX2dcK"`	Queries by exact segment ID

Query Styles

Basic Text Search

The query `"2.4.2." in text` searches for any segment where “2.4.2.” appears anywhere in the text field. This is the most basic form of search.

Combining Conditions (Boolean Operations)

"2.4.2." in text and segment_type = "procedure"
"2.4.2." in text or segment_type = "procedure"

These queries demonstrate how to combine multiple conditions. The first uses the and keyword to find segments that both contain “2.4.2.” in their text field and have a segment type of “procedure”. The second uses or to find segments that satisfy either condition.
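To make the difference between and and or concrete, the sketch below applies the same two conditions to a few in-memory records in Python. The records and helper functions are hypothetical stand-ins for illustration, and the sketch assumes plain substring matching for `in text`; it is not RSQL's implementation or part of any RedShred client.

```python
# Hypothetical segment records with only the two fields used by the queries above.
segments = [
    {"id": "a", "text": "2.4.2. Remove the panel", "segment_type": "procedure"},
    {"id": "b", "text": "See 2.4.2. for details", "segment_type": "paragraph"},
    {"id": "c", "text": "3.1.1. Install the panel", "segment_type": "procedure"},
    {"id": "d", "text": "General notes", "segment_type": "paragraph"},
]


def has_text(segment):
    # Mirrors: "2.4.2." in text (assumed here to be a plain substring test)
    return "2.4.2." in segment["text"]


def is_procedure(segment):
    # Mirrors: segment_type = "procedure"
    return segment["segment_type"] == "procedure"


# "and" keeps records that satisfy both conditions.
and_ids = [s["id"] for s in segments if has_text(s) and is_procedure(s)]
# "or" keeps records that satisfy at least one condition.
or_ids = [s["id"] for s in segments if has_text(s) or is_procedure(s)]

print(and_ids)  # ['a']
print(or_ids)   # ['a', 'b', 'c']
```

The or form returns a superset of the and form, so it is usually the broader starting point when exploring a collection.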

Using the Within Operator

Queries to the /segments/ endpoint evaluate to lists of segments. Because each segment has regions in the landscape of the collection, we can constrain queries to be relative to those segments.

The query `within ("2.4.2." in text and segment_type = "procedure")` does not return the segment with “2.4.2.” in its text field; it returns the segments that lie within the physical extent of that segment. Paragraphs, actions, diagrams, and other segment types can be constrained this way.

Nested Within Clauses

The query `within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text))` shows how to apply constraints in the context of a boolean subclause. It finds any segments that are within a “procedure-step” that is itself within the region of the procedure whose text contains “2-4-2.”.
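Nested clauses are easy to mistype by hand, so it can help to assemble them from smaller pieces. The following Python sketch builds the nested query string from this section; `within` and `all_of` are hypothetical helpers written for this example, not functions from a RedShred library.

```python
def within(inner: str) -> str:
    """Wrap an RSQL expression in a parenthesized within clause."""
    return f"within ({inner})"


def all_of(*clauses: str) -> str:
    """Join RSQL clauses with the 'and' keyword."""
    return " and ".join(clauses)


# Innermost constraint: the procedure whose text contains "2-4-2."
procedure = all_of('segment_type = "procedure"', '"2-4-2." in text')

# Outer constraint: procedure steps that fall within that procedure.
query = all_of('within segment_type = "procedure-step"', f"({within(procedure)})")

print(query)
# within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text))
```

Building the string this way keeps parentheses balanced and makes it simple to swap the innermost condition for a different procedure.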

Filtering by Segment Type

(segment_type = "actor" or segment_type = "action") and (within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text)))

This query demonstrates filtering for specific types of segments in a region. Here, we’re looking for segments that are either “actor” or “action” types within the specified nested constraints.

Accessing Subfields

perspective.name = "jobguide_steps" and (within segment_type = "procedure-step" and (within (segment_type = "procedure" and "2-4-2." in text)))

This query filters on the name subfield of the segment's perspective field. Subfields are accessed with the . operator, which separates each field from its parent.

Numeric Comparisons

perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6

This query illustrates using numeric fields with comparison operators. It returns segments from the “tech_manuals” perspective that have a confidence score of 0.6 (60%) or lower.
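The Python sketch below shows, on made-up data, how a dotted field path and a `<=` comparison combine in a filter like the one above. The nested-dictionary layout and the `get_path` helper are assumptions for illustration; the actual shape of segment records may differ.

```python
from functools import reduce


def get_path(record, dotted):
    """Resolve a dotted field path such as 'perspective.name' in nested dicts."""
    return reduce(lambda node, key: node[key], dotted.split("."), record)


# Hypothetical records carrying only the fields the query refers to.
segments = [
    {"id": "s1", "perspective": {"name": "tech_manuals"},
     "enrichment_data": {"confidence_score": 0.42}},
    {"id": "s2", "perspective": {"name": "tech_manuals"},
     "enrichment_data": {"confidence_score": 0.6}},
    {"id": "s3", "perspective": {"name": "tech_manuals"},
     "enrichment_data": {"confidence_score": 0.91}},
    {"id": "s4", "perspective": {"name": "jobguide_steps"},
     "enrichment_data": {"confidence_score": 0.1}},
]

# Mirrors: perspective.name = "tech_manuals" and enrichment_data.confidence_score <= 0.6
low_confidence = [
    s["id"]
    for s in segments
    if get_path(s, "perspective.name") == "tech_manuals"
    and get_path(s, "enrichment_data.confidence_score") <= 0.6
]

print(low_confidence)  # ['s1', 's2']
```

Because the comparison is `<=`, a score of exactly 0.6 is included; s4 is excluded only because it belongs to a different perspective.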

Regular Expressions

The query `text ~ /GX\d*/` uses a regular expression for search. Regular expressions are a powerful, widely supported syntax for string matching. This example matches text values in which “GX” is followed by zero or more digits from 0 to 9.

A detailed introduction to regular expressions is beyond the scope of this tutorial, but sites like https://regex101.com/ are a good place to start.
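One way to try a pattern before using it in a query is to test it locally. The sketch below runs `GX\d*` through Python's `re` module on made-up strings. It assumes the pattern may match anywhere in the text, and the regular expression dialect used by RSQL may differ from Python's in its details.

```python
import re

pattern = re.compile(r"GX\d*")

# Made-up sample strings for illustration.
samples = [
    "Part GX1200 is required",
    "Use the GX adapter",
    "gx77 is lowercase",
    "No part number",
]

# search() looks for the pattern anywhere in the string.
matches = {s: bool(pattern.search(s)) for s in samples}
for text, matched in matches.items():
    print(f"{matched!s:5} {text}")

# \d* is greedy, so all trailing digits are part of the match.
print(pattern.search("Part GX1200 is required").group())  # GX1200
```

Because `\d*` allows zero digits, a bare “GX” also matches; use `\d+` instead if at least one digit is required.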

Case-Insensitive Matching

The query `text ~ /(?i)bucket/` demonstrates case-insensitive matching with regular expressions. Setting the (?i) flag at the start of the regular expression makes it match “bucket” in any case combination (e.g., “bucket”, “Bucket”, “bUCket”, “buckeT”).
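Python's `re` module accepts the same inline `(?i)` flag, so the behavior described above can be checked locally. This is only a sketch using Python's regular expression engine, which may not match RSQL's in every detail.

```python
import re

# The inline (?i) flag must appear at the start of the pattern in Python.
pattern = re.compile(r"(?i)bucket")

variants = ["bucket", "Bucket", "bUCket", "buckeT", "basket"]
results = [bool(pattern.search(v)) for v in variants]

print(results)  # [True, True, True, True, False]
```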

Exact ID Matching

The query `id = "Dzu727yGWuX9FC4cBX2dcK"` shows how to query for a segment by its segment ID, which is useful when you need to retrieve a particular segment directly.

Common Queries

Specific Segment in a Specific Perspective

perspective.id = "32ccPw6ZA8ZQtPvvVytJzB" and id = "NbRvgPy5ymA9CKEcBS9uTN"

This query retrieves a specific segment by its ID within a specific perspective. While the segment ID alone is usually sufficient to uniquely identify a segment, adding the perspective ID may improve query performance in some cases by narrowing the search scope.

First Page of All Documents in a Collection

enrichment_name = "typography" and max_y = 1 and segment_type = "page"

This query finds the first page of all documents in a collection. It works by filtering for segments of type page that have the typography enrichment and a max_y value of 1 (indicating they’re at the top of the document).

Finding Text Lines Starting with “Note” or “NOTE”

(text ~ /^Note/ or text ~ /^NOTE/) and segment_type = "text-line"

This query finds all segments of type text-line that have a text field starting with either Note or NOTE. The ^ character in the regular expression indicates the start of the text.
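The effect of the `^` anchor can be seen by running the two alternatives through Python's `re` module on a few made-up lines. This is a local sketch only; it assumes RSQL applies the pattern the way `re.search` does.

```python
import re

# Equivalent to: text ~ /^Note/ or text ~ /^NOTE/
pattern = re.compile(r"^Note|^NOTE")

# Made-up text lines for illustration.
lines = [
    "Note: torque to 25 Nm",
    "NOTE: wear gloves",
    "note: lowercase",
    "See Note 3",
    "Notes follow",
]

starts_with_note = [line for line in lines if pattern.search(line)]
print(starts_with_note)
# ['Note: torque to 25 Nm', 'NOTE: wear gloves', 'Notes follow']
```

“See Note 3” is rejected because the match must begin at the start of the text, while “Notes follow” is accepted because the pattern only constrains the first four characters.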

Finding List Items with Bullet Points

segment_type = "list-item" and text ~ "^•"

This query retrieves all segments of type list-item that begin with a bullet point character (•). This is useful for identifying bulleted lists in documents.

Finding Document Section Headers with Specific Format

enrichment_name = "enrichment_model_adapter" and text ~ /^\d+-\d+-\d+\s+[A-Z\s,]+\.$/

This query identifies segments from the enrichment_model_adapter perspective whose text follows a specific format: three hyphen-separated numbers, followed by whitespace, then one or more uppercase letters, spaces, or commas, ending with a period. For example: “20-20-93 AIRCRAFT MANUAL.”

Note: When using this query in the collection config editor or in certain API calls, the escape characters (backslashes) are required as shown above. However, when using the same query in the top search bar of the viewer, the escaping backslashes aren’t needed:
enrichment_name = "enrichment_model_adapter" and text ~ /^\d+-\d+-\d+\s+[A-Z\s,]+.$/
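The escaped form of the header pattern can be checked locally against the example header and a few near misses. The sketch below uses Python's `re` module and made-up candidate strings; it assumes RSQL's regular expression dialect treats this pattern the same way.

```python
import re

# The escaped pattern from the section header query.
header = re.compile(r"^\d+-\d+-\d+\s+[A-Z\s,]+\.$")

candidates = [
    "20-20-93 AIRCRAFT MANUAL.",   # the example from the text
    "20-20-93 Aircraft Manual.",   # lowercase letters are not in [A-Z\s,]
    "20-20 AIRCRAFT MANUAL.",      # only two hyphen-separated numbers
    "20-20-93 AIRCRAFT MANUAL",    # missing the final period
]

is_header = [bool(header.search(c)) for c in candidates]
print(is_header)  # [True, False, False, False]
```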

Finding Checklist Requirement Headers

text ~ /^Checklist Requirements:/ and enrichment_name = "enrichment_model_adapter"

This query finds all segments from the enrichment_model_adapter perspective that begin with the exact string “Checklist Requirements:”. This is useful for identifying specific section headers in technical documents.