Searching Over Encrypted Documents
Generate blind index tokens, search encrypted records, and decrypt results locally.
Search encrypted documents by generating blind index tokens locally. Literal matches opaque tokens and returns encrypted records for the client to decrypt.
Prerequisites
Before searching, the client needs:
- An active access token.
- At least one indexed document.
- The relevant blind index key for the search scope.
- A normalizer compatible with the indexing path.
- Cryptography support for HMAC and local decryption.
A processed document is not automatically searchable in every scope. Searchability depends on whether blind index tokens exist for that scope.
Overview
Search has three client-side responsibilities:
- Normalize the search term.
- Generate a blind index token using the correct search key.
- Decrypt encrypted records returned by the search endpoint.
For the privacy model behind blind indexing, see Encrypted Search. For how document references are anonymized, see Blind Routing.
Step 1 — Normalize The Search Term
Before generating a search token, normalize the search term so it matches the tokens that were created at index time. Normalization converts text to a consistent format so capitalization or whitespace differences do not cause misses.
The normalization process applies:
- Unicode NFC normalization — characters with multiple representations are stored in a single canonical form.
- Case folding — text is converted to lowercase.
- Whitespace normalization — runs of spaces, tabs, and newlines collapse into a single space; leading and trailing whitespace is trimmed.
For example:
| Input | Normalized Output |
|---|---|
| " Driver's License " | "driver's license" |
| "PASSPORT" | "passport" |
| "José García" | "josé garcía" |
Normalization is deterministic (same input → same output) and idempotent (normalizing an already-normalized string returns the same string).
Use the same normalizer during indexing and search. Mismatched normalization is the most common cause of missed results.
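The three normalization rules above can be sketched in a few lines of Python. This is a minimal illustration, not Literal's normalizer: the function name `normalize`, the order of operations, and the use of `str.lower()` and `str.split()` (which treats all Unicode whitespace as separators, not only spaces, tabs, and newlines) are assumptions. Confirm the behavior against the normalizer used on the indexing path before relying on it.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, NFC, collapse and trim whitespace."""
    # Lowercase first, then compose to NFC so the result is in canonical form.
    lowered = text.lower()
    composed = unicodedata.normalize("NFC", lowered)
    # split() with no arguments splits on any run of whitespace and drops
    # leading/trailing whitespace; joining with " " collapses each run.
    return " ".join(composed.split())
```

Applied to the table above, `normalize(" Driver's License ")` yields `"driver's license"`, and a decomposed input such as `"Jose\u0301"` comes out in the same composed form as `"josé"`.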
Step 2 — Generate A Search Token
Once the search term is normalized, compute a blind index token from it using the relevant blind index key. The token is an HMAC-SHA256 output: a 32-byte value derived from the normalized text.
token = HMAC-SHA256(bik, normalized_text)
Where:
- bik is the blind index key (32 bytes).
- normalized_text is the normalized search term from Step 1.
- Output is a 32-byte HMAC-SHA256 digest, base64-encoded for transport.
This is a one-way transformation — Literal cannot reverse the token to recover the search term, and different search terms produce entirely different tokens.
Use well-maintained cryptography libraries for HMAC-SHA256 and base64 encoding. Verify the implementation with test vectors when available.
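As a rough sketch of the token computation, the following uses only Python's standard library. The function name `blind_index_token` is hypothetical, and two details are assumptions not stated here: that the normalized text is encoded as UTF-8 before hashing, and that the transport encoding is standard base64 with padding rather than the URL-safe alphabet. Check the API Reference for the exact encoding.

```python
import base64
import hashlib
import hmac


def blind_index_token(bik: bytes, normalized_text: str) -> str:
    """Hypothetical helper: HMAC-SHA256 over the normalized term, base64-encoded."""
    if len(bik) != 32:
        raise ValueError("blind index key must be 32 bytes")
    # Assumption: the normalized text is hashed as UTF-8 bytes.
    digest = hmac.new(bik, normalized_text.encode("utf-8"), hashlib.sha256).digest()
    # Assumption: standard base64 alphabet with padding.
    return base64.b64encode(digest).decode("ascii")
```

The same key and normalized text always yield the same token, which is what lets the server match tokens without seeing the search term.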
Step 3 — Choose The Search Scope
The search scope determines which blind index key generates the token:
- Personal search — use the document holder’s personal blind index key (derived from the User Master Key during account creation).
- Organization search — use the organization’s blind index key (derived from the Entity Master Key inside the secure enclave).
The same plaintext value produces different tokens in different scopes, so personal and organization searches do not match each other’s tokens.
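To illustrate why tokens do not match across scopes, the sketch below selects a key by scope and computes a token with it. The scope labels `"personal"` and `"organization"`, the `keys` mapping, and the function name are hypothetical, and the all-zero and all-one keys are placeholders for demonstration only. UTF-8 and standard base64 are assumed as the encodings.

```python
import base64
import hashlib
import hmac
from typing import Dict


def token_for_scope(scope: str, keys: Dict[str, bytes], normalized_text: str) -> str:
    """Hypothetical helper: pick the blind index key for a scope, then tokenize."""
    # Raises KeyError if the client holds no key for the requested scope.
    bik = keys[scope]
    digest = hmac.new(bik, normalized_text.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder keys for demonstration only; real keys come from key derivation.
demo_keys = {"personal": bytes(32), "organization": b"\x01" * 32}
personal = token_for_scope("personal", demo_keys, "passport")
organization = token_for_scope("organization", demo_keys, "passport")
print(personal != organization)
```

Because the keys differ, a token generated for one scope will not match index entries created under the other.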
Step 4 — Submit The Search
Endpoint: POST /v1/search
A search request includes the token, search scope, optional index-type filters, and an organization scope token when searching entity documents.
Use the API Reference for exact request fields, response bodies, and errors.
Search accepts one token per request. For multi-term search, issue one request per term and intersect or combine encrypted result sets client-side.
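One way to intersect per-term results client-side is sketched below. The network call is left as a caller-supplied function (`search_one`) because the exact request fields live in the API Reference. The record shape, and the `blind_document_token` field used to identify a record across result sets, are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def intersect_results(
    tokens: Iterable[str],
    search_one: Callable[[str], List[dict]],
    id_field: str = "blind_document_token",  # assumed identifier field
) -> List[dict]:
    """Return records that appear in the results for every token."""
    matched = None
    for token in tokens:
        # One search request per token; index the records by identifier.
        by_id: Dict[str, dict] = {r[id_field]: r for r in search_one(token)}
        if matched is None:
            matched = by_id
        else:
            matched = {k: v for k, v in matched.items() if k in by_id}
        if not matched:
            break  # no record can match all terms; skip remaining requests
    return list(matched.values()) if matched else []
```

To combine result sets instead of intersecting them, merge the per-token dictionaries rather than filtering them. The records stay encrypted throughout; only their identifiers are compared.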
Step 5 — Handle Search Results
Literal returns encrypted document records. The client must:
- Choose the correct wrapped document key for the search scope.
- Unwrap the document encryption key.
- Decrypt encrypted metadata locally.
- Use the blind document token for follow-up operations when needed.
Filtering By Index Type
Literal indexes several categories of document information. Index-type filters narrow which token categories the server matches.
| Index Type | What It Matches |
|---|---|
| doc_type | The kind of document (e.g., "passport", "driver's license"). |
| doc_field | Specific extracted fields (e.g., a name, ID number). |
| doc_date | Date values associated with the document. |
| doc_tag | User-applied labels and tags. |
| text_content | Words and phrases extracted from the document’s text content. |
Omitting the filter searches across all indexed categories.
Searching After Claiming A Shared Document
Claiming a grant gives the recipient access to the document key, but it does not automatically add the document to the recipient’s personal search scope.
To make the shared document searchable personally, the client:
- Claims the grant and obtains access to the document key.
- Decrypts the document metadata or searchable fields locally.
- Normalizes each searchable value.
- Generates blind index tokens using the recipient’s personal search key.
- Submits those tokens for the document.
Endpoint: POST /v1/documents/consumer-indexes
Use the API Reference for exact request fields.
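The normalize-and-tokenize portion of these steps might look like the sketch below, which turns already-decrypted searchable values into tokens under the recipient's personal key. All function names are hypothetical, the normalizer and encodings (UTF-8, standard base64) are assumptions, and the request body for submitting the tokens, including how each token is associated with an index type, is left to the API Reference.

```python
import base64
import hashlib
import hmac
import unicodedata
from typing import Iterable, List


def normalize(text: str) -> str:
    # Assumed normalizer: lowercase, NFC, collapse and trim whitespace.
    return " ".join(unicodedata.normalize("NFC", text.lower()).split())


def blind_index_token(bik: bytes, normalized_text: str) -> str:
    digest = hmac.new(bik, normalized_text.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def personal_index_tokens(personal_bik: bytes, values: Iterable[str]) -> List[str]:
    """Tokenize decrypted searchable values with the recipient's personal key."""
    tokens: List[str] = []
    seen = set()
    for value in values:
        normalized = normalize(value)
        if not normalized:
            continue  # skip values that are empty after normalization
        token = blind_index_token(personal_bik, normalized)
        if token not in seen:  # values that normalize identically share a token
            seen.add(token)
            tokens.append(token)
    return tokens
```

Skipping empty values and deduplicating tokens are design choices in this sketch, not documented requirements.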
Search Flow Pseudocode
normalized = normalize(searchTerm)
token = hmacSha256(searchKey, normalized)
results = search(token, scope)
for record in results:
    dek = unwrap(record.wrappedDocumentKey)
    metadata = decrypt(dek, record.metadataEncrypted)
Related Resources
- Encrypted Search — the privacy model behind blind indexing, including what the application server can and cannot infer.
- Blind Routing — how document references are anonymized.
- Zero-Knowledge Model — the trust boundaries around encrypted search.
- Document Upload — how blind index tokens are produced at upload time.