Search API

Overview

The Search API is for heavy, low-frequency search jobs over post data.

The API is designed for asynchronous bulk retrieval:

You create a search job.
The API immediately returns a job ID.
You check the job status later.
Once the job completes, you fetch the result location.

Base URL

https://search-jobs-service-s4c56s44ia-uk.a.run.app

Authentication

This API requires an API key.

Send one of:

Preferred: Authorization: Bearer <YOUR_API_KEY>
Alternative: x-api-key: <YOUR_API_KEY>

If the key is missing or invalid, the API returns 401.
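As a minimal sketch of the two header options above, the helper below builds the header dict for either style. The function name build_auth_headers and its use_bearer flag are illustrative, not part of the API.

```python
def build_auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return the headers that carry the API key.

    use_bearer=True uses the preferred Authorization header;
    use_bearer=False falls back to the alternative x-api-key header.
    """
    if not api_key:
        # The API would answer 401 anyway; failing early keeps the mistake local.
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```

The returned dict can be passed to whichever HTTP client you already use.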

Endpoints

`GET /v1/search-jobs/brands`

Returns the brand lists available to the authenticated API key.

The response includes:

the global list configured under *
the key-specific list configured for your key prefix
the effective list actually used for authorization

Example request

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/brands" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response

{
  "ok": true,
  "keyPrefix": "sak_live_ab12cd34ef56",
  "globalBrands": ["brand1", "brand2"],
  "keySpecificBrands": ["brand3"],
  "effectiveBrands": ["brand1", "brand2", "brand3"]
}
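One possible use of this response is to check a planned brand list before creating a job. The sketch below assumes plain string equality against effectiveBrands; the service may normalize brand names and canonical keys differently, so treat it as a pre-check rather than a guarantee. The helper name find_disallowed_brands is hypothetical.

```python
def find_disallowed_brands(brands_response: dict, requested: list) -> list:
    """Return the requested brands that are absent from effectiveBrands.

    Assumption: exact string comparison. The server-side matching rules
    for brand names versus canonical keys are not reproduced here.
    """
    allowed = set(brands_response.get("effectiveBrands", []))
    return [brand for brand in requested if brand not in allowed]


sample_response = {
    "ok": True,
    "keyPrefix": "sak_live_ab12cd34ef56",
    "globalBrands": ["brand1", "brand2"],
    "keySpecificBrands": ["brand3"],
    "effectiveBrands": ["brand1", "brand2", "brand3"],
}
missing = find_disallowed_brands(sample_response, ["brand1", "brand4"])
```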

`POST /v1/search-jobs/brand-search`

Creates a bulk brand search job and returns immediately with a job record.

Request body

{
  "brands": ["brand1", "brand2"],
  "format": "Opinion",
  "subtype": "Commentary",
  "search": "sports betting picks and parlays",
  "relevanceThreshold": 0.6,
  "timestampFrom": "2026-01-01T00:00:00.000Z",
  "timestampTo": "2026-04-01T00:00:00.000Z",
  "limit": 10000
}

Fields

brands: required array of brand names or canonical brand keys.
format: optional primary format key to filter results. Matching is case-insensitive, but invalid values return 400.
subtype: optional primary subtype key to filter results. Requires format. Matching is case-insensitive, but invalid values return 400.
search: optional natural-language query used to rank matching posts by relevance.
relevanceThreshold: optional minimum relevance score between 0 and 1 when search is used. Defaults to 0.6.
timestampFrom: required ISO 8601 timestamp.
timestampTo: optional ISO 8601 timestamp. If omitted, the current time is used.
limit: optional integer cap on the number of rows returned in the export.

format and subtype should use canonical taxonomy keys. Both filters match against the post's primary format classification only. Unknown request fields are rejected with 400, so a misspelled field name is never silently ignored. Malformed JSON request bodies also return 400 with invalid_request.
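Because unknown fields and invalid values are rejected with 400, it can help to check a payload locally before sending it. The sketch below covers only the rules stated in the field list above; it does not validate timestamps or taxonomy values, and the name check_brand_search_payload is illustrative.

```python
ALLOWED_FIELDS = {
    "brands", "format", "subtype", "search",
    "relevanceThreshold", "timestampFrom", "timestampTo", "limit",
}


def check_brand_search_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means these local checks pass."""
    problems = []
    unknown = sorted(set(payload) - ALLOWED_FIELDS)
    if unknown:
        problems.append("unknown fields: " + ", ".join(unknown))
    if not isinstance(payload.get("brands"), list):
        problems.append("brands is required and must be an array")
    if "timestampFrom" not in payload:
        problems.append("timestampFrom is required")
    if "subtype" in payload and "format" not in payload:
        problems.append("subtype requires format")
    threshold = payload.get("relevanceThreshold")
    if threshold is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
        if not is_number or not 0 <= threshold <= 1:
            problems.append("relevanceThreshold must be a number between 0 and 1")
    limit = payload.get("limit")
    if limit is not None and (isinstance(limit, bool) or not isinstance(limit, int)):
        problems.append("limit must be an integer")
    return problems
```

The server remains the source of truth; a payload that passes here can still be rejected, for example for an invalid format value.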

Search behavior

When search is provided

results are ranked by relevance instead of recency
each row includes relevanceScore
relevanceScore is always between 0 and 1
only rows with relevanceScore >= relevanceThreshold are returned
relevanceScore is calculated as the average score across the matching retrieval signals for the post
if relevanceThreshold is omitted, the default is 0.6
a higher threshold reduces unrelated results, but increases the risk of filtering out genuinely relevant content
a lower threshold keeps more possible matches, but increases the risk of including content unrelated to the search

In practice, embedding-based relevance scores are often clustered above 0.5. That means thresholds below 0.5 may have little practical effect, especially for broader queries.
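To see why low thresholds may change little, the sketch below counts how many rows survive the relevanceScore >= relevanceThreshold rule at several thresholds. The scores are made-up illustrative values chosen to cluster above 0.5; they are not real API output.

```python
def rows_kept(scores, threshold):
    """Count scores that pass the relevanceScore >= relevanceThreshold rule."""
    return sum(1 for score in scores if score >= threshold)


def threshold_sweep(scores, thresholds):
    """Map each candidate threshold to the number of rows it would keep."""
    return {threshold: rows_kept(scores, threshold) for threshold in thresholds}


# Illustrative scores only (an assumption for this example).
example_scores = [0.52, 0.55, 0.58, 0.61, 0.64, 0.71, 0.83]
sweep = threshold_sweep(example_scores, [0.3, 0.5, 0.6, 0.7])
```

With these example values, 0.3 and 0.5 keep the same rows, while 0.6 and 0.7 start to trim. Running the same sweep over the relevanceScore values from a low-threshold export is one way to pick a threshold for later jobs.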

When search is not provided

results are ordered by recency
relevanceScore is omitted from each row

Search tips

Describe the kind of content you want, not just a single keyword. For example, "videos about sports betting odds and predictions" is usually stronger than "sports betting".
Include the specific angle you care about. For example: "people reacting to election results", "tutorials explaining mortgage rates", or "funny videos about fantasy football losses".
Use natural language instead of trying to write boolean search syntax. Full phrases and short descriptions work better than keyword stuffing.
If results are too broad, add more context rather than only raising the threshold. Narrowing the query often works better than forcing stricter relevance filtering.
If results are too narrow, simplify the query and lower relevanceThreshold slightly.
Pair search with format or subtype when you know the type of content you want. For example: search: "sports betting picks and parlays" with format: "Opinion" and subtype: "Commentary".

Format taxonomy

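Use these values for format and subtype (matching is case-insensitive, but any other value returns 400). Each line gives a format followed by its subtypes: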

Opinion: Confession, Commentary, Debate, Duet, Hot Take, Reaction, Reply, Stitch
Narrative: Day in the Life, POV, Progress, Routine, Storytime, Vlog
Educational: Analysis / Deep Dive, DIY, FAQ, Hack, Tutorial, Skill Showcase, Walkthrough
Participatory: AMA/Q&A, Challenge, Collab, Interview, Poll, Social Experiment
Entertaining: ASMR, Dance, Impressions, Lip Sync, Live Performance, Meme, Parody / Satire, Prank, Roast, Roleplay, Rant, Skit
Evaluative: Haul, Product Showcase, Rating, Review, Testimonial, Tour, Unboxing
Excerpts: Behind the Scenes, Bloopers / Outtakes, Film / TV Clips, Gameplay, Highlights, Recaps
Promotional: Advertisement, Announcement, Endorsement, Giveaway, Teaser / Trailer, Promo
Artistic: Animation, Cinematic, Photographic, Timelapse
Text Posts: Quotes, Stories
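A client can mirror this taxonomy to catch invalid values before a request is sent. The sketch below is a local helper, not the service's validation logic. It assumes that a subtype is only valid under the format it is listed with, which the grouping above suggests but does not state outright.

```python
FORMAT_TAXONOMY = {
    "Opinion": ["Confession", "Commentary", "Debate", "Duet", "Hot Take", "Reaction", "Reply", "Stitch"],
    "Narrative": ["Day in the Life", "POV", "Progress", "Routine", "Storytime", "Vlog"],
    "Educational": ["Analysis / Deep Dive", "DIY", "FAQ", "Hack", "Tutorial", "Skill Showcase", "Walkthrough"],
    "Participatory": ["AMA/Q&A", "Challenge", "Collab", "Interview", "Poll", "Social Experiment"],
    "Entertaining": ["ASMR", "Dance", "Impressions", "Lip Sync", "Live Performance", "Meme",
                     "Parody / Satire", "Prank", "Roast", "Roleplay", "Rant", "Skit"],
    "Evaluative": ["Haul", "Product Showcase", "Rating", "Review", "Testimonial", "Tour", "Unboxing"],
    "Excerpts": ["Behind the Scenes", "Bloopers / Outtakes", "Film / TV Clips", "Gameplay", "Highlights", "Recaps"],
    "Promotional": ["Advertisement", "Announcement", "Endorsement", "Giveaway", "Teaser / Trailer", "Promo"],
    "Artistic": ["Animation", "Cinematic", "Photographic", "Timelapse"],
    "Text Posts": ["Quotes", "Stories"],
}


def canonicalize(format_name, subtype=None):
    """Return (format, subtype) in canonical casing, or raise ValueError."""
    formats = {name.lower(): name for name in FORMAT_TAXONOMY}
    key = format_name.strip().lower()
    if key not in formats:
        raise ValueError(f"unknown format: {format_name!r}")
    canonical_format = formats[key]
    if subtype is None:
        return canonical_format, None
    # Assumption: a subtype must belong to the chosen format.
    subtypes = {name.lower(): name for name in FORMAT_TAXONOMY[canonical_format]}
    subtype_key = subtype.strip().lower()
    if subtype_key not in subtypes:
        raise ValueError(f"unknown subtype for {canonical_format}: {subtype!r}")
    return canonical_format, subtypes[subtype_key]
```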

Example request

curl -sS -X POST "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/brand-search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "brands": ["brand1", "brand2"],
    "format": "Opinion",
    "subtype": "Commentary",
    "search": "sports betting picks and parlays",
    "relevanceThreshold": 0.6,
    "timestampFrom": "2026-01-01T00:00:00.000Z",
    "timestampTo": "2026-04-01T00:00:00.000Z",
    "limit": 10000
  }'

Example response

{
  "ok": true,
  "jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
  "status": "RUNNING",
  "createdAt": "2026-04-16T18:22:10.000Z",
  "completedAt": null,
  "failedAt": null,
  "resultRowCount": null,
  "errorMessage": null
}

Behavior

The request returns immediately after the job is accepted.
The heavy search work continues in the background.
Use the status endpoint to track progress.
Use the results endpoint after the job completes.
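The same job-creation call can be sketched with Python's standard library. The helper names build_create_job_request and parse_job_id are illustrative. The code only builds the request and parses a response body; it does not contact the service until you pass the request to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "https://search-jobs-service-s4c56s44ia-uk.a.run.app"


def build_create_job_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a brand search job."""
    return urllib.request.Request(
        BASE_URL + "/v1/search-jobs/brand-search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_job_id(response_body: bytes) -> str:
    """Extract jobId from a job-creation response, raising on ok=false bodies."""
    job = json.loads(response_body)
    if not job.get("ok"):
        raise RuntimeError(f"{job.get('error')}: {job.get('message')}")
    return job["jobId"]
```

Keep the returned jobId; the status and results endpoints both need it.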

`GET /v1/search-jobs/:jobId`

Returns the current state of a previously created search job.

Example request

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/<jobId>" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response

{
  "ok": true,
  "jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
  "status": "COMPLETED",
  "queryType": "brand-search",
  "createdAt": "2026-04-16T18:22:10.000Z",
  "startedAt": "2026-04-16T18:22:10.000Z",
  "completedAt": "2026-04-16T18:22:41.000Z",
  "failedAt": null,
  "resultRowCount": 123,
  "errorMessage": null
}
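A polling loop over this endpoint might look like the sketch below. The caller supplies fetch_status, a hypothetical function that performs the GET and returns the parsed JSON. The loop stops on the documented COMPLETED status and treats a non-null failedAt as failure, since this page does not name a failed status value. The interval and poll cap are arbitrary defaults.

```python
import time


def wait_for_job(fetch_status, job_id, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the job completes, fails, or the poll budget runs out.

    fetch_status(job_id) must return the parsed status response as a dict.
    """
    for _ in range(max_polls):
        job = fetch_status(job_id)
        if job.get("status") == "COMPLETED":
            return job
        if job.get("failedAt") is not None:
            # Assumption: a populated failedAt means the job will not recover.
            raise RuntimeError(job.get("errorMessage") or "search job failed")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")
```

Injecting sleep keeps the loop testable without real waiting. Status polling is not covered by the rate limit described on this page, which applies to job creation, but a modest interval is still the considerate choice.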

`GET /v1/search-jobs/:jobId/results`

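Returns the result handle for a completed job: where the export is stored and a URL to download it. If the job has not completed yet, the request fails with the job_not_completed error.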

Example request

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/<jobId>/results" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response

{
  "ok": true,
  "jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
  "status": "COMPLETED",
  "resultFormat": "jsonl",
  "resultRowCount": 123,
  "storage": {
    "kind": "gcs",
    "storagePath": "search-api/<jobId>.jsonl",
    "downloadUrl": "https://storage.googleapis.com/<bucket>/<path>?X-Goog-...",
    "expiresAt": "2026-04-16T19:22:41.000Z"
  }
}
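The response carries an expiresAt timestamp next to downloadUrl. The sketch below assumes expiresAt marks when the download URL stops working, and checks it before a download starts. The helper names are illustrative, and whether a repeat call to the results endpoint returns a fresh URL is not stated here.

```python
from datetime import datetime, timezone


def parse_api_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2026-04-16T19:22:41.000Z into aware datetimes."""
    # Replacing the trailing Z keeps this compatible with older Python versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def download_url_if_fresh(results_response: dict, now=None):
    """Return downloadUrl if expiresAt is still in the future, else None."""
    now = now or datetime.now(timezone.utc)
    storage = results_response["storage"]
    if parse_api_timestamp(storage["expiresAt"]) <= now:
        return None
    return storage["downloadUrl"]
```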

Result format

Field reference

timestamp: publish timestamp of the Instagram post.
permalink: public Instagram URL for the post.
videoLink: the video itself, stored as a direct media URL you can download.
likeCount: like count captured on the original post.
commentCount: comment count captured on the original post.
brandsIdentified: brands identified in the analysis of the post.
keywords: keywords extracted in the analysis of the post.
primaryLanguage: primary detected language for the post content.
contentFormat: high-level content-format label captured from the analysis of the post.
caption: original caption text.
locationName: Instagram location name attached to the post, if present.
locationLat: latitude for the attached Instagram location, if present.
locationLng: longitude for the attached Instagram location, if present.
summary: short generated summary of the post.
timeline: structured timeline of what happens in the post (actions, etc.).
playByPlayTranscript: detailed transcript of what happens in the post (words, etc.).
subject: concise description of what the post is about.
formatPrimaryCategory: primary format category from the format taxonomy above.
formatPrimarySubtype: primary format subtype from the format taxonomy above.
primaryTopicKeys: canonical topic-taxonomy keys representing the main themes of the post.
relevanceScore: search-only relevance score between 0 and 1. This field is included only when the request uses search.

Results are stored as JSONL.

That means:

one JSON object per line
easy to stream
easy to process with Python, Node, jq, BigQuery loaders, and shell tools

Each row currently looks like:

{
  "timestamp": "2026-03-10T17:22:11.000Z",
  "permalink": "https://www.instagram.com/...",
  "videoLink": "https://storage.googleapis.com/...",
  "likeCount": 1234,
  "commentCount": 56,
  "brandsIdentified": ["brand1", "brand2"],
  "keywords": ["keyword1", "keyword2"],
  "primaryLanguage": "language",
  "contentFormat": "content format",
  "caption": "caption",
  "locationName": "location",
  "locationLat": 12.3456,
  "locationLng": -12.3456,
  "summary": "summary",
  "timeline": "timeline",
  "playByPlayTranscript": "transcript",
  "subject": "subject",
  "formatPrimaryCategory": "format",
  "formatPrimarySubtype": "subtype",
  "primaryTopicKeys": ["key1", "key2"]
}

When search is provided, rows also include:

{
  "relevanceScore": 0.78
}
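Because the export holds one JSON object per line, it can be processed as a stream. The sketch below reads rows from any iterable of lines, such as an open file, and keeps the most-liked rows without loading the whole export into memory. The helper names and the choice of likeCount as the sort key are illustrative.

```python
import heapq
import io
import json


def iter_rows(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_rows_by_likes(lines, n=3):
    """Return the n rows with the highest likeCount, highest first."""
    return heapq.nlargest(n, iter_rows(lines), key=lambda row: row.get("likeCount") or 0)


# Tiny in-memory stand-in for a downloaded export (values are made up).
sample_export = io.StringIO(
    '{"permalink": "a", "likeCount": 10}\n'
    '{"permalink": "b", "likeCount": 1234}\n'
    "\n"
    '{"permalink": "c", "likeCount": 56}\n'
)
top_two = top_rows_by_likes(sample_export, n=2)
```

With a downloaded file, the same functions accept the object returned by open(path, encoding="utf-8").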

Whitelisting

Each API key is allowed to search only specific brands.

If any requested brand is not allowed for your key, the API returns 403.

Example response:

{
  "ok": false,
  "error": "brand_not_whitelisted",
  "message": "Brand(s) \"brand1\" are not whitelisted for that key."
}

If you need access to additional brands, ask the API owner to whitelist them for your key.

Rate limits

Brand search job creation is limited to 100 requests per hour per API key.

The limit uses a rolling 1-hour window. Once an API key has created 100 jobs within the past hour, further job creation requests return 429 until enough of those jobs fall outside the window to bring the count back under the limit.

Example response:

{
  "ok": false,
  "error": "job_rate_limited",
  "message": "This API key can create up to 100 jobs per hour."
}
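A client can track its own job creations to stay under the limit instead of waiting for a 429. The sketch below is a client-side approximation of a rolling window using a deque of timestamps. The service's exact boundary handling is not documented here, so the class RollingWindowLimiter and its behavior at the window edge are assumptions.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Track event times and report how long to wait before the next one."""

    def __init__(self, max_events=100, window_seconds=3600.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()

    def seconds_until_allowed(self, now=None):
        """Return 0.0 if another event fits in the window, else the wait time."""
        now = time.monotonic() if now is None else now
        # Drop events that have aged out of the rolling window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) < self.max_events:
            return 0.0
        return self._events[0] + self.window_seconds - now

    def record(self, now=None):
        """Record one job creation at the given (or current) time."""
        self._events.append(time.monotonic() if now is None else now)
```

Call seconds_until_allowed() before each job creation, sleep for the returned duration if it is positive, then call record() once the request has been sent.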

Common errors

Missing API key (401)

{
  "ok": false,
  "error": "missing_api_key",
  "message": "Missing API key."
}

Invalid API key (401)

{
  "ok": false,
  "error": "invalid_api_key",
  "message": "Invalid API key."
}

Invalid request (400)

{
  "ok": false,
  "error": "invalid_request",
  "message": "Request validation failed."
}

Job not found

{
  "ok": false,
  "error": "job_not_found",
  "message": "Search job not found."
}

Results requested too early

{
  "ok": false,
  "error": "job_not_completed",
  "message": "Search job results are not available until the job completes."
}
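Since every error body shares the ok, error, and message fields, a client can branch on the error code. The grouping below is one suggested way to handle the codes listed on this page, not behavior prescribed by the API; the category names are hypothetical.

```python
RETRY_LATER = {"job_not_completed", "job_rate_limited"}
FIX_CREDENTIALS = {"missing_api_key", "invalid_api_key"}
FIX_REQUEST = {"invalid_request", "brand_not_whitelisted", "job_not_found"}


def classify_response(body: dict) -> str:
    """Map a parsed response body to a coarse handling category."""
    if body.get("ok"):
        return "ok"
    code = body.get("error")
    if code in RETRY_LATER:
        return "retry_later"
    if code in FIX_CREDENTIALS:
        return "fix_credentials"
    if code in FIX_REQUEST:
        return "fix_request"
    # Codes not listed on this page fall through so they are never retried blindly.
    return "unknown"
```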