
Search API

The Search API is for heavy, low-frequency search jobs over post data.

The API is designed for asynchronous bulk retrieval:

  • You create a search job.
  • The API immediately returns a job ID.
  • You check the job status later.
  • Once the job completes, you fetch the result location.

Base URL: https://search-jobs-service-s4c56s44ia-uk.a.run.app

This API requires an API key.

Send the key in one of these headers:

  • Preferred: Authorization: Bearer <YOUR_API_KEY>
  • Alternative: x-api-key: <YOUR_API_KEY>

If the key is missing or invalid, the API returns 401.
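The two header options above can be sketched as a small helper. The function name auth_headers and its use_bearer flag are illustrative choices, not part of the API.

```python
def auth_headers(api_key: str, use_bearer: bool = True) -> dict[str, str]:
    """Return the single header that carries the API key.

    use_bearer=True yields the preferred Authorization form,
    use_bearer=False yields the alternative x-api-key form.
    """
    if not api_key:
        # The API answers 401 for a missing key, so fail early on the client.
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```

Sending only one of the two headers keeps requests unambiguous; the documentation does not say what happens when both are present.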

GET /v1/search-jobs/brands returns the brand lists available to the authenticated API key.

The response includes:

  • the global list configured under *
  • the key-specific list configured for your key prefix
  • the effective list actually used for authorization

Example request:

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/brands" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response:

{
"ok": true,
"keyPrefix": "sak_live_ab12cd34ef56",
"globalBrands": ["brand1", "brand2"],
"keySpecificBrands": ["brand3"],
"effectiveBrands": ["brand1", "brand2", "brand3"]
}
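One possible use of this response is to check a planned brand list before creating a job, since any disallowed brand causes a 403. The sketch below assumes exact string comparison against effectiveBrands; the service may normalize brand names differently. The helper name split_allowed is hypothetical.

```python
def split_allowed(requested: list[str], brands_response: dict) -> tuple[list[str], list[str]]:
    """Split requested brands into (allowed, not_allowed) using effectiveBrands."""
    # Assumption: brands are compared as exact strings.
    effective = set(brands_response.get("effectiveBrands", []))
    allowed = [brand for brand in requested if brand in effective]
    not_allowed = [brand for brand in requested if brand not in effective]
    return allowed, not_allowed


# Response body copied from the example above.
brands_response = {
    "ok": True,
    "keyPrefix": "sak_live_ab12cd34ef56",
    "globalBrands": ["brand1", "brand2"],
    "keySpecificBrands": ["brand3"],
    "effectiveBrands": ["brand1", "brand2", "brand3"],
}

allowed, not_allowed = split_allowed(["brand1", "brand9"], brands_response)
print(allowed)      # ['brand1']
print(not_allowed)  # ['brand9']
```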

POST /v1/search-jobs/brand-search creates a bulk brand search job and returns immediately with a job record.

Request body:

{
"brands": ["brand1", "brand2"],
"format": "format",
"subtype": "subtype",
"search": "search",
"relevanceThreshold": 0.6,
"timestampFrom": "2026-01-01T00:00:00.000Z",
"timestampTo": "2026-04-01T00:00:00.000Z",
"limit": 10000
}

Fields

  • brands: required array of brand names or canonical brand keys.
  • format: optional primary format key to filter results. Matching is case-insensitive, but invalid values return 400.
  • subtype: optional primary subtype key to filter results. Requires format. Matching is case-insensitive, but invalid values return 400.
  • search: optional natural-language query used to rank matching posts by relevance.
  • relevanceThreshold: optional minimum relevance score between 0 and 1 when search is used. Defaults to 0.6.
  • timestampFrom: required ISO 8601 timestamp.
  • timestampTo: optional ISO 8601 timestamp. If omitted, the current time is used.
  • limit: optional integer cap on the number of rows returned in the export.

format and subtype should use canonical taxonomy keys. Both filters match against the post’s primary format classification only. Unknown request fields are rejected with 400, so a misspelled field name fails loudly instead of being silently ignored. Malformed JSON request bodies also return 400, with the error code invalid_request.
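Because invalid requests are rejected with 400, it can help to check a payload locally first. The following is a minimal sketch of the field rules described above, not the server's actual validation. The function name validate_payload is hypothetical, and treating an empty brands array as invalid is an assumption.

```python
ALLOWED_FIELDS = {
    "brands", "format", "subtype", "search",
    "relevanceThreshold", "timestampFrom", "timestampTo", "limit",
}


def validate_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    unknown = sorted(set(payload) - ALLOWED_FIELDS)
    if unknown:
        problems.append(f"unknown fields: {', '.join(unknown)}")
    brands = payload.get("brands")
    # Assumption: an empty array is treated the same as a missing one.
    if not isinstance(brands, list) or not brands:
        problems.append("brands must be a non-empty array")
    if "timestampFrom" not in payload:
        problems.append("timestampFrom is required")
    if "subtype" in payload and "format" not in payload:
        problems.append("subtype requires format")
    threshold = payload.get("relevanceThreshold")
    if threshold is not None:
        if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
            problems.append("relevanceThreshold must be between 0 and 1")
    limit = payload.get("limit")
    if limit is not None and (isinstance(limit, bool) or not isinstance(limit, int)):
        problems.append("limit must be an integer")
    return problems
```

For example, a payload with the misspelled key "timstampFrom" is reported both as an unknown field and as missing timestampFrom.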

When search is provided

  • results are ranked by relevance instead of recency
  • each row includes relevanceScore
  • relevanceScore is always between 0 and 1
  • only rows with relevanceScore >= relevanceThreshold are returned
  • relevanceScore is calculated as the average score across the matching retrieval signals for the post
  • if relevanceThreshold is omitted, the default is 0.6
  • a higher threshold reduces unrelated results, but increases the risk of filtering out genuinely relevant content
  • a lower threshold keeps more possible matches, but increases the risk of including content unrelated to the search

In practice, embedding-based relevance scores are often clustered above 0.5. That means thresholds below 0.5 may have little practical effect, especially for broader queries.
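The scoring and threshold rules above can be illustrated with a small sketch. The per-signal scores below are made-up numbers and the post IDs are hypothetical; only the averaging, filtering, and ordering steps come from the description.

```python
from statistics import fmean


def relevance_score(signal_scores: list[float]) -> float:
    """Average the scores of the retrieval signals that matched a post."""
    return fmean(signal_scores)


def apply_threshold(rows: list[dict], threshold: float = 0.6) -> list[dict]:
    """Keep rows at or above the threshold, ranked by relevance (highest first)."""
    kept = [row for row in rows if row["relevanceScore"] >= threshold]
    return sorted(kept, key=lambda row: row["relevanceScore"], reverse=True)


# Hypothetical per-signal scores for three posts.
signals = {"post-a": [0.5, 0.75], "post-b": [0.5, 0.5], "post-c": [1.0, 0.75]}
rows = [{"id": pid, "relevanceScore": relevance_score(s)} for pid, s in signals.items()]

print([row["id"] for row in apply_threshold(rows)])  # ['post-c', 'post-a']
# With every score at 0.5 or above, thresholds of 0.4 and 0.5 keep the same rows.
print(len(apply_threshold(rows, 0.4)), len(apply_threshold(rows, 0.5)))  # 3 3
```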

When search is not provided

  • results are ordered by recency
  • relevanceScore is omitted from each row

Writing effective search queries

  • Describe the kind of content you want, not just a single keyword. "videos about sports betting odds and predictions" is usually stronger than "sports betting".
  • Include the specific angle you care about. For example: "people reacting to election results", "tutorials explaining mortgage rates", or "funny videos about fantasy football losses".
  • Use natural language instead of trying to write boolean search syntax. Full phrases and short descriptions work better than keyword stuffing.
  • If results are too broad, add more context rather than only raising the threshold. Narrowing the query often works better than forcing stricter relevance filtering.
  • If results are too narrow, simplify the query and lower relevanceThreshold slightly.
  • Pair search with format or subtype when you know the type of content you want. For example: search: "sports betting picks and parlays" with format: "Opinion" and subtype: "Commentary".

Use these exact values for format and subtype. Each entry lists a format, followed by the subtypes that belong to it:

  • Opinion: Confession, Commentary, Debate, Duet, Hot Take, Reaction, Reply, Stitch
  • Narrative: Day in the Life, POV, Progress, Routine, Storytime, Vlog
  • Educational: Analysis / Deep Dive, DIY, FAQ, Hack, Tutorial, Skill Showcase, Walkthrough
  • Participatory: AMA/Q&A, Challenge, Collab, Interview, Poll, Social Experiment
  • Entertaining: ASMR, Dance, Impressions, Lip Sync, Live Performance, Meme, Parody / Satire, Prank, Roast, Roleplay, Rant, Skit
  • Evaluative: Haul, Product Showcase, Rating, Review, Testimonial, Tour, Unboxing
  • Excerpts: Behind the Scenes, Bloopers / Outtakes, Film / TV Clips, Gameplay, Highlights, Recaps
  • Promotional: Advertisement, Announcement, Endorsement, Giveaway, Teaser / Trailer, Promo
  • Artistic: Animation, Cinematic, Photographic, Timelapse
  • Text Posts: Quotes, Stories

Example request:

curl -sS -X POST "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/brand-search" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"brands": ["brand1", "brand2"],
"format": "format",
"subtype": "subtype",
"search": "search",
"relevanceThreshold": 0.6,
"timestampFrom": "2026-01-01T00:00:00.000Z",
"timestampTo": "2026-04-01T00:00:00.000Z",
"limit": 10000
}'

Example response:

{
"ok": true,
"jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
"status": "RUNNING",
"createdAt": "2026-04-16T18:22:10.000Z",
"completedAt": null,
"failedAt": null,
"resultRowCount": null,
"errorMessage": null
}

Behavior

  • The request returns immediately after the job is accepted.
  • The heavy search work continues in the background.
  • Use the status endpoint to track progress.
  • Use the results endpoint after the job completes.
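The job-creation call can be sketched with the Python standard library. This is one possible client, not an official one; the function names build_create_job_request and create_job are hypothetical. Building the request is kept separate from sending it so the request can be inspected without network access.

```python
import json
import urllib.request

BASE_URL = "https://search-jobs-service-s4c56s44ia-uk.a.run.app"


def build_create_job_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a brand search job."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/search-jobs/brand-search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key: str, payload: dict, timeout: float = 30.0) -> str:
    """Send the request and return the jobId from the job record."""
    request = build_create_job_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["jobId"]
```

urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so callers would catch that exception to read the error body.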

GET /v1/search-jobs/<jobId> returns the current state of a previously created search job.

Example request:

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/<jobId>" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response:

{
"ok": true,
"jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
"status": "COMPLETED",
"queryType": "brand-search",
"createdAt": "2026-04-16T18:22:10.000Z",
"startedAt": "2026-04-16T18:22:10.000Z",
"completedAt": "2026-04-16T18:22:41.000Z",
"failedAt": null,
"resultRowCount": 123,
"errorMessage": null
}
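A polling loop around this endpoint might look like the sketch below. The fetch_status callable is a placeholder for whatever function performs the GET request and returns the parsed job record. The poll interval and attempt cap are arbitrary choices, and failure is detected through a non-null failedAt because the name of the failed status value is not documented here.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 180,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job completes, then return the final job record."""
    for _ in range(max_polls):
        job = fetch_status(job_id)
        if job["status"] == "COMPLETED":
            return job
        if job.get("failedAt") is not None:
            # Assumption: a non-null failedAt means the job will not recover.
            raise RuntimeError(f"job {job_id} failed: {job.get('errorMessage')}")
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")
```

Passing sleep as a parameter makes the loop easy to exercise in tests without real waiting.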

GET /v1/search-jobs/<jobId>/results returns the result handle for a completed job.

Example request:

curl -sS "https://search-jobs-service-s4c56s44ia-uk.a.run.app/v1/search-jobs/<jobId>/results" \
  -H "Authorization: Bearer YOUR_API_KEY"

Example response:

{
"ok": true,
"jobId": "3e6f5f88-0f6c-4d6a-9a1f-2d5b87c5d2b1",
"status": "COMPLETED",
"resultFormat": "jsonl",
"resultRowCount": 123,
"storage": {
"kind": "gcs",
"storagePath": "search-api/<jobId>.jsonl",
"downloadUrl": "https://storage.googleapis.com/<bucket>/<path>?X-Goog-...",
"expiresAt": "2026-04-16T19:22:41.000Z"
}
}

Field reference

  • timestamp: publish timestamp of the Instagram post.
  • permalink: public Instagram URL for the post.
  • videoLink: the video itself, stored as a direct media URL you can download.
  • likeCount: like count captured on the original post.
  • commentCount: comment count captured on the original post.
  • brandsIdentified: brands identified in the analysis of the post.
  • keywords: keywords extracted in the analysis of the post.
  • primaryLanguage: primary detected language for the post content.
  • contentFormat: high-level content-format label captured from the analysis of the post.
  • caption: original caption text.
  • locationName: Instagram location name attached to the post, if present.
  • locationLat: latitude for the attached Instagram location, if present.
  • locationLng: longitude for the attached Instagram location, if present.
  • summary: short generated summary of the post.
  • timeline: structured timeline of what happens in the post (actions, etc.).
  • playByPlayTranscript: detailed transcript of what happens in the post (words, etc.).
  • subject: concise description of what the post is about.
  • formatPrimaryCategory: primary format category from the format taxonomy above.
  • formatPrimarySubtype: primary format subtype from the format taxonomy above.
  • primaryTopicKeys: canonical topic-taxonomy keys representing the main themes of the post.
  • relevanceScore: search-only relevance score between 0 and 1. This field is included only when the request uses search.

Results are stored as JSONL.

That means:

  • one JSON object per line
  • easy to stream
  • easy to process with Python, Node, jq, BigQuery loaders, and shell tools
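As a minimal illustration of streaming JSONL in Python, the sketch below reads one row at a time from any text stream, so a large export never has to fit in memory. The sample lines are shortened stand-ins for real rows.

```python
import io
import json
from typing import Iterator, TextIO


def iter_rows(lines: TextIO) -> Iterator[dict]:
    """Yield one parsed row per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In real use this would be an open file or a decoded HTTP response body.
sample = io.StringIO(
    '{"permalink": "https://www.instagram.com/...", "likeCount": 1234}\n'
    '{"permalink": "https://www.instagram.com/...", "likeCount": 10}\n'
)
total_likes = sum(row["likeCount"] for row in iter_rows(sample))
print(total_likes)  # 1244
```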

Each row currently looks like:

{
"timestamp": "2026-03-10T17:22:11.000Z",
"permalink": "https://www.instagram.com/...",
"videoLink": "https://storage.googleapis.com/...",
"likeCount": 1234,
"commentCount": 56,
"brandsIdentified": ["brand1", "brand2"],
"keywords": ["keyword1", "keyword2"],
"primaryLanguage": "language",
"contentFormat": "content format",
"caption": "caption",
"locationName": "location",
"locationLat": 12.3456,
"locationLng": -12.3456,
"summary": "summary",
"timeline": "timeline",
"playByPlayTranscript": "transcript",
"subject": "subject",
"formatPrimaryCategory": "format",
"formatPrimarySubtype": "subtype",
"primaryTopicKeys": ["key1", "key2"]
}

When search is provided, rows also include:

{
"relevanceScore": 0.78
}

Each API key is allowed to search only specific brands.

If any requested brand is not allowed for your key, the API returns 403.

Example response:

{
"ok": false,
"error": "brand_not_whitelisted",
"message": "Brand(s) \"brand1\" are not whitelisted for that key."
}

If you need access to additional brands, ask the API owner to whitelist them for your key.

Brand search job creation is limited to 100 requests per hour per API key.

The limit uses a rolling 1-hour window. Once an API key has created 100 jobs within the last hour, further job creation requests return 429 until enough of those earlier requests are more than an hour old.

Example response:

{
"ok": false,
"error": "job_rate_limited",
"message": "This API key can create up to 100 jobs per hour."
}
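A client can mirror the rolling window locally to avoid most 429 responses. The class below is a sketch of that idea and is not guaranteed to match the server's exact accounting. The class name and the injectable clock are illustrative choices.

```python
import time
from collections import deque
from typing import Callable


class RollingWindowLimiter:
    """Track job-creation times and report how long to wait before the next one."""

    def __init__(
        self,
        max_calls: int = 100,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls: deque[float] = deque()

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a call is allowed now, else the wait in seconds."""
        now = self._clock()
        # Drop calls that have aged out of the rolling window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            return 0.0
        return self._window - (now - self._calls[0])

    def record(self) -> None:
        """Record that a job creation request was just sent."""
        self._calls.append(self._clock())
```

A typical loop would sleep for seconds_until_allowed(), send the request, then call record(). The server remains the source of truth, so a 429 should still be handled.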

Other errors use the same response shape.

Missing API key (401):

{
"ok": false,
"error": "missing_api_key",
"message": "Missing API key."
}

Invalid API key (401):

{
"ok": false,
"error": "invalid_api_key",
"message": "Invalid API key."
}

Invalid request (400):

{
"ok": false,
"error": "invalid_request",
"message": "Request validation failed."
}

Job not found:

{
"ok": false,
"error": "job_not_found",
"message": "Search job not found."
}

Results requested before the job completes:

{
"ok": false,
"error": "job_not_completed",
"message": "Search job results are not available until the job completes."
}
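Because every error shares the ok, error, and message fields, a client can turn any of them into one exception type. The sketch below is one way to do that. The class and function names are hypothetical, "unknown_error" is a placeholder rather than a real API code, and treating job_rate_limited and job_not_completed as worth retrying is a judgment call.

```python
import json


class SearchApiError(Exception):
    """An error response from the Search API."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


# Assumption: these two conditions clear up on their own after waiting.
WORTH_RETRYING = {"job_rate_limited", "job_not_completed"}


def error_from_response(status: int, body: str) -> SearchApiError:
    """Build a SearchApiError from an HTTP status and a response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # "unknown_error" is a local placeholder for bodies without an error code.
    code = payload.get("error", "unknown_error")
    return SearchApiError(status, code, payload.get("message", body))


def should_retry(error: SearchApiError) -> bool:
    """Return True when waiting and retrying is a reasonable response."""
    return error.code in WORTH_RETRYING
```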