Query Task Status
- Query the current status and result of an asynchronous task
- Task status flow: pending → processing → completed/failed
- Optionally pass sync_upstream=true to proactively refresh the latest status for tasks still in progress
- After task completion, result links are valid for 24 hours; please save them promptly
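The status flow above lends itself to a simple polling loop. The following is a minimal sketch, not the service's own client: fetch_task is a hypothetical callable that you supply, which performs the actual query request and returns the decoded JSON response as a dict. The interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict

# Terminal states, taken from the status flow described above.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    fetch_task: Callable[[str], Dict[str, Any]],  # hypothetical: performs the query request
    task_id: str,
    interval_s: float = 2.0,    # assumed polling interval
    timeout_s: float = 300.0,   # assumed overall deadline
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task is completed or failed, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task.get('status')!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

The caller still has to check whether the returned status is completed or failed before reading results.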
results Field Format
After a task completes (status=completed), results is an array whose structure varies by model. Before parsing results, check the model field in the response to determine the structure type.
Common Format (Most Models)
Grouped by task output type (type):
| type | results[i] Structure | Typical Use Cases |
|---|---|---|
| image | {url} | Image generation, editing, enhancement |
| video | {url} | Video generation, lip-sync, video editing |
| audio (TTS / Music) | {url} | Text-to-speech, music generation |
| llm | Full OpenAI ChatCompletion: {id, object:"chat.completion", created, model, choices[].message, usage} | LLM generation (llm_router / llm_async) |
All url links are valid for 24 hours. Please save them promptly.
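As an illustration of the common format, the sketch below pulls the useful payload out of a decoded response: URLs for media tasks and message texts for llm tasks. The function name is hypothetical, and the code assumes the response has already been parsed into a dict with the status, type, and results fields described on this page. It does not cover the special formats.

```python
from typing import Any, Dict, List


def extract_common_outputs(task: Dict[str, Any]) -> List[str]:
    """Return URLs for media tasks, or message texts for llm tasks."""
    if task.get("status") != "completed":
        return []  # results is only populated once the task has completed
    results = task.get("results") or []
    if task.get("type") == "llm":
        texts = []
        for completion in results:
            # Each entry is an OpenAI-style ChatCompletion object.
            for choice in completion.get("choices", []):
                content = choice.get("message", {}).get("content")
                if content is not None:
                    texts.append(content)
        return texts
    # image / video / audio: keep only entries that carry a url
    return [item["url"] for item in results if "url" in item]
```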
Special Formats (By model)
The results[i] structure for the following models does not follow the common format above. Callers must handle them separately based on model.
Profile Creation (No Media URL Returned)
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| kling-video-create-voice | video | {voice_id} | Creates a reusable voice profile ID for subsequent Kling lip-sync / synthesis tasks |
| sora-2-character | video | {character_id, name} | Extracts a character profile from a reference video for character consistency in subsequent Sora 2 tasks |
| phota-create-profile | image | {result_type, profile_id} | Trains a person profile for use by other PHOTA models; result_type is always profile |
Voice Cloning
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| minimax-voice-clone | audio | {voice_id}; with preview text, additionally returns {url, content_type} | MiniMax voice cloning: submit a reference audio to generate a reusable voice ID; if preview text is provided, the preview audio url is also returned |
Speech Transcription (No Media URL Returned)
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| scribe-v2 | audio | {text, language_code, language_probability, words[]} | Speech-to-text. words[] is a word-level timestamp array; each entry contains text / start / end / type |
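A small sketch of how the words[] array might be consumed: it flattens a scribe-v2 result into (start, end, text) tuples ordered by start time. The function name is hypothetical, and the code assumes start and end are numeric. The possible values of each entry's type field are not listed here, so the sketch leaves that field untouched.

```python
from typing import Any, Dict, List, Tuple


def word_timings(result: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Flatten words[] into (start, end, text) tuples, ordered by start time."""
    timings = [(w["start"], w["end"], w["text"]) for w in result.get("words", [])]
    return sorted(timings, key=lambda t: t[0])
```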
Video with Seed Value
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| seedance-2.0-text-to-video | video | {url, seed} | Additionally returns the generation seed for reproducibility |
| seedance-2.0-fast-text-to-video | video | {url, seed} | Same as above |
| seedance-2.0-image-to-video | video | {url, seed} | Same as above |
| seedance-2.0-fast-image-to-video | video | {url, seed} | Same as above |
| seedance-2.0-reference-to-video | video | {url, seed} | Same as above |
| seedance-2.0-fast-reference-to-video | video | {url, seed} | Same as above |
PBR Material (Single Task Returns Multiple Semantically Labeled Results)
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| patina-pbr-maps | image | {url, content_type, result_type: "pbr_map", map_type} | PBR map generation. map_type identifies the map type (e.g. albedo / normal / roughness, etc.) |
| patina-material | image | Mixed: {url, content_type, result_type: "texture"} and {url, content_type, result_type: "pbr_map", map_type} | Outputs both tiling textures and multiple PBR maps; distinguish by result_type |
| patina-material-extract | image | Same as patina-material | Extracts textures from an existing image to generate a PBR map set |
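Because a single Patina task returns several labeled results, callers will typically want to separate them by result_type. The sketch below is one way to do that. The function name is hypothetical, and it assumes each map_type appears at most once per task (a later duplicate would overwrite the earlier one).

```python
from typing import Any, Dict, List, Tuple


def split_patina_results(
    results: List[Dict[str, Any]],
) -> Tuple[List[str], Dict[str, str]]:
    """Return (texture_urls, {map_type: url}) from a Patina results array."""
    textures: List[str] = []
    maps: Dict[str, str] = {}
    for item in results:
        kind = item.get("result_type")
        if kind == "pbr_map":
            maps[item["map_type"]] = item["url"]  # assumes one url per map_type
        elif kind == "texture":
            textures.append(item["url"])
    return textures, maps
```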
Music Generation with Lyrics Text
| model | type | results[i] Structure | Notes |
|---|---|---|---|
| lyria-3 | audio | {url}, some candidates add {lyrics} | 30-second music clip; lyrics is the generated lyrics / section structure text, returned only by some candidates and omitted when absent |
| lyria-3-pro | audio | {url}, some candidates add {lyrics} | Full song (up to ~3 minutes); same as above, lyrics may be omitted depending on the candidate |
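Since lyrics can be missing from any candidate, parsing code should treat it as optional rather than index it directly. A minimal sketch, with a hypothetical function name:

```python
from typing import Any, Dict, List, Optional, Tuple


def lyria_candidates(
    results: List[Dict[str, Any]],
) -> List[Tuple[str, Optional[str]]]:
    """Pair each candidate's url with its lyrics, or None when lyrics is omitted."""
    return [(item["url"], item.get("lyrics")) for item in results]
```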
Parsing Recommendations
- Read model first, then parse results: different models under the same type may have completely different structures
- Mind the expiration for URL results: all url links are valid for 24 hours; your application should download and store them immediately upon receipt
- Profile-type tasks return voice_id / character_id / profile_id as long-lived resource identifiers that can be used directly in subsequent task parameters
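The recommendations above amount to dispatching on model before touching results. The sketch below shows one possible shape for that dispatch, using the model names and fields from the tables on this page. The function name, the lookup table, and the (kind, value) return convention are illustrative assumptions. LLM results are not covered here.

```python
from typing import Any, Dict, Tuple

# Models whose results carry a long-lived identifier instead of (or besides) a url.
ID_FIELD_BY_MODEL = {
    "kling-video-create-voice": "voice_id",
    "sora-2-character": "character_id",
    "phota-create-profile": "profile_id",
    "minimax-voice-clone": "voice_id",
}


def parse_result(model: str, item: Dict[str, Any]) -> Tuple[str, Any]:
    """Classify one results[i] entry as a resource id, a transcript, or a url."""
    id_field = ID_FIELD_BY_MODEL.get(model)
    if id_field is not None and id_field in item:
        return ("resource_id", item[id_field])
    if model == "scribe-v2":
        return ("transcript", item["text"])
    if "url" in item:
        # Common media format; also covers a voice-clone preview audio entry.
        return ("url", item["url"])
    raise ValueError(f"unrecognized result shape for model {model!r}")
```

Checking the identifier field before the url keeps a voice-clone entry that carries both classified as a resource id. Values of kind "url" expire after 24 hours, so they should be downloaded right away.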
Authorizations
All endpoints require Bearer Token authentication
Add the following to your request headers:
Authorization: Bearer YOUR_API_KEY
Path Parameters
Task ID, returned by the task submission endpoint
"task-unified-1757165031-uyujaw3d"
Query Parameters
Whether to proactively refresh the task status before returning. Only takes effect for tasks that are still in progress and have an associated remote task; otherwise, the current task status is returned directly.
true
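To show how the Bearer header, the task ID path parameter, and the sync_upstream query parameter fit together, here is a sketch that builds (but does not send) the request with Python's standard library. BASE_URL and TASK_PATH are placeholders invented for illustration. This page does not state the real host or route, so substitute the values from your endpoint reference.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host, not from this page
TASK_PATH = "/v1/tasks/{task_id}"     # hypothetical route, not from this page


def build_status_request(
    task_id: str, api_key: str, sync_upstream: bool = False
) -> Request:
    """Build a GET request for the task status query without sending it."""
    url = BASE_URL + TASK_PATH.format(task_id=quote(task_id, safe=""))
    if sync_upstream:
        # Only has an effect for in-progress tasks with an associated remote task.
        url += "?" + urlencode({"sync_upstream": "true"})
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")
```

The returned object can be passed to urllib.request.urlopen, and the response body decoded with json.loads.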
Response
Query successful
Task ID
"task-unified-1757165031-uyujaw3d"
Specific task type
Allowed values: video.generation.task, image.generation.task, audio.generation.task, llm.generation.task
"video.generation.task"
Task output type
Allowed values: video, image, audio, llm
"video"
Actual model name used
"lipsync-2"
Task status
Options:
| Value | Meaning |
|---|---|
| pending | Pending |
| processing | Processing |
| completed | Completed |
| failed | Failed |
"pending"
Task progress percentage
0 <= x <= 1000
Task creation timestamp (Unix seconds)
1757165031
Task result list; only populated when status=completed.
Structure varies by task output type (type):
| type | results[i] structure | Typical scenarios |
|---|---|---|
| image | {url}, some models include content_type; special tasks (e.g. PHOTA profile creation) return {result_type, profile_id} | Image generation, editing, enhancement |
| video | {url}, some models include content_type / seed; special subtasks (Kling voice creation / Sora 2 character profile) return {voice_id} or {character_id, name} | Video generation, lip-sync |
| audio (TTS / Music) | {url} | Text-to-speech, music generation |
| audio (STT / Transcription) | {text, language_code, language_probability, words[]} (no url) | Speech transcription (e.g. Scribe V2) |
| audio (Voice Clone) | {voice_id}; with preview text, additionally returns {url, content_type} | Voice cloning (e.g. MiniMax Voice Clone) |
| llm | Full OpenAI ChatCompletion: {id, object:"chat.completion", created, model, choices[].message, usage} | LLM generation (llm_gateway / llm_async) |
General notes:
- Result URLs are valid for 24 hours; please save them promptly
- type=llm results are conversation responses and do not produce URLs
- Some tasks produce non-media artifacts (voice profile / character profile creation, etc.); results contains id-type fields instead of url
Image result (type=image). Most image models only return url; some models (Patina family) also include content_type; PHOTA profile creation tasks return {result_type, profile_id} (non-image artifact)
Error information; only populated when status=failed
Billing information