Create Video Generation Task API

This document describes the input and output parameters of the Create Video Generation Task API for your reference when using the interface. The model generates videos based on the input images and text information. After generation is complete, you can query tasks by conditions and obtain the generated videos.

Model Capabilities

Doubao Seedance 2.0 Series (Audio/Video)
Multimodal Reference Video Generation: Input reference images (0~9) + reference videos (0~3) + reference audios (0~3) + text prompts (optional) to generate 1 target video. Note that audio cannot be input alone; at least 1 reference video or image must be included. Supports generating new videos, editing videos, and extending videos. Read tutorials for detailed code examples.
Image-to-Video - First and Last Frames: Input first frame image + last frame image + text prompts (optional) to generate 1 target video.
Image-to-Video - First Frame: Input first frame image + text prompts (optional) to generate 1 target video.
Text-to-Video: Input text prompts to generate 1 target video.

Note
Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation are three mutually exclusive scenarios. If you need both reference images and first/last frame control, you can specify the first/last frames through prompts in multimodal reference mode; if the first and last frames must exactly match the specified images, prefer Image-to-Video - First and Last Frames mode.

Request Parameters

Request Information

Item	Value
Request Method	POST
Official Request URL	https://ai-tokenhub.com/api/v2/contents/generations/tasks
Content-Type	application/json
Authentication	API Key Authentication (Bearer Token)
Task Data Retention Period	7 days (calculated from task creation time)

Request Headers

Field Name	Required	Description
Authorization	✓	Format: `Bearer <your_api_key>`, where `<your_api_key>` is the long-term API key you obtained from the Tianshu platform
Content-Type	✓	Fixed value: `application/json`
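
As a minimal sketch of how the request information and headers above fit together, the following builds the POST request with the Python standard library. The helper name `build_create_task_request` and the placeholder key are illustrative only, and the request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://ai-tokenhub.com/api/v2/contents/generations/tasks"


def build_create_task_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a task."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_task_request(
    "YOUR_API_KEY",  # placeholder, not a real key
    {
        "model": "doubao-seedance-2-0-260128",
        "content": [{"type": "text", "text": "A kitten yawns at the camera"}],
    },
)
# To actually submit the task: urllib.request.urlopen(req)
```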

Request Body

model `string` Required

The Model ID of the model to call. Enable the model service first, then look up its Model ID.

content `object` Required

Information input to the model for video generation, supporting text, images, audio, and video.

Supported Input Combinations:

Text
Text (optional) + Image
Text (optional) + Video
Text (optional) + Image + Video
Text (optional) + Video + Audio
Text (optional) + Image + Audio
Text (optional) + Image + Video + Audio
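
The combinations above can be checked on the client before submitting. This is a sketch under the assumption that each content item carries a `type` of `text`, `image_url`, `video_url`, or `audio_url`; the function name is hypothetical and the server remains the authority on what is accepted.

```python
def is_supported_combination(content: list[dict]) -> bool:
    """Return True if the content types form one of the listed combinations."""
    types = {item["type"] for item in content}
    has_visual = bool(types & {"image_url", "video_url"})
    if "audio_url" in types:
        # Audio cannot be input alone: it needs an image or a video.
        return has_visual
    # Without an image or video, a text prompt is the only remaining input.
    return has_visual or "text" in types
```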

Information Types

Text Information object

Prompt information input to the model.

content.type string Required: Type of input content, should be text here
content.text string Required: Text prompt input to the model, describing the expected video to be generated.

Note

Prompt Language Support: All models support Chinese and English prompts, as well as Japanese, Indonesian, Spanish, and Portuguese.
Prompt Length Recommendation: Chinese prompts should not exceed 500 characters, and English prompts should not exceed 1000 words. Overly long prompts tend to scatter information, so the model may ignore details and focus only on key points, resulting in missing elements in the video.
For more usage tips: See the Seedance Prompt Guide.

Image Information object

Image information input to the model.

Properties:

content.type string Required: Type of input content, should be image_url here.
content.image_url object Required: Image object input to the model.
content.image_url.url string Required: Image URL, image Base64 encoding, or material ID.
- Image URL: Fill in the public URL of the image.
- Base64 Encoding: Convert local files to Base64 encoded strings, then submit to the large model. Follow the format: data:image/<image_format>;base64,<Base64_encoding>. Note that <image_format> should be lowercase, e.g., data:image/png;base64,<Base64 image>.
- Material ID: ID for preset materials and virtual avatars used for video generation, following the format: asset://<ASSET_ID>. For details, see Material Library Usage Guide.
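
The Base64 format described above can be produced as follows. This is a small sketch; the helper name `image_to_data_url` is illustrative and not part of the API.

```python
import base64


def image_to_data_url(image_bytes: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/<image_format>;base64,<payload>."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    # The format segment must be lowercase, e.g. "png" rather than "PNG".
    return f"data:image/{image_format.lower()};base64,{payload}"
```

The returned string goes into `content.image_url.url`. For large files, a public URL is the better choice because of the request body size limit.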

Single Image Requirements

Format: jpeg, png, webp, bmp, tiff, gif, heic, heif
Aspect Ratio (width/height): (0.4, 2.5)
Width/Height Length (px): (300, 6000)
Size: Single image less than 30 MB. Request body size does not exceed 64 MB. Do not use Base64 encoding for large files.
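
A client-side pre-check of these limits might look like the sketch below. It assumes the parentheses denote open intervals (endpoints excluded) and that 1 MB means 1024 × 1024 bytes; neither is confirmed by this document, and the function name is hypothetical.

```python
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def image_violations(width: int, height: int, size_bytes: int) -> list[str]:
    """Return the single-image requirements that the given image violates."""
    problems = []
    if not 0.4 < width / height < 2.5:
        problems.append("aspect ratio outside (0.4, 2.5)")
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append("side length outside (300, 6000) px")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append("size not less than 30 MB")
    return problems
```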

Image Count:

Image-to-Video - First Frame: 1 image
Image-to-Video - First and Last Frames: 2 images
Seedance 2.0 Series Multimodal Reference Video Generation: 1-9 images

content.role string Conditionally Required: Position or purpose of the image.

Note
Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation (including reference images, videos, and audio) are 3 mutually exclusive scenarios and cannot be mixed.
Multimodal Reference Video Generation can specify reference images as first/last frames through prompts, indirectly achieving a "first/last frames + multimodal reference" effect. If the first and last frames must exactly match the specified images, prefer Image-to-Video - First and Last Frames mode (configure role as first_frame / last_frame).

Image-to-Video - First Frame-role Value: Pass 1 image_url object; role is first_frame or may be omitted.

Image-to-Video - First and Last Frames-role Value: Pass 2 image_url objects; role is required for both.

Role for first frame image: first_frame
Role for last frame image: last_frame

Image-to-Video - Reference Image-role Value: Required, role for each reference image is: reference_image

Note

The input first and last frame images can be the same. When the aspect ratios of the first and last frame images are inconsistent, the first frame image is used as the main reference, and the last frame image will be automatically cropped to fit.
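
To illustrate the role values, the sketch below assembles the `content` array for Image-to-Video - First and Last Frames. It assumes `role` sits alongside `type` in each item, as the `content.role` naming suggests; the helper name and the example.com URLs are placeholders.

```python
from typing import Optional


def first_last_frame_content(
    first_url: str, last_url: str, prompt: Optional[str] = None
) -> list[dict]:
    """Build content for Image-to-Video - First and Last Frames."""
    content = []
    if prompt:  # the text prompt is optional in this mode
        content.append({"type": "text", "text": prompt})
    content.append(
        {"type": "image_url", "image_url": {"url": first_url}, "role": "first_frame"}
    )
    content.append(
        {"type": "image_url", "image_url": {"url": last_url}, "role": "last_frame"}
    )
    return content


content = first_last_frame_content(
    "https://example.com/first.png",  # placeholder URL
    "https://example.com/last.png",  # placeholder URL
    prompt="The camera slowly pulls back",
)
```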

Video Information object

Video information input to the model.

Properties:

content.type string Required: Type of input content, should be video_url here.
content.video_url object Required: Video object input to the model.
content.video_url.url string Required: Video URL or material ID.

Video URL:

Fill in the public URL of the video.

Single Video Requirements

Video Format: `mp4`, `mov`. See table below for supported codecs.
Resolution: 480p, 720p, 1080p.
Duration: Single video duration [2, 15]s. Maximum 3 reference videos. Total duration of all videos does not exceed 15s.
Dimensions:
- Aspect Ratio (width/height): [0.4, 2.5]
- Width/Height Length (px): [300, 6000]
- Total Pixels: [640×640=409600, 2206×946=2086876], i.e., the product of width and height should be in the range [409600, 2086876].
Size: Single video does not exceed 50 MB.
Frame Rate (FPS): [24, 60]
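
The per-video limits can be pre-checked locally before upload. The sketch below treats all bracketed ranges as inclusive and assumes 1 MB means 1024 × 1024 bytes; the function name is hypothetical, and it does not cover the codec or the combined limits across multiple reference videos.

```python
def video_violations(
    width: int, height: int, duration_s: float, fps: float, size_bytes: int
) -> list[str]:
    """Return the single-video requirements that the given video violates."""
    problems = []
    if not 0.4 <= width / height <= 2.5:
        problems.append("aspect ratio outside [0.4, 2.5]")
    if not (300 <= width <= 6000 and 300 <= height <= 6000):
        problems.append("side length outside [300, 6000] px")
    if not 409_600 <= width * height <= 2_086_876:
        problems.append("total pixels outside [409600, 2086876]")
    if not 2 <= duration_s <= 15:
        problems.append("duration outside [2, 15] s")
    if not 24 <= fps <= 60:
        problems.append("frame rate outside [24, 60]")
    if size_bytes > 50 * 1024 * 1024:  # assumption: 1 MB = 1024 * 1024 bytes
        problems.append("size exceeds 50 MB")
    return problems
```

For example, a 1920×1080 video has 2,073,600 pixels and passes the total-pixel check, while a 3840×2160 video does not.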

Container Format

Container Format	Common File Extensions	MIME	Supported Codecs
MP4	mp4	video/mp4	Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3
QuickTime	mov	video/quicktime	Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3

content.role string Conditionally Required: Position or purpose of the video. Currently only supports reference_video.

Audio Information object

Note: Audio cannot be input alone; at least 1 reference video or image must be included.

Audio information input to the model.

Properties:

content.type string Required: Type of input content, should be audio_url here.
content.audio_url object Required: Audio object input to the model.
content.audio_url.url string Required: Audio URL, audio Base64 encoding, or material ID.

Audio URL: Fill in the public URL of the audio.
Base64 Encoding: Convert local files to Base64 encoded strings, then submit to the large model. Follow the format: data:audio/<audio_format>;base64,<Base64_encoding>. Note that <audio_format> should be lowercase, e.g., data:audio/wav;base64,<Base64 audio>.
Material ID: ID for preset materials and virtual avatars used for video generation, following the format: asset://<ASSET_ID>.

Single Audio Requirements

Format: `wav`, `mp3`
Duration: Single audio duration [2, 15]s. Maximum 3 reference audios. Total duration of all audios does not exceed 15s.
Size: Single audio does not exceed 15 MB. Request body size does not exceed 64 MB. Do not use Base64 encoding for large files.

content.role string Conditionally Required: Position or purpose of the audio. Currently only supports reference_audio.
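
The count and duration limits for reference audio follow the same shape as those for reference video: at most 3 files, each in [2, 15] s, and at most 15 s in total. A sketch of that check, with a hypothetical function name:

```python
def reference_durations_ok(durations_s: list[float]) -> bool:
    """Check count, per-file duration, and total duration of reference files."""
    if len(durations_s) > 3:
        return False  # maximum 3 reference files
    if any(not 2 <= d <= 15 for d in durations_s):
        return False  # each file must last between 2 and 15 seconds
    return sum(durations_s) <= 15  # combined duration limit
```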

callback_url `string`

The callback notification address for the results of this generation task. Whenever the status of the video generation task changes, a POST request is sent to this address.

The callback request content structure is the same as the return body of the Query Task API.

Callback returned status includes the following:

queued: Queued
running: Task running
succeeded: Task succeeded (if delivery fails, i.e., no success acknowledgment is received within 5 seconds, the callback is retried three times)
failed: Task failed (if delivery fails, i.e., no success acknowledgment is received within 5 seconds, the callback is retried three times)
expired: Task timed out, i.e., the task has been in running or queued status for longer than the expiration time. The expiration time can be set through the execution_expires_after field.
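
A callback receiver mainly needs to tell terminal statuses from in-progress ones and to acknowledge quickly. The sketch below assumes the callback body is JSON with a top-level `status` field, mirroring the Query Task API; the function name and return labels are illustrative.

```python
TERMINAL_STATUSES = {"succeeded", "failed", "expired"}
IN_PROGRESS_STATUSES = {"queued", "running"}


def classify_callback(payload: dict) -> str:
    """Map a callback payload to a coarse state for downstream handling."""
    status = payload.get("status")  # assumed top-level field
    if status in TERMINAL_STATUSES:
        return "finished"
    if status in IN_PROGRESS_STATUSES:
        return "in_progress"
    return "unknown"
```

Because unacknowledged callbacks are sent again, it is safer for the handler to respond within 5 seconds and to tolerate receiving the same status more than once.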

return_last_frame `boolean`

Default: false

true: Return the last frame image of the generated video. After setting to true, the last frame image of the video can be obtained through the Query Video Generation Task interface. The format of the last frame image is png, with width and height pixel values consistent with the generated video, no watermark. Using this parameter can achieve multiple consecutive videos: use the last frame of the previous generated video as the first frame of the next video task to quickly generate multiple consecutive videos. For call examples, see the official tutorial of Volcengine.
false: Do not return the last frame image of the generated video.
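
The chaining idea can be sketched as follows: once the previous task has finished and its last frame image is available, that image becomes the first frame of the next request. How the last frame URL is read from the query response is not covered in this document, so it is passed in as a plain argument here; the helper name is hypothetical.

```python
def next_segment_body(model: str, last_frame_url: str, prompt: str) -> dict:
    """Build the request body for the next video in a consecutive series."""
    return {
        "model": model,
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": last_frame_url},
                "role": "first_frame",  # previous last frame starts this video
            },
        ],
        # Ask for the last frame again so the chain can continue.
        "return_last_frame": True,
    }
```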

execution_expires_after `integer`

Default: 172800

Task timeout threshold. Specify the expiration time (in seconds) after task submission, calculated from the created_at timestamp. The default value is 172800 seconds, which is 48 hours. Recommended range: [3600, 259200].

Regardless of which service tier is used, it is recommended to set an appropriate timeout based on the business scenario. After exceeding this time, the task will be automatically terminated and marked as expired status.
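
As a quick worked example of the timeout arithmetic, the sketch below computes when a task would be marked expired. It assumes `created_at` is a Unix timestamp in seconds, which this document does not state explicitly; the function name is illustrative.

```python
from datetime import datetime, timezone


def expiry_time(created_at: int, execution_expires_after: int = 172800) -> datetime:
    """Return the UTC time at which the task is marked as expired."""
    return datetime.fromtimestamp(created_at + execution_expires_after, tz=timezone.utc)
```

With the default of 172800 seconds, a task expires exactly 48 hours after it was created.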

generate_audio `boolean`

Default: true

Control whether the generated video contains synchronized audio with the video.

true: The video output by the model contains synchronized audio. The model will automatically generate matching human voice, sound effects, and background music based on the text prompts and visual content. It is recommended to place dialogue parts in double quotes to optimize audio generation. For example: The man stopped the woman and said: "Remember, you must not point at the moon with your finger in the future."
false: The video output by the model is silent.

Note: Audio in generated videos is always mono, regardless of the number of channels in the input audio.

tools `object`

Configure tools to be called by the model.

Properties:

tools.type string: Specify the tool type to use.
- web_search: Web search tool.

Note

After web search is enabled, the model decides on its own whether to search the Internet (for example, for products or weather) based on the user's prompts. This can improve the timeliness of generated videos but also adds some latency.
The actual number of searches can be obtained through the `usage.tool_usage.web_search` field returned by the Query Video Generation Task API. If it is 0, it means no search was performed.

safety_identifier `string`

Unique identifier for end users, used to assist the platform in detecting users in your application who may violate the Volcano Ark usage policy.

This identifier is an English string, must be fixed and unique for a single user, and cannot exceed 64 characters. It is recommended to pass a string generated by hashing the username, user ID, or email to avoid leaking user privacy information.
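
One way to derive such an identifier is a SHA-256 hex digest, which is stable for a given user and exactly 64 characters long. This is only a sketch: the helper name and the optional `salt` parameter are illustrative choices, not platform requirements.

```python
import hashlib


def make_safety_identifier(user_id: str, salt: str = "") -> str:
    """Hash a user ID into a fixed, 64-character hexadecimal identifier."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
```

Keeping the salt constant per application ensures that the same user always maps to the same identifier.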

priority `integer`

Default: 0

Set the execution priority of the current request, which determines its position in the queue. Value range: 0-9, the higher the value, the higher the priority.

By default, requests are executed in FIFO (First In, First Out) order. After setting a higher priority, the request will be inserted before all lower priority requests under the same Endpoint (inference access point).

Example:

An Endpoint currently has 3 queued (status=queued) tasks, all with priority 0 (default):

Queue: [Task A: priority=0] → [Task B: priority=0] → [Task C: priority=0]

A new request with priority=5 is then submitted and is placed directly at the head of the queue:

Queue: [New Request: priority=5] → [Task A: priority=0] → [Task B: priority=0] → [Task C: priority=0]

Note

Requests with the same priority are still sorted by FIFO
Priority only affects the queuing order and does not interrupt tasks that are currently executing (status=running)
Priority only takes effect within the same Endpoint and does not affect tasks in other Endpoints
Offline inference mode (service_tier=flex) does not support configuring priority
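
The queuing rules can be modeled locally with a small priority queue. This is a simulation of the described ordering only, not the platform's implementation; the class and method names are hypothetical.

```python
import heapq
import itertools


class EndpointQueue:
    """Toy model of one Endpoint's queue: higher priority first, FIFO within ties."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()  # preserves FIFO among equal priorities

    def submit(self, name: str, priority: int = 0) -> None:
        if not 0 <= priority <= 9:
            raise ValueError("priority must be in the range 0-9")
        # Negate the priority because heapq is a min-heap.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def order(self) -> list[str]:
        """Return the queued task names in execution order."""
        return [name for _, _, name in sorted(self._heap)]


queue = EndpointQueue()
for task in ("Task A", "Task B", "Task C"):
    queue.submit(task)
queue.submit("New Request", priority=5)
print(queue.order())  # ['New Request', 'Task A', 'Task B', 'Task C']
```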

Parameter Upgrade Notes

For `resolution`, `ratio`, `duration`, `frames`, `seed`, `camera_fixed`, and `watermark` parameters, the platform has upgraded the parameter passing method, as shown in the example below. All models are still compatible with the old parameter passing method.
Different models may support different parameters and values. See Output Video Format for details. When the input parameters or values do not match the selected model, the parameters will be ignored or trigger an error.
New Method: Pass parameters directly in the request body. This method is strongly validated. If parameters are filled incorrectly, the model will return an error prompt.
Old Method: Append `--[parameters]` after the text prompt. This method is weakly validated. If parameters are filled incorrectly, they will be ignored or trigger an error.

New Method (Recommended): Pass parameters directly in the request body

json

// Specify the aspect ratio of the generated video as 16:9, duration as 5 seconds,
// resolution as 720p, seed as 11, and include a watermark. The camera is not fixed.
{
  "model": "doubao-seedance-1-5-pro-251215",
  "content": [
    {
      "type": "text",
      "text": "A kitten yawns at the camera"
    }
  ],
  // All parameters must be written in full; abbreviations are not supported
  "resolution": "720p",
  "ratio": "16:9",
  "duration": 5,
  // "frames": 20, Either duration or frames is required
  "seed": 11,
  "camera_fixed": false,
  "watermark": true
}

Old Method: Append --[parameters] after the text prompt

Example response body returned after a task is created:

json

{
    "id": "019e640a-348e-74f8-8aff-9f7520b76359",
    "upstreamTaskId": "cgt-20260526192731-qjwzf",
    "status": "submitted",
    "model": "doubao-seedance-2-0-260128",
    "createdAt": "2026-05-26T11:27:30.95832Z"
}

resolution `string`

Default: 720p

Video resolution. Enumerated values:

480p
720p
1080p: Not supported by Seedance 2.0 fast

ratio `string`

Default: adaptive

Aspect ratio of the generated video. See the table below for width and height pixel values corresponding to different aspect ratios.

Enumerated values:

16:9
4:3
1:1
3:4
9:16
21:9
adaptive: Automatically select the most suitable aspect ratio based on input (see description below)

adaptive Adaptation Rules

When ratio is configured as adaptive, the model will automatically adapt the aspect ratio according to the generation scenario; the actual aspect ratio of the generated video can be obtained through the ratio field returned by the Query Video Generation Task API.

Value Rules:

Text-to-Video: Intelligently select the most suitable aspect ratio based on the input prompts.
First Frame / First and Last Frames Video Generation: Automatically select the closest aspect ratio based on the aspect ratio of the uploaded first frame image.
Multimodal Reference Video Generation: Judged based on user prompt intent. If it is first frame video generation/video editing/video extension, select the closest aspect ratio based on the image/video; otherwise, select the closest aspect ratio based on the first media file passed in (priority: video > image).
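
To get a feel for what "closest aspect ratio" means for an input image, the sketch below picks the nearest enumerated ratio. The distance measure (absolute difference of log ratios) is an assumption made for illustration; the document does not specify how the platform measures closeness.

```python
import math

RATIOS = {
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
    "21:9": 21 / 9,
}


def closest_ratio(width: int, height: int) -> str:
    """Return the enumerated ratio nearest to width/height (assumed metric)."""
    target = width / height
    return min(RATIOS, key=lambda name: abs(math.log(RATIOS[name] / target)))
```

The authoritative value is still the ratio field returned by the Query Video Generation Task API.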

Width and Height Pixel Values for Different Aspect Ratios

Resolution	Aspect Ratio	Width x Height	Width x Height
480p	16:9	864×480	864×486
	4:3	736×544	752×560
	1:1	640×640	640×640
	3:4	544×736	560×752
	9:16	480×864	486×864
	21:9	960×416	992×432
720p	16:9	1248×704	1280×720
	4:3	1120×832	1112×834
	1:1	960×960	960×960
	3:4	832×1120	834×1112
	9:16	704×1248	720×1280
	21:9	1504×640	1470×630
1080p Not supported by Seedance 2.0 fast	16:9	1920×1088	1920×1080
	4:3	1664×1248	1664×1248
	1:1	1440×1440	1440×1440
	3:4	1248×1664	1248×1664
	9:16	1088×1920	1080×1920
	21:9	2176×928	2206×946

duration `integer`

Default: 5

Specify either duration or frames; if both are set, frames takes priority over duration. To generate a video lasting a whole number of seconds, duration is recommended.

Duration of the generated video, only supports integers, unit: seconds.

Seedance 1.0 pro, Seedance 1.0 pro fast: [2, 12]
Seedance 2.0 Series: [4, 15] or set to -1

Note: Seedance 2.0 Series supports two configuration methods

Specify a duration: Any integer in the supported range [4, 15].
Smart duration: Set to -1, and the model will automatically select an appropriate video length (integer seconds) within the valid range. The actual duration of the generated video can be obtained through the duration field returned by the Query Video Generation Task API. Note that video duration is context-dependent, please set it carefully.
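
The two configuration methods for the Seedance 2.0 Series reduce to a simple client-side check, sketched here with a hypothetical function name:

```python
def is_valid_seedance_2_duration(duration: object) -> bool:
    """Accept -1 (smart duration) or an integer number of seconds in [4, 15]."""
    if isinstance(duration, bool) or not isinstance(duration, int):
        return False  # only integers are supported
    return duration == -1 or 4 <= duration <= 15
```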

seed `integer`

Default: -1

Seed integer, used to control the randomness of generated content.

Value range: Integers between [-1, 2^31-1].

Note: For otherwise identical requests, different seed values produce different results. This includes leaving seed unset or setting it to -1 (a random number is used instead), as well as manually changing the seed value.

Seedance 2.0 Series: When the model receives the same seed value, it will generate similar results, but does not guarantee exact consistency.

watermark `boolean`

Default: false

Whether the generated video contains a watermark. Enumerated values:

true: The generated video will display an AI-generated watermark in the bottom right corner.
false: The generated video does not contain a watermark.

Response Parameters

id `string`

Video generation task ID.

When draft is set to true: Draft video task ID.
When draft is set to false: Normal video task ID.

The Create Video Generation Task is an asynchronous interface. After obtaining the id, you need to use the Query Video Generation Task API to query the status of the video generation task. After the task succeeds, the video_url of the generated video will be output.
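
Because the interface is asynchronous, a client typically polls until the task reaches a terminal status. The sketch below keeps the HTTP details out of scope: `fetch_status` is a caller-supplied function (hypothetical here) that calls the Query Video Generation Task API and returns its JSON as a dict with a `status` field.

```python
import time
from typing import Callable

TERMINAL = {"succeeded", "failed", "expired"}


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll until the task reaches a terminal status, then return the task."""
    for _ in range(max_polls):
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL:
            return task
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


# Demonstration with a fake fetcher that succeeds on the third poll.
_responses = iter(
    [{"status": "queued"}, {"status": "running"}, {"status": "succeeded"}]
)
result = wait_for_task("task-1", lambda _id: next(_responses), interval_s=0)
```

If a callback_url is configured, polling can serve as a fallback rather than the primary mechanism.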
