Create Video Generation Task API
This document describes the input and output parameters of the Create Video Generation Task API for your reference when using the interface. The model generates videos based on the input images and text information. After generation is complete, you can query tasks by conditions and obtain the generated videos.
Model Capabilities
Doubao Seedance 2.0 Series (Audio/Video)
Multimodal Reference Video Generation: Input reference images (0~9) + reference videos (0~3) + reference audios (0~3) + text prompts (optional) to generate 1 target video. Note that audio cannot be input alone; at least 1 reference video or image must be included. Supports generating new videos, editing videos, and extending videos. Read tutorials for detailed code examples.
Image-to-Video - First and Last Frames: Input first frame image + last frame image + text prompts (optional) to generate 1 target video.
Image-to-Video - First Frame: Input first frame image + text prompts (optional) to generate 1 target video.
Text-to-Video: Input text prompts to generate 1 target video.
Note
Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation are three mutually exclusive scenarios. If you need both reference images and first/last frame control, specify the first/last frames through prompts in multimodal reference mode. If the first and last frames must match the specified images exactly, prefer Image-to-Video - First and Last Frames mode.
Request Parameters
Request Information
| Item | Value |
|---|---|
| Request Method | POST |
| Official Request URL | https://ai-tokenhub.com/api/v2/contents/generations/tasks |
| Content-Type | application/json |
| Authentication | API Key Authentication (Bearer Token) |
| Task Data Retention Period | 7 days (calculated from task creation time) |
Request Headers
| Field Name | Required | Description |
|---|---|---|
| Authorization | ✓ | Format: Bearer <your_api_key>, where <your_api_key> is the long-term API key you obtained from the Tianshu platform |
| Content-Type | ✓ | Fixed value: application/json |
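As a rough sketch, the request method, URL, and headers listed above could be assembled with Python's standard library as follows. The helper name `build_create_task_request`, the placeholder key, and the placeholder Model ID are illustrative assumptions; the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://ai-tokenhub.com/api/v2/contents/generations/tasks"


def build_create_task_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a task."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_task_request(
    "YOUR_API_KEY",  # placeholder, not a real key
    {
        "model": "YOUR_MODEL_ID",  # placeholder Model ID
        "content": [{"type": "text", "text": "A kitten yawns at the camera"}],
    },
)
# Sending the request would be: urllib.request.urlopen(req)
```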
Request Body
model string Required
The Model ID of the model you need to call. Enable the model service and query the Model ID.
content object Required
Information input to the model for video generation, supporting text, images, audio, and video.
Supported Input Combinations:
- Text
- Text (optional) + Image
- Text (optional) + Video
- Text (optional) + Image + Video
- Text (optional) + Video + Audio
- Text (optional) + Image + Audio
- Text (optional) + Image + Video + Audio
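A client-side pre-check of these combinations might look like the sketch below. The function name `check_content_combination` is hypothetical, and the check only mirrors the list above; the platform's own validation may differ.

```python
KNOWN_TYPES = {"text", "image_url", "video_url", "audio_url"}


def check_content_combination(content: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    types = {item.get("type") for item in content}
    problems = []
    unknown = types - KNOWN_TYPES
    if unknown:
        problems.append(f"unknown content types: {sorted(map(str, unknown))}")
    has_visual = "image_url" in types or "video_url" in types
    # Audio cannot be supplied alone: it needs an image or a video alongside it.
    if "audio_url" in types and not has_visual:
        problems.append("audio requires at least one reference image or video")
    # Text is optional only when an image or video is present.
    if not has_visual and "text" not in types:
        problems.append("text is required when no image or video is supplied")
    return problems
```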
Information Types
Text Information object
Prompt information input to the model.
- content.type string Required: Type of input content; should be `text` here.
- content.text string Required: Text prompt input to the model, describing the video to be generated.
- Prompt Language Support: All models support Chinese and English prompts, as well as Japanese, Indonesian, Spanish, and Portuguese.
- Prompt Word Count Recommendation: Chinese prompts should not exceed 500 characters, English prompts should not exceed 1000 words. Excessive word count can easily lead to scattered information, and the model may ignore details and only focus on key points, resulting in missing elements in the video.
- For more usage tips: Refer to the Seedance Prompt Guide.
Image Information object
Image information input to the model.
Properties:
- content.type string Required: Type of input content; should be `image_url` here.
- content.image_url object Required: Image object input to the model.
- content.image_url.url string Required: Image URL, image Base64 encoding, or material ID.
  - Image URL: Fill in the public URL of the image.
  - Base64 Encoding: Convert the local file to a Base64-encoded string, then submit it to the model. Follow the format `data:image/<image_format>;base64,<Base64_encoding>`. Note that `<image_format>` should be lowercase, e.g., `data:image/png;base64,<Base64 image>`.
  - Material ID: ID for preset materials and virtual avatars used for video generation, following the format `asset://<ASSET_ID>`. For details, see Material Library Usage Guide.
- Format: jpeg, png, webp, bmp, tiff, gif, heic, heif
- Aspect Ratio (width/height): (0.4, 2.5)
- Width/Height Length (px): (300, 6000)
- Size: A single image must be smaller than 30 MB, and the request body must not exceed 64 MB. Do not use Base64 encoding for large files.
Image Count:
- Image-to-Video - First Frame: 1 image
- Image-to-Video - First and Last Frames: 2 images
- Seedance 2.0 Series Multimodal Reference Video Generation: 1-9 images
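The Base64 format and the dimension limits for images can be sketched as below. The helper names are hypothetical, and the sketch reads the aspect-ratio and side-length ranges as open intervals, exactly as they are written in this section.

```python
import base64

SUPPORTED_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif", "heic", "heif"}


def image_data_url(raw: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/<image_format>;base64,<Base64_encoding>."""
    fmt = image_format.lower()  # the format name must be lowercase
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported image format: {image_format}")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:image/{fmt};base64,{encoded}"


def image_dimensions_ok(width: int, height: int) -> bool:
    """Check aspect ratio in (0.4, 2.5) and each side in (300, 6000) px."""
    return (
        0.4 < width / height < 2.5
        and 300 < width < 6000
        and 300 < height < 6000
    )
```

For example, `image_data_url(open("frame.png", "rb").read(), "png")` would produce a value suitable for `content.image_url.url`, where `frame.png` is a hypothetical local file.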
- content.role string Conditionally Required: Position or purpose of the image.
Note
Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation (including reference images, videos, and audio) are 3 mutually exclusive scenarios and cannot be mixed. Multimodal Reference Video Generation can designate reference images as first/last frames through prompts, indirectly achieving a "first/last frames + multimodal reference" effect. If the first and last frames must match the specified images exactly, prefer Image-to-Video - First and Last Frames mode (configure role as first_frame/last_frame).
Image-to-Video - First Frame role value: Pass 1 image_url object; role is first_frame or omitted.
Image-to-Video - First and Last Frames role value: Pass 2 image_url objects; role is required.
- Role for the first frame image: first_frame
- Role for the last frame image: last_frame
Image-to-Video - Reference Image role value: Required; the role for each reference image is reference_image.
The input first and last frame images can be the same. When the aspect ratios of the first and last frame images are inconsistent, the first frame image is used as the main reference, and the last frame image will be automatically cropped to fit.
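The role rules above could be applied when assembling the `content` array, as in this minimal sketch. The function name and the example URLs are hypothetical; `role` is placed next to `type` on each item, following the `content.role` naming used in this section.

```python
from typing import Optional


def first_last_frame_content(
    first_url: str,
    last_url: Optional[str] = None,
    prompt: Optional[str] = None,
) -> list[dict]:
    """Build content for first-frame or first-and-last-frame generation."""
    content = []
    if prompt:  # the text prompt is optional in both modes
        content.append({"type": "text", "text": prompt})
    content.append(
        {"type": "image_url", "image_url": {"url": first_url}, "role": "first_frame"}
    )
    if last_url is not None:
        # With two images, role is required on both of them.
        content.append(
            {"type": "image_url", "image_url": {"url": last_url}, "role": "last_frame"}
        )
    return content
```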
Video Information object
Video information input to the model.
Properties:
- content.type string Required: Type of input content; should be `video_url` here.
- content.video_url object Required: Video object input to the model.
- content.video_url.url string Required: Video URL or material ID.
Video URL:
- Fill in the public URL of the video.
- Video Format: `mp4`, `mov`. See table below for supported codecs.
- Resolution: 480p, 720p, 1080p.
- Duration: Single video duration [2, 15]s. Maximum 3 reference videos. Total duration of all videos does not exceed 15s.
- Dimensions:
- Aspect Ratio (width/height): [0.4, 2.5]
- Width/Height Length (px): [300, 6000]
- Total Pixels: [640×640=409600, 2206×946=2086876], i.e., the product of width and height should be in the range [409600, 2086876].
- Size: Single video does not exceed 50 MB.
- Frame Rate (FPS): [24, 60]
Container Format
| Container Format | Common File Extensions | MIME | Supported Codecs |
|---|---|---|---|
| MP4 | mp4 | video/mp4 | Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3 |
| QuickTime | mov | video/quicktime | Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3 |
- content.role string Conditionally Required: Position or purpose of the video. Currently only supports `reference_video`.
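The video limits in this section can be checked before upload with something like the sketch below. The `VideoInfo` type and the function name are hypothetical, the metadata is assumed to have been probed beforehand by a tool of your choice, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    width: int
    height: int
    duration_s: float
    fps: float
    size_bytes: int


def check_reference_videos(videos: list[VideoInfo]) -> list[str]:
    """Return a list of limit violations; an empty list means all checks passed."""
    problems = []
    if len(videos) > 3:
        problems.append("at most 3 reference videos are allowed")
    if sum(v.duration_s for v in videos) > 15:
        problems.append("total duration exceeds 15 s")
    for i, v in enumerate(videos):
        if not 2 <= v.duration_s <= 15:
            problems.append(f"video {i}: duration outside [2, 15] s")
        if not 0.4 <= v.width / v.height <= 2.5:
            problems.append(f"video {i}: aspect ratio outside [0.4, 2.5]")
        if not (300 <= v.width <= 6000 and 300 <= v.height <= 6000):
            problems.append(f"video {i}: side length outside [300, 6000] px")
        if not 409_600 <= v.width * v.height <= 2_086_876:
            problems.append(f"video {i}: total pixels outside [409600, 2086876]")
        if not 24 <= v.fps <= 60:
            problems.append(f"video {i}: frame rate outside [24, 60]")
        if v.size_bytes > 50 * 1024 * 1024:
            problems.append(f"video {i}: larger than 50 MB")
    return problems
```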
Audio Information object
Note: Audio cannot be input alone; at least 1 reference video or image must be included.
Audio information input to the model.
Properties:
- content.type string Required: Type of input content; should be `audio_url` here.
- content.audio_url object Required: Audio object input to the model.
- content.audio_url.url string Required: Audio URL, audio Base64 encoding, or material ID.
- Audio URL: Fill in the public URL of the audio.
- Base64 Encoding: Convert the local file to a Base64-encoded string, then submit it to the model. Follow the format `data:audio/<audio_format>;base64,<Base64_encoding>`. Note that `<audio_format>` should be lowercase, e.g., `data:audio/wav;base64,<Base64 audio>`.
- Material ID: ID for preset materials and virtual avatars used for video generation, following the format `asset://<ASSET_ID>`.
- Format: `wav`, `mp3`
- Duration: Single audio duration [2, 15]s. Maximum 3 reference audios. Total duration of all audios does not exceed 15s.
- Size: Single audio does not exceed 15 MB. Request body size does not exceed 64 MB. Do not use Base64 encoding for large files.
- content.role string Conditionally Required: Position or purpose of the audio. Currently only supports `reference_audio`.
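Base64 encoding inflates a file by roughly one third, which is why large files are better passed by URL. The sketch below estimates whether a set of inlined files would fit into the 64 MB request body. The function names and the `overhead` allowance for the rest of the JSON body are assumptions, and 1 MB is taken as 1024 × 1024 bytes.

```python
MAX_BODY_BYTES = 64 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_body(file_sizes: list[int], overhead: int = 4096) -> bool:
    """Estimate whether Base64-inlined files fit in the request body.

    overhead is a rough allowance for the surrounding JSON (an assumption).
    """
    total = sum(base64_length(n) for n in file_sizes) + overhead
    return total <= MAX_BODY_BYTES
```

Under these assumptions, three 15 MB audio files encode to about 60 MB and still fit, while two 30 MB images encode to about 80 MB and do not.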
callback_url string
Callback notification address for the results of this generation task. Whenever the status of the video generation task changes, a POST request is pushed to this address.
The callback request body has the same structure as the response body of the Query Task API.
The status returned in the callback is one of the following:
- queued: Queued
- running: Task running
- succeeded: Task succeeded (if delivery fails, i.e., no success acknowledgment is received within 5 seconds, the callback is retried three times)
- failed: Task failed (if delivery fails, i.e., no success acknowledgment is received within 5 seconds, the callback is retried three times)
- expired: Task timed out, i.e., the task has remained in running or queued status for longer than the expiration time. The expiration time can be set through the `execution_expires_after` field.
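On the receiving side, a callback body could be interpreted roughly as follows. This is only a sketch: it assumes the body is JSON with a top-level `status` field (matching the status values above), and `handle_callback` is a hypothetical name. Responding quickly matters, since delivery is treated as failed when no success is received within 5 seconds.

```python
import json

TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def handle_callback(raw_body: bytes) -> tuple[str, bool]:
    """Parse a callback body and return (status, is_terminal)."""
    payload = json.loads(raw_body)
    # Assumption: the status is carried in a top-level "status" field.
    status = payload.get("status", "unknown")
    return status, status in TERMINAL_STATUSES
```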
return_last_frame boolean
Default: false
- true: Return the last frame image of the generated video. When set to `true`, the last frame image can be obtained through the Query Video Generation Task API. The last frame image is in `png` format, has the same pixel width and height as the generated video, and carries no watermark. This parameter makes it possible to generate multiple consecutive videos: use the last frame of the previously generated video as the first frame of the next video task. For call examples, see the official Volcengine tutorial.
- false: Do not return the last frame image of the generated video.
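Chaining segments with `return_last_frame` might be sketched as below. The function name is hypothetical, and how the last frame URL is read from the query response is not covered in this section, so the sketch simply takes that URL as an argument.

```python
def next_segment_body(model: str, prompt: str, last_frame_url: str) -> dict:
    """Build the request body for the next segment of a continuous video.

    last_frame_url is the last frame image of the previous task, obtained
    separately through the Query Video Generation Task API.
    """
    return {
        "model": model,
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": last_frame_url},
                "role": "first_frame",  # previous last frame becomes the new first frame
            },
        ],
        "return_last_frame": True,  # keep the chain going for the segment after this one
    }
```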
execution_expires_after integer
Default: 172800
Task timeout threshold. Specify the expiration time (in seconds) after task submission, calculated from the created_at timestamp. The default value is 172800 seconds, which is 48 hours. Recommended range: [3600, 259200].
Regardless of which service tier is used, it is recommended to set an appropriate timeout based on the business scenario. After exceeding this time, the task will be automatically terminated and marked as expired status.
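The expiry moment follows directly from the creation time plus the threshold. The sketch below assumes `created_at` is a Unix timestamp in seconds, which this section does not state explicitly; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_EXPIRES_AFTER = 172800  # seconds, i.e. 48 hours


def in_recommended_range(seconds: int) -> bool:
    """True if the value lies in the recommended range [3600, 259200]."""
    return 3600 <= seconds <= 259200


def expires_at(created_at: int, execution_expires_after: int = DEFAULT_EXPIRES_AFTER) -> datetime:
    """Return the UTC time at which a task would be marked as expired."""
    created = datetime.fromtimestamp(created_at, tz=timezone.utc)
    return created + timedelta(seconds=execution_expires_after)
```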
generate_audio boolean
Default: true
Controls whether the generated video contains audio synchronized with the video.
- true: The video output by the model contains synchronized audio. The model automatically generates matching human voice, sound effects, and background music based on the text prompts and visual content. It is recommended to place dialogue in double quotes to optimize audio generation. For example: The man stopped the woman and said: "Remember, you must not point at the moon with your finger in the future."
- false: The video output by the model is silent.
Note: The audio in all generated videos is mono, regardless of the number of channels in the input audio.
tools object
Configure tools to be called by the model.
Properties:
- tools.type string: Specify the tool type to use.
  - web_search: Web search tool.
- After web search is enabled, the model decides on its own whether to search the Internet (for products, weather, and so on) based on the user's prompts. This can improve the timeliness of generated videos but also adds some latency.
- The actual number of searches can be obtained through the `usage.tool_usage.web_search` field returned by the Query Video Generation Task API. A value of 0 means no search was performed.
safety_identifier string
Unique identifier for end users, used to assist the platform in detecting users in your application who may violate the Volcano Ark usage policy.
This identifier is an English string, must be fixed and unique for a single user, and cannot exceed 64 characters. It is recommended to pass a string generated by hashing the username, user ID, or email to avoid leaking user privacy information.
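One way to derive such an identifier is a SHA-256 hex digest, which is ASCII, stable for a given user, and exactly 64 characters long, so it sits at the stated limit. The function name and the optional `salt` parameter are assumptions for illustration.

```python
import hashlib


def make_safety_identifier(user_id: str, salt: str = "") -> str:
    """Hash a user ID into a fixed 64-character hexadecimal identifier.

    salt is an optional application-level secret (an assumption) that makes
    the hash harder to reverse by guessing user IDs.
    """
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
```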
priority integer
Default: 0
Set the execution priority of the current request, which determines its position in the queue. Value range: 0-9, the higher the value, the higher the priority.
By default, requests are executed in FIFO (First In, First Out) order. After setting a higher priority, the request will be inserted before all lower priority requests under the same Endpoint (inference access point).
Example:
An Endpoint currently has 3 queued (status=queued) tasks in its queue, all with priority 0 (default).
Queue: [Task A:priority=0] → [Task B:priority=0] → [Task C:priority=0]
A new request with priority=5 is then submitted and is placed directly at the head of the queue:
Queue: [New Request:priority=5] → [Task A:priority=0] → [Task B:priority=0] → [Task C:priority=0]
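The queue example above can be reproduced with a small model of the described ordering. This is an illustration of the documented behavior only, not the platform's implementation; the class name is hypothetical.

```python
import heapq
from itertools import count


class PriorityFifoQueue:
    """Higher priority first; FIFO among requests with equal priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = count()  # submission order, used as the FIFO tie-breaker

    def submit(self, name: str, priority: int = 0) -> None:
        if not 0 <= priority <= 9:
            raise ValueError("priority must be in the range 0-9")
        # Negate priority because heapq is a min-heap.
        heapq.heappush(self._heap, (-priority, next(self._seq), name))

    def order(self) -> list[str]:
        """Return the queued names in execution order without removing them."""
        return [name for _, _, name in sorted(self._heap)]


queue = PriorityFifoQueue()
for task in ("Task A", "Task B", "Task C"):
    queue.submit(task)
queue.submit("New Request", priority=5)
print(queue.order())
```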
- Requests with the same priority are still sorted by FIFO
- Priority only affects the queuing order and does not interrupt tasks that are currently executing (status=running)
- Priority only takes effect within the same Endpoint and does not affect tasks in other Endpoints
- Offline inference mode (service_tier=flex) does not support configuring priority
- For `resolution`, `ratio`, `duration`, `frames`, `seed`, `camera_fixed`, and `watermark` parameters, the platform has upgraded the parameter passing method, as shown in the example below. All models are still compatible with the old parameter passing method.
- Different models may support different parameters and values. See Output Video Format for details. When the input parameters or values do not match the selected model, the parameters will be ignored or trigger an error.
- New Method: Pass parameters directly in the request body. This method is strongly validated. If parameters are filled incorrectly, the model will return an error prompt.
- Old Method: Append `--[parameters]` after the text prompt. This method is weakly validated. If parameters are filled incorrectly, they will be ignored or trigger an error.
New Method (Recommended): Pass parameters directly in the request body
// Specify the aspect ratio of the generated video as 16:9, duration as 5 seconds,
// resolution as 720p, seed as 11, and include a watermark. The camera is not fixed.
{
  "model": "doubao-seedance-1-5-pro-251215",
  "content": [
    {
      "type": "text",
      "text": "A kitten yawns at the camera"
    }
  ],
  // All parameters must be written in full; abbreviations are not supported
  "resolution": "720p",
  "ratio": "16:9",
  "duration": 5,
  // "frames": 20,  // Either duration or frames is required
  "seed": 11,
  "camera_fixed": false,
  "watermark": true
}
Old Method: Append --[parameters] after the text prompt
Example response after a task is created:
{
  "id": "019e640a-348e-74f8-8aff-9f7520b76359",
  "upstreamTaskId": "cgt-20260526192731-qjwzf",
  "status": "submitted",
  "model": "doubao-seedance-2-0-260128",
  "createdAt": "2026-05-26T11:27:30.95832Z"
}
resolution string
Default: 720p
Video resolution. Enumerated values:
- 480p
- 720p
- 1080p: Not supported by Seedance 2.0 fast
ratio string
Default: adaptive
Aspect ratio of the generated video. See the table below for width and height pixel values corresponding to different aspect ratios.
Enumerated values:
- 16:9
- 4:3
- 1:1
- 3:4
- 9:16
- 21:9
- adaptive: Automatically select the most suitable aspect ratio based on input (see description below)
When ratio is configured as adaptive, the model will automatically adapt the aspect ratio according to the generation scenario; the actual aspect ratio of the generated video can be obtained through the ratio field returned by the Query Video Generation Task API.
Value Rules:
- Text-to-Video: Intelligently select the most suitable aspect ratio based on the input prompts.
- First Frame / First and Last Frames Video Generation: Automatically select the closest aspect ratio based on the aspect ratio of the uploaded first frame image.
- Multimodal Reference Video Generation: Judged based on user prompt intent. If it is first frame video generation/video editing/video extension, select the closest aspect ratio based on the image/video; otherwise, select the closest aspect ratio based on the first media file passed in (priority: video > image).
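Selecting the "closest" enumerated aspect ratio for an input image or video could work as sketched below. The distance measure (difference of log ratios) and the function name are assumptions; the platform's actual selection rule is not specified here.

```python
import math

RATIOS = {
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
    "21:9": 21 / 9,
}


def closest_ratio(width: int, height: int) -> str:
    """Return the enumerated ratio nearest to width/height on a log scale."""
    target = math.log(width / height)
    return min(RATIOS, key=lambda name: abs(math.log(RATIOS[name]) - target))
```

Comparing on a log scale treats a ratio and its inverse symmetrically, so a portrait input is matched as reliably as its landscape counterpart.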
Width and Height Pixel Values for Different Aspect Ratios
| Resolution | Aspect Ratio | Width x Height | Width x Height |
|---|---|---|---|
| 480p | 16:9 | 864×480 | 864×486 |
| 480p | 4:3 | 736×544 | 752×560 |
| 480p | 1:1 | 640×640 | 640×640 |
| 480p | 3:4 | 544×736 | 560×752 |
| 480p | 9:16 | 480×864 | 486×864 |
| 480p | 21:9 | 960×416 | 992×432 |
| 720p | 16:9 | 1248×704 | 1280×720 |
| 720p | 4:3 | 1120×832 | 1112×834 |
| 720p | 1:1 | 960×960 | 960×960 |
| 720p | 3:4 | 832×1120 | 834×1112 |
| 720p | 9:16 | 704×1248 | 720×1280 |
| 720p | 21:9 | 1504×640 | 1470×630 |
| 1080p (not supported by Seedance 2.0 fast) | 16:9 | 1920×1088 | 1920×1080 |
| 1080p | 4:3 | 1664×1248 | 1664×1248 |
| 1080p | 1:1 | 1440×1440 | 1440×1440 |
| 1080p | 3:4 | 1248×1664 | 1248×1664 |
| 1080p | 9:16 | 1088×1920 | 1080×1920 |
| 1080p | 21:9 | 2176×928 | 2206×946 |
duration integer
Default: 5
Duration of the generated video, only supports integers, unit: seconds.
Seedance 1.0 pro, Seedance 1.0 pro fast: [2, 12]
Seedance 2.0 Series: [4, 15] or set to -1
Note: Seedance 2.0 Series supports two configuration methods:
- Specific duration: Supports any integer shown in Table 1.
- Smart duration: Set to -1, and the model automatically selects an appropriate video length (integer seconds) within the valid range. The actual duration of the generated video can be obtained through the duration field returned by the Query Video Generation Task API. Note that video duration is context-dependent, so set it carefully.
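The duration rules can be expressed as a small check. The series labels used as dictionary keys are hypothetical shorthand for the model families named above, not real Model IDs.

```python
# Hypothetical series labels mapped to the documented [min, max] ranges.
VALID_RANGES = {
    "seedance-1.0-pro": (2, 12),
    "seedance-2.0": (4, 15),
}
SMART_DURATION_SERIES = {"seedance-2.0"}  # series that accept duration = -1


def duration_ok(duration: int, series: str) -> bool:
    """Check an integer duration (seconds) against the range for a series."""
    if isinstance(duration, bool) or not isinstance(duration, int):
        return False  # only integers are supported
    if duration == -1:
        return series in SMART_DURATION_SERIES
    low, high = VALID_RANGES[series]
    return low <= duration <= high
```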
seed integer
Default: -1
Seed integer, used to control the randomness of generated content.
Value range: Integers between [-1, 2^31-1].
Note: For otherwise identical requests, different seed values produce different results. This applies when the seed is omitted or set to -1 (a random number is used instead) and when the seed value is changed manually.
watermark boolean
Default: false
Whether the generated video contains a watermark. Enumerated values:
- true: The generated video displays an AI-generated watermark in the bottom right corner.
- false: The generated video does not contain a watermark.
Response Parameters
id string
Video generation task ID.
- With `draft: true` set: Draft video task ID.
- With `draft: false` set: Normal video task ID.
The Create Video Generation Task API is asynchronous. After obtaining the id, use the Query Video Generation Task API to query the status of the video generation task. After the task succeeds, the video_url of the generated video is returned.
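Because the interface is asynchronous, a client typically polls until the task reaches a final status. The sketch below is independent of the query endpoint: `fetch_task` is a hypothetical callable that you supply, which queries the task by id and returns the parsed response as a dict with a `status` field.

```python
import time
from typing import Callable

FINAL_STATUSES = {"succeeded", "failed", "expired"}


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll fetch_task(task_id) until the task reaches a final status."""
    for _ in range(max_polls):
        task = fetch_task(task_id)
        if task.get("status") in FINAL_STATUSES:
            return task
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Where a `callback_url` is configured, polling can be reduced or dropped in favor of the pushed status updates.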
