
Create Video Generation Task API

This document describes the input and output parameters of the Create Video Generation Task API, for reference when calling the interface. The model generates a video from the input text, images, videos, and audio. Once generation is complete, you can query tasks by condition and retrieve the generated videos.

Model Capabilities

Doubao Seedance 2.0 Series (Audio/Video)
Multimodal Reference Video Generation: Input reference images (0~9) + reference videos (0~3) + reference audios (0~3) + text prompts (optional) to generate 1 target video. Note that audio cannot be input alone; at least 1 reference video or image must be included. Supports generating new videos, editing videos, and extending videos. Read tutorials for detailed code examples.
Image-to-Video - First and Last Frames: Input first frame image + last frame image + text prompts (optional) to generate 1 target video.
Image-to-Video - First Frame: Input first frame image + text prompts (optional) to generate 1 target video.
Text-to-Video: Input text prompts to generate 1 target video.

Note
Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation are three mutually exclusive scenarios. To combine reference images with first/last frame control, specify the first/last frames through the prompt in multimodal reference mode. If the first and last frames must exactly match the specified images, use Image-to-Video - First and Last Frames mode instead.

Request Parameters

Request Information

| Item | Value |
|---|---|
| Request Method | POST |
| Official Request URL | https://ai-tokenhub.com/api/v2/contents/generations/tasks |
| Content-Type | application/json |
| Authentication | API Key Authentication (Bearer Token) |
| Task Data Retention Period | 7 days (calculated from task creation time) |

Request Headers

| Field Name | Required | Description |
|---|---|---|
| Authorization | Yes | Format: Bearer <your_api_key>, where <your_api_key> is the long-term API key you obtained from the Tianshu platform |
| Content-Type | Yes | Fixed value: application/json |
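The following is a minimal Python sketch (standard library only) of how the method, URL, and headers above fit together. It only builds the request object and does not send it; the helper name build_create_task_request and the placeholder key are illustrative, not part of the API.

```python
import json
import urllib.request

# Endpoint and headers taken from the Request Information and Request Headers tables.
TASKS_URL = "https://ai-tokenhub.com/api/v2/contents/generations/tasks"


def build_create_task_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Create Video Generation Task API."""
    return urllib.request.Request(
        TASKS_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_create_task_request(
    "YOUR_API_KEY",  # placeholder; use the key obtained from the Tianshu platform
    {
        "model": "doubao-seedance-1-5-pro-251215",
        "content": [{"type": "text", "text": "A kitten yawns at the camera"}],
    },
)
# To actually submit: with urllib.request.urlopen(req) as resp: task = json.load(resp)
```

Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.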

Request Body


model string Required

The Model ID of the model to call. Enable the model service first, then look up its Model ID.


content object[] Required

Information input to the model for video generation, supporting text, images, audio, and video.

Supported Input Combinations:

  • Text
  • Text (optional) + Image
  • Text (optional) + Video
  • Text (optional) + Image + Video
  • Text (optional) + Video + Audio
  • Text (optional) + Image + Audio
  • Text (optional) + Image + Video + Audio
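The list above can be checked mechanically before a request is sent. The sketch below assumes each content item carries a type of text, image_url, video_url, or audio_url (as described under Information Types); the function name is hypothetical. It rejects audio on its own, which matches the rule that audio needs at least one image or video.

```python
# Allowed sets of non-text input types, transcribed from the list above.
# Text is optional in every combination except the text-only one.
ALLOWED_MEDIA_SETS = [
    frozenset(),  # Text only (text is then required)
    frozenset({"image_url"}),
    frozenset({"video_url"}),
    frozenset({"image_url", "video_url"}),
    frozenset({"video_url", "audio_url"}),
    frozenset({"image_url", "audio_url"}),
    frozenset({"image_url", "video_url", "audio_url"}),
]


def is_supported_combination(content: list[dict]) -> bool:
    """Return True if the content item types form one of the supported combinations."""
    types = {item["type"] for item in content}
    media = frozenset(types - {"text"})
    if not media:
        # With no media at all, a text prompt is mandatory.
        return "text" in types
    return media in ALLOWED_MEDIA_SETS
```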

Information Types

Text Information object

Prompt information input to the model.

  • content.type string Required: Type of input content, should be text here
  • content.text string Required: Text prompt input to the model, describing the expected video to be generated.
Note
  • Prompt Language Support: All models support Chinese and English prompts, as well as Japanese, Indonesian, Spanish, and Portuguese.
  • Prompt Length Recommendation: Chinese prompts should not exceed 500 characters, and English prompts should not exceed 1000 words. Overly long prompts tend to scatter information: the model may ignore details and focus only on the key points, so elements can end up missing from the video.
  • More usage tips: See the Seedance Prompt Guide for detailed prompt usage tips.
Image Information object

Image information input to the model.

Properties:

  • content.type string Required: Type of input content, should be image_url here.
  • content.image_url object Required: Image object input to the model.
  • content.image_url.url string Required: Image URL, image Base64 encoding, or material ID.
    • Image URL: Fill in the public URL of the image.
    • Base64 Encoding: Convert local files to Base64 encoded strings, then submit to the large model. Follow the format: data:image/<image_format>;base64,<Base64_encoding>. Note that <image_format> should be lowercase, e.g., data:image/png;base64,<Base64 image>.
    • Material ID: ID of a preset material or virtual avatar used for video generation, in the format asset://<ASSET_ID>. For details, see the Material Library Usage Guide.
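The Base64 option above follows the data:image/<image_format>;base64,<Base64_encoding> format. A minimal sketch of producing that string and wrapping it as a content item (helper names are hypothetical):

```python
import base64


def to_image_data_url(raw: bytes, image_format: str) -> str:
    """Encode raw image bytes as data:image/<format>;base64,<data> (format lowercased)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:image/{image_format.lower()};base64,{encoded}"


def image_item(url: str) -> dict:
    """Wrap an image URL, data URL, or asset://<ASSET_ID> string as a content item."""
    return {"type": "image_url", "image_url": {"url": url}}


# Usage (hypothetical local file):
# with open("cat.png", "rb") as f:
#     item = image_item(to_image_data_url(f.read(), "png"))
```

Lowercasing the format in the helper enforces the lowercase requirement even if a caller passes an extension such as "PNG".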
Single Image Requirements
  • Format: jpeg, png, webp, bmp, tiff, gif, heic, heif
  • Aspect Ratio (width/height): (0.4, 2.5)
  • Width/Height Length (px): (300, 6000)
  • Size: Single image less than 30 MB. Request body size does not exceed 64 MB. Do not use Base64 encoding for large files.
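A pre-flight check for the single-image requirements could look like the sketch below. Two things are assumptions, not stated above: "MB" is taken as 1024 × 1024 bytes, and the common .jpg extension is treated as jpeg. The image ranges are written as open intervals, so the comparisons are strict.

```python
ALLOWED_IMAGE_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif", "heic", "heif"}
# Assumption: "MB" means 1024 * 1024 bytes; the limit is strict ("less than 30 MB").
MAX_IMAGE_BYTES = 30 * 1024 * 1024


def check_image(image_format: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image passes these checks."""
    problems = []
    fmt = image_format.lower()
    if fmt == "jpg":  # assumption: treat the common .jpg extension as jpeg
        fmt = "jpeg"
    if fmt not in ALLOWED_IMAGE_FORMATS:
        problems.append(f"unsupported format: {image_format}")
    # The documented ranges for images are open intervals.
    if not 0.4 < width / height < 2.5:
        problems.append(f"aspect ratio {width / height:.2f} outside (0.4, 2.5)")
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append(f"side length {width}x{height} outside (300, 6000) px")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append("image must be smaller than 30 MB")
    return problems
```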

Image Count:

  • Image-to-Video - First Frame: 1 image
  • Image-to-Video - First and Last Frames: 2 images
  • Seedance 2.0 Series Multimodal Reference Video Generation: 1-9 images

  • content.role string Conditionally Required: Position or purpose of the image.

Note

  • Image-to-Video - First Frame, Image-to-Video - First and Last Frames, and Multimodal Reference Video Generation (including reference images, videos, and audio) are 3 mutually exclusive scenarios and cannot be mixed.
  • Multimodal Reference Video Generation can designate reference images as first/last frames through the prompt, indirectly achieving a "first/last frames + multimodal reference" effect. If the first and last frames must strictly match the specified images, use Image-to-Video - First and Last Frames mode instead (set role to first_frame / last_frame).

role values by scenario:

  • Image-to-Video - First Frame: Pass 1 image_url object; set role to first_frame or omit it.
  • Image-to-Video - First and Last Frames: Pass 2 image_url objects; role is required.
    • Role for the first frame image: first_frame
    • Role for the last frame image: last_frame
  • Multimodal Reference Video Generation (reference images): role is required; set it to reference_image for each reference image.

Note

The first and last frame images may be the same. If their aspect ratios differ, the first frame image is used as the main reference and the last frame image is automatically cropped to fit.
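The role rules above translate directly into content lists. This sketch builds them for each image scenario; the function names are hypothetical, and role is placed on each content item as content.role describes.

```python
from typing import Optional


def frame_content(first_url: str, last_url: Optional[str] = None,
                  prompt: Optional[str] = None) -> list[dict]:
    """First Frame mode (last_url=None) or First and Last Frames mode."""
    content = [{"type": "text", "text": prompt}] if prompt else []
    content.append({"type": "image_url", "image_url": {"url": first_url},
                    "role": "first_frame"})
    if last_url is not None:
        content.append({"type": "image_url", "image_url": {"url": last_url},
                        "role": "last_frame"})
    return content


def reference_image_content(urls: list[str], prompt: Optional[str] = None) -> list[dict]:
    """Multimodal reference mode: every image gets role=reference_image."""
    if not 1 <= len(urls) <= 9:
        raise ValueError("multimodal reference mode accepts 1-9 images")
    content = [{"type": "text", "text": prompt}] if prompt else []
    content += [{"type": "image_url", "image_url": {"url": u}, "role": "reference_image"}
                for u in urls]
    return content
```

Because the three scenarios are mutually exclusive, keeping one builder per scenario avoids accidentally mixing first_frame and reference_image roles in a single request.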

Video Information object

Video information input to the model.

Properties:

  • content.type string Required: Type of input content, should be video_url here.
  • content.video_url object Required: Video object input to the model.
  • content.video_url.url string Required: Video URL or material ID.
    • Video URL: Fill in the public URL of the video.
Single Video Requirements
  • Video Format: `mp4`, `mov`. See the table below for supported codecs.
  • Resolution: 480p, 720p, 1080p.
  • Duration: Single video duration [2, 15] s. Maximum 3 reference videos. Total duration of all videos must not exceed 15 s.
  • Dimensions:
    • Aspect Ratio (width/height): [0.4, 2.5]
    • Width/Height Length (px): [300, 6000]
    • Total Pixels: width × height must be in the range [409600, 2086876], i.e., from 640×640 = 409600 up to 2206×946 = 2086876.
  • Size: Single video does not exceed 50 MB.
  • Frame Rate (FPS): [24, 60]
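The per-video and aggregate limits above can be checked together before upload. A minimal sketch, assuming the caller already knows each file's container, dimensions, duration, size, and frame rate (for example from a media probe, not shown), and assuming "MB" means 1024 × 1024 bytes. Codec support from the table is not checked here.

```python
from dataclasses import dataclass

MAX_VIDEO_BYTES = 50 * 1024 * 1024  # assumption: MB = 1024 * 1024 bytes


@dataclass
class VideoInfo:
    container: str  # "mp4" or "mov"
    width: int
    height: int
    duration_s: float
    size_bytes: int
    fps: float


def check_reference_videos(videos: list[VideoInfo]) -> list[str]:
    """Return a list of problems; an empty list means all documented limits hold."""
    problems = []
    if len(videos) > 3:
        problems.append("at most 3 reference videos")
    if sum(v.duration_s for v in videos) > 15:
        problems.append("total reference video duration exceeds 15 s")
    for i, v in enumerate(videos):
        pixels = v.width * v.height
        if v.container.lower() not in {"mp4", "mov"}:
            problems.append(f"video {i}: unsupported container {v.container}")
        if not 2 <= v.duration_s <= 15:
            problems.append(f"video {i}: duration outside [2, 15] s")
        if not 0.4 <= v.width / v.height <= 2.5:
            problems.append(f"video {i}: aspect ratio outside [0.4, 2.5]")
        if not (300 <= v.width <= 6000 and 300 <= v.height <= 6000):
            problems.append(f"video {i}: side length outside [300, 6000] px")
        if not 409_600 <= pixels <= 2_086_876:
            problems.append(f"video {i}: {pixels} total pixels outside [409600, 2086876]")
        if v.size_bytes > MAX_VIDEO_BYTES:
            problems.append(f"video {i}: larger than 50 MB")
        if not 24 <= v.fps <= 60:
            problems.append(f"video {i}: frame rate outside [24, 60]")
    return problems
```

Note that a 640×480 clip fails the total-pixel rule (307200 < 409600) even though each side is within [300, 6000].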

Container Format

| Container Format | Common File Extensions | MIME | Supported Codecs |
|---|---|---|---|
| MP4 | mp4 | video/mp4 | Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3 |
| QuickTime | mov | video/quicktime | Video: H.264/AVC, H.265/HEVC; Audio: AAC, MP3 |

  • content.role string Conditionally Required: Position or purpose of the video. Currently only reference_video is supported.
Audio Information object

Note: Audio cannot be input alone; at least 1 reference video or image must be included.

Audio information input to the model.

Properties:

  • content.type string Required: Type of input content, should be audio_url here.
  • content.audio_url object Required: Audio object input to the model.
  • content.audio_url.url string Required: Audio URL, audio Base64 encoding, or material ID.
    • Audio URL: Fill in the public URL of the audio.
    • Base64 Encoding: Convert the local file to a Base64-encoded string, then submit it to the model using the format data:audio/<audio_format>;base64,<Base64_encoding>. <audio_format> must be lowercase, e.g., data:audio/wav;base64,<Base64 audio>.
    • Material ID: ID of a preset material or virtual avatar used for video generation, in the format asset://<ASSET_ID>.
Single Audio Requirements
  • Format: `wav`, `mp3`
  • Duration: Single audio duration [2, 15] s. Maximum 3 reference audios. Total duration of all audios must not exceed 15 s.
  • Size: Single audio does not exceed 15 MB. Request body size does not exceed 64 MB. Do not use Base64 encoding for large files.
  • content.role string Conditionally Required: Position or purpose of the audio. Currently only reference_audio is supported.
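A matching check for reference audio, including the rule that audio cannot be the only media input. This is a sketch: audios are described as hypothetical (format, duration_s, size_bytes) tuples, and "MB" is assumed to mean 1024 × 1024 bytes.

```python
MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: MB = 1024 * 1024 bytes


def check_reference_audios(audios: list[tuple[str, float, int]],
                           has_image_or_video: bool) -> list[str]:
    """audios holds (format, duration_s, size_bytes) tuples; returns a list of problems."""
    problems = []
    if audios and not has_image_or_video:
        problems.append("audio needs at least 1 reference image or video")
    if len(audios) > 3:
        problems.append("at most 3 reference audios")
    if sum(duration for _, duration, _ in audios) > 15:
        problems.append("total reference audio duration exceeds 15 s")
    for i, (fmt, duration_s, size_bytes) in enumerate(audios):
        if fmt.lower() not in {"wav", "mp3"}:
            problems.append(f"audio {i}: unsupported format {fmt}")
        if not 2 <= duration_s <= 15:
            problems.append(f"audio {i}: duration outside [2, 15] s")
        if size_bytes > MAX_AUDIO_BYTES:
            problems.append(f"audio {i}: larger than 15 MB")
    return problems
```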

callback_url string

The callback address for this generation task's results. Whenever the status of the video generation task changes, a POST request is sent to this address.

The callback request body has the same structure as the response body of the Query Task API.

Callback statuses:

  • queued: Queued
  • running: Task running
  • succeeded: Task succeeded (if delivery fails, i.e., no successful acknowledgment is received within 5 seconds, the callback is re-sent up to three times)
  • failed: Task failed (if delivery fails, i.e., no successful acknowledgment is received within 5 seconds, the callback is re-sent up to three times)
  • expired: Task timed out, i.e., the task remained in running or queued status longer than the expiration time. The expiration time can be set through the execution_expires_after field.
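Because succeeded and failed callbacks can be re-sent, a receiver should process them idempotently. The sketch below only parses a callback body and flags repeats; it assumes the payload has top-level id and status fields, as in the Query Task API response. The web server that receives the POST is not shown, and handle_callback is a hypothetical name.

```python
import json

TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def handle_callback(body: bytes, seen: set) -> tuple[str, str, bool, bool]:
    """Parse one callback POST body.

    Returns (task_id, status, is_terminal, is_duplicate). Assumes the payload
    carries top-level "id" and "status" fields, as in the Query Task API response.
    """
    payload = json.loads(body)
    task_id, status = payload["id"], payload["status"]
    key = (task_id, status)
    is_duplicate = key in seen  # succeeded/failed callbacks may be re-sent
    seen.add(key)
    return task_id, status, status in TERMINAL_STATUSES, is_duplicate
```

Acknowledging quickly (well within the 5-second window) and doing slow work, such as downloading the video, after responding reduces the chance of triggering retries.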

return_last_frame boolean

Default: false

  • true: Return the last frame image of the generated video, which can then be retrieved through the Query Video Generation Task API. The last frame is a PNG image without a watermark, with the same width and height in pixels as the generated video. This makes it possible to produce consecutive videos: use the last frame of one generated video as the first frame of the next video task. For call examples, see the official Volcengine tutorial.
  • false: Do not return the last frame image of the generated video.
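The chaining pattern described above amounts to feeding one clip's last frame into the next request as its first frame. This sketch builds that next request body. The last-frame URL is passed in by the caller, because the exact response field that holds it is not specified on this page; next_segment_body is a hypothetical helper.

```python
def next_segment_body(model: str, prompt: str, previous_last_frame_url: str) -> dict:
    """Request body for the next clip, seeded with the previous clip's last frame."""
    return {
        "model": model,
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": previous_last_frame_url},
             "role": "first_frame"},
        ],
        # Ask for this clip's last frame too, so the chain can continue.
        "return_last_frame": True,
    }
```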

execution_expires_after integer

Default: 172800

Task timeout threshold: the expiration time, in seconds, after task submission, counted from the created_at timestamp. The default of 172800 seconds is 48 hours. Recommended range: [3600, 259200].

Regardless of the service tier used, set a timeout appropriate to your business scenario. Once this time is exceeded, the task is automatically terminated and marked as expired.
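Since the timeout is counted from created_at, the expiry moment is simple arithmetic. The sketch assumes created_at is a Unix timestamp in seconds; it warns rather than fails outside the recommended range, since the range is only a recommendation.

```python
import warnings

DEFAULT_EXPIRES_AFTER = 172_800  # 48 hours, the documented default
RECOMMENDED_RANGE = (3_600, 259_200)  # documented recommended range, in seconds


def expiry_deadline(created_at: int, expires_after: int = DEFAULT_EXPIRES_AFTER) -> int:
    """Unix time after which a still queued/running task is marked expired.

    Assumes created_at is a Unix timestamp in seconds.
    """
    low, high = RECOMMENDED_RANGE
    if not low <= expires_after <= high:
        warnings.warn(f"execution_expires_after={expires_after} is outside the recommended range")
    return created_at + expires_after
```

For example, a task created at 1_700_000_000 with the default setting expires at 1_700_172_800.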


generate_audio boolean

Default: true

Controls whether the generated video includes audio synchronized with the picture.

  • true: The output video contains synchronized audio. The model automatically generates matching voices, sound effects, and background music based on the text prompt and visual content. Placing dialogue in double quotes is recommended to improve audio generation. For example: The man stopped the woman and said: "Remember, from now on you must never point at the moon with your finger."
  • false: The output video is silent.
Note: All generated videos with audio are mono, regardless of the number of channels in the input audio.

tools object

Configures the tools the model may call.

Properties:

  • tools.type string: The tool type to use.
    • web_search: Web search tool.
Note
  • With web search enabled, the model decides on its own, based on the user's prompt, whether to search the Internet (for example, for product information or weather). This can make generated videos more up to date but adds some latency.
  • The actual number of searches is returned in the `usage.tool_usage.web_search` field of the Query Video Generation Task API response. A value of 0 means no search was performed.
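Reading the search count from a query response should tolerate missing fields, for example when tools were not enabled. A minimal sketch using the usage.tool_usage.web_search path given above (the function name is hypothetical):

```python
def web_search_count(task: dict) -> int:
    """Number of web searches from a Query Video Generation Task response (0 if absent)."""
    usage = task.get("usage") or {}
    tool_usage = usage.get("tool_usage") or {}
    return int(tool_usage.get("web_search") or 0)
```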

safety_identifier string

A unique identifier for the end user, used to help the platform detect users of your application who may violate the Volcano Ark usage policy.

The identifier must be a string of English characters, fixed and unique for each user, and no longer than 64 characters. To avoid leaking user privacy information, pass a hash of the username, user ID, or email.
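A SHA-256 hex digest satisfies these constraints: it is stable for a given user, reveals nothing directly about the user, and is exactly 64 lowercase hex characters. In this sketch, the optional salt parameter is a hypothetical application-wide secret, not something the API requires.

```python
import hashlib


def make_safety_identifier(user_id: str, salt: str = "") -> str:
    """Stable, non-reversible per-user identifier: 64 lowercase hex characters."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
```

Whatever salt is used, it must stay constant, because the identifier has to remain fixed for a given user.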


priority integer

Default: 0

Sets the execution priority of the current request, which determines its position in the queue. Value range: 0-9; higher values mean higher priority.

By default, requests are executed in FIFO (First In, First Out) order. A request with a higher priority is placed ahead of all lower-priority requests under the same Endpoint (inference access point).

Example:

An Endpoint has 3 queued (status=queued) tasks, all with the default priority 0:

Queue: [Task A: priority=0] → [Task B: priority=0] → [Task C: priority=0]

A new request with priority=5 is then submitted and placed directly at the head of the queue:

Queue: [New Request: priority=5] → [Task A: priority=0] → [Task B: priority=0] → [Task C: priority=0]

Note
  • Requests with the same priority are still ordered FIFO.
  • Priority only affects queuing order; it does not interrupt tasks that are already executing (status=running).
  • Priority only takes effect within the same Endpoint and does not affect tasks in other Endpoints.
  • Offline inference mode (service_tier=flex) does not support priority.
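The ordering rule in the example (higher priority first, FIFO among equal priorities) can be modeled with a heap keyed on negated priority plus an arrival counter. This is a toy local model for reasoning about queue order, not the platform's scheduler.

```python
import heapq
import itertools


class EndpointQueue:
    """Toy model of one Endpoint's queue: higher priority first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def submit(self, name: str, priority: int = 0) -> None:
        if not 0 <= priority <= 9:
            raise ValueError("priority must be in 0-9")
        # Negate priority so the min-heap pops the highest priority first;
        # the arrival counter breaks ties in FIFO order.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def order(self) -> list[str]:
        return [name for _, _, name in sorted(self._heap)]

    def pop_next(self) -> str:
        return heapq.heappop(self._heap)[2]
```

Submitting A, B, and C at priority 0 and then a new request at priority 5 yields the order New, A, B, C, matching the example.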

Parameter Upgrade Notes
  • For the `resolution`, `ratio`, `duration`, `frames`, `seed`, `camera_fixed`, and `watermark` parameters, the platform has upgraded the way parameters are passed, as shown in the example below. All models remain compatible with the old method.
  • Different models may support different parameters and values; see Output Video Format for details. If a parameter or value does not match the selected model, it is either ignored or triggers an error.
  • New Method: Pass parameters directly in the request body. This method is strictly validated: if a parameter is filled in incorrectly, the model returns an error.
  • Old Method: Append `--[parameters]` after the text prompt. This method is loosely validated: if a parameter is filled in incorrectly, it is ignored or triggers an error.

New Method (Recommended): Pass parameters directly in the request body

The request below sets a 16:9 aspect ratio, a 5-second duration, 720p resolution, seed 11, a watermark, and a non-fixed camera. Parameter names must be written in full; abbreviations are not supported. Specify either duration or frames (for example, "frames": 20 instead of "duration": 5).

```json
{
  "model": "doubao-seedance-1-5-pro-251215",
  "content": [
    {
      "type": "text",
      "text": "A kitten yawns at the camera"
    }
  ],
  "resolution": "720p",
  "ratio": "16:9",
  "duration": 5,
  "seed": 11,
  "camera_fixed": false,
  "watermark": true
}
```

Old Method: Append --[parameters] after the text prompt

Example response returned when a task is created (task ID, upstream task ID, status, model, and creation time):

```json
{
    "id": "019e640a-348e-74f8-8aff-9f7520b76359",
    "upstreamTaskId": "cgt-20260526192731-qjwzf",
    "status": "submitted",
    "model": "doubao-seedance-2-0-260128",
    "createdAt": "2026-05-26T11:27:30.95832Z"
}
```

resolution string

Default: 720p

Video resolution. Enumerated values:

  • 480p
  • 720p
  • 1080p: Not supported by Seedance 2.0 fast

ratio string

Default: adaptive

Aspect ratio of the generated video. See the table below for the width and height in pixels for each aspect ratio.

Enumerated values:

  • 16:9
  • 4:3
  • 1:1
  • 3:4
  • 9:16
  • 21:9
  • adaptive: Automatically selects the most suitable aspect ratio based on the input (see the rules below)
adaptive Adaptation Rules

When ratio is set to adaptive, the model automatically chooses the aspect ratio for the generation scenario. The actual aspect ratio of the generated video is returned in the ratio field of the Query Video Generation Task API response.

Value Rules:

  • Text-to-Video: Selects the most suitable aspect ratio based on the input prompt.
  • First Frame / First and Last Frames Video Generation: Selects the aspect ratio closest to that of the uploaded first frame image.
  • Multimodal Reference Video Generation: Determined by the intent of the user's prompt. For first frame video generation, video editing, or video extension, the aspect ratio closest to that of the image/video is selected; otherwise, the aspect ratio closest to that of the first media file passed in is selected (priority: video > image).

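Width and Height Pixel Values for Different Aspect Ratios

| Resolution | Aspect Ratio | Width × Height | Width × Height |
|---|---|---|---|
| 480p | 16:9 | 864×480 | 864×486 |
| 480p | 4:3 | 736×544 | 752×560 |
| 480p | 1:1 | 640×640 | 640×640 |
| 480p | 3:4 | 544×736 | 560×752 |
| 480p | 9:16 | 480×864 | 486×864 |
| 480p | 21:9 | 960×416 | 992×432 |
| 720p | 16:9 | 1248×704 | 1280×720 |
| 720p | 4:3 | 1120×832 | 1112×834 |
| 720p | 1:1 | 960×960 | 960×960 |
| 720p | 3:4 | 832×1120 | 834×1112 |
| 720p | 9:16 | 704×1248 | 720×1280 |
| 720p | 21:9 | 1504×640 | 1470×630 |
| 1080p (not supported by Seedance 2.0 fast) | 16:9 | 1920×1088 | 1920×1080 |
| 1080p (not supported by Seedance 2.0 fast) | 4:3 | 1664×1248 | 1664×1248 |
| 1080p (not supported by Seedance 2.0 fast) | 1:1 | 1440×1440 | 1440×1440 |
| 1080p (not supported by Seedance 2.0 fast) | 3:4 | 1248×1664 | 1248×1664 |
| 1080p (not supported by Seedance 2.0 fast) | 9:16 | 1088×1920 | 1080×1920 |
| 1080p (not supported by Seedance 2.0 fast) | 21:9 | 2176×928 | 2206×946 |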
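Since the new parameter-passing method is strictly validated, it can help to check resolution and ratio locally first. The sketch below uses only the enumerations listed above; the is_seedance_2_fast flag is a hypothetical caller-supplied input, because this page does not list the Seedance 2.0 fast Model ID.

```python
RESOLUTIONS = {"480p", "720p", "1080p"}
RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}


def check_output_format(resolution: str = "720p", ratio: str = "adaptive",
                        is_seedance_2_fast: bool = False) -> list[str]:
    """Return a list of problems with the requested resolution/ratio (empty if valid)."""
    problems = []
    if resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution: {resolution}")
    elif resolution == "1080p" and is_seedance_2_fast:
        problems.append("1080p is not supported by Seedance 2.0 fast")
    if ratio not in RATIOS:
        problems.append(f"unknown ratio: {ratio}")
    return problems
```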

duration integer

Default: 5

Specify either duration or frames; if both are set, frames takes priority. To generate a video whose length is a whole number of seconds, specifying duration is recommended.

Duration of the generated video in seconds; only integers are supported.

  • Seedance 1.0 pro, Seedance 1.0 pro fast: [2, 12]

  • Seedance 2.0 Series: [4, 15], or -1

Note: Seedance 2.0 Series supports two configuration methods:

  • Specific duration: Any integer in the range above.
  • Smart duration: Set to -1, and the model automatically selects an appropriate video length (in whole seconds) within the valid range. The actual duration of the generated video is returned in the duration field of the Query Video Generation Task API response. Because the resulting duration depends on context, use this option with care.
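The duration rules for Seedance 2.0 Series, and the precedence of frames over duration, can be encoded as below. This is a sketch: function names are hypothetical, and the default of 5 comes from the Default above.

```python
def check_seedance_2_duration(duration: int) -> None:
    """Raise if duration is not a Seedance 2.0 Series value: an integer in [4, 15] or -1."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError("duration must be an integer number of seconds")
    if duration != -1 and not 4 <= duration <= 15:
        raise ValueError("Seedance 2.0 Series duration must be in [4, 15] or -1")


def length_setting(body: dict) -> tuple[str, int]:
    """Which length field takes effect: frames wins over duration; duration defaults to 5."""
    if "frames" in body:
        return "frames", body["frames"]
    return "duration", body.get("duration", 5)
```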

seed integer

Default: -1

Seed integer that controls the randomness of the generated content.

Value range: Integers in [-1, 2^31 - 1], i.e., [-1, 2147483647].

Note: For otherwise identical requests, different seed values produce different results. This includes not specifying seed or setting it to -1 (a random seed is used instead), as well as manually changing the seed value.

Seedance 2.0 Series: The same seed value produces similar results, but exact consistency is not guaranteed.

watermark boolean

Default: false

Whether the generated video contains a watermark. Enumerated values:

  • true: The generated video displays an AI-generated watermark in the bottom-right corner.
  • false: The generated video does not contain a watermark.

Response Parameters

id string

Video generation task ID.

  • With draft: true: Draft video task ID.
  • With draft: false: Normal video task ID.

Create Video Generation Task is an asynchronous API. After obtaining the id, use the Query Video Generation Task API to query the status of the task. Once the task succeeds, the response includes the video_url of the generated video.
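Because the API is asynchronous, clients typically poll until the task reaches a terminal status. The sketch below takes the fetch step as a caller-supplied function (a hypothetical wrapper around the Query Video Generation Task API, whose URL is not given on this page). It stops on succeeded, failed, or expired. For long-running tasks, callback_url avoids polling altogether.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "expired"}


def poll_until_done(fetch_task: Callable[[str], dict], task_id: str,
                    interval_s: float = 10.0, timeout_s: float = 900.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task reaches a terminal status, then return the last task body.

    fetch_task is a caller-supplied (hypothetical) wrapper around the Query Video
    Generation Task API that returns the parsed JSON for task_id.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')} after {timeout_s} s")
        sleep(interval_s)
```

Injecting fetch_task and sleep keeps the loop independent of any HTTP client and makes it easy to test without network access.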