Kling TTS Text to Speech
About
Kling TTS API. This endpoint generates speech audio from text.
1. Calling the API#
Install the client#
The client provides a convenient way to interact with the model API.
npm install --save @fal-ai/client
Migrate to @fal-ai/client
The @fal-ai/serverless-client package has been deprecated in favor of @fal-ai/client. Please check the migration guide for more information.
Set up your API Key#
Set FAL_KEY as an environment variable in your runtime.
export FAL_KEY="YOUR_API_KEY"
Submit a request#
The client handles the submit protocol for you: it tracks request status updates and returns the result once the request is completed.
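import { fal } from "@fal-ai/client";
// Submit the request and wait for the result, printing logs while it runs.
const result = await fal.subscribe("fal-ai/kling-video/v1/tts", {
  input: {
    text: "Hello world! Kling TTS is available on FAL!"
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      // Print only the message text; passing console.log directly would also print the index and array.
      update.logs.map((log) => log.message).forEach((message) => console.log(message));
    }
  },
});
console.log(result.data);
console.log(result.requestId);
2. Authentication#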
The API uses an API Key for authentication. It is recommended you set the FAL_KEY environment variable in your runtime when possible.
API Key#
import { fal } from "@fal-ai/client";
fal.config({
  credentials: "YOUR_FAL_KEY"
});
Protect your API Key
When running code on the client side (e.g. in a browser, mobile app, or GUI application), make sure not to expose your FAL_KEY. Instead, use a server-side proxy to make requests to the API. For more information, check out our server-side integration guide.
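As a rough illustration of keeping the key on the server, the sketch below reads FAL_KEY from the environment and fails fast when it is missing. The helper name `resolveFalKey` is hypothetical and not part of the client; only `fal.config({ credentials })` comes from the documentation above.

```javascript
// Hypothetical helper (not part of @fal-ai/client): read the API key from
// an environment-like object and refuse to continue if it is absent.
function resolveFalKey(env = process.env) {
  const key = env.FAL_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("FAL_KEY is not set; export it before starting the server");
  }
  return key.trim();
}

// Intended use on the server, assuming `fal` was imported from "@fal-ai/client":
//   fal.config({ credentials: resolveFalKey() });
```

Failing at startup rather than on the first request makes a missing key easier to spot in deployment logs.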
3. Queue#
Submit a request#
The client API provides a convenient way to submit requests to the model.
import { fal } from "@fal-ai/client";
const { request_id } = await fal.queue.submit("fal-ai/kling-video/v1/tts", {
  input: {
    text: "Hello world! Kling TTS is available on FAL!"
  },
  webhookUrl: "https://optional.webhook.url/for/results",
});
Fetch request status#
You can fetch the status of a request to check if it is completed or still in progress.
import { fal } from "@fal-ai/client";
const status = await fal.queue.status("fal-ai/kling-video/v1/tts", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b",
  logs: true,
});
Get the result#
Once the request is completed, you can fetch the result. See the Output Schema for the expected result format.
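If no webhook is configured, one way to wait for completion is to poll the status and fetch the result afterwards. The following is a minimal sketch, not the client's own mechanism: `waitForResult`, `intervalMs`, and `maxAttempts` are hypothetical names, the client is passed in as a parameter, and the code assumes a finished request reports a `status` of "COMPLETED".

```javascript
// Hypothetical helper: poll the queue until the request completes, then fetch the result.
// `client` is expected to expose queue.status and queue.result like the `fal` object.
async function waitForResult(client, endpoint, requestId, options = {}) {
  const { intervalMs = 1000, maxAttempts = 60 } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await client.queue.status(endpoint, { requestId, logs: false });
    // Assumption: a finished request reports status "COMPLETED".
    if (status.status === "COMPLETED") {
      return client.queue.result(endpoint, { requestId });
    }
    // Wait before checking again so the queue is not polled in a tight loop.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Request ${requestId} did not complete after ${maxAttempts} checks`);
}

// Intended use, assuming `fal` was imported from "@fal-ai/client":
//   const result = await waitForResult(fal, "fal-ai/kling-video/v1/tts", request_id);
```

Bounding the number of checks keeps a stuck request from blocking the caller indefinitely; the limits shown are arbitrary and should be tuned to the workload.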
import { fal } from "@fal-ai/client";
const result = await fal.queue.result("fal-ai/kling-video/v1/tts", {
  requestId: "764cabcf-b745-4b3e-ae38-1200304cf45b"
});
console.log(result.data);
console.log(result.requestId);
4. Files#
Some attributes in the API accept file URLs as input. Whenever that's the case you can pass your own URL or a Base64 data URI.
Data URI (base64)#
You can pass a Base64 data URI as a file input. The API will handle the file decoding for you. Keep in mind that, although convenient, this option can slow down requests when the files are large.
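For reference, a data URI has the form `data:<mime type>;base64,<payload>`. The sketch below builds one in Node.js using the built-in `Buffer`; the helper name `toDataUri` is hypothetical.

```javascript
// Hypothetical helper: encode raw bytes as a Base64 data URI.
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

const uri = toDataUri(Buffer.from("Hello, World!"), "text/plain");
console.log(uri); // data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==
```

Base64 inflates the payload by roughly a third, which is one reason large files are better uploaded or hosted than inlined.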
Hosted files (URL)#
You can also pass your own URLs as long as they are publicly accessible. Be aware that some hosts might block cross-site requests, rate-limit them, or treat them as bot traffic.
Uploading files#
We provide file storage for your inputs: upload a file using the client API, then use the returned URL in your requests.
import { fal } from "@fal-ai/client";
const file = new File(["Hello, World!"], "hello.txt", { type: "text/plain" });
const url = await fal.storage.upload(file);
Auto uploads
The client will auto-upload the file for you if you pass a binary object (e.g. File, Data).
Read more about file handling in our file upload guide.
5. Schema#
Input#
text string* required
The text to be converted to speech
voice_id VoiceIdEnum
The voice ID to use for speech synthesis. Default value: "genshin_vindi2"
Possible enum values: genshin_vindi2, zhinen_xuesheng, AOT, ai_shatang, genshin_klee2, genshin_kirara, ai_kaiya, oversea_male1, ai_chenjiahao_712, girlfriend_4_speech02, chat1_female_new-3, chat_0407_5-1, cartoon-boy-07, uk_boy1, cartoon-girl-01, PeppaPig_platform, ai_huangzhong_712, ai_huangyaoshi_712, ai_laoguowang_712, chengshu_jiejie, you_pingjing, calm_story1, uk_man2, laopopo_speech02, heainainai_speech02, reader_en_m-v1, commercial_lady_en_f-v1, tiyuxi_xuedi, tiexin_nanyou, girlfriend_1_speech02, girlfriend_2_speech02, zhuxi_speech02, uk_oldman3, dongbeilaotie_speech02, chongqingxiaohuo_speech02, chuanmeizi_speech02, chaoshandashu_speech02, ai_taiwan_man2_speech02, xianzhanggui_speech02, tianjinjiejie_speech02, diyinnansang_DB_CN_M_04-v2, yizhipiannan-v1, guanxiaofang-v2, tianmeixuemei-v1, daopianyansang-v1, mengwa-v1
voice_speed float
Rate of speech. Default value: 1
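The input fields above can be assembled with a small helper that applies the documented defaults and rejects a missing `text`. This is a sketch only: `buildTtsInput` and its camelCase option names are hypothetical, and no range check is applied to `voice_speed` because the schema does not state one.

```javascript
// Documented default voice from the input schema.
const DEFAULT_VOICE_ID = "genshin_vindi2";

// Hypothetical helper: build the input object for the TTS endpoint.
function buildTtsInput({ text, voiceId = DEFAULT_VOICE_ID, voiceSpeed = 1 } = {}) {
  if (typeof text !== "string" || text.length === 0) {
    throw new TypeError("text is required and must be a non-empty string");
  }
  if (typeof voiceSpeed !== "number" || !Number.isFinite(voiceSpeed)) {
    throw new TypeError("voiceSpeed must be a finite number");
  }
  // Field names follow the schema: text, voice_id, voice_speed.
  return { text, voice_id: voiceId, voice_speed: voiceSpeed };
}

const input = buildTtsInput({ text: "Hello world! Kling TTS is available on FAL!" });
console.log(JSON.stringify(input, null, 2));
```

With the defaults applied, the helper yields the same shape as the example request body that follows.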
{
  "text": "Hello world! Kling TTS is available on FAL!",
  "voice_id": "genshin_vindi2",
  "voice_speed": 1
}
Output#
The generated audio
{
  "audio": {
    "url": "https://v3.fal.media/files/monkey/O-ekVTtYqeDblD1oSf2uv_output.mp3"
  }
}
Other types#
ImageToVideoV2MasterRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
TextToVideoV21MasterRequest#
prompt string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
File#
url string* required
The URL where the file can be downloaded from.
content_type string
The mime type of the file.
file_name string
The name of the file. It will be auto-generated if not provided.
file_size integer
The size of the file in bytes.
file_data string
File data
ImageToVideoRequest#
prompt string* required
image_url string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
TextToVideoV25ProRequest#
prompt string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
ImageToVideoV21MasterRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
CameraControl#
movement_type MovementTypeEnum* required
The type of camera movement
Possible enum values: horizontal, vertical, pan, tilt, roll, zoom
movement_value integer* required
The value of the camera movement
ProImageToVideoRequest#
prompt string* required
image_url string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
tail_image_url string
URL of the image to be used for the end of the video
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
V1TextToVideoRequest#
prompt string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
camera_control CameraControlEnum
Camera control parameters
Possible enum values: down_back, forward_up, right_turn_forward, left_turn_forward
Advanced Camera control parameters
LipsyncA2VRequest#
video_url string* required
The URL of the video to generate the lip sync for. Supports .mp4/.mov, ≤100MB, 2–10s, 720p/1080p only, width/height 720–1920px.
audio_url string* required
The URL of the audio to generate the lip sync for. Minimum duration is 2s and maximum duration is 60s. Maximum file size is 5MB.
ImageToVideoV21StandardRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
ImageToVideoV25StandardRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
ImageToVideoV21ProRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
tail_image_url string
URL of the image to be used for the end of the video
LipsyncT2VRequest#
video_url string* required
The URL of the video to generate the lip sync for. Supports .mp4/.mov, ≤100MB, 2–60s, 720p/1080p only, width/height 720–1920px. If validation fails, an error is returned.
text string* required
Text content for lip-sync video generation. Max 120 characters.
voice_id VoiceIdEnum* required
Voice ID to use for speech synthesis
Possible enum values: genshin_vindi2, zhinen_xuesheng, AOT, ai_shatang, genshin_klee2, genshin_kirara, ai_kaiya, oversea_male1, ai_chenjiahao_712, girlfriend_4_speech02, chat1_female_new-3, chat_0407_5-1, cartoon-boy-07, uk_boy1, cartoon-girl-01, PeppaPig_platform, ai_huangzhong_712, ai_huangyaoshi_712, ai_laoguowang_712, chengshu_jiejie, you_pingjing, calm_story1, uk_man2, laopopo_speech02, heainainai_speech02, reader_en_m-v1, commercial_lady_en_f-v1, tiyuxi_xuedi, tiexin_nanyou, girlfriend_1_speech02, girlfriend_2_speech02, zhuxi_speech02, uk_oldman3, dongbeilaotie_speech02, chongqingxiaohuo_speech02, chuanmeizi_speech02, chaoshandashu_speech02, ai_taiwan_man2_speech02, xianzhanggui_speech02, tianjinjiejie_speech02, diyinnansang_DB_CN_M_04-v2, yizhipiannan-v1, guanxiaofang-v2, tianmeixuemei-v1, daopianyansang-v1, mengwa-v1
voice_language VoiceLanguageEnum
The voice language corresponding to the Voice ID. Default value: "en"
Possible enum values: zh, en
voice_speed float
Speech rate for Text to Video generation. Default value: 1
DynamicMask#
mask_url string* required
URL of the image for Dynamic Brush Application Area (Mask image created by users using the motion brush)
List of trajectories
ImageToVideoV25ProRequest#
prompt string* required
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
VideoEffectsRequest#
URL of images to be used for hug, kiss or heart_gesture video.
effect_scene EffectSceneEnum* required
The effect scene to use for the video generation
Possible enum values: hug, kiss, heart_gesture, squish, expansion, fuzzyfuzzy, bloombloom, dizzydizzy, jelly_press, jelly_slice, jelly_squish, jelly_jiggle, pixelpixel, yearbook, instant_film, anime_figure, rocketrocket
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
V1ImageToVideoRequest#
prompt string* required
The prompt for the video
image_url string* required
URL of the image to be used for the video
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
tail_image_url string
URL of the image to be used for the end of the video
static_mask_url string
URL of the image for Static Brush Application Area (Mask image created by users using the motion brush)
List of dynamic masks
MultiImageToVideoRequest#
prompt string* required
List of image URLs to use for video generation. Supports up to 4 images.
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
TextToVideoRequest#
prompt string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
TextToVideoV2MasterRequest#
prompt string* required
duration DurationEnum
The duration of the generated video in seconds. Default value: "5"
Possible enum values: 5, 10
aspect_ratio AspectRatioEnum
The aspect ratio of the generated video frame. Default value: "16:9"
Possible enum values: 16:9, 9:16, 1:1
negative_prompt string
Default value: "blur, distort, and low quality"
cfg_scale float
The CFG (Classifier Free Guidance) scale is a measure of how close you want the model to stick to your prompt. Default value: 0.5
Trajectory#
x integer* required
X coordinate of the motion trajectory
y integer* required
Y coordinate of the motion trajectory