Most text-to-video (T2V) diffusion models depend on pre-trained text encoders for semantic alignment, yet they often fail to maintain video quality when given concise prompts rather than carefully designed ones. The primary issue lies in the encoders' limited understanding of textual semantics. Moreover, these text encoders cannot rephrase prompts online to better align with user intentions, which limits both the scalability and usability of the models. To address these challenges, we introduce RISE-T2V, which integrates prompt rephrasing and semantic feature extraction into a single seamless step instead of two separate steps. RISE-T2V is universal and can be applied to various pre-trained LLMs and video diffusion models (VDMs), significantly enhancing their capabilities for T2V tasks. We propose an innovative module called the Rephrasing Adapter, which enables diffusion models to use the text hidden states produced during the LLM's next-token prediction as a condition for video generation. By employing the Rephrasing Adapter, the video generation model can implicitly rephrase basic prompts into more comprehensive representations that better match the user's intent. Furthermore, we leverage the powerful capabilities of LLMs to enable video generation models to accomplish a broader range of T2V tasks. Extensive experiments demonstrate that RISE-T2V is a versatile framework applicable to different video diffusion model architectures, significantly enhancing the ability of T2V models to generate high-quality videos that align with user intent.
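
To make the conditioning idea concrete, the following is a minimal toy sketch, not the RISE-T2V implementation. It assumes that the hidden state produced at each next-token prediction step is collected and then mapped by a small adapter into the conditioning space of a video diffusion model. Every name here (`toy_llm_step`, `collect_hidden_states`, `RephrasingAdapterSketch`), every dimension, and the single linear projection are hypothetical stand-ins chosen for illustration; the actual adapter architecture is not described in this text.

```python
import math
import random

HIDDEN_DIM = 8   # hypothetical LLM hidden size
COND_DIM = 4     # hypothetical conditioning size expected by the video diffusion model
VOCAB_SIZE = 16  # hypothetical vocabulary size


def toy_llm_step(token_ids):
    """Stand-in for one LLM forward pass: returns (hidden state, next token id)."""
    # A real LLM would compute these from its weights; here they are
    # deterministic pseudo-random values derived from the token context.
    rng = random.Random(sum((i + 1) * t for i, t in enumerate(token_ids)))
    hidden = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
    next_token = rng.randrange(VOCAB_SIZE)
    return hidden, next_token


def collect_hidden_states(prompt_ids, num_new_tokens):
    """Run next-token prediction and keep the hidden state from every decoding step."""
    tokens = list(prompt_ids)
    states = []
    for _ in range(num_new_tokens):
        hidden, next_token = toy_llm_step(tokens)
        states.append(hidden)
        tokens.append(next_token)
    return states


class RephrasingAdapterSketch:
    """Hypothetical adapter: one linear projection from hidden states to the conditioning space."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, states):
        # Project each hidden state independently: y = W h + b.
        return [[sum(w * h for w, h in zip(row, state)) + b
                 for row, b in zip(self.weight, self.bias)]
                for state in states]


prompt_ids = [3, 7, 1]  # hypothetical token ids for a short prompt
states = collect_hidden_states(prompt_ids, num_new_tokens=5)
adapter = RephrasingAdapterSketch(HIDDEN_DIM, COND_DIM)
condition = adapter(states)
print(len(condition), len(condition[0]))  # prints "5 4"
```

In this sketch the diffusion model would receive `condition` in place of the usual text-encoder embeddings. Because the hidden states come from the decoding steps themselves, rephrasing and feature extraction happen in one pass and no rephrased text needs to be decoded and re-encoded.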

(LLaMA2 + AnimateDiff)
Prompt: "a young athlete training in swimming"
Prompt: "black vintage car in museum."
Prompt: "drone footage of a house on top of the mountain"
Prompt: "Handheld push-in of middle aged man smiling overlooking the beach"
Prompt: "person holding a flower"
Prompt: "The video displays a happy fuzzy panda playing guitar nearby a campfire."
Prompt: "The video shows a monkey sitting in the stone and scratching his head."
(LLaMA2 + AnimateDiff 2D Anime Style)
Prompt: "a meerkat looking around"
Prompt: "light house on the ocean"
(LLaMA2 + AnimateDiff 3D Cartoon Style)
Prompt: "a bear wearing red jersey"
Prompt: "a black dog wearing halloween costume"
Prompt: "a pig wallowing in mud"
Prompt: "man playing a video game"
(ChatGLM3 + AnimateDiff)
Prompt: "a pig wallowing in mud"
Prompt: "close up shot of a burning plant"
Prompt: "firecrackers in the sky"
Prompt: "trimming excess leaves on a potted plant"
Prompt: "view of a jack o lantern with pumpkins in a smoky garden"
(LLaMA2 + CogVideoX)
(LLaMA2 + AnimateDiff)
Prompt: "A dog on the shore changes from idling to trotting rapidly along the pebbles."
Prompt: "A teddy bear changes from holding its arms down to raising them in a friendly greeting."
Prompt: "An astronaut on the Moon transitions from standing still to waving vibrantly above their head."
Prompt: "The ice cream cone transformed from solid and frozen to melting slowly."
(LLaMA2 + AnimateDiff)
(a) Vue aérienne capturée par un drone des vagues s’écrasant contre les falaises escarpées de la plage de Point Sur à Big Sur. (Drone aerial view capturing the waves crashing against the rugged cliffs of Point Sur Beach in Big Sur.)
(b) Un cygne gracieux nage dans un étang clair. (A graceful swan swims in a clear pond.)
(c) Dans la vidéo, un groupe de poissons-clowns aux couleurs vives orange et blanc navigue à travers le récif corallien. (In the video, a group of brightly colored orange and white clownfish swims through the coral reef.)
(a) 岩の多い海岸にまっすぐ立つ壮大な白い灯台を捉えた映像。周りには果てしなく青い海が広がり、時折波が岸に打ち寄せている。日没時の穏やかなひとときが、シーン全体を夢のように演出している。 (A majestic white lighthouse stands straight on the rugged coastline, surrounded by endless blue sea. Occasionally, waves gently hit the shore, creating a soothing atmosphere. As the sun sets, the scene takes on a dreamlike quality, as if the lighthouse is beckoning us to a world of peace and tranquility.)
(b) ビデオには、青い海の中を泳ぐカメが映っている。 (The video shows a turtle swimming in the blue sea.)
(c) ビデオには、歳月を経て使い古された船体を持つ小さな木製の帆船が映っている。それは澄み切った夏の空からの日差しを浴びながら、青緑色を帯びた穏やかな海を静かに漂っている。 (The video shows a small wooden sailboat with a weathered hull, adrift in a serene and peaceful sea of blue-green water, basking in the warm sunlight of a clear summer day.)
(a) На видео запечатлен захватывающий дух пейзаж с водопадом. Кристально чистая вода падает с высоты 30 футов в глубокий бассейн, окруженный скалами, поднимая туман из брызг, который, словно тонкая вуаль, окутывает весь пейзаж в солнечном свете. (The video captures a breathtaking waterfall landscape. Crystal clear water tumbles down from a height of 30 feet, plunging into a deep pool surrounded by rocks, raising a mist of spray that spreads like a thin veil, enveloping the entire landscape in sunlight.)
(b) На видео запечатлена великолепная панорама живописного пейзажа Норвегии в золотой час. Солнце бросает теплый золотистый отблеск на крутые горы, сверкающие озера и пышную растительность, создавая атмосферу спокойствия и красоты. (The video captures the magnificent panorama of Norway's picturesque landscape at the golden hour. The sun casts a warm golden glow on the steep mountains, sparkling lakes, and lush vegetation, creating an atmosphere of tranquility and beauty.)
(c) На видео запечатлено ослепительное шоу фейерверков. Разноцветные фейерверки вспыхивают в ночном небе, яркие красные, зеленые и синие огни освещают темноту. Взрывы фейерверков следуют один за другим, создавая завораживающее зрелище света и звука, наполняющее воздух. (The video captures a dazzling firework show. Colorful fireworks erupt in the night sky, as bright red, green, and blue lights illuminate the darkness. The explosions follow one after another, creating a mesmerizing spectacle of light and sound that fills the air.)