AI Video Enters the Sound Era! Google Unveils Video-to-Audio Technology, and the Results Wow Netizens!
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Runway had barely released Gen-3 Alpha when Google followed with a bombshell of its own.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-axegupay5k/e4ee7b7230a144e293a3d2d7a2d41a69~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=pUeYKpJ7URQ3Ip3r1uItGKCZuLw%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In the early hours of June 18, Beijing time, Google DeepMind shared progress on its </span><strong style="color: blue;"><span style="color: black;">video-to-audio (V2A) technology</span></strong><span style="color: black;">, which can give videos </span><strong style="color: blue;"><span style="color: black;">dramatic background scores</span></strong><span style="color: black;">, </span><strong style="color: blue;"><span style="color: black;">realistic sound effects</span></strong><span style="color: black;">, and even </span><strong style="color: blue;"><span style="color: black;">dialogue between characters</span></strong><span style="color: black;">.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">V2A can </span><strong style="color: blue;"><span style="color: black;">add a soundtrack to AI-generated videos</span></strong><span style="color: black;">, and Google made a point of noting that the demo videos on its website were all </span><strong style="color: blue;"><span style="color: black;">created jointly</span></strong><span style="color: black;"> by V2A and </span><strong style="color: blue;"><span style="color: black;">Veo</span></strong><span style="color: black;">, the video generation model it announced in May.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Plenty of netizens said that now they can finally add sound to the meme videos they made with Luma!</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/04af98549f8c46c290241bae32a85dcd~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=ltWdjBP1e1hCpyuIr%2B5H6qZdxyU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">According to the blog post on the Google DeepMind website, V2A takes </span><strong style="color: blue;"><span style="color: black;">video pixels</span></strong><span style="color: black;"> and </span><strong style="color: blue;"><span style="color: black;">text prompts</span></strong><span style="color: black;"> as input and generates an audio waveform synchronized with the underlying video.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, V2A encodes the video and the text prompt. A diffusion model then runs iteratively, refining random noise into realistic audio that matches both the video and the supplied text prompt. Finally, the audio is decoded and combined with the video data.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/7673f63dc7bf43c191df85a0e0ae44aa~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=EK3QQyMFcYyXR%2F5BQTfcS%2BgFFp0%3D" style="width: 50%; margin-bottom: 20px;"></div>
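<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To make the encode, refine, decode, and combine steps more concrete, here is a toy sketch of the same control flow. It is not Google's implementation: the encoders, the <code>denoise_step</code> stand-in (a real system would use a trained neural denoiser), and every function name here are hypothetical and were chosen only for illustration.</span></p>

```python
import random

def encode_video(frames):
    """Toy video encoder (assumption): one feature per frame, the mean pixel value."""
    return [sum(f) / len(f) for f in frames]

def encode_text(prompt, dim):
    """Toy text encoder (assumption): fold character codes into a vector in [0, 1)."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch)
    return [(v % 256) / 256 for v in vec]

def denoise_step(latent, cond, strength):
    """Stand-in for a learned denoiser: nudge the latent toward the conditioning."""
    return [x + strength * (c - x) for x, c in zip(latent, cond)]

def generate_audio(frames, prompt="", steps=50, seed=0):
    rng = random.Random(seed)
    video_feat = encode_video(frames)
    dim = len(video_feat)
    if prompt:
        # Combine video and text conditioning (simple average, for illustration only).
        text_feat = encode_text(prompt, dim)
        cond = [(v + t) / 2 for v, t in zip(video_feat, text_feat)]
    else:
        # The text prompt is optional; fall back to video-only conditioning.
        cond = video_feat
    # Start from random noise and refine it iteratively.
    latent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        latent = denoise_step(latent, cond, strength=0.2)
    # "Decode" the latent into waveform samples clamped to [-1, 1].
    waveform = [max(-1.0, min(1.0, 2 * x - 1)) for x in latent]
    # "Combine" with the video: pair each frame with its audio sample.
    return list(zip(frames, waveform))

if __name__ == "__main__":
    clip = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # three tiny two-pixel "frames"
    for frame, sample in generate_audio(clip, prompt="footsteps on concrete"):
        print(frame, round(sample, 3))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The point of the sketch is the shape of the loop: the output starts as pure noise, and the conditioning computed from the video (and, optionally, the text) steers each refinement step.</span></p>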
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Users on X were quick to call it amazing, with one tiny little catch: just like Gen-3 Alpha, the Runway video generation model that was also released in the early hours, this is yet another awesome model that nobody can actually use. When will it be open-sourced so we can try it out?</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/fe7fe808af4246a4976727cf9c8220c7~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=n%2F4XTLoKPFWVlp4mYUaUDe5HV3k%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/eceead1cbc654e0aa2d4d62bacb494da~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=2y%2F52pvAMUBR%2BgErW5AjygFWFdU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">No rush, though. Let's get a taste of the official demos first!</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Google argues that AI video generation models are advancing rapidly, yet most models on the market, whether Sora, Luma, or the just-released Gen-3 Alpha, can only generate silent video.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The V2A technology Google has developed can bring AI video into the </span><strong style="color: blue;"><span style="color: black;">“sound era”</span></strong><span style="color: black;">, further advancing AI's audiovisual capabilities.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Judging from the demo videos, the results really are smooth. No wonder Google is talking big!</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Take this one: as a person walks from the foreground into the background, you can hear unsettling background music and crunching footsteps.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/a5d758a14d51458a8b3aba195fce6357~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=Ea2DqY2A9sNpfKSIahixNmdzAwU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: Cinematic, thriller, horror film, music, tension, ambience, footsteps on concrete]</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Other soundtracks in the same vein include a baby dinosaur hatching, a drum performance, traffic noise, and more.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/a180e045cfee47059302e08c4a841e11~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=OZ1RPgxb3X1AKy%2FmDZMoYTNaI4M%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: Cute baby dinosaur chirps, jungle ambience, egg cracking]</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/a5639824082249a68a04db9a1a11e3f2~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=zn5NHrEza8QZsxx5KP6u%2Bj9iXVQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: A drummer on a stage at a concert surrounded by flashing lights and a cheering crowd]</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/3cfd6e7c5ce1469cbfec69be8cf18f7f~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=sVGq53eYTxCmMFDZxk0S5MAZXAo%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: cars skidding, car engine throttling, angelic electronic music]</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Google also stresses that V2A stands out in the fiercely competitive AI video space because it can <strong style="color: blue;">understand raw pixels</strong>. As a result, <strong style="color: blue;">even without a text prompt</strong>, as long as the user supplies a video, the technology <strong style="color: blue;">can still add a soundtrack to it</strong>.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, the guitar and bicycle sounds below were synthesized without any prompt at all.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/141c8594ab914908be13f9fe0aa69cda~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=4Z%2FlWUnhbkPAUbcz6m0925Oe97U%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/bcbe7b79f2b14104a70af59b2ac2628a~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=xBoCjwo1RoUFHFPo3vIw5ViE%2Bm8%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">V2A can also <strong style="color: blue;">generate dialogue for characters</strong>, such as the line spoken by the character in the video below: “This turkey looks amazing, I'm so hungry.”</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/a1d003bbaa9740199522d0ee895455f6~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=Nrn8Ns%2B1SBqvWOAp31neysYHkr0%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Judging from the video, however, the character's lip movements do not fully match the line, because <strong style="color: blue;">the video model does not generate mouth movements that match the transcript</strong>. Google acknowledges that this part is still under research and being improved.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In addition, V2A can generate an unlimited number of soundtracks for any video input. Users can optionally define a </span><strong style="color: blue;"><span style="color: black;">“positive prompt”</span></strong><span style="color: black;"> to steer the output toward the sounds they want, or a </span><strong style="color: blue;"><span style="color: black;">“negative prompt”</span></strong><span style="color: black;"> to steer it away from sounds they do not want.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">This flexibility gives users more control over V2A's audio output, so they can quickly try out different results and pick the best match.</span></span></p>
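<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Google has not said how V2A implements positive and negative prompts. One common approach in open diffusion pipelines is classifier-free guidance, where the denoiser's prediction under the negative prompt serves as a baseline and the result is pushed along the direction toward the positive-prompt prediction. The sketch below assumes that scheme; <code>guided_prediction</code> and the <code>scale</code> parameter are hypothetical names, and the two-element lists stand in for real model outputs.</span></p>

```python
def guided_prediction(pos_pred, neg_pred, scale=3.0):
    """Classifier-free-guidance-style combination (an assumption, not V2A's
    documented method): start at the negative-prompt prediction and move
    `scale` times the difference toward the positive-prompt prediction."""
    return [n + scale * (p - n) for p, n in zip(pos_pred, neg_pred)]

if __name__ == "__main__":
    pos = [1.0, 0.0]  # hypothetical prediction conditioned on the positive prompt
    neg = [0.0, 0.0]  # hypothetical prediction conditioned on the negative prompt
    print(guided_prediction(pos, neg, scale=2.0))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With <code>scale=1.0</code> the function simply returns the positive-prompt prediction. Larger values push the output further from whatever the negative prompt describes.</span></p>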
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The three videos below are demos released by Google. Our guess is that Google wants to show how different text prompts can selectively change certain elements of the soundtrack, though the difference does not seem very noticeable.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/5f6fe267f97c4949ada7fb2fbef8f410~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=E90evdnmQfQhZvOgkhIpNrhuae4%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: A spaceship hurtles through the vastness of space, stars streaking past it, high speed, Sci-fi]</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/4bd116bc0c5f4e06ba6f59bfe2d91026~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=sWDMX8PBNFt%2BCADF7xxrs2M%2BNSI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: Ethereal cello atmosphere]</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-6w9my0ksvp/e9ec9eaa989c44e69603d42147cc7aa9~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1722928817&x-signature=h1I8AsdYcOmWKyr1W%2FIvsrPA4gk%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">[Prompt for audio: A spaceship hurtles through the vastness of space, stars streaking past it, high speed, Sci-fi]</span></span></p>
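<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The technology has not been open-sourced yet, but judging from the existing demos, it is bound to make big waves in the AI video world once it is.</span></p>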
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">By then we will probably see an AI free-for-all: Runway's Gen-3 Alpha has barely finished generating a video before V2A next door has scored it, and users who have not yet had their fill of turning memes into videos already cannot wait to add sound to them.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Just how cutthroat is the AI video space going to get?!</span></p>