Generating natural-sounding synthetic speech in Python with Microsoft's Edge interface
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Python has several TTS packages. The most complete is pyttsx3, which can generate speech offline, but its synthesized voice is only mediocre. Microsoft's speech service is widely regarded as the closest to natural human pronunciation and can almost pass for a real speaker, but the Microsoft speech SDK can only be called directly on Windows. Microsoft also provides a cloud-synthesis version for its own Edge browser. It can stutter on long passages, but it has plenty of advantages. Besides the natural voice, it works on any platform (I tested it on Ubuntu and Mac OSX), and it needs no API key: you can call it directly. It is a very good fit.</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Installing and using edge-tts</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The Python package edge-tts wraps this interface.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/27a0da3914ed44afa64d48114b2a8fc0~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1728055365&x-signature=RwANvcbN%2FKQZAgfU1aeN2LnZ5ig%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">It relies on a couple of supporting packages for audio playback: playsound and mpv.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pip3 install playsound</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pip3 install mpv</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pip3 install edge-tts</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Command-line test</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Generate the MP3 for a piece of text:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">edge-tts --text "Hello, world!" --write-media hello.mp3</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Play the generated speech directly (in practice it generates the MP3 and then plays it):</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">edge-playback --text "Hello, world!"</p>
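<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If several sentences need their own MP3 files, the same command can be assembled from Python. This is only a sketch: it uses the --text and --write-media flags shown above, while the helper name build_commands and the numbered file-name pattern are made up for illustration. It builds the argument lists without running them.</p>

```python
import shlex


def build_commands(sentences, prefix="line"):
    """Return one edge-tts argv list per sentence.

    The output names (line_01.mp3, line_02.mp3, ...) are an illustrative
    convention, not something edge-tts requires.
    """
    commands = []
    for index, sentence in enumerate(sentences, start=1):
        out_path = f"{prefix}_{index:02d}.mp3"
        commands.append(["edge-tts", "--text", sentence, "--write-media", out_path])
    return commands


if __name__ == "__main__":
    for argv in build_commands(["Hello, world!", "Nice to meet you."]):
        # shlex.join shows the command as it would be typed in a shell
        print(shlex.join(argv))
        # To actually synthesize (needs edge-tts installed and network access):
        # import subprocess; subprocess.run(argv, check=True)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Passing the arguments as a list avoids shell-quoting problems when a sentence contains quotes or other special characters.</p>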
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Using edge-tts in code</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The package provides a simple speech-generation demo. Note the async keyword: speech is generated asynchronously. The core call is edge_tts.Communicate().run().</p>
<pre>
#!/usr/bin/env python3
"""
Example Python script that shows how to use edge-tts as a module
"""
import asyncio
import tempfile

from playsound import playsound

import edge_tts


async def main():
    """
    Main function
    """
    communicate = edge_tts.Communicate()
    ask = input("What do you want TTS to say? ")
    with tempfile.NamedTemporaryFile() as temporary_file:
        # Each item is indexable; i[2] carries an audio chunk when present
        async for i in communicate.run(ask):
            if i[2] is not None:
                temporary_file.write(i[2])
        # Flush buffered audio to disk before the player opens the file
        temporary_file.flush()
        playsound(temporary_file.name)


if __name__ == "__main__":
    asyncio.run(main())
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For a more involved example, see the author's package for turning SRT subtitle files into MP3: edge-srt-to-speech.</p>
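<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The chunk-handling loop from the demo can be tried without network access or edge-tts installed. In the sketch below, fake_stream is a hypothetical stand-in for communicate.run(text): it yields tuples whose third element is either bytes or None, mirroring how the demo reads i[2]. The names fake_stream, collect_audio and demo are all invented for illustration.</p>

```python
import asyncio
import tempfile


async def fake_stream(text):
    """Hypothetical stand-in for communicate.run(text).

    Yields 3-tuples whose third element is an audio chunk (bytes) or None.
    """
    yield (None, None, None)  # an event that carries no audio
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield (None, None, word.encode("utf-8"))


async def collect_audio(stream, out_file):
    """Write every non-None chunk to out_file; return the bytes written."""
    written = 0
    async for item in stream:
        chunk = item[2]
        if chunk is not None:
            out_file.write(chunk)
            written += len(chunk)
    # Flush so that a player opening the file by name sees all the data
    out_file.flush()
    return written


async def demo():
    with tempfile.NamedTemporaryFile(suffix=".mp3") as tmp:
        total = await collect_audio(fake_stream("hello from edge"), tmp)
        tmp.seek(0)
        return total, tmp.read()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The explicit flush matters because the player opens the temporary file by name, so anything still sitting in the write buffer would otherwise be missing from playback.</p>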
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">edge-srt-to-speech makes a more involved call. Every setting, such as language, voice and volume, is switched through the arguments of edge_tts.Communicate().run().</p>
<pre>
async for j in communicate.run(
    text,
    codec="audio-24khz-48kbitrate-mono-mp3",
    pitch=arg["pitch"],
    rate=arg["rate"],
    volume=arg["volume"],
    voice=arg["voice"],
    boundary_type=1,
    customspeak=bool(ssml_template),
):
</pre>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">A rewritten example</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">I needed an example that takes a list of sentences, generates speech for each one in turn, and plays it. Because the package generates speech through an asynchronous interface, the calls have to be wrapped so that they behave synchronously.</p>
<pre>
import asyncio
import tempfile

from playsound import playsound

import edge_tts

# Dialogue list
dialogues = [
    "Hello, I'm Sam!",
    "Hello, I'm Daming!",
]
someone = edge_tts.Communicate()


async def say(text):
    print(text)
    with tempfile.NamedTemporaryFile() as temporary_file:
        print(temporary_file.name)
        async for i in someone.run(text):
            if i[2] is not None:
                temporary_file.write(i[2])
        # Flush buffered audio to disk before the player opens the file
        temporary_file.flush()
        playsound(temporary_file.name)


# --------- Async main function ---------------
async def main():
    """
    Main function
    """
    print("Start talking")
    for text in dialogues:
        # Create an async task
        tasks = [asyncio.create_task(say(text))]
        # Wait for the task to finish
        await asyncio.wait(tasks)


if __name__ == "__main__":
    asyncio.run(main())
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The key is the following two statements. In effect, the main coroutine waits for the speech task to finish before moving on to the next sentence.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"># Create an async task</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">tasks = [asyncio.create_task(say(text))]</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"># Wait for the task to finish</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">await asyncio.wait(tasks)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With this in place, changing the contents of the dialogues list is all it takes to hear a different conversation.</p>
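<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The wait-before-continuing pattern can be checked in isolation. The sketch below replaces synthesis and playback with a hypothetical say coroutine that just sleeps in proportion to the text length and records when it finishes. It contrasts waiting on each task in turn with starting everything at once via asyncio.gather. The names speak_in_order and speak_all_at_once are invented for illustration.</p>

```python
import asyncio


async def say(text, log):
    """Hypothetical stand-in for synthesizing and playing one sentence."""
    await asyncio.sleep(0.01 * len(text))  # longer text "plays" for longer
    log.append(text)


async def speak_in_order(dialogues):
    """Wait for each sentence to finish before starting the next one."""
    log = []
    for text in dialogues:
        task = asyncio.create_task(say(text, log))
        await asyncio.wait([task])  # pause here until this sentence is done
    return log


async def speak_all_at_once(dialogues):
    """For contrast: start every sentence together."""
    log = []
    await asyncio.gather(*(say(text, log) for text in dialogues))
    return log


if __name__ == "__main__":
    lines = ["a long opening sentence", "short"]
    print(asyncio.run(speak_in_order(lines)))
    print(asyncio.run(speak_all_at_once(lines)))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With the sequential version the sentences finish in list order. With gather the shorter sentence finishes first, which for real audio would mean overlapping or out-of-order playback. For a single coroutine, a plain await say(text, log) has the same effect as creating a task and waiting on it.</p>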