Faster-Whisper: Real-Time Transcription of PC Audio to Text

Contents: Preface / Project / Environment Setup / Installing Faster-Whisper / Downloading the Model / Writing the Test Code / Running the Test Code / Real-Time Transcription Script / References

Preface

My earlier smart-dialogue app was wired up to the Baidu speech API. I wanted to replace it with something local, so I set up Faster-Whisper. Below is a screenshot of it transcribing a Bilibili video in real time.
Project
Environment Setup
CUDA and cuDNN are already installed on my machine; if you installed CUDA 12.2, cuBLAS should already be included. If not, you can download the libraries from the link below (the reference video at the end of this post also walks through it): https://github.com/Purfview/whisper-standalone-win/releases/tag/libs For the Anaconda environment, just clone the one configured earlier; the Bert-VITS2 environment works as-is.
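Before going further, it can help to confirm the toolchain is visible at all. The sketch below is only a convenience check using the standard library: it looks for `nvcc` on `PATH` and probes for cuBLAS/cuDNN shared libraries by name. The probe names are assumptions that vary by OS and install method, so "not found" here is a hint, not proof of a broken install.

```python
# Rough CUDA toolchain visibility check (stdlib only, no torch required).
# The library names probed below are assumptions; they vary by platform.
import shutil
from ctypes.util import find_library

def cuda_toolchain_report():
    """Return a dict mapping toolchain component -> location (or None)."""
    report = {"nvcc": shutil.which("nvcc")}
    for lib in ("cublas", "cudnn"):
        report[lib] = find_library(lib)
    return report

for name, location in cuda_toolchain_report().items():
    print(f"{name}: {location or 'not found'}")
```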
Installing Faster-Whisper
Installing is a single pip command:

pip install faster-whisper

Downloading the Model
https://huggingface.co/Systran/faster-whisper-large-v3 Download the model and place it next to the code.
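After downloading, you can sanity-check that the weights actually landed next to the code. The file names below are an assumption based on what the Hugging Face repo ships at the time of writing, so treat the list as illustrative rather than a contract:

```python
# Check the local model folder for the files faster-whisper expects.
# "large-v3" matches the folder name used elsewhere in this post; the
# expected file list is an assumption from the current repo layout.
from pathlib import Path

def missing_model_files(model_dir="large-v3"):
    expected = ("model.bin", "config.json", "tokenizer.json", "vocabulary.json")
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]

missing = missing_model_files()
print("Missing files:", ", ".join(missing) if missing else "none")
```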
Writing the Test Code

```python
# local_files_only=True  load the model from local files only
# model_size_or_path=path  path of the model to load
# device="cuda"  run on CUDA
# compute_type="int8_float16"  quantize to 8-bit
# language="zh"  language of the audio
# vad_filter=True  enable VAD
# vad_parameters=dict(min_silence_duration_ms=1000)  VAD parameters
from faster_whisper import WhisperModel

model_size = "large-v3"
path = r"D:\Project\Python_Project\FasterWhisper\large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size_or_path=path, device="cuda", local_files_only=True)
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe(
    "audio.wav",
    beam_size=5,
    language="zh",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=1000),
)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
Running the Test Code

Drop an audio file into the folder and run python main.py. As you can see, it recognized what the audio said, more or less correctly.
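If you have no audio file at hand, a synthetic one works for a smoke test. This sketch writes one second of a 440 Hz tone as 16-bit mono audio.wav using only the standard library; the model will not transcribe anything meaningful from a sine tone, but it exercises the whole pipeline:

```python
# Generate a 1-second 440 Hz test tone as 16-bit mono PCM (audio.wav).
import math
import struct
import wave

RATE = 16000  # 16 kHz, the rate Whisper resamples to internally anyway

with wave.open("audio.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
    wf.setframerate(RATE)
    for i in range(RATE):     # one second of audio
        sample = int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / RATE))
        wf.writeframes(struct.pack("<h", sample))
print("wrote audio.wav")
```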
Real-Time Transcription Script
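The script below records system audio in AUDIO_BUFFER-second chunks and hands each chunk to a background thread for transcription, so capture never stalls while the model runs. Stripped of PyAudio and Whisper, that core pattern looks like this (the stub functions standing in for recording and transcription are hypothetical, and the buffer is shortened for the demo):

```python
# The record-then-transcribe-in-background pattern, with stubs
# replacing PyAudio capture and Whisper inference.
import threading
import time

AUDIO_BUFFER = 0.05  # shortened stand-in for the real 5-second buffer
results = []

def record_chunk(i):
    time.sleep(AUDIO_BUFFER)      # blocking capture, like the real stream
    return f"chunk-{i}.wav"       # hypothetical temp-file name

def transcribe(name):
    results.append(name)          # stands in for model.transcribe + print

threads = []
for i in range(3):
    filename = record_chunk(i)    # the main thread only ever records
    t = threading.Thread(target=transcribe, args=(filename,))
    t.start()                     # transcription overlaps the next capture
    threads.append(t)

for t in threads:
    t.join()
print(sorted(results))  # -> ['chunk-0.wav', 'chunk-1.wav', 'chunk-2.wav']
```

Note that with a real model, each transcription must finish in under AUDIO_BUFFER seconds on average, or the background threads will pile up.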
Create a new script, transper.py, and run it. Special thanks here to the open-source project https://github.com/MyloBishop/transper

```python
import os
import sys
import time
import wave
import tempfile
import threading

import torch
import pyaudiowpatch as pyaudio
from faster_whisper import WhisperModel as whisper

# A bigger audio buffer gives better accuracy
# but also increases latency in response.
AUDIO_BUFFER = 5


# Records audio with PyAudio and saves it to a temporary WAV file.
# A stream is opened on the PyAudio instance with a callback that writes
# incoming audio data to the WAV file in real time.
# time.sleep(AUDIO_BUFFER) blocks execution to capture enough audio.
# The function returns the name of the saved WAV file.
def record_audio(p, device):
    """Record audio from output device and save to temporary WAV file."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name

    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(device["maxInputChannels"])
    wave_file.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(int(device["defaultSampleRate"]))

    def callback(in_data, frame_count, time_info, status):
        """Write frames and return PA flag."""
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = p.open(
        format=pyaudio.paInt16,
        channels=device["maxInputChannels"],
        rate=int(device["defaultSampleRate"]),
        frames_per_buffer=pyaudio.get_sample_size(pyaudio.paInt16),
        input=True,
        input_device_index=device["index"],
        stream_callback=callback,
    )

    try:
        time.sleep(AUDIO_BUFFER)  # Blocking execution while playing
    finally:
        stream.stop_stream()
        stream.close()
        wave_file.close()
        # print(f"{filename} saved.")
    return filename


# Transcribes a recorded chunk with the Whisper model and prints the result.
def whisper_audio(filename, model):
    """Transcribe audio buffer and display."""
    # segments, info = model.transcribe(filename, beam_size=5, task="translate", language="zh", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=1000))
    segments, info = model.transcribe(filename, beam_size=5, language="zh", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=1000))
    os.remove(filename)
    # print(f"{filename} removed.")
    for segment in segments:
        # print(f"[{segment.start:.2f} - {segment.end:.2f}] {segment.text.strip()}")
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))


# main() is the controller for the whole script:
# load the Whisper model on the appropriate device (GPU or CPU),
# look up the default WASAPI output device (the default speakers),
# then record with PyAudio and run whisper_audio on a thread per chunk.
def main():
    """Load model, record audio and transcribe from default output device."""
    print("Loading model...")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device.")
    # model = whisper("large-v3", device=device, compute_type="float16")
    model = whisper("large-v3", device=device, local_files_only=True)
    print("Model loaded.")

    with pyaudio.PyAudio() as pya:
        # Create PyAudio instance via context manager.
        try:
            # Get default WASAPI info
            wasapi_info = pya.get_host_api_info_by_type(pyaudio.paWASAPI)
        except OSError:
            print("Looks like WASAPI is not available on the system. Exiting...")
            sys.exit()

        # Get default WASAPI speakers
        default_speakers = pya.get_device_info_by_index(wasapi_info["defaultOutputDevice"])

        if not default_speakers["isLoopbackDevice"]:
            for loopback in pya.get_loopback_device_info_generator():
                # Try to find loopback device with same name (and "[Loopback]" suffix).
                # Unfortunately, this is the most adequate way at the moment.
                if default_speakers["name"] in loopback["name"]:
                    default_speakers = loopback
                    break
            else:
                print("Default loopback output device not found.\nRun `python -m pyaudiowpatch` to check available devices.\nExiting...")
                sys.exit()

        print(f"Recording from: {default_speakers['name']} ({default_speakers['index']})\n")

        while True:
            filename = record_audio(pya, default_speakers)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()


main()
```

References
faster-whisper / MyloBishop/transper / "Real-time speech recognition based on faster_whisper" / "A real-time speech recognition project built on faster-whisper: speech-to-text in Python"