This project consists of two main components:
- A script (audio_transcriber.py) to transcribe audio files into text by splitting them into manageable chunks.
- A script (clean_transcript.py) to clean the transcribed text by capitalizing sentences and performing basic cleanup.

- Python 3.x
- pydub library
- speech_recognition library
- spaCy library and its English language model

Ensure you have Python installed on your system. Then, install the required Python libraries using the following commands:
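
    pip install pydub SpeechRecognition spacy

The speech_recognition module is distributed on PyPI under the name SpeechRecognition, which is why the install name differs from the import name.

Download the English language model for spaCy:

    python -m spacy download en_core_web_sm

pydub requires FFmpeg for handling audio files. Install FFmpeg following the instructions for your operating system.
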
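
Before running either script, it can help to confirm that the prerequisites listed above are actually visible to Python. The following is a minimal sketch, not part of the project: the function name missing_prerequisites and the "module:" / "tool:" labels are made up for illustration, and it only checks that the modules can be found and that an ffmpeg executable is on PATH, not that they work.

```python
import importlib.util
import shutil


def missing_prerequisites(
    modules=("pydub", "speech_recognition", "spacy"),
    tools=("ffmpeg",),
):
    """Return a list of prerequisites that could not be located.

    Hypothetical helper: modules are looked up by import name,
    tools are looked up as executables on PATH.
    """
    missing = [
        f"module:{name}"
        for name in modules
        if importlib.util.find_spec(name) is None
    ]
    missing += [
        f"tool:{name}"
        for name in tools
        if shutil.which(name) is None
    ]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

An empty list means everything was found. The check does not verify that the spaCy English model has been downloaded.
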
Run audio_transcriber.py to transcribe an audio file. The script splits the audio into chunks (default is 60 seconds) and transcribes each chunk.

    python audio_transcriber.py <path_to_audio_file> -d <chunk_duration_in_seconds> -o <output_text_file>

- <path_to_audio_file>: Path to the audio file you want to transcribe.
- <chunk_duration_in_seconds> (optional): Duration of each audio chunk in seconds. Default is 60 seconds.
- <output_text_file> (optional): Path to save the transcribed text. Defaults to outputs/filename.txt, where filename is the name of the audio file.

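
The chunk-and-transcribe flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual contents of audio_transcriber.py: the function names are hypothetical, the default output name is assumed to drop the audio file's extension, and the recognizer backend (recognize_google, which needs network access) is a guess, since this README does not say which one the script uses.

```python
from pathlib import Path


def chunk_bounds(total_ms, chunk_seconds=60):
    """Return (start_ms, end_ms) pairs that cover the whole recording.

    pydub measures audio length and slice positions in milliseconds.
    """
    step = int(chunk_seconds * 1000)
    if step <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [
        (start, min(start + step, total_ms))
        for start in range(0, total_ms, step)
    ]


def default_output_path(audio_path):
    """Assumed default: outputs/<audio file name without extension>.txt."""
    return Path("outputs") / (Path(audio_path).stem + ".txt")


def transcribe_file(audio_path, chunk_seconds=60):
    """Split the audio into chunks and transcribe each one in turn."""
    # Third-party imports are local so the helpers above work without them.
    import tempfile

    import speech_recognition as sr
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    recognizer = sr.Recognizer()
    pieces = []
    with tempfile.TemporaryDirectory() as tmp:
        bounds = chunk_bounds(len(audio), chunk_seconds)
        for index, (start, end) in enumerate(bounds):
            wav_path = Path(tmp) / f"chunk_{index}.wav"
            # Slicing an AudioSegment uses millisecond offsets.
            audio[start:end].export(str(wav_path), format="wav")
            with sr.AudioFile(str(wav_path)) as source:
                recorded = recognizer.record(source)
            try:
                # Assumed backend; the real script may use another.
                pieces.append(recognizer.recognize_google(recorded))
            except sr.UnknownValueError:
                # Nothing intelligible in this chunk; skip it.
                continue
    return " ".join(pieces)
```

With these helpers, a 150-second recording and the default 60-second chunk duration yields three chunks, the last one 30 seconds long.
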
Run clean_transcript.py to clean the transcribed text file. The script capitalizes sentences and performs basic text cleanup.

    python clean_transcript.py <path_to_transcribed_file> -o <path_to_output_cleaned_file>

- <path_to_transcribed_file>: Path to the transcribed text file.
- <path_to_output_cleaned_file> (optional): Path to save the cleaned text file. Defaults to the input filename with a _cleaned suffix.

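
The command-line behaviour described above could be wired up along these lines. This is only a sketch of one plausible argparse setup, not the script's real code: the names cleaned_path and parse_args are hypothetical, and it assumes the _cleaned suffix goes before the file extension (for example talk.txt becomes talk_cleaned.txt).

```python
import argparse
from pathlib import Path


def cleaned_path(input_path):
    """Assumed default: insert _cleaned before the file extension."""
    path = Path(input_path)
    return path.with_name(f"{path.stem}_cleaned{path.suffix}")


def parse_args(argv=None):
    """Parse the positional input file and the optional -o output path."""
    parser = argparse.ArgumentParser(
        description="Clean a transcribed text file."
    )
    parser.add_argument(
        "input_file", help="Path to the transcribed text file."
    )
    parser.add_argument(
        "-o",
        dest="output",
        default=None,
        help="Path to save the cleaned text file.",
    )
    args = parser.parse_args(argv)
    if args.output is None:
        # Fall back to the derived default when -o is omitted.
        args.output = str(cleaned_path(args.input_file))
    return args


if __name__ == "__main__":
    options = parse_args()
    print(f"Reading {options.input_file}, writing {options.output}")
```
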
The text cleaning script performs basic cleanup and may not add punctuation or correct all errors accurately. Manual review is recommended when high accuracy is required.
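
To make the limitation concrete, here is a small stand-in for sentence capitalization. It deliberately uses a regular expression rather than spaCy, so it is not the method clean_transcript.py uses; the function name capitalize_sentences is hypothetical. It shows that capitalization depends on punctuation already being present in the transcript.

```python
import re

# A sentence start is the beginning of the text, or a lowercase
# letter that follows ., ! or ? and some whitespace.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def capitalize_sentences(text):
    """Collapse runs of whitespace and capitalize each sentence start."""
    text = re.sub(r"\s+", " ", text).strip()
    return _SENTENCE_START.sub(
        lambda match: match.group(1) + match.group(2).upper(),
        text,
    )


if __name__ == "__main__":
    # Punctuated input: every sentence gets a capital letter.
    print(capitalize_sentences("hello world.  how are you? fine"))
    # Unpunctuated input: only the first word changes, so sentence
    # boundaries still have to be added by hand.
    print(capitalize_sentences("so we met yesterday and it went well"))
```

Speech recognizers often return text without punctuation, in which case an approach like this can only capitalize the first word of each chunk.
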