Extracting Audio from Video in Python
- 2021-10-16 02:15:33
- OfStack
Brief introduction
OpenCV's video classes, such as VideoCapture, handle only image frames, so video processed with them carries no audio. To work with the audio as well, you need another library: MoviePy. MoviePy is a Python video editing library that supports cutting, concatenation, title insertion, video compositing, video processing, and custom effects.
Installation
pip install moviepy
Code
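# MoviePy 1.x import; in MoviePy 2.x use "from moviepy import VideoFileClip"
from moviepy.editor import VideoFileClip
video = VideoFileClip('test.mp4')
audio = video.audio  # None if the video has no audio track
audio.write_audiofile('test.mp3')
video.close()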
Alternatively, you can use the ffmpeg-python library instead of installing MoviePy (see Reference 4), although the code is slightly more complicated.
Audio format
extensions_dict = {
    'mp4': {'type': 'video', 'codec': ['libx264', 'libmpeg4', 'aac']},
    'ogv': {'type': 'video', 'codec': ['libtheora']},
    'webm': {'type': 'video', 'codec': ['libvpx']},
    'avi': {'type': 'video'},
    'mov': {'type': 'video'},
    'ogg': {'type': 'audio', 'codec': ['libvorbis']},
    'mp3': {'type': 'audio', 'codec': ['libmp3lame']},
    'wav': {'type': 'audio', 'codec': ['pcm_s16le', 'pcm_s24le', 'pcm_s32le']},
    'm4a': {'type': 'audio', 'codec': ['libfdk_aac']}
}
This table shows that the supported audio formats are ogg, mp3, wav and m4a. In my own test, writing m4a failed, so I recommend using only mp3 and wav.
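As a small illustration of reading the table, the sketch below lists the entries marked as audio and looks up the first codec listed for an output file name. The helpers `audio_extensions` and `default_codec` are hypothetical names introduced here, not part of MoviePy's API, and the dictionary is simply copied from the article.

```python
# Table copied from the article.
extensions_dict = {
    "mp4": {"type": "video", "codec": ["libx264", "libmpeg4", "aac"]},
    "ogv": {"type": "video", "codec": ["libtheora"]},
    "webm": {"type": "video", "codec": ["libvpx"]},
    "avi": {"type": "video"},
    "mov": {"type": "video"},
    "ogg": {"type": "audio", "codec": ["libvorbis"]},
    "mp3": {"type": "audio", "codec": ["libmp3lame"]},
    "wav": {"type": "audio", "codec": ["pcm_s16le", "pcm_s24le", "pcm_s32le"]},
    "m4a": {"type": "audio", "codec": ["libfdk_aac"]},
}


def audio_extensions(table):
    """Return the extensions whose entry is marked as audio."""
    return [ext for ext, info in table.items() if info["type"] == "audio"]


def default_codec(table, filename):
    """Hypothetical helper: first codec listed for the file's extension."""
    ext = filename.rsplit(".", 1)[-1].lower()
    info = table.get(ext)
    if info is None or info["type"] != "audio":
        raise ValueError(f"unsupported audio extension: {ext!r}")
    return info["codec"][0]


print(audio_extensions(extensions_dict))
print(default_codec(extensions_dict, "test.mp3"))
```

Checking the extension before writing gives an early, readable error for a file name such as test.mp4 instead of a failure inside the encoder.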
In a test with a 2-minute video, the mp3 output was 1.83 MB and the wav output was 20.1 MB.
mp3 is lossy and wav is lossless; choose whichever suits your needs.
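The two sizes can be sanity-checked with a little arithmetic. The sketch below assumes the wav was written as 16-bit stereo at 44100 Hz (the article does not state the sample width or channel count), reads the reported sizes as multiples of 1024 × 1024 bytes, and ignores the small wav header. The function names are made up for this illustration.

```python
def wav_size_bytes(duration_s, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio: one sample per channel per tick."""
    return duration_s * sample_rate * channels * bytes_per_sample


def average_bitrate_kbps(size_bytes, duration_s):
    """Average bitrate implied by a file size and a duration."""
    return size_bytes * 8 / duration_s / 1000


MIB = 1024 * 1024  # assumption: the reported sizes use this unit

# A 2-minute clip is 120 seconds.
print(wav_size_bytes(120) / MIB)               # about 20.2
print(average_bitrate_kbps(1.83 * MIB, 120))   # about 128
```

Under these assumptions the wav estimate is close to the measured 20.1, and the mp3 size corresponds to an average bitrate of roughly 128 kbit/s.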
Remarks
For lower-level audio and video processing, use ffmpeg directly.
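As a rough sketch of what calling ffmpeg directly might look like, the code below runs the ffmpeg command-line tool through the standard library's subprocess module. It assumes an ffmpeg executable is installed and on PATH; `build_extract_command` and `extract_audio` are hypothetical helper names, and only the common options `-y` (overwrite the output), `-i` (input file) and `-vn` (drop the video stream) are used.

```python
import shutil
import subprocess


def build_extract_command(video_path, audio_path):
    """Build an ffmpeg command line that writes only the audio to audio_path."""
    # ffmpeg chooses the audio encoder from the output file extension.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", audio_path]


def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

With ffmpeg installed, extract_audio('test.mp4', 'test.wav') would be the counterpart of the MoviePy code above.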
Supplement: processing mp4 video in Python, extracting the audio to mp3 or wav, and trimming it
Extracting the audio of an mp4 video file to an mp3 or wav file
mp3 is a lossy format and wav is a lossless one. For the video I tested, the mp3 export was only a few tens of KB, while the wav export was more than 3 MB.
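# MoviePy 1.x import; in MoviePy 2.x use "from moviepy import VideoFileClip"
from moviepy.editor import VideoFileClip
video = VideoFileClip('aa.mp4')
audio = video.audio  # None if the video has no audio track
audio.write_audiofile('test.wav')
audio.write_audiofile('test.mp3')
video.close()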
Trimming mp3 or wav files
from scipy.io import wavfile
like = wavfile.read('test.wav')
# print(like)
# wavfile.read returns a tuple. The first element is the sampling rate in samples per second. The second element is an ndarray holding the audio samples: a one-dimensional array means mono, and an array with two columns means stereo. Trimming the audio therefore means slicing this second element.
# The slice is taken on like[1]. start_s is the time in seconds where the trim starts and end_s is the time where it ends. Each is multiplied by the sampling rate (44100 for the wav written above) because that is the number of samples per second.
rate, data = like
start_s, end_s = 13, 48
# Keep seconds 13 to 48 of the audio
wavfile.write('test2.wav', rate, data[start_s*rate:end_s*rate])
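The same trim can also be sketched with only the standard library's wave module, which reads the sampling rate from the file header rather than relying on a fixed value. This is an alternative to the SciPy approach, not what the article uses. `trim_wav` is a hypothetical helper name, and the sketch assumes an uncompressed PCM wav file, which is the only kind the wave module can open.

```python
import wave


def trim_wav(src, dst, start_s, end_s):
    """Write the audio between start_s and end_s (in seconds) from src to dst."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        # Convert times to frame indices; one frame holds one sample per channel.
        start_frame = int(start_s * rate)
        n_frames = max(0, int(end_s * rate) - start_frame)
        # Clamp the start so a start time past the end yields an empty file.
        reader.setpos(min(start_frame, reader.getnframes()))
        frames = reader.readframes(n_frames)
    with wave.open(dst, "wb") as writer:
        # Reuse channels, sample width and rate; the frame count in the
        # header is corrected by the wave module when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)
```

For the file above, trim_wav('test.wav', 'test2.wav', 13, 48) would keep seconds 13 to 48.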