Using Python's imghdr module to identify image formats: a worked example

2020-07-21 08:37:17
OfStack

imghdr module

What it does: the imghdr module identifies image formats. It determines an image's format by examining the first few bytes of the file.

The only API

imghdr.what(file, h=None)

The first argument, file, can be a file object opened in rb mode, a string representing a path, or a PathLike object. The h parameter is a byte string. The function returns a string naming the image format, or None if no format is recognized.


>>> import imghdr
>>> imghdr.what('test.jpg')
'jpeg'

The possible return values and what they mean are as follows:

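Return value	Description	How it is detected
jpeg	JPEG image saved in JFIF or Exif format	Bytes 7 through 10 are b'JFIF' or b'Exif'
png	Portable Network Graphics	Starts with the byte string b'\x89PNG\r\n\x1a\n'
gif	GIF (Graphics Interchange Format), versions 87a and 89a	The first 6 bytes are b'GIF87a' or b'GIF89a'
tiff	TIFF (Tag Image File Format), in either byte order	The first two bytes are b'MM' or b'II'
rgb	SGI ImgLib	Starts with the byte string b'\x01\xda'
pbm	Portable Bitmap	Byte 1 is b'P', byte 2 is b'1' or b'4', byte 3 is b' ', b'\t', b'\n' or b'\r'
pgm	Portable Graymap Files	Byte 1 is b'P', byte 2 is b'2' or b'5', byte 3 is b' ', b'\t', b'\n' or b'\r'
ppm	Portable Pixmap Files	Byte 1 is b'P', byte 2 is b'3' or b'6', byte 3 is b' ', b'\t', b'\n' or b'\r'
rast	Sun Raster	Starts with the byte string b'\x59\xA6\x6A\x95'
xbm	X Bitmap Files	Starts with the byte string b'#define '
bmp	Bitmap, the standard Windows image file format	Starts with the byte string b'BM'
webp	Google's WebP format, added in Python 3.5	Starts with the byte string b'RIFF' and bytes 9 through 12 are b'WEBP'
exr	OpenEXR, added in Python 3.5	Starts with the byte string b'\x76\x2f\x31\x01'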
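
To make the table more concrete, here is a minimal sketch of a few of those signature checks written as plain Python. It does not import imghdr (the module was deprecated in Python 3.11 and removed in 3.13), and the names sniff_format and sniff_file are made up for this illustration. It covers only six of the formats and is not the module's actual source.

```python
# Sketch only: a few of the signature checks from the table, written
# without imghdr. sniff_format and sniff_file are hypothetical names.

def sniff_format(header: bytes):
    """Return a format name for a few well-known signatures, else None."""
    if header[6:10] in (b'JFIF', b'Exif'):        # bytes 7 through 10
        return 'jpeg'
    if header.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if header[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if header[:2] in (b'MM', b'II'):              # big or little endian
        return 'tiff'
    if header.startswith(b'BM'):
        return 'bmp'
    if header.startswith(b'RIFF') and header[8:12] == b'WEBP':
        return 'webp'
    return None


def sniff_file(path):
    """Read the first 32 bytes of a file and classify them."""
    with open(path, 'rb') as f:
        return sniff_format(f.read(32))
```

Because only the leading bytes are inspected, a text file that happens to start with "BM" or "II" would be misreported, which is the usual trade-off of signature sniffing.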

Design flaws inside the module

When the h parameter is not None, the module ignores the file parameter and tests h directly. The file parameter is still required in that case, though, which looks like a design flaw. Personally, the blogger feels the h parameter has no reason to exist and does not need to be in the parameter list at all.


>>> import imghdr
>>> imghdr.what('test.jpg', b'\x89PNG\r\n\x1a\n')
'png'
>>>
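
Since file is ignored whenever h is supplied, a placeholder such as None can be passed in its place. The sketch below wraps that in a small helper for bytes that are already in memory. The name what_from_bytes is hypothetical, and the import is guarded on the assumption that the code may run on Python 3.13 or later, where imghdr no longer ships.

```python
try:
    import imghdr  # deprecated in Python 3.11, removed in 3.13
except ImportError:
    imghdr = None


def what_from_bytes(data):
    """Hypothetical helper: identify image bytes that are already in memory."""
    if imghdr is None:
        raise RuntimeError("imghdr is not available in this Python version")
    # file is ignored whenever h is given, so None works as a placeholder.
    return imghdr.what(None, h=data)
```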

Customizing the detection process

Internally, imghdr uses functions such as test_jpeg, test_png and test_gif to detect the file format. The module keeps these in a list, imghdr.tests. Each time what is called, the test functions are called in list order, and the loop exits as soon as one of them returns a result. Users can change the detection process by modifying this list, including adding their own test functions to it.

In the following example, the blogger appends a function to the end of the detection process that reports when the file is not an image:


>>> import imghdr
>>> def final(h, f):
...     print("This file isn't an image!")
...
>>> imghdr.tests.append(final)
>>> imghdr.what("imghdr.md")
This file isn't an image!

A test function you add yourself must accept two parameters, h and f. h is the byte string being tested, and f is the file object. None of the module's built-in test functions actually use the f parameter, though.

Running the imghdr module from the command line

While reading the source code, the blogger found two functions that the official documentation does not mention. They provide a direct way to run the imghdr module from the command line.

Just run python -m imghdr [-r] file1 file2... to detect the types of the files directly. Each argument can be a file or a folder. By default, only the files one level below a folder are checked; add the -r flag if recursive detection is needed.

One line is printed for each file, in the form "file name: file type" (or None when the type is not recognized).
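
The one-level-versus-recursive behaviour can be sketched as follows. This is an illustration under stated assumptions, not the module's own implementation: sniff is a hypothetical stand-in for imghdr.what that only knows PNG and GIF, report is a made-up name, and the "directory (not descended)" text is this sketch's own wording.

```python
import os


def sniff(path):
    """Hypothetical stand-in for imghdr.what: knows PNG and GIF only."""
    with open(path, 'rb') as f:
        h = f.read(32)
    if h.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if h[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    return None


def report(paths, recursive=False, toplevel=True):
    """Build "name: type" lines; go one level into folders unless recursive."""
    lines = []
    for path in paths:
        if os.path.isdir(path):
            if recursive or toplevel:
                children = sorted(os.path.join(path, name)
                                  for name in os.listdir(path))
                lines.extend(report(children, recursive, toplevel=False))
            else:
                lines.append(f'{path}: directory (not descended)')
        else:
            lines.append(f'{path}: {sniff(path)}')
    return lines
```

Calling report(['photos']) would list the files directly inside a folder named photos, while report(['photos'], recursive=True) would also walk its subfolders.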

Conclusion: because it has so few users, the imghdr module has quite a few internal problems. However, as long as the module is used the way the official documentation describes, it works without trouble. Once the job interviews are over, the blogger plans to fix up this module and submit a PR.
