A Brief Introduction to the Python glom Module

2021-10-27 07:50:37
OfStack

Contents
Installation
Simple use
Complex usage
Get rid of troublesome needs
Summary

Sharp tools make good work! If we want to develop more easily and efficiently, we need to learn some "advanced" techniques. Not long ago, I read code written by a Python expert that used a small but powerful module. I found it quite useful, so I am sharing it with you today.

This module is called glom, a small Python library for processing data. It has the following features:

Path-based access to nested structures
Declarative data transformation using lightweight, Pythonic specifications
Readable and meaningful error messages
Built-in data exploration and debugging features

That sounds abstract, doesn't it? Let's go through some examples.

Installation

glom is a third-party package rather than part of the standard library, so it is installed with pip as usual:

pip3 install glom

It will be done in a few seconds!

Simple use

Let's look at the simplest usage:


from glom import glom

d = {"a": {"b": {"c": 1}}}
print(glom(d, "a.b.c")) # 1

Here we have a nested dict with three levels, and we want the value of c in the innermost level. The usual way to write this is:


print(d["a"]["b"]["c"])

If I claimed at this point that glom is better than the traditional way simply because you don't have to write brackets and quotation marks level by level, you would probably scoff.

OK, let's take a look at the following situation again:


d = {"a": {"b": None}}
print(d["a"]["b"]["c"])

When the traversal hits a None object, you get the following error:


Traceback (most recent call last):
  File "/Users/cxhuan/Documents/python_workspace/mypy/pmodules/pglom/glomstudy.py", line 10, in <module>
    print(d["a"]["b"]["c"])
TypeError: 'NoneType' object is not subscriptable

Now let's look at how glom handles the same case:


from glom import glom

d = {"a": {"b": None}}
print(glom(d, "a.b.c"))

glom cannot produce a value here either, so it also raises an error:


Traceback (most recent call last):
  File "/Users/cxhuan/Documents/python_workspace/mypy/pmodules/pglom/glomstudy.py", line 11, in <module>
    print(glom(d, "a.b.c"))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/glom/core.py", line 2181, in glom
    raise err
glom.core.PathAccessError: error raised while processing, details below.
 Target-spec trace (most recent last):
 - Target: {'a': {'b': None}}
 - Spec: 'a.b.c'
glom.core.PathAccessError: could not access 'c', part 2 of Path('a', 'b', 'c'), got error: AttributeError("'NoneType' object has no attribute 'c'")

If you read the error carefully, you will find it extremely detailed and clear: it shows the target, the spec, and exactly which part of the path failed. That makes it a great tool for tracking down bugs!

Complex usage

The simple example above gives an intuitive feel for glom. Next, let's look at the signature of the glom function itself:

glom(target, spec, **kwargs)

The parameters mean the following:

target: the data to process, which can be a dict, a list, or any other object
spec: a description of the output we want

Let's put it to use. Suppose we have a dict and want to collect every value of name. glom can do that directly:


data = {"student": {"info": [{"name": " Zhang 3"}, {"name": " Li 4"}]}}
info = glom(data, ("student.info", ["name"]))
print(info) # [' Zhang 3', ' Li 4']

In the traditional way we would need a loop or a comprehension to collect the values, but with glom a single line is enough, and the output is a list.

If you want a dict instead of a list, that is just as simple:


info = glom(data, {"info": ("student.info", ["name"])})
print(info) # {'info': [' Zhang 3', ' Li 4']}

We only need to wrap the original spec in a dict; each key of the spec becomes a key of the output.

Get rid of troublesome needs

Suppose we now have two sets of data and want to extract the name values from each:


data_1 = {"school": {"student": [{"name": " Zhang 3"}, {"name": " Li 4"}]}}
data_2 = {"school": {"teacher": [{"name": " Teacher Wang "}, {"name": " Miss Zhao "}]}}

spec_1 = {"name": ("school.student", ["name"])}
spec_2 = {"name": ("school.teacher", ["name"])}
print(glom(data_1, spec_1)) # {'name': [' Zhang 3', ' Li 4']}
print(glom(data_2, spec_2)) # {'name': [' Teacher Wang ', ' Miss Zhao ']}

That's how we would usually write it, right? But what if we have many sets of data, each accessed in a similar way? Then we want to avoid writing a separate spec for every one of them. For that we can use Coalesce:


from glom import glom, Coalesce

data_1 = {"school": {"student": [{"name": " Zhang 3"}, {"name": " Li 4"}]}}
data_2 = {"school": {"teacher": [{"name": " Teacher Wang "}, {"name": " Miss Zhao "}]}}

# Coalesce tries each path in order and uses the first one that succeeds
spec = {"name": (Coalesce("school.student", "school.teacher"), ["name"])}

print(glom(data_1, spec)) # {'name': [' Zhang 3', ' Li 4']}
print(glom(data_2, spec)) # {'name': [' Teacher Wang ', ' Miss Zhao ']}

Coalesce combines several alternative sub-specs into one: glom tries them in order and uses the first that succeeds, so the same spec works for both sets of data.

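Now for one more powerful feature: computing values. A spec can include a callable, which glom applies to the data selected so far. Let's look at an example (the age values are illustrative):


data = {"school": {"student": [{"name": " Zhang 3", "age": 8}, {"name": " Li 4", "age": 10}]}}
print(glom(data, {"sum_age": ("school.student", ["age"], sum)})) # {'sum_age': 18}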

Summary

After this introduction, you should have a sense of how powerful glom is; it is said that many experienced developers like to use it. It also has many other practical features worth exploring, which are not covered here.
