Common operations of the Python library sklearn

  • 2021-11-24 01:48:36
  • OfStack

Contents

  • Preface
  • 1. MinMaxScaler

Preface

sklearn is an important Python machine learning library that encapsulates a large number of machine learning algorithms, such as classification, regression, dimensionality reduction and clustering. It covers three areas: supervised learning, unsupervised learning and data transformation. sklearn has thorough documentation, which makes it easy to use, and it ships with a large number of built-in datasets, which saves the time of obtaining and preparing data. It has therefore become a widely used machine learning library.

sklearn is essential for both machine learning and deep learning, since it contains almost all the functionality machine learning needs. Because the library is so large, starting from a macro-level overview is likely to confuse and intimidate beginners. This article therefore does not introduce the sklearn library as a whole first; instead, it starts from concrete examples. After readers have learned some commonly used functions and gained a feel for what they do, the library will be explained comprehensively from a macro perspective. The examples in this blog almost all come from my own process of learning keras, so I suggest reading them alongside my other blog posts on keras for better understanding.

1. MinMaxScaler

The primary purpose of MinMaxScaler is data normalization, an important step in data preprocessing before we start deep learning. Simply put, it scales each feature of our samples into the interval [0, 1]. Normalization helps a neural network model converge to a good optimum faster; without it, the network may take a long time to converge (that is, reach the optimum) or even fail to converge. Normalization can also noticeably improve the quality of the learned parameters, leading to a better learning result. The following is an example application of MinMaxScaler.


from sklearn import preprocessing
import numpy as np

x = np.array([[3., -1., 2., 613.],
              [2., 0., 0., 232.],
              [0., 1., -1., 113.],
              [1., 2., -3., 489.]])

min_max_scaler = preprocessing.MinMaxScaler()
x_minmax = min_max_scaler.fit_transform(x)
print(x_minmax)

Run results:

[[1.         0.         1.         1.        ]
 [0.66666667 0.33333333 0.6        0.238     ]
 [0.         0.66666667 0.4        0.        ]
 [0.33333333 1.         0.         0.752     ]]
To sum up, normalization takes two steps:

1. scaler = preprocessing.MinMaxScaler()
2. x1 = scaler.fit_transform(x)

where x1 is the normalized result.
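Under the hood, MinMaxScaler applies the formula x' = (x - min) / (max - min) to each column independently. As a sanity check (a minimal sketch, not part of the original example), the same result can be computed by hand with NumPy and compared against the scaler's output:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[3., -1., 2., 613.],
              [2., 0., 0., 232.],
              [0., 1., -1., 113.],
              [1., 2., -3., 489.]])

# Manual min-max normalization, column by column:
# x' = (x - min) / (max - min)
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

scaler = MinMaxScaler()
x_minmax = scaler.fit_transform(x)

print(np.allclose(manual, x_minmax))  # the two results agree
```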
Note that, in addition to the import used above, MinMaxScaler can also be imported directly:

from sklearn.preprocessing import MinMaxScaler
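One practical point worth noting (a minimal sketch with made-up data, not from the original example): in practice the scaler is usually fitted on the training data only, and the same learned min/max is then applied to the test data, so both sets are scaled consistently. inverse_transform can recover the original values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_train = np.array([[3., -1.],
                    [2., 0.],
                    [0., 1.]])
x_test = np.array([[1., 0.5]])

scaler = MinMaxScaler()
scaler.fit(x_train)                        # learn min/max from training data only
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)   # reuse the same min/max on test data

# inverse_transform maps the scaled values back to the original scale
x_back = scaler.inverse_transform(x_train_scaled)
```

Fitting on the training set alone avoids leaking information from the test set into the preprocessing step.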

