What You Need to Learn for Data Mining in Python

  • 2021-06-29 11:32:49
  • OfStack

1. Working with the pandas Library

pandas is a particularly important library for data analysis. There are three topics to know:

  • pandas grouped calculations;

  • pandas indexes and multi-level indexes;

    Indexing is difficult but important.

  • pandas multi-table operations and pivot tables.
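The three pandas topics above can be sketched in a few lines. This is a minimal illustration rather than a complete tour: the `sales` and `managers` tables and their column names are invented for the example.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["a", "b", "a", "a", "b"],
    "amount": [10, 20, 30, 40, 50],
})

# Grouped calculation: total amount per region.
totals = sales.groupby("region")["amount"].sum()

# Multi-level index: grouping by two keys yields a MultiIndex,
# and a tuple of labels selects a single entry.
by_both = sales.groupby(["region", "product"])["amount"].sum()
west_a = by_both.loc[("west", "a")]

# Multi-table operation: join a second (hypothetical) table on a shared column.
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ann", "Bob"]})
merged = sales.merge(managers, on="region", how="left")

# Pivot table: regions as rows, products as columns, summed amounts as cells.
pivot = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum")
print(pivot)
```

The pivot table holds the same numbers as `by_both`, only laid out in two dimensions, which is a useful way to see how the multi-level index and the pivot relate.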

2. Numerical Computing with NumPy

NumPy handles most of the numerical computation in data mining. It is also a must-know library for machine learning and deep learning later on. Master the following:

  • Understanding the NumPy array;

  • Array indexing;

  • Array computation;

  • Broadcasting (how NumPy combines arrays of different shapes; some linear algebra background helps).
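As a rough sketch of the four NumPy topics above, the following example builds a small array, indexes it, computes on it, and uses broadcasting to subtract the column means from every row. The array contents are arbitrary.

```python
import numpy as np

# A 3x4 array holding the integers 0..11.
a = np.arange(12).reshape(3, 4)

# Indexing: a single element, a whole column, and a boolean mask.
elem = a[1, 2]             # row 1, column 2
first_col = a[:, 0]        # every row, column 0
evens = a[a % 2 == 0]      # 1-D array of the even entries

# Array computation is element-wise and needs no explicit loop.
squared = a ** 2
row_sums = a.sum(axis=1)   # one sum per row

# Broadcasting: col_means has shape (4,), a has shape (3, 4).
# NumPy stretches col_means across the 3 rows before subtracting.
col_means = a.mean(axis=0)
centered = a - col_means
print(centered)
```

After centering, each column of `centered` averages to zero, which is a common preprocessing step before fitting a model.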

3. Data Visualization - matplotlib and seaborn

  • Matplotlib syntax

    Matplotlib is Python's most basic visualization tool. Note the similarity between the names Matplotlib and MATLAB: Matplotlib's pyplot interface was modeled on MATLAB's plotting commands, and knowing that relationship makes it easier to learn.

  • Using seaborn

    seaborn builds on Matplotlib and produces attractive statistical plots with little code.

  • pandas plotting functions

    As mentioned earlier, pandas is for data analysis, but it also provides an API for some plots.
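To make the relationship between Matplotlib and the pandas plotting API concrete, here is a small sketch that draws the same made-up data both ways. The `Agg` backend is an assumption so that the script also runs on a machine without a display; seaborn is left out to keep the example to two libraries.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Plain Matplotlib: a line plot drawn on the first Axes.
ax1.plot(df["x"], df["y"], marker="o")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title("matplotlib")

# pandas plotting API: a bar chart drawn on the second Axes.
# pandas creates Matplotlib objects under the hood.
df.plot(x="x", y="y", kind="bar", ax=ax2, title="pandas")

fig.tight_layout()
# Call fig.savefig("some_name.png") or plt.show() to view the result.
```

Because pandas draws onto ordinary Matplotlib Axes, anything learned about Matplotlib (labels, titles, layout) carries over to plots made from a DataFrame.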

4. Introduction to Data Mining

This is the hardest and most interesting part to learn about:

  • Definition of machine learning

    For the purposes here, this is no different from data mining.

  • Definition of the cost function

  • Train/test/validation splits

  • Definition and avoidance of overfitting
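The ideas above fit together in one small experiment. The sketch below assumes synthetic data (a noisy quadratic), mean squared error as the cost function, and a 60/20/20 split; all of these are choices made for illustration. Polynomials of several degrees are fitted on the training set, the degree is chosen on the validation set, and the test set is used once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a quadratic plus Gaussian noise (assumed for illustration).
x = rng.uniform(-1.0, 1.0, size=60)
y = x ** 2 + rng.normal(scale=0.1, size=x.size)

# Shuffle the indices and split 60% / 20% / 20%.
idx = rng.permutation(x.size)
train_idx, val_idx, test_idx = idx[:36], idx[36:48], idx[48:]

def mse(y_true, y_pred):
    """Cost function: mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 2, 9):
    # Fit only on the training set.
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    results[degree] = {
        "train": mse(y[train_idx], np.polyval(coeffs, x[train_idx])),
        "val": mse(y[val_idx], np.polyval(coeffs, x[val_idx])),
    }
    print(degree, results[degree])

# Pick the degree with the lowest validation error, then touch the test set once.
best_degree = min(results, key=lambda d: results[d]["val"])
best_coeffs = np.polyfit(x[train_idx], y[train_idx], best_degree)
test_error = mse(y[test_idx], np.polyval(best_coeffs, x[test_idx]))
print("best degree:", best_degree, "test MSE:", test_error)
```

Training error can only go down as the degree goes up, so it cannot be used to choose a model. A model whose training error is low but whose validation error is noticeably higher is overfitting, and selecting on validation error is one simple way to avoid it.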

5. Data Mining Algorithms

As data mining has developed, many algorithms have appeared. Only the simplest, most central, and most commonly used ones need to be mastered:

  • Least squares;

  • Gradient descent;

  • Vectorization;

  • Maximum likelihood estimation;

  • Logistic regression;

  • Decision trees;

  • Random forests;

  • XGBoost.
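The first three items on this list can be seen side by side in a short sketch. It fits a straight line to noise-free data generated from y = 2x + 1 (an assumption chosen so the right answer is known), first with the closed-form least squares solution and then with vectorized gradient descent on the same squared-error cost. The learning rate and iteration count are illustrative values, not tuned recommendations. As a side note, least squares coincides with maximum likelihood estimation when the noise is assumed to be Gaussian.

```python
import numpy as np

# Noise-free data from y = 2x + 1, so the true weights are known.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least squares: solve for the weights in closed form.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the mean squared error.
# Vectorization: the gradient is one matrix expression, with no loop over samples.
w_gd = np.zeros(2)
learning_rate = 0.1          # illustrative value
for _ in range(5000):        # illustrative iteration count
    residual = X @ w_gd - y
    gradient = 2.0 / len(y) * (X.T @ residual)
    w_gd -= learning_rate * gradient

print("least squares:   ", w_ls)
print("gradient descent:", w_gd)
```

Both approaches arrive at an intercept near 1 and a slope near 2. The closed form is exact for small problems, while gradient descent is the approach that carries over to models such as logistic regression, where no closed form exists.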

6. Data Mining in Practice

Learn to build and understand models with scikit-learn, the best-known machine learning library in Python.
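A first hands-on session with scikit-learn might look like the sketch below. It assumes the iris dataset that ships with the library, a 70/30 split, and mostly default hyperparameters; these are illustrative choices. XGBoost lives in a separate package and is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three of the algorithms named in section 5.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                        # learn from training rows
    predictions = model.predict(X_test)                # predict unseen rows
    scores[name] = accuracy_score(y_test, predictions)
    print(f"{name}: {scores[name]:.3f}")
```

Every scikit-learn estimator follows the same `fit` and `predict` pattern, so swapping one algorithm for another is usually a one-line change.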

