What You Need to Learn for Data Mining in Python
- 2021-06-29 11:32:49
- OfStack
1. Working with the pandas Library
pandas is an especially important library for data analysis. There are three things you need to know:
- Grouped calculations in pandas (groupby);
- Indexes and multi-level indexes (MultiIndex). Indexing is difficult but important;
- Multi-table operations (merging and joining) and pivot tables.
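Here is a minimal sketch of the three pandas topics above. It uses a small made-up sales table, so the column names and numbers are hypothetical. It shows `groupby` for a grouped calculation and a two-key group that produces a `MultiIndex`. It finishes with `merge` and `pivot_table` for multi-table work.

```python
import pandas as pd

# Hypothetical sales data, made up purely for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 5, 7, 3, 8],
})
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.0, 3.5]})

# 1. Grouped calculation: total units sold per region.
units_by_region = sales.groupby("region")["units"].sum()

# 2. Multi-level index: grouping by two keys yields a Series with a MultiIndex.
units_by_region_product = sales.groupby(["region", "product"])["units"].sum()
south_a = units_by_region_product.loc[("South", "A")]  # select using both levels

# 3a. Multi-table operation: join the price table on the shared "product" key.
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# 3b. Pivot table: regions as rows, products as columns, summed revenue as values.
pivot = pd.pivot_table(merged, values="revenue", index="region",
                       columns="product", aggfunc="sum", fill_value=0)

print(units_by_region)
print(pivot)
```

The same split-apply-combine idea behind `groupby` is what `pivot_table` uses internally. A pivot table is often just a more readable way to lay out a two-key grouping.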
2. NumPy Numerical Computation
NumPy handles most of the numerical computation in data mining. It is also a must-know library for later work in machine learning and deep learning. You should master the following:
- Understanding NumPy arrays;
- Array indexing;
- Array computation;
- Broadcasting (builds on linear algebra knowledge).
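The short sketch below touches each of the four NumPy items in turn. It covers array shape and dimensions, then slice, boolean and integer indexing. Next come element-wise arithmetic and axis reductions. It ends with broadcasting a 1-D array across a 2-D one. The array values are arbitrary and chosen only to be easy to check by hand.

```python
import numpy as np

# Understanding arrays: every array has a shape, a dtype and a number of dimensions.
a = np.arange(12).reshape(3, 4)   # 3 rows x 4 columns holding the values 0..11
print(a.shape, a.dtype, a.ndim)

# Indexing: slices, boolean masks and integer ("fancy") indexing.
second_row = a[1]                 # [4, 5, 6, 7]
sub_block = a[:2, 1:3]            # rows 0-1, columns 1-2 -> [[1, 2], [5, 6]]
evens = a[a % 2 == 0]             # 1-D array of all even elements
corners = a[[0, -1], [0, -1]]     # elements (0, 0) and (-1, -1) -> [0, 11]

# Computation: element-wise arithmetic (no Python loops) and reductions along an axis.
doubled = a * 2
col_sums = a.sum(axis=0)          # sum down each column -> [12, 15, 18, 21]
row_means = a.mean(axis=1)        # mean across each row -> [1.5, 5.5, 9.5]

# Broadcasting: a (3, 4) array minus a (4,) array of column means.
# The 1-D array is stretched across every row, so each column is centred on its own mean.
centred = a - a.mean(axis=0)
print(centred)
```

The broadcasting rule compares shapes from the trailing dimension backwards. `(3, 4)` and `(4,)` are compatible because the last dimensions match.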
3. Data Visualization - matplotlib and seaborn
- Matplotlib syntax
Python's most basic visualization tool is matplotlib. Its pyplot interface was modeled on MATLAB's plotting commands, so understanding the relationship between the two makes it easier to learn.
- Using seaborn
seaborn is a statistical visualization library built on top of matplotlib that produces attractive plots with little code.
- pandas plotting functions
As mentioned earlier, pandas is mainly a data analysis tool, but it also provides a plotting API (for example, DataFrame.plot) for common charts.
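Here is a small sketch comparing matplotlib's object-oriented style with pandas' built-in `DataFrame.plot`. It draws the same two curves both ways. seaborn is left out so the sketch only needs matplotlib and pandas. Its functions, such as `seaborn.lineplot`, also accept a DataFrame and draw onto a matplotlib Axes. The "Agg" backend and the in-memory PNG buffer let the sketch run without a display. Both are choices made for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 50)

# matplotlib, object-oriented style: create the Figure and Axes explicitly.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("matplotlib line plot")
ax.legend()

# Render to an in-memory PNG; pass a filename such as "lines.png" to save to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

# pandas plotting: DataFrame.plot draws with matplotlib and returns the Axes,
# one line per column, with the index on the x-axis.
df = pd.DataFrame({"sin": np.sin(x), "cos": np.cos(x)}, index=x)
pandas_ax = df.plot(title="pandas DataFrame.plot")

plt.close("all")  # release the figures held by pyplot
```

Because `DataFrame.plot` returns an ordinary matplotlib Axes, you can keep customizing it with the same `set_xlabel`, `set_title` and similar calls used in the first half.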
4. Introduction to Data Mining
This is the hardest and most interesting part to learn:
- Definition of machine learning (for our purposes, no different from data mining);
- Definition of the cost function;
- Training, validation, and test sets;
- Definition and avoidance of overfitting.
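To make these concepts concrete, here is a sketch in plain NumPy. It defines mean squared error as the cost function and splits the data into training, validation and test sets. It then fits polynomials of increasing degree. Training error can only fall as the degree rises. The validation set is what reveals overfitting and picks the model. Only then is the test set used, once. The noisy sine data, the 60/20/20 split and the degrees 1, 3 and 12 are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): a noisy sine curve.
x = rng.uniform(-1, 1, size=60)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Cost function: mean squared error between targets and predictions.
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Train / validation / test split (60% / 20% / 20%) after shuffling the indices.
idx = rng.permutation(x.size)
train, val, test = idx[:36], idx[36:48], idx[48:]

# Fit polynomials of increasing degree on the training set only,
# then compare training error with validation error.
results = {}
for degree in (1, 3, 12):
    coefs = np.polyfit(x[train], y[train], deg=degree)
    results[degree] = (
        mse(y[train], np.polyval(coefs, x[train])),  # training cost
        mse(y[val], np.polyval(coefs, x[val])),      # validation cost
    )
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"val MSE={results[degree][1]:.3f}")

# Avoiding overfitting: pick the degree with the lowest validation error,
# and only then report its error on the untouched test set.
best = min(results, key=lambda d: results[d][1])
best_coefs = np.polyfit(x[train], y[train], deg=best)
test_mse = mse(y[test], np.polyval(best_coefs, x[test]))
print("chosen degree:", best, " test MSE:", round(test_mse, 3))
```

A model whose training error keeps dropping while its validation error rises is overfitting. The test set is kept out of model selection so that its error remains an honest estimate.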
5. Data Mining Algorithms
As data mining has developed, many algorithms have appeared. You only need to master the simplest, most fundamental, and most commonly used ones below:
- Least squares;
- Gradient descent;
- Vectorization;
- Maximum likelihood estimation;
- Logistic regression;
- Decision trees;
- Random forests;
- XGBoost.
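The first five items fit together, as this NumPy-only sketch shows. It solves a linear regression by least squares and again by gradient descent, with the gradient written as a single vectorized matrix product. It then fits logistic regression by maximum likelihood, that is, by minimizing the average cross-entropy with gradient descent. The synthetic data, learning rates and iteration counts are assumptions chosen for this example. The tree-based models on the list are usually used through a library rather than written by hand, so they are not shown here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data (made up): y = 3 + 2*x1 - x2 + noise.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column = intercept
true_w = np.array([3.0, 2.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Least squares: minimise ||Xw - y||^2. lstsq solves this stably; the result
# equals the normal-equation solution w = (X^T X)^{-1} X^T y.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the same cost J(w) = (1/n) ||Xw - y||^2.
# Vectorization: the gradient (2/n) X^T (Xw - y) is one matrix product,
# with no Python loop over samples.
w_gd = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = (2 / n) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad

# Logistic regression by maximum likelihood: maximising the Bernoulli
# log-likelihood equals minimising the average cross-entropy, whose
# gradient is (1/n) X^T (sigmoid(Xw) - t).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Binary labels drawn from a logistic model (logistic noise added to a linear score).
t = (X @ np.array([0.0, 1.5, -2.0]) + rng.logistic(size=n) > 0).astype(float)
w_lr = np.zeros(3)
for _ in range(2000):
    w_lr -= 0.5 * X.T @ (sigmoid(X @ w_lr) - t) / n

accuracy = np.mean((sigmoid(X @ w_lr) > 0.5) == t)
print("least squares:      ", w_ls.round(3))
print("gradient descent:   ", w_gd.round(3))
print("logistic regression:", w_lr.round(2), " train accuracy:", accuracy)
```

Gradient descent arrives at the same weights as the closed-form least-squares solution. Its value is that the same update loop also works for costs like cross-entropy that have no closed form.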
6. Data Mining in Practice
Deepen your understanding of these models by working with scikit-learn, the best-known machine learning library in Python.
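As a first hands-on exercise, this sketch trains logistic regression, a decision tree and a random forest. It uses the breast cancer dataset that ships with scikit-learn, so nothing needs to be downloaded. Every estimator shares the same `fit` / `score` interface. The dataset, the 25% test split and the hyperparameters (`max_depth=4`, `n_estimators=200`) are illustrative choices, not recommendations. XGBoost lives in the separate `xgboost` package, whose `XGBClassifier` follows the same scikit-learn-style interface. It is omitted so the sketch only needs scikit-learn.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Built-in binary classification dataset (569 samples, 30 numeric features).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    # Standardizing features helps the logistic-regression solver converge.
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    # Limiting depth is one simple way to curb overfitting in a single tree.
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    # A random forest averages many randomized trees to reduce variance.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same training call for every estimator
    scores[name] = model.score(X_test, y_test)   # accuracy on the held-out test set
    print(f"{name:20s} test accuracy = {scores[name]:.3f}")
```

Swapping one model for another only changes the constructor line. This shared interface is what makes scikit-learn convenient for comparing the algorithms from the previous section.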