Whenever anyone talks about machine learning or data science, the first language that comes to mind is Python. It is not the only language used, of course; we have others, such as R. But Python is usually preferred over any other language.
There are a few good reasons for that. First things first, it's easy to learn and write. If you know English, you can learn Python. It's that easy. Now, do you know what makes Python everyone's favorite? The availability of so many libraries.
Libraries are collections of code that we can reuse. So basically, someone else wrote the code and packaged it as a library. Now, if you want to perform a similar task, you can simply install the library, import it, and use it in your code.
Python has an enormous number of libraries; it has one for almost everything. That is why it has such a huge fan base. Maybe bigger than BTS.
Here I am going to list 10 such libraries that I use daily while working on projects and practicing machine learning, along with the stage of the workflow where each one is used.
This is the first library I learned in Python. Pandas is a data manipulation library, used for data analysis and data preprocessing. Pandas supports three data structures:
1) Series (1D data structure)
2) DataFrame (2D data structure)
3) Panel (3D data structure, removed in recent versions of pandas)
BTW, by looking at the names of these three data structures, can you guess why pandas is named "pandas"? Pandas is mostly used for the data cleaning part of traditional machine learning because it handles structured data very well. Pandas provides capabilities to read and write data from different sources such as CSV, Excel, SQL databases, HDF5, and many more. It offers functions to add, delete, and update feature columns, and we can handle missing values, object data types, and outliers in those columns.
To use pandas, you first have to install it by running the following command in your terminal: pip install pandas
And then import it into your code like this:
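For instance, here is a minimal sketch of importing pandas and doing a bit of cleaning; the column names and values are hypothetical, just for illustration:

```python
import pandas as pd

# Build a small DataFrame in memory; in practice you would use
# pd.read_csv, pd.read_excel, pd.read_sql, etc.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chitra"],
    "score": [85.0, None, 92.0],
})

# Handle a missing value by filling it with the column mean
df["score"] = df["score"].fillna(df["score"].mean())

# Add a new feature column
df["passed"] = df["score"] >= 60

print(df)
```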
NumPy is a Python library for linear algebra. It is the base library on which libraries such as pandas and Keras are built. Using NumPy, we can handle multi-dimensional arrays, called ndarrays in ML parlance. It supports all kinds of matrix operations such as reshaping, resizing, transposing, and matrix multiplication. Think of matrices in Python, think of NumPy.
Learning NumPy had an indirect benefit on my learning journey: it made understanding computer vision code and functions much easier.
PS: Again, try to figure out why it has this name.
BTW, to make use of NumPy, you first have to install it by running the following command in your terminal: pip install numpy
Then you will have to import it into the notebook where you wish to use it. That can be done using the following code:
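As a small sketch, the import plus a couple of the matrix operations mentioned above (the array values are arbitrary):

```python
import numpy as np

# A 2x3 ndarray
a = np.arange(6).reshape(2, 3)

# Transpose, then matrix-multiply
b = a.T       # shape (3, 2)
c = a @ b     # shape (2, 2) matrix product

print(c)
```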
These two are Python libraries for data visualization. When you are performing data analysis, you simply cannot ignore data visualization. Data visualization is the process of exploring features and extracting patterns from them using plots. See, you could do the same by printing a screenful of numbers and strings and staring at it for hours, or by making some meaningful plots out of the same data. If you want to go with the second option, these are the Python libraries for you. Using Matplotlib or Seaborn you can perform different kinds of analyses, such as:
1) Univariate analysis
2) Bivariate analysis
3) Multivariate analysis
These analyses use many plots, such as line charts, bar plots, scatterplots, countplots, factorplots, histograms, distplots, heatmaps, jointplots, and many more.
Seaborn is built on top of Matplotlib, and a plot that requires 3–4 lines of code in Matplotlib can often be made with one line of code in Seaborn. So you can go ahead and learn Seaborn.
To use Seaborn, install it by running the following command in your terminal: pip install seaborn
And then import it into your notebook using the following code
This library is widely known as sklearn. It is your one-stop solution for traditional ML. This library is mostly used for:
1) Preprocessing
2) Model building
3) Model training
4) Model evaluation
Name any classical ML model: linear regression, logistic regression, decision trees, random forests, KNN, k-means. You can code any of these and many more such models using sklearn. So the thing is, you can't ignore sklearn.
To use it, install it with the following command: pip install scikit-learn
And then import it into your notebook using the following code
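A small end-to-end sketch covering the four steps above; the built-in iris dataset and logistic regression are just stand-ins for whatever data and model you actually use:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1) Preprocessing: split and scale
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 2) Model building and 3) model training
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# 4) Model evaluation
acc = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {acc:.2f}")
```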
TensorFlow and PyTorch are both Python packages for deep learning. From preprocessing to model building to model training to model evaluation to model saving, all of these steps can be done using either TensorFlow or PyTorch.
TensorFlow is developed by Google, and PyTorch is developed by Facebook. TensorFlow was actually on the market before PyTorch. Don't worry, it's not important to learn both; you only have to choose one of them.
PyTorch is particularly popular for academic research. It also works well for deep learning applications and gives you more freedom to experiment with your layers, network architecture, or loss functions. On the other hand, TensorFlow has better documentation and more tutorials, so a newcomer will find a huge amount of resources for learning TensorFlow.
With this, you can decide whether to go ahead with PyTorch or TensorFlow.
To install TensorFlow, use the following command: pip install tensorflow
To install PyTorch, use: pip install torch
To use tensorflow in your notebook, you have to use:
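For example, a minimal Keras model sketch; the layer sizes here are hypothetical, purely to show the import in use:

```python
import tensorflow as tf

# A tiny feed-forward classifier: 4 input features, 3 output classes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
print(model.count_params())
```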
And for pytorch you have to use:
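And an equivalent minimal sketch in PyTorch, again with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

# A tiny feed-forward network: 4 input features, 3 output classes
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
)

x = torch.randn(2, 4)   # a batch of 2 samples with 4 features each
out = model(x)
print(out.shape)
```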
This is a not-so-famous Python library, but the functionality it provides is commendable. You can use this library for hyperparameter optimization of your deep learning models. Hyperparameter optimization is the process of finding the optimal values of your model's hyperparameters. During this process, you have to keep track of your model's performance across different combinations of hyperparameters. Wandb offers Sweeps, where you can track your whole hyperparameter tuning process.
To use wandb, you first have to install it. Use the following command for that: pip install wandb
Then you have to import it.
Then you have to login to your account using the following code:
For this, you will have to sign up for wandb, which is completely free of cost.
This is a Python library for NLP-related data preprocessing. The full form is Natural Language Toolkit. Talking about preprocessing, you can use NLTK for tokenization, vectorization, stemming, lemmatization, named entity recognition (NER), and many more such tasks. It's one of the easiest libraries to use.
To use NLTK, you first have to install it by running the following command in your terminal: pip install nltk
And then import it into your notebook using the code:
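For instance, a minimal sketch using NLTK's Porter stemmer (one of the preprocessing tasks mentioned above, chosen here because it needs no extra data downloads); the word list is arbitrary:

```python
import nltk
from nltk.stem import PorterStemmer

# Reduce words to their stems
stemmer = PorterStemmer()
words = ["running", "flies", "easily"]
stems = [stemmer.stem(w) for w in words]
print(stems)
```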
This is a Python web development framework. Now you must be wondering where a web development tool would be used in ML. Flask comes in when you are completely done with training your model and want to move on to model deployment. You first create a web page that takes inputs from the user; those inputs are sent to an ML model hosted in the cloud; then the result is fetched from there and shown on the web page, which is built using Flask.
To use Flask, install it simply with: pip install flask
And use it in your code simply with:
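A minimal sketch of the pattern described above; the `/predict` endpoint and the dummy "model" inside it are hypothetical, standing in for a call to your real trained model:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# A hypothetical prediction endpoint; in a real deployment this
# would pass the inputs to your trained model.
@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()
    # Dummy "model": positive class if the feature sum is non-negative
    prediction = int(sum(features["values"]) >= 0)
    return jsonify({"prediction": prediction})

# During development you would run the server with: app.run(debug=True)
```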
An ML Python library that hasn't gained the fame it deserves. Surprise is a library used for creating recommendation systems. There are two types of recommender systems:
1) Content-based
2) Collaborative filtering
This library handles collaborative filtering, where users' explicit ratings can be used. BTW, an explicit rating is one we give directly, such as an IMDb rating or the rating given to an Amazon product. There is another kind of rating, called implicit ratings, which are inferred from the time someone spends on a product page, or the number of scrolls or clicks on a page, etc.
Anyway, to use Surprise, install it using: pip install scikit-surprise
To use it, simply import it as follows:
This last Python library is a lot different from the other nine. BeautifulSoup is a web scraping tool. It sometimes happens that the kind of dataset you require for your project or product isn't available anywhere, but the data you are interested in is available on different online platforms. In that case, you have to extract the data from such web pages or websites, create a dataset out of it, and then use it in your project.
Now, this extraction can be done either manually or automatically. If you choose the automatic way, you can use BeautifulSoup for that; it is a very easy-to-use library. Install it with the following command: pip install beautifulsoup4
After that, import it into your notebook using:
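A minimal sketch; a static HTML string stands in here for a page you would normally fetch from the web, and the tag names and classes are made up for illustration:

```python
from bs4 import BeautifulSoup

# A static HTML snippet standing in for a scraped page
html = """
<html><body>
  <h1>Product list</h1>
  <ul>
    <li class="product">Laptop</li>
    <li class="product">Phone</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
products = [li.get_text() for li in soup.find_all("li", class_="product")]
print(products)  # ['Laptop', 'Phone']
```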
Now you should get your hands dirty with some code. So go ahead and practice with all the libraries listed here.