If you have decided to learn Python as your programming language, the next question on your mind will be:
“What are the different Python libraries available to perform data analysis?”
There are many libraries available to perform data analysis in Python. Don’t worry; you don’t have to learn all of them. Five Python libraries are enough for most data analysis tasks. I will give a short introduction to each of these libraries and point you to some of the best tutorials for learning them.
So let’s get started.
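NumPy is the foundation on which all higher-level tools for scientific Python are built. Among other things, it provides a fast N-dimensional array object, broadcasting for element-wise operations, and routines for linear algebra, Fourier transforms, and random number generation.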
Although NumPy does not provide high-level data analysis functionality, an understanding of NumPy arrays and array-oriented computing will help you use tools like Pandas much more effectively.
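To make "array-oriented computing" concrete, here is a minimal sketch. The temperature values and variable names are made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

# Hypothetical daily temperatures in Celsius (made-up values for illustration).
celsius = np.array([18.5, 21.0, 19.5, 23.0])

# Array-oriented computing: the arithmetic is applied to every element at once,
# with no explicit Python loop.
fahrenheit = celsius * 9 / 5 + 32

# A boolean mask selects the elements that satisfy a condition.
warm_days = celsius[celsius > 20]

# Reductions such as mean() summarize the whole array.
mean_celsius = celsius.mean()

print(fahrenheit)
print(warm_days)
print(mean_celsius)
```

Pandas columns support the same element-wise arithmetic and boolean masking, which is why this way of thinking carries over directly.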
The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. SciPy is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, with modules for optimization, linear algebra, numerical integration, and other common tasks in data science.
I couldn’t find a better tutorial than the one on Scipy.org; it is the best resource I know of for learning SciPy.
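As a small taste of the integration and optimization routines mentioned above, the sketch below uses `scipy.integrate.quad` and `scipy.optimize.minimize_scalar` on a simple parabola. The function `parabola` is a hypothetical example chosen because its integral and minimum are easy to check by hand.

```python
from scipy import integrate, optimize


def parabola(x):
    """Hypothetical example function with its minimum at x = 2."""
    return (x - 2.0) ** 2


# Numerical integration over [0, 4]; quad returns the estimate and an error bound.
# The exact value is 16/3.
area, abs_error = integrate.quad(parabola, 0.0, 4.0)

# One-dimensional minimization; the result object stores the minimizer in .x.
result = optimize.minimize_scalar(parabola)

print(area, abs_error)
print(result.x)
```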
Pandas contains high-level data structures and tools designed to make data analysis fast and easy. It is built on top of NumPy, which makes it easy to use in NumPy-centric applications.
In my experience, Pandas is the best tool for data munging, that is, cleaning, reshaping, and transforming raw data.
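Here is a minimal sketch of what data munging with Pandas can look like. The table, its column names, and its values are invented for illustration; the steps shown (normalizing text, filling missing values, aggregating by group) are only one possible cleaning workflow.

```python
import pandas as pd

# Hypothetical raw data with inconsistent city names and a missing value.
raw = pd.DataFrame({
    "city": ["Paris", "paris ", "Lyon", "Lyon", "Nice"],
    "sales": [100.0, 150.0, None, 80.0, 60.0],
})

cleaned = raw.copy()

# Normalize the text column: strip whitespace and use consistent capitalization.
cleaned["city"] = cleaned["city"].str.strip().str.title()

# Treat the missing sales figure as zero (an assumption made for this sketch).
cleaned["sales"] = cleaned["sales"].fillna(0.0)

# Aggregate: total sales per city, returned as a Series indexed by city.
totals = cleaned.groupby("city")["sales"].sum()

print(totals)
```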
Matplotlib is a Python module for visualization. It allows you to easily make line graphs, pie charts, histograms, and other professional-grade figures, and you can customize every aspect of a figure. When used within IPython, Matplotlib has interactive features like zooming and panning. It supports different GUI back ends on all operating systems and can also export graphics to common vector and raster formats: PDF, SVG, JPG, PNG, BMP, GIF, etc.
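The sketch below draws a line graph and a histogram side by side, customizes a few elements, and exports the figure as PNG and SVG. The data is made up, and the non-interactive `Agg` back end is selected only so the example runs without a display; in an interactive session you would normally skip that line and call `plt.show()`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive back end, assumed here so no display is needed
import matplotlib.pyplot as plt

# Made-up data for illustration.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph with customized color, marker, labels, and legend.
ax_line.plot(x, y, color="tab:blue", marker="o", label="y = x^2")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.set_title("Line graph")
ax_line.legend()

# Histogram of the same values.
ax_hist.hist(y, bins=4, color="tab:orange")
ax_hist.set_title("Histogram of y")

fig.tight_layout()

# Export to a raster format (PNG) and a vector format (SVG), in memory.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
plt.close(fig)
```

Passing a file name such as `"figure.pdf"` to `savefig` instead of a buffer writes the figure to disk, with the format inferred from the extension.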
Scikit-learn is a Python module for machine learning built on top of SciPy. It provides a set of common machine learning algorithms through a consistent interface, which helps you quickly apply popular algorithms to your dataset. Have a look at the list of algorithms available in scikit-learn, and you will quickly see that it includes tools for many standard machine learning tasks (such as clustering, classification, and regression).
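The "consistent interface" is easiest to see in code. In the sketch below, two different classifiers are trained and evaluated with exactly the same `fit` and `score` calls. The choice of the bundled iris dataset, the two models, and the parameter values are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two unrelated algorithms behind the same estimator interface.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # same call for every estimator
    scores[name] = model.score(X_test, y_test)    # mean accuracy on held-out data
    print(name, scores[name])
```

Swapping in another classifier usually means changing only the line that constructs it.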
There are also other libraries, such as NLTK (Natural Language Toolkit) for natural language processing, Scrapy for web scraping, Pattern for web mining, and Theano for deep learning. But if you are getting started in Python, I would recommend that you first get familiar with these five libraries. The tutorials I have mentioned are beginner friendly; before going through them, make sure you are familiar with the basics of Python programming.