5 Best Python Libraries For Data Science

If you have decided to learn Python as your programming language, the next question on your mind will probably be:

“What are the different Python libraries available to perform data analysis?”

There are many libraries available to perform data analysis in Python. Don’t worry; you don’t have to learn all of them. Five libraries are enough for most data analysis tasks. I will give a short introduction to each of these libraries and point you to some of the best tutorials for learning them.

So let’s get started.

NumPy

NumPy is the foundation on which all higher-level tools for scientific Python are built. Here are some of the functionalities it provides:

  1. N-dimensional array: a fast and memory-efficient multidimensional array providing vectorized arithmetic operations.
  2. Standard mathematical operations on entire arrays of data without writing loops.
  3. Easy transfer of data to external libraries written in a low-level language (such as C or C++), and an easy way for those libraries to return data to Python as NumPy arrays.
  4. Linear algebra, Fourier transforms and random number generation.

Although NumPy does not provide high-level data analysis functionality, an understanding of NumPy arrays and array-oriented computing will help you use tools like Pandas much more effectively.
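As a minimal sketch of the vectorized style described above (the price and quantity values are made up for illustration), this multiplies two arrays element by element and inverts a small matrix without any explicit loops:

```python
import numpy as np

# Hypothetical sample data: unit prices and quantities for four items.
prices = np.array([2.5, 4.0, 1.25, 10.0])
quantities = np.array([4, 1, 8, 2])

# Vectorized arithmetic: element-wise multiplication, no Python loop.
totals = prices * quantities

# A small 2-D array and a basic linear algebra routine.
matrix = np.array([[2.0, 0.0], [0.0, 4.0]])
inverse = np.linalg.inv(matrix)

print(totals)
print(totals.sum())
print(inverse)
```

The same `prices * quantities` expression works unchanged whether the arrays hold four values or four million, which is the main reason to prefer it over a hand-written loop.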

Tutorials

  1. Scipy.org provides a brief description of the NumPy package.
  2. Here is an amazing tutorial that focuses entirely on the usability of NumPy.

SciPy

The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. SciPy has modules for optimization, linear algebra, integration and other common tasks in data science.
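To give a feel for those routines, here is a small sketch using `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The two functions being integrated and minimized are toy examples I picked because their exact answers are known:

```python
import math
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(math.sin, 0.0, math.pi)

# Minimize a simple quadratic whose minimum sits at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(area)
print(result.x)
```

`quad` returns both the estimate and an upper bound on its absolute error, so you can check how much to trust the number.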

Tutorial

I couldn’t find any good tutorial other than the one on Scipy.org. It is the best tutorial for learning SciPy.

Pandas

Pandas contains high-level data structures and tools designed to make data analysis fast and easy. It is built on top of NumPy, which makes it easy to use in NumPy-centric applications. Its key features include:

  1. Data structures with labeled axes, supporting automatic or explicit data alignment. This prevents common errors resulting from misaligned data and working with differently-indexed data coming from different sources.
  2. Easier handling of missing data.
  3. Merges and other relational operations found in popular databases (SQL-based, for example).

Pandas is the best tool for doing data munging.
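The three features listed above can be sketched in a few lines. The fruit counts and the customer and order tables are invented for illustration:

```python
import pandas as pd

# Two Series with different, partly overlapping labels (hypothetical data).
q1 = pd.Series({"apples": 10, "pears": 4})
q2 = pd.Series({"apples": 7, "plums": 3})

# Addition aligns on labels; a label missing from either side becomes NaN.
total = q1 + q2

# Handle the missing values explicitly, here by treating them as zero.
total_filled = q1.add(q2, fill_value=0)

# SQL-style inner join of two tables on a shared key column.
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"id": [1, 1, 3], "amount": [20.0, 5.0, 12.5]})
joined = customers.merge(orders, on="id", how="inner")

print(total)
print(total_filled)
print(joined)
```

Note that `q1 + q2` never silently adds pears to plums: values are matched by label, not by position.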

Tutorials

  1. Quick intro to pandas
  2. Alfred Essa has a series of videos on Pandas. These videos should give you a good idea of basic concepts.
  3. Also, don’t miss this tutorial by Shane Neeley. The video gives you a comprehensive intro to NumPy, SciPy and Matplotlib.

Matplotlib

Matplotlib is a Python module for visualization. It allows you to easily make line graphs, pie charts, histograms and other professional-grade figures. Using Matplotlib you can customize every aspect of a figure. When used within IPython, Matplotlib has interactive features like zooming and panning. It supports different GUI back ends on all operating systems, and can also export graphics to common vector and raster graphics formats: PDF, SVG, JPG, PNG, BMP, GIF, etc.
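Here is a small sketch of a line graph and a histogram side by side, exported to PNG. The data and the output file name `example_figure.png` are placeholders of my own, and the `Agg` back end is selected only so the script also runs on a machine without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end; drop this line for a GUI window
import matplotlib.pyplot as plt

# Made-up sample data.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# One figure with two side-by-side axes.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

left.plot(x, y, marker="o")
left.set_title("Line graph")
left.set_xlabel("x")
left.set_ylabel("x squared")

right.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
right.set_title("Histogram")

fig.tight_layout()
fig.savefig("example_figure.png")  # the extension selects the output format
```

Changing the file name to end in `.pdf` or `.svg` should produce vector output instead.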

Tutorials

  1. ShowMeDo has a good tutorial on Matplotlib.
  2. I also recommend the cookbook from Packt Publishing. This is an amazing book for someone getting started in Matplotlib.

Scikit-learn

Scikit-learn is a Python module for machine learning built on top of SciPy. It provides a set of common machine learning algorithms to users through a consistent interface. Scikit-learn helps you quickly implement popular algorithms on your dataset. Have a look at the list of algorithms available in scikit-learn, and you will quickly see that it includes tools for many standard machine learning tasks (such as clustering, classification and regression).
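The "consistent interface" is easiest to see in code. In this sketch, two quite different classifiers are trained and scored with exactly the same `fit` and `score` calls. The choice of the bundled iris dataset, the two models and their parameters is mine, purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two unrelated algorithms, driven through the same methods.
models = [KNeighborsClassifier(n_neighbors=5), LogisticRegression(max_iter=1000)]
scores = {}
for model in models:
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)

print(scores)
```

Swapping in another classifier usually means changing only the line that constructs it.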

Tutorials

  1. Introduction to Scikit-learn
  2. Tutorials from Scikit-learn.org

Conclusion

There are also other libraries, such as NLTK (Natural Language Toolkit), Scrapy for web scraping, Pattern for web mining and Theano for deep learning. But if you are getting started in Python, I would recommend that you first get familiar with these five libraries. The tutorials I have mentioned are beginner friendly; before going through them, make sure you are familiar with the basics of Python programming.

Send a POST request with Python using urllib2

Source code:

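# Python 2 only: urllib2 does not exist in Python 3
import urllib
import urllib2

# Send all HTTP traffic through a local proxy listening on 127.0.0.1:8080
# (remove this block to connect directly)
urllib2.install_opener(
    urllib2.build_opener(
        urllib2.ProxyHandler({'http': '127.0.0.1:8080'})
    )
)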

headers = {
    #'Host': 'host.com',
    #'Connection': 'keep-alive',
    #'Content-Length': '325', 
    #'Origin': 'https://digitalvita.pitt.edu',
    #'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1',
    'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    #'Accept': 'text/javascript, text/html, application/xml, text/xml, */*',
    #'Referer': 'https://digitalvita.pitt.edu/index.php',
    #'Accept-Encoding': 'gzip,deflate,sdch',
    #'Accept-Language': 'en-US,en;q=0.8',
    #'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    #'Cookie': 'PHPSESSID=lvetilatpgs9okgrntk1nvn595'
}

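# URL-encode the form fields to build the POST body
data = urllib.urlencode({
    "username": "admin",
    "password": "admin"})
# Passing data makes urllib2 issue a POST instead of a GET
req = urllib2.Request('http://abc.def/path', data, headers)
response = urllib2.urlopen(req)
print response.read()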
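The script above is Python 2. In Python 3, `urllib2` was split into `urllib.request` and `urllib.parse`, so a rough equivalent might look like the sketch below. The proxy address, URL and credentials are the same placeholders as above; the `send` helper is a name I made up, and it is only defined here, not called, so the snippet runs without a network connection:

```python
import urllib.parse
import urllib.request

# Route HTTP traffic through a local proxy, as in the Python 2 script.
proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})
opener = urllib.request.build_opener(proxy)

headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}

# In Python 3 the POST body must be bytes, so encode the form string.
data = urllib.parse.urlencode(
    {"username": "admin", "password": "admin"}
).encode("utf-8")

# Supplying data turns the request into a POST.
request = urllib.request.Request("http://abc.def/path", data=data, headers=headers)


def send(req):
    """Hypothetical helper: send the request and return the decoded body."""
    with opener.open(req) as response:
        return response.read().decode("utf-8", errors="replace")
```

Once the proxy and the target host are reachable, `print(send(request))` should do what the last two lines of the Python 2 version did.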



Crack the Code

We have a crackthecode.py script, and the goal is to get the code that is the key to decrypting the message.

An easy solution is to brute-force the key :))

Code:

import hashlib
import itertools
import sys

# Expected first two hex characters of each digest of the correct code.
EXPECTED_PREFIXES = (
    ('sha1', 'a6'),
    ('sha224', '7b'),
    ('sha256', '57'),
    ('sha384', 'db'),
)

def validatecode(code):
    # hashlib needs bytes; encoding keeps this working on Python 2 and 3.
    raw = code.encode('ascii')
    return all(hashlib.new(name, raw).hexdigest()[:2] == prefix
               for name, prefix in EXPECTED_PREFIXES)

# Try every 7-digit code from 0000000 to 9999999.
for digits in itertools.product('0123456789', repeat=7):
    code = ''.join(digits)
    if validatecode(code):
        print('code: ' + code)
        sys.exit()

After about 5 seconds, the script returns the code: 3495745
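Why is brute force practical here? There are only 10^7 seven-digit codes, and each check compares just two hex characters (one byte) of a digest, so a wrong code passes all four checks with probability of roughly 1/256^4. The sketch below shows the same search on a smaller space. The names `prefixes` and `brute_force` and the 4-digit secret are my own, chosen so the demo finishes instantly and does not depend on the challenge values:

```python
import hashlib
import itertools

ALGORITHMS = ("sha1", "sha224", "sha256", "sha384")


def prefixes(code, length=2):
    """Return the first `length` hex characters of each digest of `code`."""
    raw = code.encode("ascii")
    return tuple(hashlib.new(name, raw).hexdigest()[:length] for name in ALGORITHMS)


def brute_force(target, digits=7):
    """Yield every all-digit code of the given length whose prefixes equal `target`."""
    for combo in itertools.product("0123456789", repeat=digits):
        code = "".join(combo)
        if prefixes(code) == target:
            yield code


# Demo on a small search space: derive the target from a made-up
# 4-digit code, then recover it by exhaustive search.
secret = "0427"  # hypothetical, not the challenge code
found = list(brute_force(prefixes(secret), digits=4))
print(found)

# Rough estimate of accidental matches in the real 7-digit search space:
# each 2-character prefix matches by chance with probability 1/256.
expected_false_matches = 10 ** 7 / 256.0 ** 4
print(expected_false_matches)
```

With far fewer than one accidental match expected across the whole 7-digit space, the first code that passes all four checks is very likely the intended one.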

Then, run crackthecode.py with this code:

# python crackthecode.py -p 3495745

result:

Key: 44859c3554ee157264297d62d8aeef64685c57549051836001e14b8821ab6a0f