Difference between revisions of "Python for Data Science"

From Sinfronteras
  
 
<br />
 
<br />
 
==[[Data Visualization with Python]]==
==Pandas Built-in Data Visualization==
 
In this lecture we will learn about pandas' built-in capabilities for data visualization. They are built on top of '''<code>matplotlib</code>''' but baked into pandas for easier use.
 
 
 
This way of plotting is a lot easier to use than full-on matplotlib: it balances ease of use with control over the figure. Many of the plot calls also accept the keyword arguments of the underlying matplotlib <code>plt</code> call.<br />
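As a minimal sketch of how pandas forwards keyword arguments to matplotlib, the example below draws a histogram through pandas and then keeps working with the returned matplotlib Axes. The DataFrame <code>demo</code> is a hypothetical stand-in for Df1.csv, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical stand-in for Df1.csv: four random numeric columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("ABCD"))

# bins controls the histogram itself; alpha and edgecolor are ordinary
# matplotlib keyword arguments that pandas passes along
ax = demo["A"].plot.hist(bins=20, alpha=0.6, edgecolor="black")

# The return value is a regular matplotlib Axes, so the usual setters work
ax.set_xlabel("A")
ax.set_title("Histogram drawn through pandas")
```

Because the call returns an Axes object, anything shown later for matplotlib (labels, limits, legends) can be applied to a pandas plot as well.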
 
 
 
'''The data we'll use in this part:'''
 
 
 
*[[:File:Df1.csv]]
 
*[[:File:Df2.csv]]
 
*[[:File:Df3.csv]]
 
 
 
<br />
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import numpy as np
 
import pandas as pd
 
%matplotlib inline
 
 
 
df1 = pd.read_csv('Df1.csv',index_col=0)
 
df2 = pd.read_csv('Df2.csv')
 
</syntaxhighlight>
 
|-
 
!
 
!Method/Operator
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="5" style="vertical-align:top;" |<h4 style="text-align:left">[https://matplotlib.org/gallery.html#style_sheets Style Sheets]</h4>
 
| rowspan="5" style="vertical-align:top;"|'''<code>plt.style.use(<nowiki>''</nowiki>)</code>'''
 
|Matplotlib has style sheets you can use to make your plots look a little nicer. These style sheets include <code>bmh</code>, <code>fivethirtyeight</code>, <code>ggplot</code> and more. They create a set of style rules that your plots follow. I recommend using them: they give all your plots the same look and make them feel more professional. You can even create your own if you want your company's plots to all have the same look (it is a bit tedious to create one, though).
 
Here is how to use them.
 
 
 
'''Before plt.style.use() your plots look like this:'''<syntaxhighlight lang="python3">
 
df1['A'].hist()
 
</syntaxhighlight><br />
 
 
 
|[[File:PandasBuilt-inData visualization1.png|center]]
 
|-
 
|'''Call the style:'''<syntaxhighlight lang="python3">
 
import matplotlib.pyplot as plt
 
plt.style.use('ggplot')
 
</syntaxhighlight>Now your plots look like this:<syntaxhighlight lang="python3">
 
df1['A'].hist()
 
</syntaxhighlight><br />
 
|[[File:PandasBuilt-inData visualization1.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
plt.style.use('bmh')
 
df1['A'].hist()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization3.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
plt.style.use('dark_background')
 
df1['A'].hist()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization4.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
plt.style.use('fivethirtyeight')
 
df1['A'].hist()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization5.png|center]]
 
|-
 
! rowspan="13" style="vertical-align:top;" |<h4 style="text-align:left">Plot Types</h4>
 
| style="vertical-align:top;" |
 
|There are several plot types built into pandas, most of them statistical plots by nature:
 
 
 
*<code>df.plot.area</code>, <code>df.plot.barh</code>, <code>df.plot.density</code>, <code>df.plot.hist</code>, <code>df.plot.line</code>, <code>df.plot.scatter</code>, <code>df.plot.bar</code>, <code>df.plot.box</code>, <code>df.plot.hexbin</code>, <code>df.plot.kde</code>, <code>df.plot.pie</code>
 
|
 
|-
 
| style="vertical-align:top;" |<h5 style="text-align:left">Area</h5>
 
 
 
<code>df.plot.area</code>
 
|<syntaxhighlight lang="python3">
 
df2.plot.area(alpha=0.4)
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization6.png|center]]
 
|-
 
| rowspan="2" style="vertical-align:top;" |<h5 style="text-align:left">Barplots</h5>
 
<code>df.plot.bar()</code>
 
|<syntaxhighlight lang="python3">
 
df2.plot.bar()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization7.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
df2.plot.bar(stacked=True)
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization8.png|center]]
 
|-
 
| style="vertical-align:top;" |<h5 style="text-align:left">Histograms</h5>
 
<code>df.plot.hist()</code>
 
|<syntaxhighlight lang="python3">
 
df1['A'].plot.hist(bins=50)
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization9.png|center]]
 
|-
 
| style="vertical-align:top;" |<h5 style="text-align:left">Line Plots</h5>
 
<code>df.plot.line()</code>
 
| colspan="2" |<syntaxhighlight lang="python3">
 
df1.plot.line(y='B',figsize=(12,3),lw=1) # the index is used for x by default
 
</syntaxhighlight>[[File:PandasBuilt-inData visualization10.png|center]]
 
|-
 
| rowspan="3" style="vertical-align:top;" |<h5 style="text-align:left">Scatter Plots</h5>
 
<code>df.plot.scatter()</code>
 
|<syntaxhighlight lang="python3">
 
df1.plot.scatter(x='A',y='B')
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization11.png|center]]
 
|-
 
|You can use <code>c</code> to color the points based on another column's values, and <code>cmap</code> to choose the colormap. For all the colormaps, check out: http://matplotlib.org/users/colormaps.html<br /><syntaxhighlight lang="python3">
 
df1.plot.scatter(x='A',y='B',c='C',cmap='coolwarm')
 
</syntaxhighlight><br />
 
|[[File:PandasBuilt-inData visualization12.png|center]]
 
|-
 
|Or use <code>s</code> to set the marker size based on another column. Here <code>s</code> is given as an array of values (the column scaled by 200), not just the name of a column:<br /><syntaxhighlight lang="python3">
 
df1.plot.scatter(x='A',y='B',s=df1['C']*200)
 
</syntaxhighlight><br />
 
|[[File:PandasBuilt-inData visualization13.png|center]]
 
|-
 
| style="vertical-align:top;" |<h5 style="text-align:left">BoxPlots</h5>
 
<code>df.plot.box()</code>
 
|<syntaxhighlight lang="python3">
 
df2.plot.box() # Can also pass a by= argument for groupby
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization14.png|center]]
 
|-
 
| style="vertical-align:top;" |<h5 style="text-align:left">Hexagonal Bin Plot</h5>
 
<code>df.plot.hexbin()</code>
 
|Useful for Bivariate Data, alternative to scatterplot:<syntaxhighlight lang="python3">
 
df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
 
df.plot.hexbin(x='a',y='b',gridsize=25,cmap='Oranges')
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization15.png|center]]
 
|-
 
| rowspan="2" style="vertical-align:top;" |<h5 style="text-align:left">Kernel Density Estimation plot (KDE)</h5>
 
<code>df2.plot.kde()</code>
 
|<syntaxhighlight lang="python3">
 
df2['a'].plot.kde()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization16.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
df2.plot.density()
 
</syntaxhighlight>
 
|[[File:PandasBuilt-inData visualization17.png|center]]
 
|}
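The style sheets used in the table above can also be listed and applied temporarily. The following is a small sketch, assuming a headless run (Agg backend) and made-up sample data; <code>plt.style.available</code> and <code>plt.style.context</code> are part of matplotlib's style module.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Names accepted by plt.style.use() in the installed matplotlib version
print(sorted(plt.style.available))

values = np.random.default_rng(1).normal(size=500)  # hypothetical sample data

# Apply 'ggplot' only inside the with-block; the global style stays untouched
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.hist(values, bins=30)
    ax.set_title("ggplot style, scoped to this figure")
```

Using the context manager avoids the situation where a <code>plt.style.use()</code> call earlier in a notebook silently changes every later plot.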
 
 
 
 
 
<br />
 
 
 
==Data Visualization with Matplotlib==
 
Matplotlib is the "grandfather" library of data visualization with Python. It was created by John Hunter to replicate the plotting capabilities of MATLAB (another programming language) in Python. So if you happen to be familiar with MATLAB, matplotlib will feel natural to you.
 
 
 
It is an excellent 2D and 3D graphics library for generating scientific figures.
 
 
 
 
'''Some of the major Pros of Matplotlib are:'''
 
 
 
*Generally easy to get started for simple plots
 
*Support for custom labels and texts
 
*Great control of every element in a figure
 
*High-quality output in many formats
 
*Very customizable in general
 
 
 
 
 
'''References:'''
 
 
 
*The project web page for matplotlib: http://www.matplotlib.org
 
*The source code for matplotlib: https://github.com/matplotlib/matplotlib
 
*<span style="background:#D8BFD8">A large gallery showcasing various types of plots matplotlib can create. Highly recommended!:</span> http://matplotlib.org/gallery.html
 
*A good matplotlib tutorial: http://www.loria.fr/~rougier/teaching/matplotlib
 
 
 
 
 
Most likely you'll be passing NumPy arrays or pandas columns (which essentially behave like arrays) to matplotlib. However, you can also use plain lists.
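To illustrate that lists, NumPy arrays and pandas columns are interchangeable as plot input, here is a short sketch. The variable names and values are made up for illustration, and the Agg backend is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

xs = [0, 1, 2, 3]
as_list = [0, 1, 4, 9]
as_array = np.array(as_list)
as_series = pd.Series(as_list)

fig, ax = plt.subplots()
# ax.plot returns a list of Line2D objects, hence the tuple unpacking
line_list, = ax.plot(xs, as_list, label="list")
line_array, = ax.plot(xs, as_array, "--", label="array")
line_series, = ax.plot(xs, as_series, ":", label="Series")
ax.legend()
```

All three lines hold the same y data, so they are drawn on top of each other and differ only in line style.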
 
 
 
 
 
Matplotlib allows you to create reproducible figures programmatically. Let's learn how to use it! Before continuing this lecture, I encourage you just to explore the official Matplotlib web page: http://matplotlib.org/
 
 
 
 
 
<br />
 
===Installation===
 
<syntaxhighlight lang="bash">
 
conda install matplotlib
 
</syntaxhighlight>
 
 
 
Or without conda:
 
<syntaxhighlight lang="bash">
 
pip install matplotlib
 
</syntaxhighlight>
 
 
 
 
'''Importing:'''
 
<syntaxhighlight lang="python">
 
import matplotlib.pyplot as plt
 
</syntaxhighlight>
 
 
 
 
 
'''You'll also need to use this line to see plots in the notebook:'''
 
<syntaxhighlight lang="python">
 
%matplotlib inline
 
</syntaxhighlight>
 
That line is only for Jupyter notebooks. If you are using another editor, call '''<code>plt.show()</code>''' at the end of all your plotting commands to have the figure pop up in another window.
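Outside a notebook, a plain script could look like the sketch below. It assumes no display is available, so it selects the non-interactive Agg backend and writes the figure to an in-memory buffer instead of opening a window; with an interactive backend you would call <code>plt.show()</code> at the marked spot.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit to get a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y, "r")
ax.set_xlabel("x")
ax.set_ylabel("y")

# With an interactive backend, plt.show() would open the window here.
# Under Agg we write the figure out instead; an in-memory buffer keeps the
# sketch free of file-system side effects (a filename works the same way).
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```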
 
 
 
<br />
 
{| class="wikitable"
 
|-
 
| colspan="4" |Array example:<syntaxhighlight lang="python3">
 
import numpy as np
 
x = np.linspace(0, 5, 11)
 
y = x ** 2
 
 
 
x
 
# Output:
 
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
 
 
 
y
 
# Output:
 
array([ 0.  ,  0.25,  1.  ,  2.25,  4.  ,  6.25,  9.  , 12.25, 16.  ,
 
      20.25, 25.  ])
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h3 style="text-align:left">Basic example</h3>
 
|
 
|<syntaxhighlight lang="python3">
 
plt.plot(x, y, 'r') # 'r' is the color red
 
plt.xlabel('X Axis Title Here')
 
plt.ylabel('Y Axis Title Here')
 
plt.title('String Title Here')
 
plt.show()
 
</syntaxhighlight>
 
|[[File:Matplotlib1.png|400px|thumb]]
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Creating Multiplots on Same Canvas</h4>
 
|<syntaxhighlight lang="python3">
 
# plt.subplot(nrows, ncols, plot_number)
 
plt.subplot(1,2,1)
 
plt.plot(x, y, 'r--') # More on color options later
 
plt.subplot(1,2,2)
 
plt.plot(y, x, 'g*-');
 
</syntaxhighlight>
 
|[[File:Matplotlib2.png|400px|thumb]]
 
|-
 
! rowspan="17" style="vertical-align:top;" |<h3 style="text-align:left; vertical-align: text-top;">Matplotlib Object Oriented Method</h3>
 
|
 
|Now that we've seen the basics, let's break it all down with a more formal introduction to Matplotlib's Object Oriented API. The main idea is to instantiate figure objects and then call methods or attributes on those objects. This approach is nicer when dealing with a canvas that has multiple plots on it.
 
 
 
To begin we create a figure instance. Then we can add axes to that figure:<syntaxhighlight lang="python3">
 
# Create Figure (empty canvas)
 
fig = plt.figure()
 
 
 
# Add set of axes to figure
 
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # left, bottom, width, height (range 0 to 1)
 
 
 
# Plot on that set of axes
 
axes.plot(x, y, 'b')
 
axes.set_xlabel('Set X Label') # Notice the use of set_ to begin methods
 
axes.set_ylabel('Set y Label')
 
axes.set_title('Set Title')
 
</syntaxhighlight>
 
|[[File:Matplotlib3.png|thumb|372x372px]]
 
|-
 
|
 
|The code is a little more complicated, but the advantage is that we now have full control of where the plot axes are placed, and we can easily add more than one set of axes to the figure:<syntaxhighlight lang="python3">
 
# Creates blank canvas
 
fig = plt.figure()
 
 
 
axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
 
axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes
 
 
 
# Larger Figure Axes 1
 
axes1.plot(x, y, 'b')
 
axes1.set_xlabel('X_label_axes1')

axes1.set_ylabel('Y_label_axes1')

axes1.set_title('Axes 1 Title')



# Inset Figure Axes 2
 
axes2.plot(y, x, 'r')
 
axes2.set_xlabel('X_label_axes2')
 
axes2.set_ylabel('Y_label_axes2')
 
axes2.set_title('Axes 2 Title');
 
</syntaxhighlight><br />
 
|[[File:Matplotlib4.png|thumb|372x372px]]
 
|-
 
| rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left"><code>subplots()</code></h4>
 
|'''The plt.subplots() function acts as a more automatic axis manager:'''<syntaxhighlight lang="python3">
 
# Use similar to plt.figure() except use tuple unpacking to grab fig and axes
 
fig, axes = plt.subplots()
 
 
 
# Now use the axes object to add stuff to plot
 
axes.plot(x, y, 'r')
 
axes.set_xlabel('x')
 
axes.set_ylabel('y')
 
axes.set_title('title');
 
</syntaxhighlight><br />
 
|[[File:Matplotlib5.png|thumb|372x372px]]
 
|-
 
|'''Then you can specify the number of rows and columns when calling subplots():'''<syntaxhighlight lang="python3">
 
# Empty canvas of 1 by 2 subplots
 
fig, axes = plt.subplots(nrows=1, ncols=2)
 
</syntaxhighlight><br />
 
|[[File:Matplotlib6.png|thumb|372x372px]]
 
|-
 
|'''Axes is an array of axes to plot on:'''<syntaxhighlight lang="python3">
 
axes
 
# Output:
 
array([<matplotlib.axes._subplots.AxesSubplot object at 0x111f0f8d0>,
 
      <matplotlib.axes._subplots.AxesSubplot object at 0x1121f5588>], dtype=object)
 
</syntaxhighlight>'''We can iterate through this array:'''<syntaxhighlight lang="python3">
 
for ax in axes:
 
    ax.plot(x, y, 'b')
 
    ax.set_xlabel('x')
 
    ax.set_ylabel('y')
 
    ax.set_title('title')
 
 
 
# Display the figure object   
 
fig
 
</syntaxhighlight><br />
 
|[[File:Matplotlib7.png|thumb|372x372px]]
 
|-
 
|A common issue with matplotlib is overlapping subplots or figures. We can use the '''fig.tight_layout()''' or '''plt.tight_layout()''' method, which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content:<syntaxhighlight lang="python3">
 
fig, axes = plt.subplots(nrows=1, ncols=2)
 
 
 
for ax in axes:
 
    ax.plot(x, y, 'g')
 
    ax.set_xlabel('x')
 
    ax.set_ylabel('y')
 
    ax.set_title('title')
 
 
 
plt.tight_layout()

fig
 
</syntaxhighlight><br />
 
|[[File:Matplotlib8.png|thumb|372x372px]]
 
|-
 
| rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Figure size, aspect ratio and DPI</h4>
 
|Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created. You can use the <code>figsize</code> and <code>dpi</code> keyword arguments.
 
 
 
*<code>figsize</code> is a tuple of the width and height of the figure in inches
 
*<code>dpi</code> is the dots-per-inch (pixel per inch).
 
 
 
 
 
 
 
For example: <syntaxhighlight lang="python3">
 
fig = plt.figure(figsize=(8,4), dpi=100)
 
# Output:
 
<Figure size 800x400 with 0 Axes>
 
</syntaxhighlight><br />
 
|
 
|-
 
|The same arguments can also be passed to layout managers, such as the <code>subplots</code> function:<syntaxhighlight lang="python3">
 
fig, axes = plt.subplots(figsize=(12,3))
 
 
 
axes.plot(x, y, 'r')
 
axes.set_xlabel('x')
 
axes.set_ylabel('y')
 
axes.set_title('title');
 
</syntaxhighlight><br />
 
|[[File:Matplotlib9.png|thumb|371x371px]]
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Saving figures</h4>
 
|Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF.
 
 
 
 
 
 
 
To save a figure to a file we can use the <code>savefig</code> method in the <code>Figure</code> class:<syntaxhighlight lang="python3">
 
fig.savefig("filename.png")
 
</syntaxhighlight>
 
 
 
 
 
 
 
Here we can also optionally specify the DPI and choose between different output formats:<syntaxhighlight lang="python3">
 
fig.savefig("filename.png", dpi=200)
 
</syntaxhighlight><br />
 
|
 
|-
 
| rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">Legends, labels and titles</h4>
 
|'''Figure titles'''
 
A title can be added to each axis instance in a figure. To set the title, use the <code>set_title</code> method in the axes instance:<syntaxhighlight lang="python3">
 
ax.set_title("title");
 
</syntaxhighlight><br />
 
|
 
|-
 
|'''Axis labels'''
 
Similarly, with the methods <code>set_xlabel</code> and <code>set_ylabel</code>, we can set the labels of the X and Y axes:<syntaxhighlight lang="python3">
 
ax.set_xlabel("x")
 
ax.set_ylabel("y");
 
</syntaxhighlight><br />
 
|
 
|-
 
|'''Legends'''
 
You can use the '''label="label text"''' keyword argument when plots or other objects are added to the figure, and then call the '''legend''' method without arguments to add the legend to the figure:<syntaxhighlight lang="python3">
 
fig = plt.figure()
 
 
 
ax = fig.add_axes([0,0,1,1])
 
 
 
ax.plot(x, x**2, label="x**2")
 
ax.plot(x, x**3, label="x**3")
 
ax.legend()
 
</syntaxhighlight><br />
 
|[[File:Matplotlib10.png|thumb|371x371px]]Notice how our legend overlaps some of the actual plot!
 
|-
 
|The '''legend''' function takes an optional keyword argument '''loc''' that can be used to specify where in the figure the legend is to be drawn. The allowed values of '''loc''' are numerical codes for the various places the legend can be drawn. See the documentation page for details. Some of the most common '''loc''' values are:<syntaxhighlight lang="python3">
 
# Lots of options....
 
 
 
ax.legend(loc=1) # upper right corner
 
ax.legend(loc=2) # upper left corner
 
ax.legend(loc=3) # lower left corner
 
ax.legend(loc=4) # lower right corner
 
 
 
# .. many more options are available
 
 
 
# Most common to choose
 
ax.legend(loc=0) # let matplotlib decide the optimal location
 
fig
 
</syntaxhighlight><br />
 
|[[File:Matplotlib11.png|thumb|371x371px]]<br />
 
|-
 
| rowspan="3" style="vertical-align:top;"|<h4 style="text-align:left">Setting colors, linewidths, linetypes</h4>
 
|'''Colors with MATLAB-like syntax''':

We can define the colors of lines and other graphical elements in a number of ways. First of all, we can use the MATLAB-like syntax where <code>'b'</code> means blue, <code>'g'</code> means green, etc. The MATLAB-style syntax for selecting line styles is also supported: for example, 'b.-' means a blue line with dots:<syntaxhighlight lang="python3">
 
# MATLAB style line color and style
 
fig, ax = plt.subplots()
 
ax.plot(x, x**2, 'b.-') # blue line with dots
 
ax.plot(x, x**3, 'g--') # green dashed line
 
</syntaxhighlight><br />
 
|[[File:Matplotlib12.png|thumb|371x371px]]<br />
 
|-
 
|'''Colors with the color= parameter''':
 
We can also define colors by their names or RGB hex codes and optionally provide an alpha value using the <code>color</code> and <code>alpha</code> keyword arguments. Alpha indicates opacity.<syntaxhighlight lang="python3">
 
fig, ax = plt.subplots()
 
 
 
ax.plot(x, x+1, color="blue", alpha=0.5) # half-transparent
 
ax.plot(x, x+2, color="#8B008B")        # RGB hex code
 
ax.plot(x, x+3, color="#FF8C00")        # RGB hex code
 
</syntaxhighlight><br />
 
|[[File:Matplotlib13.png|thumb|362x362px]]<br />
 
|-
 
|'''Line and marker styles''':
 
To change the line width, we can use the <code>linewidth</code> or <code>lw</code> keyword argument. The line style can be selected using the <code>linestyle</code> or <code>ls</code> keyword arguments:<syntaxhighlight lang="python3">
 
fig, ax = plt.subplots(figsize=(12,6))
 
 
 
ax.plot(x, x+1, color="red", linewidth=0.25)
 
ax.plot(x, x+2, color="red", linewidth=0.50)
 
ax.plot(x, x+3, color="red", linewidth=1.00)
 
ax.plot(x, x+4, color="red", linewidth=2.00)
 
 
 
# possible linestyle options: '-', '--', '-.', ':' (step lines are set with the drawstyle argument instead)
 
ax.plot(x, x+5, color="green", lw=3, linestyle='-')
 
ax.plot(x, x+6, color="green", lw=3, ls='-.')
 
ax.plot(x, x+7, color="green", lw=3, ls=':')
 
 
 
# custom dash
 
line, = ax.plot(x, x+8, color="black", lw=1.50)
 
line.set_dashes([5, 10, 15, 10]) # format: line length, space length, ...
 
 
 
# possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
 
ax.plot(x, x+ 9, color="blue", lw=3, ls='-', marker='+')
 
ax.plot(x, x+10, color="blue", lw=3, ls='--', marker='o')
 
ax.plot(x, x+11, color="blue", lw=3, ls='-', marker='s')
 
ax.plot(x, x+12, color="blue", lw=3, ls='--', marker='1')
 
 
 
# marker size and color
 
ax.plot(x, x+13, color="purple", lw=1, ls='-', marker='o', markersize=2)
 
ax.plot(x, x+14, color="purple", lw=1, ls='-', marker='o', markersize=4)
 
ax.plot(x, x+15, color="purple", lw=1, ls='-', marker='o', markersize=8, markerfacecolor="red")
 
ax.plot(x, x+16, color="purple", lw=1, ls='-', marker='s', markersize=8,
 
        markerfacecolor="yellow", markeredgewidth=3, markeredgecolor="green");
 
</syntaxhighlight><br />
 
|[[File:Matplotlib14.png|thumb|362x362px]]<br />
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Plot range</h4>
 
|We can configure the ranges of the axes using the <code>set_ylim</code> and <code>set_xlim</code> methods in the axis object, or <code>axis('tight')</code> for automatically getting "tightly fitted" axes ranges:<syntaxhighlight lang="python3">
 
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
 
 
 
axes[0].plot(x, x**2, x, x**3)
 
axes[0].set_title("default axes ranges")
 
 
 
axes[1].plot(x, x**2, x, x**3)
 
axes[1].axis('tight')
 
axes[1].set_title("tight axes")
 
 
 
axes[2].plot(x, x**2, x, x**3)
 
axes[2].set_ylim([0, 60])
 
axes[2].set_xlim([2, 5])
 
axes[2].set_title("custom axes range");
 
</syntaxhighlight><br />
 
|[[File:Matplotlib15.png|thumb|362x362px]]<br />
 
|-
 
! rowspan="4" style="vertical-align:top;"|<h3 style="text-align:left">Special Plot Types</h3>
 
| colspan="3" |There are many specialized plots we can create, such as '''barplots''', '''histograms''', '''scatter plots''', and many more. Most of these types of plots we will actually create using seaborn, a statistical plotting library for Python. But here are a few examples:
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Scatter plots</h4>
 
|<syntaxhighlight lang="python3">
 
plt.scatter(x,y)
 
</syntaxhighlight>
 
|[[File:Matplotlib16.png|thumb|362x362px]]<br />
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Histograms</h4>
 
|<syntaxhighlight lang="python3">
 
from random import sample
 
data = sample(range(1, 1000), 100)
 
plt.hist(data)
 
</syntaxhighlight>
 
|[[File:Matplotlib17.png|thumb|362x362px]]<br />
 
|-
 
|style="vertical-align:top;"|<h4 style="text-align:left">Boxplots</h4>
 
|<syntaxhighlight lang="python3">
 
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
 
 
 
# rectangular box plot
 
plt.boxplot(data,vert=True,patch_artist=True); 
 
</syntaxhighlight>
 
|[[File:Matplotlib18.png|thumb|362x362px]]<br />
 
|}
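The pieces from the table above (<code>subplots</code>, <code>figsize</code>, <code>dpi</code>, labels, legends and <code>tight_layout</code>) can be combined as in the following sketch. It is only an illustration under the assumption of a headless Agg backend; the string <code>"upper left"</code> is the named equivalent of <code>loc=2</code>.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

# 8 x 4 inches at 100 dots per inch gives an 800 x 400 pixel canvas
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 4), dpi=100)

# axes is an array with one Axes per subplot; fill each in turn
for ax, power in zip(axes, (2, 3)):
    ax.plot(x, x ** power, "b.-", label=f"x**{power}")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(f"power {power}")
    ax.legend(loc="upper left")  # string equivalent of loc=2

# Adjust spacing so labels of neighbouring subplots do not overlap
fig.tight_layout()

# Pixel size of the canvas = size in inches * dots per inch
width_px, height_px = fig.get_size_inches() * fig.dpi
```

The last line makes the relationship between <code>figsize</code> and <code>dpi</code> explicit: doubling the DPI doubles the pixel dimensions without changing the size in inches.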
 
 
 
 
 
 
 
<br />
 
=== Advanced Matplotlib Concepts ===
 
In this lecture we cover some more advanced topics that you won't use as often. You can always consult the documentation for more resources!
 
 
 
Further reading:
 
<br />
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import numpy as np
 
x = np.linspace(0, 5, 11)
 
y = x ** 2
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Logarithmic scale</h4>
 
|
 
|
 
|
 
|-
 
! rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Placement of ticks and custom tick labels</h4>
 
|
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">Scientific notation</h5>
 
|
 
|
 
|-
 
! rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Axis number and axis label spacing</h4>
 
|
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">Axis position adjustments</h5>
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Axis grid</h4>
 
|
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Axis spines</h4>
 
|
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Twin axes</h4>
 
|
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Axes where x and y are zero</h4>
 
|
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Other 2D plot styles</h4>
 
|
 
|
 
|
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Text annotation</h4>
 
|
 
|
 
|
 
|-
 
! rowspan="5" style="vertical-align:top;"|<h4 style="text-align:left">Figures with multiple subplots and insets</h4>
 
|
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">subplots</h5>
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">subplot2grid</h5>
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">gridspec</h5>
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">add_axes</h5>
 
|
 
|
 
|-
 
! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">Colormap and contour figures</h4>
 
|
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">pcolor</h5>
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">imshow</h5>
 
|
 
|
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">contour</h5>
 
|
 
|
 
|-
 
! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">3D figures</h4>
 
|
 
| colspan="2" |To use 3D graphics in matplotlib, we first need to create an instance of the <code>Axes3D</code> class. 3D axes can be added to a matplotlib figure canvas in exactly the same way as 2D axes; or, more conveniently, by passing a <code>projection='3d'</code> keyword argument to the <code>add_axes</code> or <code>add_subplot</code> methods.<syntaxhighlight lang="python3">
 
import matplotlib # needed for matplotlib.cm in the examples below

from mpl_toolkits.mplot3d.axes3d import Axes3D
 
</syntaxhighlight><br />
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">Surface plots</h5>
 
| colspan="2" |<syntaxhighlight lang="python3">
 
fig = plt.figure(figsize=(14,6))
 
 
 
# `ax` is a 3D-aware axis instance because of the projection='3d' keyword argument to add_subplot
 
ax = fig.add_subplot(1, 2, 1, projection='3d')
 
 
 
p = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, linewidth=0)
 
 
 
# surface_plot with color grading and color bar
 
ax = fig.add_subplot(1, 2, 2, projection='3d')
 
p = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=matplotlib.cm.coolwarm, linewidth=0, antialiased=False)
 
cb = fig.colorbar(p, shrink=0.5)
 
</syntaxhighlight>[[File:Matplotlib advance1.png|center|thumb|597x597px]]
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">Wire-frame plot</h5>
 
|<syntaxhighlight lang="python3">
 
fig = plt.figure(figsize=(8,6))
 
ax = fig.add_subplot(1, 1, 1, projection='3d')
 
p = ax.plot_wireframe(X, Y, Z, rstride=4, cstride=4)
 
</syntaxhighlight>
 
|[[File:Matplotlib advance2.png|center]]
 
|-
 
|style="vertical-align:top;"|<h5 style="text-align:left">Contour plots with projections</h5>
 
|<syntaxhighlight lang="python3">
 
fig = plt.figure(figsize=(8,6))
 
 
 
ax = fig.add_subplot(1,1,1, projection='3d')
 
 
 
ax.plot_surface(X, Y, Z, rstride=4, cstride=4, alpha=0.25)
 
cset = ax.contour(X, Y, Z, zdir='z', offset=-np.pi, cmap=matplotlib.cm.coolwarm)
 
cset = ax.contour(X, Y, Z, zdir='x', offset=-np.pi, cmap=matplotlib.cm.coolwarm)
 
cset = ax.contour(X, Y, Z, zdir='y', offset=3*np.pi, cmap=matplotlib.cm.coolwarm)
 
 
 
ax.set_xlim3d(-np.pi, 2*np.pi);
 
ax.set_ylim3d(0, 3*np.pi);
 
ax.set_zlim3d(-np.pi, 2*np.pi);
 
</syntaxhighlight>
 
|[[File:Matplotlib advance3.png|center]]
 
|}
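The 3D examples in the table above use the arrays <code>X</code>, <code>Y</code> and <code>Z</code> without defining them on this page. The sketch below is a self-contained variant that builds them from an assumed sample surface, z = sin(x) * cos(y); this surface is a placeholder for illustration and is not the data behind the figures shown above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Assumed sample surface (not from the original page): z = sin(x) * cos(y)
grid = np.linspace(0, 2 * np.pi, 50)
X, Y = np.meshgrid(grid, grid)
Z = np.sin(X) * np.cos(Y)

fig = plt.figure(figsize=(8, 6))

# projection='3d' makes add_subplot return a 3D-aware axes instance
ax = fig.add_subplot(1, 1, 1, projection="3d")

# Surface with color grading and a color bar
surface = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap="coolwarm", linewidth=0)
fig.colorbar(surface, shrink=0.5)
```

<code>np.meshgrid</code> turns the two 1D coordinate vectors into 2D grids, which is the input shape that <code>plot_surface</code>, <code>plot_wireframe</code> and <code>contour</code> expect.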
 
 
 
 
 
<br />
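Several topics named in the Advanced Matplotlib Concepts table (logarithmic scale, custom tick labels, axis grid, twin axes) have no example on this page. The following is a hedged sketch of how they might look, using the same <code>x</code> array as before; the tick labels and plotted functions are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig, (ax_log, ax_twin) = plt.subplots(1, 2, figsize=(10, 4))

# Logarithmic y scale with a grid
ax_log.plot(x, np.exp(x), "b")
ax_log.set_yscale("log")
ax_log.grid(True)
ax_log.set_title("log scale")

# Twin axes: two y axes sharing one x axis
ax_twin.plot(x, x ** 2, "b")
ax_twin.set_ylabel("x**2", color="blue")
ax_right = ax_twin.twinx()
ax_right.plot(x, x ** 3, "r")
ax_right.set_ylabel("x**3", color="red")

# Custom tick positions and labels on the shared x axis
ax_twin.set_xticks([1, 2, 3, 4, 5])
ax_twin.set_xticklabels(["one", "two", "three", "four", "five"])

fig.tight_layout()
```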
 
 
 
==Data visualization with Seaborn==
 
<code>Seaborn</code> is a statistical visualization library designed to work well with pandas DataFrames.
 
 
 
 
 
<syntaxhighlight lang="python3">
 
import seaborn as sns
 
%matplotlib inline
 
</syntaxhighlight><br />
 
 
 
 
 
<br />
 
===Built-in data sets===
 
Seaborn comes with built-in data sets!<syntaxhighlight lang="python3">
 
tips = sns.load_dataset('tips')
 
tips.head()
 
# Output:
 
    total_bill    tip    sex  smoker  day    time  size
 
0        16.99  1.01  Female      No  Sun  Dinner    2
 
1        10.34  1.66    Male      No  Sun  Dinner    3
 
2        21.01  3.50    Male      No  Sun  Dinner    3
 
3        23.68  3.31    Male      No  Sun  Dinner    2
 
4        24.59  3.61  Female      No  Sun  Dinner    4
 
</syntaxhighlight><br />
 
 
 
 
 
<br />
 
===Distribution Plots===
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
%matplotlib inline
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
! style="vertical-align:top;" |<h4 style="text-align:left">Distribution of a univariate set of observations</h4>
 
|<code>'''distplot'''</code>
 
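|The distplot shows the distribution of a univariate set of observations (note that seaborn 0.11 and later deprecate <code>distplot</code> in favor of <code>histplot</code> and <code>displot</code>):<syntaxhighlight lang="python3">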
 
sns.distplot(tips['total_bill'])
 
# Safe to ignore warnings
 
</syntaxhighlight>
 
 
 
 
 
 
 
To remove the kde layer and just have the histogram use:<syntaxhighlight lang="python3">
 
sns.distplot(tips['total_bill'],kde=False,bins=30)
 
</syntaxhighlight><br />
 
|[[File:Seaborn1.png|center]][[File:Seaborn2.png|center]]
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Match up two distplots for bivariate data</h4>
 
|<code>'''jointplot()'''</code>
 
|<code>'''jointplot()'''</code> lets you match up two distplots for bivariate data. The '''kind''' parameter selects how the two variables are compared:
 
 
 
*<code>scatter</code>, <code>reg</code>, <code>resid</code>, <code>kde</code>, <code>hex</code><br />
 
<syntaxhighlight lang="python3">
 
sns.jointplot(x='total_bill',y='tip',data=tips,kind='scatter')
 
</syntaxhighlight><br /><syntaxhighlight lang="python3">
 
sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg')
 
</syntaxhighlight><br />
 
|[[File:Seaborn3.png|center]][[File:Seaborn4.png|center]][[File:Seaborn5.png|center]]
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Plot pairwise relationships across an entire dataframe</h4>
 
|'''<code>pairplot</code>'''
 
| colspan="2" |'''<code>pairplot</code>''' will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns):<syntaxhighlight lang="python3">
 
sns.pairplot(tips)
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.pairplot(tips,hue='sex',palette='coolwarm')
 
</syntaxhighlight>
 
{| style="margin: 0 auto;"
 
|[[File:Seaborn6.png|center|407x407px]]
 
|[[File:Seaborn7.png|center|407x407px]]
 
|}
 
|-
 
!style="vertical-align:top;"|<h4 style="text-align:left">Draw a dash mark for every point on a univariate distribution</h4>
 
|<code>'''rugplot'''</code>
 
|Rugplots are a very simple concept: they draw a dash mark for every point of a univariate distribution. They are the building block of a KDE plot:<syntaxhighlight lang="python3">
 
sns.rugplot(tips['total_bill'])
 
</syntaxhighlight><br />
 
|[[File:Seaborn8.png|center]]
 
|-
 
! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">[[wikipedia:Kernel_density_estimation#Practical_estimation_of_the_bandwidth|Kernel Density Estimation plots]]</h4>
 
| rowspan="4" |'''<code>kdeplot</code>'''
 
|kdeplots are Kernel Density Estimation plots. These KDE plots replace every single observation with a Gaussian (Normal) distribution centered around that value. For example:<syntaxhighlight lang="python3">
 
# Don't worry about understanding this code!
 
# It's just for the diagram below
 
import numpy as np
 
import matplotlib.pyplot as plt
 
from scipy import stats
 
 
 
#Create dataset
 
dataset = np.random.randn(25)
 
 
 
# Create another rugplot
 
sns.rugplot(dataset);
 
 
 
# Set up the x-axis for the plot
 
x_min = dataset.min() - 2
 
x_max = dataset.max() + 2
 
 
 
# 100 equally spaced points from x_min to x_max
 
x_axis = np.linspace(x_min,x_max,100)
 
 
 
# Set up the bandwidth, for info on this:
 
url = 'http://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth'
 
 
 
bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2
 
 
 
 
 
# Create an empty kernel list
 
kernel_list = []
 
 
 
# Plot each basis function
 
for data_point in dataset:
 
   
 
    # Create a kernel for each point and append to list
 
    kernel = stats.norm(data_point,bandwidth).pdf(x_axis)
 
    kernel_list.append(kernel)
 
   
 
    #Scale for plotting
 
    kernel = kernel / kernel.max()
 
    kernel = kernel * .4
 
    plt.plot(x_axis,kernel,color = 'grey',alpha=0.5)
 
 
 
plt.ylim(0,1)
 
</syntaxhighlight><br />
 
|[[File:Seaborn9.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
# To get the kde plot we can sum these basis functions.
 
 
 
# Sum the basis functions point by point along the x-axis
 
sum_of_kde = np.sum(kernel_list,axis=0)
 
 
 
# Plot figure
 
fig = plt.plot(x_axis,sum_of_kde,color='indianred')
 
 
 
# Add the initial rugplot
 
sns.rugplot(dataset, color='indianred')
 
 
 
# Get rid of y-tick marks
 
plt.yticks([])
 
 
 
# Set title
 
plt.suptitle("Sum of the Basis Functions")
 
</syntaxhighlight>
 
|[[File:Seaborn10.png|center]]
 
|-
 
|So with our tips dataset:<syntaxhighlight lang="python3">
 
sns.kdeplot(tips['total_bill'])
 
sns.rugplot(tips['total_bill'])
 
</syntaxhighlight><br />
 
|[[File:Seaborn11.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.kdeplot(tips['tip'])
 
sns.rugplot(tips['tip'])
 
</syntaxhighlight>
 
|[[File:Seaborn12.png|center]]
 
|}
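The KDE construction described in the table (one Gaussian per observation, then a sum) can be sketched without seaborn. This is a minimal sketch under stated assumptions, not seaborn's own implementation: the helper names <code>silverman_bandwidth</code> and <code>manual_kde</code> are made up for illustration, the bandwidth is the same rule of thumb used in the diagram code, and the kernels are averaged rather than summed so that the resulting curve integrates to roughly 1.

```python
import numpy as np


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: (4 * sigma^5 / (3 * n)) ** (1/5)."""
    data = np.asarray(data, dtype=float)
    return ((4 * data.std() ** 5) / (3 * len(data))) ** 0.2


def manual_kde(data, grid, bandwidth):
    """Average of one Gaussian per observation, evaluated on `grid`."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z has shape (n_observations, n_grid_points)
    z = (grid[None, :] - data[:, None]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging (instead of summing) keeps the total area close to 1
    return kernels.mean(axis=0)


rng = np.random.default_rng(0)
dataset = rng.standard_normal(25)
x_axis = np.linspace(dataset.min() - 2, dataset.max() + 2, 200)

density = manual_kde(dataset, x_axis, silverman_bandwidth(dataset))

# Simple Riemann sum as a sanity check on the area under the curve
area = density.sum() * (x_axis[1] - x_axis[0])
print(round(area, 2))
```

Plotting <code>x_axis</code> against <code>density</code> gives the same shape as the summed basis functions, only rescaled so that it can be read as a probability density.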
 
 
 
 
 
<br />
 
===Categorical Data Plots===
 
Now let's discuss using seaborn to plot categorical data! There are a few main plot types for this:
 
 
 
*<code>factorplot</code>
 
*<code>boxplot</code>
 
*<code>violinplot</code>
 
*<code>stripplot</code>
 
*<code>swarmplot</code>
 
*<code>barplot</code>
 
*<code>countplot</code>
 
 
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
%matplotlib inline
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Barplot and Countplot</h4>
 
| rowspan="2" |<code>'''sns.barplot'''</code>
 
|'''<code>barplot</code>''' is a general plot that aggregates a numerical variable within each category using some function (the mean by default):<syntaxhighlight lang="python3">
 
sns.barplot(x='sex',y='total_bill',data=tips)
 
</syntaxhighlight>
 
 
 
|[[File:Seaborn categorical1.png|center|350x350px]]
 
|-
 
|You can set the estimator to any function that reduces a vector to a scalar:<syntaxhighlight lang="python3">
 
import numpy as np
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.std)
 
</syntaxhighlight><br />
 
|[[File:Seaborn categorical2.png|center|350x350px]]
 
|-
 
|'''<code>sns.countplot</code>'''
 
|This is essentially the same as barplot, except that the estimator explicitly counts the number of occurrences, which is why we only pass the x value:<syntaxhighlight lang="python3">
 
sns.countplot(x='sex',data=tips)
 
</syntaxhighlight><br />
 
|[[File:Seaborn categorical3.png|center|350x350px]]
 
|-
 
! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Boxplot and Violinplot</h4>
 
|
 
|Boxplots and violinplots are used to show the distribution of a numerical variable across the levels of a categorical variable.
 
|
 
|-
 
|'''<code>sns.boxplot</code>'''
 
|A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.<syntaxhighlight lang="python3">
 
sns.boxplot(x="day", y="total_bill", data=tips,palette='rainbow')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
# Can do entire dataframe with orient='h'
 
sns.boxplot(data=tips,palette='rainbow',orient='h')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips, palette="coolwarm")
 
</syntaxhighlight>
 
|[[File:Seaborn categorical4.png|center|350x350px]][[File:Seaborn categorical5.png|center|350x350px]][[File:Seaborn categorical6.png|center|350x350px]]
 
|-
 
|<code>'''sns.violinplot'''</code>
 
|A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.<syntaxhighlight lang="python3">
 
sns.violinplot(x="day", y="total_bill", data=tips,palette='rainbow')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',palette='Set1')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',split=True,palette='Set1')
 
</syntaxhighlight>
 
|[[File:Seaborn categorical7.png|center|350x350px]][[File:Seaborn categorical8.png|center|350x350px]][[File:Seaborn categorical9.png|center|350x350px]]
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Stripplot and Swarmplot</h4>
 
|'''<code>sns.stripplot</code>'''
 
|The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.<syntaxhighlight lang="python3">
 
sns.stripplot(x="day", y="total_bill", data=tips)
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.stripplot(x="day", y="total_bill", data=tips,jitter=True)
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1')
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
# Newer seaborn releases call this argument dodge (it used to be split)

sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1',dodge=True)
 
</syntaxhighlight><br />
 
|[[File:Seaborn categorical10.png|center|350x350px]][[File:Seaborn categorical11.png|center|350x350px]][[File:Seaborn categorical12.png|center|350x350px]][[File:Seaborn categorical13.png|center|350x350px]]
 
|-
 
|'''<code>sns.swarmplot</code>'''
 
|The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).<syntaxhighlight lang="python3">
 
sns.swarmplot(x="day", y="total_bill", data=tips)
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
# Newer seaborn releases call this argument dodge (it used to be split)

sns.swarmplot(x="day", y="total_bill",hue='sex',data=tips, palette="Set1", dodge=True)
 
</syntaxhighlight><br />
 
|[[File:Seaborn categorical14.png|center|350x350px]][[File:Seaborn categorical15.png|center|350x350px]]
 
|-
 
! style="vertical-align:top;" |<h4 style="text-align:left">Combining Categorical Plots</h4>
 
|
 
|<syntaxhighlight lang="python3">
 
sns.violinplot(x="tip", y="day", data=tips,palette='rainbow')
 
sns.swarmplot(x="tip", y="day", data=tips,color='black',size=3)
 
</syntaxhighlight>
 
|[[File:Seaborn categorical16.png|center|350x350px]]
 
|-
 
! style="vertical-align:top;" |<h4 style="text-align:left">Factorplot</h4>
 
|'''<code>sns.factorplot</code>'''
 
|factorplot is the most general form of a categorical plot (seaborn 0.9 renamed it to <code>catplot</code>). It takes a '''kind''' parameter to select the plot type:<syntaxhighlight lang="python3">

# On seaborn releases before 0.9 this function is called sns.factorplot

sns.catplot(x='sex',y='total_bill',data=tips,kind='bar')
 
</syntaxhighlight><br />
 
|[[File:Seaborn categorical17.png|center|250x250px]]
 
|}
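The numbers behind these categorical plots can be reproduced with plain pandas and NumPy, which helps when checking what a bar or a box actually represents. The sketch below is an illustration under stated assumptions: the small data frame is a hypothetical stand-in for a few rows of the tips dataset, the box plot part uses the common 1.5 × IQR rule for flagging outliers, and the confidence intervals that seaborn draws on bars are left out.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the tips dataset
df = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
})

# barplot(x='sex', y='total_bill'): one bar per category, height = mean of y
bar_heights = df.groupby("sex")["total_bill"].mean()

# barplot(..., estimator=np.std): same grouping, reduced with np.std instead
bar_std = df.groupby("sex")["total_bill"].apply(lambda s: np.std(s.to_numpy()))

# countplot(x='sex'): only x is needed because it just counts rows per category
counts = df["sex"].value_counts()

# Box plot statistics for one group of values
values = np.array([10, 12, 13, 14, 15, 16, 18, 50], dtype=float)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Points outside the fences are the ones drawn individually as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(bar_heights, bar_std, counts, sep="\n")
print(q1, median, q3, outliers)
```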
 
 
 
 
 
<br />
 
===Matrix Plots===
 
Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data (later in the machine learning section we will learn how to formally cluster data).
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
%matplotlib inline
 
 
 
flights = sns.load_dataset('flights')
 
 
 
tips = sns.load_dataset('tips')
 
 
 
tips.head()
 
# Output:
 
    total_bill    tip    sex  smoker  day    time  size
 
0        16.99  1.01  Female      No  Sun  Dinner    2
 
1        10.34  1.66    Male      No  Sun  Dinner    3
 
2        21.01  3.50    Male      No  Sun  Dinner    3
 
3        23.68  3.31    Male      No  Sun  Dinner    2
 
4        24.59  3.61  Female      No  Sun  Dinner    4
 
 
 
flights.head()
 
# Output:
 
    year      month    passengers
 
0  1949    January          112
 
1  1949    February          118
 
2  1949      March          132
 
3  1949      April          129
 
4  1949        May          121
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Heatmap</h4>
 
| rowspan="2" |'''<code>sns.heatmap</code>'''
 
|In order for a <code>'''heatmap'''</code> to work properly, your data should already be in matrix form; the <code>'''sns.heatmap'''</code> function basically just colors it in for you. For example:<syntaxhighlight lang="python3">
 
# Matrix form for correlation data (numeric columns only; needs pandas >= 1.5)

tips.corr(numeric_only=True)
 
 
 
# Output:
 
            total_bill        tip        size
 
total_bill    1.000000    0.675734    0.598315
 
tip          0.675734    1.000000    0.489299
 
size          0.598315    0.489299    1.000000
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.heatmap(tips.corr(numeric_only=True))
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.heatmap(tips.corr(numeric_only=True),cmap='coolwarm',annot=True)
 
</syntaxhighlight>
 
 
 
|[[File:Matrix plots1.png|center|300x300px]][[File:Matrix plots2.png|center|300x300px]]
 
|-
 
|Or for the flights data:<syntaxhighlight lang="python3">
 
flights.pivot_table(values='passengers',index='month',columns='year')
 
# Output:
 
year        1949    1950    1951    1952    1953    1954    1955    1956    1957    1958    1959    1960
 
month                                             
 
January    112    115    145    171    196    204    242    284    315    340    360    417
 
February    118    126    150    180    196    188    233    277    301    318    342    391
 
March      132    141    178    193    236    235    267    317    356    362    406    419
 
April      129    135    163    181    235    227    269    313    348    348    396    461
 
May        121    125    172    183    229    234    270    318    355    363    420    472
 
June        135    149    178    218    243    264    315    374    422    435    472    535
 
July        148    170    199    230    264    302    364    413    465    491    548    622
 
August      148    170    199    242    272    293    347    405    467    505    559    606
 
September  136    158    184    209    237    259    312    355    404    404    463    508
 
October    119    133    162    191    211    229    274    306    347    359    407    461
 
November    104    114    146    172    180    203    237    271    305    310    362    390
 
December    118    140    166    194    201    229    278    306    336    337    405    432
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
 
sns.heatmap(pvflights)
 
</syntaxhighlight><syntaxhighlight lang="python3">
 
sns.heatmap(pvflights,cmap='magma',linecolor='white',linewidths=1)
 
</syntaxhighlight><br />
 
|[[File:Matrix plots3.png|center|300x300px]][[File:Matrix plots4.png|center|300x300px]]
 
|-
 
! style="vertical-align:top;" |<h4 style="text-align:left">Clustermap</h4>
 
|'''<code>sns.clustermap</code>'''
 
|The clustermap uses hierarchical clustering to produce a clustered version of the heatmap. For example:<syntaxhighlight lang="python3">
 
sns.clustermap(pvflights)
 
</syntaxhighlight>
 
 
 
 
 
Notice how the years and months are no longer in order; instead they are grouped by similarity in value (passenger count). That means we can begin to infer things from this plot, such as August and July being similar (which makes sense, since both are summer travel months).<syntaxhighlight lang="python3">
 
# More options to get the information a little clearer like normalization
 
sns.clustermap(pvflights,cmap='coolwarm',standard_scale=1)
 
</syntaxhighlight><br />
 
|[[File:Matrix plots5.png|center|300x300px]][[File:Matrix plots6.png|center|301x301px]]
 
|}
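Both matrix plots expect wide, matrix-shaped input, so the reshaping step is worth seeing in isolation. The following is a minimal sketch: the long-format frame is a hypothetical six-row excerpt shaped like the flights data, and the min-max scaling line is an assumption about what <code>standard_scale=1</code> does (per column, subtract the minimum and divide by the resulting maximum), based on the seaborn documentation rather than on its source code.

```python
import pandas as pd

# Hypothetical long-format excerpt shaped like the flights dataset
long_df = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["January", "February", "March"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Long -> wide: one row per month, one column per year.
# Months are plain strings here, so the rows come out in alphabetical order.
wide = long_df.pivot_table(values="passengers", index="month", columns="year")

# Column-wise min-max scaling to the range [0, 1]
scaled = (wide - wide.min()) / (wide.max() - wide.min())

print(wide)
print(scaled)
```

A frame like <code>wide</code> is what <code>sns.heatmap</code> and <code>sns.clustermap</code> receive in the examples above; scaling each column first keeps later years, which have far more passengers, from dominating the color range.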
 
 
 
 
 
<br />
 
===Grids===
 
Grids are general types of plots that allow you to map plot types to the rows and columns of a grid. This helps you create similar plots separated by features.
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
import matplotlib.pyplot as plt
 
%matplotlib inline
 
 
 
iris = sns.load_dataset('iris')
 
iris.head()
 
# Output:
 
    sepal_length    sepal_width    petal_length  petal_width    species
 
0            5.1            3.5            1.4            0.2    setosa
 
1            4.9            3.0            1.4            0.2    setosa
 
2            4.7            3.2            1.3            0.2    setosa
 
3            4.6            3.1            1.5            0.2    setosa
 
4            5.0            3.6            1.4            0.2    setosa
 
</syntaxhighlight>
 
|-
 
!
 
!
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">PairGrid</h4>
 
| rowspan="3" |'''<code>sns.PairGrid()</code>'''
 
|PairGrid is a subplot grid for plotting pairwise relationships in a dataset.<syntaxhighlight lang="python3">
 
# Just the Grid
 
sns.PairGrid(iris)
 
</syntaxhighlight><br />
 
 
 
|[[File:Seaborn grids1.png|center|500x500px]]
 
|-
 
|Then you map a plotting function onto the grid:<syntaxhighlight lang="python3">
 
g = sns.PairGrid(iris)
 
g.map(plt.scatter)
 
</syntaxhighlight><br />
 
|[[File:Seaborn grids2.png|center|500x500px]]
 
|-
 
|You can also map different functions to the upper triangle, the lower triangle, and the diagonal:<syntaxhighlight lang="python3">
 
g = sns.PairGrid(iris)
 
g.map_diag(plt.hist)
 
g.map_upper(plt.scatter)
 
g.map_lower(sns.kdeplot)
 
</syntaxhighlight><br />
 
|[[File:Seaborn grids3.png|center|500x500px]]
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Pairplot</h4>
 
| rowspan="2" |'''<code>sns.pairplot()</code>'''
 
|A '''<code>pairplot</code>''' is a simpler version of '''<code>PairGrid</code>''' (one you'll use quite often):<syntaxhighlight lang="python3">
 
sns.pairplot(iris)
 
</syntaxhighlight><br />
 
|[[File:Seaborn grids4.png|center|500x500px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.pairplot(iris,hue='species',palette='rainbow')
 
</syntaxhighlight>
 
|[[File:Seaborn grids5.png|center|500x500px]]
 
|-
 
! rowspan="4" style="vertical-align:top;" |<h4 style="text-align:left">Facet Grid</h4>
 
|
 
|FacetGrid is the general way to create grids of plots based on a feature:<syntaxhighlight lang="python3">
 
tips = sns.load_dataset('tips')
 
# tips.head()
 
    total_bill    tip      sex  smoker  day    time  size
 
0        16.99  1.01    Female      No  Sun  Dinner    2
 
1        10.34  1.66      Male      No  Sun  Dinner    3
 
2        21.01  3.50      Male      No  Sun  Dinner    3
 
3        23.68  3.31      Male      No  Sun  Dinner    2
 
4        24.59  3.61    Female      No  Sun  Dinner    4
 
</syntaxhighlight><br />
 
|
 
|-
 
| rowspan="3" |'''<code>sns.FacetGrid()</code>'''
 
|<syntaxhighlight lang="python3">
 
# Just the Grid
 
g = sns.FacetGrid(tips, col="time", row="smoker")
 
</syntaxhighlight>
 
|[[File:Seaborn grids6.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
g = sns.FacetGrid(tips, col="time",  row="smoker")
 
g = g.map(plt.hist, "total_bill")
 
</syntaxhighlight>
 
|[[File:Seaborn grids7.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
g = sns.FacetGrid(tips, col="time",  row="smoker",hue='sex')
 
# Notice how the arguments come after the plt.scatter call
 
g = g.map(plt.scatter, "total_bill", "tip").add_legend()
 
</syntaxhighlight>
 
|[[File:Seaborn grids8.png|center]]
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">JointGrid</h4>
 
| rowspan="2" |'''<code>sns.JointGrid()</code>'''
 
|JointGrid is the general version for jointplot() type grids, for a quick example:<syntaxhighlight lang="python3">
 
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
 
</syntaxhighlight><br />
 
|[[File:Seaborn grids9.png|center]]
 
|-
 
|<syntaxhighlight lang="python3">
 
g = sns.JointGrid(x="total_bill", y="tip", data=tips)
 
g = g.plot(sns.regplot, sns.distplot)
 
</syntaxhighlight>
 
|[[File:Seaborn grids10.png|center]]
 
|}
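To see which cells <code>map_diag</code>, <code>map_upper</code> and <code>map_lower</code> address, it helps to enumerate the grid by hand. This is a plain-Python sketch of the layout only, with no plotting; <code>pairgrid_layout</code> is a hypothetical helper written for this illustration, and it assumes the usual convention that rows carry the y variable and columns the x variable.

```python
def pairgrid_layout(columns):
    """Classify every (y_var, x_var) cell of a pairwise grid.

    Row i holds columns[i] on the y axis and column j holds columns[j]
    on the x axis, so cells with i < j form the upper triangle.
    """
    layout = {}
    for i, y_var in enumerate(columns):
        for j, x_var in enumerate(columns):
            if i == j:
                region = "diag"
            elif i < j:
                region = "upper"
            else:
                region = "lower"
            layout[(y_var, x_var)] = region
    return layout


iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
layout = pairgrid_layout(iris_columns)

for (y_var, x_var), region in layout.items():
    print(f"{region:>5}: x={x_var}, y={y_var}")
```

With four numerical columns this gives 4 diagonal cells and 6 cells in each triangle, which matches the 4 × 4 figure produced for the iris data.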
 
 
 
 
 
<br />
 
===Regression plots===
 
Seaborn has many built-in capabilities for regression plots. However, we won't really discuss regression until the machine learning section of the course, so we will only cover the '''<code>lmplot()</code>''' function for now.
 
 
 
'''<code>lmplot</code>''' allows you to display linear models. It also conveniently lets you split those plots into separate panels based on features, and color the points by a feature through the hue argument.
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
%matplotlib inline
 
 
 
tips = sns.load_dataset('tips')
 
 
 
tips.head()
 
# Output:
 
    total_bill    tip    sex  smoker  day    time  size
 
0        16.99  1.01  Female      No  Sun  Dinner    2
 
1        10.34  1.66    Male      No  Sun  Dinner    3
 
2        21.01  3.50    Male      No  Sun  Dinner    3
 
3        23.68  3.31    Male      No  Sun  Dinner    2
 
4        24.59  3.61  Female      No  Sun  Dinner    4
 
 
 
</syntaxhighlight>
 
|-
 
! colspan="2" |
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="8" style="vertical-align:top;" |<h4 style="text-align:left">The lmplot() function</h4>
 
| rowspan="3" |
 
|<syntaxhighlight lang="python3">
 
sns.lmplot(x='total_bill',y='tip',data=tips)
 
</syntaxhighlight><br />
 
 
 
|[[File:Seaborn regression plots1.png|center|300x300px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex')
 
</syntaxhighlight>
 
|[[File:Seaborn regression plots2.png|center|300x300px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm')
 
</syntaxhighlight>
 
|[[File:Seaborn regression plots3.png|center|300x300px]]
 
|-
 
|style="vertical-align:top;" |<h5 style="text-align:left">Working with Markers</h5>
 
|lmplot keyword arguments are passed through to '''regplot''', the axes-level function that lmplot() builds on. regplot has a scatter_kws parameter that is passed on to plt.scatter, so to change the marker size you set the s key in that dictionary; s corresponds (a bit confusingly) to the squared marker size, i.e. the marker area. In other words, you end up passing a dictionary of base matplotlib arguments, in this case s for the size of the scatter markers. In general, you probably won't remember this off the top of your head and will look it up in the documentation instead.<syntaxhighlight lang="python3">
 
# http://matplotlib.org/api/markers_api.html
 
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm',
 
          markers=['o','v'],scatter_kws={'s':100})
 
</syntaxhighlight><br />
 
|[[File:Seaborn regression plots4.png|center|300x300px]]
 
|-
 
| rowspan="3" style="vertical-align:top;" |<h5 style="text-align:left">Using a Grid</h5>
 
|We can add more variable separation through columns and rows with the use of a grid. Just indicate this with the col or row arguments:<syntaxhighlight lang="python3">
 
sns.lmplot(x='total_bill',y='tip',data=tips,col='sex')
 
</syntaxhighlight><br />
 
|[[File:Seaborn regression plots5.png|center|300x300px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.lmplot(x="total_bill", y="tip", row="sex", col="time",data=tips)
 
</syntaxhighlight>
 
|[[File:Seaborn regression plots6.png|center|300x300px]]
 
|-
 
| colspan="2" |<syntaxhighlight lang="python3">
 
sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm')
 
</syntaxhighlight>[[File:Seaborn regression plots7.png|center|778x778px]]
 
|-
 
|style="vertical-align:top;" |<h5 style="text-align:left">Aspect and Size</h5>
 
| colspan="2" |Seaborn figures can have their size and aspect ratio adjusted with the '''height''' (called '''size''' before seaborn 0.9) and '''aspect''' parameters:<syntaxhighlight lang="python3">

sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm',

          aspect=0.6,height=8)
 
</syntaxhighlight><br />[[File:Seaborn_regression_plots8.png|center|778x778px]]
 
|}
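The straight line that lmplot draws is an ordinary least-squares fit, and with <code>hue</code> one line is fitted per group. A minimal sketch of that idea with NumPy follows; the bills and tips are hypothetical values chosen to lie exactly on a line, and <code>np.polyfit</code> stands in for seaborn's fitting code, which additionally estimates the shaded confidence band.

```python
import numpy as np

# Hypothetical (total_bill, tip) values for two hue groups
groups = {
    "Female": (np.array([10.0, 20.0, 30.0]), np.array([2.0, 3.0, 4.0])),
    "Male": (np.array([10.0, 20.0, 30.0]), np.array([1.5, 3.5, 5.5])),
}

# Degree-1 least-squares fit per group: np.polyfit returns [slope, intercept]
fits = {name: np.polyfit(x, y, 1) for name, (x, y) in groups.items()}

for name, (slope, intercept) in fits.items():
    print(f"{name}: tip = {slope:.2f} * total_bill + {intercept:.2f}")
```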
 
 
 
 
 
<br />
 
===Style and Color===
 
Check out the documentation page for more info on these topics: https://stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html
 
 
 
<br />
 
{| class="wikitable"
 
|-
 
| colspan="4" |<syntaxhighlight lang="python3">
 
import seaborn as sns
 
import matplotlib.pyplot as plt
 
%matplotlib inline
 
tips = sns.load_dataset('tips')
 
</syntaxhighlight>
 
|-
 
!
 
!Method/Operator
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Styles</h4>
 
| rowspan="3" |'''<code>sns.set_style()</code>'''
 
|<syntaxhighlight lang="python3">
 
sns.countplot(x='sex',data=tips)
 
</syntaxhighlight><br />
 
 
 
|[[File:Seaborn_categorical3.png|center|300x300px]]
 
|-
 
|You can set particular styles:<syntaxhighlight lang="python3">
 
sns.set_style('white')
 
sns.countplot(x='sex',data=tips)
 
</syntaxhighlight>
 
|[[File:Seaborn_Style_and_Color2.png|center|300x300px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.set_style('ticks')
 
sns.countplot(x='sex',data=tips,palette='deep')
 
</syntaxhighlight>
 
|[[File:Seaborn_Style_and_Color3.png|center|300x300px]]
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Spine Removal</h4>
 
| rowspan="2" |'''<code>sns.despine()</code>'''
 
|<syntaxhighlight lang="python3">
 
sns.countplot(x='sex',data=tips)
 
sns.despine()
 
</syntaxhighlight>
 
|[[File:Seaborn Style and Color4.png|center|300x300px]]
 
|-
 
|<syntaxhighlight lang="python3">
 
sns.countplot(x='sex',data=tips)
 
sns.despine(left=True)
 
</syntaxhighlight>
 
|[[File:Seaborn Style and Color5.png|center|300x300px]]
 
|-
 
! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Size and Aspect</h4>
 
|style="vertical-align:top;" |<h5 style="text-align:left">Size</h5>
 
 
 
'''<code>plt.figure(figsize=())</code>'''
 
|You can use matplotlib's '''''<code>plt.figure(figsize=(width,height))</code>''''' to change the size of most seaborn plots.
 
 
 
You can control the size and aspect ratio of most seaborn grid plots by passing the height (size in seaborn releases before 0.9) and aspect parameters. For example:<syntaxhighlight lang="python3">
 
# Non Grid Plot
 
plt.figure(figsize=(12,3))
 
sns.countplot(x='sex',data=tips)
 
</syntaxhighlight><br />
 
|[[File:Seaborn Style and Color6.png|center|400x400px]]
 
|-
 
|style="vertical-align:top;" |<h5 style="text-align:left">Grid Type</h5>
 
|<syntaxhighlight lang="python3">
 
# Grid Type Plot (height was called size before seaborn 0.9)

sns.lmplot(x='total_bill',y='tip',height=2,aspect=4,data=tips)
 
</syntaxhighlight>
 
|[[File:Seaborn Style and Color7.png|center|400x400px]]
 
|-
 
!style="vertical-align:top;" |<h4 style="text-align:left">Scale and Context</h4>
 
|<code>'''set_context()'''</code>
 
|<code>'''set_context()'''</code> lets you override the default scaling parameters of a plot, such as font sizes:<syntaxhighlight lang="python3">
 
sns.set_context('poster',font_scale=4)
 
sns.countplot(x='sex',data=tips,palette='coolwarm')
 
</syntaxhighlight><br />
 
|[[File:Seaborn Style and Color8.png|center|400x400px]]
 
|}
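For grid-type plots the two sizing parameters combine in a simple way: each facet is <code>height</code> inches tall and <code>height * aspect</code> inches wide. The sketch below only does that arithmetic; <code>facet_figure_size</code> is a hypothetical helper, and the result is an approximation because seaborn may add room for a legend or padding.

```python
def facet_figure_size(n_rows, n_cols, height, aspect):
    """Approximate overall (width, height) in inches for a facet grid."""
    facet_width = height * aspect
    return (n_cols * facet_width, n_rows * height)


# Single-panel lmplot with height=2, aspect=4: a wide, short figure
single = facet_figure_size(1, 1, height=2, aspect=4)

# Four columns (one per day) with height=8, aspect=0.6
by_day = facet_figure_size(1, 4, height=8, aspect=0.6)

print(single, by_day)
```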
 
 
 
 
 
<br />
 
 
 
==Plotly and Cufflinks Data Visualization==
 
Plotly is a library that allows you to create interactive plots that you can use in dashboards or websites (you can save them as html files or static images).<br />
 
 
 
Check out the plotly.py documentation and gallery to learn more: https://plot.ly/python/
 
 
 
<code>Plotly</code> plots can be easily saved online and shared at https://chart-studio.plot.ly. Take a look at this example: https://chart-studio.plot.ly/~jackp/671/average-effective-tax-rates-by-income-percentiles-1960-2004/#/
 
 
 
 
 
<br />
 
===Installation===
 
In order for all of this to work, you'll need to install '''<code>plotly</code>''', plus '''<code>cufflinks</code>''' to call plots directly off of a pandas dataframe. '''<code>Cufflinks</code>''' is not currently available through '''conda''', but it is available through '''pip'''. Install the libraries at your command line/terminal using:<syntaxhighlight lang="shell">
 
pip install plotly
 
pip install cufflinks
 
</syntaxhighlight><br />
 
 
 
 
 
<br />
 
===Imports and Set-up===
 
<syntaxhighlight lang="python3">
 
import pandas as pd
 
import numpy as np
 
%matplotlib inline
 
 
 
from plotly import __version__
 
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
 
print(__version__) # requires version >= 1.9.0
 
 
 
import cufflinks as cf
 
 
 
# For Notebooks
 
init_notebook_mode(connected=True)
 
 
 
# For offline use
 
cf.go_offline()
 
</syntaxhighlight><br />
 
 
 
 
 
<br />
 
===Data===
 
<syntaxhighlight lang="python3">
 
df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())
 
df2 = pd.DataFrame({'Category':['A','B','C'],'Values':[32,43,50]})
 
 
 
df.head()
 
# Output:
 
          A          B          C          D
 
0  1.878725    0.688719    1.066733    0.543956
 
1  0.028734    0.104054    0.048176    1.842188
 
2  -0.158793    0.387926  -0.635371  -0.637558
 
3  -1.221972    1.393423  -0.299794  -1.113622
 
4  1.253152  -0.537598    0.302917  -2.546083
 
 
 
df2.head()
 
# Output:
 
    Category  Values
 
0          A      32
 
1          B      43
 
2          C      50
 
</syntaxhighlight>
 
 
 
 
 
<br />
 
{| class="wikitable"
 
|-
 
!
 
!Method/Operator
 
!Description/Example
 
!Output/Figure
 
|-
 
! rowspan="8" style="vertical-align:top;" |<h3 style="text-align:left">Using Cufflinks and iplot()</h3>
 
| style="vertical-align:top;" |'''Scatter'''
 
|<syntaxhighlight lang="python3">
 
df.iplot(kind='scatter',x='A',y='B',mode='markers',size=10)
 
</syntaxhighlight>https://plot.ly/~adeloaleman/15
 
 
 
|[[File:Plotly1.png|center|490x490px]]https://plot.ly/~adeloaleman/15
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Bar Plots</h4>
 
|<syntaxhighlight lang="python3">
 
df2.iplot(kind='bar',x='Category',y='Values')
 
</syntaxhighlight>https://plot.ly/~adeloaleman/13
 
|[[File:Plotly2.png|center|490x490px]]https://plot.ly/~adeloaleman/13
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Boxplots</h4>
 
|<syntaxhighlight lang="python3">
 
df.iplot(kind='box')
 
</syntaxhighlight>https://plot.ly/~adeloaleman/11
 
|[[File:Plotly3.png|center|490x490px]]https://plot.ly/~adeloaleman/11
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">3d Surface</h4>
 
|<syntaxhighlight lang="python3">
 
df3 = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,20,10],'z':[5,4,3,2,1]})
 
df3.iplot(kind='surface',colorscale='rdylbu')
 
</syntaxhighlight>https://plot.ly/~adeloaleman/17
 
|[[File:Plotly4.png|center|490x490px]]https://plot.ly/~adeloaleman/17
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Spread</h4>
 
|<syntaxhighlight lang="python3">
 
df[['A','B']].iplot(kind='spread')
 
</syntaxhighlight>https://plot.ly/~adeloaleman/19
 
|[[File:Plotly5.png|center|490x490px]]https://plot.ly/~adeloaleman/19
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Histogram</h4>
 
|<syntaxhighlight lang="python3">
 
df['A'].iplot(kind='hist',bins=25)
 
</syntaxhighlight>https://plot.ly/~adeloaleman/21
 
|[[File:Plotly6.png|center|490x490px]]https://plot.ly/~adeloaleman/21
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Bubble</h4>
 
|<syntaxhighlight lang="python3">
 
df.iplot(kind='bubble',x='A',y='B',size='C')
 
</syntaxhighlight>https://plot.ly/~adeloaleman/23
 
|[[File:Plotly7.png|center|490x490px]]https://plot.ly/~adeloaleman/23
 
|-
 
| style="vertical-align:top;" |<h4 style="text-align:left">Scatter_matrix</h4>
 
 
 
|<syntaxhighlight lang="python3">
 
df.scatter_matrix()
 
 
 
# Similar to sns.pairplot()
 
</syntaxhighlight>https://plot.ly/~adeloaleman/25
 
|[[File:Plotly8.png|center|490x490px]]https://plot.ly/~adeloaleman/25
 
|}
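Two of the plot kinds above are easy to check numerically before plotting. This is a sketch under stated assumptions: it assumes that <code>kind='spread'</code> visualizes the difference between the two selected columns and that <code>kind='hist'</code> with <code>bins=25</code> counts values in 25 equal-width bins; the random data is generated here and is not the frame from the Data section.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=["A", "B"])

# Quantity behind df[['A','B']].iplot(kind='spread'): row-wise difference
spread = df["A"] - df["B"]

# Quantity behind df['A'].iplot(kind='hist', bins=25): counts per bin
counts, edges = np.histogram(df["A"], bins=25)

print(spread.describe())
print(counts)
```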
 
 
 
 
 
<br />
 
 
 
==Word cloud==
 
https://github.com/amueller/word_cloud
 
 
 
 
 
In Dash:
 
* https://community.plot.ly/t/wordcloud-in-dash/11407/4
 
:: https://community.plot.ly/t/show-and-tell-wordcloudworld-com/15649
 
::: https://github.com/mikesmith1611/word-cloud-world
 
:::: http://www.wordcloudworld.com/
 
 
 
 
 
* https://community.plot.ly/t/solved-is-it-possible-to-make-a-wordcloud-in-dash/4565
 
 
 
 
 
<br />
 
===Installation===
 
Using pip:
 
pip install wordcloud
 
 
 
 
 
Using conda:
 
 
 
https://anaconda.org/conda-forge/wordcloud
 
 
 
conda install -c conda-forge wordcloud
 
 
 
 
 
'''Installation notes:'''
 
 
 
<code>wordcloud</code> depends on <code>numpy</code> and <code>pillow</code>.
 
 
 
 
 
To save the <code>wordcloud</code> into a file, <code>matplotlib</code> can also be installed.
 
 
 
 
 
<br />
 
===Minimal example===
 
Can be run in jupyter-notebook:
 
<syntaxhighlight lang="python3">
 
"""
 
Minimal Example
 
===============
 
Generating a square wordcloud from the US constitution using default arguments.
 
"""
 
 
 
import os
 
 
 
from os import path
 
from wordcloud import WordCloud
 
 
 
# get data directory (using getcwd() is needed to support running example in generated IPython notebook)
 
d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
 
 
 
# Read the whole text.
 
text = open(path.join(d, 'constitution.txt')).read()
 
 
 
# Generate a word cloud image
 
wordcloud = WordCloud().generate(text)
 
 
 
# Display the generated image:
 
# the matplotlib way:
 
import matplotlib.pyplot as plt
 
plt.imshow(wordcloud, interpolation='bilinear')
 
plt.axis("off")
 
 
 
# lower max_font_size
 
wordcloud = WordCloud(max_font_size=40).generate(text)
 
plt.figure()
 
plt.imshow(wordcloud, interpolation="bilinear")
 
plt.axis("off")
 
plt.show()
 
 
 
# The pil way (if you don't have matplotlib)
 
# image = wordcloud.to_image()
 
# image.show()
 
</syntaxhighlight>
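A word cloud is essentially a picture of word frequencies, so it can be useful to inspect those counts directly. The following is a simplified sketch using only the standard library: <code>word_frequencies</code> is a hypothetical helper, the sample sentence and stopword set are made up, and the real <code>WordCloud.generate</code> does more than this (its own stopword list and handling of word pairs, for instance).

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count lowercase words in `text`, skipping anything in `stopwords`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


sample = "the people of the united states and the people of the several states"
freqs = word_frequencies(sample, stopwords={"the", "of", "and"})

print(freqs.most_common())
```

If precomputed counts like these are preferred over raw text, they can be passed to <code>WordCloud().generate_from_frequencies(freqs)</code> instead of calling <code>generate(text)</code>.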
 
  
  
  
 
<br />
 
 
 
==[[Dash - Plotly]]==
 
  

Revision as of 12:10, 21 June 2020

For a standard Python tutorial go to Python



Courses

  • Udemy - Python for Data Science and Machine Learning Bootcamp
https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/



Anaconda

Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine learning related applications (large-scale data processing, predictive analytics, scientific computing), that aims to simplify package management and deployment. Package versions are managed by the package management system conda. https://en.wikipedia.org/wiki/Anaconda_(Python_distribution)

In other words, Anaconda can be seen as a package (a distribution) that includes not only Python (or R) but also many libraries used in data science, as well as its own virtual environment system. It's an "all-in-one" install that is extremely popular in data science and machine learning.



Installation

Installation from the official Anaconda Web site: https://docs.anaconda.com/anaconda/install/


https://linuxize.com/post/how-to-install-anaconda-on-ubuntu-18-04/

https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04



Anaconda comes with a few IDEs

  • Jupyter Lab
  • Jupyter Notebook
  • Spyder
  • Qtconsole
  • and others



Anaconda Navigator

Anaconda Navigator is a GUI that helps you easily start important applications and manage the packages in your local Anaconda installation.

You can open the Anaconda Navigator from the Terminal:

anaconda-navigator



Jupyter

Jupyter comes with Anaconda.

  • It is a development environment (IDE) where we can write code; it also lets us display images and write Markdown notes.
  • It is the most popular IDE in data science for exploring and analyzing data.
  • Other well-known Python editors and IDEs include Sublime Text and PyCharm.
  • It comes in two interfaces: Jupyter Lab and Jupyter Notebook.



Online Jupyter

There are many sites that provide solutions to run your Jupyter Notebook in the cloud: https://www.dataschool.io/cloud-services-for-jupyter-notebook/

I have tried:

https://cocalc.com/projects/595bf475-61a7-47fa-af69-ba804c3f23f9/files/?session=default
It looks good, but some of its options are not free.


https://www.kaggle.com/adeloaleman/kernel1917a91630/edit
It looks good, but I could not find a way to add a TOC.


This is the one I am using now.



Some remarks


Executing Terminal Commands in Jupyter Notebooks

https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks

If we are in a notebook and want to run a shell command rather than Python code, we prefix the command with !

Try, for example:

!ls
!pwd

It's the same as if you opened up a terminal and typed it without the !
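
The ! prefix only works inside IPython and Jupyter. As a rough sketch of the same idea in plain Python (for a script rather than a notebook), the standard library's subprocess module can run a shell command and capture its output. The helper name run_shell is hypothetical and chosen only for illustration, and the example assumes a Unix-like shell where pwd is available.

```python
import subprocess


def run_shell(command):
    """Run a shell command and return its standard output as text.

    run_shell is a hypothetical helper, not part of Jupyter or Anaconda.
    """
    result = subprocess.run(
        command,
        shell=True,           # interpret the string the way a terminal would
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Roughly what "!pwd" shows in a notebook cell
print(run_shell("pwd"))
```

Inside a notebook the ! form is shorter and is usually the better choice; a helper like this is mainly useful when the same code must also run outside Jupyter.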



Creating Presentations in Jupyter Notebook with RevealJS


Most popular Python Data Science Libraries

  • NumPy
  • SciPy
  • Pandas
  • Seaborn
  • Scikit-learn
  • Matplotlib
  • Plotly
  • PySpark
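
As a small sketch, the following checks which of the libraries listed above can be imported in the current environment. The function name check_libraries and the LIBRARIES mapping are hypothetical and exist only for this illustration; the import names are the usual ones for each package (for example, Scikit-learn is imported as sklearn). Which libraries are reported as installed depends on your own environment.

```python
from importlib.util import find_spec

# Display name -> top-level import name (the mapping itself is illustrative)
LIBRARIES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "Seaborn": "seaborn",
    "Scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "PySpark": "pyspark",
}


def check_libraries(libraries=LIBRARIES):
    """Return {display name: True/False} depending on whether the module can be found."""
    # find_spec returns None when a top-level module is not installed
    return {name: find_spec(module) is not None for name, module in libraries.items()}


for name, available in check_libraries().items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

find_spec only locates a module without importing it, so the check stays fast even for large packages.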



Data Visualization with Python


Natural Language Processing


Dash - Plotly


Scrapy