|
|
Line 119: |
Line 119: |
| | | |
| <br /> | | <br /> |
− | | + | ==[[Data Visualization with Python]]== |
− | ==Pandas Built-in Data Visualization== | |
− | In this lecture we will learn about pandas' built-in capabilities for data visualization. They are built on top of '''<code>matplotlib</code>''' but baked into pandas for easier use.
| |
− | | |
This method of plotting is a lot easier to use than full-on matplotlib: it balances ease of use with control over the figure. Many of the plot calls also accept the additional arguments of their parent matplotlib <code>plt.</code> call.<br />
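To illustrate how pandas plot calls pass extra arguments through to matplotlib, here is a minimal sketch. The <code>demo</code> DataFrame is a made-up stand-in (random values, not the contents of Df1.csv), and the <code>Agg</code> backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import numpy as np
import pandas as pd

# Hypothetical stand-in for Df1.csv: 200 random values in a column named 'A'
rng = np.random.default_rng(0)
demo = pd.DataFrame({"A": rng.normal(size=200)})

# bins, alpha and edgecolor are matplotlib hist() arguments that pandas passes through
ax = demo["A"].plot.hist(bins=20, alpha=0.5, edgecolor="black")

# The return value is a regular matplotlib Axes, so the usual set_ methods apply
ax.set_title("Histogram of A")
```

Because the call returns a matplotlib Axes, anything that matplotlib allows on an axes object can be applied to a pandas plot afterwards.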
| |
− | | |
− | '''The data we'll use in this part:'''
| |
− | | |
− | *[[:File:Df1.csv]]
| |
− | *[[:File:Df2.csv]]
| |
− | *[[:File:Df3.csv]]
| |
− | | |
− | <br />
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import numpy as np
| |
− | import pandas as pd
| |
− | %matplotlib inline
| |
− | | |
− | df1 = pd.read_csv('Df1.csv',index_col=0)
| |
− | df2 = pd.read_csv('Df2.csv')
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !Method/Operator
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="5" style="vertical-align:top;" |<h4 style="text-align:left">[https://matplotlib.org/gallery.html#style_sheets Style Sheets]</h4>
| |
− | | rowspan="5" style="vertical-align:top;"|'''<code>plt.style.use(<nowiki>''</nowiki>)</code>'''
| |
− | |Matplotlib has style sheets you can use to make your plots look a little nicer. These style sheets include <code>bmh</code>, <code>fivethirtyeight</code>, <code>ggplot</code> and more. Each one defines a set of style rules that your plots follow. I recommend using them: they give all your plots the same look and make them feel more professional. You can even create your own if you want your company's plots to all have the same look (it is a bit tedious to create one, though).
| |
− | Here is how to use them.
| |
− | | |
− | '''Before plt.style.use() your plots look like this:'''<syntaxhighlight lang="python3">
| |
− | df1['A'].hist()
| |
− | </syntaxhighlight><br />
| |
− | | |
− | |[[File:PandasBuilt-inData visualization1.png|center]]
| |
− | |-
| |
− | |'''Call the style:'''<syntaxhighlight lang="python3">
| |
− | import matplotlib.pyplot as plt
| |
− | plt.style.use('ggplot')
| |
− | </syntaxhighlight>Now your plots look like this:<syntaxhighlight lang="python3">
| |
− | df1['A'].hist()
| |
− | </syntaxhighlight><br />
| |
− | |[[File:PandasBuilt-inData visualization1.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | plt.style.use('bmh')
| |
− | df1['A'].hist()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization3.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | plt.style.use('dark_background')
| |
− | df1['A'].hist()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization4.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | plt.style.use('fivethirtyeight')
| |
− | df1['A'].hist()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization5.png|center]]
| |
− | |-
| |
− | ! rowspan="13" style="vertical-align:top;" |<h4 style="text-align:left">Plot Types</h4>
| |
− | | style="vertical-align:top;" |
| |
− | |There are several plot types built into pandas, most of them statistical plots by nature:
| |
− | | |
− | *<code>df.plot.area</code>, <code>df.plot.barh</code>, <code>df.plot.density</code>, <code>df.plot.hist</code>, <code>df.plot.line</code>, <code>df.plot.scatter</code>, <code>df.plot.bar</code>, <code>df.plot.box</code>, <code>df.plot.hexbin</code>, <code>df.plot.kde</code>, <code>df.plot.pie</code>
| |
− | |
| |
− | |-
| |
− | | style="vertical-align:top;" |<h5 style="text-align:left">Area</h5>
| |
− | | |
− | <code>df.plot.area</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.plot.area(alpha=0.4)
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization6.png|center]]
| |
− | |-
| |
− | | rowspan="2" style="vertical-align:top;" |<h5 style="text-align:left">Barplots</h5>
| |
− | <code>df.plot.bar()</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.plot.bar()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization7.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.plot.bar(stacked=True)
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization8.png|center]]
| |
− | |-
| |
− | | style="vertical-align:top;" |<h5 style="text-align:left">Histograms</h5>
| |
− | <code>df.plot.hist()</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df1['A'].plot.hist(bins=50)
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization9.png|center]]
| |
− | |-
| |
− | | style="vertical-align:top;" |<h5 style="text-align:left">Line Plots</h5>
| |
− | <code>df.plot.line()</code>
| |
− | | colspan="2" |<syntaxhighlight lang="python3">
| |
− | df1.plot.line(y='B',figsize=(12,3),lw=1)  # the index is used for x by default
| |
− | </syntaxhighlight>[[File:PandasBuilt-inData visualization10.png|center]]
| |
− | |-
| |
− | | rowspan="3" style="vertical-align:top;" |<h5 style="text-align:left">Scatter Plots</h5>
| |
− | <code>df.plot.scatter()</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df1.plot.scatter(x='A',y='B')
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization11.png|center]]
| |
− | |-
| |
− | |You can use <code>c</code> to color points based on another column's values, and <code>cmap</code> to choose the colormap. For all the colormaps, check out: http://matplotlib.org/users/colormaps.html<br /><syntaxhighlight lang="python3">
| |
− | df1.plot.scatter(x='A',y='B',c='C',cmap='coolwarm')
| |
− | </syntaxhighlight><br />
| |
− | |[[File:PandasBuilt-inData visualization12.png|center]]
| |
− | |-
| |
− | |Or use <code>s</code> to set the marker size based on another column. The <code>s</code> parameter here is given as an array rather than just the name of a column:<br /><syntaxhighlight lang="python3">
| |
− | df1.plot.scatter(x='A',y='B',s=df1['C']*200)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:PandasBuilt-inData visualization13.png|center]]
| |
− | |-
| |
− | | style="vertical-align:top;" |<h5 style="text-align:left">BoxPlots</h5>
| |
− | <code>df.plot.box()</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.plot.box() # Can also pass a by= argument for groupby
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization14.png|center]]
| |
− | |-
| |
− | | style="vertical-align:top;" |<h5 style="text-align:left">Hexagonal Bin Plot</h5>
| |
− | <code>df.plot.hexbin()</code>
| |
− | |Useful for Bivariate Data, alternative to scatterplot:<syntaxhighlight lang="python3">
| |
− | df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b'])
| |
− | df.plot.hexbin(x='a',y='b',gridsize=25,cmap='Oranges')
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization15.png|center]]
| |
− | |-
| |
− | | rowspan="2" style="vertical-align:top;" |<h5 style="text-align:left">Kernel Density Estimation plot (KDE)</h5>
| |
− | <code>df2.plot.kde()</code>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2['a'].plot.kde()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization16.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.plot.density()
| |
− | </syntaxhighlight>
| |
− | |[[File:PandasBuilt-inData visualization17.png|center]]
| |
− | |}
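As a complement to the style sheets shown in the table above, the following sketch lists the styles available in the installed matplotlib and applies one temporarily. It uses <code>plt.style.available</code> and <code>plt.style.context</code>; the sample data passed to <code>hist</code> is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import matplotlib.pyplot as plt

# Names of the style sheets shipped with the installed matplotlib
print(sorted(plt.style.available))

default_grid = plt.rcParams["axes.grid"]

# Apply 'ggplot' only inside the with-block; the previous settings are restored afterwards
with plt.style.context("ggplot"):
    grid_inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.hist([1, 2, 2, 3, 3, 3, 4, 4, 5])  # made-up sample values

grid_after = plt.rcParams["axes.grid"]
```

Unlike <code>plt.style.use()</code>, which changes the style for the rest of the session, the context manager is handy when only one figure should look different.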
| |
− | | |
− | | |
− | <br />
| |
− | | |
− | ==Data Visualization with Matplotlib==
| |
− | Matplotlib is the "grandfather" library of data visualization with Python. It was created by John Hunter to replicate the plotting capabilities of MATLAB (another programming language) in Python, so if you happen to be familiar with MATLAB, matplotlib will feel natural to you.
| |
− | | |
− | It is an excellent 2D and 3D graphics library for generating scientific figures.
| |
− | | |
| |
− | '''Some of the major Pros of Matplotlib are:'''
| |
− | | |
− | *Generally easy to get started for simple plots
| |
− | *Support for custom labels and texts
| |
− | *Great control of every element in a figure
| |
− | *High-quality output in many formats
| |
− | *Very customizable in general
| |
− | | |
− | | |
− | '''References:'''
| |
− | | |
− | *The project web page for matplotlib: http://www.matplotlib.org
| |
− | *The source code for matplotlib: https://github.com/matplotlib/matplotlib
| |
− | *<span style="background:#D8BFD8">A large gallery showcasing various types of plots matplotlib can create. Highly recommended!:</span> http://matplotlib.org/gallery.html
| |
− | *A good matplotlib tutorial: http://www.loria.fr/~rougier/teaching/matplotlib
| |
− | | |
− | | |
− | In practice, you will most likely pass NumPy arrays or pandas columns (which essentially behave like arrays) to the plotting functions. However, plain Python lists work as well.
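A minimal sketch of that point, with made-up values: the same data is plotted from plain lists, NumPy arrays and pandas Series, and matplotlib draws the same line in each case.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data
x_list = [0, 1, 2, 3]
y_list = [0, 1, 4, 9]

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
axes[0].plot(x_list, y_list)                        # plain Python lists
axes[1].plot(np.array(x_list), np.array(y_list))    # NumPy arrays
axes[2].plot(pd.Series(x_list), pd.Series(y_list))  # pandas Series
```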
| |
− | | |
− | | |
− | Matplotlib allows you to create reproducible figures programmatically. Let's learn how to use it! Before continuing this lecture, I encourage you just to explore the official Matplotlib web page: http://matplotlib.org/
| |
− | | |
− | | |
− | <br />
| |
− | ===Installation===
| |
− | <syntaxhighlight lang="bash">
| |
− | conda install matplotlib
| |
− | </syntaxhighlight>
| |
− | | |
− | Or without conda:
| |
− | <syntaxhighlight lang="bash">
| |
− | pip install matplotlib
| |
− | </syntaxhighlight>
| |
− |
| |
− | | |
− | '''Importing:'''
| |
− | <syntaxhighlight lang="python">
| |
− | import matplotlib.pyplot as plt
| |
− | </syntaxhighlight>
| |
− | | |
− | | |
− | '''You'll also need to use this line to see plots in the notebook:'''
| |
− | <syntaxhighlight lang="python">
| |
− | %matplotlib inline
| |
− | </syntaxhighlight>
| |
− | That line is only for Jupyter notebooks. If you are using another editor, call '''<code>plt.show()</code>''' at the end of all your plotting commands to have the figure pop up in another window.
| |
− | | |
− | <br />
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |Array example:<syntaxhighlight lang="python3">
| |
− | import numpy as np
| |
− | x = np.linspace(0, 5, 11)
| |
− | y = x ** 2
| |
− | | |
− | x
| |
− | # Output:
| |
− | array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
| |
− | | |
− | y
| |
− | # Output:
| |
− | array([ 0. , 0.25, 1. , 2.25, 4. , 6.25, 9. , 12.25, 16. ,
| |
− | 20.25, 25. ])
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h3 style="text-align:left">Basic example</h3>
| |
− | |
| |
− | |<syntaxhighlight lang="python3">
| |
− | plt.plot(x, y, 'r') # 'r' is the color red
| |
− | plt.xlabel('X Axis Title Here')
| |
− | plt.ylabel('Y Axis Title Here')
| |
− | plt.title('String Title Here')
| |
− | plt.show()
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib1.png|400px|thumb]]
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Creating Multiplots on Same Canvas</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | # plt.subplot(nrows, ncols, plot_number)
| |
− | plt.subplot(1,2,1)
| |
− | plt.plot(x, y, 'r--') # More on color options later
| |
− | plt.subplot(1,2,2)
| |
− | plt.plot(y, x, 'g*-');
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib2.png|400px|thumb]]
| |
− | |-
| |
− | ! rowspan="17" style="vertical-align:top;" |<h3 style="text-align:left; vertical-align: text-top;">Matplotlib Object Oriented Method</h3>
| |
− | |
| |
− | |Now that we've seen the basics, let's break it all down with a more formal introduction of Matplotlib's Object Oriented API. This means we will instantiate figure objects and then call methods or attributes from that object.
| |
− | | |
This approach is nicer when dealing with a canvas that has multiple plots on it.
| |
− | | |
− | To begin we create a figure instance. Then we can add axes to that figure:<syntaxhighlight lang="python3">
| |
− | # Create Figure (empty canvas)
| |
− | fig = plt.figure()
| |
− | | |
− | # Add set of axes to figure
| |
− | axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # left, bottom, width, height (range 0 to 1)
| |
− | | |
− | # Plot on that set of axes
| |
− | axes.plot(x, y, 'b')
| |
− | axes.set_xlabel('Set X Label') # Notice the use of set_ to begin methods
| |
− | axes.set_ylabel('Set y Label')
| |
− | axes.set_title('Set Title')
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib3.png|thumb|372x372px]]
| |
− | |-
| |
− | |
| |
− | |The code is a little more complicated, but the advantage is that we now have full control of where the plot axes are placed, and we can easily add more than one set of axes to the figure:<syntaxhighlight lang="python3">
| |
− | # Creates blank canvas
| |
− | fig = plt.figure()
| |
− | | |
− | axes1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # main axes
| |
− | axes2 = fig.add_axes([0.2, 0.5, 0.4, 0.3]) # inset axes
| |
− | | |
− | # Larger Figure Axes 1
| |
− | axes1.plot(x, y, 'b')
| |
− | axes1.set_xlabel('X_label_axes1')
| |
− | axes1.set_ylabel('Y_label_axes1')
| |
− | axes1.set_title('Axes 1 Title')
| |
− | | |
− | # Inset Figure Axes 2
| |
− | axes2.plot(y, x, 'r')
| |
− | axes2.set_xlabel('X_label_axes2')
| |
− | axes2.set_ylabel('Y_label_axes2')
| |
− | axes2.set_title('Axes 2 Title');
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib4.png|thumb|372x372px]]
| |
− | |-
| |
− | | rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left"><code>subplots()</code></h4>
| |
− | |'''The plt.subplots() function acts as a more automatic axes manager, returning a figure and its axes:'''<syntaxhighlight lang="python3">
| |
− | # Use similar to plt.figure() except use tuple unpacking to grab fig and axes
| |
− | fig, axes = plt.subplots()
| |
− | | |
− | # Now use the axes object to add stuff to plot
| |
− | axes.plot(x, y, 'r')
| |
− | axes.set_xlabel('x')
| |
− | axes.set_ylabel('y')
| |
− | axes.set_title('title');
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib5.png|thumb|372x372px]]
| |
− | |-
| |
− | |'''Then you can specify the number of rows and columns when creating the subplots() object:'''<syntaxhighlight lang="python3">
| |
− | # Empty canvas of 1 by 2 subplots
| |
− | fig, axes = plt.subplots(nrows=1, ncols=2)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib6.png|thumb|372x372px]]
| |
− | |-
| |
− | |'''Axes is an array of axes to plot on:'''<syntaxhighlight lang="python3">
| |
− | axes
| |
− | # Output:
| |
− | array([<matplotlib.axes._subplots.AxesSubplot object at 0x111f0f8d0>,
| |
− | <matplotlib.axes._subplots.AxesSubplot object at 0x1121f5588>], dtype=object)
| |
− | </syntaxhighlight>'''We can iterate through this array:'''<syntaxhighlight lang="python3">
| |
− | for ax in axes:
| |
− | ax.plot(x, y, 'b')
| |
− | ax.set_xlabel('x')
| |
− | ax.set_ylabel('y')
| |
− | ax.set_title('title')
| |
− | | |
− | # Display the figure object
| |
− | fig
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib7.png|thumb|372x372px]]
| |
− | |-
| |
− | |A common issue with matplotlib is overlapping subplots or figures. We can use the '''fig.tight_layout()''' or '''plt.tight_layout()''' method, which automatically adjusts the positions of the axes on the figure canvas so that there is no overlapping content:<syntaxhighlight lang="python3">
| |
− | fig, axes = plt.subplots(nrows=1, ncols=2)
| |
− | | |
− | for ax in axes:
| |
− | ax.plot(x, y, 'g')
| |
− | ax.set_xlabel('x')
| |
− | ax.set_ylabel('y')
| |
− | ax.set_title('title')
| |
− | | |
− | fig
| |
− | plt.tight_layout()
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib8.png|thumb|372x372px]]
| |
− | |-
| |
− | | rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Figure size, aspect ratio and DPI</h4>
| |
− | |Matplotlib allows the aspect ratio, DPI and figure size to be specified when the Figure object is created. You can use the <code>figsize</code> and <code>dpi</code> keyword arguments.
| |
− | | |
− | *<code>figsize</code> is a tuple of the width and height of the figure in inches
| |
− | *<code>dpi</code> is the dots per inch (pixels per inch).
| |
− | | |
− | | |
− | | |
− | For example: <syntaxhighlight lang="python3">
| |
− | fig = plt.figure(figsize=(8,4), dpi=100)
| |
− | # Output:
| |
− | <Figure size 800x400 with 0 Axes>
| |
− | </syntaxhighlight><br />
| |
− | |
| |
− | |-
| |
− | |The same arguments can also be passed to layout managers, such as the <code>subplots</code> function:<syntaxhighlight lang="python3">
| |
− | fig, axes = plt.subplots(figsize=(12,3))
| |
− | | |
− | axes.plot(x, y, 'r')
| |
− | axes.set_xlabel('x')
| |
− | axes.set_ylabel('y')
| |
− | axes.set_title('title');
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib9.png|thumb|371x371px]]
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Saving figures</h4>
| |
− | |Matplotlib can generate high-quality output in a number of formats, including PNG, JPG, EPS, SVG, PGF and PDF.
| |
− | | |
− | | |
− | | |
− | To save a figure to a file we can use the <code>savefig</code> method in the <code>Figure</code> class:<syntaxhighlight lang="python3">
| |
− | fig.savefig("filename.png")
| |
− | </syntaxhighlight>
| |
− | | |
− | | |
− | | |
− | Here we can also optionally specify the DPI and choose between different output formats:<syntaxhighlight lang="python3">
| |
− | fig.savefig("filename.png", dpi=200)
| |
− | </syntaxhighlight><br />
| |
− | |
| |
− | |-
| |
− | | rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">Legends, labels and titles</h4>
| |
− | |'''Figure titles'''
| |
− | A title can be added to each axis instance in a figure. To set the title, use the <code>set_title</code> method in the axes instance:<syntaxhighlight lang="python3">
| |
− | ax.set_title("title");
| |
− | </syntaxhighlight><br />
| |
− | |
| |
− | |-
| |
− | |'''Axis labels'''
| |
− | Similarly, with the methods <code>set_xlabel</code> and <code>set_ylabel</code>, we can set the labels of the X and Y axes:<syntaxhighlight lang="python3">
| |
− | ax.set_xlabel("x")
| |
− | ax.set_ylabel("y");
| |
− | </syntaxhighlight><br />
| |
− | |
| |
− | |-
| |
− | |'''Legends'''
| |
− | You can pass the '''label="label text"''' keyword argument when plots or other objects are added to the figure, and then call the '''legend''' method without arguments to add the legend to the figure:<syntaxhighlight lang="python3">
| |
− | fig = plt.figure()
| |
− | | |
− | ax = fig.add_axes([0,0,1,1])
| |
− | | |
− | ax.plot(x, x**2, label="x**2")
| |
− | ax.plot(x, x**3, label="x**3")
| |
− | ax.legend()
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib10.png|thumb|371x371px]]Notice how our legend overlaps some of the actual plot!
| |
− | |-
| |
− | |The '''legend''' function takes an optional keyword argument '''loc''' that specifies where in the figure the legend is drawn. The value of '''loc''' can be a numerical code or the equivalent string (for example, <code>1</code> or <code>'upper right'</code>). See the documentation page for details. Some of the most common '''loc''' values are:<syntaxhighlight lang="python3">
| |
− | # Lots of options....
| |
− | | |
− | ax.legend(loc=1) # upper right corner
| |
− | ax.legend(loc=2) # upper left corner
| |
− | ax.legend(loc=3) # lower left corner
| |
− | ax.legend(loc=4) # lower right corner
| |
− | | |
− | # .. many more options are available
| |
− | | |
− | # Most common to choose
| |
− | ax.legend(loc=0) # let matplotlib decide the optimal location
| |
− | fig
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib11.png|thumb|371x371px]]<br />
| |
− | |-
| |
− | | rowspan="3" style="vertical-align:top;"|<h4 style="text-align:left">Setting colors, linewidths, linetypes</h4>
| |
− | |'''Colors with MATLAB-like syntax''':
| |
− | We can define the colors of lines and other graphical elements in a number of ways. First of all, we can use the MATLAB-like syntax where <code>'b'</code> means blue, <code>'g'</code> means green, etc. The MATLAB-style syntax for selecting line styles is also supported: for example, <code>'b.-'</code> means a blue line with dots:<syntaxhighlight lang="python3">
| |
− | # MATLAB style line color and style
| |
− | fig, ax = plt.subplots()
| |
− | ax.plot(x, x**2, 'b.-') # blue line with dots
| |
− | ax.plot(x, x**3, 'g--') # green dashed line
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib12.png|thumb|371x371px]]<br />
| |
− | |-
| |
− | |'''Colors with the color= parameter''':
| |
− | We can also define colors by their names or RGB hex codes and optionally provide an alpha value using the <code>color</code> and <code>alpha</code> keyword arguments. Alpha indicates opacity.<syntaxhighlight lang="python3">
| |
− | fig, ax = plt.subplots()
| |
− | | |
− | ax.plot(x, x+1, color="blue", alpha=0.5) # half-transparent
| |
− | ax.plot(x, x+2, color="#8B008B") # RGB hex code
| |
− | ax.plot(x, x+3, color="#FF8C00") # RGB hex code
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib13.png|thumb|362x362px]]<br />
| |
− | |-
| |
− | |'''Line and marker styles''':
| |
− | To change the line width, we can use the <code>linewidth</code> or <code>lw</code> keyword argument. The line style can be selected using the <code>linestyle</code> or <code>ls</code> keyword arguments:<syntaxhighlight lang="python3">
| |
− | fig, ax = plt.subplots(figsize=(12,6))
| |
− | | |
− | ax.plot(x, x+1, color="red", linewidth=0.25)
| |
− | ax.plot(x, x+2, color="red", linewidth=0.50)
| |
− | ax.plot(x, x+3, color="red", linewidth=1.00)
| |
− | ax.plot(x, x+4, color="red", linewidth=2.00)
| |
− | | |
− | # possible linestyle options: '-', '--', '-.', ':'
| |
− | ax.plot(x, x+5, color="green", lw=3, linestyle='-')
| |
− | ax.plot(x, x+6, color="green", lw=3, ls='-.')
| |
− | ax.plot(x, x+7, color="green", lw=3, ls=':')
| |
− | | |
− | # custom dash
| |
− | line, = ax.plot(x, x+8, color="black", lw=1.50)
| |
− | line.set_dashes([5, 10, 15, 10]) # format: line length, space length, ...
| |
− | | |
− | # possible marker symbols: marker = '+', 'o', '*', 's', ',', '.', '1', '2', '3', '4', ...
| |
− | ax.plot(x, x+ 9, color="blue", lw=3, ls='-', marker='+')
| |
− | ax.plot(x, x+10, color="blue", lw=3, ls='--', marker='o')
| |
− | ax.plot(x, x+11, color="blue", lw=3, ls='-', marker='s')
| |
− | ax.plot(x, x+12, color="blue", lw=3, ls='--', marker='1')
| |
− | | |
− | # marker size and color
| |
− | ax.plot(x, x+13, color="purple", lw=1, ls='-', marker='o', markersize=2)
| |
− | ax.plot(x, x+14, color="purple", lw=1, ls='-', marker='o', markersize=4)
| |
− | ax.plot(x, x+15, color="purple", lw=1, ls='-', marker='o', markersize=8, markerfacecolor="red")
| |
− | ax.plot(x, x+16, color="purple", lw=1, ls='-', marker='s', markersize=8,
| |
− | markerfacecolor="yellow", markeredgewidth=3, markeredgecolor="green");
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib14.png|thumb|362x362px]]<br />
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Plot range</h4>
| |
− | |We can configure the ranges of the axes using the <code>set_ylim</code> and <code>set_xlim</code> methods of the axes object, or <code>axis('tight')</code> to automatically get "tightly fitted" axes ranges:<syntaxhighlight lang="python3">
| |
− | fig, axes = plt.subplots(1, 3, figsize=(12, 4))
| |
− | | |
− | axes[0].plot(x, x**2, x, x**3)
| |
− | axes[0].set_title("default axes ranges")
| |
− | | |
− | axes[1].plot(x, x**2, x, x**3)
| |
− | axes[1].axis('tight')
| |
− | axes[1].set_title("tight axes")
| |
− | | |
− | axes[2].plot(x, x**2, x, x**3)
| |
− | axes[2].set_ylim([0, 60])
| |
− | axes[2].set_xlim([2, 5])
| |
− | axes[2].set_title("custom axes range");
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matplotlib15.png|thumb|362x362px]]<br />
| |
− | |-
| |
− | ! rowspan="4" style="vertical-align:top;"|<h3 style="text-align:left">Special Plot Types</h3>
| |
− | | colspan="3" |There are many specialized plots we can create, such as '''barplots''', '''histograms''', '''scatter plots''', and many more. We will actually create most of these types of plots using seaborn, a statistical plotting library for Python. But here are a few examples:
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Scatter plots</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | plt.scatter(x,y)
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib16.png|thumb|362x362px]]<br />
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Histograms</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | from random import sample
| |
− | data = sample(range(1, 1000), 100)
| |
− | plt.hist(data)
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib17.png|thumb|362x362px]]<br />
| |
− | |-
| |
− | |style="vertical-align:top;"|<h4 style="text-align:left">Box plots</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | data = [np.random.normal(0, std, 100) for std in range(1, 4)]
| |
− | | |
− | # rectangular box plot
| |
− | plt.boxplot(data,vert=True,patch_artist=True);
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib18.png|thumb|362x362px]]<br />
| |
− | |}
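To tie together the <code>figsize</code>, <code>dpi</code>, legend and <code>savefig</code> rows of the table above, here is a minimal sketch. It saves into an in-memory buffer rather than a file (an assumption made only so the example leaves nothing on disk) and reads the pixel size back from the PNG header.

```python
import io
import struct
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

# 8 x 4 inches at 100 dots per inch on screen
fig, ax = plt.subplots(figsize=(8, 4), dpi=100)
ax.plot(x, x**2, label="x**2")
ax.plot(x, x**3, label="x**3")
ax.legend(loc="upper left")  # string equivalent of loc=2

# Save into an in-memory buffer instead of "filename.png"
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=200)
png = buf.getvalue()

# The PNG header stores width and height as big-endian 32-bit integers at bytes 16-24
width, height = struct.unpack(">II", png[16:24])
print(width, height)
```

The <code>dpi</code> passed to <code>savefig</code> overrides the figure's own value for the saved file, so an 8 by 4 inch figure saved at 200 dpi comes out at 1600 by 800 pixels.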
| |
− | | |
− | | |
− | | |
− | <br />
| |
− | === Advanced Matplotlib Concepts ===
| |
− | In this lecture we cover some more advanced topics which you won't usually use as often. You can always reference the documentation for more resources!
| |
− | | |
− | Further reading:
| |
− | <br />
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import numpy as np
| |
− | x = np.linspace(0, 5, 11)
| |
− | y = x ** 2
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Logarithmic scale</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Placement of ticks and custom tick labels</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">Scientific notation</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;"|<h4 style="text-align:left">Axis number and axis label spacing</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">Axis position adjustments</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Axis grid</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Axis spines</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Twin axes</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Axes where x and y is zero</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Other 2D plot styles</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Text annotation</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | ! rowspan="5" style="vertical-align:top;"|<h4 style="text-align:left">Figures with multiple subplots and insets</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">subplots</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">subplot2grid</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">gridspec</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">add_axes</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | ! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">Colormap and contour figures</h4>
| |
− | |
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">pcolor</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">imshow</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">contour</h5>
| |
− | |
| |
− | |
| |
− | |-
| |
− | ! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">3D figures</h4>
| |
− | |
| |
− | | colspan="2" |To use 3D graphics in matplotlib, we first need to create an instance of the <code>Axes3D</code> class. 3D axes can be added to a matplotlib figure canvas in exactly the same way as 2D axes; or, more conveniently, by passing a <code>projection='3d'</code> keyword argument to the <code>add_axes</code> or <code>add_subplot</code> methods.<syntaxhighlight lang="python3">
| |
− | from mpl_toolkits.mplot3d.axes3d import Axes3D
| |
− | import matplotlib.cm  # the examples below reference matplotlib.cm.coolwarm
| |
− | </syntaxhighlight><br />
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">Surface plots</h5>
| |
− | | colspan="2" |<syntaxhighlight lang="python3">
| |
− | fig = plt.figure(figsize=(14,6))
| |
− | | |
− | # `ax` is a 3D-aware axis instance because of the projection='3d' keyword argument to add_subplot
| |
− | ax = fig.add_subplot(1, 2, 1, projection='3d')
| |
− | | |
− | p = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, linewidth=0)
| |
− | | |
− | # surface_plot with color grading and color bar
| |
− | ax = fig.add_subplot(1, 2, 2, projection='3d')
| |
− | p = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=matplotlib.cm.coolwarm, linewidth=0, antialiased=False)
| |
− | cb = fig.colorbar(p, shrink=0.5)
| |
− | </syntaxhighlight>[[File:Matplotlib advance1.png|center|thumb|597x597px]]
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">Wire-frame plot</h5>
| |
− | |<syntaxhighlight lang="python3">
| |
− | fig = plt.figure(figsize=(8,6))
| |
− | ax = fig.add_subplot(1, 1, 1, projection='3d')
| |
− | p = ax.plot_wireframe(X, Y, Z, rstride=4, cstride=4)
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib advance2.png|center]]
| |
− | |-
| |
− | |style="vertical-align:top;"|<h5 style="text-align:left">Contour plots with projections</h5>
| |
− | |<syntaxhighlight lang="python3">
| |
− | fig = plt.figure(figsize=(8,6))
| |
− | | |
− | ax = fig.add_subplot(1,1,1, projection='3d')
| |
− | | |
− | ax.plot_surface(X, Y, Z, rstride=4, cstride=4, alpha=0.25)
| |
− | cset = ax.contour(X, Y, Z, zdir='z', offset=-np.pi, cmap=matplotlib.cm.coolwarm)
| |
− | cset = ax.contour(X, Y, Z, zdir='x', offset=-np.pi, cmap=matplotlib.cm.coolwarm)
| |
− | cset = ax.contour(X, Y, Z, zdir='y', offset=3*np.pi, cmap=matplotlib.cm.coolwarm)
| |
− | | |
− | ax.set_xlim3d(-np.pi, 2*np.pi);
| |
− | ax.set_ylim3d(0, 3*np.pi);
| |
− | ax.set_zlim3d(-np.pi, 2*np.pi);
| |
− | </syntaxhighlight>
| |
− | |[[File:Matplotlib advance3.png|center]]
| |
− | |}
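The 3D examples in the table above use arrays <code>X</code>, <code>Y</code> and <code>Z</code> that this section never defines. The sketch below assumes a made-up surface on a grid from 0 to 2π, chosen only to roughly match the axis limits used in the contour example; it is not the data behind the figures shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older matplotlib)

# Hypothetical sample surface on [0, 2*pi] x [0, 2*pi]
phi = np.linspace(0, 2 * np.pi, 100)
X, Y = np.meshgrid(phi, phi)
Z = np.sin(X) * np.cos(Y)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1, projection="3d")

# Color-graded surface with a color bar; the colormap is given by name
surf = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, cmap="coolwarm", linewidth=0)
fig.colorbar(surf, ax=ax, shrink=0.5)
```

With <code>X</code>, <code>Y</code> and <code>Z</code> defined this way, the surface, wire-frame and contour calls from the table can be run as written, although the resulting shapes will differ from the original figures.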
| |
− | | |
− | | |
− | <br />
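Several topics in the advanced table are listed without an example. As a hedged illustration of two of them, logarithmic scale and twin axes, here is a minimal sketch that reuses the <code>x</code> array from this section; the plotted functions are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

fig, (ax_log, ax_left) = plt.subplots(1, 2, figsize=(10, 4))

# Logarithmic scale on the y axis
ax_log.plot(x, np.exp(x))
ax_log.set_yscale("log")
ax_log.set_title("log scale (y)")

# Twin axes: two independent y axes sharing one x axis
ax_left.plot(x, x**2, color="blue")
ax_left.set_ylabel("x**2", color="blue")
ax_right = ax_left.twinx()
ax_right.plot(x, x**3, color="red")
ax_right.set_ylabel("x**3", color="red")

fig.tight_layout()
```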
| |
− | | |
− | ==Data visualization with Seaborn==
| |
− | <code>Seaborn</code> is a statistical visualization library designed to work well with pandas DataFrames.
| |
− | | |
− | | |
− | <syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | %matplotlib inline
| |
− | </syntaxhighlight><br />
| |
− | | |
− | | |
− | <br />
| |
− | ===Built-in data sets===
| |
− | Seaborn comes with built-in data sets!<syntaxhighlight lang="python3">
| |
− | tips = sns.load_dataset('tips')
| |
− | tips.head()
| |
− | # Output:
| |
− | total_bill tip sex smoker day time size
| |
− | 0 16.99 1.01 Female No Sun Dinner 2
| |
− | 1 10.34 1.66 Male No Sun Dinner 3
| |
− | 2 21.01 3.50 Male No Sun Dinner 3
| |
− | 3 23.68 3.31 Male No Sun Dinner 2
| |
− | 4 24.59 3.61 Female No Sun Dinner 4
| |
− | </syntaxhighlight><br />
| |
− | | |
− | | |
− | <br />
| |
− | ===Distribution Plots===
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | %matplotlib inline
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! style="vertical-align:top;" |<h4 style="text-align:left">Distribution of a univariate set of observations</h4>
| |
− | |<code>'''distplot'''</code>
| |
− | |The distplot shows the distribution of a univariate set of observations:<syntaxhighlight lang="python3">
| |
− | sns.distplot(tips['total_bill'])
| |
− | # Safe to ignore warnings
| |
− | </syntaxhighlight>
| |
− | | |
− | | |
− | | |
− | To remove the KDE layer and show only the histogram, use:<syntaxhighlight lang="python3">
| |
− | sns.distplot(tips['total_bill'],kde=False,bins=30)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn1.png|center]][[File:Seaborn2.png|center]]
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Match up two distplots for bivariate data</h4>
| |
− | |<code>'''jointplot()'''</code>
| |
− | |<code>'''jointplot()'''</code> matches up two distplots for bivariate data. The '''kind''' parameter selects how the two variables are compared:
| |
− | | |
− | *<code>scatter</code>, <code>reg</code>, <code>resid</code>, <code>kde</code>, <code>hex</code><br />
| |
− | <syntaxhighlight lang="python3">
| |
− | sns.jointplot(x='total_bill',y='tip',data=tips,kind='scatter')
| |
− | </syntaxhighlight><br /><syntaxhighlight lang="python3">
| |
− | sns.jointplot(x='total_bill',y='tip',data=tips,kind='hex')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.jointplot(x='total_bill',y='tip',data=tips,kind='reg')
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn3.png|center]][[File:Seaborn4.png|center]][[File:Seaborn5.png|center]]
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Plot pairwise relationships across an entire dataframe</h4>
| |
− | |'''<code>pairplot</code>'''
| |
− | | colspan="2" |'''<code>pairplot</code>''' will plot pairwise relationships across an entire dataframe (for the numerical columns) and supports a color hue argument (for categorical columns):<syntaxhighlight lang="python3">
| |
− | sns.pairplot(tips)
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.pairplot(tips,hue='sex',palette='coolwarm')
| |
− | </syntaxhighlight>
| |
− | {| style="margin: 0 auto;"
| |
− | |[[File:Seaborn6.png|center|407x407px]]
| |
− | |[[File:Seaborn7.png|center|407x407px]]
| |
− | |}
| |
− | |-
| |
− | !style="vertical-align:top;"|<h4 style="text-align:left">Draw a dash mark for every point on a univariate distribution</h4>
| |
− | |<code>'''rugplot'''</code>
| |
− | |A rugplot is a very simple concept: it draws a dash mark for every point of a univariate distribution. Rugplots are the building block of a KDE plot:<syntaxhighlight lang="python3">
| |
− | sns.rugplot(tips['total_bill'])
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn8.png|center]]
| |
− | |-
| |
− | ! rowspan="4" style="vertical-align:top;"|<h4 style="text-align:left">[[wikipedia:Kernel_density_estimation#Practical_estimation_of_the_bandwidth|Kernel Density Estimation plots]]</h4>
| |
− | | rowspan="4" |'''<code>kdeplot</code>'''
| |
− | |kdeplots are Kernel Density Estimation plots. These KDE plots replace every single observation with a Gaussian (Normal) distribution centered around that value. For example:<syntaxhighlight lang="python3">
| |
− | # Don't worry about understanding this code!
| |
− | # It's just for the diagram below
| |
− | import numpy as np
| |
− | import matplotlib.pyplot as plt
| |
− | from scipy import stats
| |
− | | |
− | #Create dataset
| |
− | dataset = np.random.randn(25)
| |
− | | |
− | # Create another rugplot
| |
− | sns.rugplot(dataset);
| |
− | | |
− | # Set up the x-axis for the plot
| |
− | x_min = dataset.min() - 2
| |
− | x_max = dataset.max() + 2
| |
− | | |
− | # 100 equally spaced points from x_min to x_max
| |
− | x_axis = np.linspace(x_min,x_max,100)
| |
− | | |
− | # Set up the bandwidth, for info on this:
| |
− | url = 'http://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth'
| |
− | | |
− | bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2
| |
− | | |
− | | |
− | # Create an empty kernel list
| |
− | kernel_list = []
| |
− | | |
− | # Plot each basis function
| |
− | for data_point in dataset:
| |
− |
| |
− |     # Create a kernel for each point and append to list
| |
− |     kernel = stats.norm(data_point,bandwidth).pdf(x_axis)
| |
− |     kernel_list.append(kernel)
| |
− |
| |
− |     # Scale for plotting
| |
− |     kernel = kernel / kernel.max()
| |
− |     kernel = kernel * .4
| |
− |     plt.plot(x_axis,kernel,color = 'grey',alpha=0.5)
| |
− | | |
− | plt.ylim(0,1)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn9.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | # To get the kde plot we can sum these basis functions.
| |
− | | |
− | # Plot the sum of the basis function
| |
− | sum_of_kde = np.sum(kernel_list,axis=0)
| |
− | | |
− | # Plot figure
| |
− | fig = plt.plot(x_axis,sum_of_kde,color='indianred')
| |
− | | |
− | # Add the initial rugplot
| |
− | sns.rugplot(dataset,c = 'indianred')
| |
− | | |
− | # Get rid of y-tick marks
| |
− | plt.yticks([])
| |
− | | |
− | # Set title
| |
− | plt.suptitle("Sum of the Basis Functions")
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn10.png|center]]
| |
− | |-
| |
− | |So with our tips dataset:<syntaxhighlight lang="python3">
| |
− | sns.kdeplot(tips['total_bill'])
| |
− | sns.rugplot(tips['total_bill'])
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn11.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.kdeplot(tips['tip'])
| |
− | sns.rugplot(tips['tip'])
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn12.png|center]]
| |
− | |}
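The KDE construction described in this table (one Gaussian per observation, then a sum) can be sketched with the standard library alone. This is a minimal illustration, not seaborn's implementation: the helper names <code>gaussian_pdf</code>, <code>rule_of_thumb_bandwidth</code> and <code>kde</code> are assumptions made for this example, and only the first five <code>total_bill</code> values from <code>tips.head()</code> are used. The bandwidth follows the same rule-of-thumb formula as the diagram code, and the sum is divided by the number of observations so that the curve integrates to about 1.

```python
import math
import statistics

def gaussian_pdf(x, mean, std):
    """Density of a Normal(mean, std) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def rule_of_thumb_bandwidth(data):
    """Same formula as the diagram code: (4 * sigma**5 / (3 * n)) ** 0.2."""
    sigma = statistics.pstdev(data)  # population std, like numpy's default .std()
    return ((4 * sigma ** 5) / (3 * len(data))) ** 0.2

def kde(data, x, bandwidth):
    """Average of one Gaussian kernel per observation, evaluated at x."""
    return sum(gaussian_pdf(x, point, bandwidth) for point in data) / len(data)

# First five total_bill values from tips.head()
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]
bandwidth = rule_of_thumb_bandwidth(total_bill)

# Evaluate the estimate on an evenly spaced grid (the role x_axis plays above)
x_min = min(total_bill) - 4 * bandwidth
x_max = max(total_bill) + 4 * bandwidth
steps = 400
step = (x_max - x_min) / steps
grid = [x_min + i * step for i in range(steps + 1)]
density = [kde(total_bill, x, bandwidth) for x in grid]

# A density should integrate to roughly 1 (simple Riemann sum)
area = sum(density) * step
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

The diagram code rescales each kernel for display only; dividing by the number of observations, as done here, is what turns the sum into a proper density.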
| |
− | | |
− | | |
− | <br />
| |
− | ===Categorical Data Plots===
| |
− | Now let's discuss using seaborn to plot categorical data! There are a few main plot types for this:
| |
− | | |
− | *<code>factorplot</code>
| |
− | *<code>boxplot</code>
| |
− | *<code>violinplot</code>
| |
− | *<code>stripplot</code>
| |
− | *<code>swarmplot</code>
| |
− | *<code>barplot</code>
| |
− | *<code>countplot</code>
| |
− | | |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | %matplotlib inline
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Barplot and Countplot</h4>
| |
− | | rowspan="2" |<code>'''sns.barplot'''</code>
| |
− | |'''<code>barplot</code>''' is a general plot that aggregates the categorical data with some function, by default the mean:<syntaxhighlight lang="python3">
| |
− | sns.barplot(x='sex',y='total_bill',data=tips)
| |
− | </syntaxhighlight>
| |
− | | |
− | |[[File:Seaborn categorical1.png|center|350x350px]]
| |
− | |-
| |
− | |You can set the estimator to any function that converts a vector to a scalar:<syntaxhighlight lang="python3">
| |
− | import numpy as np
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.std)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn categorical2.png|center|350x350px]]
| |
− | |-
| |
− | |'''<code>sns.countplot</code>'''
| |
− | |This is essentially the same as barplot, except that the estimator explicitly counts the number of occurrences, which is why we only pass the x value:<syntaxhighlight lang="python3">
| |
− | sns.countplot(x='sex',data=tips)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn categorical3.png|center|350x350px]]
| |
− | |-
| |
− | ! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Boxplot and Violinplot</h4>
| |
− | |
| |
− | |Boxplots and Violinplots are used to show the distribution of categorical data.
| |
− | |
| |
− | |-
| |
− | |'''<code>sns.boxplot</code>'''
| |
− | |A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.<syntaxhighlight lang="python3">
| |
− | sns.boxplot(x="day", y="total_bill", data=tips,palette='rainbow')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | # Can do entire dataframe with orient='h'
| |
− | sns.boxplot(data=tips,palette='rainbow',orient='h')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips, palette="coolwarm")
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn categorical4.png|center|350x350px]][[File:Seaborn categorical5.png|center|350x350px]][[File:Seaborn categorical6.png|center|350x350px]]
| |
− | |-
| |
− | |<code>'''sns.violinplot'''</code>
| |
− | |A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.<syntaxhighlight lang="python3">
| |
− | sns.violinplot(x="day", y="total_bill", data=tips,palette='rainbow')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',palette='Set1')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',split=True,palette='Set1')
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn categorical7.png|center|350x350px]][[File:Seaborn categorical8.png|center|350x350px]][[File:Seaborn categorical9.png|center|350x350px]]
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Stripplot and Swarmplot</h4>
| |
− | |'''<code>sns.stripplot</code>'''
| |
− | |The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.<syntaxhighlight lang="python3">
| |
− | sns.stripplot(x="day", y="total_bill", data=tips)
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.stripplot(x="day", y="total_bill", data=tips,jitter=True)
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1')
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1',split=True)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn categorical10.png|center|350x350px]][[File:Seaborn categorical11.png|center|350x350px]][[File:Seaborn categorical12.png|center|350x350px]][[File:Seaborn categorical13.png|center|350x350px]]
| |
− | |-
| |
− | |'''<code>sns.swarmplot</code>'''
| |
− | |The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).<syntaxhighlight lang="python3">
| |
− | sns.swarmplot(x="day", y="total_bill", data=tips)
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.swarmplot(x="day", y="total_bill",hue='sex',data=tips, palette="Set1", split=True)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn categorical14.png|center|350x350px]][[File:Seaborn categorical15.png|center|350x350px]]
| |
− | |-
| |
− | ! style="vertical-align:top;" |<h4 style="text-align:left">Combining Categorical Plots</h4>
| |
− | |
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.violinplot(x="tip", y="day", data=tips,palette='rainbow')
| |
− | sns.swarmplot(x="tip", y="day", data=tips,color='black',size=3)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn categorical16.png|center|350x350px]]
| |
− | |-
| |
− | ! style="vertical-align:top;" |<h4 style="text-align:left">Factorplot</h4>
| |
− | |'''<code>sns.factorplot</code>'''
| |
− | |factorplot is the most general form of a categorical plot. It can take in a '''kind''' parameter to adjust the plot type:<syntaxhighlight lang="python3">
| |
− | sns.factorplot(x='sex',y='total_bill',data=tips,kind='bar')
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn categorical17.png|center|250x250px]]
| |
− | |}
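What <code>barplot</code> and <code>countplot</code> compute before drawing can be sketched in plain Python. The helper <code>aggregate</code> is a hypothetical name used only for this illustration, and the data are the five rows shown by <code>tips.head()</code>; seaborn itself also draws confidence intervals, which are left out here.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# (sex, total_bill) pairs taken from the five rows of tips.head()
rows = [
    ("Female", 16.99),
    ("Male", 10.34),
    ("Male", 21.01),
    ("Male", 23.68),
    ("Female", 24.59),
]

def aggregate(rows, estimator=mean):
    """Group values by category and reduce each group to one number."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {category: estimator(values) for category, values in groups.items()}

bar_heights = aggregate(rows)                       # like barplot's default (mean)
bar_spreads = aggregate(rows, estimator=pstdev)     # like estimator=np.std
counts = Counter(category for category, _ in rows)  # like countplot

print(bar_heights)
print(bar_spreads)
print(counts)
```

Any function that maps a list of numbers to a single number can be passed as <code>estimator</code>, which mirrors the role of the estimator argument described above.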
| |
− | | |
− | | |
− | <br />
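The box plot description in this section (quartiles for the box, whiskers for the rest, outliers judged by the inter-quartile range) can be made concrete with a small sketch. It assumes the common 1.5 × IQR convention and the standard library's "inclusive" quartile method, so values may differ slightly from what a plotting library computes; <code>box_stats</code> and the bill amounts are hypothetical, chosen so that one value is flagged.

```python
from statistics import quantiles

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers under the whis * IQR convention."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # furthest point still inside the fence
        "whisker_high": max(inside),
        "outliers": outliers,          # drawn as individual points
    }

# Hypothetical bill amounts (not taken from the tips dataset)
bills = [10, 12, 13, 15, 16, 18, 20, 48]
stats = box_stats(bills)
print(stats)
```

The whiskers stop at the most extreme observations that are still inside the fences, not at the fences themselves, which is why they usually end on an actual data point.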
| |
− | ===Matrix Plots===
| |
− | Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data (later in the machine learning section we will learn how to formally cluster data).
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | %matplotlib inline
| |
− | | |
− | flights = sns.load_dataset('flights')
| |
− | | |
− | tips = sns.load_dataset('tips')
| |
− | | |
− | tips.head()
| |
− | # Output:
| |
− | total_bill tip sex smoker day time size
| |
− | 0 16.99 1.01 Female No Sun Dinner 2
| |
− | 1 10.34 1.66 Male No Sun Dinner 3
| |
− | 2 21.01 3.50 Male No Sun Dinner 3
| |
− | 3 23.68 3.31 Male No Sun Dinner 2
| |
− | 4 24.59 3.61 Female No Sun Dinner 4
| |
− | | |
− | flights.head()
| |
− | # Output:
| |
− | year month passengers
| |
− | 0 1949 January 112
| |
− | 1 1949 February 118
| |
− | 2 1949 March 132
| |
− | 3 1949 April 129
| |
− | 4 1949 May 121
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Heatmap</h4>
| |
− | | rowspan="2" |'''<code>sns.heatmap</code>'''
| |
− | |In order for a <code>'''heatmap'''</code> to work properly, your data should already be in a matrix form, the <code>'''sns.heatmap'''</code> function basically just colors it in for you. For example:<syntaxhighlight lang="python3">
| |
− | # Matrix form for correlation data
| |
− | tips.corr()
| |
− | | |
− | # Output:
| |
− | total_bill tip size
| |
− | total_bill 1.000000 0.675734 0.598315
| |
− | tip 0.675734 1.000000 0.489299
| |
− | size 0.598315 0.489299 1.000000
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.heatmap(tips.corr())
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.heatmap(tips.corr(),cmap='coolwarm',annot=True)
| |
− | </syntaxhighlight>
| |
− | | |
− | |[[File:Matrix plots1.png|center|300x300px]][[File:Matrix plots2.png|center|300x300px]]
| |
− | |-
| |
− | |Or for the flights data:<syntaxhighlight lang="python3">
| |
− | flights.pivot_table(values='passengers',index='month',columns='year')
| |
− | # Output:
| |
− | year 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
| |
− | month
| |
− | January 112 115 145 171 196 204 242 284 315 340 360 417
| |
− | February 118 126 150 180 196 188 233 277 301 318 342 391
| |
− | March 132 141 178 193 236 235 267 317 356 362 406 419
| |
− | April 129 135 163 181 235 227 269 313 348 348 396 461
| |
− | May 121 125 172 183 229 234 270 318 355 363 420 472
| |
− | June 135 149 178 218 243 264 315 374 422 435 472 535
| |
− | July 148 170 199 230 264 302 364 413 465 491 548 622
| |
− | August 148 170 199 242 272 293 347 405 467 505 559 606
| |
− | September 136 158 184 209 237 259 312 355 404 404 463 508
| |
− | October 119 133 162 191 211 229 274 306 347 359 407 461
| |
− | November 104 114 146 172 180 203 237 271 305 310 362 390
| |
− | December 118 140 166 194 201 229 278 306 336 337 405 432
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
| |
− | sns.heatmap(pvflights)
| |
− | </syntaxhighlight><syntaxhighlight lang="python3">
| |
− | sns.heatmap(pvflights,cmap='magma',linecolor='white',linewidths=1)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matrix plots3.png|center|300x300px]][[File:Matrix plots4.png|center|300x300px]]
| |
− | |-
| |
− | ! style="vertical-align:top;" |<h4 style="text-align:left">Clustermap</h4>
| |
− | |'''<code>sns.clustermap</code>'''
| |
− | |The clustermap uses hierarchical clustering to produce a clustered version of the heatmap. For example:<syntaxhighlight lang="python3">
| |
− | sns.clustermap(pvflights)
| |
− | </syntaxhighlight>
| |
− | | |
− | | |
− | Notice that the years and months are no longer in order; instead, they are grouped by similarity in value (passenger count). That means we can begin to infer things from this plot, such as August and July being similar (which makes sense, since both are summer travel months).<syntaxhighlight lang="python3">
| |
− | # More options to get the information a little clearer like normalization
| |
− | sns.clustermap(pvflights,cmap='coolwarm',standard_scale=1)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Matrix plots5.png|center|300x300px]][[File:Matrix plots6.png|center|301x301px]]
| |
− | |}
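The "matrix form" that a heatmap expects can be illustrated by pivoting long-form records by hand. This is a simplified sketch, not how pandas implements <code>pivot_table</code>: the function <code>pivot</code> is a hypothetical helper, it assumes exactly one record per (month, year) cell (<code>pivot_table</code> would aggregate duplicates), and it uses only the five rows shown by <code>flights.head()</code>.

```python
# Long-form records: the five rows shown by flights.head()
records = [
    (1949, "January", 112),
    (1949, "February", 118),
    (1949, "March", 132),
    (1949, "April", 129),
    (1949, "May", 121),
]

def pivot(records):
    """Return (row_labels, column_labels, matrix) with one cell per (month, year)."""
    months, years = [], []
    cells = {}
    for year, month, passengers in records:
        if month not in months:
            months.append(month)  # keep first-seen order for the rows
        if year not in years:
            years.append(year)    # keep first-seen order for the columns
        cells[(month, year)] = passengers
    # Missing combinations become None, similar in spirit to NaN in pandas
    matrix = [[cells.get((m, y)) for y in years] for m in months]
    return months, years, matrix

months, years, matrix = pivot(records)
for month, row in zip(months, matrix):
    print(month, row)
```

Each inner list is one row of the heatmap (a month) and each position in it is one column (a year); the heatmap then only has to map every cell value to a color.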
| |
− | | |
− | | |
− | <br />
| |
− | ===Grids===
| |
− | Grids are general types of plots that let you map plot types to the rows and columns of a grid, which helps you create similar plots separated by features.
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | import matplotlib.pyplot as plt
| |
− | %matplotlib inline
| |
− | | |
− | iris = sns.load_dataset('iris')
| |
− | iris.head()
| |
− | # Output:
| |
− | sepal_length sepal_width petal_length petal_width species
| |
− | 0 5.1 3.5 1.4 0.2 setosa
| |
− | 1 4.9 3.0 1.4 0.2 setosa
| |
− | 2 4.7 3.2 1.3 0.2 setosa
| |
− | 3 4.6 3.1 1.5 0.2 setosa
| |
− | 4 5.0 3.6 1.4 0.2 setosa
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">PairGrid</h4>
| |
− | | rowspan="3" |'''<code>sns.PairGrid()</code>'''
| |
− | |PairGrid is a subplot grid for plotting pairwise relationships in a dataset.<syntaxhighlight lang="python3">
| |
− | # Just the Grid
| |
− | sns.PairGrid(iris)
| |
− | </syntaxhighlight><br />
| |
− | | |
− | |[[File:Seaborn grids1.png|center|500x500px]]
| |
− | |-
| |
− | |Then you map a plot type to the grid:<syntaxhighlight lang="python3">
| |
− | g = sns.PairGrid(iris)
| |
− | g.map(plt.scatter)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn grids2.png|center|500x500px]]
| |
− | |-
| |
− | |Map to the upper, lower, and diagonal cells:<syntaxhighlight lang="python3">
| |
− | g = sns.PairGrid(iris)
| |
− | g.map_diag(plt.hist)
| |
− | g.map_upper(plt.scatter)
| |
− | g.map_lower(sns.kdeplot)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn grids3.png|center|500x500px]]
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Pairplot</h4>
| |
− | | rowspan="2" |'''<code>sns.pairplot()</code>'''
| |
− | |A '''<code>pairplot</code>''' is a simpler version of '''<code>PairGrid</code>''' (one you'll use quite often):<syntaxhighlight lang="python3">
| |
− | sns.pairplot(iris)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn grids4.png|center|500x500px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.pairplot(iris,hue='species',palette='rainbow')
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn grids5.png|center|500x500px]]
| |
− | |-
| |
− | ! rowspan="4" style="vertical-align:top;" |<h4 style="text-align:left">Facet Grid</h4>
| |
− | |
| |
− | |FacetGrid is the general way to create grids of plots based off of a feature:<syntaxhighlight lang="python3">
| |
− | tips = sns.load_dataset('tips')
| |
− | # tips.head()
| |
− | total_bill tip sex smoker day time size
| |
− | 0 16.99 1.01 Female No Sun Dinner 2
| |
− | 1 10.34 1.66 Male No Sun Dinner 3
| |
− | 2 21.01 3.50 Male No Sun Dinner 3
| |
− | 3 23.68 3.31 Male No Sun Dinner 2
| |
− | 4 24.59 3.61 Female No Sun Dinner 4
| |
− | </syntaxhighlight><br />
| |
− | |
| |
− | |-
| |
− | | rowspan="3" |'''<code>sns.FacetGrid()</code>'''
| |
− | |<syntaxhighlight lang="python3">
| |
− | # Just the Grid
| |
− | g = sns.FacetGrid(tips, col="time", row="smoker")
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn grids6.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | g = sns.FacetGrid(tips, col="time", row="smoker")
| |
− | g = g.map(plt.hist, "total_bill")
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn grids7.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | g = sns.FacetGrid(tips, col="time", row="smoker",hue='sex')
| |
− | # Notice how the arguments come after the plt.scatter call
| |
− | g = g.map(plt.scatter, "total_bill", "tip").add_legend()
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn grids8.png|center]]
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">JointGrid</h4>
| |
− | | rowspan="2" |'''<code>sns.JointGrid()</code>'''
| |
− | |JointGrid is the general version of jointplot()-type grids. A quick example:<syntaxhighlight lang="python3">
| |
− | g = sns.JointGrid(x="total_bill", y="tip", data=tips)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn grids9.png|center]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | g = sns.JointGrid(x="total_bill", y="tip", data=tips)
| |
− | g = g.plot(sns.regplot, sns.distplot)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn grids10.png|center]]
| |
− | |}
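How <code>map_diag</code>, <code>map_upper</code> and <code>map_lower</code> divide a PairGrid can be sketched by classifying each cell by its position. This is only an illustration of the layout, not seaborn code: <code>cell_kind</code> and <code>layout</code> are hypothetical names, and the column list is the set of numeric iris columns shown in the setup block.

```python
# Numeric columns of the iris dataset shown in the setup block
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def cell_kind(row, col):
    """Classify a grid cell by its position relative to the diagonal."""
    if row == col:
        return "diag"   # cells drawn by map_diag
    if row < col:
        return "upper"  # cells drawn by map_upper
    return "lower"      # cells drawn by map_lower

# Rows of the grid carry the y variable, columns carry the x variable
layout = {
    (y_name, x_name): cell_kind(i, j)
    for i, y_name in enumerate(columns)
    for j, x_name in enumerate(columns)
}

for kind in ("diag", "upper", "lower"):
    count = sum(1 for value in layout.values() if value == kind)
    print(kind, count)
```

With four numeric columns the grid has 16 cells: 4 on the diagonal and 6 in each triangle, which is why a single <code>g.map(...)</code> call fills every cell while the three specialised calls each fill only their own part.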
| |
− | | |
− | | |
− | <br />
| |
− | ===Regression plots===
| |
− | Seaborn has many built-in capabilities for regression plots. However, we won't really discuss regression until the machine learning section of the course, so for now we will only cover the '''<code>lmplot()</code>''' function.
| |
− | | |
− | '''<code>lmplot</code>''' displays linear models, and it also conveniently lets you split those plots up by features and color them by hue based on features.
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | %matplotlib inline
| |
− | | |
− | tips = sns.load_dataset('tips')
| |
− | | |
− | tips.head()
| |
− | # Output:
| |
− | total_bill tip sex smoker day time size
| |
− | 0 16.99 1.01 Female No Sun Dinner 2
| |
− | 1 10.34 1.66 Male No Sun Dinner 3
| |
− | 2 21.01 3.50 Male No Sun Dinner 3
| |
− | 3 23.68 3.31 Male No Sun Dinner 2
| |
− | 4 24.59 3.61 Female No Sun Dinner 4
| |
− | | |
− | </syntaxhighlight>
| |
− | |-
| |
− | ! colspan="2" |
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="8" style="vertical-align:top;" |<h4 style="text-align:left">The lmplot() function</h4>
| |
− | | rowspan="3" |
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips)
| |
− | </syntaxhighlight><br />
| |
− | | |
− | |[[File:Seaborn regression plots1.png|center|300x300px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex')
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn regression plots2.png|center|300x300px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm')
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn regression plots3.png|center|300x300px]]
| |
− | |-
| |
− | |style="vertical-align:top;" |<h5 style="text-align:left">Working with Markers</h5>
| |
− | |lmplot kwargs are passed through to '''regplot''', the axes-level function that lmplot() draws on each facet. regplot has a scatter_kws parameter that is passed on to plt.scatter, so to change the marker size you set the s parameter in that dictionary, which corresponds (a bit confusingly) to the squared markersize. In other words, you end up passing a dictionary of base matplotlib arguments, in this case s for the size of the scatter markers. You probably won't remember this off the top of your head; reference the documentation when you need it.<syntaxhighlight lang="python3">
| |
− | # http://matplotlib.org/api/markers_api.html
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm',
| |
− | markers=['o','v'],scatter_kws={'s':100})
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn regression plots4.png|center|300x300px]]
| |
− | |-
| |
− | | rowspan="3" style="vertical-align:top;" |<h5 style="text-align:left">Using a Grid</h5>
| |
− | |We can add more variable separation through columns and rows with the use of a grid. Just indicate this with the col or row arguments:<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,col='sex')
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn regression plots5.png|center|300x300px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x="total_bill", y="tip", row="sex", col="time",data=tips)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn regression plots6.png|center|300x300px]]
| |
− | |-
| |
− | | colspan="2" |<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm')
| |
− | </syntaxhighlight>[[File:Seaborn regression plots7.png|center|778x778px]]
| |
− | |-
| |
− | |style="vertical-align:top;" |<h5 style="text-align:left">Aspect and Size</h5>
| |
− | | colspan="2" |Seaborn figures can have their size and aspect ratio adjusted with the '''size''' and '''aspect''' parameters:<syntaxhighlight lang="python3">
| |
− | sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm',
| |
− | aspect=0.6,size=8)
| |
− | </syntaxhighlight><br />[[File:Seaborn_regression_plots8.png|center|778x778px]]
| |
− | |}
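The arithmetic behind the '''size''', '''aspect''' and <code>s</code> arguments used in this table can be sketched as follows. The helper names are hypothetical; the sketch assumes that <code>size</code> is the height of one facet in inches, that a facet's width is <code>aspect * size</code>, and that matplotlib's <code>s</code> is the marker area in points squared. The total width ignores the space seaborn reserves for the legend.

```python
import math

def facet_size(size, aspect):
    """Width and height (inches) of one facet: width = aspect * size."""
    return aspect * size, size

def markersize_from_s(s):
    """Marker size in points for a scatter area s given in points**2."""
    return math.sqrt(s)

# Values from the examples in this table: aspect=0.6, size=8, col='day'
width, height = facet_size(size=8, aspect=0.6)
n_columns = 4  # the tips data covers four days (Thur, Fri, Sat, Sun)

print(f"each facet: {width:.1f} x {height:.1f} in")
print(f"row of {n_columns} facets: about {n_columns * width:.1f} in wide")
print(f"s=100 corresponds to a markersize of {markersize_from_s(100):.0f} points")
```

This is why a small <code>aspect</code> combined with a large <code>size</code> gives tall, narrow facets, and why doubling the visible marker width requires multiplying <code>s</code> by four.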
| |
− | | |
− | | |
− | <br />
| |
− | ===Style and Color===
| |
− | Check out the documentation page for more info on these topics: https://stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html
| |
− | | |
− | <br />
| |
− | {| class="wikitable"
| |
− | |-
| |
− | | colspan="4" |<syntaxhighlight lang="python3">
| |
− | import seaborn as sns
| |
− | import matplotlib.pyplot as plt
| |
− | %matplotlib inline
| |
− | tips = sns.load_dataset('tips')
| |
− | </syntaxhighlight>
| |
− | |-
| |
− | !
| |
− | !Method/Operator
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="3" style="vertical-align:top;" |<h4 style="text-align:left">Styles</h4>
| |
− | | rowspan="3" |'''<code>sns.set_style()</code>'''
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.countplot(x='sex',data=tips)
| |
− | </syntaxhighlight><br />
| |
− | | |
− | |[[File:Seaborn_categorical3.png|center|300x300px]]
| |
− | |-
| |
− | |You can set particular styles:<syntaxhighlight lang="python3">
| |
− | sns.set_style('white')
| |
− | sns.countplot(x='sex',data=tips)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn_Style_and_Color2.png|center|300x300px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.set_style('ticks')
| |
− | sns.countplot(x='sex',data=tips,palette='deep')
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn_Style_and_Color3.png|center|300x300px]]
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Spine Removal</h4>
| |
− | | rowspan="2" |'''<code>sns.despine()</code>'''
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.countplot(x='sex',data=tips)
| |
− | sns.despine()
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn Style and Color4.png|center|300x300px]]
| |
− | |-
| |
− | |<syntaxhighlight lang="python3">
| |
− | sns.countplot(x='sex',data=tips)
| |
− | sns.despine(left=True)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn Style and Color5.png|center|300x300px]]
| |
− | |-
| |
− | ! rowspan="2" style="vertical-align:top;" |<h4 style="text-align:left">Size and Aspect</h4>
| |
− | |style="vertical-align:top;" |<h5 style="text-align:left">Size</h5>
| |
− | | |
− | '''<code>plt.figure(figsize=())</code>'''
| |
− | |You can use matplotlib's '''''<code>plt.figure(figsize=(width,height))</code>''''' to change the size of most seaborn plots.
| |
− | | |
− | You can control the size and aspect ratio of most seaborn grid plots by passing the size and aspect parameters. For example:<syntaxhighlight lang="python3">
| |
− | # Non Grid Plot
| |
− | plt.figure(figsize=(12,3))
| |
− | sns.countplot(x='sex',data=tips)
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn Style and Color6.png|center|400x400px]]
| |
− | |-
| |
− | |style="vertical-align:top;" |<h5 style="text-align:left">Grid Type</h5>
| |
− | |<syntaxhighlight lang="python3">
| |
− | # Grid Type Plot
| |
− | sns.lmplot(x='total_bill',y='tip',size=2,aspect=4,data=tips)
| |
− | </syntaxhighlight>
| |
− | |[[File:Seaborn Style and Color7.png|center|400x400px]]
| |
− | |-
| |
− | !style="vertical-align:top;" |<h4 style="text-align:left">Scale and Context</h4>
| |
− | |<code>'''set_context()'''</code>
| |
− | |<code>'''set_context()'''</code> lets you override the default parameters:<syntaxhighlight lang="python3">
| |
− | sns.set_context('poster',font_scale=4)
| |
− | sns.countplot(x='sex',data=tips,palette='coolwarm')
| |
− | </syntaxhighlight><br />
| |
− | |[[File:Seaborn Style and Color8.png|center|400x400px]]
| |
− | |}
| |
− | | |
− | | |
− | <br />
| |
− | | |
− | ==Plotly and Cufflinks Data Visualization==
| |
− | Plotly is a library that allows you to create interactive plots that you can use in dashboards or websites (you can save them as html files or static images).<br />
| |
− | | |
− | Check out the plotly.py documentation and gallery to learn more: https://plot.ly/python/
| |
− | | |
− | <code>Plotly</code> plots can be easily saved online and shared at https://chart-studio.plot.ly. Take a look at this example: https://chart-studio.plot.ly/~jackp/671/average-effective-tax-rates-by-income-percentiles-1960-2004/#/
| |
− | | |
− | | |
− | <br />
| |
− | ===Installation===
| |
− | For all of this to work, you'll need to install '''<code>plotly</code>''' and '''<code>cufflinks</code>''', which lets you call plots directly off of a pandas DataFrame. '''<code>Cufflinks</code>''' is not currently available through '''conda''', but it is available through '''pip'''. Install both libraries from your command line/terminal using:<syntaxhighlight lang="shell">
| |
− | pip install plotly
| |
− | pip install cufflinks
| |
− | </syntaxhighlight><br />
| |
− | | |
− | | |
− | <br />
| |
− | ===Imports and Set-up===
| |
− | <syntaxhighlight lang="python3">
| |
− | import pandas as pd
| |
− | import numpy as np
| |
− | %matplotlib inline
| |
− | | |
− | from plotly import __version__
| |
− | from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
| |
− | print(__version__) # requires version >= 1.9.0
| |
− | | |
− | import cufflinks as cf
| |
− | | |
− | # For Notebooks
| |
− | init_notebook_mode(connected=True)
| |
− | | |
− | # For offline use
| |
− | cf.go_offline()
| |
− | </syntaxhighlight><br />
| |
− | | |
− | | |
− | <br />
| |
− | ===Data===
| |
− | <syntaxhighlight lang="python3">
| |
− | df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())
| |
− | df2 = pd.DataFrame({'Category':['A','B','C'],'Values':[32,43,50]})
| |
− | | |
− | df.head()
| |
− | # Output:
| |
− | A B C D
| |
− | 0 1.878725 0.688719 1.066733 0.543956
| |
− | 1 0.028734 0.104054 0.048176 1.842188
| |
− | 2 -0.158793 0.387926 -0.635371 -0.637558
| |
− | 3 -1.221972 1.393423 -0.299794 -1.113622
| |
− | 4 1.253152 -0.537598 0.302917 -2.546083
| |
− | | |
− | df2.head()
| |
− | # Output:
| |
− | Category Values
| |
− | 0 A 32
| |
− | 1 B 43
| |
− | 2 C 50
| |
− | </syntaxhighlight>
| |
− | | |
− | | |
− | <br />
| |
− | {| class="wikitable"
| |
− | |-
| |
− | !
| |
− | !Method/Operator
| |
− | !Description/Example
| |
− | !Output/Figure
| |
− | |-
| |
− | ! rowspan="8" style="vertical-align:top;" |<h3 style="text-align:left">Using Cufflinks and iplot()</h3>
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Scatter</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df.iplot(kind='scatter',x='A',y='B',mode='markers',size=10)
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/15
| |
− | | |
− | |[[File:Plotly1.png|center|490x490px]]https://plot.ly/~adeloaleman/15
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Bar Plots</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df2.iplot(kind='bar',x='Category',y='Values')
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/13
| |
− | |[[File:Plotly2.png|center|490x490px]]https://plot.ly/~adeloaleman/13
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Boxplots</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df.iplot(kind='box')
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/11
| |
− | |[[File:Plotly3.png|center|490x490px]]https://plot.ly/~adeloaleman/11
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">3d Surface</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df3 = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,20,10],'z':[5,4,3,2,1]})
| |
− | df3.iplot(kind='surface',colorscale='rdylbu')
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/17
| |
− | |[[File:Plotly4.png|center|490x490px]]https://plot.ly/~adeloaleman/17
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Spread</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df[['A','B']].iplot(kind='spread')
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/19
| |
− | |[[File:Plotly5.png|center|490x490px]]https://plot.ly/~adeloaleman/19
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Histogram</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df['A'].iplot(kind='hist',bins=25)
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/21
| |
− | |[[File:Plotly6.png|center|490x490px]]https://plot.ly/~adeloaleman/21
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Bubble</h4>
| |
− | |<syntaxhighlight lang="python3">
| |
− | df.iplot(kind='bubble',x='A',y='B',size='C')
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/23
| |
− | |[[File:Plotly7.png|center|490x490px]]https://plot.ly/~adeloaleman/23
| |
− | |-
| |
− | | style="vertical-align:top;" |<h4 style="text-align:left">Scatter_matrix</h4>
| |
− | | |
− | |<syntaxhighlight lang="python3">
| |
− | df.scatter_matrix()
| |
− | | |
− | # Similar to sns.pairplot()
| |
− | </syntaxhighlight>https://plot.ly/~adeloaleman/25
| |
− | |[[File:Plotly8.png|center|490x490px]]https://plot.ly/~adeloaleman/25
| |
− | |}
| |
− | | |
− | | |
− | <br />
| |
− | | |
− | ==Word cloud==
| |
− | https://github.com/amueller/word_cloud
| |
− | | |
− | | |
− | In Dash:
| |
− | * https://community.plot.ly/t/wordcloud-in-dash/11407/4
| |
− | :: https://community.plot.ly/t/show-and-tell-wordcloudworld-com/15649
| |
− | ::: https://github.com/mikesmith1611/word-cloud-world
| |
− | :::: http://www.wordcloudworld.com/
| |
− | | |
− | | |
− | * https://community.plot.ly/t/solved-is-it-possible-to-make-a-wordcloud-in-dash/4565
| |
− | | |
− | | |
− | <br />
| |
− | ===Installation===
| |
− | Using pip:
| |
− | pip install wordcloud
| |
− | | |
− | | |
− | Using conda:
| |
− | | |
− | https://anaconda.org/conda-forge/wordcloud
| |
− | | |
− | conda install -c conda-forge wordcloud
| |
− | | |
− | | |
− | '''Installation notes:'''
| |
− | | |
− | <code>wordcloud</code> depends on <code>numpy</code> and <code>pillow</code>.
| |
− | | |
− | | |
− | To save the <code>wordcloud</code> into a file, <code>matplotlib</code> can also be installed.
| |
− | | |
− | | |
− | <br />
| |
− | ===Minimal example===
| |
− | This example can be run in a Jupyter notebook:
| |
− | <syntaxhighlight lang="python3">
| |
− | """
| |
− | Minimal Example
| |
− | ===============
| |
− | Generating a square wordcloud from the US constitution using default arguments.
| |
− | """
| |
− | | |
− | import os
| |
− | | |
− | from os import path
| |
− | from wordcloud import WordCloud
| |
− | | |
− | # get data directory (using getcwd() is needed to support running example in generated IPython notebook)
| |
− | d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
| |
− | | |
− | # Read the whole text.
| |
− | with open(path.join(d, 'constitution.txt')) as f:
| |
− |     text = f.read()
| |
− | | |
− | # Generate a word cloud image
| |
− | wordcloud = WordCloud().generate(text)
| |
− | | |
− | # Display the generated image:
| |
− | # the matplotlib way:
| |
− | import matplotlib.pyplot as plt
| |
− | plt.imshow(wordcloud, interpolation='bilinear')
| |
− | plt.axis("off")
| |
− | | |
− | # lower max_font_size
| |
− | wordcloud = WordCloud(max_font_size=40).generate(text)
| |
− | plt.figure()
| |
− | plt.imshow(wordcloud, interpolation="bilinear")
| |
− | plt.axis("off")
| |
− | plt.show()
| |
− | | |
− | # The pil way (if you don't have matplotlib)
| |
− | # image = wordcloud.to_image()
| |
− | # image.show()
| |
− | </syntaxhighlight>
| |
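The minimal example hands raw text to <code>WordCloud().generate(text)</code>, which sizes each word by how often it occurs. A rough sketch of that counting step, using only the standard library. The tokenizer and the tiny <code>STOPWORDS</code> set here are assumptions for illustration and do not reproduce the tokenization or stop-word list that <code>wordcloud</code> itself uses:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOPWORDS = {"the", "of", "and", "to", "in", "a", "for", "this", "we"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words in `text`, skipping anything in `stopwords`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


sample = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish this Constitution for the United States"
)
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
print(freqs.most_common(2))

# If wordcloud is installed, precomputed counts can be drawn directly:
# from wordcloud import WordCloud
# cloud = WordCloud().generate_from_frequencies(freqs)
```

Precomputing the counts is useful when the frequencies come from somewhere other than raw text, for example a pandas <code>value_counts()</code> result converted to a dict.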
| | | |
| | | |
Line 1,660: |
Line 127: |
| | | |
| <br /> | | <br /> |
− |
| |
| ==[[Dash - Plotly]]== | | ==[[Dash - Plotly]]== |
| | | |