Data Visualisation Techniques
Wednesday, 11 January 2017
Sample Codes for illustration of ggplot2
## Sample Codes for illustration of ggplot2
## Susheel Shukla 09-01-2017
### install & load ggplot library
install.packages("ggplot2")
library("ggplot2")
### show info about the data
head(diamonds)
head(mtcars)
### comparison qplot vs ggplot
# qplot bar chart (clarity is discrete, so each bar counts the diamonds per clarity level)
qplot(clarity, data=diamonds, fill=cut, geom="bar")
# ggplot bar chart -> same output
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
P.S. Both qplot and ggplot produce the same graph.
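To check that the two calls really build the same chart, a minimal sketch can store both plots as objects and compare the geom of their first layer without drawing anything. It assumes a ggplot2 version that still ships qplot() (recent releases mark it as deprecated, so a warning may appear); the object names are just for illustration.

```r
library(ggplot2)

# Build both plots as objects instead of printing them
p_qplot = qplot(clarity, data = diamonds, fill = cut, geom = "bar")
p_ggplot = ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()

# Each plot has a single layer; compare the geom class of that layer
geom_q = class(p_qplot$layers[[1]]$geom)[1]
geom_g = class(p_ggplot$layers[[1]]$geom)[1]
print(c(geom_q, geom_g))
```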
# transform input data with functions
qplot(log(wt), mpg - 10, data=mtcars)
# add aesthetic mapping (hint: how does mapping work)
qplot(wt, mpg, data=mtcars, color=qsec)
# change size of points (hint: color/colour, hint: set aesthetic/mapping)
qplot(wt, mpg, data=mtcars, color=qsec, size=3) # maps the constant 3 as if it were data, adding a spurious "3" legend
qplot(wt, mpg, data=mtcars, colour=qsec, size=I(3)) # I() sets the aesthetic to a constant value instead of mapping it
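With ggplot() the same distinction is expressed by where the value goes: inside aes() it is a mapping, as an argument to the geom it is a fixed setting. The sketch below assumes ggplot2 is installed and simply inspects the two plot objects.

```r
library(ggplot2)

# Mapping: size = 3 inside aes() is treated as data, so a legend entry "3" is created
p_mapped = ggplot(mtcars, aes(wt, mpg, colour = qsec, size = 3)) + geom_point()

# Setting: size = 3 passed to geom_point() is the same fixed size for every point
p_set = ggplot(mtcars, aes(wt, mpg, colour = qsec)) + geom_point(size = 3)

print(names(p_mapped$mapping))       # size is listed among the mapped aesthetics
print(p_set$layers[[1]]$aes_params)  # size is stored as a constant layer parameter
```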
# use alpha blending
qplot(wt, mpg, data=mtcars, alpha=qsec) # values between 0 (transparent) and 1 (opaque)
# continuous scale vs. discrete scale
head(mtcars)
qplot(wt, mpg, data=mtcars, colour=cyl)
levels(mtcars$cyl)
qplot(wt, mpg, data=mtcars, colour=factor(cyl))
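Why does levels(mtcars$cyl) return nothing? Because cyl is stored as a plain number, ggplot2 gives it a continuous colour gradient; wrapping it in factor() creates the discrete levels that the second plot uses. A quick base R check:

```r
# cyl is numeric, so it has no levels and is treated as a continuous variable
print(class(mtcars$cyl))     # "numeric"
print(levels(mtcars$cyl))    # NULL

# factor() turns the distinct values into discrete levels
cyl_levels = levels(factor(mtcars$cyl))
print(cyl_levels)            # "4" "6" "8"
```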
# use different aesthetic mappings
qplot(wt, mpg, data=mtcars, shape=factor(cyl))
qplot(wt, mpg, data=mtcars, size=qsec)
# combine mappings (hint: hollow points, geom-concept, legend combination)
qplot(wt, mpg, data=mtcars, size=qsec, color=factor(carb))
qplot(wt, mpg, data=mtcars, size=qsec, color=factor(carb), shape=I(1))
qplot(wt, mpg, data=mtcars, size=qsec, shape=factor(cyl), geom="point")
qplot(wt, mpg, data=mtcars, size=factor(cyl), geom="point")
# bar-plot
qplot(factor(cyl), data=mtcars, geom="bar")
# flip plot by 90°
qplot(factor(cyl), data=mtcars, geom="bar") + coord_flip()
# difference between fill/color bars
qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(cyl))
qplot(factor(cyl), data=mtcars, geom="bar", colour=factor(cyl))
# fill by variable
qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(gear))
# use different display of bars (stacked, dodged, filled, identity)
head(diamonds)
qplot(clarity, data=diamonds, geom="bar", fill=cut, position="stack")
qplot(clarity, data=diamonds, geom="bar", fill=cut, position="dodge")
qplot(clarity, data=diamonds, geom="bar", fill=cut, position="fill")
qplot(clarity, data=diamonds, geom="bar", fill=cut, position="identity")
qplot(clarity, data=diamonds, geom="freqpoly", group=cut, colour=cut, position="identity")
qplot(clarity, data=diamonds, geom="freqpoly", group=cut, colour=cut, position="stack")
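position="fill" stacks the bars like "stack" but rescales every bar to height 1, so each bar shows the share of each cut within one clarity level. A minimal sketch that computes those shares directly with table() and prop.table(), plus the ggplot() form of the filled chart (ggplot2 assumed installed, since diamonds ships with it):

```r
library(ggplot2)

# Count diamonds for every clarity/cut combination
counts = table(diamonds$clarity, diamonds$cut)

# Row proportions: the segment heights drawn by position = "fill"
props = prop.table(counts, margin = 1)
print(round(props, 3))

# Equivalent ggplot() call for the filled bar chart
p_fill = ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar(position = "fill")
```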
Quick Introduction to ggplot2
This is a bare-bones introduction to ggplot2, a visualization package in R. It assumes no knowledge of R.
Preview
Let’s start with a preview of what ggplot2 can do.
Given Fisher’s iris data set and one simple command…
## R Codes
head(iris)
library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
…we can produce this plot of sepal length vs. petal length, colored by species.
R Basics
Vectors
Vectors are a core data structure in R, and are created with c(). Elements in a vector must be of the same type.
numbers = c(23, 13, 5, 7, 31)
names = c("edwin", "alice", "bob")
Elements are indexed starting at 1, and are accessed with [] notation.
#indexing
numbers[1]
names[1]
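Since all elements must share one type, R does not complain about mixed input; it silently converts everything to the most general type. The [] notation also accepts ranges, negative indices and logical conditions, as this small sketch shows:

```r
numbers = c(23, 13, 5, 7, 31)

# Mixed input is coerced: the number and the logical become character strings
mixed = c(1, "two", TRUE)
print(mixed)                   # "1" "two" "TRUE"

print(numbers[2:3])            # 13 5        (a range of positions)
print(numbers[-1])             # 13 5 7 31   (negative index drops the first element)
print(numbers[numbers > 10])   # 23 13 31    (logical filter)
```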
Data frames
Data frames are like matrices, but with named columns of different types (similar to database tables).
books = data.frame(
title = c("harry potter", "war and peace", "lord of the rings"), # column named "title"
author = c("rowling", "tolstoy", "tolkien"),
num_pages = c(350, 875, 500)
)
You can access columns of a data frame with $.
##column access
books$title
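Column access and column creation both use $: assigning to a column name that does not exist yet adds it. A self-contained sketch on a copy of books (with num_pages stored as numbers so the comparison works); the column name long and the 400-page threshold are hypothetical, chosen only for illustration:

```r
books = data.frame(
  title = c("harry potter", "war and peace", "lord of the rings"),
  author = c("rowling", "tolstoy", "tolkien"),
  num_pages = c(350, 875, 500)   # numeric, so comparisons and arithmetic work
)

# Hypothetical new column: TRUE when a book has more than 400 pages
books$long = books$num_pages > 400

print(books)
print(names(books))   # "title" "author" "num_pages" "long"
```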
You can also create new columns with $.
ggplot2
install.packages("ggplot2")
library(ggplot2)
Scatterplots with qplot()
## scatterplot with qplot()
head(iris) # by default head displays the first 6 rows
head(iris, n=10) # now first 10 rows
qplot(Sepal.Length, Petal.Length, data = iris)
# Plot Sepal.Length vs. Petal.Length, using data from the `iris` data frame.
# * First argument `Sepal.Length` goes on the x-axis.
# * Second argument `Petal.Length` goes on the y-axis.
# * `data = iris` means to look for this data in the `iris` data frame.
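##R Code:
# To see where each species sits in this graph, we can color each point by adding a color = Species argument.
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)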
#Similarly, we can let the size of each point denote petal width, by adding a size = Petal.Width argument.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width)
# We see that Iris setosa flowers have the narrowest petals.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))
# By setting the alpha of each point to 0.7, we reduce the effects of overplotting.
##Finally, let's fix the axis labels and add a title to the plot.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
xlab = "Sepal Length", ylab = "Petal Length",
main = "Sepal vs. Petal Length in Fisher's Iris data")
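The same labelled plot can be written with ggplot(), where axis labels and the title are set with labs() instead of the xlab, ylab and main arguments of qplot(). A minimal sketch, assuming ggplot2 is installed:

```r
library(ggplot2)

p = ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
  geom_point() +
  labs(x = "Sepal Length", y = "Petal Length",
       title = "Sepal vs. Petal Length in Fisher's Iris data")

print(p)   # draws the scatterplot with the new labels
```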
##Other common geoms
##In the scatterplot examples above, we implicitly used a point geom, the default when you supply two arguments to qplot().
# These two invocations are equivalent.
qplot(Sepal.Length, Petal.Length, data = iris, geom = "point")
qplot(Sepal.Length, Petal.Length, data = iris)
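The default geom depends on how many variables you pass: with x and y, qplot() draws points; with a single continuous x, it falls back to a histogram (drawn with bars). This sketch inspects the layers to confirm it, assuming a ggplot2 version that still provides qplot():

```r
library(ggplot2)

p_two = qplot(Sepal.Length, Petal.Length, data = iris)   # two variables
p_one = qplot(Sepal.Length, data = iris)                 # one continuous variable

print(class(p_two$layers[[1]]$geom)[1])   # point geom
print(class(p_one$layers[[1]]$geom)[1])   # bar geom used by the histogram
```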
##Barcharts: geom = "bar"
movies = data.frame(
director = c("spielberg", "spielberg", "spielberg", "jackson", "jackson"),
movie = c("jaws", "avatar", "schindler's list", "lotr", "king kong"),
minutes = c(124, 163, 195, 600, 187)
)
# Plot the number of movies each director has.
qplot(director, data = movies, geom = "bar", ylab = "# movies")
# By default, the height of each bar is simply a count.
# But we can also supply a different weight.
# Here the height of each bar is the total running time of the director's movies.
qplot(director, weight = minutes, data = movies, geom = "bar", ylab = "total length (min.)")
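The bar heights above are easy to verify by hand: without a weight each row counts once, with weight = minutes the minutes are summed per director. A base R sketch of both computations on the same movies data, followed by the equivalent weighted ggplot() call (ggplot2 assumed installed):

```r
library(ggplot2)

movies = data.frame(
  director = c("spielberg", "spielberg", "spielberg", "jackson", "jackson"),
  movie = c("jaws", "avatar", "schindler's list", "lotr", "king kong"),
  minutes = c(124, 163, 195, 600, 187)
)

movie_counts = table(movies$director)                       # unweighted bar heights
total_minutes = tapply(movies$minutes, movies$director, sum)  # weighted bar heights
print(movie_counts)    # jackson 2, spielberg 3
print(total_minutes)   # jackson 787, spielberg 482

# Weighted bar chart with ggplot(): weight is an aesthetic inside aes()
p_weighted = ggplot(movies, aes(director, weight = minutes)) + geom_bar()
```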
##Line charts: geom = "line"
qplot(Sepal.Length, Petal.Length, data = iris, geom = "line", color = Species)
# Using a line geom doesn't really make sense here, but hey.
# `Orange` is another built-in data frame that describes the growth of orange trees.
qplot(age, circumference, data = Orange, geom = "line",
colour = Tree,
main = "How does orange tree circumference vary with age?")
# We can also plot both points and lines.
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
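In ggplot() terms, geom = c("point", "line") means two layers stacked on the same mappings, one per geom, drawn in the order they are added. A minimal sketch, assuming ggplot2 is installed:

```r
library(ggplot2)

# Points first, then lines, both sharing the aes() mapping
p = ggplot(Orange, aes(age, circumference, colour = Tree)) +
  geom_point() +
  geom_line()

print(length(p$layers))   # 2
print(p)
```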