This is a bare-bones introduction to ggplot2, a visualization package in R. It assumes no knowledge of R.
Preview
Let’s start with a preview of what ggplot2 can do.
Given Fisher’s iris data set and one simple command…
## R code
head(iris)
library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
…we can produce this plot of sepal length vs. petal length, colored by species.
R Basics
Vectors
Vectors are a core data structure in R and are created with `c()`. All elements in a vector have the same type; if you mix types, R converts them to a common type.
numbers = c(23, 13, 5, 7, 31)
names = c("edwin", "alice", "bob")
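Because every element must share one type, `c()` quietly converts mixed inputs to the most general type instead of failing. A quick sketch using `class()` to check (the `mixed` vector is a made-up example):

```r
numbers = c(23, 13, 5, 7, 31)
names = c("edwin", "alice", "bob")
class(numbers)   # "numeric"
class(names)     # "character"

# Mixing types: everything is coerced to the most general type, here character
mixed = c(1, "two", TRUE)
class(mixed)     # "character"
mixed            # "1"    "two"  "TRUE"
```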
Elements are indexed starting at 1 and are accessed with `[]` notation.
# indexing
numbers[1]  # 23
names[1]    # "edwin"
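The index inside `[]` doesn't have to be a single number. A short sketch of the other common forms: a range, a vector of positions, a negative index (which drops elements), and a logical condition:

```r
numbers = c(23, 13, 5, 7, 31)
numbers[2:3]            # 13 5       (a range of positions)
numbers[c(1, 5)]        # 23 31      (a vector of positions)
numbers[-1]             # 13 5 7 31  (a negative index drops that element)
numbers[numbers > 10]   # 23 13 31   (a logical index keeps the TRUE positions)
```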
Data frames
Data frames are like matrices, but with named columns of different types (similar to database tables).
books = data.frame(
  title = c("harry potter", "war and peace", "lord of the rings"), # column named "title"
  author = c("rowling", "tolstoy", "tolkien"),
  num_pages = c(350, 875, 500) # numbers, not strings, so we can do arithmetic on them
)
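Each row is one book and each column is one variable. `nrow()`, `ncol()`, and `str()` are quick ways to check a data frame's shape and column types. This sketch stores `num_pages` as numbers so that arithmetic such as `mean()` works on it:

```r
books = data.frame(
  title = c("harry potter", "war and peace", "lord of the rings"),
  author = c("rowling", "tolstoy", "tolkien"),
  num_pages = c(350, 875, 500)
)
nrow(books)             # 3 rows
ncol(books)             # 3 columns
str(books)              # one line per column, showing its type
mean(books$num_pages)   # 575
```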
You can access columns of a data frame with `$`.
##column access
books$title
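A column pulled out with `$` is an ordinary vector, so the `[]` indexing from the vectors section works on it. Assigning to a `$` name that doesn't exist yet adds a new column; `long_read` below is a made-up name used only for illustration:

```r
books = data.frame(
  title = c("harry potter", "war and peace", "lord of the rings"),
  author = c("rowling", "tolstoy", "tolkien"),
  num_pages = c(350, 875, 500)
)
books$author[2]                            # second element of the author column
books$long_read = books$num_pages > 400    # hypothetical new column, added by assignment
books$long_read                            # FALSE TRUE TRUE
```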
You can also create new columns with `$`.
ggplot2
install.packages("ggplot2")
library(ggplot2)
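`install.packages()` only needs to run once per machine, whereas `library()` has to run in every new R session. A common pattern, sketched here, skips the install step when the package is already present:

```r
# Install ggplot2 only if it isn't already available, then load it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
packageVersion("ggplot2")   # prints the installed version
```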
Scatterplots with qplot()
## scatterplot with qplot()
head(iris) # by default, head() displays the first 6 rows
head(iris, n=10) # now first 10 rows
qplot(Sepal.Length, Petal.Length, data = iris)
# Plot Sepal.Length vs. Petal.Length, using data from the `iris` data frame.
# * First argument `Sepal.Length` goes on the x-axis.
# * Second argument `Petal.Length` goes on the y-axis.
# * `data = iris` means to look for this data in the `iris` data frame.
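`qplot()` is a shortcut, and since ggplot2 3.4.0 it is deprecated in favor of the full `ggplot()` interface. A sketch of the same scatterplot written with `ggplot()`, where `aes()` says which columns go on which axis:

```r
library(ggplot2)
# ggplot() takes the data frame; aes() maps columns onto the x and y axes.
# geom_point() then draws one point per row of iris.
p = ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()
print(p)
```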
# We can color each point by species by adding a color = Species argument.
##R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
# Similarly, we can let the size of each point denote petal width by adding a size = Petal.Width argument.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width)
# We see that Iris setosa flowers have the narrowest petals.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))
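# By setting the alpha (transparency) of every point to 0.7, we reduce the effects of overplotting.
# Wrapping the value in I() tells qplot() to use 0.7 as-is instead of mapping it like a data column.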
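In the `ggplot()` interface, the difference between mapping and setting an aesthetic is made by placement rather than by `I()`: anything inside `aes()` is mapped from a data column (and gets a legend), while a fixed value passed directly to the geom applies to every point. A sketch of the plot above written that way:

```r
library(ggplot2)
p = ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                     color = Species, size = Petal.Width)) +  # mapped from columns
  geom_point(alpha = 0.7)  # set, not mapped: same transparency for every point, no legend
print(p)
```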
## Finally, let's fix the axis labels and add a title to the plot.
#R Code:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
xlab = "Sepal Length", ylab = "Petal Length",
main = "Sepal vs. Petal Length in Fisher's Iris data")
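With `ggplot()`, axis labels and the title are added as one more layer with `labs()` instead of the `xlab`, `ylab`, and `main` arguments. A sketch of the same labelled plot:

```r
library(ggplot2)
p = ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  labs(x = "Sepal Length", y = "Petal Length",
       title = "Sepal vs. Petal Length in Fisher's Iris data")
print(p)
```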
Other common geoms
In the scatterplot examples above, we implicitly used a point geom, which is what qplot() uses by default when you give it both an x and a y variable.
# These two invocations are equivalent.
qplot(Sepal.Length, Petal.Length, data = iris, geom = "point")
qplot(Sepal.Length, Petal.Length, data = iris)
Bar charts: geom = "bar"
movies = data.frame(
director = c("spielberg", "cameron", "spielberg", "jackson", "jackson"),
movie = c("jaws", "avatar", "schindler's list", "lotr", "king kong"),
minutes = c(124, 163, 195, 600, 187)
)
# Plot the number of movies each director has.
qplot(director, data = movies, geom = "bar", ylab = "# movies")
# By default, the height of each bar is simply a count.
# But we can also supply a different weight.
# Here the height of each bar is the total running time of the director's movies.
qplot(director, weight = minutes, data = movies, geom = "bar", ylab = "total length (min.)")
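Without a weight, each bar's height is the number of rows in that category; with a weight, it is the sum of the weight column within the category. The same numbers can be computed directly with `table()` and `tapply()`, which is a handy way to sanity-check a bar chart. A sketch on the built-in `iris` data, together with the `ggplot()` equivalents of both bar charts:

```r
library(ggplot2)
# Bar heights without a weight: number of rows in each category
table(iris$Species)                            # 50 of each species
# Bar heights with a weight: per-category sum of the weight column
tapply(iris$Petal.Length, iris$Species, sum)

counts = ggplot(iris, aes(x = Species)) + geom_bar()
totals = ggplot(iris, aes(x = Species, weight = Petal.Length)) + geom_bar()
print(counts)
print(totals)
```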
Line charts: geom = "line"
qplot(Sepal.Length, Petal.Length, data = iris, geom = "line", color = Species)
# Using a line geom doesn't really make sense here, but hey.
# `Orange` is another built-in data frame that describes the growth of orange trees.
qplot(age, circumference, data = Orange, geom = "line",
colour = Tree,
main = "How does orange tree circumference vary with age?")
# We can also plot both points and lines.
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
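Each tree gets its own line because `colour = Tree` also splits the rows into groups, and a line geom connects the points within each group in order of their x value. In `ggplot()`, showing both points and lines just means adding two geom layers. A sketch:

```r
library(ggplot2)
# Orange: 5 trees, each measured at the same 7 ages (35 rows in total)
p = ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
  geom_point() +
  geom_line()   # colour = Tree also groups the rows, giving one line per tree
print(p)
```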