# R – Statistics

Statistics is a form of mathematical analysis that concerns the collection, organization, analysis, interpretation, and presentation of data. Statistical analysis helps make the best use of the vast amount of data available and improves the efficiency of solutions.

R is a programming language and environment for statistical computing and graphics. The following is an introduction to basic statistical plots such as bar charts, pie charts, histograms, and box plots.

In this post, we will be learning about plotting charts for a single variable. The following software is required to learn and implement statistics in R:

- R software
- RStudio IDE

#### Functions for plotting graphs in Statistics

The following functions are used to plot graphs that represent statistical data:

**plot() Function:**

This is a generic function for plotting R objects. For two numeric vectors it draws a scatter plot with axes and titles.

**Syntax:** `plot(x, y = NULL, type = "p", xlim = NULL, ylim = NULL, ...)`

**data() Function:**

This function loads the specified data sets, or lists the available data sets when called without arguments.

**Syntax:** `data(..., list = character(), package = NULL, lib.loc = NULL)`

**table() Function:**

This function builds a contingency table of the counts at each combination of factor levels.

**Syntax:** `table(..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no", "ifany", "always"))`

**barplot() Function:**

This function creates a bar plot with vertical or horizontal bars.

**Syntax:** `barplot(height, width = 1, space = NULL, names.arg = NULL, ...)`

**pie() Function:**

This function is used to create a pie chart.

**Syntax:** `pie(x, labels = names(x), edges = 200, radius = 0.8, clockwise = FALSE, ...)`

**hist() Function:**

This function creates a histogram of the given data values.

**Syntax:** `hist(x, breaks = "Sturges", freq = NULL, probability = !freq, ...)`

Note: You can find information about any function by typing `?` before its name, for example `?hist`.
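Besides the help page, a function's arguments and default values can be checked directly from the console. The sketch below uses `formals()` and `getS3method()` from base R on two of the functions listed above; the variable names `pie_args` and `hist_default` are illustrative only.

```r
# pie() is an ordinary function, so its arguments can be read directly
pie_args <- formals(pie)
names(pie_args)        # argument names
pie_args$clockwise     # default value of the clockwise argument

# hist() is generic; the arguments documented in ?hist
# belong to its default method
hist_default <- getS3method("hist", "default")
names(formals(hist_default))
```

This is handy when you only need a quick reminder of an argument name rather than the full documentation.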

R's built-in datasets are a useful starting point for developing skills, so we will be using a few of them.

Let's start by creating a simple bar chart from the chickwts dataset, and learn how to use datasets and a few functions in RStudio.

#### Bar charts

A bar chart represents categorical data with rectangular bars, which can be plotted vertically or horizontally.

```r
# ? is used before a function
# to get help on that function
?plot
?chickwts

# loading data into workspace
data(chickwts)

# plot feed from chickwts
plot(chickwts$feed)
```

In the code above, `?` placed before the name of a function or dataset opens its help page, including its syntax. In R, `#` starts a single-line comment; there is no multiline comment syntax. Here **chickwts** is the dataset and feed is one of its columns, a factor that records the feed type given to each chick. Because feed is a factor, plot() draws a bar chart of the number of chicks at each level.

**Output:**
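To see the numbers behind this chart, the counts can be computed without plotting. The following is a minimal sketch using only base R; `feed_counts` is an illustrative name, not something from the original code.

```r
# feed is a factor with one level per feed type
str(chickwts)
levels(chickwts$feed)

# table() counts how many chicks received each feed;
# these counts are the bar heights drawn for the factor
feed_counts <- table(chickwts$feed)
print(feed_counts)
```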

```r
feeds = table(chickwts$feed)

# plots graph in decreasing order
barplot(feeds[order(feeds, decreasing = TRUE)])
```

**Output:**
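The sorting here relies on `order()`, which returns the positions that would sort its input; indexing the table with those positions rearranges the bars. A small sketch of that step in isolation (the names `idx` and `sorted_feeds` are illustrative):

```r
feeds <- table(chickwts$feed)

# order() returns index positions, not the sorted values themselves
idx <- order(feeds, decreasing = TRUE)
print(idx)

# indexing with those positions yields the counts from largest to smallest
sorted_feeds <- feeds[idx]
print(sorted_feeds)

# sort() gives the same result in one step
all(sort(feeds, decreasing = TRUE) == sorted_feeds)
```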

```r
feeds = table(chickwts$feed)

# outside margins: bottom, left, top, right
par(oma = c(1, 1, 1, 1))
par(mar = c(4, 5, 2, 1))

barplot(feeds[order(feeds, decreasing = TRUE)])

# horiz is used for bars to be shown as horizontal
# xlab is used to label the x-axis
# las sets the orientation of the axis labels
# col is used for colouring bars
barplot(feeds[order(feeds)], horiz = TRUE,
        xlab = "Number of chicks", las = 1,
        col = "yellow")
```

**Output:**

#### Pie charts

A pie chart is a circular statistical graph divided into slices, where the size of each slice shows the proportion of the corresponding category.

```r
data("chickwts")

d = table(chickwts$feed)

# main is used to create
# a heading for the chart
pie(d[order(d, decreasing = TRUE)],
    clockwise = TRUE,
    main = "Pie Chart of feeds from chickwts")
```

**Output:**
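Since a pie chart conveys proportions, it can help to show the percentages in the slice labels. The sketch below is one possible way to do that with `prop.table()` and the `labels` argument of `pie()`; the label format and the names `pct` and `slice_labels` are arbitrary choices made for illustration.

```r
d <- table(chickwts$feed)
d <- d[order(d, decreasing = TRUE)]

# share of each feed type, as a percentage rounded to one decimal
pct <- round(100 * prop.table(d), 1)

# build labels such as "soybean (19.7%)"
slice_labels <- paste0(names(d), " (", pct, "%)")

pie(d, labels = slice_labels, clockwise = TRUE,
    main = "Pie Chart of feeds from chickwts")
```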

#### Histograms

A histogram represents the distribution of numerical data. It is similar to a bar chart, but it groups the values into ranges (bins) and shows how many observations fall into each one.

```r
data(lynx)

# lynx is a built-in dataset.
lynx

# hist function is used to plot histogram.
hist(lynx)

# breaks is used for the number of bins.
hist(lynx, breaks = 7, col = "green",
     main = "Histogram of Annual Canadian Lynx Trappings")
```

**Output:**
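`hist()` also returns the binning it computed, which makes it possible to check the bin edges and counts. Note that a single number passed to `breaks` is treated as a suggestion, so the resulting number of bins may differ from it. A minimal sketch, using `plot = FALSE` to skip drawing (the name `h` is illustrative):

```r
# compute the histogram without drawing it
h <- hist(lynx, breaks = 7, plot = FALSE)

h$breaks   # bin edges chosen by hist()
h$counts   # number of observations in each bin

# every observation falls in exactly one bin
sum(h$counts) == length(lynx)
```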

```r
data(lynx)
lynx
hist(lynx)

# freq = FALSE plots densities instead of counts,
# so that a normal density curve can be overlaid
hist(lynx, breaks = 7, col = "green",
     freq = FALSE,
     main = "Histogram of Annual Canadian Lynx Trappings")

curve(dnorm(x, mean = mean(lynx),
            sd = sd(lynx)), col = "red",
      lwd = 2, add = TRUE)
```

**Output:**
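With `freq = FALSE` the bar heights are densities, scaled so that the total area of the bars is 1, which is what makes them comparable to a density curve. This can be checked numerically; the sketch below assumes nothing beyond base R, and `bar_area` is an illustrative name.

```r
h <- hist(lynx, breaks = 7, plot = FALSE)

# area of each bar = density * bin width
bar_area <- h$density * diff(h$breaks)

# the areas add up to 1 (up to floating-point rounding)
sum(bar_area)
```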

#### Box Plots

A box plot is a chart that graphically depicts groups of numerical data through their quartiles. It summarizes the distribution of the data by showing the median, the spread between the quartiles, and any outliers.

```r
# USJudgeRatings is a built-in dataset.
?USJudgeRatings

# ylim is used to specify the range.
boxplot(USJudgeRatings$RTEN, horizontal = TRUE,
        xlab = "Lawyers Rating", notch = TRUE,
        ylim = c(0, 10), col = "pink")
```

USJudgeRatings is a built-in dataset with 12 variables, and RTEN is one of them, a rating on a scale from 0 to 10. We used it to draw a box plot with several arguments of the boxplot() function.

**Output:**
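The values that the box plot draws can be retrieved with `boxplot.stats()` from base R. The sketch below prints the five numbers behind the box and whiskers for the same column; the variable names `rten` and `s` are illustrative.

```r
rten <- USJudgeRatings$RTEN

# stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
s <- boxplot.stats(rten)
s$stats

# points beyond the whiskers, drawn as individual outliers
s$out

# the middle value is the median of the data
s$stats[3] == median(rten)
```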