

There are different types of data, such as discrete, continuous, nominal, and ordinal, that describe quantitative and qualitative measurements. R uses its own names to identify data types. The ones that you are likely to use in this course are:

Character - simple character strings (e.g., words and sentences). A related type, used specifically for categorical data, is the factor, which is introduced further down.

Logical - Boolean values, TRUE and FALSE.

Integer - whole-number values, without any decimal point. Mainly suited to ordinal or interval data, but may be used for ratio data, such as counts, with some caution.

Numeric - the default type R uses when you work with a number. May be used for all scales of measurement, but is particularly suited to ratio-scale measurements.

Note: In R, integers are a subset of numeric values.

Let's create some variables: a <- 2.5, b = 3 and c <- "hello". We can check the type of a variable simply by using class() or typeof(). We can also test for one specific type, for example with is.integer():

    # x, y and z are not defined in this excerpt; the values below are
    # assumed for illustration and are consistent with the results shown.
    x <- 2
    y <- 3
    z <- 4

    (x + y) / z > 0                # TRUE
    z / x                          # 2
    is.integer(2)                  # FALSE
    is.integer(2L)                 # TRUE
    is.integer(z / x)              # FALSE: by default, R stores this as numeric
    is.integer(as.integer(z / x))  # TRUE: z / x is explicitly converted to integer
    x * y >= z                     # TRUE

Another type of data in R is the factor. You can think of factors as special character vectors with some useful additional functions. They take on a limited number of different values; such variables are often referred to as categorical variables. We use them to categorise the data and store it as levels. They can store both strings (text) and integers. They are useful for columns that have a limited number of unique values, such as Employed/Unemployed or True/False, and they are useful in data analysis for statistical modelling.

Exercise: Let's create a vector called regions whose observations are "East", "West", "South", "North". Factors are created using the factor() function, which takes a vector as input. We then print the levels of the new factor to see the distinct values it takes.

    regions <- c("East", "West", "South", "North")
    factor_regions <- factor(regions)
    levels(factor_regions)  # "East" "North" "South" "West"

Factors in R come in two varieties: ordered and unordered, e.g., {small, medium, large} and {pen, brush, pencil}. For most analyses, it will not matter whether a factor is ordered or unordered. If the factor is ordered, then the specific order of the levels matters (small < medium < large). If the factor is unordered, then the levels will still appear in some order, but the specific order of the levels matters only for convenience (pen, pencil, brush): it will determine, for example, how output will be printed, or the arrangement of items on a graph.
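As a rough sketch of how the data types in this section can be inspected in practice, the snippet below creates one value of each type, checks it with class() and typeof(), and then builds an ordered factor. The variable names (num_value, sizes, size_factor and so on) and the example values are illustrative assumptions, not part of the course material.

```r
# Illustrative values only; all variable names here are hypothetical.
num_value <- 2.5      # numeric (the default for numbers)
int_value <- 3L       # the L suffix asks R to store an integer
chr_value <- "hello"  # character
lgl_value <- TRUE     # logical

class(num_value)   # "numeric"
class(int_value)   # "integer"
class(chr_value)   # "character"
class(lgl_value)   # "logical"
typeof(num_value)  # "double"

# is.numeric() is TRUE for integers as well as for doubles
is.numeric(int_value)  # TRUE

# An ordered factor: the levels are listed explicitly, smallest first
sizes <- c("medium", "small", "large", "small")
size_factor <- factor(sizes,
                      levels = c("small", "medium", "large"),
                      ordered = TRUE)

levels(size_factor)              # "small" "medium" "large"
is.ordered(size_factor)          # TRUE
size_factor[1] > size_factor[2]  # TRUE: medium ranks above small
```

Passing levels explicitly is a design choice here: without it, factor() sorts the levels alphabetically, which would put "large" before "medium". Without ordered = TRUE, a comparison such as > is not meaningful for a factor, and R returns NA with a warning.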

University of Stirling - Statistics with R
