R - Linear Regression
In statistics, linear regression is a form of regression analysis that models the relationship between a dependent variable and one or more independent variables with a linear equation, whose parameters are usually estimated by the least squares method.
Simply put, it is a statistical method for quantifying how one variable depends on one or more other variables.
In regression analysis, if there is only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, this type of regression analysis is called simple linear regression analysis. If the regression analysis includes two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis.
The mathematical equation for simple linear regression analysis:
y = ax + b
- y is the value of the dependent variable.
- x is the value of the independent variable.
- a and b are the parameters of the simple linear regression equation: a is the slope and b is the intercept.
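As a minimal sketch of where a and b come from, the snippet below computes them directly from the closed-form least squares formulas for a single predictor. It uses the height and weight sample from this tutorial; the names a and b simply follow the equation above.

```r
# Height (cm) and weight (kg) sample used throughout this tutorial
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Least squares slope: covariance of x and y divided by the variance of x
a <- cov(x, y) / var(x)

# Least squares intercept: the fitted line passes through (mean(x), mean(y))
b <- mean(y) - a * mean(x)

cat("a (slope):", round(a, 4), "\n")
cat("b (intercept):", round(b, 4), "\n")
```

The two values should agree with the coefficients that lm() reports for the same data (about 0.6746 and -38.4551).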
Next, we build a model that predicts a person's weight from their height:
- Collect sample data: height and weight.
- Use the lm() function to create a relationship model.
- Find the coefficients from the created model and create a mathematical equation.
- Get a summary of the relationship model to see how large the errors are, i.e., the residuals (the differences between actual and estimated values).
- Use the predict() function to predict a person's weight.
Prepare Data
The following are height and weight data for individuals:
# Height, in cm
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Weight, in kg
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
lm() Function
In R, you can perform linear regression using the lm() function.
The lm() function is used to create a relationship model between independent and dependent variables.
The syntax for the lm() function is as follows:
lm(formula, data)
Parameter descriptions:
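- formula - A symbolic formula describing the relationship between x and y, for example y ~ x.
- data - A data frame containing the variables named in the formula (optional if the variables already exist in the workspace).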
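The data parameter can be illustrated with a short sketch that stores the sample in a data frame first. The names people, height, weight and model are chosen for this example only and do not appear elsewhere in the tutorial.

```r
# Store the sample in a data frame with descriptive column names
# (people, height and weight are names chosen for this example)
people <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
)

# weight ~ height reads as "weight modelled as a function of height";
# lm() looks up both names in the data frame passed as `data`
model <- lm(weight ~ height, data = people)
print(coef(model))
```

Keeping the variables in a data frame tends to make later steps easier, because predict() expects new values in a data frame with the same column names.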
Create a relationship model and get the coefficients:
Example
# Sample data
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Submit to lm() function
relation <- lm(y ~ x)
print(relation)
Executing the above code outputs:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
   -38.4551       0.6746
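The coefficients can also be pulled out of the model programmatically and turned into an equation. The following is a small sketch using coef(); the variable names coefs and equation are illustrative.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# coef() returns a named numeric vector with entries "(Intercept)" and "x"
coefs <- coef(relation)
b <- coefs[["(Intercept)"]]  # intercept
a <- coefs[["x"]]            # slope

# Write the fitted line in the form y = ax + b
equation <- sprintf("y = %.4f * x + (%.4f)", a, b)
print(equation)
```

With the coefficients printed above, the fitted line is approximately y = 0.6746x - 38.4551.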
Use the summary() function to get a summary of the relationship model:
Example
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Submit to lm() function
relation <- lm(y ~ x)
print(summary(relation))
Executing the above code outputs:
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-6.3002 -1.6629  0.0412  1.8944  3.9775
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509    8.04901  -4.778  0.00139 **
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared: 0.9548, Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF, p-value: 1.164e-06
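Individual parts of the summary can be read from the model instead of from the printed text. The sketch below extracts the residuals, the R-squared value and the residual standard error; fit_summary, res and rse are names introduced here for illustration.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
fit_summary <- summary(relation)

# Residuals: actual weight minus the weight estimated by the model
res <- residuals(relation)
print(round(res, 4))

# Share of the variance in y that the model explains
print(fit_summary$r.squared)

# Residual standard error: sqrt(sum of squared residuals / degrees of freedom)
rse <- sqrt(sum(res^2) / df.residual(relation))
print(rse)
print(fit_summary$sigma)  # the same quantity as stored in the summary
```

The printed values should correspond to the Multiple R-squared (0.9548) and Residual standard error (3.253) lines in the summary output above.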
predict() Function
The predict() function is used to predict values based on the model we have established.
The syntax for the predict() function is as follows:
predict(object, newdata)
Parameter descriptions:
- object - The model object returned by the lm() function.
- newdata - A data frame containing the new values of the independent variable for which to predict.
The following example predicts a new weight value:
Example
# Sample data
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Submit to lm() function
relation <- lm(y ~ x)
# Predict the weight for a height of 170 cm
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
Executing the above code outputs:
       1
76.22869
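predict() also accepts several new values at once and can return an interval around each estimate. The sketch below is one way to do this, assuming a 95% prediction interval is wanted; new_heights, point and ranges are names introduced for this example.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# newdata must be a data frame whose column name matches the predictor (x)
new_heights <- data.frame(x = c(160, 170, 180))

# Point predictions, one per row of new_heights
point <- predict(relation, new_heights)
print(point)

# Same predictions with a 95% prediction interval: columns fit, lwr and upr
ranges <- predict(relation, new_heights, interval = "prediction", level = 0.95)
print(ranges)
```

The interval gives a sense of how far an individual's actual weight might plausibly fall from the point estimate, which the single predicted number does not convey.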
We can also generate a chart:
Example
# Sample data
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Generate png image
png(file = "linearregression.png")
# Generate chart: weight on the horizontal axis, height on the vertical axis
plot(y, x, col = "blue", main = "Height & Weight Regression", cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
# Add the regression line of height on weight, matching the plot axes
abline(lm(x ~ y))
# Close the graphics device so the file is written
dev.off()
The chart is as follows: