Monday, 15 April 2019

Linear Regression


Question#1
If I were to model the relationship between the mean (expected) number of games won by a major-league team, y, and the team’s batting average, x, I would use a straight line, and the slope of the line would be negative. A negative slope implies that y decreases when x increases, and vice versa. An example of a line with a negative slope is as follows:
Negative Slope

m = Δy / Δx = (-4) / 3 = -4/3
This indicates that when x increases by 3, y decreases by 4, and when x decreases by 3, y increases by 4.
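The slope calculation above can be checked with a short sketch. The two points and the function name `slope` are illustrative assumptions, chosen only so that x rises by 3 while y falls by 4.

```python
def slope(p1, p2):
    """Slope of the straight line through two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: the slope is undefined")
    return (y2 - y1) / (x2 - x1)


# Hypothetical points: x goes from 0 to 3 (+3), y goes from 4 to 0 (-4).
m = slope((0, 4), (3, 0))
print(f"m = {m:.4f}")  # a negative value, equal to -4/3
```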
Question#2
The pattern revealed by the scattergram agrees with my answer to part a.
To construct a simple linear regression from the data, a linear relationship between the two variables should exist. There are several ways to check whether such a relationship is present; a straightforward one is to create a scatterplot in SPSS in which the dependent variable is plotted against the independent variable.

The equation of the least squares line is ŷ = a + bx.
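As a cross-check on the SPSS output, the coefficients a and b of the least squares line could be computed by hand. The sketch below is a minimal illustration: the data values and the function name `least_squares` are hypothetical and are not the data from this exercise.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means.
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = ss_xy / ss_xx          # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

The line always passes through the point of means, which is why the intercept is obtained from the mean of y and the mean of x once the slope is known.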
Question#3
This graph reveals that the least squares line fits the points on my scattergram.
Question#4
After looking at the data, I found that the mean (expected) number of games won is strongly related to a team’s batting average: the two variables are positively related, and the highest values of one tend to occur with the highest values of the other.
Question#5
From the regression output, the straight-line equation is ŷ = 119.86 + 0.346x. The equation is reliable because the significance (p-value) of the regression F test is below 0.05. The estimates of the intercept and the slope are 119.86 and 0.346, respectively. These values have been obtained from the regression table.
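Once the equation is known, a prediction is a one-line calculation. The sketch below uses the two coefficients reported above; the input value x = 100 and the function name `predict` are hypothetical placeholders, not values taken from the data set.

```python
INTERCEPT = 119.86  # estimated intercept from the regression table
SLOPE = 0.346       # estimated slope from the regression table


def predict(x):
    """Predicted y for a given x on the fitted straight line."""
    return INTERCEPT + SLOPE * x


x_value = 100  # hypothetical input, for illustration only
print(f"Predicted y at x = {x_value}: {predict(x_value):.2f}")
```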
Question#6
The equation of the least squares line for Brand A and Brand B is as follows:
y = mx + b
Here,
y = how far up
m = gradient or slope (how steep the line is)
x = how far along
b = the Y intercept (the point where the line crosses the Y axis)
Question#7
For the first brand:
For the second brand:
Question#8
I would like to use the least squares line to predict useful life for a given cutting speed for the second brand, as its value of y is better than the first brand’s value of y.
Question#9
The equation of the least squares line is as follows:
Question#10
After testing at α = 0.05, I have found that the straight-line model contributes information for predicting overhead costs.
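A test of this kind is usually a t test of the null hypothesis that the slope is zero. The sketch below shows the arithmetic under stated assumptions: the data are hypothetical (they are not the overhead cost data), the function name is illustrative, and the critical value 3.182 is the two-tailed t value for α = 0.05 with 3 degrees of freedom, which applies only to this 5-point example.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, t), where t tests H0: slope = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    # Sum of squared residuals and the estimated standard deviation of the error.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b = s / math.sqrt(ss_xx)  # standard error of the slope
    return b, b / se_b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRITICAL = 3.182  # assumption: alpha = 0.05, two-tailed, n - 2 = 3 df

b, t_stat = slope_t_statistic(xs, ys)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"b = {b:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Rejecting the null hypothesis corresponds to the conclusion that the straight-line model contributes information for predicting y.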
Question#11
While a scatterplot allows us to check for autocorrelation, the correlation matrix is the most effective tool for checking the assumptions made about the random error ϵ in this problem. When computing the matrix of Pearson’s bivariate correlations among all independent variables, the magnitude of each off-diagonal correlation coefficient should be smaller than 1.
Question#12
The slope of the least squares line is positive as r is positive.
Question#13
The slope of the least squares line is negative as r is negative.
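The two answers above rest on the fact that the least squares slope and r share the same numerator, so they always have the same sign. A minimal sketch, using hypothetical data and illustrative function names, makes this visible.

```python
import math


def sums_of_squares(xs, ys):
    """Return (ss_xy, ss_xx, ss_yy) computed about the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy, ss_xx, ss_yy


def pearson_r(xs, ys):
    ss_xy, ss_xx, ss_yy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def ls_slope(xs, ys):
    ss_xy, ss_xx, _ = sums_of_squares(xs, ys)
    return ss_xy / ss_xx


# Hypothetical data: one rising series and one falling series.
xs = [1, 2, 3, 4]
rising = [2, 3, 5, 6]
falling = [6, 5, 3, 2]

for ys in (rising, falling):
    print(f"r = {pearson_r(xs, ys):+.3f}, slope = {ls_slope(xs, ys):+.2f}")
```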
Question#14
If the value of r is 0, then this will indicate that there is no linear relationship between the data. It means that an increase in the value of x is not accompanied by any systematic linear increase or decrease in the value of y. In this situation, the slope of the least squares line will be 0.
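A value of r = 0 rules out a linear relationship only, not every kind of relationship. The sketch below uses hypothetical data in which y depends on x exactly (y = x²), yet both r and the least squares slope come out as zero.

```python
import math

# Hypothetical data: y is the square of x, so the relationship is curved.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)

r = ss_xy / math.sqrt(ss_xx * ss_yy)  # Pearson correlation coefficient
slope = ss_xy / ss_xx                 # least squares slope

print(f"r = {r:.2f}, slope = {slope:.2f}")
```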
Question#15
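If the value of r² is 0.64, then this will indicate that 64% of the sample variation in y is explained by the linear relationship with x, and that r is either 0.8 or -0.8. The value of r² alone does not reveal the direction of the relationship. If r = 0.8, y tends to increase as x increases and the slope of the least squares line is positive; if r = -0.8, y tends to decrease as x increases and the slope is negative.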
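Because r² is the square of r, taking the square root recovers only the magnitude of r, not its sign. A short sketch of that arithmetic, with illustrative variable names:

```python
import math

r_squared = 0.64

# The square root gives the magnitude of r; the sign must come from the data.
r_magnitude = math.sqrt(r_squared)
possible_r = (-r_magnitude, r_magnitude)

explained_percent = 100 * r_squared  # share of variation in y explained by x

print(f"possible r values: {possible_r[0]:.1f} or {possible_r[1]:.1f}")
print(f"variation explained: {explained_percent:.0f}%")
```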
Question#16
The correlation coefficient for both sets of data is as follows:

Strength of Association   Coefficient, r
                          Positive       Negative
Small                     0.1 to 0.3     -0.1 to -0.3
Medium                    0.3 to 0.5     -0.3 to -0.5
Large                     0.5 to 1.0     -0.5 to -1.0
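The table can be applied mechanically to any computed value of r. The sketch below is one possible reading of it, under two assumptions that the table does not state: a value on a shared boundary (0.3 or 0.5) is placed in the stronger category, and magnitudes below 0.1 are labeled "negligible". The function name is illustrative.

```python
def strength_of_association(r):
    """Classify r as (strength, direction) using the table's cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size >= 0.5:
        label = "large"
    elif size >= 0.3:      # assumption: 0.3 itself counts as medium
        label = "medium"
    elif size >= 0.1:
        label = "small"
    else:                  # assumption: below the table's range
        label = "negligible"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction


# Hypothetical r values, for illustration only.
for r in (0.25, -0.42, 0.8):
    print(r, strength_of_association(r))
```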

Question#17
The accuracy of weigh-in-motion data is generally lower than that of a static weigh scale, for which the environment is better controlled. Without the correlation coefficients, it would not be possible to judge the effectiveness of the weigh-in-motion scale.
Question#18
The equation of the least squares line is as follows:
Question#19
When the test is carried out at α = 0.05, the data do not support this concept.
Question#20
The estimates of the intercept β0 and the slope β1 are 0 and 2, respectively.
Question#21
Yes, the annual energy consumption is positively and linearly related to the shell area of the building.
Question#22
From this photo, it is evident that the observed significance level of the test of part b is 28.
Question#23
The coefficient of determination for a linear regression model with one independent variable is as follows:
R² = { (1 / N) * Σ [ (xi - x̄) * (yi - ȳ) ] / (σx * σy) }²
Here,
N = The number of observations used to fit the model
Σ = The summation symbol
xi = The x value for observation i
x̄ = The mean x value
yi = The y value for observation i
ȳ = The mean y value
σx = The standard deviation of x
σy = The standard deviation of y
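The formula above can be evaluated directly. The sketch below is a minimal illustration with hypothetical data; it assumes that σx and σy are population standard deviations (divisor N), which is what makes the 1/N factor in the formula consistent. The function name is illustrative.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination for one independent variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population standard deviations (divisor n), matching the 1/N factor.
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = (cross / n) / (sigma_x * sigma_y)
    return r ** 2


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"R^2 = {r_squared(xs, ys):.2f}")
```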
Question#24
The predicted value of energy consumption can be determined in the following way.
Here
At the 95 percent confidence level, the confidence interval of the regression line can be calculated from the data.
Here
The standard error of the prediction is
For the specific value x0, the predicted value is 73.16.
This interval is useful because it gives a range within which the energy consumption is expected to fall, rather than a single predicted value.
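The steps above follow the usual textbook form of a prediction interval for simple linear regression: ŷ ± t * s * sqrt(1 + 1/n + (x0 - x̄)² / SSxx). The sketch below is an illustration only. The data, the value x0 = 4, and the function name are hypothetical, and the critical value 3.182 is the two-tailed t value for 95 percent confidence with 3 degrees of freedom, so it fits this 5-point example and does not reproduce the 73.16 reported above.

```python
import math


def prediction_interval(xs, ys, x0, t_critical):
    """Return (y_hat, half_width) of the prediction interval at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    # Estimated standard deviation of the random error.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the prediction for a single new observation at x0.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    y_hat = a + b * x0
    return y_hat, t_critical * se_pred


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_CRITICAL = 3.182  # assumption: 95% confidence, n - 2 = 3 df

y_hat, half_width = prediction_interval(xs, ys, x0=4, t_critical=T_CRITICAL)
print(f"prediction: {y_hat:.2f}")
print(f"95% interval: ({y_hat - half_width:.2f}, {y_hat + half_width:.2f})")
```

Dropping the leading 1 under the square root would give the narrower confidence interval for the mean of y at x0 instead of the interval for a single new observation.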