Tuesday, December 3, 2019

The Simple Regression Model

The simple regression model (SRM) is a model for the association in the population between an explanatory variable X and a response Y. The SRM states that the conditional means of Y align on a line with intercept β0 and slope β1:

µy|x = E(Y|X = x) = β0 + β1x

Deviation from the Mean
The deviations of the observed responses from the conditional means µy|x are called errors (ε): ε = y - µy|x. Errors can be positive or negative, depending on whether the data lie above (positive) or below (negative) the conditional means. Because the errors are not observed, the SRM makes three assumptions about them:
* Independent. The error for one observation is independent of the error for any other observation.
* Equal variance. All errors have the same variance, Var(ε) = σ².
* Normal. The errors are normally distributed.
If these assumptions hold, then the collection of all possible errors forms a normal population with mean 0 and variance σ², abbreviated ε ~ N(0, σ²).

Simple Regression Model (SRM)
Observed values of the response Y are linearly related to values of the explanatory variable X by the equation

y = β0 + β1x + ε,  ε ~ N(0, σ²)

The observations:
1. are independent of one another,
2. have equal variance σ² around the regression line, and
3. are normally distributed around the regression line.

Conditions for the SRM (Simple Regression Model)
Instead of checking for random residual variation, we have specific conditions. Checklist for the simple regression model:
* Is the association between y and x linear?
* Have we ruled out obvious lurking variables?
Do the errors appear to be a sample from a normal population?
* Are the errors evidently independent?
* Are the variances of the residuals similar?
* Are the residuals nearly normal?

Standard Error of the Slope
The estimated standard error of b1 substitutes the sample standard deviation of the residuals, s_e, for the standard deviation of the errors, σ:

se(b1) = s_e / (√(n-1) · s_x) ≈ s_e / (√n · s_x)

The residual standard deviation s_e sits in the numerator of the expression for se(b1): since the regression line estimates the conditional mean of Y given X, it is the residuals that measure the variation around the mean in regression analysis. The sample size n is again in the denominator; the larger the sample grows, the more information we have and the more precise the estimate of the slope becomes.

Confidence Intervals

t = (b1 - β1) / se(b1)

If the errors ε are normally distributed or the sample satisfies the CLT condition, then the sampling distribution of b1 is approximately normal. Since we substitute s_e for σ to calculate the standard error, we use a t-distribution for inference. The 95% confidence interval for the slope β1 in the simple regression model is the interval

[b1 - t0.025,n-2 × se(b1), b1 + t0.025,n-2 × se(b1)]

The 95% confidence interval for the intercept β0 is

[b0 - t0.025,n-2 × se(b0), b0 + t0.025,n-2 × se(b0)]

Hypothesis Tests: Equivalent Inferences for the SRM
We reject the claim that a parameter in the SRM (β0 or β1) equals zero with 95% confidence (or a 5% chance of a Type I error) if
a. zero lies outside the 95% confidence interval for the parameter;
b. the absolute value of the associated t-statistic is larger than t0.025,n-2 ≈ 2; or
c. the p-value reported with the t-statistic is less than 0.05.
(A short code sketch of these slope calculations appears after the prediction overview below.)

Prediction
Regression is often used to predict the response for new, unobserved cases. In such cases the explanatory variable (xnew) is known but the response (ynew) is unknown. The SRM provides a framework for predicting ynew and for anticipating the accuracy of that prediction.
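Before turning to prediction in detail, here is the promised sketch of the slope calculations. It is a minimal illustration in Python (NumPy and SciPy), not part of the original notes: the simulated data, the sample size of 50, and the true values β0 = 3 and β1 = 1.5 are assumptions chosen only to exercise the formulas above.

```python
import numpy as np
from scipy import stats

# Simulate data from the SRM: y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, n)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # least-squares slope
b0 = y_bar - b1 * x_bar                                            # least-squares intercept

resid = y - (b0 + b1 * x)                      # residuals estimate the unseen errors
s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))    # residual standard deviation
s_x = x.std(ddof=1)                            # sample standard deviation of x

se_b1 = s_e / (np.sqrt(n - 1) * s_x)           # se(b1) = s_e / (sqrt(n-1) * s_x)
t_stat = b1 / se_b1                            # t-statistic for H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)          # t(0.025, n-2)
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # 95% CI for the slope

print(f"b0={b0:.3f}  b1={b1:.3f}  se(b1)={se_b1:.3f}  t={t_stat:.2f}  p={p_value:.4f}")
print(f"95% CI for slope: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The same numbers can be obtained from a packaged routine such as statsmodels' OLS; computing them by hand, as here, simply mirrors the formulas for se(b1) and the t-interval given above.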
The SRM implies that ynew is determined by the equation

ynew = β0 + β1·xnew + εnew

where εnew is a random error term that represents the influence of other factors on the new observation.

A prediction interval is an interval designed to hold a fraction (usually 95%) of the values of the response for a given value x of the explanatory variable in a regression. The 95% prediction interval for the response ynew in the SRM is the interval

[ŷnew - t0.025,n-2 × se(ŷnew), ŷnew + t0.025,n-2 × se(ŷnew)]

where ŷnew = b0 + b1·xnew is the predicted value and

se(ŷnew) = s_e × √(1 + 1/n + (xnew - x̄)² / ((n-1) · s_x²))

The standard error of a prediction, se(ŷnew), is tedious to calculate by hand because it adjusts for the position of xnew relative to the observed data: the farther xnew is from x̄, the longer the prediction interval becomes. (A code sketch of this calculation appears at the end of these notes.)

Reliability of Prediction Intervals
A prediction interval measures the accuracy of predictions of new observations. Provided the SRM (Simple Regression Model) holds, the approximate 95% prediction interval for an observation at x is ŷnew ± 2·s_e. Prediction intervals are reliable within the range of the observed data, the region in which the approximate interval [ŷ - 2·s_e, ŷ + 2·s_e] holds. Prediction intervals are also sensitive to the assumptions of constant variance and normality: if the variance of the errors around the regression line increases with the size of the prediction, then the prediction intervals will be too narrow for large items. So, before using prediction intervals, verify that the residuals of the regression are nearly normal.

The method:
* Identify x and y.
* Link b0 and b1 to the problem.
* Describe the data: linear, no obvious lurking variables, evidently independent, similar variances, nearly normal.
* Check the linear condition and the lurking-variable condition.
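As promised above, here is a minimal sketch of the prediction-interval calculation. The helper function prediction_interval, the simulated data, and the evaluation points xnew = 5.0 and xnew = 9.5 are illustrative assumptions, not part of the original notes; the arithmetic follows the formula se(ŷnew) = s_e·√(1 + 1/n + (xnew - x̄)²/((n-1)·s_x²)).

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x_new, level=0.95):
    """Prediction interval for a new response at x_new under the SRM (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar = x.mean()
    # Least-squares estimates of the slope and intercept
    b1 = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
    b0 = y.mean() - b1 * x_bar
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard deviation
    s_x2 = x.var(ddof=1)                          # sample variance of x
    y_hat = b0 + b1 * x_new                       # predicted response at x_new
    # se(y_hat_new) = s_e * sqrt(1 + 1/n + (x_new - x_bar)^2 / ((n - 1) * s_x^2))
    se_pred = s_e * np.sqrt(1.0 + 1.0 / n + (x_new - x_bar) ** 2 / ((n - 1) * s_x2))
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return y_hat - t_crit * se_pred, y_hat + t_crit * se_pred

# Simulated example (illustrative values only)
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, 40)
print(prediction_interval(x, y, x_new=5.0))   # near the centre of the observed x's
print(prediction_interval(x, y, x_new=9.5))   # farther from x-bar: wider interval
```

With x sampled roughly uniformly on [0, 10], x̄ is near 5, so the interval at xnew = 9.5 comes out wider than the one at xnew = 5.0, illustrating how intervals lengthen as xnew moves away from the centre of the observed data.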
