Uploaded by Emmye Tafesse

Assignment 1 - Econometrics

Marinare Nasser & Aida Tafesse Belachew
Our results from Stata:
Questions on Summary statistics:
1. Create a Summary statistic table including variables “classize”, “avgmath”, “avgverb”, and
“disadvantaged”. Describe the feature of the variables, e.g., the center and the variability.
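After creating the summary statistics table, we took the means of the four variables (29,92868, 67,29599, 74,37997 and 14,11639) and calculated the following across them:
Median of the four means (sorted: 14,11639; 29,92868; 67,29599; 74,37997): (29,92868 + 67,29599) / 2 = 48,612335
Mean of the four means: (29,92868 + 67,29599 + 74,37997 + 14,11639) / 4 = 185,72103 / 4 = 46,4302575
The center of each individual variable is described by its own mean in the table.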
Variability (range = maximum - minimum):
classize: 44 - 5 = 39
avgmath: 93,93 - 27,69 = 66,24
avgverb: 93,86 - 34,8 = 59,06
disadvantaged: 76 - 0 = 76
The variability describes how the values are distributed. Measured by the range, our variability is not extremely high, which suggests that the data points are fairly similar and that there are no extreme values. A smaller spread makes it easier to make predictions and assumptions about the data. Classize has the smallest range (39), while disadvantaged has the largest (76).
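The range comparison can be reproduced with a short Python sketch. It uses only the minimum and maximum values quoted in the calculations above; the variable names `min_max`, `ranges`, `narrowest` and `widest` are illustrative and not part of the Stata output.

```python
# Minimum and maximum of each variable, as quoted in the range calculations.
min_max = {
    "classize": (5, 44),
    "avgmath": (27.69, 93.93),
    "avgverb": (34.8, 93.86),
    "disadvantaged": (0, 76),
}

# Range = maximum - minimum, rounded to two decimals to avoid float noise.
ranges = {name: round(high - low, 2) for name, (low, high) in min_max.items()}

# Variables with the smallest and the largest range.
narrowest = min(ranges, key=ranges.get)
widest = max(ranges, key=ranges.get)

print(ranges)
print("smallest range:", narrowest, "| largest range:", widest)
```

Note that the range depends only on the two most extreme observations, so it is a rough measure of spread; the standard deviation uses every observation.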
The standard deviation describes how dispersed the data are around the mean. The higher the standard deviation, the more spread out the values are; the lower it is, the closer the values lie to the mean. When comparing our standard deviation results for classize, avgmath, and avgverb with their respective means, we see that the standard deviations are not close to the means. What matters is the size of the standard deviation relative to the mean: a standard deviation well below the mean indicates low relative dispersion, while one close to or above the mean indicates high relative dispersion.
2. Using the same four variables in question 1, find the correlation for all pairs of variables.
Give a short analysis of the correlation (e.g., choose three or four pairs you think are
interesting).
Disadvantaged is negatively correlated with all the other variables, so a scatter plot against any of them would slope downward. The correlation is strongest with avgverb, at -0,6052. Classes with a larger share of students from a disadvantaged background therefore tend to have lower scores in both avgmath and avgverb. The correlation between disadvantaged and avgmath is only low to moderate, but it is still negative.
Avgmath and avgverb show a strong positive correlation of 0,7807, which is fairly close to 1. A scatter plot would slope upward, since the two variables move in the same direction. The correlation indicates that classes that do well in math also tend to do well in reading. In other words, being good in one subject usually goes together with being good in others.
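To illustrate how the sign of a correlation is read, here is a minimal Python sketch of the Pearson correlation coefficient. The three short lists are hypothetical numbers invented for the illustration; they are not the assignment data, and the function name `pearson` is our own.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

# Hypothetical class-level values, for illustration only.
math_scores = [60.0, 65.0, 70.0, 75.0, 80.0]
verbal_scores = [68.0, 70.0, 77.0, 79.0, 86.0]
share_disadvantaged = [30.0, 22.0, 15.0, 12.0, 3.0]

r_pos = pearson(math_scores, verbal_scores)        # move together: positive
r_neg = pearson(math_scores, share_disadvantaged)  # move apart: negative
print(r_pos, r_neg)
```

A value near +1 or -1 means the points lie close to a straight line; the sign only gives the direction of the relationship, not a causal effect.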
Questions on the Linear regression model:
1. Run a linear regression model with OLS estimation. Test heteroskedasticity for this model.
Explain whether there is a heteroskedasticity problem or not.
When heteroskedasticity is present, a plot of the residuals shows an unequal spread. It typically has a cone shape, with less spread at the beginning and more spread the further out you get. This is caused by the variance of the error term being unequal (non-constant).
H0: α1 = α2 = α3 = 0 (homoskedasticity)
NR2 = the number of observations x R2
NR2 = 2019 x 0,1628 = 328,6932
Degrees of freedom (df) = 4 - 1 = 3 → Table B.7, critical values of the chi-square distribution: at a 5% significance level and 3 degrees of freedom the critical value is 7,815.
NR2 (328,6932) > 7,815, so we reject the null hypothesis of homoskedasticity: there is a heteroskedasticity problem.
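The N x R2 comparison can be checked numerically. The sketch below assumes that "2,019" denotes 2019 observations (a thousands separator rather than a decimal comma), which gives a statistic of about 328,69. The critical value is the table value quoted above, and the variable names are our own.

```python
# Inputs taken from the text; reading "2,019" as 2019 observations is an assumption.
n_obs = 2019
r_squared = 0.1628
critical_5pct_df3 = 7.815  # chi-square critical value, 5% level, 3 df (from the table)

# LM test statistic: number of observations times R2.
lm_stat = n_obs * r_squared

# Reject H0 (homoskedasticity) when the statistic exceeds the critical value.
reject_h0 = lm_stat > critical_5pct_df3

print(round(lm_stat, 4), reject_h0)
```

The decision rule is one-sided: the null hypothesis is rejected only when the statistic is larger than the critical value.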
2. Re-estimate the previous model with robust standard errors. Interpret the estimated
coefficients and the statistical significance for “classize”, “disadvantaged”, and “religious”,
respectively. In addition, interpret R2.
The β estimates are unchanged, but the robust standard errors are slightly higher, by only a small fraction. The t-values are therefore almost the same, but slightly lower.
When we re-estimate the model with robust standard errors, we get new standard errors and test statistics. Holding the other variables constant, the coefficient on disadvantaged is negative, at -0.1149929: a one-unit increase in disadvantaged is associated with a decrease of about 0.115 in the dependent variable, at any given class size. The same holds for religious, which has a significant negative coefficient of -3.484334.
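As a rough illustration of how such coefficients are read, the sketch below multiplies the two reported coefficients by a hypothetical change in the regressor, holding everything else constant. The helper `predicted_change` and the example changes (10 units and 1 unit) are our own assumptions, chosen only to show the arithmetic.

```python
# Coefficients as reported in the text (signs written out explicitly).
coefficients = {
    "disadvantaged": -0.1149929,
    "religious": -3.484334,
}

def predicted_change(variable, delta):
    """Change in the dependent variable when `variable` rises by `delta`,
    with all other regressors held constant."""
    return coefficients[variable] * delta

# Hypothetical changes, for illustration only.
print(predicted_change("disadvantaged", 10))  # 10-unit increase in disadvantaged
print(predicted_change("religious", 1))       # 1-unit increase in religious
```

In a linear model the predicted change scales proportionally with the size of the change in the regressor.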
R2 takes a value between 0 and 1; the closer it is to 1, the better our estimated regression equation fits the data. Our R2 (0,1628) is at the lower end, which indicates that the estimated regression equation explains only about 16% of the variation in the dependent variable. With an R2 this far from 1, the data points tend to fall far from the regression line.
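The meaning of R2 as a share of explained variation can be sketched in Python as follows. The function `r_squared` and the small data lists are hypothetical and only illustrate the definition R2 = 1 - SSR/SST; they are not the assignment data.

```python
def r_squared(actual, fitted):
    """R2 = 1 - SSR/SST: share of the variation in `actual` explained by `fitted`."""
    mean_actual = sum(actual) / len(actual)
    ssr = sum((a - f) ** 2 for a, f in zip(actual, fitted))  # residual sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)        # total sum of squares
    return 1 - ssr / sst

# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
perfect_fit = [1.0, 2.0, 3.0, 4.0]   # fitted values equal the data
mean_only = [2.5, 2.5, 2.5, 2.5]     # fitted values equal the sample mean

print(r_squared(actual, perfect_fit), r_squared(actual, mean_only))
```

An R2 of 0,1628 lies much closer to the mean-only case than to the perfect fit: about 84% of the variation in the dependent variable is left unexplained by the regressors.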