Tutorial 4 - normality and log transformations


Data Handling & Analysis



Andrew Jackson a.jackson@tcd.ie

Making assumptions

For grouped data: each group is normally distributed

For scatter plot data: the residuals off the line are normally distributed

Distributions are where numbers come from

• The binomial distribution tells us how systems like a coin toss behave

• It tells us how many times an event is likely to occur in a given number of repeated, independent attempts

• The event has the same fixed probability of occurring on each attempt

[Figure: binomial distribution of the number of heads from 10 coin tosses; x-axis: Number of Heads (0 to 10)]
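A minimal Python sketch of the distribution in the figure, assuming a fair coin (p = 0.5) and 10 tosses to match the 0 to 10 axis. The binomial probability of exactly k heads in n tosses is C(n, k) p^k (1 - p)^(n - k); `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each succeeding with the same fixed probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed setup: ten tosses of a fair coin
n_tosses, p_heads = 10, 0.5
probs = [binomial_pmf(k, n_tosses, p_heads) for k in range(n_tosses + 1)]

# One line per possible number of heads, 0 to 10
for heads, prob in enumerate(probs):
    print(f"{heads:2d} heads: {prob:.4f}")
```

Five heads is the single most likely outcome, and the probabilities fall away symmetrically on either side because p = 0.5.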

The normal distribution

• Normal or Gaussian distribution

• “the bell-shaped curve”

• Defined by its mean and variance (or standard deviation)

• The PDF (probability density function) of the normal distribution is shown on the right
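The normal PDF has a closed form, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), where μ is the mean and σ the standard deviation. A minimal Python sketch evaluating it and cross-checking against the standard library's `statistics.NormalDist`; the helper name `normal_pdf` and the default μ = 0, σ = 1 (the standard normal) are choices made for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Compare the hand-written formula with the standard library's version
for x in (-2.0, 0.0, 1.5):
    print(x, normal_pdf(x), NormalDist(mu=0.0, sigma=1.0).pdf(x))
```

The curve peaks at the mean and is symmetric about it; changing σ widens or narrows the bell while the total area under it stays 1.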

Origins of the Normal Distribution

• Assume that an individual’s weight or height (or whatever we are measuring) is affected by thousands of small +/- effects, such as genes or environment

• Add those effects up for each individual, and lo and behold…

• The character (trait) will display an approximately normal distribution
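This is a version of the central limit theorem, and it can be checked by simulation. A minimal Python sketch, assuming each of 500 small effects per individual is drawn uniformly between -0.1 and +0.1 (the number of effects, their size and their uniform shape are all illustrative assumptions, as is the helper name `simulate_trait`):

```python
import random
import statistics

def simulate_trait(n_individuals=2000, n_effects=500, max_effect=0.1, seed=42):
    """Each individual's trait value is the sum of many small effects,
    each drawn uniformly between -max_effect and +max_effect (an assumption)."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-max_effect, max_effect) for _ in range(n_effects))
        for _ in range(n_individuals)
    ]

values = simulate_trait()
mean = statistics.fmean(values)
sd = statistics.stdev(values)

# For a normal distribution, about 68% of values fall within 1 SD of the mean
within_1sd = sum(abs(v - mean) <= sd for v in values) / len(values)
print(f"mean={mean:.3f}, sd={sd:.3f}, within 1 SD={within_1sd:.1%}")
```

No single effect is normal (each is flat between its limits), yet their sum behaves like a bell curve: a mean near 0, a standard deviation near √(500 × 0.2² / 12) ≈ 1.29, and roughly 68% of individuals within one standard deviation.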

Return to our brain/body data

• We need to test whether each group is normally distributed

• This is equivalent to asking whether the residuals are normally distributed

• Residuals are the difference between an observed value and its predicted value

– In this case, the predicted value is the mean of the group the observation belongs to
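A minimal Python sketch of residuals for grouped data, where each observation's predicted value is its group mean. The values and group labels are hypothetical, not the tutorial's brain/body data, and `group_residuals` is an illustrative helper name.

```python
from statistics import fmean

def group_residuals(values, groups):
    """Residual = observed value minus the mean of that observation's group."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    group_means = {g: fmean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]

# Hypothetical measurements for two made-up groups
values = [4.1, 5.0, 4.6, 7.9, 8.4, 7.2]
groups = ["A", "A", "A", "B", "B", "B"]
residuals = group_residuals(values, groups)
print([round(r, 2) for r in residuals])
```

Within each group the residuals sum to zero, so pooling them removes the differences between group means and leaves only the scatter, which is what gets checked for normality.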

Exploring residuals from boxplots

[Figure panels: a simple histogram of the residuals; a Q-Q (quantile-quantile) plot of the residuals]
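A Q-Q plot pairs each sorted residual with the value a standard normal distribution would be expected to give at the same rank; if the residuals are normal, the points fall close to a straight line. A minimal Python sketch of the points such a plot draws, assuming the plotting positions (i - 0.5) / n, which is one common convention (software differs slightly in this choice). The residuals and the helper name `normal_qq_points` are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(residuals):
    """Pair each sorted residual with the standard normal quantile expected
    at its rank, using plotting positions (i - 0.5) / n."""
    n = len(residuals)
    observed = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

# Hypothetical residuals
residuals = [-1.2, 0.3, -0.4, 0.9, 0.1, -0.2, 1.5, -0.8, 0.0, -0.2]
for theo, obs in normal_qq_points(residuals):
    print(f"{theo:6.2f}  {obs:6.2f}")
```

Plotting the first column on the x-axis against the second on the y-axis gives the Q-Q plot; curvature at the ends suggests skew or heavy tails.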

Return to our scatter plot

• We need to test whether our residuals off the line are normally distributed

• We also need to check that the residuals show no trend along the line (for example, their spread should not grow or shrink as x increases)
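For a scatter plot, the predicted value is the point on the fitted line. A minimal Python sketch, assuming an ordinary least-squares straight line; the (x, y) pairs are hypothetical, not the tutorial's data, and `fit_line` and `line_residuals` are illustrative helper names.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def line_residuals(x, y):
    """Residual = observed y minus the y predicted by the fitted line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical (x, y) pairs
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
print([round(r, 3) for r in line_residuals(x, y)])
```

These residuals can then go into the same histogram and Q-Q checks used for grouped data.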

Exploring residuals from the scatter plot

[Figure panels: histogram of residuals; Q-Q plot of residuals]

Testing for a trend in the data
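One simple, informal way to look for a trend in the spread of the residuals is to correlate their absolute size with x. This is a rough check chosen for illustration, not necessarily the method used in this tutorial and not a formal hypothesis test. The residuals and the helper names `pearson_r` and `spread_trend` are hypothetical.

```python
from statistics import fmean

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_a * var_b) ** 0.5

def spread_trend(x, residuals):
    """Correlation between x and the size of the residuals.
    Near 0: spread roughly constant along the line.
    Strongly positive: residuals fan out as x increases."""
    return pearson_r(x, [abs(r) for r in residuals])

# Hypothetical residuals that fan out as x increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.2, 0.3, -0.5, 0.6, -0.9, 1.1, -1.4]
print(round(spread_trend(x, residuals), 3))
```

A plot of residuals against x (or against the fitted values) shows the same thing visually and is usually the first thing to look at.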

What to do if residuals are not normal?

• Transforming the data is often the solution

• Taking the log of the response variable (y) is the first port of call

– For scatter plot data, we can also take the log of the explanatory (x) variable

– We will do this next time we meet
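A minimal Python sketch of why a log transform helps with right-skewed data. The values are hypothetical, chosen so that each is ten times the previous one; base 10 is an arbitrary choice, since logs in different bases differ only by a constant factor. The helper names `skewness` and `log_transform` are illustrative. For scatter plot data, the same transform could be applied to x as well.

```python
import math
from statistics import fmean, pstdev

def skewness(values):
    """Skewness: mean cubed standardised deviation (0 for symmetric data)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def log_transform(values):
    """Base-10 log of strictly positive values; logs are undefined at or below zero."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log10(v) for v in values]

# Hypothetical right-skewed response values
y = [1, 10, 100, 1000]
print(f"skewness before: {skewness(y):.3f}")
print(f"skewness after:  {skewness(log_transform(y)):.3f}")
```

On the raw scale one large value dominates and the skewness is clearly positive; after the transform the values are evenly spaced and the skewness is zero.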
