Consider a scatter plot of annual income versus years of schooling, fitted with a regression line. One person with only a few years of schooling has an exceptionally high income compared with the others. A data point like this, which does not follow the overall trend and lies far from the regression line in the vertical direction, is called an outlier. Quantitatively, outliers can be identified using residuals. A residual is the difference between the observed y-value of a data point and the y-value predicted by the regression equation. The spread of the residuals is summarized by the residual standard deviation, s_e = √(Σ e_i² / (n − 2)), where e_i are the residuals and n is the number of data points. As a rule of thumb, data points whose residuals place them at least two residual standard deviations above or below the regression line are flagged as potential outliers.

Data sets may also contain influential points. These points lie far from the rest of the data in the horizontal (x) direction, and adding or removing one of them changes the regression line substantially.
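The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ordinary least-squares line and the two-standard-deviation rule of thumb. The data are made up (schooling in years, income in thousands), and the helper names `fit_line`, `residuals`, `residual_sd`, `flag_outliers`, and `slopes_with_and_without` are hypothetical, not from any particular library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def residual_sd(xs, ys):
    """s_e = sqrt(sum(e_i^2) / (n - 2)); n - 2 because a and b were estimated."""
    res = residuals(xs, ys)
    return math.sqrt(sum(e * e for e in res) / (len(res) - 2))


def flag_outliers(xs, ys, k=2.0):
    """Indices whose residual is at least k residual SDs above or below the line."""
    s = residual_sd(xs, ys)
    return [i for i, e in enumerate(residuals(xs, ys)) if abs(e) >= k * s]


def slopes_with_and_without(xs, ys, i):
    """Slope of the fitted line with point i included, and with it removed."""
    _, with_point = fit_line(xs, ys)
    _, without_point = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return with_point, without_point


if __name__ == "__main__":
    # Hypothetical data: years of schooling vs. annual income (thousands).
    schooling = list(range(8, 18))              # 8, 9, ..., 17
    income = [3 * x + 10 for x in schooling]    # points exactly on a line
    income[1] = 97                              # 9 years, unusually high income
    print("residual SD:", round(residual_sd(schooling, income), 2))
    print("flagged:", flag_outliers(schooling, income))

    # Add a point far to the right in x that does not follow the trend.
    xs = list(range(8, 18)) + [30]
    ys = [3 * x + 10 for x in range(8, 18)] + [40]
    print("slope with/without:", slopes_with_and_without(xs, ys, len(xs) - 1))
```

With these made-up numbers, only the point at 9 years of schooling is flagged (residual SD about 18.39). Adding the single point at 30 years pulls the slope from 3 down to roughly 0.36, which is the sense in which such a point is influential. Note that the outlier itself inflates the residual standard deviation, so the two-standard-deviation rule is a screening heuristic rather than a formal test.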