As part of an assignment for my quant stat course, we were asked to propose a hypothesis about a potential correlation between two quantifiable variables. I briefly considered researching the average altitude of major cities vs. the number of Olympic medals in track and field, but decided to look at internet usage vs. literacy rate on the African continent instead.

The correlation is weak, but a trend is still visible. There are clearly other factors to look into (GDP, cost of living, censorship, access to high-speed internet, etc.), but it is intriguing that the slope of the regression line is close to 1.
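
For anyone who wants to reproduce this kind of fit, here is a minimal Python sketch of a simple least-squares regression with one predictor. It assumes literacy rate is on the x-axis and internet usage on the y-axis. The function name `simple_ols` and the sample numbers are hypothetical placeholders, not the actual data behind the graphs. The adjusted R^2 uses the standard correction 1 - (1 - R^2)(n - 1)/(n - p - 1) with p = 1 predictor. With a single predictor, the regression row of the ANOVA table has df = 1.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r2, adj_r2) for a single predictor.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give R^2
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    p = 1  # one predictor, so the regression df in the ANOVA table is 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, r2, adj_r2


if __name__ == "__main__":
    # Hypothetical placeholder values (percent), NOT the real country data
    literacy = [40, 55, 60, 70, 80, 90]
    internet = [5, 20, 10, 35, 25, 45]
    slope, intercept, r2, adj_r2 = simple_ols(literacy, internet)
    print(f"slope={slope:.3f} intercept={intercept:.3f} "
          f"R^2={r2:.3f} adjusted R^2={adj_r2:.3f}")
```

If both variables are expressed in percent, a slope near 1 reads as "each extra percentage point of literacy goes with roughly one extra percentage point of internet usage", though a weak R^2 means that line explains only a small share of the variation.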

The data were taken from here for the literacy rate and here for the internet usage.

I will probably revisit the data once I learn more tools, and integrate more variables into the analysis (in addition to the ones mentioned earlier, I'd probably look into the democracy index, freedom of the press, etc.).

In the meantime, I figured I'd post the graphs in case anyone is interested.


  1. What is the degrees-of-freedom-adjusted R^2 for this fit?

  2. Hello James,

    Thank you for your interest in the data.
    The adjusted R^2 is 0.106, and the df (ANOVA) is 1.



  3. Not sure how you would factor in the cost of internet access or access to the internet through mobile phones, but it's a cool project.

