The Web, Statistics and Predicting the Future with Big Data
Apr 9, 2013
The Internet has developed organically into a series of related subsystems of information. Web scientists map these relationships with statistical methods that improve as more data is gathered. Finding things on the web and finding things in human systems have something in common: complexity and semi-predictable distribution. Interestingly, the web science that seeks to find and document information relationships is now being used to find these human things as well.
The spread and linkage of web content follows patterns that can be described by Markov chains. Perhaps the best-known example is Google PageRank, which models a surfer moving at random from page to page along links. New and interesting applications of this method are being used to analyze other complex, distributed biological and social phenomena. Recently, these techniques have been applied in medical science to predict the spread of illness, and on Twitter to find the source of rumors. Both are useful in their own way.
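To make the Markov chain idea concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is a textbook simplification, not Google's actual implementation. The four-page toy_web graph and the pagerank function name are made up for illustration, and the damping value of 0.85 is simply the figure commonly quoted for the algorithm.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate page importance by repeatedly stepping a Markov chain.

    links maps each page to the list of pages it links to.
    Simplified textbook version; the parameter values are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Start with the random surfer equally likely to be on any page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links sends the surfer anywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up ranked highest, while page D, which nothing links to, ends up lowest. The same kind of calculation can in principle be run on any network of who-links-to-whom or who-infects-whom, which is what makes the approach portable beyond web pages.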
So what is needed as input to run these algorithms? Data. The better the data on what has happened, the better the predictions of what could happen. As data scientists provide more and more accurate descriptions of what does happen, statisticians can provide better and better predictions of what might happen.
With the cross-pollination of web science, data science and analytical tools, it becomes possible to take what we know and use it to predict what might be. Here are a few interesting examples.