Book review: The Numerati
I finished reading The Numerati last night and thought I would share my thoughts on the book.
My Rating: 6 / 10
The book was informative and interesting and was written around the central theme of mathematically and statistically deriving meaning from various types of data. Some of the data presented in the book are currently available – at least to some companies – while other types of data are slowly being accumulated for future use.
The book is divided into seven main sections, each briefly presented below. The fundamental objective of deriving predictive models is ultimately to offer us (consumers, patients, individuals) more of what we are looking for. Although this may not sound good at first, the book demonstrates that the objective is not as bad as it sounds.
- As Internet users, we leave vast amounts of data behind as we browse web sites. Companies are heavily analyzing that data in order to develop predictive models for advertising, among other things. Their intent is to develop algorithms that present us with the right messages, so we feel compelled to investigate further and potentially purchase.
- As shoppers, we sometimes unknowingly share our purchasing preferences when we use credit cards or reward cards. Ideally, companies want to understand consumer patterns and offer us items that fit well with our needs and interests.
- As voters, we are being segmented into various groups so political parties can target their messages to obtain our vote. The groups to which we belong aren’t always simple to determine, and most people belong to more than one group. The question is how political parties can use that information to craft the right message and win more votes.
- As bloggers, we may think we are immune to profiling, but this isn’t the case. Tools and algorithms are being developed to determine key information about bloggers. In addition to basic information such as age group and gender, companies are trying to analyze bloggers’ preferences for certain products, services, ideas and so on. Although this might be easy to determine in some cases, it is a challenging undertaking when the bloggers are not writing directly about those products or services.
- Terrorism is also an important area of interest. Given the events of recent years, governments have a strong incentive to profile individuals and groups to assess whether they are part of terrorist cells.
- As the population ages and the cost of health care greatly increases, companies are trying to build sophisticated databases of health-related information, in the hope of developing predictive models that help people self-diagnose their illnesses or, better yet, anticipate certain diseases.
- Our love life is also being put under analysis. Companies are building algorithms to determine how compatible people are, with the aim of forming the perfect couples.
Having worked for a company that relied heavily on raw data to develop predictive models for pharmaceutical companies, I enjoyed and related to the content of the book. Each chapter presented a different perspective on the theme of obtaining, aggregating and deriving meaning from data. In many cases, the results are impressive.
The book fell short of my expectations because it mostly focused on facts – what companies are doing today and in the near future – and much less on giving an overall perspective of what the world will look like once all that data becomes available and the models are actively used. It might simply be because I already understood too well the potential uses of all that data.
If you want to understand practical applications of data mining, this book will meet your expectations.