Wednesday, June 11, 2014

Big Data for Knowledge Workers, Information Professionals, Businesses, and Just About Everyone Else

The emergence of "big data" as a field of study is a new phenomenon, one which few comprehend fully despite having a vague notion of what it stands for. As a blanket term, big data encompasses data sets too large for commonly used software tools to capture, curate, manage, and process within a reasonable period of time. The challenge is a heavy one: it spans the capture, curation, storage, search, sharing, transfer, analysis, and visualization of the data.

Academic libraries have only begun to open the Pandora's box of creating data curation programs and initiatives to support universities. Data curation refers to the management activities required to maintain research data over the long term so that it remains available for reuse and preservation. In science, data curation may also refer to the process by which experts extract important information from scientific texts, such as research articles, and convert it into an electronic format.

Wired Magazine's Chris Anderson, as far back as 2008, boldly declared the end of theory as we know it because big data can predict the future. A flurry of books about big data has come out in the past few years, each with a different approach and viewpoint, but all coming to similar conclusions about its transformative potential. With that said, there are some excellent books out there that serve as a primer on this important area of study, one that not only academics need to learn about, but also businesses and knowledge workers who deal with the increasingly vast amounts of data captured every day.

Nate Silver's The Signal and the Noise is one of the earliest tomes to explain in layman's language the gift and power of curating and using big data to solve problems. Now a legend, Silver first became known for building PECOTA, a system for forecasting the performance and career development of Major League Baseball players. In addition, the accuracy of Silver's November 2008 presidential election predictions (he correctly predicted the winner in 49 of the 50 states as well as all 35 U.S. Senate races that year) won him further attention and renown.

A fascinating read, the book draws on Silver's own groundbreaking work to examine the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data.

The other book worth looking at is The Naked Future: What Happens in a World That Anticipates Your Every Move? Patrick Tucker, an editor at The Futurist magazine, offers a stunning yet disturbing look at how computer-aided forecasting using big data is positioned for rapid growth over the next decade. The rise of big data will enable us to predict not only events like earthquakes or epidemics, but also individual human behaviors. We already live in such a world: when we run a Google search, our results are often personalized without our even knowing it. In the future, an app on your phone may know you're getting married before you do. Your friends' tweets can help data scientists predict your location with astounding accuracy, even if you don't use Twitter. Soon, we'll be able to know how many kids in a kindergarten class will catch a cold once the first one gets sick. This is a hauntingly beautiful gaze into what big data will mean for everyone's future.

Nassim Nicholas Taleb's writing really caught me off guard when I first came across it, but it helped me appreciate big data within the randomness of this world. The Black Swan: The Impact of the Highly Improbable (second edition, with a new section, "On Robustness and Fragility") popularized our understanding of the influence of highly improbable and unpredictable events that have massive impact, concisely chiseled for a non-academic readership. The book focuses on the extreme impact of certain kinds of rare and unpredictable events (called outliers) and our human tendency to find simplistic explanations for these events in retrospect. This theory has since become known as the black swan theory. The book covers both science and the arts, taking us from a tour of literary subjects in the beginning to scientific and mathematical subjects in the later portions.

MIT Professor Sandy Pentland is often seen as one of the foremost thinkers and researchers in the area of big data. Over years of groundbreaking experiments, he distilled discoveries significant enough to become the bedrock of a whole new scientific field: social physics. Humans have more in common with bees than we like to admit: we’re social creatures first and foremost. Our most important habits of action, and our most basic notions of common sense, are wired into us through our coordination in social groups. Social physics is about idea flow, the way human social networks spread ideas and transform those ideas into behaviors.

Ultimately, Social Physics: How Good Ideas Spread—The Lessons from a New Science looks at organizational and human behaviors from a physics standpoint. Pentland's book asks how we can create organizations and governments that are cooperative, productive, and creative, and it identifies big data as the engine that drives social physics. Pentland argues that the newly ubiquitous digital data becoming available all around us can help us understand and predict almost all facets of human life. By using these data to build a predictive, computational theory of human behavior, we can hope to engineer better social systems.

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die takes us (and big data) on a whirlwind tour of the field and introduces us to predictive analytics, a term that encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. For instance, it delves into the relatively new idea of "persuasion modeling," which predicts influence in order to exert influence. Barack Obama's campaign used it to influence voters in the 2012 presidential election; marketing uses it to more adeptly persuade customers; and medicine uses it to better select per-patient treatments.

Intriguingly, the book moves beyond forecasting to a very granular level, contending that while Nate Silver made election forecasts for each state as a whole, the Obama campaign was using predictive analytics to make per-voter predictions. Consequently, true power comes from influencing the future rather than speculating on it, which is the raison d'être of predictive analytics. As author Eric Siegel argues, while Nate Silver publicly competed to win election forecasting, Obama's analytics team had quietly competed to win the election itself.
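
For readers who like to see an idea in code, here is a minimal Python sketch of the intuition behind persuasion modeling (closely related to what is often called uplift modeling). It is not Siegel's method or anything the Obama campaign is known to have run. The records, segment names, and function names are all made up for illustration. The sketch simply compares how often people respond when contacted versus when left alone, and keeps only the groups where contact appears to help.

```python
from collections import defaultdict

# Hypothetical records: (segment, was_contacted, responded).
# Illustrative data only; not drawn from the book or any real campaign.
RECORDS = [
    ("persuadable", True, True), ("persuadable", True, True),
    ("persuadable", True, True), ("persuadable", True, False),
    ("persuadable", False, True), ("persuadable", False, False),
    ("persuadable", False, False), ("persuadable", False, False),
    ("backfire", True, True), ("backfire", True, False),
    ("backfire", True, False), ("backfire", True, False),
    ("backfire", False, True), ("backfire", False, True),
    ("backfire", False, False), ("backfire", False, False),
]


def response_rate(outcomes):
    """Fraction of 1s in a non-empty list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def uplift_by_segment(records):
    """Estimate uplift per segment: contacted rate minus control rate."""
    contacted = defaultdict(list)
    control = defaultdict(list)
    for segment, was_contacted, responded in records:
        bucket = contacted if was_contacted else control
        bucket[segment].append(1 if responded else 0)
    # Only segments observed in both groups can be compared.
    comparable = set(contacted) & set(control)
    return {
        s: response_rate(contacted[s]) - response_rate(control[s])
        for s in comparable
    }


def segments_to_contact(uplift, threshold=0.0):
    """Segments whose estimated uplift exceeds the threshold, sorted by name."""
    return sorted(s for s, u in uplift.items() if u > threshold)


if __name__ == "__main__":
    uplift = uplift_by_segment(RECORDS)
    for segment in sorted(uplift):
        print(f"{segment}: uplift {uplift[segment]:+.2f}")
    print("Worth contacting:", segments_to_contact(uplift))
```

The point of the toy example is the difference between predicting who will respond and predicting who will respond because they were contacted. A real system would estimate this per individual with a trained model rather than per hand-labeled segment.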