Data Science: An Overview and Its Applications
Gain Insights from both structured and unstructured data
Data science as a field emerged at the beginning of the internet era. At that time, big data (the term later used to describe the massive amount of data generated on the internet) had to be managed and processed, and knowledge had to be extracted from it. Data science is the application of scientific and systematic methods, algorithms, and systems to gain insights from structured, semi-structured or unstructured data, which need not be big data (see figure below).
“Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyse” –McKinsey Global Institute (Manyika 2011)
Insight Making
According to some, data science is just a glamorized term for applied statistics, because its primary purpose, extracting insights, involves applying statistical principles. This may sound like a reasonable analogy, but it is like comparing a Formula 1 driver to a chauffeur just because both of them drive cars. Data science is not merely an elaboration of statistics, for its end is to gain insights for a specific purpose.
The emergence of data science as a field distinct from statistics can be traced through the history of big data. As early as the end of the last century, when the internet era was beginning to take off, scientists recognized the need to manage the massive datasets that would arise as internet usage became widespread. With the spread of websites and social networks, large datasets emerged as predicted, and by around 2010 the task of managing big data effectively had largely been achieved.
“Big data is a massive collection of shareable data originating from any kind of private or public digital sources, which represents on its own a source for ongoing discovery, analysis, and Business Intelligence and Forecasting”, according to Banica et al. (2014).
The massive amount of data generated contains a great deal of information and insight that is valuable to various organizations and institutions. The procedure of organising and processing data to extract that information forms the core tenet of data science. If big data is the body, data science is the operation performed on it. But data science isn’t confined to big data; it can be applied to any set of data.
“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” – John Tukey
Science can be defined as an investigation that provides answers or explanations for an issue using systematic methods. Data science serves the specific purpose of answering defined questions. In the term ‘data science’, therefore, the science part is significant and implies the use of systematic methods to reach conclusions.
In 2012, the Harvard Business Review called data scientist the ‘sexiest job of the 21st century’, generating a buzz and lending the job an allure that has stuck ever since. Data science has found applications in a wide variety of fields, from sports to astrophysics.
Applications and Concerns
Various organizations began to look to data for insights as a means to multiple ends. Social networking sites opened up new scope for analytics, and tools began to emerge to gain insights from their massive volumes of unstructured data.
Today, all the major tech companies are involved in data science. They develop their own algorithms and systems to gain insights, and some of these are among the most robust systems built for this purpose. If your Google searches can reveal your interests, your Facebook likes can pinpoint them. Billions of people use these sites, and their data is stored and sorted to provide information that everyone from businesses to governments uses for various ends.
The large-scale application of data science divides opinion, since it can do anything from raising privacy concerns to enabling the prediction of epidemic outbreaks.
In 2016, a group of British scientists from Queen Mary University of London published a paper that applied geographic profiling to investigate the identity of the anonymous political graffiti artist Banksy. The paper did not explicitly disclose Banksy’s identity. However, its findings pointed to a person who had previously been suspected of being the artist, causing an uproar over privacy.
Yet that is a minor issue compared with the global scourge of governments profiling their citizens through their internet activity. This is especially true in the post-Snowden era, when such fears were shown to be well-founded. Whether these are the ill effects of data science in the wrong hands or a means to a more secure world depends on whom you ask.
On the other side of the ledger, in April 2019 scientists released the first-ever image of a black hole, the result of years of research. The image agreed with the predictions of Einstein’s general theory of relativity and moved humanity a step forward. It was made possible in part by an imaging algorithm developed in 2016 by an MIT graduate student.
For more on data science, watch this space.