By Abhijit Dasgupta
Even though generating data is only one part of the larger Data Science cycle, it plays as much of a role as the other vital stages, including pre-processing, analysis, and evaluation. Before we dive into the characteristics of Big Data, the ways it is generated, and why it receives the world’s attention, let us briefly look at the history and origins of this ‘catalyst’. While Framework, in 2022, discusses how the ancient Egyptians processed data about their families and applied statistics to it, the history of Big Data dates back to the year 1663. In a year when many of today’s developed countries were overwhelmed by wars with foreign troops, John Graunt, a British statistician, was instead overwhelmed by large amounts of data during his research on the bubonic plague. He also became the first person to apply his field of expertise to data to find insights, something we now call ‘Data Analysis’. Since then, Big Data generation has evolved in step with technological and industrial advances, marked by key events over the years.
Beyond its direct impact on the amount of unfiltered yet useful resources we have, Big Data excels in certain applications. The predominant reason for using Big Data is that it acts as fuel for the data models built on it. The bigger the data, the better the models may be, leading to higher precision in analysis and prediction. Other specific applications include personalized marketing and sentiment analysis, less commonly known as ‘opinion mining’. All of these are made possible by recognizing patterns among users, which reveal consumer choices. For example, a flight time at which all tickets are sold out indicates a preferred time to fly. Moreover, some estimates suggest that 2 billion human genomes will be sequenced by 2025, which would require up to 40 exabytes of data storage. Staying with healthcare, there are also personalized cancer treatments that tailor the therapy to each patient’s bodily condition, which, overall, helps treat most patients better. Looking at the vast and advancing range of what Big Data can provide, one may wonder how such technology is handled in the first place.
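To put the genome figures above in perspective, a quick back-of-the-envelope calculation shows what they imply per genome. This is only a rough sketch: the function name is illustrative, and it assumes decimal (SI) units, where one exabyte is 10^18 bytes and one gigabyte is 10^9 bytes.

```python
# Figures quoted in the article: 2 billion genomes, up to 40 exabytes of storage.
GENOMES = 2_000_000_000
TOTAL_EXABYTES = 40

# Assumption: decimal (SI) units rather than binary ones.
BYTES_PER_EXABYTE = 10**18
BYTES_PER_GIGABYTE = 10**9


def storage_per_genome_gb(total_exabytes, genomes):
    """Return the average storage per genome, in gigabytes."""
    total_bytes = total_exabytes * BYTES_PER_EXABYTE
    return total_bytes / genomes / BYTES_PER_GIGABYTE


per_genome_gb = storage_per_genome_gb(TOTAL_EXABYTES, GENOMES)
print(f"About {per_genome_gb:.0f} GB per genome")
```

Under these assumptions the estimate works out to roughly 20 GB per sequenced genome, which gives a sense of why storage is a concern at this scale.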
In recent years, some of the major challenges facing industry and corporations have become matters of great concern in a tightly integrated, globalized world. The questions companies are asking are business oriented: for instance, how do I minimize my lending risk, how can I improve my revenues, can I reduce costs to increase profits without impacting my top line, and so on. The deeper problems in data science, meanwhile, revolve around significant research in areas such as:
Scientific understanding of deep learning algorithms: while we applaud the adoption of deep learning as a way to solve various problems in machine learning and AI, we still lack an understanding of how deep learning works.
Machine learning for causal reasoning: we are all familiar with regression and correlational data, and ML algorithms come in handy for solving problems in economics, the social sciences and so on. However, data scientists are devising new methods that use the wealth of data now available to make causal inference estimation more efficient.
Managing artisanal data: a good example of this type of data is the set of court judgments from China that have been released lately. For this kind of precious data, we need to find newer algorithms to identify the knowledge it contains.
Heterogeneous data: for example, multi-scale spatiotemporal climate models simulate the underlying interference of the physical systems, and each of these could be a different data source leading to a certain phenomenon.
Trustworthy AI: we have witnessed rapid adoption of AI in several fields such as e-commerce, the military, healthcare and so on. Consequently, there is increasing concern about whether decisions based on AI can be trusted, can be interpreted, are moral, and so on. One approach to building trust is to provide an explanation of an outcome (Explainable AI).
Privacy and ethics: these are two very important areas of current concern, and they pose major challenges to companies.
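The gap between correlational data and causal reasoning mentioned in the list above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not one of the newer methods the list refers to: the variables `x`, `y` and `z`, the helper functions, and the sample size are all made up for illustration. Here `z` drives both `x` and `y`, while `x` has no effect on `y` at all.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def residuals(ys, xs):
    """What is left of ys after removing its linear dependence on xs."""
    b = slope(xs, ys)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [y - mean_y - b * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic data: z is a hidden common cause of both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # note: y does not depend on x

# Naive regression of y on x picks up the shared influence of z.
naive = slope(x, y)

# Adjusting for z (regressing residuals on residuals) removes it.
adjusted = slope(residuals(x, z), residuals(y, z))

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}")
```

In this setup the naive slope comes out close to 1 even though `x` has no effect on `y`, while the adjusted slope is close to 0. Real causal inference problems are harder because the common cause is often unobserved, which is part of why this remains an open research area.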
A student of data science may wish to enquire into these areas, which are going to be significant challenges in the future. As many universities are opening up newer curricula in data science, it is worth reflecting on data science as a field of study (as an independent discipline). For today’s researchers, students and fellows, the shape of the future will largely depend on the data science research problems they choose to enquire into.
The author is director, Bachelor of Data Science (BDS) programme at SP Jain School of Global Management. Views are personal.