By- Sudeshna Datta
In recent times, Big Data analytics has become an integral part of market research. It involves managing large quantities of data, then refining and examining them to identify and predict individual customer preferences, information-search patterns, consumer purchase behaviour, market trends and other relevant business information. Though there has been a lot of excitement around Big Data and its potential to make businesses more efficient, the significant risks it carries in the longer run should not be overlooked. Big Data is exposed to risks such as unreliable data sources, inaccurate analysis and misinterpretation, all of which can produce imprecise results that harm businesses and threaten the security and privacy of data.
In the arena of Big Data, threats to the privacy of information can emerge from the use of 'identifiable information blocks': bits of information that, when pieced together from tracked and stored GPS data, web search history, online financial transaction details and other online activity, can reveal an individual's identity. Some of the most critical issues relating to Big Data are:
- Determining what kind of information can be shared and with whom
- Verifying transmission of cyber information without the threat of leakage
- Accurately analysing the available data through the right algorithms and models
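The re-identification risk described above can be sketched in a few lines of Python. The datasets, field names and people below are all invented for illustration; the point is only that seemingly harmless attributes (postcode, birth date, sex), when combined, can single out an individual across datasets:

```python
# Hypothetical sketch: linking an "anonymised" dataset to a public one
# via quasi-identifiers. All records and names are invented.

anonymised_health = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "birth": "1962-01-12", "sex": "M", "diagnosis": "asthma"},
]

public_register = [
    {"name": "A. Smith", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "B. Jones", "zip": "02140", "birth": "1980-03-05", "sex": "M"},
]

def reidentify(anon_rows, register):
    """Link records whose quasi-identifiers (zip, birth, sex) match exactly."""
    matches = []
    for a in anon_rows:
        for r in register:
            if (a["zip"], a["birth"], a["sex"]) == (r["zip"], r["birth"], r["sex"]):
                matches.append((r["name"], a["diagnosis"]))
    return matches

# A single unique match is enough to expose a supposedly private diagnosis.
print(reidentify(anonymised_health, public_register))
```

A unique match on all three attributes attaches a name to a record that was stripped of names, which is why the question of what can safely be shared, and with whom, is not trivial.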
While experts may argue that Big Data presents vast, often underestimated opportunities for enhancing business performance, privacy advocates are highly apprehensive about the scope for misconduct and manipulation during data collection. The biggest concern is not aggregate profile data but the personal data of individuals, which analytic engines can distort to produce erroneous results.
Handling large volumes of data raises many other issues. With so many variables present in the data, Big Data analytics risks making faulty assumptions, drawing inaccurate correlations between variables and presenting patterns that do not actually exist.
There are two serious limitations to analysis of huge data sets:
- Overestimation of the predictive capabilities of analytics
- Misrepresentation and manipulation of data to produce favourable results
Misinterpretation of data takes place when causal links are drawn between sets of collected information that are, in actuality, merely coincidental. A good example is Google's Flu Trends tool, which was designed to detect flu outbreaks by analysing the searches made by Google users. While the analytics system initially provided compelling results, over time its predictions began to diverge vastly from reality. It was later found that the algorithms behind the analytics engine were not accurate enough and failed to identify significant anomalies such as the 2009 H1N1 pandemic, challenging the credibility of the tool.
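The spurious-correlation problem is easy to reproduce: generate enough unrelated variables and some pairs will correlate strongly by chance alone. A minimal sketch using only Python's standard library (the "metrics" here are pure random noise, not real data):

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 100 completely unrelated "metrics", each observed only 8 times.
series = [[random.gauss(0, 1) for _ in range(8)] for _ in range(100)]

# Scan every pair for the strongest apparent relationship.
best = max(
    abs(pearson(series[i], series[j]))
    for i in range(len(series))
    for j in range(i + 1, len(series))
)
print(f"strongest spurious correlation: r = {best:.2f}")
```

With thousands of pairs and few observations per variable, a very strong correlation turns up even though every series is independent noise, which is exactly how an analytics engine can surface a compelling-looking pattern that does not exist.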
A common risk that most data analytics projects face is the collection of outdated, incorrect or irrelevant data, leading to inaccurate results that often cannot be substantiated for lack of 'real' evidence. Organisations then risk falling behind their competition through wrong decisions made on misinterpreted data and inaccurate insights, failing to deliver customer value and differentiation.
Over-dependence on data as a basis for decision making can be a risky proposition, especially for consumer-driven businesses and research organisations. Measures for data protection, along with principles and codes of fair practice, will promote responsible data management and protect the information of individuals.
The writer is co-founder and executive vice-president, Absolutdata