Data ecosystem needs an overhaul

The data needs of a dynamic and young nation like India call for urgent expansion and upgrade in the quantity and quality of data generated

Further, a common framework will also enable harmonisation of regulation of data fiduciaries and data custodians—both of which are usually the same entity.

By DPS Negi & Sumit Kumar
The development process is intrinsically data-driven. Reliable, timely and disaggregated data are essential for evidence-based policymaking, fuelling faster economic growth and inclusive development. In recent times, however, the interdependence between data and development has come under various confounding stresses, both traditional ones that have resurfaced in new contexts and newly emerging ones. The ever-evolving and dynamic development imperatives of a developing country like India require nothing less than a systemic overhaul of data generation, dissemination and consumption.

The optimising role of data in national development under conditions of resource scarcity is well understood and appreciated. In such societies, data are presumed to play a dual role: spurring and sustaining the development process, and informing necessary corrective interventions, mostly in the form of welfare schemes in education, health, labour, employment and other related socioeconomic spheres.

As desired levels of development remain distant, as reflected by yawning gaps in achieving development goals, first under the Millennium Development Goals (MDGs) and now under the Sustainable Development Goals (SDGs), the data needs of development have been exposed to at least two major disruptions in addition to the traditional ones.

The first disruption reshaped the development discourse itself by firmly embedding considerations of sustainability and the environment into conceptions of development. Additionally, data have been found to play a key role in fostering resilient societies, considerably mitigating the destructive impact of disasters, which are increasing in both frequency and intensity.

The response to the Covid-19 pandemic, the latest case in point, has been essentially data-centric: from determining adequate levels of testing and tracking changing positivity rates and their implications, to monitoring fatality rates. The deployment of advanced predictive modelling helped avoid systemic failures arising from a mismatch between sudden surges in the number of infected persons and the maximum capacity of health systems to cope.

The second disruption emanated as a ripple effect of the technological revolution. Technology has drastically transformed the processes of data collection, processing, dissemination and analysis. However, the uneven diffusion of technology across sections of society has produced a deep divide between digital haves and have-nots. This digital divide is reflected in massive differences in the modes of data generation and, consequently, in the quantum of data available.

With the advent of artificial intelligence and advanced machine learning algorithms, data on digitally integrated sections of society can be accessed very quickly, and at highly disaggregated levels, from many unconventional sources such as social media and e-commerce platforms. Data gleaned from Facebook (Covid-19 symptom surveys, population density maps, movement range maps, etc.) and through telephone and email-based surveys were instrumental in containing the adverse effects of the Covid-19 pandemic.

At the same time, data collection for digitally lagging sections of society is still confined to traditional paper-and-pencil interviewing (PAPI). Such data suffer from gross inefficiencies and time lags. Consequently, the data needs of development interventions aimed at enabling, strengthening and empowering the digital have-nots are only partially met, and urgently require the integration of technology along with a widening of scope.

The deployment of technology offers three distinct advantages. Firstly, a reduction in non-sampling errors improves the quality of data. Non-response and partial-response rates fall because canvassing the questionnaire places less strain on both field investigators and respondents. Besides, the entire data collection exercise can be monitored in real time, and corrections can be made promptly wherever and whenever required. The possibility of wrong entries and other human errors is largely ruled out.

Secondly, growing concerns about data confidentiality and privacy can be tackled at the outset with the help of advanced encryption technologies.

Thirdly, the entire process of data generation and dissemination is accelerated because the separate stage of data entry is bypassed altogether. Recently, the government planned five all-India surveys. Four of these focus on creating databases on migrant workers, domestic workers, employment generated in the transport sector, and employment generated by professionals. Data for these segments are either non-existent or, at best, inconsistent.

The entire data collection exercise is technology-driven, from the deployment of handheld devices to collect responses to the integration of emerging technologies such as geo-fencing. However, data generation goes far beyond the mere existence of robust statistical systems and an enabling technological infrastructure.

The vital importance of skilled and trained field investigators cannot be overstated. In planning data generation, the ability of respondents to provide the required information sets the ultimate limit, which can be overcome only through skilled and trained field staff. Respondents in the informal sector, where about 90% of India’s workforce is concentrated, face evident limitations, especially with recall-based questions, owing to the absence of formal records and low levels of education.

These surveys, therefore, mark the beginning of a systemic overhaul in which data for sectors that are only partially covered, or not covered at all, are to be generated through the deployment of advanced technology and skilled manpower. Additionally, the entire exercise reflects awareness, intent and commitment at the highest level of policymaking to address crippling issues for the benefit of the nation and to lay strong foundations for a new India.

Therefore, the data needs of a dynamic and young nation like India call for urgent expansion and upgrade in the quantity and quality of data generated. However, this systemic overhaul must be aligned, in innovative ways, with systemic constraints (the capacity of field investigators and respondents).

Negi is chief labour commissioner and director general, and Kumar is a subject matter expert, Labour Bureau, Ministry of Labour and Employment. Views are personal

First published on: 09-03-2021 at 06:10 IST