The validity and reliability of statistical panel surveys, which enable researchers to study individual behaviour change over time, are severely limited by methodological factors.
By Bikramjit Chaudhuri and Soumya Kanti Ghosh
In 1726, at age 3, Adam Smith was kidnapped by gypsies. His rescue was lucky for the world, as the discipline of economics would have been significantly different otherwise. Incidentally, the rescue is also described as lucky for the gypsies, since Smith was extremely forgetful and it would have been a huge burden for them to keep him captive for a long time!
Readers must be wondering why an article on surveys opens with Adam Smith. Well, the kidnapping incident reminds us of both the necessity and the futility of statistical surveys, globally and in India. Damned if one does, damned if one does not: a necessary evil, so to say.
Statistical panel surveys involve the collection of data over time from a baseline sample of respondents. Unlike other forms of longitudinal studies, such surveys allow for the study of individual behaviour change, because the same sampling units are followed from one round to the next.
The advantage of such surveys is that they enable researchers to measure and analyze changes over time in socio-demographic and economic situations, as well as the attitudes, opinions and behaviors of individuals or aggregates of individuals. The unit of observation of household-based panels is the household and its members. Household panels enable researchers to study household change and the changing dynamics of the individuals within it. A typical example is the CMIE Household Survey.
However, though data used in such surveys have opened up avenues of research that simply could not have been pursued otherwise, their power depends on the extent and reliability of the data as well as on the validity of the restrictions upon which the statistical methods have been built. Otherwise, such data may provide a solution for one problem, but aggravate another.
Limitations of such datasets include, but are not limited to, problems in the design, collection, and management of data for panel surveys. These include problems of coverage (an incomplete account of the population of interest), nonresponse (due to lack of cooperation from the respondent or to interviewer error), recall (the respondent not remembering correctly), frequency of interviewing, interview spacing, reference period, the use of bounding to prevent events from outside the recall period being shifted into it, and time-in-sample bias.
Another limitation of such datasets is the distortion due to measurement errors, which may arise because of faulty response due to unclear questions, memory errors, deliberate distortion of responses (e.g., prestige bias), inappropriate informants, wrong recording of responses, and interviewer effects. Although these problems can also occur in cross-sectional studies, they are typically aggravated in panel data studies due to their very nature.
Such datasets may also exhibit bias due to sample selection problems. For the initial wave of the panel, respondents may refuse to participate, or the interviewer may not find anybody at home. This may cause bias in the inference drawn from this sample. Although this nonresponse can also occur in cross-sectional data sets, it is more serious with panels because subsequent waves of the panel are still subject to nonresponse. Respondents may die, move, or find that the cost of responding is high.
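The compounding of nonresponse across waves can be made concrete with a little arithmetic. The sketch below is only an illustration: the response rates are hypothetical figures chosen by us, not taken from any actual survey, and it assumes that a respondent who drops out never returns to the panel.

```python
def cumulative_retention(wave_response_rates):
    """Share of the original target sample still responding after each wave.

    Assumes dropouts never re-enter the panel (an assumption of this sketch).
    """
    retained = 1.0
    shares = []
    for rate in wave_response_rates:
        retained *= rate  # only those still in the panel can respond again
        shares.append(retained)
    return shares


# Hypothetical rates: 80% respond at wave 1, then 90% of the
# remaining panel responds at each of the next three waves.
hypothetical_rates = [0.80, 0.90, 0.90, 0.90]
shares = cumulative_retention(hypothetical_rates)

for wave, share in enumerate(shares, start=1):
    print(f"wave {wave}: {share:.1%} of the target sample still responding")
```

Under these made-up figures, fewer than six in ten of the originally targeted respondents remain by the fourth wave, even though each individual wave looks respectable on its own.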
Additionally, the data suffers from the intrinsic deficiency of reactivity. For example, if we ask people questions about the status of women at two or more points in time, the questioning process itself might produce opinion shifts. Perhaps the act of asking people about the status of women makes them more sensitive to women’s issues. This increased sensitivity might mean they are more likely to favor or oppose changes in the status of women during later surveys. This is called reactivity, because the respondents are reacting to the initial questioning.
Last but not least, one of the most significant problems associated with panel data is attrition (i.e., respondents dropping out of the study). Attrition is a general problem for any study that draws on a panel survey. But, because attrition is strongly correlated with residential mobility, it is a particularly severe issue for demographic analysis. Marriage, cohabitation, separation, divorce, or childbirth may lead to a residential move, and survey institutes are often unable to keep track of people as they move. A similar issue arises at higher ages, as respondents may be unavailable for an interview because they have moved into an elder-care facility, been hospitalised, or become unable to answer questions for other reasons (such as disability).
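One simple way to check whether attrition is tied to residential mobility is to compare dropout rates between those who moved and those who did not. The following is a minimal sketch with invented records; the group labels, the record layout, and the function name are our own assumptions and do not reflect how any particular survey organises its data.

```python
from collections import defaultdict


def attrition_by_group(records):
    """Return the share of respondents lost before the next wave, per group.

    `records` is an iterable of (group_label, responded_next_wave) pairs.
    """
    totals = defaultdict(int)
    lost = defaultdict(int)
    for group, responded_next_wave in records:
        totals[group] += 1
        if not responded_next_wave:
            lost[group] += 1
    return {group: lost[group] / totals[group] for group in totals}


# Invented data: 10 respondents who moved house, 10 who stayed put.
records = (
    [("moved", False)] * 3 + [("moved", True)] * 7
    + [("stayed", False)] * 1 + [("stayed", True)] * 9
)

attrition = attrition_by_group(records)
for group, share in attrition.items():
    print(f"{group}: {share:.0%} lost before the next wave")
```

If movers drop out at a visibly higher rate than stayers, as in this invented example, then estimates of anything linked to moving (marriage, divorce, childbirth) rest on a sample that under-represents the very people of interest.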
The analysis of mortality risks using a survey is also contingent on the availability of information on why a respondent dropped out. Some panel studies (like the PSID and the US Health and Retirement Study) are able to verify that a respondent dropped out due to death by linking his or her information to official death registers. In other countries, it is not always possible to confirm deaths in this way.
Thus, such surveys, undertaken with grandiose plans and high hopes, frequently fail to live up to the expectations of the initiators of the research. Practical problems, such as continuity of personnel, institutional commitment, funding, the design and substance of the databank, changes in facilities and equipment, and many others, may interfere with the success of the project. We are not sure whether the surveys done by CMIE and even NSSO meet all such challenges. A case in point is that the University of Michigan, in its Consumer Surveys, always asks consumers about the changes they anticipate in the unemployment rate, and those responses are subsequently validated. We are not sure whether the CMIE Survey, which is modelled along the lines of the University of Michigan Consumer Survey, even addresses such an issue!