The report of the committee on real sector statistics, constituted by the National Statistical Commission, which gave the back series estimates of GDP, appeared for a brief period on the MoSPI website. We were fortunate to obtain a copy before it was removed. The report gives back series data for GDP as far back as 1993-94, with 2011-12 as the base year. The back series numbers were generated by its sub-committee on linking the old and new GDP series. The numbers have stirred political controversy over GDP growth in India under the UPA and NDA coalition governments during the last two decades. However, the debate has overlooked whether the statistical methodology used to generate the series is suitable. The reliability of the series depends on the methodology employed. Given the potential use of these numbers by researchers and policymakers, a robust statistical approach to generating the series is a must. The attempt here is to critically evaluate the methodology employed to arrive at the back series estimates of GDP.
The report admits that the best back series is one generated by the statistical agency that produces the new series itself, using a comparable approach for the past data points. In the absence of such numbers for the past years on the new base year, the most widely adopted methodology is to splice the series using a fixed number multiplier. The report has opted for an alternative methodology: the ‘production shift’ approach. However, a statistical scrutiny of the production shift approach is missing both in the report and in the papers it cites. Hence, it needs to be evaluated whether the approach offers any statistical superiority over the fixed number multiplier method of splicing in generating a robust back series.
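For reference, the fixed number multiplier splice can be sketched in a few lines of Python. This is a minimal illustration, assuming the multiplier is the ratio of the new-base to the old-base value in the common year; the function names and all figures are hypothetical and are not taken from the report.

```python
def splice_fixed_multiplier(old_series, new_common_value):
    """Rescale an old-base series by one constant factor.

    old_series is ordered oldest first and ends at the common year.
    Assumption: factor = new-base value / old-base value in the common year.
    """
    factor = new_common_value / old_series[-1]
    return [value * factor for value in old_series]


def growth_rates(series):
    """Year-on-year growth in percent."""
    return [(curr / prev - 1.0) * 100.0 for prev, curr in zip(series, series[1:])]


# Hypothetical old-base levels; the last entry is the common year.
old = [100.0, 110.0, 121.0]
spliced = splice_fixed_multiplier(old, new_common_value=133.1)

print(growth_rates(old))
print(growth_rates(spliced))
```

Because every year is scaled by the same factor, the spliced series keeps the growth rates of the old series (up to floating-point rounding); only the level changes.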
Missing theoretical support
The reasoning behind adopting the production shift approach is that the numbers are generated through a production function that undergoes a structural shift whenever a base year is revised. However, the report makes no mention of the current or the old production function that proxies the generation of the GDP numbers. If the relevant production function is missing, how do we know in which direction and by how much the production function shifts?
What the report seems to do is different from any shift in the actual production function generating the GDP number. Contrary to the reasoning it presents, it seems to adopt an ad hoc procedure in which it assigns a uniformly and linearly declining weight, over the back years, to the difference in the GDP numbers for the common year. If we look at the graph in the report for the production shift along with the logic presented, the weight assigned to the gap should be non-linear. Moreover, the scheme of weights for the past observations is sensitive to the year up to which the new series is to be generated, because this changes the denominator of the multiplying factor. In the report, the back series is reported to be generated as follows. The difference in output (nominal GDP) between the two base-year series for the common year, 2011-12, is redistributed backwards with linearly declining weight up to 1993-94, i.e., over 19 years.
The formula for the aggregate nominal GDP, as mentioned in the report, is as follows:

$$\hat{Y}_t = Y_t^{old} + \frac{i}{n}\left(Y_{2011\text{-}12}^{new} - Y_{2011\text{-}12}^{old}\right)$$

where n is the number of years for which the back series is to be generated, i = n-1, n-2, …, 1 and t = 2010-11, 2009-10, …; $\hat{Y}_t$ is the generated number and $Y_t^{old}$ is the number as per the old base year. The accompanying graphic presents the weights used by the report for generating data for the past 19 years. The weight scheme adopted in the report is as follows: 18/19, 17/19, 16/19, …, 1/19. Now, as per the methodology, if one has to generate the back series for the last four years, the weights will be 3/4, 2/4 and 1/4. Thus, the weight assigned to the most recent past year drops from 0.95 to 0.75.
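The weight scheme described above can be reproduced with a short Python sketch. It assumes, following the description in the text, that each back year's generated value is the old-base value plus a fraction i/n of the common-year gap; the function names and the sample levels are hypothetical and are not the report's data.

```python
from fractions import Fraction


def linear_weights(n):
    """Weights (n-1)/n, (n-2)/n, ..., 1/n, most recent back year first."""
    return [Fraction(i, n) for i in range(n - 1, 0, -1)]


def production_shift_back_series(old_back_years, gap, n):
    """Add a linearly declining share of the common-year gap to old-base levels.

    old_back_years: old-base levels for the n-1 back years, most recent first.
    gap: new-base minus old-base level in the common year.
    """
    weights = linear_weights(n)
    return [old + float(w) * gap for old, w in zip(old_back_years, weights)]


print(linear_weights(19)[:3])  # first three weights for a 19-year span
print(linear_weights(4))       # weights for a 4-year span

# Hypothetical old-base levels for three back years and a hypothetical gap.
example = production_shift_back_series([90.0, 80.0, 70.0], gap=8.0, n=4)
print(example)
```

With the hypothetical inputs, the most recent back year absorbs three quarters of the gap and the earliest only one quarter, which is the linear redistribution the text describes.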
Similarly, if one has to generate the back series for, say, 50 years, the weight assigned to the most recent past year will be 49/50, i.e., 0.98. And for a sufficiently large number of observations n, the weight (n-1)/n approaches 1. Thus, the scheme adopted in the report is sensitive to the sample size and hence will generate an unreliable number. The accompanying graphic presents the shift in the scheme of weights assigned to the back years as we change the sample size for which the past numbers are to be generated. The report argues that the previous base year should be given the least weight, but it does not specify what ‘least’ means. Hence, a robust ‘path of the weight assignment’ over the sample size is missing. The statistical theory to support the methodology is absent. Thus, in the absence of theoretical support, a methodology based only on a logical argument does not yield a robust and reliable number.
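The dependence on sample size is easy to check numerically. The sketch below assumes the weight for the year k years before the common year is (n - k)/n, as implied by the scheme discussed here; `weight_for_lag` is a hypothetical helper name, not something defined in the report.

```python
def weight_for_lag(k, n):
    """Weight on the common-year gap for the year k years before the common year,
    when the back series spans n years (assumed scheme: (n - k) / n)."""
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    return (n - k) / n


# The same calendar year (one year before the common year) gets a different
# weight depending only on how far back the series is extended.
for n in (4, 19, 50, 1000):
    print(n, round(weight_for_lag(1, n), 3))
```

The value generated for a given year therefore changes with the choice of n even though nothing about that year's underlying data has changed.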
On the generated numbers
The report states that the difference between the old series and the generated series is minimal. Is it so? If we look at the percentage difference in the growth rate of real GDP in the new series over the old series, the difference is large. It increases as we move away from the mid-sample. It varies between -23.33 and 6.27 over the reported sample of 19 years. Also, the difference is asymmetric with respect to the mid-point of the sample. The accompanying graphic shows the percentage difference in the growth rates of real GDP from the generated series in the report over the old base year numbers. It reveals that the splicing methodology adopted rewards the growth rate after the mid-point of the sample and penalises it before the mid-point. The penalty factor is much higher than the rewarding factor. The theoretical support for this is absent.
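The comparison above can be set up as follows. This is a sketch under one assumption: the percentage difference is taken as (new growth - old growth) / old growth × 100. The levels below are hypothetical placeholders, not the report's or the old series' figures.

```python
def growth_rates(levels):
    """Year-on-year growth in percent from a list of levels, oldest first."""
    return [(curr / prev - 1.0) * 100.0 for prev, curr in zip(levels, levels[1:])]


def pct_difference(new_growth, old_growth):
    """Percentage difference of new-series growth over old-series growth.

    Assumed definition: (g_new - g_old) / g_old * 100.
    Undefined when old growth is zero, so such years raise an error.
    """
    result = []
    for g_new, g_old in zip(new_growth, old_growth):
        if g_old == 0:
            raise ZeroDivisionError("old-series growth is zero for this year")
        result.append((g_new - g_old) / g_old * 100.0)
    return result


# Hypothetical real GDP levels under the old base and the generated series.
old_levels = [100.0, 108.0, 116.0]
new_levels = [104.0, 111.0, 120.0]

diff = pct_difference(growth_rates(new_levels), growth_rates(old_levels))
print([round(d, 2) for d in diff])
```

A negative entry means the generated series shows slower growth than the old series for that year, and a positive entry means faster growth.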
Can we replicate the numbers using the methodology mentioned? Any number for GDP growth in the report should be replicable by using the methodology stated in the report. The accompanying graphic presents the growth rate figures for GDP at market price generated by following the methodology as mentioned in the report for the past 19 years, with 2011-12 as the base year. Data from the EPWRF India time series database have been used in this note. The numbers generated from this exercise, however, are different from those given in the report. The difference between the two sets of estimates is beyond any explanation. Why should the estimates generated by employing the methodology stated in the report differ from the estimates given by the Committee? If a methodology is robust, the estimates should be easily replicated, and that is not the case here. Also, as mentioned earlier, there seems to be some bias in the Committee's estimates: they overestimate growth after the mid-point and underestimate growth for the earlier period of the back series. Perhaps the Committee should provide the methodology of estimation and its assumptions in greater detail.
The report uses the Bai-Perron test for stability checks on the aggregate GDP at market price. The report says that the test is used to understand whether the current method creates any statistical breaks in the new back series. Can one do so? It should be noted that the Bai-Perron test is a multiple structural break test and needs a sufficiently large sample to carry out the related statistical analysis. A small sample of 19 observations is not sufficient for applying the test. Also, a break point test cannot be an indicator of the robustness of the generated new series over the old series. What the break point test does is check whether there is a statistically significant difference in the means and variances of the sub-samples. The use of the Bai-Perron test for robustness in the report is misleading.
Also, in the forwarding letter, the Committee states that the report may contain many infirmities of language and typographical errors, as it is being submitted before the extended deadline. What we observe is that the errors in the report go beyond basic language or typographical errors and amount to a lack of robustness in the estimates themselves. The application of wrong methodologies can misguide researchers and policymakers and may lead to wrong debates. What is needed is a reliable set of estimates, and that requires the application of a robust methodology.
The Author is Assistant Professor, NIPFP
Views are personal