Data integrity, empirical research and economic planning

In his recent article entitled "The riddle of the missing figures" (FE, 27, 2023), Dr Helal Ahmed gave several examples of discrepancies in data published by the Bangladesh Bureau of Statistics (BBS). The concluding paragraph of the article is reproduced below:

“The anomalies about data, statistics and information observed in Bangladesh are rarely seen elsewhere in South Asia, including our neighbour India, Sri Lanka, and Pakistan. Their statistics are much cleaner and more transparent. Some problems undoubtedly exist in Bangladesh, as there are significant differences between the export figures of EPB, NBR, and BB. If these figures are published in a coordinated fashion by the relevant agencies, the problems could be identified and overcome. It is recognised all over the world that statistical discrepancies or anomalies create problems for policy making and governance. It is unfortunate that Bangladesh has not yet been able to develop a credible statistical system even after 52 years of existence.”

At its most rudimentary level, data integrity refers to the accuracy and consistency of data across its entire life cycle, from when it is generated and stored to when it is processed, analysed, and used. Data integrity is a complex and multifaceted issue, so data management professionals must be watchful about the various risks that can compromise data integrity and quality. The data science literature emphasises five essential characteristics of data integrity, as illustrated in the sketch after the list below:

  • Accuracy: Data must be free of errors and must embody the real-world picture or process it is supposed to portray. Inaccurate data leads to faulty analysis, which in turn leads to misleading conclusions.
  • Consistency: Data consistency means the data remain unaltered across all instances over time unless deliberately updated or modified.
  • Completeness: Data completeness means data has all the necessary parts and information needed to lead to correct conclusions and support decision-making processes.
  • Reliability: Data reliability gives analysts confidence that the results they produce can be trusted for decision making.
  • Validity: Valid data adheres to the set formats and values defined during the data collection process to be used for specified purposes.
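
To make these properties concrete, the sketch below runs a few elementary integrity checks (completeness, validity, and cross-agency consistency) on a small, entirely hypothetical table of monthly export figures. The column names, thresholds, and numbers are illustrative assumptions, not actual EPB or NBR data.

```python
import pandas as pd

# Hypothetical monthly export figures (USD million) reported by two agencies;
# the numbers and column names are purely illustrative.
df = pd.DataFrame({
    "month": pd.date_range("2023-01-01", periods=6, freq="MS"),
    "exports_epb": [4500.0, 4620.0, None, 4710.0, 4690.0, 4800.0],
    "exports_nbr": [4490.0, 4625.0, 4550.0, 4710.0, 5690.0, 4805.0],
})

# Completeness: count missing observations in each column.
missing = df.isna().sum()

# Validity: values must be positive and within a plausible range.
invalid = df[(df["exports_epb"] <= 0) | (df["exports_epb"] > 1e5)]

# Consistency: the two agencies should report broadly similar figures;
# flag months where they differ by more than, say, 5 per cent.
diff_pct = (df["exports_epb"] - df["exports_nbr"]).abs() / df["exports_nbr"] * 100
inconsistent = df[diff_pct > 5]

print("Missing values per column:\n", missing)
print("Invalid rows:\n", invalid)
print("Months with >5% disagreement between agencies:\n", inconsistent)
```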

While emphasising the importance of data integrity, we may also consider the importance of data quality. The two are closely related but not the same. For example, data could be perfectly accurate and consistent (high integrity) yet not relevant to the purpose or business decision at hand (low quality). Data quality encompasses the overall essential characteristics of data, including its accuracy, consistency, completeness, relevance, and timeliness. Thus, integrity is a constituent of data quality, focusing on the accuracy and consistency aspects of data.

Unfortunately, deliberate data manipulation under pressure from political parties in power is not uncommon in many hybrid and autocratic regimes. In many developing countries, governments have been alleged to influence their data collection and analysis agencies to produce favourable statistics on economic growth, inflation, poverty alleviation, and unemployment. Even in the US, believe it or not, many political hacks, including some TV channels (Fox News in particular), blatantly accused President Obama's administration of cooking growth and unemployment data as the economy was expanding after the Great Recession of 2008. In reality, manipulating US economic data is nearly impossible, given that there are 75 government and private sources of economic data, statistics, reports, and commentaries, in addition to the omnipresent watchful eyes of the print, digital, and broadcast media.

Writing research papers for scholarly journals is quite different from producing research reports for economic policy formulation and development. Publishing a research paper in a top journal is a long process, whereas policy papers must be produced under the exigencies of the moment. The results and implications of journal papers take a long time to be adopted and implemented in policy making. Therefore, policy makers rely on the results and implications of project analyses to develop and adopt policies.

In Bangladesh and most other developing countries, a predominant share of studies consists of funded projects devoted to examining and evaluating specific issues, while university faculty generally pursue academic research. Project analysis provides information to policy makers, whereas quantitative academic research seeks new knowledge and understanding of economic theories for potential future application to real-life economic policy making. Obviously, inaccurate data used in economic project analysis would lead to faulty policy design and disappointing project outcomes.

Publishing research papers in top journals is very challenging: the paper goes through a gruelling review process. For example, three of my co-authored papers published in the Review of Economics and Statistics (a Harvard University journal, ranked in the top five out of 1,500 at the time) required me to send the data on a USB drive along with the paper submission. It often takes a long time, sometimes decades, for remarkable research papers to work their way through academia and policy application and gain recognition in the economics profession.

Quantitative economics does not always produce the desired outcomes, for many reasons besides data inaccuracy. The approach uses a range of complex mathematical and statistical procedures to analyse economic hypotheses implied by theories. These techniques, although by no means perfect, help analysts explain economic issues as well as predict future economic conditions. The primary analytical method of quantitative economics is regression analysis, which studies economic outcomes as functions of one or more economic predictor variables. These regression techniques are designed to capture the information contained in the data, but the data must satisfy certain restrictive (normality) assumptions. For example, economic time series are assumed to be generated by a stochastic or random process; no human intervention (manipulation) is allowed in that process. Take the case of the GDP data-generating process. There are three ways of estimating GDP, each of which should give the same result: (1) the output method (the value added by each producer), (2) the income method (all income generated) and (3) the expenditure method (the total of all spending). One can, therefore, check the accuracy of GDP data before using them for forecasting and policy analysis.
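
As a concrete illustration of that cross-check, the sketch below compares GDP estimates from the three methods and flags any gap larger than a chosen tolerance. All figures and the 1 per cent tolerance are hypothetical assumptions, not official statistics.

```python
# Cross-checking GDP estimates from the three methods. The figures below are
# hypothetical (in billions of local currency); they are not official data.
gdp_output = 450.2       # sum of value added by all producers
gdp_income = 449.8       # sum of all incomes generated

# Expenditure method: C + I + G + (X - M)
consumption, investment, government, exports, imports = 310.0, 95.0, 60.0, 55.0, 70.0
gdp_expenditure = consumption + investment + government + (exports - imports)

estimates = {
    "output": gdp_output,
    "income": gdp_income,
    "expenditure": gdp_expenditure,
}

# In principle the three should coincide; flag anything beyond, say, 1 per cent.
reference = max(estimates.values())
tolerance = 0.01 * reference
for method, value in estimates.items():
    gap = abs(value - reference)
    status = "ok" if gap <= tolerance else "CHECK: discrepancy too large"
    print(f"{method:12s} {value:8.1f}  gap={gap:6.1f}  {status}")
```

In practice the three estimates rarely coincide exactly, and national accounts typically report the small residual as a statistical discrepancy; it is the large, unexplained gaps that warrant scrutiny before the data are used.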

Data analysis and the resulting economic implications are only as reliable as the quality and integrity of the data used in the estimation process. To assure data integrity, prior to conducting any analysis a researcher plots the time path of the data (often in first-difference form) to see whether any suspicious outliers have intruded into the series. Once suspicious data points (outliers) are detected, the researcher may use statistical techniques, such as the interquartile-range rule, to flag and remove them. The researcher also applies differencing to prepare the data as accurately as possible so that they can generate more precise outcomes. Otherwise, omitting a zero or adding a number not generated by the underlying data-generating process compromises the much-needed error-free data, and the estimated parameters become virtually unreliable and useless.
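
The two screening steps mentioned above, differencing and the interquartile-range rule, can be sketched as follows on a synthetic interest-rate series with one artificial jump. The data and the conventional 1.5-IQR fences are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np
import pandas as pd

# Synthetic monthly interest-rate series with one artificial jump inserted
# to mimic an abrupt break; the numbers are illustrative only.
rng = np.random.default_rng(0)
rate = pd.Series(13.0 + rng.normal(0, 0.05, 36).cumsum() * 0.1)
rate.iloc[20:] -= 4.0  # an administratively imposed drop from ~13% to ~9%

# Step 1: first-difference the series and inspect its time path.
diff = rate.diff().dropna()

# Step 2: interquartile-range (IQR) rule on the differenced series.
q1, q3 = diff.quantile(0.25), diff.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = diff[(diff < lower) | (diff > upper)]

print("IQR fences:", round(lower, 3), round(upper, 3))
print("Suspicious observations (by index):")
print(outliers)
```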

An example of an irregularity, or an outlier, in time series data would be the single-digit lending interest rate in the week or month in which it was forced on banks instead of being set by the free-market discipline of supply and demand. If researchers fail to use an appropriate statistical technique to account for the abrupt break in the data (the drop in the lending rate from 13 per cent to 9.0 per cent), the empirical results based on the outlier-afflicted data will be misleading.
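
One common way to account for such a break, sketched below under purely illustrative assumptions, is to include a level-shift dummy variable in the regression so that the administratively imposed jump does not distort the other estimated coefficients. The series and the break date here are hypothetical.

```python
import numpy as np

# Hypothetical monthly lending-rate series: roughly 13% before an imposed
# cut and roughly 9% afterwards, plus noise. The break date is assumed.
rng = np.random.default_rng(1)
n, break_point = 48, 30
t = np.arange(n)
rate = np.where(t < break_point, 13.0, 9.0) + rng.normal(0, 0.1, n)

# Regress the rate on a constant, a time trend, and a level-shift dummy
# that is 0 before the break and 1 afterwards.
dummy = (t >= break_point).astype(float)
X = np.column_stack([np.ones(n), t, dummy])
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)

intercept, trend, shift = coef
print(f"intercept ~ {intercept:.2f}, trend ~ {trend:.4f}, level shift ~ {shift:.2f}")
# With the dummy included, the roughly 4-point drop is absorbed by the shift
# coefficient; without it, the drop would distort the estimated trend.
```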

It is not uncommon that some researchers, under "publish or perish" pressure, and PhD students who simply want to meet their thesis requirements do not apply rigorous and critical data analysis. At other times, data accuracy may be deliberately compromised by some researchers and students because their estimated results are inconsistent with what economic theory implies.

Finally, quality research is predicated upon appropriate support: quality data, access to journals, and state-of-the-art statistical software. Clicking the menus and buttons of statistical estimation software is easy and often mechanical, but if the data (even a single number) used in such automated analysis are inaccurate, the results will be nothing more than plain rubbish, consistent with the famous adage "garbage in, garbage out."

Farzana Zaman is a lecturer of statistics at Rangpur University, [email protected].

Dr. Abdullah A Dewan, formerly a physicist and a nuclear engineer at BAEC, is Professor of Economics at Eastern Michigan University, USA. [email protected]

 
