Sounds of Unsound ‘P’! Tuning the Data or Striking the Wrong Note?…

Editor’s note: For a long time, many scientists’ careers have been built on the pursuit of a single statistical value: p < .05. In many disciplines, that is the cut-off below which results can be declared “statistically significant,” which is commonly taken to mean that the results were not a fluke, though this isn’t what it actually means in practice.

In this article, we try to highlight the hypothesis myopia that researchers and analysts suffer from when they gather shreds of evidence in support of a theory while ignoring explanations against it or its rationale. In the end, is statistical significance equivalent to clinically meaningful data? Or is it the outcome of torturing the data long enough that they confess at some point?

One day recently, my wife asked me whether she needed to fast for a thyroid function blood test her doctor had prescribed, and I was not sure. So I googled for an answer. The search first fetched some information specific to the thyroid test procedures followed by the Cleveland Clinic.

These covered three tests, viz. Thyroid Stimulating Hormone (TSH), Thyroxine (T4) and Microsomal Thyroid Antibodies (TPO), and stated that none of them required fasting and that all could be done at any time of the day. A moment later, I opened another page from my search results. It was a research paper, published in a medical journal, on the effect of fasting versus not fasting on the interpretation of thyroid function tests.

Interestingly, the study concluded that TSH levels showed a statistically significant postprandial decline (i.e., after a meal, without fasting) compared with fasting values. So I had come across an evidence-based viewpoint that did not favor the testing procedures followed in most diagnostic centers (including the practice at the Cleveland Clinic).

This study made me curious, for it used statistical techniques to back a divergent idea. After reading the paper, I noted that the conclusion effectively undermined the objective the study had set for itself. At the outset, the paper clearly declared the question it addressed: whether a fasting or non-fasting sample would make a clinically significant difference in the interpretation of thyroid function tests. Yet it did not do justice to this aim, because it did not sufficiently examine how the observed difference between fasting and non-fasting samples, which was somewhat expected in any case, should matter clinically.

Statistical significance is merely a necessary condition for clinical significance, not a necessary and sufficient one, so the conclusion led nowhere. Only if statistical significance both implied and was implied by clinical significance would the finding have been ‘necessary and sufficient’ to make an impact on diagnostic practices. The concluding statement was therefore self-limiting: it did not analyze the clinical implication of the difference for which the significance test had been conducted. Many clinical studies that look into the effects of a particular treatment or factor run into this sufficiency deficit and end up trapped in a quandary about statistical significance vis-a-vis clinical significance.

Certain other studies, however, have also shown that early morning blood samples taken after overnight fasting give higher TSH levels than those taken later in the day without fasting. Mary Shomon, author of the New York Times best-seller “The Thyroid Diet Revolution: Manage Your Metabolism for Lasting Weight Loss”, while discussing various factors that can influence the TSH level, such as medication and pregnancy, considered the fasting/non-fasting variation to be especially problematic in the clinical diagnosis of thyroid malfunction.

So knowing that the fasting/non-fasting difference was statistically significant added hardly any knowledge. In any case, a firm answer to my wife’s question remained elusive, as medical science was still not prepared to recognize the fasting/non-fasting variation in TSH and free T4 levels as clinically relevant (i.e., the ‘sufficiency’ condition was not automatically implied).

Nevertheless, it was quite evident to me that researchers and analysts in many cases suffer from hypothesis myopia: they collect evidence in support of a hypothesis while ignoring explanations against it or its rationale. There is a saying among statisticians: torture the data, and the data will confess. So they don’t stop wrenching the data until it shows statistical significance, and the moment they get it they go no further. This tendency is not new, and in the past it has had perilous consequences, with many a hyped discovery failing to uphold its claimed statistical significance. Incidentally, I found on the web some glimpses of the various abuses of statistical significance. They speak volumes about how concerned scientists are.

Clinical significance necessarily means Statistical Significance, but the converse is not always true.
Illustration by Meghna Chakrabarti

What was bothering scientists from a statistical point of view, of course, was researchers’ recourse to purposeful dredging and tweaking of data (p-hacking) until the elusive statistical significance is reached and the null hypothesis can be rejected. The null hypothesis is often plain guesswork about a phenomenon, set up without sufficient effort to describe or analyze the practical significance of the theory, or the risk that the conclusion will later be found irreproducible or inadequate in effect size.

Over the decades, the keenness of researchers to publish papers based on statistical significance, many of which later proved to be false positives, assumed so alarming a proportion that scientists even started looking for an estimate of what percentage of published results were subsequently proved wrong. Giving an idea of this quest in certain fields of science, the video here makes an impressive effort to help people appreciate that such malpractice with manipulated statistical evidence is not doing science any good.

No wonder, then, that a commotion has been created of late by more than 800 scientists who called for abandoning the use of statistical significance in scientific inference. In the sections that follow, I shall focus on just two factors that I consider central to the conceptual shortcuts that lead researchers to data manipulation.

The problem may be partially rooted in dichotomania!

Statistical hypothesis testing, based on samples of numeric observations of a quantitative characteristic (generally a continuous variable), is widely used in every field of science. It uses a test statistic to decide whether or not to reject a null hypothesis about the characteristic. Essentially, it is a tool that lets researchers conclude probabilistically whether, given one set of results (observations taken on a variable), a particular null hypothesis about the nature of variation of the variable remains plausible against an alternative hypothesis. The technique thus lets you dichotomize the range of possibilities as either acceptable (say, white) or not acceptable (say, black). In doing so you tend to ignore the bigger picture, a holistic view of the whole spectrum of colors. In the case I cited at the beginning, the study focused merely on two distinguishable shades of variability (i.e., variation due to fasting and not fasting) in TSH and free T4 levels among patient and non-patient categories, one purportedly more acceptable than the other, although the two are not necessarily in conflict. With so much overlap between the ranges of variation in these categories of sampled individuals, the ‘p’ is not indicative enough.

Why should you like to focus your lens on two shades only when the whole is so colorful beyond your view?
Illustration by Meghna Chakrabarti

A measurable quantity that can be represented by a continuous variable does not necessarily show a dichotomous character (e.g., black and white). So its range of variation shouldn’t be needlessly split into two parts by a cut-off line. That is to say, it may not always be essential to find out what puts a value of a continuous variable on one side of a cut-off line rather than the other, when the cut-off itself is artificial to the variable.

By applying statistics in science, on the contrary, we intend to study the natural variation of a phenomenon (not a man-made difference) or the effect of a man-made variation on the natural variation of an event. This relies on a fundamental premise: it is natural for a quantity representing a phenomenon to vary over a range of values (in which every conceivable value of the variable may occur). The variation most often doesn’t depend so exactly on some other factor or variable(s) that it can be described by a mathematical function, though a relationship that cannot be written down explicitly may exist. Unlike the laws of physics, which define exact relationships between physical quantities, laws of mathematical exactitude don’t always exist for the theories and phenomena of certain other fields of science (e.g., psychology and the life sciences).

Methods of statistical inference are, therefore, applied most extensively in these areas to show the effect of, or association with, other factors (variables) that may influence the variability of the variable under experiment. In doing so, researchers have to introduce assumptions about the nature of variation of the observed numbers. According to Sander Greenland, cognitive biases of researchers (dichotomania being one of them), along with untenable assumptions, play a mysterious role in null-hypothesis significance testing (NHST), which often results in false-positive significance. This is principally because biased intuitive reasoning tends to override logical consistency in inferential arguments. An editorial in The American Statistician (Vol. 73, No. S1, 2019) puts the essence of it as follows.

A label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics.

Applying a standard theoretical model to observed data may be untenable

It is widely acknowledged in the science community that there is a considerable chance that a variable representing a phenomenon does not follow the particular probability distribution (theoretical model) that a hypothesis test requires in order to be valid. A prior investigation of the pattern of variability, with a large number of observations, to learn how close to or distant from the presumed model the pattern is, is therefore absolutely critical. Otherwise, blindly accepting a probability distribution, treating the variability as conforming to the standard model by its very nature, may be a far-fetched imposition.

In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred if the null hypothesis is true. In other words, a result is statistically significant, by the standards of the assumed model, when p < α. Here α, the predefined significance level (generally very small, 0.05 or smaller), is the probability of committing a type-I error, i.e., rejecting the null hypothesis when it is actually true; and p (the p-value of the study) is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true. Taking α = 0.05, say, is intended to keep the chance of a type-I error small: only a 5% chance of wrongly rejecting the null hypothesis when it is actually true under the model distribution assumed for the population at large. But if the real distribution does not conform to the assumed model, the probability of that error may well not be so low.

The chance of committing a Type-I error may not actually be as small as you wanted if you have assumed an unrealistic probability distribution for the variable.
Illustration by Meghna Chakrabarti
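The point about a mis-specified model can be checked with a small simulation. What follows is only a minimal sketch under stated assumptions, not anything taken from the studies discussed here: it runs a one-sample t-test at α = 0.05 on samples of size 10, first on normally distributed data (which the test assumes) and then on strongly skewed exponential data. The sample size, the distributions and the function names are all illustrative choices; the critical value 2.262 is the standard two-sided 5% point of Student's t with 9 degrees of freedom. In both cases the null hypothesis is true, so every rejection is a type-I error.

```python
import math
import random

T_CRIT_DF9 = 2.262  # two-sided 5% critical value of Student's t, 9 degrees of freedom


def rejects_null(sample, mu0):
    """One-sample t-test of H0: mean == mu0 at alpha = 0.05 (expects 10 observations)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t = (mean - mu0) / math.sqrt(var / n)
    return abs(t) > T_CRIT_DF9


def type_one_error_rate(draw, mu0, trials=20000, n=10, seed=1):
    """Fraction of simulated samples in which a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([draw(rng) for _ in range(n)], mu0) for _ in range(trials)
    )
    return rejections / trials


# Case 1: data really are normal, as the t-test assumes (true mean 0).
rate_normal = type_one_error_rate(lambda rng: rng.gauss(0.0, 1.0), mu0=0.0)

# Case 2: data are strongly skewed (exponential, true mean 1); the assumed model is wrong.
rate_skewed = type_one_error_rate(lambda rng: rng.expovariate(1.0), mu0=1.0)

print("nominal alpha: 0.05")
print(f"normal data:   {rate_normal:.3f}")
print(f"skewed data:   {rate_skewed:.3f}")
```

With normal data the rejection rate should sit close to the nominal 5%, while with the skewed data it tends to come out clearly higher, even though the same "5%" test was applied in both cases.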

A few years back, the American Statistical Association (ASA) released a policy statement aiming to halt the misuse of p-values. This was the first time that the 177-year-old ASA had made explicit recommendations on such a foundational matter in statistics. Explaining the significance of the ASA recommendation, Nature emphasized the need to weigh the evidence instead of blindly accepting a p-value of 0.05 or less to mean that a finding is statistically significant:

A p-value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a p-value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients’ blood glucose levels without having a therapeutic effect.
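The blood glucose remark can be made concrete with a little arithmetic. The sketch below uses entirely hypothetical numbers (a 0.5 mg/dL mean reduction, a standard deviation of 15 mg/dL, and 10,000 patients per arm) and a simple two-sample z-test with a known, equal standard deviation; none of these figures comes from a real trial, and the function name is made up for illustration.

```python
import math


def two_sample_z_test(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means, assuming the same known SD in both groups."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of the standard normal
    return z, p_value


# Hypothetical numbers, not taken from any study:
mean_diff = 0.5       # drug lowers mean blood glucose by 0.5 mg/dL
sd = 15.0             # assumed between-patient standard deviation, mg/dL
n_per_group = 10000   # a very large trial

z, p_value = two_sample_z_test(mean_diff, sd, n_per_group)
cohens_d = mean_diff / sd  # standardized effect size

print(f"z = {z:.2f}, p = {p_value:.4f}, Cohen's d = {cohens_d:.3f}")
```

Under these assumptions the p-value falls below 0.05, yet the standardized effect is only about three hundredths of a standard deviation, which few clinicians would call a therapeutic effect. The large sample, not the size of the effect, is what produces the "significance".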

Recognizing how seriously a large section of scientists and statisticians find themselves constrained to publish their results selectively on the basis of a single magic number, a recent special issue of The American Statistician has prescribed what not to do with p-values and significance testing. The advisory comes with 43 innovative and thought-provoking papers that also guide researchers on what to do when facing research questions. In essence, the recommendations envisage a new world order for hypothesis testing, in which studies with “p < 0.05” and studies with “p > 0.05” are not automatically in conflict, so that researchers will see their results more easily replicated and, even when they are not, will better understand why.
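Why studies on either side of 0.05 need not be in conflict can be seen from a toy comparison. The sketch below assumes two hypothetical studies that estimate exactly the same effect (2.0 units) with slightly different standard errors, and uses a normal approximation for the p-value and the 95% confidence interval; the numbers and the function name are invented for illustration.

```python
import math


def summarize(estimate, standard_error):
    """Return the two-sided p-value and 95% confidence interval (normal approximation)."""
    z = estimate / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = 1.96 * standard_error
    return p_value, (estimate - half_width, estimate + half_width)


# Hypothetical studies: the SAME estimated effect, slightly different precision.
p_a, ci_a = summarize(estimate=2.0, standard_error=1.0)
p_b, ci_b = summarize(estimate=2.0, standard_error=1.1)

print(f"Study A: p = {p_a:.3f}, 95% CI = ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"Study B: p = {p_b:.3f}, 95% CI = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
```

Study A lands just under 0.05 and study B just over it, although both report the same effect and their intervals overlap almost entirely. Labelling one "significant" and the other "not significant" would manufacture a disagreement that the data do not contain.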

This blog originated out of a dinner table conversation at the Chakrabarti household, where Satyabrata Chakrabarti (the dad, and former Deputy Director General at the Central Statistics Office, Government of India) tries to convince his two daughters that statistical significance may or may not be equivalent to clinically meaningful data.

Cover image by Meghna Chakrabarti (L), Author Satyabrata Chakrabarti (M) and content research/editing/blog design by Rituparna Chakrabarti (R)

We publish using the Creative Commons Attribution (CC-BY) license so that users can read, download and reuse text and data for free – provided the authors, illustrators, and the primary sources are given appropriate credit.

Feeding an extra portion too much

the unwholesome way to growing them up

Editor’s note: As under-nourishment recedes and developing countries move up the economic ladder, Generation Z is overindulging in fast food and sedentary lifestyles.

The effect is a growing burden, menacingly giving rise to several diseases. At this rate, halting the tide of obesity by 2025 will remain a challenge unless the feeding behavior for children is addressed. The author digs deeper into how the problem starts early in life, looking at the world through his lens of numbers.

Even the Laughing Buddha these days wonders if it’s time to shed those extra kilos and become the modern-age Zen.
Image credit Pixabay.

In the last couple of months, my wife and I have put on a few extra kilos, which we are now keen to shed, for they make our aging bones toil much harder, especially when lifting the increased weight up the staircase to our third-floor apartment. We were quick to blame the harsh winter this time around and its attendant high level of air pollution in the national capital region, which kept us from our usual outdoor activities in the morning. So, as spring set in, we resumed our routine morning workouts.

As we make our rounds in the park we visit, we therefore keep a particularly observant eye on the masses of living matter that other people are carrying on their feet, so that we get some psychological comfort from the weight-comparing drills we do with our eyes.

Unlike Indians in general, people of our area are well known for their good physique and greater consciousness of their own figure and shape. Strangely enough, our eye estimation suggested, much to our satisfaction, that oversized figures outnumbered, by a good margin, the wiry figures typical of this area. Thanks to these pot-bellied visitors, I was convinced of how the food industry has fattened here.

No, this opening narrative is not mere anecdotal evidence, as my description may suggest, nor the result of an aberration of parallactic vision. It’s just an inset image within a bigger picture of the nutritional transformation that people in all parts of the globe are being subjected to.

Yes, lopsided nourishment is what nutritional transformation is all about and evidently one of the ill effects of economic development causing drastic changes in food culture.

The Rapidly Growing Overfed

Created by the extremes of the food supply, the world we live in is now polarized between hundreds of millions of unfed and overfed people. Research published in 2016 revealed how the growing number of overfed people has brought about a new imbalance, offsetting the broader social, economic and medical concerns for the unfed and underfed population. The study shows an enormous 167% rise in the prevalence of obesity between 1975 and 2014, compared with only a 35% fall in the prevalence of underweight during the same period.

Obesity increment rate has outpaced the decline of the underweight.
In the last 40 years, the number of obese people has increased roughly sixfold, from 105 million in 1975 to 641 million in 2014. Plotted by the author; Data Source: NCD Risk Factor Collaboration (NCD-RisC)

Overweight and obesity are defined as “abnormal or excessive fat accumulation that presents a risk to health”. For an adult, they are generally assessed using the body mass index (BMI), which is a person’s weight in kilograms divided by the square of their height in meters (kg/m²). An adult with a BMI from 25 up to 30 is considered overweight, while someone with a reading of 30 or more is obese. According to the World Health Organization (WHO), unfortunately, more women than men are tipping the scales at obesity levels. This could be attributed to several factors, such as lifestyle, economic conditions, racial makeup, and pre-existing health conditions. Needless to say, obesity can in turn affect women’s reproductive health.
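The BMI definition above translates directly into a couple of lines of code. This is a minimal sketch for adults only: the function names and the example weights and heights are made up for illustration, and the overweight band is taken to run from 25 up to, but not including, 30, which is an assumption about how the boundary is handled.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def classify_adult(bmi_value):
    """Label an adult BMI using the two cut-offs mentioned in the text (25 and 30)."""
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    return "not overweight"


# Illustrative adults, all 1.75 m tall (hypothetical values).
for weight in (65, 80, 95):
    value = bmi(weight, 1.75)
    print(f"{weight} kg at 1.75 m: BMI {value:.1f} -> {classify_adult(value)}")
```

Note that the cut-offs turn a continuous measurement into labels; two adults with BMIs of 29.9 and 30.1 end up in different categories despite being practically identical.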

Concerns about the health and economic burden of increasing BMI have led to adiposity (i.e., the condition of being severely overweight or obese) being included among the global non-communicable disease (NCD) targets, with the aim of halting, by 2025, the rise in the prevalence of obesity at its 2010 level.

Age-standardized mean BMI in men (A) and women (B) by country in 1975 and 2014.
Data Source: NCD Risk Factor Collaboration (NCD-RisC)

Disease Burden: Obesity vs. Under-nutrition

Globally, more people are obese than underweight; this holds in every region except parts of sub-Saharan Africa and Asia. Overweight and obesity are also linked to more deaths worldwide than underweight. Although the focus has been on mortality, there is also a huge burden from conditions that don’t actually kill you. Jessica Hamzelou wrote in 2012 about a shift in disease burden.

In 1990, under-nutrition was a leading cause of disease burden, measured as the number of years of healthy life an average person could expect to lose as a result of illness or early death. Back then, a high body-mass index, or BMI, was ranked tenth. Now, under-nutrition has dropped to eighth place, while BMI has risen to become the sixth leading cause of disease burden.

Jessica Hamzelou

Morning shows the day

Childhood obesity threatens our younger generations, much as under-nourishment did in the past few decades. The prevalence of overweight and obesity among children (under-5) and adolescents aged 5-19 has risen dramatically, from just 4% in 1975 to over 18% in 2016. Unlike among adults, however, the trend has been similar for both genders.

Standards of measuring overweight/obesity for children under 5 years of age are, however, different from those for adults and adolescents. Unlike BMI, this is measured from the weight-for-height distribution of a vast number of children. In the case of overweight and obese children, the weight-for-height lies in the higher extremes of this distribution.

One of the leading causes of childhood obesity is the larger portions fed in early childhood, according to Hayley Syrad of University College London. Some parents may be over-feeding their children and, in the process, driving them to a higher risk of obesity-related health hazards. So the problems start right in childhood, when a healthy diet and healthy portion sizes are often lost sight of.

Studies show that overweight children are more likely to become overweight adults. Even birth weight tracks a person’s growth to adulthood: “A bigger baby is likely to be a bigger child and then a bigger adult”, researchers said. Citing a new study, Dr. Claire McCarthy of Harvard Health Publishing wrote in December 2017 that more than half of today’s children are going to be obese adults.

Not only are more than half of current children going to be obese by 35, but an obese 2-year-old has only a one in four chance of not being obese at age 35. If that 2-year-old is severely obese, the chance of being at a healthy weight at 35 is only one in five. By the time that severely obese child is 5, they have only a one in 10 chance of not being obese at 35.

Dr. Claire McCarthy, of Harvard Health Publishing

Over-nourishment – an effect of economic transition

The global burden of obesity and overweight has grown at an accelerated rate as under-developed and developing countries have moved up the economic ladder and switched from traditional diets to western food styles, observed B M Popkin, L S Adair et al. of the University of North Carolina. For children under 5 years of age, this nutrition transition has resulted to a great extent from the practice of feeding large meals that parents in lower income brackets could not otherwise have afforded.

The concern about the growing prevalence of overweight children, therefore, is no longer restricted to developed countries alone. With the economic situation improving in developing countries, child-feeding behavior is changing in these countries too.

We use statistics from some developing countries to show that those who can afford to spend more on food are more prone to have overweight children. Compared with children from families in the lowest income echelon (the 1st wealth quintile, i.e., the 20% of the population with the lowest income), children from 5th wealth quintile families (the 20% of the population with the highest income) show a higher prevalence of overweight, as the developing countries’ data shows.

The tendency and stimulus to feed large portions of meals that cause more harm to a child’s growth are linked to having enough money.
Plotted by the author; Data source: UNICEF’s Expanded Global Database on malnutrition 2019

Urban lifestyle impacts feeding practices

The correlation between the prevalence of overweight children and the economic condition of their parents is also discernible in the rural-urban differential in prevalence. In developing countries, urban people are economically better off than rural people, and this is reflected in how they feed their children.

The inability to afford, and the poorer access to, pre-prepared and packaged foods of low nutritional value is a blessing in disguise for people in rural areas of developing countries. Rural children are therefore less exposed to overfeeding than urban children, as is evident from the prevalence differentials in some developing countries.

Overfeeding children is more of an urban trait.
Plotted by the author; Data source: UNICEF’s Expanded Global Database on malnutrition 2019

Rapid urbanization and improving connectivity, however, are quickly obliterating the rural-urban divide. A study has attributed the global obesity epidemic among adults to rising rural BMI.

…contrary to the dominant paradigm, more than 55% of the global rise in mean BMI from 1985 to 2017—and more than 80% in some low- and middle-income regions—was due to increases in BMI in rural areas…rural under-nutrition disadvantage in poor countries (being replaced) with a more general malnutrition disadvantage that entails excessive consumption of low-quality calories.

NCD Risk Factor Collaboration (NCD-RisC)

Educated mothers do them more harm

Data shows that overweight among children is more prevalent where their mothers are more educated. Children of mothers with no education or only primary education are less likely to be overfed than children of mothers with secondary or higher education, according to estimates from the WHO-UNICEF-World Bank Group joint malnutrition database.

Female education in developing countries is positively correlated with people’s economic condition. So the more educated mothers are likely to come from economically better-off families and, therefore, to show the feeding behavior of the higher wealth quintile population. Education itself seems to have mattered little, as the chart below shows for the selected developing countries.

Mothers should know that babies and young children who are not overweight should eat until they are full rather than being made to finish everything on their plate.
Plotted by the author; Data source: UNICEF’s Expanded Global Database on malnutrition 2019.

To halt the increasing economic and health burden of obesity, the growing prevalence of overweight children can’t be set aside for the future. It needs urgent attention in developing countries and must not be allowed to get out of hand. A focus on sensitizing mothers is the key to arresting its spread.

Parents must practice responsive feeding or feeding when hungry

It’s a common mistake among parents to overfeed their toddlers, thinking it’s a necessary way of making sure they grow up healthy. Pressuring a child to eat is the feeding behavior that has attracted the most attention from researchers. It is the size of the portions forced upon children, more than the frequency of feeding or an extra Mars bar or apple, that spoils children’s response behavior, the researchers said. A child should eat a child-sized portion, not an adult-sized one. Using smaller plates is one way to make this easier. For every extra 24 calories consumed during each meal, there was a 9% increased risk of becoming overweight or obese, studies found.

Forcing the child to eat raises the risk of weight gain by undermining the child’s ability to self-regulate food intake.
Image credit Pixabay.

If post-2000 trends of accelerated growth in overweight and obesity continue unabated, especially among children, the probability of meeting the global obesity target is virtually zero. Halting the growth has to start at the beginning, when children can be tuned to regulate how their own appetite is satisfied and mothers stop pressing for an extra portion.

Author: Satyabrata Chakrabarti; Edits and blog design: Rituparna Chakrabarti

The ‘Capacity’ Paradox – part 2

Editor’s note: Continuing the narrative from the previous part, the author delves into the depths of the capacity crisis for sustainable development data. He is optimistic that the challenges will open up new vistas for national and international agencies to cooperate on mutually reinforcing efforts.

Not grabbing the opportunity to create the appropriate capacity to address the data gaps could, in the author’s opinion, be as costly a mistake as denying people their rights. The narrative ends with a sigh for the dream data that remain elusive.

In the last part, I presented some data and a chart for selected countries to demonstrate how they have progressed in developing national capacity for the production and dissemination of official statistics. Any intelligent person who knows that one must always look into the elements the data are made of would be reasonably justified in concluding that the countries of the developing world are doing well in gradually improving their statistical capacity in terms of a defined set of parameters.

At the same time, however, what message these impressive ‘score-lines’ give to a layman is a question that may have bothered no one. Could a common man simply take it that some countries have already crossed the 90% mark and therefore have not much further to achieve? If so, is that not an impression national statistical offices (NSOs) would love to project in unreserved exhibitionism?

Now there lies a fallacy in this notion.

Let’s face it as a question that a common man could ask: Does the score truly reflect a complete image of a country’s statistical capacity? Or is it just a fractured image?

One must not miss seeing what the SCI metadata has to say about this by way of explaining the structure, constituents, method, and rationale of the indicator. However, before we are done with the metadata thing, let’s try to understand the fallacy a bit more clearly.

Countries are severely constrained in statistical capacity to meet the challenges confronting the statistical systems as of 2018. Image credit Meghna Chakrabarti

Countries are in fact severely constrained in their statistical capacity to meet the challenges confronting statistical systems as of 2018. By 2016, the nations of the world were already seized with the 2030 Agenda and its Sustainable Development Goals (SDGs), which their leaders had committed to achieving in the interest of humanity and the planet.

Attaining the SDG targets by the year 2030, as they say, depends crucially on the countries’ ability to track progress with good, penetrating statistics. Knowing just the national count of the people to be reached is not enough; a much stronger statistical capacity is required to know who they are, where they are located and what challenges they face.

A new statistical framework of 230+ indicators has been prescribed for tracking progress towards the targets. An assessment by the UN’s IAEG-SDGs revealed that, as of 13 February 2019, the updated tier classification contains 101 Tier I indicators, 84 Tier II indicators, and 41 Tier III indicators.

This means that over 125 indicators (over 54%), belonging to Tier II, Tier III or the multi-tier category, either have no internationally established standards or methodology, or have data that countries do not publish regularly. This is a serious capacity deficit and a global syndrome.
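The share quoted above can be checked from the tier counts themselves. The sketch below uses only the three counts given in the text; the number of multi-tier indicators is not stated here, so it is left out, which is why the result is a floor rather than the exact figure. Variable names are illustrative.

```python
# Tier counts as of 13 February 2019, as quoted in the text.
tier_counts = {"Tier I": 101, "Tier II": 84, "Tier III": 41}

classified_total = sum(tier_counts.values())
not_tier_one = tier_counts["Tier II"] + tier_counts["Tier III"]
share = not_tier_one / classified_total

print(f"Indicators in a single tier: {classified_total}")
print(f"Tier II + Tier III:          {not_tier_one}")
print(f"Share lacking full coverage: {share:.1%}")
```

Tier II and Tier III alone account for 125 of the 226 single-tier indicators, a little over 55%. Adding the multi-tier indicators would raise the count further, consistent with the "over 125" in the text.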

Apart from these conceptual and methodological challenges, the overarching SDG principle of leaving no one behind has raised the bar (for the national statistical offices as well as for global monitoring agencies). UNICEF’s report ‘Progress for Every Child in the SDG Era’ underscores how critical this is:

It is no longer enough to monitor progress by global aggregates or national averages alone. Results need to be disaggregated to monitor progress among sub-national groups of people, especially those who are vulnerable such as the girls, children living in remote rural areas or informal urban settlements.

Children and gender equality are central to the whole SDG agenda. There are 44 child-related indicators situated under the 17 SDGs. Analyzing these indicators, UNICEF’s report maps them thematically into 5 dimensions of children’s rights:

  • the right to survive and thrive (17 indicators under SDG 2 and SDG 3)
  • the right to learn (5 indicators under SDG 4)
  • the right to be protected from violence (10 indicators under SDG 5, SDG 8 and SDG 16)
  • the right to live in a safe and clean environment (10 indicators under SDG 1, SDG 3, SDG 6, SDG 7 and SDG 13)
  • the right to a fair chance / to have an equal opportunity to succeed (4 indicators under SDG 1)

The report reveals that, of the 202 countries covered, a substantial proportion lack data in each of these dimensions: 22% are missing data for ‘survive & thrive’; 63% for ‘learning’; 64% for ‘protection’; 24% for ‘environment’; and 63% for ‘fair chance’.

If we look at the gender-responsive indicators of the SDGs (there are 54, spread over 12 of the 17 SDGs), the story is no different: “only about 26% of the data necessary for global monitoring of the gender-specific indicators are currently available”.

When this is the picture at the national level, the non-availability of data at sub-national levels for the child- or gender-related indicators, or for any other sub-population, is obviously all the more disquieting, not to speak of the untrodden areas of environment, climate, life below water and the like that affect livability on earth.

So, a person struggling for SDG data would wonder whether the 2018 SCI score is any indication of a country’s actual current statistical capacity. As its structure was defined for a pre-SDG framework and was not redesigned after 2015 to account for data on SDG indicators, the SCI in its present form reflects statistical capacity only for an incomplete basket of statistical deliverables.

With the advent of the SDGs, national statistical offices face varying degrees of difficulty in producing the required data, especially disaggregated data, for lack of capacity. These difficulties could be fathomed to some extent if a realistically assessed statistical capacity were known for each country, both overall and for the constituent dimensions. A warrant for action is already spelled out in what Target 18 under SDG 17 states:

By 2020, enhance capacity-building support to developing countries, including for least developed countries and small island developing States, to increase significantly the availability of high-quality, timely and reliable data disaggregated by income, gender, age, race, ethnicity, migratory status, disability, geographic location and other characteristics relevant in national contexts.

The target essentially calls for action by development agencies, which will be eager for information on the intensity and depth of the capacity deficit across the world, so that they can assess how much effort and energy they may have to apply and where.

The SCI, if re-engineered to take the SDG indicators into account, could provide much of that all-important insight through its score levels. A new 2018 score from this hypothetical SDG-laden SCI would almost certainly be much lower than the existing 2018 score, even for the very-high-scoring countries in the World Bank database. Such discordant scores could well instigate debate among the number cognoscenti.

If that were to happen, anyone might easily be tempted, as Mark Twain was, to blame the diminished numbers on the unreliability of statistics. The exceptions would probably be those who had studied the metadata and appreciated how scores could dwindle once the SCI framework was loaded with additional SDG parameters. They might argue, ‘a fourth-grade student scoring 90% overall in the fourth-grade examination will in all probability do extremely badly if allowed to sit for an eighth-grade test.’ That is a matter of capacity gap: a gap arising from the difference in the frame of reference, as would be the case if the SDGs were embedded in the SCI framework.

Statistical capacity indicator scores from 2004-18 in 8 developing nations. Data compiled by the author, source World Bank, SCI database.

In the chart depicted in the previous section (also see above), however, the ups and downs of the lines over time occur without any change in the SCI’s frame of reference. They are somewhat akin to the fluctuations in productivity that I spoke of in part 1. A change in the frame of reference, by placing the SDGs on the SCI framework, could materialize only when data become available for the majority of SDG indicators.

Since 2004, when the Marrakesh Action Plan for Statistics was developed, strategic planning has been recognised as a powerful engine for guiding national statistics development programmes (NSDP), increasing political and financial support for statistics, and ensuring that countries are able to produce the data and statistics needed for monitoring and evaluating their development outcomes. A Global Action Plan for Sustainable Development Data came into being in Cape Town in January 2017 for coordinated action on capacity development for sustainable development data.

Is it still too soon to expect the governments and statistics agencies to be investing resources and energies for augmenting the capacity to produce sustainable development data?

Governments and statistics agencies investing resources and energies for augmenting the capacity to produce sustainable development data. Image credit Meghna Chakrabarti

Dreaming of a spring when the agencies of change wave their magic wands (to transform the data eco-system), the ‘goose’ of golden data lies in slumber. And so the SCI keeps serving up its scores as they are: the only presentable measures of a statistical capacity that does not lay the ‘golden eggs’.

This blog originated in a dinner-table conversation at the Chakrabarti household, where Satyabrata Chakrabarti (the dad, and former Deputy Director General at the Central Statistics Office, Government of India) tried to convince his two daughters of the impact and current assessment of a statistical tool for the social and economic sectors, while Meghna and I went on to assess its impact in our own fields.

Cover image by Meghna Chakrabarti (L), Author Satyabrata Chakrabarti (M) and editing/blog design by Rituparna Chakrabarti (R)

We publish using the Creative Commons Attribution (CC-BY) license so that users can read, download and reuse text and data for free – provided the authors, illustrators, and the primary sources are given appropriate credit.

The ‘Capacity’ Paradox

Editor’s note: In this two-part blog series, the author explores the statistical capacity deficit in the current global context. In part 1, his main aim is to highlight the interplay of a system’s capacity, its efficacy and its productivity, and how it affects the socio-economic sphere globally, especially in developing nations.

I have been tickling my imagination to visualize the ‘golden eggs’ laid by a goose with ‘statistical capacity’. One day, no sooner had I closed my eyes than my thought process veered off the track and entered the dull territory of numbers. The detour thereafter is the story here: a road lined with numerical landmarks shining gloriously, with images of statistics atop them like ‘jewels in the crown’.

Indeed, in the beginning, as I kept stirring my imagination, I did make an effort to identify something royal about numbers. It was then that I realized the connection a ‘statistical system’ has with the business of the king. The journey, therefore, started with my being captivated by the ideas and concepts of a system’s capacity that enables rulers to indulge in statistical stockpiling, although these turned out to be harmlessly mundane and theoretical, as the descriptions to follow will show.

The golden eggs laid by the statistical capacity. Image credit Meghna Chakrabarti

A system, as the wise men say, has to perform a given set of tasks routinely, so it must have some capacity to do so. A system, or for that matter a person or a machine, that can produce something has the capacity to create a certain quantity of it in a given period. This is quite trivial. What’s not so nugatory, however, is that the capacity does not necessarily remain equally productive all the time. When capacity is sometimes less productive and sometimes more, the wise men say that it is a matter of the efficiency of the system or the person.

So, how do I know the efficiency of the ‘goose that laid golden eggs’? Is that conceivable?

For a moment I’m inclined to believe: yes, it is. For there is a measure called productivity to quantify the efficiency of a person, machine, factory, or system. Who doesn’t know that inputs (i.e., labor, material, energy, cost, time, etc.) are used by the system and converted into useful outputs? Productivity, the notion of how effective any productive effort is, is measured as the rate of output per unit of input.
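As a minimal sketch of that ratio, the snippet below divides an output by the input used to produce it. The function name and the figures (words written and hours spent) are hypothetical and chosen only for illustration.

```python
def productivity(output, input_units):
    """Rate of output per unit of input."""
    if input_units <= 0:
        raise ValueError("input_units must be positive")
    return output / input_units


# Hypothetical figures: 1,200 words written in 4 hours.
words_per_hour = productivity(output=1200, input_units=4)
print(words_per_hour)
```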

A trick, however, lies here in the fact that even when you keep the inputs fixed, the productivity can vary.

That’s because there are factors which, uncontrollably, unnoticeably or otherwise, cause the output to change from time to time. These factors, even when known for the role they play, often remain unrecognized or unexplained. If they are not explainable or perceptible, variation in productivity may appear strange, or even intriguing. And you know very well that Mark Twain was intrigued; intrigued by his own productivity figures. He wrote in 1906:

Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’

What puzzled Mark Twain was the output-to-time ratio as a measure of his ability to write words (i.e., writing productivity), since the ratio measured on two separate occasions differed wildly in magnitude. This is closely analogous to cases where we observe variation in human performance (nay, productivity) due to factors not easily discernible at the time of measurement: for example, the weather, the time of day, or the level of comfort in the room, any of which can affect the mood or mental condition of the performer.

However, the great author, in the pleasure of his own creative inspiration, perhaps made himself blind to the factors playing on his own productive capacity. As a result, the remark of Disraeli that he immortalized in his autobiography made statistics damnable forever.

Nevertheless, in a contrasting scene of reality, statistical numbers indicating the state of development often fail to reveal the vital things just when they matter most. Even those produced in the mills of highly reputed institutions, or by national statistical offices acclaimed for their high statistical standards, are frequently damned for such reasons.

Statistical capacity building enables statistical practitioners in the public and private sectors to use methods for data collection, analysis and interpretation, and contributes to the development of statistical infrastructure and human resources in official, survey, business, education and research statistics. Image credit:

Heaps of literature on concepts and methods for the collection of data, the compilation of statistics and the dissemination of results, including analytical tools, have been produced and promoted under the aegis of the United Nations Statistical Commission (UNSC) over the last 70 years, just to make official data all over the world credible and dependable. These works, covering almost all fields of economic and social relevance in development, have helped to strengthen national statistical systems, especially through statistical capacity building.

By ‘statistical capacity’ they mean a nation’s ability to collect, analyze and disseminate high-quality data about its population and economy. In this sense, however, countries have attained varying levels of capacity at the national level, the most significant difference being in the ability to collect the data. The collection of data for calculating recommended statistical measures at regular intervals involves systemic rigor and requires considerable resources for conducting large-scale operations. The situation becomes all the more challenging when the system has to respond to new realities, e.g., an evolution of the measurement paradigm necessitated by new statistics-based evaluation.

No system, in fact, can adapt very quickly to significant reforms in the statistical framework that may become necessary. Those national systems that are able to respond quickly and transform their structures and procedures are truly robust in statistical capacity.

The World Bank developed a Statistical Capacity Indicator (SCI) for assessing the capacity of a country’s statistical system. It is a composite score based on a diagnostic framework assessing three areas: methodology, data sources, and periodicity and timeliness. Countries are scored against 25 criteria in these areas, using publicly available information and/or country input. The overall statistical capacity score is then calculated as a simple average of all the area scores on a scale of 0 – 100. From the scores of nearly 140 developing countries over the last 15 years, what is evident is a clear trend of improvement for most nations, though scores sometimes fall after rising.
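The averaging step described above can be sketched as follows. This is only an illustration of a simple average over the three area scores; the function name and the example scores are hypothetical, and the scoring of the 25 underlying criteria is not modeled.

```python
from statistics import mean


def overall_sci(methodology, data_sources, periodicity_timeliness):
    """Simple average of the three area scores, each on a 0-100 scale."""
    area_scores = (methodology, data_sources, periodicity_timeliness)
    if not all(0 <= score <= 100 for score in area_scores):
        raise ValueError("each area score must lie between 0 and 100")
    return mean(area_scores)


# Hypothetical area scores, not taken from the World Bank database.
example_score = overall_sci(70.0, 80.0, 90.0)
print(example_score)
```

Because every area carries equal weight in a simple average, a weak score in one area pulls the overall score down by exactly a third of the shortfall.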

Afghanistan, for example, rose from almost a state of void (24.4 in 2004) to 50.0 in 2018; it’s a success story of a devastated nation. The period covered coincides with the countries being engaged with the Millennium Development Goals (MDGs), and therefore the third dimension, ‘periodicity and timeliness’, looks at the availability of key socio-economic indicators, of which 9 are MDG indicators.

Statistical capacity indicator scores from 2004-18 in 8 developing nations. Data compiled by the author, source World Bank, SCI database.

It’s interesting to see how some of the major developing countries have fared. Mexico, a high-performing country with a scoreline of 74.4 in 2004, 85.6 in 2010, and 92.2 in 2015, finished at 96.7 in 2018, a story of steady improvement. It is especially remarkable against the ordinary scenario in its own region (Latin America & the Caribbean), where the average score hovered in the range of 74 – 78 during the period.

Against South Asia’s regional score ranging between 65 and 76, India finished at 91.1 in 2018, moving through 78.9 in 2004, 81.1 in 2010 and 77.8 in 2015; Bangladesh moved from 70.0 in 2004 to 72.2 in 2018 after a rise to 76.6 in 2015; and Pakistan moved from 73.3 in 2004 to 78.9 in 2018.

Indonesia, another major country, in the East Asia and the Pacific region (regional score 77.5 in 2018), moved from 86.7 in 2004 to 90.0 in 2018. On the other hand, the Philippines and Thailand of this region remained almost static during the period, scoring in the range 81 – 88. In sub-Saharan Africa, Rwanda did remarkably well, moving from 61.1 in 2004 to 78.9 in 2018 and exceeding the regional score of 62.4 in 2018 by a considerable margin, whereas Ghana, a somewhat weaker starter, moved up by 20 points from 51.1 in 2004 to 71.1 in 2018; and Tanzania moved from 67.8 to 71.1 during the corresponding period.
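To compare these movements at a glance, a short sketch can rank the countries by their change between 2004 and 2018. The scores are the ones quoted in the text above; the variable names are illustrative only, and the Philippines and Thailand are left out because only a range is given for them.

```python
# (2004 score, 2018 score) as quoted in the text.
scores = {
    "Afghanistan": (24.4, 50.0),
    "Mexico": (74.4, 96.7),
    "India": (78.9, 91.1),
    "Bangladesh": (70.0, 72.2),
    "Pakistan": (73.3, 78.9),
    "Indonesia": (86.7, 90.0),
    "Rwanda": (61.1, 78.9),
    "Ghana": (51.1, 71.1),
    "Tanzania": (67.8, 71.1),
}

# Change in score, rounded to one decimal to avoid floating-point noise.
changes = {
    country: round(end - start, 1) for country, (start, end) in scores.items()
}

# Largest gains first.
ranked = sorted(changes.items(), key=lambda item: item[1], reverse=True)
for country, change in ranked:
    print(f"{country:12s} {change:+.1f}")
```

On these figures, Afghanistan and Mexico show the largest gains, while Bangladesh shows the smallest.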

But is this enough? In the coming blog, we will discuss the fallacy in the SCI system and how it fails to give a credible picture of the statistical capacity of developing countries. I will try to answer the following question: is the SCI a proper tool to measure with in the present context? I will also talk about how the Sustainable Development Agenda requires a new measurement paradigm and offers opportunities for national systems to evolve into core entities of country-specific data eco-systems.
