Sounds of Unsound ‘P’! Tuning the Data or Striking the Wrong Note?…

Editor’s note: For a long time, many scientists’ careers have been built on the pursuit of a single statistical value: p<.05. In many disciplines, that is the cut-off beyond which results can be declared “statistically significant,” which is commonly taken to mean that the results were not obtained by fluke. That, however, is not what it actually means in practice.

In this article, we try to highlight the hypothesis myopia that researchers and analysts suffer from when they gather shreds of evidence in support of a theory while ignoring explanations against it or its rationale. In the end, is statistical significance equivalent to clinically meaningful data? Or is it the outcome of torturing the data enough that they confess at some point?

Recently, my wife asked me whether she needed to fast for a thyroid function blood test that her doctor had prescribed, and I was not sure. So I googled for an answer. The first results were pieces of information specific to the thyroid test procedures followed by the Cleveland Clinic.

These concerned three tests, viz. Thyroid Stimulating Hormone (TSH), Thyroxine (T4) and Microsomal Thyroid Antibodies (TPO), and stated that none of them required fasting and that they could be done at any time of the day. A moment later, I opened another page from my search results. It was a research paper published in a medical journal on the effect of fasting versus non-fasting on the interpretation of thyroid function tests.

Interestingly, the study concluded that TSH levels showed a statistically significant postprandial decline (i.e., after a meal, without fasting) in comparison to fasting values. So I had come across an evidence-based viewpoint that did not favor the testing procedures followed in most diagnostic centers (including the Cleveland Clinic).

I was curious about this particular study nonetheless, because it used statistical techniques to back a divergent idea. After reading the paper, I noted that its conclusion effectively undermined the objective the study had set for itself. At the very beginning, the paper clearly declared the question it addressed: whether a fasting or non-fasting sample would make a clinically significant difference in the interpretation of thyroid function tests. However, it did not do justice to this aim, as it did not sufficiently examine how the observed difference between fasting and non-fasting samples, which was somewhat expected in any case, should matter clinically.

Given that statistical significance is merely a necessary condition for clinical significance, not a necessary and sufficient one, the conclusion led nowhere. Only if statistical significance both implied and was implied by clinical significance would it have been ‘necessary and sufficient’ to make an impact on diagnostic practices. So the statement in their conclusion was evidently a self-limiting one: it did not analyze the clinical implication of the difference for which the significance test was conducted. Many clinical studies that look into the effects of a particular treatment or factor run into this sufficiency deficit and are ultimately trapped in a quandary about statistical significance vis-a-vis clinical significance.

Certain other studies, however, have also shown that blood samples taken early in the morning after overnight fasting give higher TSH levels than those taken later in the day without fasting. Mary Shomon, the author of the New York Times best-seller “The Thyroid Diet Revolution: Manage Your Metabolism for Lasting Weight Loss”, while discussing various factors that can potentially influence the TSH level, such as medication and pregnancy, considered the fasting/non-fasting variation to be especially problematic in the clinical diagnosis of thyroid malfunction.

So there was hardly anything new to learn from knowing that the fasting/non-fasting difference was statistically significant. In any case, a firm answer to my wife’s question remained elusive, as medical science was still not prepared to recognize the fasting/non-fasting variation in TSH and free T4 levels as clinically relevant (i.e., the ‘sufficiency’ condition is not automatically implied).
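The gap between statistical and clinical significance can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a simple two-sample z-test with a known, equal standard deviation, and every number in it (the difference of 0.1, the standard deviation of 1.0, the clinical threshold of 0.5 and the sample sizes) is hypothetical rather than taken from the thyroid study.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming equal known SD."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# All numbers below are hypothetical, chosen only for illustration.
mean_diff = 0.1           # observed difference between the two group means
sd = 1.0                  # common standard deviation of the measurement
clinical_threshold = 0.5  # smallest difference assumed to matter in practice

for n in (50, 500, 5000):
    p = two_sample_z_pvalue(mean_diff, sd, n)
    print(f"n={n:5d}  p={p:.4f}  significant={p < 0.05}  "
          f"clinically relevant={abs(mean_diff) >= clinical_threshold}")
```

The same small difference moves from "not significant" to "highly significant" purely because the sample grows, while its size relative to the assumed clinical threshold never changes.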

Nevertheless, it was quite evident to me that researchers and analysts in many cases suffer from hypothesis myopia: they collect evidence in support of a hypothesis while ignoring explanations against it or its rationale. There is a saying among statisticians: torture the data, and the data will confess. So they don’t stop wrenching the data until the data show statistical significance, and the moment they get it, they go no further. This tendency is not new, and in the past it has had perilous consequences: many a hyped discovery failed to uphold its claimed statistical significance. Incidentally, I caught some glimpses on the web of the various abuses of statistical significance. They speak volumes about the degree of concern among scientists.

Clinical significance necessarily means Statistical Significance, but the converse is not always true.
Illustration by Meghna Chakrabarti

What bothered the scientists from a statistical point of view was, of course, researchers’ recourse to purposeful dredging and tweaking of data (p-hacking) until the elusive statistical significance is reached and a hypothesized proposition can be rejected. That hypothesis (the null hypothesis) is often plain guesswork about a phenomenon, made without sufficient effort to describe or analyze the practical significance of the theory, or the risk of the conclusion later being found irreproducible or inadequate in effect size.
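One way to see why dredging the data eventually "works" is to count how quickly the chance of a spurious finding grows with the number of analyses tried. The sketch below assumes the analyses are independent and that every null hypothesis is in fact true; real p-hacking usually involves correlated analyses, so treat the figures as a rough guide only. The function name is made up for this illustration.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of true
    null hypotheses comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20, 60):
    print(f"{k:2d} analyses -> {chance_of_false_positive(k):.0%} "
          f"chance of at least one spurious 'finding'")
```

Under these assumptions, a researcher who tries twenty different cuts of the same data is more likely than not to find something that clears the 0.05 bar by chance alone.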

The keenness of researchers to somehow publish papers based on statistical significance, many of which were later shown to be false positives, assumed so alarming a proportion over the decades that scientists even started looking for an estimate of what percentage of published results were subsequently proved wrong. Giving an idea of this quest in certain fields of science, the video here makes a really impressive effort to help people appreciate how such malpractice with manipulated statistical evidence is not doing any good to science.

So it is no wonder that a commotion has been created of late by more than 800 scientists who called for the use of statistical significance in scientific inference to be abandoned. In the sections that follow, I shall focus on just two factors which I consider central to the conceptual problems that give rise to data manipulation.

The problem may be partially rooted in dichotomania!

Statistical hypothesis testing based on samples of numeric observations on a quantitative characteristic (generally a continuous variable) under study in any field of science is a widely used technique and makes use of a test statistic to determine whether to reject a null hypothesis about the characteristic or not. Essentially, it’s a tool in the hands of researchers to probabilistically conclude whether, given one set of results (found by observations taken on a variable), a particular null hypothesis about the nature of variation of the variable as opposed to an alternative hypothesis is significantly plausible or not. The technique thus allows you to dichotomize the range of possibilities either as acceptable (say, white) or as not-acceptable (say, black) ones. Thereby you tend to ignore the bigger picture, a holistic view of the whole spectrum of colors. In the case I cited at the beginning, the study merely focused on the two distinguishable shades of variability (i.e., variations due to fasting & not-fasting) in TSH and free T4 levels among patient and non-patient categories, one purportedly being more acceptable than the other while both are not necessarily in conflict. With so much overlap between their ranges of variation specific to the categories of the states of the sampled individuals, the ‘p’ is not indicative enough.

Why should you like to focus your lens on two shades only when the whole is so colorful beyond your view?
Illustration by Meghna Chakrabarti

A measurable quantity that can be represented by a continuous variable does not necessarily have a dichotomous character (e.g., black and white). So its range of variation shouldn’t be needlessly split into two parts by a cut-off line. That is to say, it may not always be essential to find out what puts a value of a continuous variable on one side of a cut-off line rather than the other when the cut-off itself is artificial to the variable.

On the contrary, by application of statistics in science, we intend to study the natural variation of a phenomenon (not a man-made difference) or the effect of a man-made variation on the natural variation of an event. It relies on a fundamental premise that it is just natural for a quantity representing a phenomenon to vary over a range of values (in which every imaginable number of the variable is likely to occur). The variation most often doesn’t depend so exactly on some other factor/variable(s) that it can be described by a mathematical function, though an inexplicable relationship may exist. Unlike laws of Physics which define exact relationships between physical quantities, laws of mathematical exactitude don’t always exist for the theories in certain other fields of science (e.g., psychology, life sciences, etc.) and their phenomena.

Methods of statistical inference are, therefore, applied most extensively in these areas to show the effect of, or association with, other factors (variables) that may exist and influence the variability of the variable under experiment. In doing so, researchers do have to introduce assumptions about the nature of variation of the observed numbers. According to Sander Greenland’s findings, the cognitive biases of researchers (dichotomania being one of them), together with untenable assumptions, play mysterious roles in null-hypothesis significance testing (NHST), which often results in false-positive significance. This is principally because biased intuitive reasoning usually overrides logical consistency in inferential arguments. An editorial in The American Statistician (Vol. 73, No. sup1, 2019) puts the essence of it as follows.

A label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics.

Applying a standard theoretical model on observed data may be untenable

It is widely acknowledged by a large section of the science community that there is a considerable chance that a variable representing a phenomenon does not follow the particular probability distribution (theoretical model) that the hypothesis test requires to be valid. Thus a prior investigation of the pattern of variability, with a large number of observations, to learn how close to or distant from the presumed model the pattern is, is absolutely critical. Otherwise, blindly accepting a probability distribution, and thereby reifying the variability as conforming to the standard model, may actually be a far-fetched imposition.

In statistical hypothesis testing, a result has statistical significance when it is very unlikely to have occurred if the null hypothesis is true. In other words, a result is statistically significant, by the standards of the assumed model, when p < α. Here α, the predefined significance level (generally very small, 0.05 or smaller), is the probability of committing a type-I error, i.e., of rejecting the null hypothesis when it is actually true; and p (the p-value of the study) is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true. Taking α = 0.05, say, is intended to keep the chance of a type-I error small: only a 5% chance of wrongly rejecting the null hypothesis when it is actually true under the model distribution assumed for the population at large. However, if the distribution in reality does not conform to the assumed model, the probability of that error materializing may well not be so low.

The chance of committing a Type-I error may not actually be as small as you wanted it to be if you have assumed an unrealistic probability distribution for the variable.
Illustration by Meghna Chakrabarti
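A minimal sketch of this effect follows, under assumptions chosen purely for illustration: the analyst calibrates a two-sided cut-off as if the test statistic were standard normal, while the statistic actually follows a Laplace (double-exponential) distribution with the same mean and variance. The Laplace choice and the function name are not from any study mentioned here; they simply give a heavier-tailed alternative whose tail probability has a closed form.

```python
from math import exp, sqrt
from statistics import NormalDist


def actual_type1_rate_under_laplace(alpha):
    """Rejection rate of a two-sided test calibrated for N(0, 1) when the
    statistic really follows a Laplace distribution with mean 0, variance 1."""
    # Cut-off chosen from the assumed (normal) model.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # A Laplace distribution with scale b has variance 2 * b**2,
    # so b = 1 / sqrt(2) gives variance 1.
    scale = 1.0 / sqrt(2.0)
    # For Laplace(0, b), P(|X| > t) = exp(-t / b).
    return exp(-critical / scale)


for alpha in (0.05, 0.01, 0.001):
    rate = actual_type1_rate_under_laplace(alpha)
    print(f"nominal alpha={alpha:<6}  actual type-I rate={rate:.4f}")
```

In this setup the real error rate exceeds the nominal one at every level shown, and the discrepancy widens as the nominal level gets stricter, because the mismatch between the two models is greatest in the tails.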

A few years back, the American Statistical Association (ASA) released a policy statement aiming to halt the misuse of p-values. This was the first time that the 177-year-old ASA had made explicit recommendations on such a foundational matter in statistics. Explaining the significance of the ASA recommendation, Nature emphasized the need to weigh the evidence instead of blindly accepting a p-value of 0.05 or less to mean that a finding is statistically significant:

A p-value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a p-value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients’ blood glucose levels without having a therapeutic effect.

Recognizing the seriousness of the tendency of a large section of scientists and statisticians to feel constrained to publish their results selectively on the basis of a single magic number, a very recent special issue of The American Statistician has prescribed what not to do with p-values and significance testing. This advisory comes along with 43 innovative and thought-provoking papers that also guide researchers on what to do when facing research questions. In essence, the recommendations envisage a new world order for hypothesis testing, where studies with “p < 0.05” and studies with “p > 0.05” are not automatically in conflict, and where researchers will therefore see their results more easily replicated and, even when not, will better understand why.

This blog originated out of a dinner table conversation at the Chakrabarti household, where Satyabrata Chakrabarti (the dad and former Deputy Director General at Central Statistics Office, Government of India), tries to convince his two daughters how statistical significance may or may not be equivalent to clinically meaningful data

Cover image by Meghna Chakrabarti (L), Author Satyabrata Chakrabarti (M) and content research/editing/blog design by Rituparna Chakrabarti (R)

We publish using the Creative Commons Attribution (CC-BY) license so that users can read, download and reuse text and data for free – provided the authors, illustrators, and the primary sources are given appropriate credit.

Sleep-flying is a thing!

I assure you that this blog is not about astral projections, but about the mindblowing discovery on how birds can sleep-fly without bumping into trees. So, here you go! I have given away the suspense. Nonetheless, do read along as I cherish my love for sleeping and long-standing collaboration with Ipsa Jain.

Scientists have found that migratory birds can fly for 200 days straight, eating and sleeping while soaring through the sky. Image credit Ipsawonders

A few years back, I got the chance to visit Sultanpur Bird Sanctuary, India. It is a magical place to be. Every winter, around 250 species of birds and one enthusiastic Homo species known as bird watchers converge on the park, both playing their cards close to their chest while displaying their magnificence, skills and power.

Then there is me, fighting off my early morning slumber and continuously bickering about how long it can possibly take to reach the park through the infamous Delhi-NCR traffic. However, there is something about the serenity of this place that got me thinking about how these nomadic birds sleep while migrating all the way from Siberia, Russia, Turkey, and Eastern Europe.

Alas! I’m not the only one who comes up with such fantastic thoughts. For years, scientists have suspected that birds could sleep mid-flight, as several bird species can fly non-stop for weeks. On the other hand, some researchers propose that a few birds can forgo sleep entirely while flying for extended periods of up to 200 days straight.

Such a time scale would label a human insane for even contemplating trying it out, wouldn’t it? Humans, along with many other species, would experience irritability, hallucinations, cognitive impairment, paranoia, and psychosis as side effects of sleep deprivation within 3 days or less.

So what makes birds’ brains so special?

Due to the lack of studies monitoring the sleep patterns of flying birds, the above hypotheses had previously gone untested. In 2016, Rattenborg and his team were among the first to pursue this question as they embarked on a red-eye flight to the Galápagos Islands to monitor the brain activity of great frigatebirds (Fregata minor).

The great frigatebird is a fascinating model for studying these questions, as this species of large seabird can spend weeks continuously flying over the ocean in search of food and shelter and, to my surprise, without bumping into obstacles on its way. The team’s work provided evidence that birds do indeed sleep while flying.

The great frigatebird (Fregata minor) is a large seabird in the frigatebird family. Its nesting populations are located in the tropical Pacific (including the Galapagos Islands) and Indian Oceans, and a small population exists in the South Atlantic. Image credit Charles J Sharp

How do they know that?

The team attached a lightweight, portable device to the heads of frigatebirds to track their brain activity. The equipment used electroencephalography (EEG) to identify if and when the birds were asleep during flight. After 10 days of non-stop flight, the birds returned to land, and the researchers retrieved the devices to examine the results.

The team showed that flying frigatebirds display unihemispheric slow wave sleep (USWS). This is a unique capability of the brain that allows animals to let one hemisphere of the brain doze off at a time. It allows them to watch out for potential threats and roadblocks through one open eye.

Other animals and birds are also equipped with this superpower. For example, dolphins have been observed to exhibit USWS, which lets them sleep while swimming. On land, Mallard ducks (Anas platyrhynchos) keep one cerebral hemisphere up and running, leaving the corresponding eye open and directed away from their flock-mates, toward potential threats. In this way they have devised a safety net out of USWS when sleeping at the edge of their group.

The Mallard (Anas platyrhynchos) is a dabbling duck that breeds throughout the temperate and subtropical Americas, Eurasia, and North Africa. Image credit momentofscience

If now you are thinking that this is the coolest part, wait for it.

Rattenborg and his colleagues also found that frigatebirds continue to fly even when both cerebral hemispheres are asleep, which means both eyes are entirely closed. For simplicity’s sake, imagine it as some sort of autopilot mode. The monitored birds in this study even experienced brief bouts of rapid eye movement (REM) sleep, although these lasted only a few seconds. The team observed that during deep REM sleep the birds’ heads droop due to relaxed muscle tone, although this did not affect the flight pattern. Overall, although the frigatebirds did sleep for brief periods in mid-flight (~ 42 min per day), they spent the majority of the flight awake or with half of the brain awake.

Admittedly, it remains unclear how birds have adapted to function with so little sleep. Nevertheless, it opens up other questions, such as why we and many other animals suffer so dramatically from the consequences of sleep deprivation.

The cover image was made by a science communicator friend, Ipsa Jain. She uses art and design to start conversations about science. Ipsawonders is a one-woman labor of love. She wants to create beautiful things that speak science.

The ‘Capacity’ Paradox – part 2

Editor’s note: Continuing the narrative from the previous part, the author delves into the depths of the capacity crisis for sustainable development data. He is optimistic that the challenges will open up new vistas for national and international agencies to cooperate on mutually reinforcing efforts.

In the author’s opinion, not grabbing the opportunity to create the appropriate capacity to address the data gaps could be as costly a mistake as failing people on their rights. The narrative ends with a sigh for the dream data remaining elusive.

In the last part, I presented some data and a chart for selected countries to demonstrate how they have progressed in developing the national capacity for the production and dissemination of official statistics. Any intelligent person who knows that one must always look into the elements the data are made of would be reasonably justified in concluding that the countries of the developing world are doing well in gradually improving their statistical capacity in terms of a defined set of parameters.

At the same time, however, what message these impressive ‘score-lines’ give to a layman is a question that may have bothered no one. Could a common man simply take it to mean that some countries have already crossed the 90% mark and therefore do not have much further to achieve? If so, is it not an impression that national statistical offices (NSOs) would love to project in unreserved exhibitionism?

Now there lies a fallacy in this notion.

Let’s face it as a question that a common man could ask: Does the score truly reflect a complete image of a country’s statistical capacity? Or is it just a fractured image?

One must not miss seeing what the SCI metadata has to say about this by way of explaining the structure, constituents, method, and rationale of the indicator. However, before we are done with the metadata thing, let’s try to understand the fallacy a bit more clearly.

Countries are severely constrained in statistical capacity to meet the challenges confronting the statistical systems as of 2018. Image credit Meghna Chakrabarti

Countries are in fact severely constrained in statistical capacity to meet the challenges confronting the statistical systems as of 2018. By 2016, the nations of the world were already seized with the 2030 Agenda of Sustainable Development Goals (SDGs) that their leaders had committed to achieving in the interest of the humanity and the planet earth.

Attaining the SDG targets under the goals by the year 2030, as they say, crucially depends on the countries’ ability to track progress with good, penetrating statistics. Knowing just the national count of the people to be reached with the good effects is not enough; a much stronger statistical capacity is required to know who they are, where they are located and what challenges they are facing.

A new statistical framework of 230+ indicators has been prescribed for tracking progress towards attaining the targets. An assessment by the IAEG-SDGs of the UN has revealed that, as of 13 February 2019, the updated tier classification contains 101 Tier I indicators, 84 Tier II indicators, and 41 Tier III indicators.

This means, over 125 indicators (over 54%) belonging to Tier II,  Tier III or multi-tier category have either no internationally established standards or methodology available, or data are not regularly published by the countries.  This is a serious capacity deficit and a global syndrome.
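The share can be checked with simple arithmetic on the three tier counts quoted above. The sketch below uses only those three numbers; the multi-tier indicators mentioned in the text are left out because their count is not given here, so the result is a cross-check on the "over 54%" figure and not a reproduction of the official tally.

```python
# Tier counts as quoted in the text (13 February 2019 classification).
tier_counts = {"Tier I": 101, "Tier II": 84, "Tier III": 41}

classified_total = sum(tier_counts.values())
not_tier_one = tier_counts["Tier II"] + tier_counts["Tier III"]
share = not_tier_one / classified_total

print(f"{not_tier_one} of {classified_total} single-tier indicators "
      f"({share:.1%}) are Tier II or Tier III")
```

Adding the multi-tier indicators to both the numerator and the denominator would shift the percentage slightly, but it would not change the conclusion that more than half of the framework lacks either an established methodology or regularly published data.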

Apart from these conceptual and methodological challenges, the overarching SDG principle of leaving no one behind has raised the bar (for the national statistical offices as well as for global monitoring agencies). UNICEF’s report ‘Progress for Every Child in the SDG Era’ underscores the criticality:

It is no longer enough to monitor progress by global aggregates or national averages alone. Results need to be disaggregated to monitor progress among sub-national groups of people, especially those who are vulnerable such as the girls, children living in remote rural areas or informal urban settlements.

Children and Gender equality are central to the whole of the SDG agenda. 44 child-related indicators are situated under the 17 SDGs. Analyzing these indicators, UNICEF’s report maps them thematically into 5 dimensions of children’s rights:

  • the right to survive and thrive (17 indicators under SDG 2 and SDG 3)
  • the right to learn (5 indicators under SDG 4)
  • the right to be protected from violence (10 indicators under SDG 5, SDG 8 and SDG 16)
  • the right to live in a safe and clean environment (10 indicators under SDG 1, SDG 3, SDG 6, SDG 7 and SDG 13)
  • the right to a fair chance / to have an equal opportunity to succeed (4 indicators under SDG 1)

The report reveals that data are not available for each of these dimensions in substantial proportions (of the 202 countries covered): 22% missing the data for the dimension ‘survive & thrive’; 63% for the dimension ‘learning’; 64% for the dimension ‘protection’; 24% for the dimension ‘environment’; and 63% missing for the dimension ‘fair chance’.

If we look at the gender responsive indicators of the SDGs (there are 54 in number spread over 12 of the 17 SDGs), the story is no different – “only about 26% of the data necessary for global monitoring of the gender-specific indicators are currently available”.

When this is the picture at the national level, the stories of non-availability of data at the sub-national levels for the child- or gender-related indicators, or for any other sub-populations/groups of people are obviously all the more disquieting, not to speak of the untrodden areas of environment, climate, life below water and the like affecting livability on earth.

So, a person struggling for SDG data would wonder whether the 2018 SCI score is any indication at all of a country’s actual current statistical capacity. As its structure was defined for a pre-SDG framework and has not been redesigned post-2015 to account for data on the SDG indicators, the SCI in its present form just reflects statistical capacity for an incomplete basket of statistical deliverables.

With the advent of the SDGs, national statistical offices face varying degrees of challenge in producing the required data, especially disaggregated data, for lack of capacity. These challenges could be fathomed to some extent if a realistically assessed statistical capacity were known for the countries, overall and for the constituent dimensions. A warrant for action is already spelled out in Target 18 under SDG 17:

By 2020, enhance capacity-building support to developing countries, including for least developed countries and small island developing States, to increase significantly the availability of high-quality, timely and reliable data disaggregated by income, gender, age, race, ethnicity, migratory status, disability, geographic location and other characteristics relevant in national contexts.

The target essentially calls for action on the part of development agencies which will eagerly seek information on the intensity and depth of capacity deficit in the face of the SDGs across the world to assess how much effort and energy they may have to apply and where.

The SCI, if re-engineered taking into account the SDG indicators, could provide much of that all-important insight in terms of the score-levels of statistical capacity. Undoubtedly, a new score for 2018 by the hypothetical SDG-laden SCI should invariably be much lower than the existing 2018 score for even the very-high-scoring countries in the World Bank database. Now these hypothetically discordant scores as compared to the existing scorelines could potentially instigate debate within the number cognoscenti.

If it were so happening, anyone, in the same way as what happened to Mark Twain, might easily be tempted to ascribe the denigrated numbers to incredibility of statistics. Save probably those who studied the metadata and appreciated the plausibility of how scores could dwindle on loading the construction framework of SCI with additional parameters as of SDGs; they might easily argue, ‘a student of the fourth standard scoring 90 % overall in the fourth grade examination will in all probability do extremely badly if allowed to sit for an eighth grade test.’ Then that’s a matter of capacity gap – a gap due to the difference in the frame of reference as it would be the case if the SDGs were embedded in the SCI framework.

Statistical capacity indicator scores from 2004-18 in 8 developing nations. Data compiled by the author, source World Bank, SCI database.

In the chart presented in the previous part (also see above), the ups and downs of the lines over time occur, however, despite no change in the SCI’s frame of reference. These are somewhat akin to the fluctuations in productivity that I spoke of in part 1. A change in the frame of reference by placing the SDGs in the SCI framework could materialize only when data become available on the majority of SDG indicators.

Since 2004, when the Marrakesh Action Plan for Statistics was developed, strategic planning has been recognised as a powerful engine for guiding national statistics development programmes (NSDP), increasing political and financial support for statistics, and ensuring that countries are able to produce the data and statistics needed for monitoring and evaluating their development outcomes. A Global Action Plan for Sustainable Development Data came into being at Cape Town in January 2017 for coordinated action on capacity development for sustainable development data.

Is it still too soon to expect the governments and statistics agencies to be investing resources and energies for augmenting the capacity to produce sustainable development data?

The governments and statistics agencies investing resources and energies for augmenting the capacity to produce sustainable development.
Image credit Meghna Chakrabarti

Dreaming a dream of the spring setting in when the agencies of change go on waving the magic wands (to transform the data eco-system), the ‘goose’ of golden data lies in slumber. And thus, SCI keeps on serving with the scores as they are – the only presentable measures of statistical capacity that does not lay the ‘golden eggs’.

This blog originated out of a dinner table conversation at the Chakrabarti household, where Satyabrata Chakrabarti (the dad and former Deputy Director General at Central Statistics Office, Government of India), tries to convince his two daughters of the impact and current assessment of a statistical tool for social and economic sectors, while Meghna and I set out to assess its impact in our own fields.

Cover image by Meghna Chakrabarti (L), Author Satyabrata Chakrabarti (M) and editing/blog design by Rituparna Chakrabarti (R)

The ‘Capacity’ Paradox

Editor’s note: In this two-part blog series, the author explores the issue of the statistical capacity deficit in the current global context. In part 1, his main aim is to highlight the interplay of a system’s capacity, its efficacy and its productivity, which affects the socio-economic sphere globally, especially in developing nations.

I have been tickling my imagination to visualize the ‘golden eggs’ laid by a goose with ‘statistical capacity’. One day, no sooner had I closed my eyes than my thought process veered off the track and entered the dull territory of numbers. The detour thereafter is the story here: a road lined with numerical landmarks, with images of statistics shining gloriously atop them like ‘jewels in the crown’.

Indeed, in the beginning, as I kept on stirring my imagination for a while, I did make an effort to identify something royal about numbers. It was then that I realized the connection a ‘statistical system’ has with the business of the king. The journey, therefore, started with me being captivated by the ideas and concepts of a system’s capacity that enables rulers to indulge in statistical stockpiling, although they turned out to be harmlessly mundane and theoretical, as the descriptions to follow will show.

The golden eggs laid by the statistical capacity. Image credit: Meghna Chakrabarti

A system, as the wise men say, has to perform a given set of tasks routinely, so it must have some capacity to do so. A system, or for that matter a person or a machine, that can produce something has the capacity to create a certain quantity of it in a given period. This is quite trivial. However, what’s not so nugatory is that the capacity does not necessarily remain equally productive all the time. When capacity is sometimes less productive and sometimes more, the wise men say that it’s a matter of the efficiency of the system or the person.

So, how do I know the efficiency of the ‘goose that laid golden eggs’? Is that conceivable?

For a moment I’m inclined to believe: yes, it is. For there is a measure called productivity to quantify the efficiency of a person, machine, factory, or system. Who doesn’t know that inputs (i.e., labor, material, energy, cost, time, etc.) are used by the system and converted into useful outputs? So the effectiveness of any productive effort, termed productivity, is measured as the rate of output per unit of input.
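As a minimal sketch of that definition, productivity can be written as a simple ratio. The function name and the figures below are made up purely for illustration; they do not come from any real system or from the sources discussed here.

```python
def productivity(output_units: float, input_units: float) -> float:
    """Rate of output per unit of input."""
    if input_units <= 0:
        raise ValueError("input_units must be positive")
    return output_units / input_units


# Hypothetical figures: the same 8 hours of input on two different days.
day_one = productivity(output_units=1200, input_units=8)  # words per hour
day_two = productivity(output_units=800, input_units=8)
print(day_one, day_two)  # 150.0 100.0
```

The input is identical in both calls, yet the ratio differs because the output did.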

A trick, however, lies here in the fact that even when you keep the inputs fixed, the productivity can vary.

That’s because there are factors which, uncontrollably, unnoticeably or otherwise, influence the output to change from time to time. These factors, even when known for the role they play, often remain unrecognized or unexplained. If they are not explainable or perceptible, variation in productivity may appear strange, or even intriguing. And you know it very well that Mark Twain was intrigued; intrigued by his own productivity figures. Mark Twain wrote in 1906,

Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’

What puzzled Mark Twain was the output-to-time ratio as a measure of his ability to write words (i.e., writing productivity), since the ratio, when measured on two separate occasions, differed violently in magnitude. This is analogous to cases where we observe variation in human performance (nay, productivity) due to factors not easily discernible at the time of measuring it: for example, the weather, the time of day, or the level of comfort in the room, each of which can affect the mood or mental condition of the performer.

However, the great author, in the pleasure of his own creative inspiration, perhaps made himself imperceptive to the factors playing on his own productive capacity. And as a result, that remark of Disraeli that he immortalized in his autobiography made statistics damnable forever.

Nevertheless, in a contrasting scene of reality, statistical numbers indicating the state of development often fail to reveal the vital things when they do matter a lot. Even those produced in the mills of highly reputed institutions or by the national statistical offices acclaimed for their high statistical standards are frequently damned for such reasons.

Statistical capacity building enables statistical practitioners in the public and private sectors to use methods for data collection, analysis and interpretation, and contributes to the development of statistical infrastructure and human resources in official, survey, business, education and research statistics. Image credit:

Heaps of literature on concepts and methods for the collection of data, compilation of statistics and dissemination of results, including analytical tools, have been produced and promoted under the aegis of the United Nations Statistical Commission (UNSC) over the last 70 years, just to make official data all over the world sound credible and dependable. These works, covering almost all fields of economic and social relevance in development, have helped to strengthen national statistical systems, especially through statistical capacity building.

By ‘statistical capacity’ they mean a nation’s ability to collect, analyze and disseminate high-quality data about its population and economy. In this sense, however, countries have attained varying levels of capacity at the national level, the most significant difference being in the ability to collect the data. The collection of data for calculating recommended statistical measures at regular intervals involves systemic rigor and requires considerable resources for conducting large-scale operations. The situation becomes all the more challenging when the system has to respond to new realities, e.g., evolution of the measurement paradigm necessitated by new statistics-based evaluation.

No system, in fact, can very quickly adapt to significant reforms in the statistical framework that may evolve as a necessity. The national policies which are able to quickly respond and make structural/procedural transformation are really robust in statistical capacity.

The World Bank developed a Statistical Capacity Indicator (SCI) for assessing the capacity of a country’s statistical system. It is a composite score based on a diagnostic framework assessing three areas: methodology, data sources, and periodicity and timeliness. Countries are scored against 25 criteria in these areas, using publicly available information and/or country input. The overall statistical capacity score is then calculated as a simple average of all the area scores on a scale of 0 – 100. From the scores of nearly 140 developing countries of the world in the last 15 years, what is evident is a clear trend of improvement for most of the nations, though scores sometimes fall after rising.
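The averaging step can be sketched roughly as follows. This is only an illustration of "a simple average of the area scores": the function name, the dictionary keys and the example scores are assumptions made for this sketch, and the scoring of the 25 underlying criteria is not modelled.

```python
# The three areas named in the text; the key names are illustrative.
AREAS = ("methodology", "data_sources", "periodicity_and_timeliness")


def overall_sci(area_scores: dict) -> float:
    """Simple average of the area scores, each on a 0-100 scale."""
    missing = [area for area in AREAS if area not in area_scores]
    if missing:
        raise ValueError(f"missing area scores: {missing}")
    for area in AREAS:
        if not 0 <= area_scores[area] <= 100:
            raise ValueError(f"{area} score must lie between 0 and 100")
    return sum(area_scores[area] for area in AREAS) / len(AREAS)


# Hypothetical area scores, not taken from the SCI database.
example = {
    "methodology": 70,
    "data_sources": 80,
    "periodicity_and_timeliness": 90,
}
print(overall_sci(example))  # 80.0
```

Because the average is unweighted, a weak area can be masked by two strong ones in the overall score.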

Afghanistan, for example, rose from almost a state of void (24.4 in 2004) to 50.0 in 2018; it’s a success story of a devastated nation. The period covered coincides with the countries being engaged with the Millennium Development Goals (MDGs), and therefore the third dimension, ‘periodicity and timeliness’, looks at the availability of key socio-economic indicators, of which 9 are MDG indicators.

Statistical capacity indicator scores from 2004-18 in 8 developing nations. Data compiled by the author, source World Bank, SCI database.

It’s interesting to see how some of the major developing countries have fared. Mexico, a high performing country with a scoreline of 74.4 in 2004, 85.6 in 2010, and 92.2 in 2015, finished at 96.7 in 2018 – a story of steady improvement; especially against an ordinary scenario for its own region (Latin America & the Caribbean) with the average score hovering in the range of 74 – 78 during the period, it’s remarkable.

Against South Asia’s regional score ranging between 65 and 76, India finished at 91.1 in 2018, moving through 78.9 in 2004, 81.1 in 2010 and 77.8 in 2015; whereas Bangladesh moved from 70.0 in 2004 to 72.2 in 2018 after a rise to 76.6 in 2015; and Pakistan moved from 73.3 in 2004 to 78.9 in 2018.

Indonesia, another major country in the East Asia and Pacific region, which had a regional score of 77.5 in 2018, moved from 86.7 in 2004 to 90.0 in 2018. On the other hand, the Philippines and Thailand of this region remained almost static during the period, scoring in the range 81 – 88. In sub-Saharan Africa, Rwanda did remarkably well, moving from 61.1 in 2004 to 78.9 in 2018 and exceeding the regional score of 62.4 in 2018 by a considerable margin, whereas Ghana, not so poor a starter, moved up by 20 points from 51.1 in 2004 to 71.1 in 2018; and Tanzania moved from 67.8 to 71.1 during the corresponding period.
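To compare these movements at a glance, the 2004 and 2018 scores quoted in the text above can be turned into point changes. The short sketch below uses only those quoted figures; the helper name `score_change` is made up for this example.

```python
# (2004 score, 2018 score) as quoted in the text above.
scores = {
    "Afghanistan": (24.4, 50.0),
    "Mexico": (74.4, 96.7),
    "India": (78.9, 91.1),
    "Bangladesh": (70.0, 72.2),
    "Pakistan": (73.3, 78.9),
    "Indonesia": (86.7, 90.0),
    "Rwanda": (61.1, 78.9),
    "Ghana": (51.1, 71.1),
    "Tanzania": (67.8, 71.1),
}


def score_change(start: float, end: float) -> float:
    """Point change between two scores, rounded to one decimal place."""
    return round(end - start, 1)


changes = {country: score_change(*pair) for country, pair in scores.items()}

# List the countries from the largest gain to the smallest.
for country, change in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country:12s} {change:+.1f}")
```

A point change says nothing about the starting level, so a small gain from a high base and a large gain from a low base should be read differently.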

But is this enough? In the coming blog, we will discuss the fallacy in the SCI system and how it fails to give a credible picture of the statistical capacity of the developing countries. I will try to answer the following question – is SCI a proper tool in the present context to measure with? Also, I will talk about how the Sustainable Development Agenda requires a new measurement paradigm and offers opportunities for national systems to evolve into core entities of country-specific data eco-systems for the same.

This blog originated out of a dinner table conversation at the Chakrabarti household, where Satyabrata Chakrabarti (the dad and former Deputy Director General at the Central Statistics Office, Government of India) tries to convince his two daughters of the impact and current assessment of a statistical tool for social and economic sectors, while Meghna and I go on to assess its impact in our own fields.

Cover image by Meghna Chakrabarti (L), Author Satyabrata Chakrabarti (M) and editing/blog design by Rituparna Chakrabarti (R)

What is social media doing to us?

Are we really connected?

Are we really connected? Meghna Chakrabarti

Recently I came across a short animated film on YouTube named ‘Best Friend’. The story revolves around a man named Arthur who lives all by himself and is addicted to a product called ‘Best Friend’.

Arthur does not have friends. He lives in a time far in the future, where everyone has some sort of chip implanted in their brain that allows them to see projections of people customized only for them; people whom they can call ‘friends’.

Fast forward to the climax of the movie: Arthur gets into trouble with a vagabond when he tries to recharge his chip. The vagabond, desperate to have ‘friends,’ rips the chip out of Arthur.

Scary, isn’t it?

The movie highlighted an alarming perspective of the current psychology of the tech-savvy, social media addicted millennials. Although social media has been successful in making the world more connected, it has also established a false sense of connectivity.

Like the chip implanted in Arthur’s brain, social media has emerged as a necessity for every individual. Without it, you are nobody; you are “friendless”!

Obviously, the creators of social networking sites nurture this fear of being alone, to make sure their clients are dependent on them. I often wondered why people posted so many photos of themselves on social media. I feel that social media has made it the norm to put up a facade for the world to see, to be somebody we are not, to let people know how amazing a life we lead even if it is not real.


More likes, more followers, more ‘friends’. Not getting enough likes on a post seems maddening enough that it can potentially send a person into depression. I think social media is so addictive because we can connect to an individual or a group without much effort or confrontation. That may be a plus point. However, can one really connect to a person and be empathetic by just exchanging a few texts? I think not.

Human beings are social animals; we rely on our senses to experience the world around us. When it comes to connecting with others, these senses help us to kindle intimate connections. Social media can surely connect people who may be far away from each other, but it still lacks the physical sense of being.

I am not negating the positives that social media has to offer. Indeed, a few years ago, without the benefits of the internet, talking to someone far off or conveying our thoughts and opinions on a large platform was tedious. Now, the internet stands as a tool for millennials to connect globally and raise social awareness on a large scale, something that seemed impossible a decade ago.

However, we need to understand that there is a world outside the virtual one, in which we choose to remain immersed; that we can connect more closely to people when we interact with them face to face. After all, it is an innate human tendency to respond to the warmth of another being, and feel more comfortable in somebody’s company.

We should also embrace our imperfections and celebrate who we are and what we are. So, rather than putting up a facade and being entangled in this virtual cobweb, why not be real for a while?


The article and the cover image are the compositions of aspiring computer engineer Meghna Chakrabarti. Follow her stepping stones at Tumblr and Facebook. The views shared here are her own, and she is excited to hear your opinion on how social media is impacting us. Can a computer engineer help in resolving the conundrum? The blog was originally published on her personal Tumblr page on 22nd December 2018 and is republished here with permission.


‘Pain-Drug-Pain Repeat’ – The Catch-22 of Addiction

The Cobweb Of Addiction. Ipsa Wonders

At some point in our lives, we all must have experienced pain. Do you remember the last time this unpleasant sensation created absolute emotional havoc for you? However, why does our body need to feel pain? Aren’t we better off without it?

If we draw parallels, the mechanism causing pain is quite comparable to the pipeline of wartime correspondence, although health care professionals will argue that pain is far more elegant, complex and faster.

The key players here are the site of injury (the war-front), the nerve cells (the military correspondent), your spinal cord (the operator) and Mr BIG BRAIN (the high commander).

Let’s say you placed your hand on a hot stove (please don’t do it!). Your nerve cells instantaneously gather this information and, in response, fire millions of signals to the spinal cord. This information is then relayed to your brain, which makes you feel the pain and alerts you to pull your hand away within a split second, saving it from any further burning.

What a painful save, isn’t it? However, this bugger of a pain will stay with you for some time to come.

Many Players Are Involved To Make You Feel The Pain.

When we think of pain, most of us think of acute pain, which is common and often a temporary condition. With acute pain, you typically know where and why it hurts. For instance, your scraped knee bothers you, or you feel the pain at the site of an incision after surgery. Chronic pain, on the other hand, is defined as pain that lasts more than 12 weeks, sometimes even a whole lifetime. This kind of pain in many cases persists even when the damage is completely healed, or may arise without any initial injury.

The phantom of chronic pain has crumpled one in five of us, i.e., a total of 1.5 billion people around the globe. Leading a meaningful life with chronic pain is taxing, and seems to depend on the patient’s will and assistance from healthcare professionals.

Additionally, these are the patients who are most prone to fall victim to long term drug abuse, in a desperate attempt to find relief. To seek a solution for these patients, it is critical to understand what could trigger pain and addiction, and if these two are co-dependent.

A recent joint study, led by Lisa R. LaRowe at Binghamton University and Syracuse University, New York, looked into this matter closely.

The group looked at results from over 100 studies on pain and substance abuse. They integrated these two parameters (pain and addiction due to substance abuse), as an empirical inquiry, into a reciprocal mathematical model. This way, they could show that pain and substance abuse interact in the manner of a positive feedback loop, i.e., the greater the pain a person experiences, the more the addiction is maintained over time.

This might seem intuitive; however, so far researchers have only examined either how substance use affects pain or how pain affects substance use, separately. This kind of modelling for the first time stitches together two different types of research to demonstrate how pain and substance use affect one another.

It is like a never-ending vicious cycle. While substance abuse can be a potential risk factor for chronic pain, experiencing pain can motivate people to depend on substances, making them harder to quit.

This study will be especially important for cases where clinicians treating addictions might help their patients manage underlying chronic pain, or for those patients who self-medicate to cope with pain. Providing patients with alternative health strategies could help them combat substance abuse and cope with pain.

Following up on this study, it will now be up to biochemists and neuroscientists to understand the underlying mechanism and the potential proteins behind this co-dependency, so as to develop treatments to break this loop.
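To get a feel for what a positive feedback loop means, here is a deliberately simplified toy simulation. It is not the model used in the study; the function name, the 0-to-1 scales, the coupling value and the starting levels are all assumptions chosen only to show how two quantities that feed each other can escalate together.

```python
def simulate_feedback(pain: float, use: float, coupling: float = 0.1, steps: int = 10):
    """Toy positive feedback loop between pain and substance use.

    Both quantities are on an arbitrary 0-to-1 scale (an assumption).
    At each step, pain is pushed up in proportion to use, and use is
    pushed up in proportion to pain; both are capped at 1.0.
    """
    history = [(pain, use)]
    for _ in range(steps):
        new_pain = min(1.0, pain + coupling * use)
        new_use = min(1.0, use + coupling * pain)
        pain, use = new_pain, new_use
        history.append((pain, use))
    return history


# Hypothetical starting levels: with coupling, both values climb step by step.
for step, (p, u) in enumerate(simulate_feedback(pain=0.2, use=0.2)):
    print(f"step {step:2d}: pain={p:.3f} use={u:.3f}")
```

Setting `coupling` to zero in this sketch leaves both values where they started, which is the toy version of breaking the loop.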


The cover image was made by a science communicator friend, Ipsa Jain. She uses art and design to start conversations about science. Ipsawonders is a one-woman labor of love. She wants to create beautiful things that speak science.

