Author Archives: Daniel Reidpath

Software for cost effective community data collection

In 2012 we started enumerating a population of about 40,000 people in the five Mukim (sub-districts) in the District of Segamat in the state of Johor on peninsular Malaysia.  This marked the start of the data collection for establishing a Health and Demographic Surveillance Site (HDSS) — grandly called the South East Asia Community Observatory (SEACO).

When establishing SEACO we had the opportunity to think, at length, about how we should collect individual and household data.  Should we use paper-based questionnaires? Should we use Android Tablets?  Should we use a commercial service, or should we use open source software?  When we eventually collect the data, how should we move it from paper/tablets into a usable database?  In thinking about this process, one of the real challenges was that HDSS involve the longitudinal follow-up of individuals, households, and communities.  Whatever data collection system we chose, therefore, had to simplify the linking of data about Person_1 at Time_1 with the data about Person_1 at Time_2.
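The linking requirement can be illustrated with a minimal sketch. It assumes that every person is given a stable identifier at enumeration and that each data collection round records it; the field names (`person_id`, `round`, `weight_kg`) and the records themselves are hypothetical and are not taken from the SEACO system.

```python
from collections import defaultdict

# Hypothetical records: each row carries the stable person_id assigned at enumeration.
round_1 = [
    {"person_id": "P0001", "household_id": "H01", "round": 1, "weight_kg": 61.0},
    {"person_id": "P0002", "household_id": "H01", "round": 1, "weight_kg": 74.5},
]
round_2 = [
    {"person_id": "P0001", "household_id": "H01", "round": 2, "weight_kg": 63.2},
    {"person_id": "P0003", "household_id": "H02", "round": 2, "weight_kg": 55.0},
]


def link_rounds(*rounds):
    """Group the records from every round under the person_id they share."""
    history = defaultdict(dict)
    for records in rounds:
        for rec in records:
            history[rec["person_id"]][rec["round"]] = rec
    return dict(history)


history = link_rounds(round_1, round_2)

# People seen at both Time_1 and Time_2, and people seen in only one round.
followed_up = sorted(pid for pid, by_round in history.items()
                     if {1, 2}.issubset(by_round))
seen_once = sorted(pid for pid, by_round in history.items()
                   if len(by_round) == 1)

print(followed_up)
print(seen_once)
```

The point of the sketch is only that the identifier, not the name or the household, carries the link between rounds; anyone seen in a single round is either newly enumerated or lost to follow-up, and would need to be resolved by the field team.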

I eventually settled on OpenDataKit (ODK).  ODK is a marvellous piece of software developed by a team of researchers at the University of Washington; it runs on Android Tablets and was released under an open source license. We hacked the original codebase to allow the encryption of data on the Tablet (encryption later became a mainstream option in ODK), I wrote a small Python script for downloading the data from the Tablets, and the IT Manager wrote a PHP script to integrate the data with a MySQL database. We managed the entire process from collection to storage, and it worked extremely well.  I hate the idea of using proprietary software if I don’t have to, and when we set up SEACO we decided that as much as possible we would use open source software so that others could replicate our approach.

 

SEACO data collector using an Android Tablet with ODK completes a household census, 2012

Recently we moved away from ODK to a proprietary service: SurveyCTO.  Unlike ODK, SurveyCTO is a service we have to pay for but, for reasons I will go into, it has thus far been worth the move.

ODK did not do exactly what we needed, and this meant that the IT team regularly made adjustments to the codebase (written in Java).  The leading hacker on the team moved on.  That left us short-handed and also without someone with the familiarity he had with the ODK codebase. I was torn between trying to find a new person who could take on the role of ODK hacker and moving to proprietary software.  The final decision rested on a few factors that are worth keeping in mind should the question arise again.  First, our operation has grown quite large. There are multiple projects going on at any one time, and we required a full-time ODK person.  SurveyCTO maintained most of the functionality we already had, and it also had some additional features that were nice for monitoring the data as they came in, and managing access to data.  Second, the cost of using SurveyCTO was considerably lower than the staff costs associated with having an in-house developer.  We would lose the capacity for some of our de novo development but benefit by having a maintained service at a fraction of the cost.

If I had more money, my preference would be to maintain the capacity for in-house development.  If I were only doing relatively small, or only one-off cross-sectional studies, I would use ODK without hesitation.  For a large, more complex operation, a commercial service made economic and functional sense.

One of the other services I considered was Magpi. At the time I took the decision, it was more expensive than SurveyCTO for our needs. If you, however, are just beginning to look at the problem, you should look at all options. I am sure there are now other providers we had not considered.

Fat on the success of my country

When I first visited Ghana in the early 1990s, there was a very noticeable relationship between BMI and wealth.  Rich people were far more likely to be overweight and obese than poor people.  That visit took place about ten years after the 1982-1984 famine.  Some of the roots of the famine lay in natural causes resulting in crop failure and some lay in local and regional politics, and it was small children that bore the brunt of it.  Less than ten years after the famine it was perhaps unsurprising to see that (on average) the thinnest were the poorest, and the fattest were the richest.

Working in Australia in the early 2000s, however, there appeared to be exactly the opposite relationship.  It appeared that the poorest were more likely to be overweight or obese and the wealthiest, normal weight. This observation was certainly borne out at an ecological level when my colleagues and I found an unmistakable relationship between area-level socioeconomic disadvantage and obesogenic environments: fast food chain “restaurants” were more likely to be found in poorer areas.

So which is it?  Are the poor more likely to be overweight and obese, or is it the rich?  One of the challenges in working out this relationship is that it appears to be different in different countries.  Neuman and colleagues conducted a multi-level study of low- and middle-income countries (LMICs) looking at this very problem using Demographic and Health Surveys (DHS) data.  They found an interaction between country-level wealth, individual-level wealth, and BMI.  Unfortunately, the study was limited to LMICs because the DHS do not operate in high-income countries. While it would be tempting to extrapolate the interaction into high-income countries, without the data, it would just be a guess.

We don’t have the definitive answer, but a recent paper by Mohd Masood and me, based on his PhD research, provides some nice insights into the issue.  We were able to bring together data from 206,266 individuals in 70 low-, middle- and high-income countries using 2003 World Health Survey (WHS) data.  The WHS data are now getting a little old, but they were the only data we knew of that provided BMI and wealth measures from a sample of all countries, using a consistent methodology, all measured over a similar period of time.

 

Mean BMI of the five quintiles of household wealth in countries ranging from the poorest to the richest (GNI-PPP). [https://doi.org/10.1371/journal.pone.0178928]

The analysis showed that as country-level wealth increased, mean BMI increased in all wealth groups, except the very wealthiest group.  The mean BMI of the wealthiest 20% of the population declined steadily as the wealth of the country increased.  In the wealthiest countries, the mean BMI converged for the poorest 80% of the population around a BMI of 24.5 (i.e., near the WHO cut-off for overweight of 25).  The wealthiest 20% had a mean BMI comfortably below that, around 22.5.
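For readers who want to see the arithmetic behind a figure like this, here is a minimal sketch of computing mean BMI by wealth quintile within a single population. It is not the multilevel analysis used in the paper. The function names, the `wealth_index` field, and every number in the toy data are invented for illustration and are not drawn from the WHS.

```python
from statistics import mean


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def wealth_quintile(rank, n):
    """Quintile 1 (poorest) to 5 (richest) from a 0-based wealth rank among n people."""
    return rank * 5 // n + 1


def mean_bmi_by_quintile(people):
    """Rank people by wealth, split into fifths, and average BMI in each fifth."""
    ordered = sorted(people, key=lambda p: p["wealth_index"])
    groups = {}
    for rank, person in enumerate(ordered):
        q = wealth_quintile(rank, len(ordered))
        groups.setdefault(q, []).append(bmi(person["weight_kg"], person["height_m"]))
    return {q: round(mean(values), 1) for q, values in sorted(groups.items())}


# Invented toy data: ten people, all 1.6 m tall, ordered from poorest to richest.
toy_weights = [51.2, 51.2, 56.32, 56.32, 61.44, 61.44, 64.0, 64.0, 58.88, 58.88]
toy_people = [
    {"wealth_index": i, "weight_kg": w, "height_m": 1.6}
    for i, w in enumerate(toy_weights)
]

print(mean_bmi_by_quintile(toy_people))
```

In a cross-country comparison the quintiles would be formed within each country before the means are set against country-level wealth, which is the step this single-population sketch leaves out.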

It is obviously not inevitable that as the economic position of countries improves, everyone except the very richest puts on weight.  There are thin, poor people and fat, rich people living in the wealthiest of countries.  Nonetheless, the data do point to structural drivers creating obesogenic environments. My colleagues and I had argued, at least in the context of Malaysia, that the increasing prevalence of obesity was an ineluctable consequence of development. The development agenda pursued by the government of the day decreased physical activity, promoted a sedentary lifestyle, and did nothing to moderate the traditional fat-rich, simple-carbohydrate diet associated with the historically rural lifestyle of intensive agriculture.

We really need more data points (i.e., a repeat of the WHS) to try to tease out the effect of economic development on obesity in the poorest to the richest quintiles of the population.  I would suspect, however, that countries need to think more deeply about what it is they pursue (for their population) when they pursue national wealth.

 

 

 

Does global health need a ‘red team’?

Looking at population health time-series data, it is easy to imagine that everything is getting better and better. What is more, as your eye tracks the line into some imaginary future, it is easy to believe that things will continue to get better and better.  It is a soothing balm to the more insidious thought that doom awaits us around every corner.  In the world of stock pickers and equities experts, the balm is the Yin of the bull to the Yang of the bear. Hope versus despair.

The late Hans Rosling did more to ground people in that hopeful view of the future than any other person.  The Gapminder website, his creation, provides clear, firm evidence of global improvements in health and well-being across a wide range of outcomes.  As you follow the motion picture trends, countries improve. Some occasionally collapse, horribly. Then they recover. And on average, all improve.  Poverty, life expectancy, education, the infant mortality rate: it does not matter what you focus on, the world has been getting better and better.

Figure 1 is a quick snapshot of this improvement in life expectancy between 1915 and 2015. In both years, higher national wealth was associated with better life expectancy.  In 1915, a country with a GDP/capita (adjusted for inflation and price) of $1,000 had a life expectancy of about 30 years. In 2015, a country with the same GDP/capita had a life expectancy of about 60 years.

 

Figure 1. The left and right panels show the countries’ life expectancy in relationship to the GDP/capita (adjusted for inflation and price) 100 years apart. In 1915 a country with a GDP/capita of $1000 had a life expectancy around 30 years. In 2015 it was around 60 years — a difference of about 30 years. Source: Gapminder

In contrast, in the middle of the 18th Century, life expectancy was similar across all countries, without regard to national wealth. Little had changed by the middle of the 19th Century. Sixty years later (1915), there was a strong association between national wealth and life expectancy; and over the next 100 years, things became much better for everyone.

Will this continue?

Let’s hope that it will.  There are, however, significant threats visible on the horizon, and I would argue that Global Health needs a strong Red Team to make plain that dreadful prospect, often and forcefully. And as the Red Team argues its side we should hope fervently that it is utterly and comprehensively wrong! We should nonetheless listen to the arguments and not glaze over or dismiss them as we would Cassandra.

Red Teams arose in the US military and intelligence communities. They were there to argue against self-satisfied complacency. If the majority view was purple, they argued orange; if Winter, then Summer. Their purpose was to find the weaknesses in the status quo. One of the most extraordinary examples of the power of a contrarian view was the Millennium Challenge 2002, in which Paul van Riper showed that a demonstrably weaker force (the Red Team) could be devastatingly effective against the powerful (Blue Team) when they were prepared to play outside the constrained paradigm of accepted norms.

In Global Health the situation is, of course, entirely different — we do not battle each other, but we do struggle with (and against)  nature and the environment.  What is not different between Intelligence agencies and Global Health agencies is that views become entrenched. The Philosopher of Science, Thomas Kuhn, described the entrenchment of scientific ideas in terms of normal science: “the regular work of scientists theorizing, observing, and experimenting within a settled paradigm or explanatory framework”. These “settled paradigms” can permit significant new developments, but they brook no serious opposition (only tinkering at the margins). They are the VHS manufacturer to the plucky Betamax.

“Beta what?”, I hear you ask, and the point is made.

Global Health has large, powerful groups that are in danger of playing a form of technocratic hegemony — Global Health, normal science.  It’s incremental, unabrasive, and potentially wrong or ineffectual. Some of the possible threats to global health are well known, and if we focus only on those related to climate change and population growth the following is a reasonable starting list:

The global expansion of humans over the past 10,000 years was made possible by the growth of agriculture, which in turn was made possible by a stabilisation in the climate about … 10,000 years ago.  Our current success is again a product of agricultural developments. Paul Ehrlich, in his 1968 book The Population Bomb, wrote a Malthusian tale of global starvation.  His prediction failed to take account of Norman Borlaug’s green revolution, and the development of semidwarf wheat, which saw grain yields triple in the 1960s and 1970s. The predicted cycle of devastating starvation was averted.

Success in the past, unfortunately, does not tell us anything about the future. Timely science then does not predict timely science now. Although Borlaug’s work saw Ehrlich’s predicted threats displaced in time, towards the end of his Nobel Prize acceptance speech, Borlaug said:

Malthus signaled the danger a century and a half ago. But he emphasized principally the danger that population would increase faster than food supplies. In his time he could not foresee the tremendous increase in man’s food production potential. Nor could he have foreseen the disturbing and destructive physical and mental consequences of the grotesque concentration of human beings into the poisoned and clangorous environment of pathologically hypertrophied megalopoles. Can human beings endure the strain? Abnormal stresses and strains tend to accentuate man’s animal instincts and provoke irrational and socially disruptive behavior among the less stable individuals in the maddening crowd.

We must recognize the fact that adequate food is only the first requisite for life. For a decent and humane life we must also provide an opportunity for good education, remunerative employment, comfortable housing, good clothing, and effective and compassionate medical care. Unless we can do this, man may degenerate sooner from environmental diseases than from hunger.

So far, the international, multilateral approach to a possibly gloomy future is to seek hope — it does, after all, spring eternal.  We will reduce greenhouse gas emissions, tackle global poverty through economic growth, and increase food production. We will not need to tackle population growth, nor will we have to make do with less. We write about planetary health, but we do not develop strategies for a planet that is less human-friendly tomorrow than it is today.

I hope that global health and well-being will improve well into the future, well past my life and, I hope, well past that of my children (and their children, …). In case it does not, I would like to think that there is a Global Health Red Team that does not just echo gloomy news in the halls of power, but argues for and develops strategies suitable for the world in which we are all worse off.  What should our goal be in that worse off world?  Is it a global goal, an equitable goal of mutual pain, or is it a “My Country First”, Shakespearean tragedy of the commons?

There is an ironic twist to the use of Red Teams in the US military that may have some bearing on their use in Global Health.  In the Millennium Challenge 2002, when the Red Team devastated the Blue Team in the first few days of a fortnight-long exercise, the judges reset the clock. They hamstrung the Red Team, and then let everything play out in a way that would ensure that normal (military) science came out unscathed.

Global Health needs to be intellectually braver.

 

Guidelines for the reporting of COde Developed to Analyse daTA (CODATA)

I was reviewing an article recently for a journal in which the authors referenced a GitHub repository for the Stata code they had developed to support their analysis. I had a look at the repository. The code was there in a complex hierarchy of nested folders.  Each individual do-file was well commented, but there was no file that described the overall structure, the interlinking of the files, or how to use the code to actually run an analysis.

I have previously published code associated with some of my own analyses.  The code for a recent paper on gender bias in clinical case reports was published here, and the code for the Bayesian classification of ethnicity based on names was published here. None of my code had anything like the complexity of the code referenced in the paper I was reviewing.  It did get me thinking, however, about how the code for statistical analyses should be written. The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network has 360 separate guidelines for reporting research.  This includes guidelines for everything from randomised trials and observational studies through to diagnostic studies, economic evaluations and case reports. There is nothing on the reporting of code for the analysis of data.

On the back of the move towards making data available for re-analysis, and the reproducible research movement, it struck me that guidelines for the structuring of code for simultaneous publication with articles would be enormously beneficial.  I started to sketch it out on paper, and write the idea up as an article.  Ideally, I would be able to enrol some others as contributors.  In my head, the code should have good meta-data at the start describing the structure and interrelationship of the files.  I now tend to break my code up into separate files with one file describing the workflow: data importation, data cleaning, setting up factors, analysis.  And then I have separate files for each element of the workflow. My analysis is further divided into specific references to parts of papers. “This code refers to Table 1”.  I write the code this way for two reasons.  It makes it easier for collaborators to pick it up and use it, and I often have a secondary, teaching goal in mind.  If I can write the code nicely, it may persuade others to emulate the idea.  Having said that, I often use fairly unattractive ways to do things, because I don’t know any better; and I sometimes deliberately break an analytic process down into multiple inefficient steps simply to clarify the process — this is the anti-Perl strategy.
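The structure described above can be sketched as a single workflow file. This is only an illustration under stated assumptions: the file name `run_analysis.py`, the function names, the inline toy data, and the "Table 1" tabulation are all hypothetical, and in a real project each step would probably live in its own file.

```python
"""run_analysis.py: hypothetical workflow file that documents the whole analysis.

Steps (each could be a separate file in a real project):
  1. import_data  - read the raw records
  2. clean_data   - drop unusable records
  3. set_factors  - recode variables into analysis categories
  4. table_1      - the analysis reported as Table 1
"""
import csv
import io
from collections import Counter

# Invented toy data standing in for a raw data file.
RAW = """id,age,sex
1,34,F
2,,M
3,61,M
4,47,F
"""


def import_data(text):
    """Step 1: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_data(rows):
    """Step 2: drop records with a missing age and convert age to an integer."""
    return [dict(r, age=int(r["age"])) for r in rows if r["age"]]


def set_factors(rows):
    """Step 3: collapse age into two groups."""
    return [dict(r, age_group="under 50" if r["age"] < 50 else "50 and over")
            for r in rows]


def table_1(rows):
    """Step 4: this code refers to Table 1 (counts by age group and sex)."""
    return Counter((r["age_group"], r["sex"]) for r in rows)


def main():
    """Run the workflow in the order listed in the module docstring."""
    return table_1(set_factors(clean_data(import_data(RAW))))


if __name__ == "__main__":
    for (age_group, sex), n in sorted(main().items()):
        print(age_group, sex, n)
```

The steps are deliberately small and separate rather than efficient, so that a reader can see exactly which records were dropped and how the factors were collapsed before reaching the table.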

I then started to review the literature and stumbled across a commentary written by Nick Barnes in 2010 in the journal Nature. He has completely persuaded me that my idea is silly.

It is not silly to hope that people will write intelligible, well-structured, well-commented code for statistical analysis of data.  It is not silly to hope that people will include this beautiful code in their papers.  The problem with guidelines published by the EQUATOR Network is in the way that journals require authors to comply with them. They become exactly the opposite of guidelines: they are rules, an ironic twist on the observation by Geoffrey Rush’s character, Hector Barbossa, in Pirates of the Caribbean.

Barnes wrote, “I want to share a trade secret with scientists: most professional computer software isn’t very good.”  Most academics/researchers feel embarrassed by their code.  I have collaborated with a very good Software Engineer in some of my work and spent large amounts of time apologising for my code.  We want to be judged for our science, not for our code.  The problem with that sense of embarrassment is that the perfect becomes the enemy of the good.

The Methods sections of most research articles make fairly vague allusions to how the data were actually managed and analysed.  One may make references to statistical tests and theoretical distributions.  For a reader to move from that to a re-analysis of the data is often not straightforward.  The actual code, however, explains exactly what was done.  “Ah! You dropped two cases, collapsed two factors, and used a particular version of an algorithm to perform a logistic regression analysis.  And now I know why my results don’t quite match yours”.

It would be nice to have an agreed set of guidelines reporting COde Developed to Analyse daTA (CODATA).  It would be great if some authors followed the CODATA guidelines when they published.  But it would be even better if everyone published their code, no matter how bad or inefficient it was.