Guidelines for the reporting of COde Developed to Analyse daTA (CODATA)

I was reviewing an article recently for a journal in which the authors referenced a GitHub repository for the Stata code they had developed to support their analysis. I had a look at the repository. The code was there in a complex hierarchy of nested folders.  Each individual do-file was well commented, but there was no file that described the overall structure, the interlinking of the files, or how to use the code to actually run an analysis.

I have previously published code associated with some of my own analyses. The code for a recent paper on gender bias in clinical case reports was published here, and the code for the Bayesian classification of ethnicity based on names was published here. None of my code had anything like the complexity of the code referenced in the paper I was reviewing. It did get me thinking, though, about how the code for statistical analyses should be written and reported. The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network has 360 separate guidelines for reporting research. These cover everything from randomised trials and observational studies through to diagnostic studies, economic evaluations and case reports. There is nothing, however, on the reporting of code for the analysis of data.

On the back of the move towards making data available for re-analysis, and the broader reproducible research movement, it struck me that guidelines for structuring code for simultaneous publication with articles would be enormously beneficial. I started to sketch the idea out on paper and to write it up as an article. Ideally, I would be able to enrol some others as contributors. In my head, the code should have good meta-data at the start describing the structure and interrelationship of the files. I now tend to break my code up into separate files, with one file describing the workflow: data importation, data cleaning, setting up factors, analysis. I then have separate files for each element of the workflow. The analysis code is further divided with specific references to parts of papers: “This code refers to Table 1”. I write the code this way for two reasons. It makes it easier for collaborators to pick it up and use it, and I often have a secondary, teaching goal in mind. If I can write the code nicely, it may persuade others to emulate the idea. Having said that, I often use fairly unattractive ways to do things, because I don’t know any better; and I sometimes deliberately break an analytic process down into multiple inefficient steps simply to clarify the process: the anti-Perl strategy.
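A minimal sketch of the kind of workflow file I have in mind, written in R. The file names and steps here are illustrative only, not drawn from any particular project:

    # 00_workflow.R -- master file describing the structure of the analysis.
    # Each element of the workflow lives in its own file; run them in order.
    source("01_import.R")   # data importation: read the raw data
    source("02_clean.R")    # data cleaning: drop bad records, recode values
    source("03_factors.R")  # set up the factor variables used in the models
    source("04_table1.R")   # analysis: this code refers to Table 1 of the paper
    source("05_table2.R")   # analysis: this code refers to Table 2 of the paper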

I then started to review the literature and stumbled across a commentary written by Nick Barnes in 2010 in the journal Nature. He has completely persuaded me that my idea is silly.

It is not silly to hope that people will write intelligible, well structured, well commented code for the statistical analysis of data.  It is not silly to hope that people will include this beautiful code in their papers.  The problem with guidelines published by the EQUATOR Network lies in the way that journals require authors to comply with them. They become exactly the opposite of guidelines: they become rules. It is an ironic inversion of the observation by Geoffrey Rush’s character, Hector Barbossa, in Pirates of the Caribbean, that the pirate code is “more what you’d call ‘guidelines’ than actual rules”.

Barnes wrote, “I want to share a trade secret with scientists: most professional computer software isn’t very good.”  Most academics and researchers feel embarrassed by their code.  I have collaborated with a very good software engineer on some of my work and spent large amounts of time apologising for my code.  We want to be judged on our science, not on our code.  The problem with that sense of embarrassment is that the perfect becomes the enemy of the good.

The Methods sections of most research articles make fairly vague allusions to how the data were actually managed and analysed.  There may be references to statistical tests and theoretical distributions, but for a reader to move from that to a re-analysis of the data is often not straightforward.  The actual code, however, explains exactly what was done.  “Ah! You dropped two cases, collapsed two factors, and used a particular version of an algorithm to perform a logistic regression analysis.  And now I know why my results don’t quite match yours”.
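A hypothetical fragment of R makes the point; the variable names and values below are invented purely for illustration, not taken from any real study:

    # The Methods section might say only "a logistic regression was fitted".
    # The code records every decision the prose glosses over.
    dat <- subset(dat, !(id %in% c(1047, 2093)))          # the two dropped cases
    dat$stage <- as.character(dat$stage)                  # ensure plain strings
    dat$stage[dat$stage %in% c("III", "IV")] <- "III-IV"  # the two collapsed levels
    fit <- glm(outcome ~ factor(stage) + age,             # the exact model and link
               data = dat, family = binomial(link = "logit"))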

It would be nice to have an agreed set of guidelines for the reporting of COde Developed to Analyse daTA (CODATA).  It would be great if some authors followed the CODATA guidelines when they published.  But it would be even better if everyone published their code, no matter how bad or inefficient it was.


Who will guard the journals? Gender bias in the “Big Five” medical journals.

Journals, by which I mean Editors, have shaped modern science, particularly in medicine. The publication policies of journals now direct the kinds of ideas that are acceptable, how those ideas should be presented, and the ethical frameworks that should govern data collection, authorship, treatment of participants, and data sharing. Journals will refuse to publish a paper if they are not satisfied that the authors have fulfilled those requirements. The journals have become both arbiters and gatekeepers of sound scientific practice. A recent issue of the Journal of the American Medical Association (JAMA) devoted to conflicts of interest (2 May 2017) is a case in point.

Editors will also self-publish encyclicals of good conduct, laying down the rules of engagement for the future. The JAMA editorial supporting the recent special issue is such an example. When the journals involved are at the top of their fields, these views reverberate. In medicine, the Big Five journals in general and internal medicine are the New England Journal of Medicine (NEJM), the Lancet, JAMA, the British Medical Journal, and the Annals of Internal Medicine. When the Editors speak, the field listens. Their role is almost revelatory: they act as an imperfect conduit for nature’s voice, whispered to researchers in their labs and clinics.

The rules do not, unfortunately, prevent the publication of bad science. The Autism-MMR paper in the Lancet is an excellent example of bad science slipping into the field. In general, however, failures of science lie at the feet of the scientists. The journals rise above it. A retraction here, a commentary there, and the stocks or pillory of peer humiliation are kept for the authors.

It is easy when criticism can be deflected and laid at the feet of authors. What, however, should the response be when researchers identify a bias in the Big Five journals themselves? Bias in medicine is a serious issue. It indicates a skew in the published science: a tendency to emphasise one kind of science over another, or to promote one interest over another. It carries the risk of skewing future practice and funding.

In 2016, Giovanni Filardo and colleagues identified a gender bias in the first authors of research articles published in the Big Five. The journals were more likely to publish articles with a man as first author than a woman. The most biased journal was the NEJM. You will not have read about the research in that journal, however, because it rejected the paper when it was submitted. Unfortunately, the bias in the gender of published first authors is not a local, journal-level issue. The bias has a larger and more insidious career effect. Women are less likely to hold the prestigious position of first author in the Big Five journals, and ceteris paribus they are disadvantaged in funding applications, job applications, receipt of awards, and recognition.

My co-authors and I recently published an investigation of gender bias in clinical case reports. You may be unsurprised to learn that clinical case reports are more likely to be about men. Apparently, a clinical case about a man is just more interesting than a clinical case about a woman. All but one of the investigated journals showed a gender bias, and the most biased journal was the NEJM.

Of course, journals can and should reject research papers that are not relevant or that are deficient in quality. And our paper may have been both. But the fact that a journal like the NEJM has rejected two recent papers identifying it as the most gender-biased of the Big Five begins to look like an avoidance of criticism.

If there is a tendency to avoid self-reflection, particularly in an area as important as bias in science, then the editorial decisions begin to take on much greater significance, and at least a whiff of hypocrisy. The origins of a bias may well be authorial: a greater proportion of articles written by men than by women are submitted to the journals; a greater proportion of clinical case reports about men than about women are submitted. But the Editors are in a position to correct that submission bias, just as they vigorously correct other biases. The Big Five presumably have acceptance rates below 10%, and therefore a bias towards higher rather than lower quality science. We are suggesting that, in exercising their editorial judgment, they could include factors they have (presumably) hitherto not noticed in their own behaviour. They might find it easier to explain these editorial shifts if they based them on scientific research published in their own journals. At the very least, it would indicate that the issue is taken seriously.


This article was co-written by Daniel D Reidpath and Pascale Allotey

Sharing data while not sharing data

There has been a major shift among journals towards making data available at the time of publication.  The PLoS stable of journals, which includes PLoS Medicine, PLoS Biology, and PLoS One, for example, has a uniform publication policy that is quite forthright about the need to share data.

I have mixed feelings about this.  I have certainly advocated for data sharing and (with Pascale Allotey) conducted one of the earliest empirical investigations of data sharing in medicine.  I can understand, however, why researchers are reluctant to provide open access to data. The data can represent hundreds, thousands, or tens of thousands of person-hours of collection and curation. The data also represent a form of intellectual property in the development of the ideas and methods that led to the data collection. For many researchers, there may be a sense that others are going to swoop in and collect the glory with none of the work. There have certainly been strong advocates of data sharing whose motivation looked potentially exploitative (see our commentary).

I recently stumbled across a slightly different issue in data sharing.  It arose in an article in PLoS One by Buttelmann and colleagues. Their study looked at whether great apes (orangutans, chimpanzees and bonobos) could distinguish, in a helping task, between another’s true and false beliefs.  The data set comprised 378 observations from 34 apes in two different studies, and they made their data available … as a jpg file.  A small portion of it appears below, and you can download the whole image from PLoS One.

Partial data from Buttelmann et al. (2017)

It seems strange to me to share data as an image file.  If you wanted people to use the data, surely you would share it as a text file, CSV, xlsx, etc.  If the intention was to satisfy the journal requirements while discouraging use, then an image file looks (at first glance) to be the perfect medium.  Fortunately, there are some excellent online tools for optical character recognition (OCR), and the one I used made quick work of the image file.  I downloaded the result in xlsx format, read it into R, and cleaned up a few typographical errors introduced by the OCR. You can download their data in a machine-readable form here. I have included in the download an R script for reading the data in and running a simple mixed effects model to re-analyse their study data. My approach was a little better than theirs, but the results look pretty similar. I am not sure why they did not account for the repeated measurement within ape, but ignoring that seems to be the typical approach within the discipline.
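For anyone curious, the gist of the re-analysis is sketched below. The file and column names here are illustrative stand-ins rather than the actual ones in the download:

    library(readxl)  # read the OCR'd spreadsheet
    library(lme4)    # mixed effects models

    apes <- read_excel("buttelmann_2017_ocr.xlsx")

    # A logistic mixed effects model: did the ape help at the correct box,
    # by belief condition, with a random intercept for each ape to account
    # for the repeated measurements within ape.
    fit <- glmer(correct ~ condition + (1 | ape_id),
                 data = apes, family = binomial)
    summary(fit)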


Would you give knee surgery to the FAT MAN?

I do understand your plight, Mr Smith.  An arthritic knee can be extremely painful.  And you say it’s so bad you can’t even walk from the living room to the kitchen.  That’s actually very good news!  Yes, yes … awful … but terribly good news. If you can’t walk to the kitchen, you can’t eat. If you can’t eat, you’ll lose weight.  And the faster you lose weight, the sooner we’ll schedule your knee surgery.

On 15 March 2017, Dr David Black, NHS England’s medical director for Yorkshire and the Humber, sent a letter of praise to the Rotherham Clinical Commissioning Group (RCCG).  The RCCG had decided to restrict access to hip and knee surgery for smokers and “dangerously overweight patients”.  The letter was leaked, and it has triggered, according to the Guardian, “a storm of protest.”

The title of this blog is a play on David Edmonds’s book, Would You Kill the Fat Man?, an exploration of moral philosophy and difficult choices about the valuation of human life. The RCCG’s decision intrigued me. It was essentially a decision about rationing a finite commodity: healthcare. In a world of plenty, rationing healthcare is a non-question.  In the real world, however, a world of shrinking healthcare budgets and a squeezed NHS, resources must be allocated in a way that means some people will receive less healthcare, or none at all.  Fairness requires that the rules of allocation be transparent and reasonable.

While you ponder whether you would give knee surgery to the FAT MAN, I have a follow-up question.  Would you want to see a doctor who would deny you knee surgery because of some characteristic of yours unrelated to whether you would benefit from the surgery?

I am sorry, Mrs Smith, but today we decided not to offer clinical services to women, people under 5'7", or carpenters. We need to cut the costs of our clinical services, and by excluding those groups, we can save an absolute bundle.

I have heard it said of the doctor, academic and human rights advocate, Paul Farmer, that he would regularly re-allocate hospital resources from Boston to his very needy patients in Haiti.  He used to raid the drug stocks of a Boston hospital, stuff them in his suitcase and fly them back to his patients in Haiti.  I have no idea if the story is true or not. It does mark, however, one of the great traditions of medicine.  The role of a doctor is to advocate vigorously for the health (and often social) needs of the patient.  The patient actually in front of them.  The one in need.  Because, if your doctor will not advocate for your health needs, who will?  This is why all the great TV hospital dramas show a clash between the doctor and the hospital administrator.  Administrators ration.  Doctors treat.  The doctor goes all out to save little Jenny, against all odds.  The surly hospital administrator stands in front of the operating room, hand outstretched and declares (Pythonesque): “None shall pass.”

Under the current NHS system of clinical commissioning groups, there are family doctors who are simultaneously trying to make rational decisions about the allocation of limited resources to a population and trying to be the best health advocates for the patient in front of them.  That screams conflict of interest. If you live in the catchment area of the RCCG and want my advice, check which doctors are part of the RCCG.  If your doctor is one of them, change doctors immediately. Treating you and advocating for your health interests is what you need and should want.  Unfortunately, if she is part of the RCCG while she is treating you, you are not her principal concern.  Run(!), assuming of course that you don’t need knee surgery.

Should smokers and overweight people receive knee surgery?  Let’s start with smokers.  Why would you not want to treat a smoker?  It is difficult to come up with arguments that are not so outrageous that they are embarrassing to make. But I won’t let personal embarrassment get in the way of stating the top two silly arguments that came to mind:

  1. Smoking is a disgusting habit, and anyone who smokes deserves all the pain they get.
  2. Smokers won’t live as long as non-smokers, so an investment in surgery to reduce pain and improve mobility in smokers will not yield the same net benefit to society as the same investment in non-smokers.

The arguments for restricting the surgery to people who are not overweight are similarly cringe-worthy.  There are, in fact, clinical reasons for prioritising the overweight.  The load on joints resulting from increased weight creates greater wear-and-tear, and the broader inflammatory processes that obesity triggers also seem to increase the risk of osteoarthritis, affecting hands as well as knees.  [See for example, here and here.]

I can’t find the RCCG’s arguments for restricting access to knee surgery for smokers and people who are overweight, but prima facie it looks a lot like a variant of victim blaming.

Full disclosure.  I am all for the rational allocation of resources.  I think smoking is a disgusting habit. I am overweight and trying to do something about it.  I also think that the arguments for resource allocation need to be more explicit about the social values upon which they are often implicitly based.