Papyrus Walk a miscellany of musings

Prevalence of sexual assault at Australian Universities is … non-zero.

A few days ago the Australian Human Rights Commission (AHRC) launched Change the course, a national report on sexual assault and sexual harassment at Australian universities led by Commissioner Kate Jenkins. Sexual assault and sexual harassment are important social and criminal issues, and the AHRC report is misleading and unworthy of the gravity of the subject matter.

It is a statistical case study in “how not to.”

The report was released to much fanfare, receiving national media coverage including TV and newspapers, and a quick response from universities. “At a glance …” the report highlights among other things:

30,000+ students responded to the survey — remember this number, because (too) much is made of it.
21% of students were sexually harassed in a university setting.
1.6% of students were sexually assaulted in a university setting.
94% of sexually harassed and 87% of sexually assaulted students did not report the incidents.

From a reading of the survey’s methodology, any estimates of sexual harassment/assault should be taken with a shovel-full of salt and should generate no response other than that of the University of Queensland’s Vice-Chancellor, Peter Høj: that any number greater than zero is unacceptable. What we did not have before the publication of the report was a reasonable estimate of the magnitude of the problem and, notwithstanding the media hype, we still don’t. The AHRC’s research methodology was weak, and it looks like they knew the methodology was weak when they embarked on the venture.

Where does the weakness lie? The response rate!!!

A sample of 319,252 students was invited to participate in the survey. It was estimated at the design stage that between 10 and 15% of students would respond (i.e., 85-90% would not respond) (p.225 of the report). STOP NOW … READ NO FURTHER. Why would anyone try to estimate prevalence using a strategy like this? Go back to the drawing board. Find a way of obtaining a smaller, representative sample, of people who will respond to the questionnaire.

Giant samples with poor response rates are useless. They are a great way for market research companies to make money, but they do not advance knowledge in any meaningful way, and they are no basis for formulating policy. The classic example of a large sample with a poor response rate misleading researchers was the Literary Digest poll to predict the outcome of the 1936 US presidential election. They sent out 10 million surveys and received 2.3 million responses. By any measure, 2.3 million responses to a survey is an impressive number. Unfortunately for the Literary Digest, there were systematic differences between responders and non-responders. The Literary Digest predicted that Alf Landon (Who?) would win the presidency with 69.7% of the electoral college votes. He won 1.5% of the electoral college votes. This is a lesson about the US electoral college system, but it is also a significant lesson about non-response bias. The Literary Digest had a 77% non-response rate; the AHRC had a 90.3% non-response rate. Who knows how the 90.3% who did not respond compare with the 9.7% who did respond? Maybe people who were assaulted were less likely to respond and the number is a gross underestimate of assaults. Maybe they were more likely to respond and it is a gross overestimate of assaults. The point is that we are neither wiser nor better informed for reading the AHRC report.
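To see how far differential non-response can push a prevalence estimate in either direction, here is a minimal sketch. The function name `observed_prevalence` and every number in it are illustrative assumptions, not figures from the AHRC report; the only thing borrowed from the text is an overall response rate in the neighbourhood of 10%.

```python
def observed_prevalence(true_prev, resp_rate_cases, resp_rate_noncases):
    """Prevalence seen among responders when cases and non-cases
    respond at different rates (all inputs are proportions in [0, 1])."""
    responding_cases = true_prev * resp_rate_cases
    responding_noncases = (1 - true_prev) * resp_rate_noncases
    return responding_cases / (responding_cases + responding_noncases)


# Hypothetical scenario: the true prevalence is 5% throughout.
TRUE_PREV = 0.05

# Cases and non-cases respond at the same rate: the estimate is unbiased.
equal = observed_prevalence(TRUE_PREV, 0.10, 0.10)

# Cases are half as likely to respond: the survey underestimates.
under = observed_prevalence(TRUE_PREV, 0.05, 0.10)

# Cases are twice as likely to respond: the survey overestimates.
over = observed_prevalence(TRUE_PREV, 0.20, 0.10)

for label, value in [("equal", equal), ("under", under), ("over", over)]:
    print(f"{label}: {value:.1%}")
```

In this made-up scenario the same true prevalence of 5% would be reported as roughly 2.6% or 9.5%, depending only on who chose to answer. With a response rate this low, nothing in the survey itself tells us which of these worlds we are in.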

Sadly, whoever estimated the (terrible) response rate was, even then, overly optimistic. The response rate was significantly lower than the worst-case scenario of 10% [Response Rate = 9.7%, 95%CI: 9.6%–9.8%].
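The interval quoted above can be reproduced with a normal-approximation (Wald) interval for a proportion. This is a sketch under assumptions: it works from the invited sample (319,252) and the rounded response rate (9.7%) rather than an exact responder count, and the text does not say which interval method was used, so the choice of the Wald interval is an assumption.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


INVITED = 319_252       # students invited, as quoted from the report
RESPONSE_RATE = 0.097   # rounded response rate quoted above

low, high = wald_interval(RESPONSE_RATE, INVITED)
print(f"{RESPONSE_RATE:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

With a denominator this large the interval is very narrow, and even its upper bound sits below the 10% worst case assumed at the design stage.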

In sharp contrast to the bad response rate of the AHRC study, the Crime Victimisation Survey (CVS) 2015-2016, conducted by the Australian Bureau of Statistics (ABS), had a nationally representative sample and a 75% response rate, fully completed! That’s a survey you could actually use for policy. The CVS is a potentially less confronting instrument, which may account for the better response rate. It seems more likely, however, that recruiting students by sending them emails is neither sophisticated enough nor adequate.

Poorly conducted crime research is not merely a waste of money; it trivialises the issue. The media splash generates an illusion of urgency and seriousness, and the poor methodology means it can be quickly dismissed.

If there is a silver lining to this cloud, it is that AHRC has created an excellent learning opportunity for students involved in quantitative (social) research.

Addendum

It was pointed out to me by Mark Diamond that a better ABS resource is the 2012 Personal Safety Survey, which tried to answer the question about the national prevalence of sexual assault. A Crime Victimisation Survey is likely to receive a better response rate than a survey looking explicitly at sexual assault. I reproduce the section on sample size from the explanatory notes because it highlights the difference between a well-conducted survey and the pile of detritus reported by AHRC.

There were 41,350 private dwellings approached for the survey, comprising 31,650 females and 9,700 males. The design catered for a higher than normal sample loss rate for instances where the household did not contain a resident of the assigned gender. Where the household did not contain an in scope resident of the assigned gender, no interview was required from that dwelling. For further information about how this procedure was implemented refer to Data Collection.

After removing households where residents were out of scope of the survey, where the household did not contain a resident of the assigned gender, and where dwellings proved to be vacant, under construction or derelict, a final sample of around 30,200 eligible dwellings were identified.

Given the voluntary nature of the survey a final response rate of 57% was achieved for the survey with 17,050 persons completing the survey questionnaire nationally. The response comprised 13,307 fully responding females and 3,743 fully responding males, achieving gendered response rates of 57% for females and 56% for males.

Low cost reproducible microscopy

There has been growing interest in reproducible research. The interest arises from the idea that scientific discoveries that are one-off, isolated and never to be repeated have limited value. For research to inform future science, others must be able to reproduce the results. There are even courses on reproducible research. However, a look at the courses and a quick search of PubMed will reveal that when people refer to reproducible research, they often mean shared data or shared analytic code. And when I write “shared data”, I don’t even really mean any data, I mean electronic data … a spreadsheet, a database, etc. Of course, the Methodology sections of journal articles are supposed to support reproducible research, but these are often hints and teasers for what was done, rather than a genuine “how to”.

Reproducibility becomes more challenging when all one has to work with is a one-off observation. How do I show you what I saw? The question became particularly relevant to me in a recent discussion with a colleague about reproducible microscopy. The obvious answer is “a photograph” (and with a professional setup, one can achieve spectacular results), but what should be done in resource-poor settings where money and equipment are limited?

I am one of the investigators on a Wellcome Trust funded “Our Planet Our Health” award led by Rebekah Brown at the Monash Sustainable Development Institute. The “Revitalisation of Informal Settlements and their Environment” (RISE) study involves the collection of large quantities of diverse data from informal settlements in Makassar, Indonesia and Suva, Fiji. Most of the samples will be collected by in-country teams, and they will not have access to high-end equipment. Nonetheless, some of the samples will have to be examined under a microscope in Makassar and Suva. It is likely that we will have to rely on basic equipment and Lab Technicians with limited skills in microscope photography. From my SEACO experience, I would be looking for a low-cost solution that can be implemented with basic training. The solution “works” if the images are appropriate; that is, they are the thing of scientific interest, and are of sufficient fidelity that a researcher somewhere else in the world can interpret them appropriately. It is not necessary for the technician to be brilliant, just adequate.

I decided to play. I am neither a good photographer nor am I good at microscopy. I reasoned that if I could get something approximating a reasonable image, then a Lab Technician with some actual training would have no problems. The best low-cost camera solution is not to buy another piece of equipment at all. We are already committed to using a smartphone/Tablet solution in the RISE project and plan to use the devices for capturing photographs, tagging photographs, and uploading them to a server. The only challenge was getting the smartphone camera to “peek” into the microscope. Fortunately, there is a broad range of solutions, and I opted for the very cheapest I could find on eBay. It cost me US$5.99, brand new, including postage and handling.

The mount is straightforward to use, although my first attempt was pretty awful. I found a weevil crawling around the kitchen (welcome to the tropics!) and that became the first portraiture subject.

A photograph of a weevil taken with a Google phone using a smartphone-microscope adapter

The images are of much higher resolution than I have posted here. I didn’t know what I was doing, and most of the image is taken up with the microscope surrounds rather than the subject of the photograph. I tried the next day, this time using a peppercorn as the subject — it didn’t move as quickly.

A photograph of a peppercorn taken with a Google phone using a smartphone-microscope adapter

The only real difference in my approach was that this time I zoomed in slightly on the peppercorn. I will never look at peppercorns the same way. What appears to be (to my unqualified eyes) fungal mycelium is less than appealing. Nonetheless, it also seems like the general approach to capturing microscope images might be reasonable. As long as the technician knows what to photograph, the quality of the images is almost certainly good enough for others to view and interpret. This is potentially quite exciting because it does allow science (and quite basic science) to be virtual and shared. A photograph of a microscopic image taken in Makassar could be shared with the world within hours, giving scientists anywhere an opportunity to look, think, interpret, question and suggest.

Software for cost effective community data collection

In 2012 we started enumerating a population of about 40,000 people in the five Mukim (sub-districts) in the District of Segamat in the state of Johor on peninsular Malaysia. This marked the start of the data collection for establishing a Health and Demographic Surveillance Site (HDSS) — grandly called the South East Asia Community Observatory (SEACO).

When establishing SEACO we had the opportunity to think, at length, about how we should collect individual and household data. Should we use paper-based questionnaires? Should we use Android Tablets? Should we use a commercial service, or should we use open source software? When we eventually collect the data, how should we move it from paper/tablets into a usable database? In thinking about this process, one of the real challenges was that HDSS involve the longitudinal follow-up of individuals, households, and communities. Whatever data collection system we chose, therefore, had to simplify the linking of data about Person_1 at Time_1 with the data about Person_1 at Time_2.
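The linking requirement can be sketched as follows. This is an illustration only, not the SEACO data model: the field names (`person_id`, `round`, `household_id`) and the records are hypothetical, and the one assumption that matters is that every individual carries a stable identifier across data collection rounds.

```python
from collections import defaultdict

# Hypothetical records from two data collection rounds.
time_1 = [
    {"person_id": "P001", "round": 1, "household_id": "H10"},
    {"person_id": "P002", "round": 1, "household_id": "H10"},
]
time_2 = [
    {"person_id": "P001", "round": 2, "household_id": "H10"},
    {"person_id": "P003", "round": 2, "household_id": "H11"},
]


def link_rounds(*rounds):
    """Group records by person_id so that each person's history
    sits in one list, ordered by round."""
    history = defaultdict(list)
    for records in rounds:
        for record in records:
            history[record["person_id"]].append(record)
    for records in history.values():
        records.sort(key=lambda r: r["round"])
    return dict(history)


linked = link_rounds(time_1, time_2)

# People seen in both rounds, seen only at Time_1, and newly seen at Time_2.
followed_up = sorted(p for p, recs in linked.items() if len(recs) == 2)
only_round_1 = sorted(p for p, recs in linked.items()
                      if [r["round"] for r in recs] == [1])
only_round_2 = sorted(p for p, recs in linked.items()
                      if [r["round"] for r in recs] == [2])
```

Whatever tool collects the data, the design question is the same: the identifier has to be assigned once and carried on the device into every later round, otherwise linking becomes a fuzzy matching problem on names and addresses.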

I eventually settled on OpenDataKit (ODK). ODK is a marvellous piece of software developed by a team of researchers at the University of Washington; it runs on Android Tablets and was released under an open source license. We hacked the original codebase to allow the encryption of data on the Tablet (later it became a mainstream option in ODK), I wrote a small Python script for downloading the data from the Tablets, and the IT Manager wrote a PHP script to integrate the data with a MySQL database. We managed the entire process from collection to storage, and it worked extremely well. I hate the idea of using proprietary software if I don’t have to, and when we set up SEACO we decided that as much as possible we would use open source software so that others could replicate our approach.

SEACO data collector using an Android Tablet with ODK completes a household census, 2012

Recently we moved away from ODK to a proprietary service: surveyCTO. Unlike ODK, we have to pay for the service and, for reasons I will go into, it has thus far been worth the move.

ODK did not do exactly what we needed and this meant that the IT team regularly made adjustments to the code-base (written in Java). The leading hacker on the team moved on. That left us short-handed and also without someone with the familiarity he had with the ODK codebase. I was torn between trying to find a new person who could take on the role of ODK hacker versus moving to proprietary software. The final decision rested on a few factors — factors that are worth keeping in mind should the question arise again. First, our operation has grown quite large. There are multiple projects going on at any one time, and we required a full-time ODK person. SurveyCTO maintained most of the functionality we already had, and it also had some additional features that were nice for monitoring the data as they came in, and managing access to data. Second, the cost of using surveyCTO was considerably lower than the staff costs associated with having an in-house developer. We would lose the capacity for some of our de novo development but benefit by having a maintained service at a fraction of the cost.

If I had more money, my preference would be to maintain the capacity for in-house development. If I were only doing relatively small, or only one-off cross-sectional studies, I would use ODK without hesitation. For a large, more complex operation, a commercial service made economic and functional sense.

One of the other services I considered was Magpi. At the time I took the decision, it was more expensive than surveyCTO for our needs. If you, however, are just beginning to look at the problem, you should look at all options. I am sure there are now other providers we had not considered.

Fat on the success of my country

When I first visited Ghana in the early 1990s, there was a very noticeable relationship between BMI and wealth. Rich people were far more likely to be overweight and obese than poor people. That visit took place about ten years after the 1982-1984 famine. Some of the roots of the famine lay in natural causes resulting in crop failure and some lay in local and regional politics, and it was small children that bore the brunt of it. Less than ten years after the famine it was perhaps unsurprising to see that (on average) the thinnest were the poorest, and the fattest were the richest.

Working in Australia in the early 2000s, however, there appeared to be exactly the opposite relationship. It appeared that the poorest were more likely to be overweight or obese and the wealthiest, normal weight. This observation was certainly borne out at an ecological level when my colleagues and I found an unmistakable relationship between area-level socioeconomic disadvantage and obesogenic environments: fast food chain “restaurants” were more likely to be found in poorer areas.

So which is it? Are the poor more likely to be overweight and obese, or is it the rich? One of the challenges in working out this relationship is that it appears to be different in different countries. Neuman and colleagues conducted a multi-level study of low- and middle-income countries (LMICs) looking at this very problem using Demographic and Health Surveys (DHS) data. They found an interaction between country-level wealth, individual-level wealth, and BMI. Unfortunately, the study was limited to LMICs because the DHS surveys do not operate in high-income countries. While it would be tempting to extrapolate the interaction into high-income countries, without the data, it would just be a guess.

We don’t have the definitive answer, but a recent paper by Mohd Masood and me, based on his PhD research, provides some nice insights into the issue. We were able to bring together data from 206,266 individuals in 70 low-, middle- and high-income countries using 2003 World Health Survey (WHS) data. The WHS data are now getting a little old, but it is the only dataset we knew of that provided BMI and wealth measures from a sample of all countries, using a consistent methodology, all measured over a similar period of time.

Mean BMI of the five quintiles of household wealth in countries ranging from the poorest to the richest (GNI-PPP). [https://doi.org/10.1371/journal.pone.0178928]

The analysis showed that as country-level wealth increased, mean BMI increased in all wealth groups, except the very wealthiest group. The mean BMI of the wealthiest 20% of the population declined steadily as the wealth of the country increased. In the wealthiest countries, the mean BMI converged for the poorest 80% of the population around a BMI of 24.5 (i.e., near the WHO cut-off for overweight of 25). The wealthiest 20% had a mean BMI comfortably below that, around 22.5.
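For readers less familiar with the measure, BMI is weight in kilograms divided by the square of height in metres. The sketch below computes it, applies the standard WHO adult categories, and translates the gap between the two mean BMIs mentioned above into kilograms. The function names and the 1.70 m example height are illustrative assumptions, not part of the paper’s analysis.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def who_category(bmi_value):
    """Standard WHO adult BMI categories."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obese"


def weight_for_bmi(bmi_value, height_m):
    """Weight (kg) that gives the stated BMI at the stated height."""
    return bmi_value * height_m ** 2


HEIGHT_M = 1.70  # hypothetical adult height, for illustration only

# Difference in weight between a BMI of 24.5 and a BMI of 22.5.
gap_kg = weight_for_bmi(24.5, HEIGHT_M) - weight_for_bmi(22.5, HEIGHT_M)

print(who_category(24.5), who_category(22.5), round(gap_kg, 1))
```

Both means fall in the normal weight category, but for someone 1.70 m tall the two-point gap corresponds to a little under 6 kg, which is not trivial as a difference between group averages.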

It is obviously not inevitable that as the economic position of countries improves, everyone except the very richest puts on weight. There are thin, poor people and fat, rich people living in the wealthiest of countries. Nonetheless, the data do point to structural drivers creating obesogenic environments. My colleagues and I had argued, at least in the context of Malaysia, that the increasing prevalence of obesity was an ineluctable consequence of development. The development agenda pursued by the government of the day decreased physical activity, promoted a sedentary lifestyle, and did nothing to moderate the traditional fat-rich, simple-carbohydrate diet associated with the historically rural lifestyle of intensive agriculture.

We really need more data points (i.e., a repeat of the WHS) to try and tease out the effect of economic development on obesity in the poorest to the richest quintiles of the population. I would suspect, however, that countries need to think more deeply about what it is they pursue (for their population) when they pursue national wealth.