A few weeks ago I was swearing at my computer and had to go buy a Twix bar from the canteen to calm myself. There was some frantic chocolate scoffing that afternoon.
The source of my irritation? Statistics. I am not a great wielder of statistical power, but I am very interested in their dark arts. This leads to the common situation where I know I’m doing something wrong, such as using stepwise regressions to build a model, favouring frequentist over Bayesian probabilities, and even over-relying on p-values to communicate scientific results, but I just don’t know how to do it better.
I’m expecting there are three reactions to that sentence. The first is “I don’t have a clue what any of that means”. Don’t worry, my grasp of it is very shaky, and it’s not something I’ve ever been taught. It’s something I’ve discovered through hanging out with statisticians.
The second is “Man, I have that exact same problem, but every time I try and learn how to do it, I can’t figure it out.” My friends, we are in the same boat. I do not feel I have enough statistical training to tackle these problems.
And lastly, the third kind of person is reading that and thinking, “Well, obviously the answer is *string of gibberish*.”
I have had good stats teachers, but they are sadly few and far between, and there are a lot of poor stats teachers who get in there in the meantime and deeply confuse me. I have a lot of good friends who try to teach me and I end up glazing over. What I mean to say is that the following is not personal – and it’s as much a criticism of myself as those who have tried to teach me . . .
Loads of statistically savvy people are willing to teach; they just don’t seem to get it through to me. So seeing as I’m supposed to be quite good at this education malarkey, here’s my guide to teaching me statistics.
Make Sure We’re Speaking a Common Language
Yes, we really have to start with the basics here. Statistical language is incomprehensible to me. And that’s because we’re all taught differently.
As an example, I refer to response variables as ‘y’ and explanatory variables as ‘x’. A good friend of mine refers to explanatory variables as ‘y’ and response variables as ‘a’ or ‘b’. This causes huge confusion whenever we ask one another stats questions off the cuff.
And the common language refers to more than just making sure I understand what your big formulas are saying. Take the homepage of R. R is a sophisticated and free statistical tool that we should all be using, but I’ve seen more intuitive GeoCities layouts. It is written by and for coders, and I have to explain how to extract a zip file to some of my colleagues.
Why are you writing your R manual or your page about your fancy new statistical technique? Are you trying to share it with others who think like you? Fine, carry on. Are you trying to improve the statistical techniques used by frustrated, busy scientists who haven’t had more than a few weeks of stats CPD a year?
Use your words.
Now, the R Book is a good start for people wanting to learn R, but I still wish it was written by Andy Field, whose Discovering Statistics book is still my favourite bible, even though I don’t use SPSS anymore. If you’ve read both, you’ll see the difference in style is extreme, and I think it’s because, as a social scientist, Field has a better grasp of how people think. (Although speaking of GeoCities sites . . . I still love the book!)
Edited to Add: I lie! Andy Field has written an R textbook, which I have just bought! Thanks to Comparatively Psyched for the heads up!
Teach Me Something I Can Use
This may seem to contradict what I said further up, but if you’re trying to teach me, say, an alternative method to a stepwise regression, don’t just give me a dataset and tell me the code to run.
Tell me how to arrange my dataset in the way it’s needed. Ask me questions about my data – get me thinking about the complexities of the experiment I designed. And then tell me the code to run. Don’t forget to walk me through the output. For example, the documentation for the lars package in R explains how I can run a least angle regression on a sample dataset. Great. I can copy and paste that code ad libitum. Can I get it to work on my data? Even though to the best of my knowledge I’ve arranged it in the same way? Nope.
Get me to work through the whole process and you show me where your new method fits into my life.
What’s the Application?
I recently sat through a stats seminar where someone was showing off a new method. In the same presentation they briefly glossed over ternary plots as a way of showing off new data.
Applied scientists work in a world that judges us on the number of papers we produce and the impacts our papers have. That is literally how we get our baseline funding.
I don’t disagree that there are lots of problems with publishing, but you’re asking me to relearn how I think about statistics, and then to communicate all this in a real-world paper with real-world data (that doesn’t always play nicely). If you’re asking somebody to use an amazing new technique, you’re asking them to get that past reviewers (who more often than not will not know your new stats).
If you have a great technique but it won’t actually give me a conclusion that I can use to improve animal welfare, then it’s not going to help me. And related to this . . .
What Does It Mean?
The truth of the matter is that the statistical tests we commonly use are ‘plug and play’. We get into the habit of checking the things we want to look at and noting the laundry list of caveats in a footnote.
Walk me through an example of what my results mean. If you’ve got me using my own data, tell me if this result confirms or denies my hypothesis, show me why, give me some indication of the next step.
I’m amazed at how many people don’t do this when trying to explain stats to me. You’re interested in the method, I get that. I’m fascinated by recording aggression in groups, but there’s a time and a place to discuss that, and a time to just tell you what the aggression means.
Don’t Assume I’m Stupid
I see this all the time when statisticians are trying to teach something to scientists. They spend a very long time on the basics because our fundamentals are so scattered. This is not the most helpful approach. The other method I often see is that, when I say I don’t understand or even hesitate, the statistician repeats what they’ve said, more slowly and slightly louder.
We’re not stupid. Try teaching us a complex problem in an environment we’re familiar with (i.e. with our own data) and you’ll be surprised how many fundamental skills we’ll pick up because of it. To use a simple analogy, if you wanted to teach me how to maintain a car, wouldn’t you be better off showing me how to take an engine apart rather than build one from scratch?
Don’t spend half our time explaining the problem to me – I get that there is a problem with the statistics I already use, it’s why I’ve sought you out. Is a finer understanding of the theory really going to help me use this test in future?
Finally – Why Are You Teaching Me?
This blog post sounds very whiny. Trust me, I know.
I know I should have learned all this earlier in my career. I know I should use R every day until I’m fluent. I know I shouldn’t be using all these out-of-date stats. But the sad truth is that I haven’t, I don’t and I can’t.
I want to change, and I need the great community of statisticians to help me. So if you’re a statistician who wants to help me and people like me, this is how I’d suggest doing it.