You’re probably here because you reached out with a stats question and I pointed you to this post. Thank you so much for your question! Chances are that I’d love to answer it, but I need some things from you before I try to help1.
In order to give you good advice, I need to understand your problem in a fair amount of detail before we meet. That way, when we meet, we can have a productive discussion that uses both of our time effectively.
I would anticipate spending 30 minutes to 3 hours on prep work before our meeting, depending on how organized your data is. I know that’s not thrilling, but I promise it is worthwhile.
Please take a moment to answer the following questions in writing. It’s really helpful to have answers in writing so that I can refer back to them if I forget anything or get confused about the details of your project. If any of these questions are irrelevant to you, just skip them!
What are you trying to learn or do? Please describe what you want to learn without addressing how you want to learn, using plain, non-technical language.
Why does this project matter? I am not an expert in your field, so it is really hard for me to know what parts of your project matter most. Help me focus my energy by explaining why your work is important or the decisions you will make based on your analysis.
Who is the audience for your work? I want to help you solve your problem in a way that your audience will accept, and to do that, I have understand your audience. What do your stakeholders or audience members care about? How much statistical background do they have? What kind of relationship do you have to them?
What kind of constraints do you have? Is it more important to get a correct answer or a quick answer? What kind of computational and fiscal resources can you allocate to this project?
How are you currently trying to solve your problem or analyze your data? It’s actually best to answer this question twice: once at a very abstract level, using as little statistical or scientific jargon as possible, and then once with lots of technical detail.
Next I’m going to need a detailed explanation of your data. Jeff Leek has an extensive guide on How to share data with statistician, which you should read and follow.
In brief: I need to know how the raw data were collected, and any data cleaning or processing steps that you took. You should send me both your raw data and your cleaned data, plus a description of the data cleaning process. It’s important to know about data cleaning and prep because those processes can greatly influence what kinds of analysis are appropriate. For both the raw data and the cleaned data, please create a data dictionary that describes what each row of data represents and a description of each column. Ideally the cleaned data is in tidy format (see here for instructions on how to tidy data in R
). Jeff Leek’s guide has extensive details about how to write a data dictionary.
My most frequent request for consulting clients is for a detailed data dictionary.
Once you have: (1) written answers to all the above and (2) data and a data dictionary (if applicable), send me everything and schedule a meeting.
I’m probably going to spend most of the meeting asking followup questions to test my understanding of your problem. Once I feel that I understand your problem, I can start to propose solutions. I might be able to solve your problem on the spot; it also might take me a couple days.
While you wait for our meeting, there are some other useful things you can do to prepare:
Organize your data really well. More often than not, I don’t get to do much statistical consulting because the data is disorganized. It’s critical to know where the raw data is, how it gets cleaned, and where the clean data is. Ideally, store your data in
.csv
files in tidy format. One of the most demotivating experiences as a consultant is when someone shows up with disorganized data with unclear providence. If I don’t trust the data because it’s been stored in a messy and disorganized way, I can’t do an analysis I trust and consulting isn’t very fun.Plot your data. The very first thing I am going to do with your data is make a bunch of plots. It’s helpful if you do this before we meet, because that way you have time to track down anything weird that shows up when the data get plotted.
Send me your code. If you have already started to analyze your data, send me what you’ve already done!
Thanks for taking the time to read through this! I look forward to hearing from you.
Footnotes
This post is based on Caitlin Hudon’s fantastic data intake form, which focuses on data science in industry. I have blatantly stolen some of her intake questions. In this post I focus more on statistical consulting.↩︎