Sept. 11, 2024

Bonus Episode: IVECCS Live Sessions - I Don't Think That Paper Means What You Think It Does! With Prof Steven Epstein

Hands up if you still remember much of the statistics you learned back in vet school… Some of us have looked at way more P nappies than p-values since our student days!

 

But this is veterinary SCIENCE after all, and part of science is reading the occasional paper. Reading papers—beyond just the abstract and the conclusions—and with at least a little bit of discernment, does involve some degree of understanding statistics, just to make sure you’re not being misled.

 

So, in this episode, we’re talking about statistics, and we have the best person for the job. (Don’t skip it because of that word—I promise it’s actually quite fun!)

 

Dr. Steven Epstein is a Professor of Clinical Small Animal Emergency and Critical Care at the University of California, Davis. His research interests include CPR, diagnostic testing in the emergency room, and antimicrobial resistance patterns, and he loves a bit of statistics.

 

Join us for an easy-to-understand, practical, and entertaining update on statistics, recorded live at IVECCS 2024. You’ll even learn a nifty tip to help you (finally!) remember the difference between sensitivity and specificity!

Hands up if you still remember much of the statistics that you learnt back at vet school. I'm not talking to you if you're in your low 20s and you graduated like five minutes ago. I'm talking about the rest of us who have more important things to think about, like dinner, or what's showing on Netflix. Some of us have looked at way more P nappies than P values since our student days.
But this is veterinary science, after all, and part of science is reading the occasional paper. And reading papers beyond just the abstract and the conclusions, and with at least a little bit of discernment, does involve a little bit of statistics. So in this episode, we're talking about statistics.
But wait, wait, wait, wait. Before you skip this one to listen to the next episode of your favorite true crime podcast: it's actually really fun. Our guest is Professor Steven Epstein, who is a professor of clinical small animal emergency and critical care at the University of California, Davis, and he has a knack for making this stuff really easy to understand, practical and entertaining.
I promise we even have a Princess Bride reference for you. I sat down with Steven on the couch live at IVECCS, and for a final IVECCS special episode, we chat about why and how to look at literature with a smattering of scepticism and a dash of understanding of those numbers.
You'll even learn a way to finally remember the difference between sensitivity and specificity. If nothing else, learning this stuff will make you feel just a little bit smarter, so stick with us. Oh, and if you're listening to this on Spotify, you can actually watch the video recording of this interview if you look for the video button.
Now, before we jump in: Steven is a return guest. If you want to learn more from him and his colleagues at Davis, and the hundreds of other world-class specialists I've had the privilege of asking my dumb questions of over the years, join our Vet Vault nerds at vvn.supercast.com. And remember our IVECCS special, which we'll keep live for a few more days. That'll give you 30 percent off our annual subscription, which brings it down to well under 100 US dollars for a year of great quality education. The link for that special is right in the show description.
OK, let's go. Welcome back. Good to see you again. Last time we chatted we were in Australia, in Port Douglas. Yep, and now we're in St. Louis, Missouri, at IVECCS.
Your baby, your big conference. Yes, epic conference. Your program has done really well. Thank you. And your topic is basically how to read literature. That's a summary of it. What's the title of your talk? So: I Don't Think That Paper Means What You Think It Means.
OK, we definitely will have a good Princess Bride quote. Explain the Princess Bride reference. I will admit ignorance: have I seen the movie? I don't think I have. Is that bad? Yes, classic 80s movie. Is that a quote from the movie? Yeah. They use "I don't think that word means what you think it means" throughout the movie.
So it's very quotable, and I just changed it to fit my topic: I don't think that paper means what you think it means. Apologies for my ignorance on Princess Bride. Something to watch on the flight home. There we go. OK, great. Why? Let's talk about why. I do like a bit of statistics, and last year on the program Dr. Hopper got me to do a talk called Core Principles of Statistics, which actually got very good reviews, of all things.
Apparently I can make that entertaining. So I think it was an opportunity to highlight some really basic critical evaluation of the literature in a way that is accessible and that everyone can do. Because there's always that time where we're sitting around like, oh, I thought that was a great paper.
And then someone else who knows a bit more than you reads it, and it's like, oh, apparently it's not a great paper. So it's getting at how to think about it: not just reading the abstract, but what does it really mean? You mean there's more to reading a paper than just the abstract? Depends how much time you have, for sure.
And we live in different worlds. In your world you deal with specialists and residents, people who are actually versed in research to some extent. I learned statistics at uni, but I've not touched it since then. So I can't look at the numbers; I don't know what they mean, really.
Should we know? Is it worth it, as a GP veterinarian who occasionally wants to read a paper, or gets sent stuff, or, you know, the reps come in and say, oh, you've got this amazing new product, here's the literature? Is it worth educating yourself on P values and power and all that stuff?
In a very light version, I really do think it is. And some really basic stuff on study design, because I do see people having conversations with owners about what a paper means, and that isn't actually a takeaway point. Or that rep comes to you about this brand new drug and they're like, here's our literature.
And when someone's upselling it, you can make spin sound great, but is that really what that study showed you? An example of that, which we brought up in the lecture, was the new monoclonal antibody for canine parvovirus.
So that's definitely getting marketed, and I think it's a drug that has a lot of promise. But the one study that is published doesn't provide clinical treatment to the patients. It's an experimental design. It's a great background for designing a clinical study.
But to me, I can't yet translate this amazing drug, which is what it looks like in that research setting, into an amazing drug for a clinical patient. So to me that was a great example of how study design impacts a clinical conclusion.
So I should probably learn how to do this because of what I do. I disseminate information, and sometimes papers are quoted and then I'll go and disseminate that without necessarily critically looking at it. So are there things that are simple enough to discuss in this format here, without spending hours reading or studying up on this? When we talk about study design, are there things that we should remember, or look out for, or be wary of, or keep in mind when reading papers?
Yeah, I think you brought up: what is a P value? To me that is a very simple thing that everyone can know, and it really affects how you interpret a paper, even if you just read the abstract. When I started learning statistics, that was one of those amazing things: we arbitrarily said a P value of less than 0.05 is significant.
People remember that from vet school, I think. But then no one takes that extra 30 seconds to realize that it actually means there's a 5% chance the finding is just a fluke. Yeah, OK. That is literally what that value means. So the lower that number, the less the chance that it's a fluke finding.
OK, so now I'm going to show my ignorance. And I was actually really good at statistics: I scored well, and then the moment I walked out of that exam I just put it all behind me, like we all did. So with that said, explain the P value again; refresh my memory. So it is considering the likelihood that there's a difference between two groups.
OK, so the stats computer does its magic, which none of us remember how to do, and then gives you this value. And that's hard science, that's math. And then the medical community came along and said: at a P value of less than 0.05, as a community, we're going to say we believe this result.
So less than 0.05, meaning 0.04, or 0.03, 0.02, 0.001. The lower that number is, approaching 0, the less chance that it's a fluke. OK, but you're saying 0.05 still means there is a 5 percent chance that you found a difference and there really isn't one. OK, OK. And we make treatment decisions on that.
5% doesn't sound like too bad odds, though. Well, I suppose it depends on the therapy, or what they're trying to prove, right?
If there's a 5% chance that your patients are going to die when you use this, then maybe that's a different scenario from "there's no harm in doing this and we're actually 95% sure it's going to help." Does that come into it? That does come into it a little bit. A lot of it is the fancy math behind how you get to that number, and in veterinary studies we all do these small studies.
So when you start looking at the numbers and say, all right, I'm just going to change one value each way, all of a sudden you now have a non-significant finding. Oh, really? Yeah. So is that why we worry so much about study size? If I have a case series of 12 patients, does that mean one patient more or less could make a big difference? It could flip you from something significant to something that's now not significant.
See, I forgot that. And that 5% chance means that if 20 studies all had that number, one out of the 20 is wrong in its conclusion. So what should we be looking for? Numbers less than 0.05?
You should look at how close it is to 0. At less than 0.001 there's a tenth of a percent chance: now you're 99.9% sure you're right, versus 95%. And then the impact: is this really going to change something I do for a patient? If it's going to alter my clinical practice, I would like the stats to be much further down, closer to 0 than to 5%.
So the takeaway so far: next time I read some sort of literature, I'm going to actually look at the P value and interpret it in light of what I'm reading. Yeah. Is this a clinical-practice-changing paper, or is this just interesting?
It's like, oh, that's interesting, that's cool, I'll keep an eye on that. And then to help interpret it, I'm going to look at study size. So if it's less than 0.05 and it's 20,000 patients over a 20-year period, then you go, OK, cool. Anything else?
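(If you want to see that small-study fragility in actual numbers, here's a minimal Python sketch. The case numbers are invented for illustration, not from any real study: two groups of 12 dogs, and moving a single patient between outcome columns drags the P value across the 0.05 line.)
```python
# A minimal sketch of how one patient can flip "significant" to
# "not significant" in a small study. All numbers are invented.
from scipy.stats import fisher_exact

# 12 dogs per group: 10/12 survived on treatment A, 4/12 on treatment B
_, p = fisher_exact([[10, 2], [4, 8]])
print(f"p = {p:.3f}")  # ~0.04 -> below the 0.05 cutoff, "significant"

# Now suppose one more treatment-A dog had died: 9/12 vs 4/12
_, p = fisher_exact([[9, 3], [4, 8]])
print(f"p = {p:.3f}")  # ~0.10 -> no longer "significant"
```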
The other big thing we covered in there was some examples of prognostic values and how we really should use those when we're talking to clients. A classic that everyone's probably heard of is high lactates in GDV, which we think is a predictor of death.
Yes, we used to think that. I think there are a lot of people who still say things like that. So we know sensitivity, specificity and predictive values are all words everyone hates. Yes. And I don't like them either, but knowing what a predictive value is, is what I should know about something if I'm going to talk to a client about it.
So we looked at an example with lactate in dogs with septic peritonitis. OK. They showed that if it's above 4, it was fairly sensitive and specific for mortality.
They found a difference: the lactates between dead and alive dogs were different. So there's a lot of good superficial evidence that a high lactate is bad. But then should I say to my client, well, your dog's lactate is above 4, I'm worried he's not going to make it? Because that's really the thing I want to know as a clinician.
Yep. From that, are we going to go on this treatment journey? Is it worth spending $10,000? Because I'm really worried his prognosis is poor. So one of the things we did in the lecture was go through the actual modeling of what the predictive value is, and in the end it showed it was about 75%.
So that means if I told four clients whose dogs presented with a septic abdomen and a lactate above 4 that they were probably going to die, for one of those clients I'm wrong. OK. That seems like a lot of wrongness. Especially when it's life and death decisions, right?
Yeah, yeah, for sure. So, right: specificity, sensitivity. I understand them, but I always have to pause and think about it. It's not natural to anyone, no. I don't know why it is so hard. I suppose it's because the words sound kind of similar, and they're very similar concepts, just looked at from a different angle.
So, just so we all have it in our heads, go through it in your words. The way I keep it easy to remember: specificity has a P. So if you have high specificity, there's low false positives.
Yes. And sensitivity has got an N, so if you have a sensitive test, there's low false negatives. I like it. That is all I remember about those two things, and then I can go backwards and figure out the rest. Great. OK, cool. And I'd have to write that down.
So in the end they're both "low": low false positives, low false negatives. That's cool; I love memory tricks like that. Yeah. But remember, it's really the predictive value that's the most important thing on whether you're going to have a conversation with a client about whether this is predictive of death, or predictive of treatment failure.
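(For anyone who likes to see the mnemonic as formulas, here's a tiny sketch with made-up counts; TP, FP, FN and TN are the four cells of that two-by-two table.)
```python
# SpeciFicity has a P: high specificity means few false Positives.
# SeNsitivity has an N: high sensitivity means few false Negatives.
# Counts below are invented for illustration.
TP, FP, FN, TN = 90, 5, 10, 95  # true/false positives, false/true negatives

sensitivity = TP / (TP + FN)  # of the truly diseased, how many test positive
specificity = TN / (TN + FP)  # of the truly healthy, how many test negative
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
# -> sensitivity = 90%, specificity = 95%
```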
Yeah. So that's the most important number. And how do we get to that number; how do the studies get to the predictive value? Those values are based on the sensitivity and the specificity, and the biggest thing is how prevalent that disease is in your population.
Sensitivity and specificity are fixed numbers, but what influences predictive value is whether you actually have a one in 100 or a one in 1,000 chance of that disease. The classic example was a positive feline leukemia test on a SNAP test.
If I'm running that in a population of seven-year-old cats that are not ill, as part of a chemistry work-up because something's a bit off, and I get a positive, it's very unlikely that that cat really has it. And your sensitivity and specificity tie into that.
So that test is about 95% sensitive and 95% specific, which sounds great, right? Yep, that's a really good test. But does that mean the cat really has leukemia in that population? I would be dubious. Now I change my population.
I work in a shelter and I'm running this on kittens that are anemic, with non-regenerative anemias. The prevalence of FeLV in that disease population is very high. So a positive test in that group I could hang my hat on: that kitten has FeLV.
Because sensitivity and specificity are percentages. And this is exactly why it's important to have this talk. In my head it was always, OK, there's a 5% chance that I'm wrong if it's 95% on both. But how do you figure out that predictive value? Or will the paper tell you that?
Some of them do, some of them don't. There are formulas: true positives over all test positives is the positive predictive value. So it's the same kind of crazy two-by-two table of true positives and true negatives that we all hated seeing, and you memorized it for an exam and got rid of it.
But I feel like there's this misconception that the more important thing is the sensitivity and the specificity, which is what matters when you're validating a test. But we're clinicians, and for clinicians it's really the positive and negative predictive values that alter how we interpret those tests.
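(Here's a short sketch of that prevalence effect, using the FeLV example from the conversation. The 95%/95% figures come from the episode; the two prevalence values are illustrative guesses, not measured numbers.)
```python
# Positive predictive value from sensitivity, specificity and prevalence.
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """True positives as a fraction of all test positives."""
    tp = sens * prevalence              # diseased animals testing positive
    fp = (1 - spec) * (1 - prevalence)  # healthy animals testing positive
    return tp / (tp + fp)

# Well middle-aged cats screened on a whim: assume ~1 in 1,000 have FeLV
print(f"{ppv(0.95, 0.95, 0.001):.0%}")  # ~2% -> be dubious about a positive
# Anaemic shelter kittens: assume ~30% prevalence (illustrative)
print(f"{ppv(0.95, 0.95, 0.30):.0%}")   # ~89% -> you can hang your hat on it
```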
And that little piece of information got lost in our memory bank. So how do we make that practical? Because we don't always have the numbers on hand. I can look at the test and it will tell me, or on the podcast I'll ask the speakers and they'll say, yeah, it's 97% sensitive.
But how do we use it? Is it just keeping in mind what you have in front of you, asking: what is my disease, what is my population? I'll just use another example: in Australia we have the snake venom detection kit.
It gives a very important answer, right? But we know they're not 100% accurate. They're highly specific, but they're not 100 percent. Yeah. So basically it comes down to how likely I think my patient is to have been bitten by a snake; that's going to influence it.
If you ran that test on every dog that came into your clinic, yeah, you're going to have false positives. For sure. If you ran that test only on dogs with flaccid paralysis, a positive really becomes highly predictive. You've changed the prevalence in the population you're testing.
Yeah, gold. Or the other scenario: a dog comes in and he's been seen with a snake, and I run it; then it means something, for sure. Whereas if it's just random, as you said, then... OK. And this is really the principle behind why we say don't just run random tests; run them for a reason, otherwise you don't know how to interpret them.
I've had this conversation before with Professor Jill Maddison, talking about pre-anaesthetic blood screens: we're running them on all animals, so suddenly that predictive value drops. Then you get high liver enzymes unexpectedly and we freak out, when it could just be that one patient that falls outside the range.
And the other important thing to remember is that our reference intervals are set to cover 95% of the normal population. So that means on a 20-analyte chemistry profile, on average one of those tests will be out of the reference interval just because of who that patient is, not because anything is abnormal.
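(The arithmetic behind that: if each analyte's interval covers 95% of healthy animals, and you treat 20 analytes as roughly independent, which is a simplification, a completely healthy patient flags at least one "abnormal" result most of the time.)
```python
# Chance a healthy patient has at least one out-of-range result on a
# 20-analyte panel, assuming independent analytes (a simplification).
p_flagged = 1 - 0.95 ** 20
print(f"{p_flagged:.0%}")  # ~64% of healthy patients will flag something
```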
Yeah. It begs the question about the health screen, the annual bloods. They're good baselines, a marker for when the animal changes. But when you're just running them in clinically healthy patients, and we've all done this: now I'm not sure what to do with this number.
OK. Are there other things that we as non-researchers and non-statisticians do wrong, or misinterpret, or misunderstand, from the talk? Or does that sort of cover it? I mean, the other thing is when authors make a conclusion not supported by their data, and that's the harder one to pick up.
The classic one of those was a paper published in 2021 in Veterinary Surgery, where they looked at the difference in mortality if I wait until the morning to cut a GDV versus cutting it immediately. Somebody told me I should ask you about that paper.
OK, yes. So I read that paper, and their conclusion was it's OK to wait. Which is what I read. But go back and look at it: what does a hypothesis test tell you? The likelihood of finding a difference. So all we can ever do is reject the hypothesis that there is no difference.
You can never prove there is no difference. That is a core tenet of statistics. The P value: the smaller it gets, the more likely you believe there is a difference. Note that what you are doing is rejecting that there isn't a difference.
That's the really weird thing about stats: we're testing the null hypothesis of no difference. I'm not testing "is there a difference?". That's new to me, mind-boggling; or if I knew it before, I completely forgot about it. OK. So this paper was actually really well designed, and I feel like they did a great job of trying to set up the study. But the problem is they set it up as a study to look for a difference.
They didn't set it up as a study to show equivalence, that there's no change in death. So the problem behind it is: they didn't find a statistically significant difference between waiting and mortality, but they powered their study to look for a difference.
What do you mean, powered their study? I don't understand that. That is doing some fancy statistics ahead of time to say how many dogs we need to put in here. They set it up with a 20% chance of not finding a difference by pure fluke.
So they did their study, they didn't find a difference, and they concluded there is no difference. But because of their powering and the number of dogs they put in the study, there's a 20% chance that there is a difference and they just didn't find it. OK. And that's just the numbers game.
That's just a numbers game. I struggle with this; I'll have to sit there and think about it carefully. So what you really want to do there is a non-inferiority study: I want to know that with two treatments I get the same outcome. Yep. But most of what we do in vet med with two treatments is looking to see if one's better.
So this is a very specific situation: we're trying to prove there isn't a difference, not that there is one. And to prove there's not a difference you need way more animals. With their mortality rates, they would have needed about 1,200 dogs to prove to me as a clinician, with 95% certainty, that waiting to cut a GDV doesn't matter.
And how many did they have? About 110. OK, so you said good study design: everything checks out except for the numbers. They just needed more numbers, so their conclusion that waiting doesn't matter isn't one you can make from their study design.
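(To make the "powered for what?" point concrete, here's a rough sample-size sketch using the standard two-proportion formula. The mortality rate, non-inferiority margin, alpha and power below are invented for illustration, not the paper's actual values; they just show how the dog count climbs into the 1,200 range.)
```python
# Rough per-group sample size to show two mortality rates differ by less
# than `margin` (non-inferiority style). All inputs are illustrative.
import math
from scipy.stats import norm

def n_per_group(p: float, margin: float, alpha: float = 0.05,
                power: float = 0.90) -> int:
    """Assumes both groups truly have event rate p; one-sided alpha."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return math.ceil(z**2 * 2 * p * (1 - p) / margin**2)

# e.g. ~10% GDV mortality, willing to miss at most a 5-point difference:
n = n_per_group(p=0.10, margin=0.05)
print(n, "per group =", 2 * n, "dogs total")  # ~617 per group, ~1,234 total
```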
And you can see why we get tricked, because I read that and I was like, wow. It would influence not what I do, but maybe what I don't do, because if I can't cut something for whatever reason, I would have concluded, oh well, there's a study that said it's OK to wait.
So I'm not going to euthanize it based on that. And we don't actually know if we can wait or not; that is a really hard study to do in veterinary medicine. You're going to have to have 1,200 dogs that get equal care in each group, so you're going to need multiple universities doing this.
And then are we all doing the same care? And do all the owners have the same amount of money, so we can do the same amount of post-op care? It's a very complicated study, right? Yeah. So in the end, to account for all those things, you'd probably need 2,000 to 3,000 animals. So with the numbers in that study, could they have framed it differently to make it a useful finding, or did they specifically want to show no difference? The framing matters here.
I mean, I'm going in with the bias that it was a bunch of surgeons doing it, so I'm assuming they were looking for a reason why they don't have to come in in the middle of the night and cut a GDV, and their goal was hopefully to say this is evidence to not do it. If that was their goal, that's not a question they could answer, although there's a lot of other good information in their paper about GDVs.
But that one sole question, can I wait? They couldn't answer that. Well, they could have answered the opposite, that it's not OK to wait. They could have. OK, so if they had found a big difference, if the waiting dogs were 50% more likely to die? Yes, great study; that could have taken it off the table. But if your goal was to prove "I can wait", they didn't have the right numbers and study design for that.
Random question, and I don't know if you have the answer, because I'm listening to you going: there's obviously a lot I don't know and don't understand, and I don't know how to do those numbers you're doing, short of re-studying my statistics for every paper I read. And this could all be unfair as well. Do you work with AI stuff at all? Do you play with it?
A little bit; I've been playing around with it. Because it's a smart thing, right? I'm sure ChatGPT knows the basics of statistics. I keep wondering, could I just put in a paper and ask a specific question: here's a paper; keeping in mind the P values and the study design and so on, can you tell me what I can deduce from it?
Have you ever tried that? I want to try that. I keep thinking about it; I haven't tried it, but I'd need somebody to check whether it gets it right. Yeah, that's a really interesting question, because it would make this accessible to people who don't do it for a living, like me.
I'm like, all right, that paper sounds really cool, I'm going to start changing things in my practice. I can either phone Steven and say, Steve, can you read this paper for me and tell me if it makes sense... I mean, the reality is that's what the peer review process is supposed to do. Yes. And that's my problem, because I've always felt: I know science is not perfect, but I trust in science, right?
And I have to, because I don't have the time, energy or intelligence to be that critical about every paper that I read. So I feel like if I read something published in a reputable journal, somebody smarter than me has checked it, and it's not going to be total bullshit. But is that not necessarily accurate?
And I have had other people say, no, you can't do that. I mean, I think it is a challenge, in that there are very few statisticians in veterinary medicine. So how do people get good at knowing study design?
We have to research it. But look at how many papers there are in veterinary medicine: how many people are really qualified to comment on study design in all of those? We don't have enough qualified reviewers to put everything through a really rigorous statistical review.
So that's how things get out there, and people think, oh, this seems OK, I've seen it in another paper, so that's probably fine. And honestly, part of what I think has happened over the last 15 to 20 years is that statistics programs have gotten easier and easier to use.
When you had to do all your stats by hand, calculate everything, and then look up in a textbook that for this test I got this Z value, which equates to this P value, only statisticians were doing statistics. Now anyone can download a program, say "run this test", and the program runs it.
The program doesn't know whether it's the right test or not. OK, I see what you mean. So that's how you can get stats out there that aren't right. The programs are really good at the maths, but they don't understand context and study design, basically.
It's the old computer jargon: garbage in, garbage out. Gotcha. So I could tell it to run this test, and it will run the test. I could tell it to run the wrong test, or I could tell it to run the right test. OK. So, I try to make these episodes useful for general practitioners.
At the moment I feel like all I'm taking away from this is: shit, I don't know anything. Without trying to be cynical, can we take more from it, to make it better at least? Or is it just "don't believe everything that you read"? I think it is "don't believe everything you read", and, to be honest, in a small veterinary paper...
Small as in case numbers, you mean? Yeah. With small case numbers, look at it and ask whether the authors make really strong conclusions: "our results suggest there may be this finding" versus the people who state "we strongly believe this is how it works".
The more strongly it's stated, the more concerned you need to be, and the more critically you should look at it. Yeah, OK. So the only papers we can trust are the ones with Epstein on them. Well, there are a lot of good people out there doing it. And part of it is, in human medicine it's great:
there are lots of critical care blogs out there saying, here are the benefits of this study, here are the limitations of the study. Having more of that in vet med, where people who know say, here are the highlights of the paper, here are the concerns about the paper...
That's something that I'd love to see, but I'm not sure who has the time to do it. Yeah. Is there anything more that I've missed? I've taken a lot from that; it was really useful, even just reviewing what a P value means. Those are really the big things. I basically had three main points in there, and you've heard them. Somehow that took an hour for the lecture.
I was going to say: an hour, right? Time flies when you're having fun, but not that much. Steven, thank you so much for sitting down. That was a cool talk. And I see this as the thing with our profession as well, and I can say this because I am a GP vet: there's so much to know and so much to do and so much to think about that we often just want somebody to give us the answer.
Yep. At the risk of offending people: we don't want to think like scientists, even though it is veterinary science. And I'm 100% guilty of that; that's why I can say this. I think we all are. I mean, I don't critically review every paper. Sometimes I read the abstract and go, oh, OK, that's not going to change what I'm doing.
So I'm not spending the time reading all the methods and thinking about them. It's when it's really going to change what I do as a veterinarian that I'm like: I need to make sure I'm making the right call. Maybe that's a good takeaway: fine, read stuff, take it on board. But if you're literally going to change something really important in how you treat your animals, then you should probably take the time to look at whether the paper actually shows that, or talk to someone who knows.
I wish I could go into details. Are there examples out there, big, well-known papers, that you feel are not necessarily true? No, I'm not going there. I was going to say... OK. All right.
Thanks, Steven. Always a pleasure. Before you disappear, I wanted to tell you about my weekly newsletter. I speak to so many interesting people and learn so many new things while making the clinical podcast, so I thought I'd share a little summary each week of the stuff that stood out for me.
We call it the Vet Vault 321, and it consists of three clinical pills: three things that I've taken away from making the clinical podcast episodes, my light bulb moments. Two other things: these could be quotes, links, movies, books, a podcast highlight, maybe even from our own podcast; anything that I've come across outside of clinical vetting that I think you might find interesting. And then one thing to think about, which is usually something that I'm pondering this week and that I'd like you to ponder with me.
If you'd like to get these in your inbox each week, then follow the newsletter link in the show description wherever you're listening. It's free, and I'd like to think it's useful. OK, we'll see you next time.