Nov. 13, 2025

149: AI Radiology for Vets: How Accurate Are Today's Tools Really? With Dr Steve Joslyn


We scrutinise one of the most practical yet under‑examined advances in veterinary practice: AI‑based radiology interpretation tools. 

 

I sit down with veterinary radiologist and Vedi entrepreneur Dr Steve Joslyn to unpack the rise of AI-powered radiology tools in general practice. But this isn't just opinion: Steve reveals the findings from his team's recent study that put six commercially available AI radiology tools in the spotlight (or up on the light box) to assess whether they deliver on what they promise.

From how these systems are trained, to where they shine (and where they fail), this conversation gives a no-nonsense look at what AI can actually do for your diagnostic imaging workflow. 

 

What You’ll Learn:

  • How these tools are built: Neural networks, down-sampling, and the truth behind “ground truth”.
  • The data dilemma: Why most AI tools perform best in theory, not in general practice.
  • Where they fall short: From image quality issues to breed bias and external validation gaps.
  • New accuracy data: Insights from Dr Joslyn’s pilot study comparing six commercial AI tools.
  • A decision-making playbook: When to trust AI, when to double-check, and when to avoid it entirely.
  • Ethics and workflow impact: Who’s responsible? What do you tell clients? Can AI triage be trusted?
  • How to stay future-ready: What’s coming next – and how to adapt without compromising care.

🎧 Listen now for the tools to ask better questions about AI in your clinic.

 

Find out how we can help you build your vet career at thevetvault.com.

Check out our Advanced Surgery Podcast at cutabove.supercast.com

Get case support from our team of specialists in our Specialist Support Space.

Subscribe to our weekly newsletter here for Hubert's favourite clinical and non-clinical learnings from the week.

Join us in person for our epic adventure CE events at Vets On Tour. (Next up: Japan snow conference!)

 

Concerns About Veterinary AI Radiology Software

  • Actual Accuracy: External validation studies conducted by Dr. Steve Joslyn's team found that the accuracy of the six leading commercial veterinary AI radiology interpretation tools is currently closer to 50/50 than the 90% to 95% claimed by the companies.
  • Technical Fragility and Lack of Robustness: The models exhibit concerning behavior when confronted with real-world scenarios:
  • Decreased Performance with More Information: Unlike a human clinician, the AI's performance can actually decrease if you submit three views of an abdomen instead of two. This happens because most systems process each image individually, so additional views increase the chance of conflicting information, compounding the system's "miss rate".
  • Lack of Repeatability: When the same radiograph is submitted twice, but rotated slightly (no more than five degrees), the reports can disagree. Cases called normal in the first report were sometimes called abnormal in the second. This lack of repeatability is an alarming side effect.

Ethical Risks and Professional Liability

The lack of transparency and low accuracy pose ethical and legal concerns:
  • The Blind Leading the Blind: The common practice of using AI tools as "an extra set of eyes" to confirm a vet’s feelings is dangerous because if the vet lacks confidence, relying on an inaccurate tool is "closer to the blind leading the blind".
  • Trust and Misleading Claims: The tools may appear helpful because they generate the expected response. However, if vets trust the tools without understanding radiological interpretation, this can lead to increased morbidity and mortality for animals due to false positives (unnecessary surgery) or false negatives (missed needed surgery).
  • Lack of Transparency and Regulation: Unlike in human medicine (where regulators like the FDA enforce strict testing and marketing standards), there is no clear regulator in the veterinary world. Companies are not forthcoming about the training datasets or testing methods they use, forcing vets to rely solely on the company’s marketing claims (e.g., 95% accuracy), which may be misleading.
  • Liability: Most AI companies include terms and conditions stating that the interpretation is still the decision of the vet, meaning that if an animal is adversely affected, the liability falls back onto the veterinary professional.

The Data Gap in Veterinary Medicine

The performance difference between veterinary and human AI radiology is largely a matter of training data quantity and consistency:
  • In human medicine, algorithms can be trained on massive, consistent datasets (e.g., 120,000 frontal chest x-rays of the same species/breed, taken under perfect conditions with confirmed follow-up data).
  • Achieving this same level of parity in veterinary medicine is incredibly challenging due to the introduction of different breeds, rotations, and views. It is estimated that 15 million x-rays would be needed to reach the same performance level, a massive project that would take years.
For these reasons, the takeaway for now is to approach AI radiology tools with caution and to recognize that they are not yet consistently helpful, and may even be doing more harm than good in some clinical situations.

 

A quick announcement before we start this episode, which is relevant to this episode but also to what's to come here at the Vet Vault over the next year or so. If you're noticing a trend here at the Vet Vault, in that I keep doing episodes about new technology, especially AI-based tech for vets: I've noticed that trend too.
And a part of me wants to go, hey, this isn't supposed to be a tech podcast. We're supposed to talk about ways for us in the vet profession to be happier, for lack of a better word, in our vet lives. But the reality is that there is a wave of technology coming our way, like it or not.
And I think, I hope, that many of these tools have the potential to affect how we work as vets and will therefore affect, in one way or another, our day-to-day experience of vet life. So I'm personally very curious, and I've decided that I'm going to keep following that curiosity to see if I can help all of us stay up to date, or at least as up to date as possible, by continuing to do episodes where we explore veterinary software and tech tools.
Not all the time, but judging by the way my interview bookings are looking for the next couple of months, I reckon every second or third episode might be about a software tool, probably AI, that has the potential to make your life easier. So let me know what you think. And don't worry, I'm very much still into the human side of vet science.
Our next episode, for example, will be about a very human experience: imposter syndrome. It's a goodie. And if there's one thing that ChatGPT does not seem to suffer from, it is imposter syndrome. OK, back to this episode. So I've just said that AI has the potential to change how we work, and one way it's already changing workflows for many vets is with AI radiology interpretation software.
It feels like a no-brainer, right? Give the thing that is the master of pattern recognition a picture where effective pattern recognition is the superpower that lets you interpret it. Well, I'm certainly very aware of my own limitations in reading rads, and if there's an effective and affordable tool that can make me better at this, well, shut up and take my money, as they say.
But only if it is better than me. And despite the fact that, at the time of releasing this episode, there are six commercially available veterinary radiology AI tools out there on the market, has anyone actually asked the question: how good are they really?
Well, yes, someone has. Doctor Steve Joslyn is a veterinary radiologist and a total AI geek, and he's part of a team from Murdoch University in WA that has been testing these tools as part of a soon-to-be-published series of studies. In this conversation we get a sneak peek into the findings, and spoiler alert: in 2025, in vet med...
Well, actually, I won't spoil it. Have a listen and decide for yourself. Steve gives us a basic overview of how these tools work, how they are trained, and why that matters, especially in veterinary science. And of course we dive into the report card of the six leading veterinary AI radiology interpretation tools.
Please enjoy Doctor Steve Joslyn. Doctor Steve Joslyn, back on the Vet Vault couch. Hey, long time! So, AI. I did my own episode.
Everybody who knows me knows I'm bullish on AI, in veterinary science and in life in general, bullish but wary. I'm excited about it. Somebody asked me the other day why I'm so into it, and I was like, well, I think it's inherently because I'm lazy, and oh my God, if it can do work for me, then I kind of want to look for it, I want to find it.
I'm lazy too, but it's because I'm lazy with the mundane and I want to skip to the interesting part, where I'm actually using my brain. So if I can get rid of some admin, some of those mundane tasks, I'm all for it, for sure.
Lazy is one part, and then also, if I can find a tool that can make me better at what I'm doing, again, I'm all for it. Because when I look at GP clinical practice, I find the more I learn on my clinical podcast, the more I realize how vast the amount of shit I don't know really is.
And I go, well, if I can get tools that can augment that a little bit without relying on my skull jelly, then maybe that's not a bad idea. So I've been having quite a few conversations about AI in veterinary science and clinical things, but you're the man to talk to about AI imaging, radiology interpretation, which is one of the tools that's out there.
I know it's a big thing in human med to send CT and MRI images and say, well, get the AI to interpret it for me, tell me what's wrong, what am I missing? And it is spilling over into the vet space. There are some products out there, and I know there are some people who love it, who say, we don't want to pay Steve to read our rads anymore, because Steve's expensive.
So we'll get an AI to do it. For sure. I mean, just to go back a step: I'm also very excited about all aspects of AI, not just the image or lesion detection or image analysis, but even more so the large language models and the generative AI too.
So I feel like I can speak to how those are helping in veterinary radiology as well. I'm also a realist, and where I would probably start is to say that AI technology, whether it's convolutional neural networks or large language models, will at some stage in our lives start interpreting radiographs with very high diagnostic utility.
That's going to happen. There's a time scale in between that is uncertain, and what we're being promised, that we've already reached that mecca, that heightened ability, is probably false and misleading. We will get there.
It's just: how do we get there quickly and responsibly and ethically, promoting animal welfare in the meantime? So I know it's going to get there. I'm not saying radiologists are worried about their jobs, not at all. It's just about how to get there in a safe, productive manner without misleading vets and owners and adversely affecting animal welfare.
Yeah, that's one thing I'm learning from using large language models, mostly: they're so politely good at so many things that what scares me is that I'm just going to trust them all the time, and then every now and again they'll blunder in a big way. And I know this from my own work, or from looking at clinical stuff, because I'm still slightly skeptical.
So I do still second-guess it. And every now and again there's a significantly large mistake where I'm like, oh shit, if I was just trusting it, I would be in trouble. So is it a similar thing with the AI radiology or image interpretation tools, given that the people using them are saying it's amazing?
Yeah, it's identical. It's concerning in that if it looks good and acts the way we want, the way we expect, we don't critically assess whether it's actually doing its job. And the most common remark I get is, well, it's just an extra set of eyes confirming what I might be feeling, or letting me challenge myself.
And that sounds like a helpful tool, but it's actually closer to the blind leading the blind, and that's concerning to us. The easiest way to assess this is just to ask, with these radiographic AI tools: are the radiologists using them to make their lives easier? To triage, elevate cases, help them: oh, this one supposedly has a splenic mass, let me have a look. The radiologists aren't using them at all. They look at them and they don't trust them; they can see the flaws. So therefore it's OK for general practitioners to use them?
That logic doesn't make sense in my head. But yeah, it's the same thing as with the large language models: it looks like the response you'd expect, so we think it's being helpful. But people who know that space intimately can pick it apart and say that this is actually putting our lives and animals' lives at risk, and that it probably shouldn't be trusted at face value the way it currently is, at its current level of popularity. Is it worth discussing how it works at all? I have some vague concept of how, because this is different to a large language model.
Yeah, it's not generative, so it's very different. Again, I can get a little bit technical, but I'll color it with this: I'm a vet radiologist and a tech enthusiast, and I love physics and stats, but that doesn't make me an AI researcher in terms of building and training these things.
So that's beyond me. But let's break it down to how these systems start, in terms of training, and what is happening when you submit a radiograph. So we have, call it, a canine abdominal right lateral radiograph that becomes a DICOM file.
Do you remember how big these files are? Easily between 2 and 5 megabytes, depending on how big the plate is and how big the animal is. Now, call it 2 megabytes: that's roughly 2 million pixels, right?
So you've got a couple of thousand pixels in each direction, and each pixel has an intensity; that pixel density is what you're seeing. Now, there's no graphics card in the world that can practically process 2 million input values and come out with one value at the end.
So they downsample it. If you know what downsampling is, that's taking the image and making it more boxy: a block of 10 pixels by 10 pixels becomes one pixel. That's how they process it. So this pixelated, downsampled image is actually what gets fed in.
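To make that downsampling step concrete, here's a minimal Python sketch (purely illustrative, not any vendor's actual pipeline) that block-averages a radiograph-sized array so each 10 x 10 block of pixels becomes one pixel, roughly as described above:

```python
import numpy as np

def downsample(image: np.ndarray, block: int = 10) -> np.ndarray:
    """Block-average an image so each block x block patch becomes one pixel.

    This mimics the 'make it boxy' step described above: a roughly
    2-million-pixel DICOM frame is too large to feed into a network
    pixel-for-pixel, so it is reduced to a much smaller grid of averages.
    """
    h, w = image.shape
    h, w = h - h % block, w - w % block          # trim so dimensions divide evenly
    trimmed = image[:h, :w]
    # Reshape into (rows, block, cols, block) and average each block.
    return trimmed.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

# Stand-in for a ~1400 x 1400 (about 2 million pixel) radiograph.
radiograph = np.random.randint(0, 4096, size=(1400, 1400)).astype(np.float32)
small = downsample(radiograph, block=10)
print(radiograph.shape, "->", small.shape)       # (1400, 1400) -> (140, 140)
```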
Now there are all these hidden layers, and the way it works is that one little node, I think it's called a node, says, hey, I'm seeing an opacity in this dorsal region of the abdomen, and it lights up.
I'm here to look for opacity in the dorsal space, and I'm going to highlight when I see it. Then that node passes the flow of traffic on to the next one, and the next three nodes in the next hidden layer might go: is it big? Is it small? Is it normal size?
And that's how all these nodes work. When the system is assessing for one label, like renomegaly, it basically says, cool, I'm looking at this image and asking: is this kidney big, is it normal in size, or is it small?
And all these hidden layers take that data and go, based on the fact that I've seen something here, spanning this region of the lumbar spine, I'm going to output that renomegaly is present. And that's roughly how they work.
There's a lot of magic that happens in the middle, complex maths, probabilities, weightings, and the output is: hey, this appears to be renomegaly. But here's the thing: you can't just build an algorithm that takes an image and gives you an output, because how does it know what to output, right?
How does it know to tell you that this is renomegaly rather than a normal or small kidney? So what they do is say, cool, we have this convolutional neural network infrastructure; now we need to train it. So what do you think you would use to train it?
Like, if someone said, hey, we need you to train this algorithm, how do you think that process would work in your mind? OK, so if we're sticking to the example of a lateral image of a dog's abdomen: you want to show it a lot of normals, right?
To say, here are 1000 normal dogs, remember that. And then here are some examples of an abnormal kidney. OK, so one step back: who decided those first images were normal, and how was that decision made? Well, that's going to be your job.
Potentially, but a panel of experts: you'd want radiologists to agree that this is normal, to say, OK, we've checked these thousand images of dog abdomens and they're normal. Go. Yep. OK, so how do we know that they're correct?
It might be that they all reach a consensus, or that they've all seen a lot of cases, but everybody needs a gold standard, a ground truth. This is called ground truth. So we're saying, hey, this is normal, but our ground truth is a bunch of radiologists, and it turns out they don't see a lot of renomegaly where they are.
So that ground truth is skewed by that bias. So we actually have to take a step further back. We have to find confirmed normal abdominal rads where the kidneys are genuinely normal. How do we do that? Is it confirmed on ultrasound as well?
Is it confirmed on CT? Is it confirmed on gross anatomy post mortem? You know: this is a healthy two-year-old Labrador with normal renal values, and we just happened to take an X-ray of the abdomen because we were training our staff on the new X-ray machine that went in. Perfect.
That seems like a good gold standard. But the problem is, and I'll skip ahead here, all of our radiology AI systems are trained off radiologist reports. So the AI algorithm can never become better than the radiologist, because the gold standard it's aiming towards is the radiologist.
That's what tells it whether it's right or wrong. So we need to take a step further back and ask: what did the dog actually have? And let's find those cases. So say we have a bunch of Labradors, ten litter mates or siblings, and we're going to do ultrasound on them all.
We're going to measure the kidneys and make sure there's no abnormal renal function. Then we can confidently say those kidneys are normal for Labradors. We put that into the output layer and say, hey, regardless of the magic happening at each hidden layer, we want you to say that this is a normal kidney at the other end.
And this is called training: you're saying, I've put that image in, and I'm telling you it's normal. All of those probabilities and weightings get finely tuned so that the output becomes "normal". That's how we train. And you give it more and more images, as you said: lots of images of normal kidneys on left lateral abdominal radiographs.
And over time the paths to each node in the neural network become stronger or weaker, so that when it sees a normal kidney, it's likely to end up at a "normal kidney" output. Then you need lots of abnormals. Again, who's deciding what's abnormal?
Are we happy with a radiologist measuring something and saying these are abnormal? Do we need something better: ultrasound, or post mortem? Same thing. Now we're saying, hey algorithm, I'm going to train you. I'm putting this image in at the start, and this is an abnormally enlarged kidney.
And so the whole algorithm, the same one that can recognize normal, gets finely tuned so that when it sees a too-big kidney, it follows a path that leads to that output. So the whole key here is that we need lots of normals and lots of abnormals to train it.
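As a rough illustration of what that training step looks like in code, here's a minimal sketch using PyTorch with made-up layer sizes and fake data. It is not how any of the commercial systems are built; it just shows the principle of nudging the weights until the output matches the label we assert:

```python
import torch
import torch.nn as nn

# Tiny stand-in classifier: downsampled radiograph in, one logit out
# ("is renomegaly present?"). Real systems are far larger, but the principle,
# adjusting weights until outputs match the asserted labels, is the same.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 35 * 35, 1),                  # assumes 140 x 140 downsampled inputs
)

loss_fn = nn.BCEWithLogitsLoss()                 # binary label: 0 = normal, 1 = renomegaly
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch standing in for labelled training images and their asserted labels.
images = torch.randn(4, 1, 140, 140)
labels = torch.tensor([[0.], [1.], [0.], [1.]])

for step in range(10):                           # one tiny training loop
    logits = model(images)
    loss = loss_fn(logits, labels)               # how far outputs are from the labels
    optimiser.zero_grad()
    loss.backward()                              # compute weight adjustments
    optimiser.step()                             # nudge the weights ("fine tuning")
```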
Then when you test it, that's called validation, and you can do internal validation or external validation. Internal is what the company itself will do, saying: hey, we grabbed our own X-rays from over here, we tested it, and it performed at, whatever, 95% accuracy or an F1 score of 70%.
That's validation: ideally you've got to prove that it works without making things up and without missing things. So that process would be: here are 1000 normal X-rays, give them to the AI, and the AI says they're normal. And we have radiologists, or I know for a fact that they're normal by whichever technique, and vice versa.
Does it agree with a panel of specialists in diagnosing the following pathologies? Yep, that's exactly right. But it depends on where they get those cases: if those cases were used in training, the algorithm is going to perform really well, because that's exactly what it was trained on. Yeah, I know that one: it's like sitting an exam when you've been given the paper ahead of time.
And I'll give you an example of how this can go wrong. There was one algorithm, I think it was trained on human radiographs, where all of the normal radiographs came from one particular dataset. Maybe it was screening for tuberculosis or something like that, so most of those people were going to be normal. But to train it on abnormals, they got a dataset from a hospital with pneumonias and fibrosis and all sorts of things, mets. And each one of those images had a tiny little white square in the top right corner that was used to blank out the patient identifier.
So the algorithm just learned: hey, if I see a white square in the top corner, I'm going to say this is abnormal. And it was so good at doing that. It's like, wow, this algorithm is amazing. But that's because they tested it on their own internal data. Then they bring in an X-ray from, say, a hospital in Europe with an obvious pneumonia, they test it, and it doesn't flag it as abnormal.
And that's because it was trained on that other feature that it happened to pick up, which is really interesting. So the training is hugely important, particularly the quality of the ground truth data, what you're establishing as normal and abnormal, and then the internal validation.
So they can say, yeah, we tried it on our other cases here. But then the key is external validation: how does it work on external datasets from a different hospital? Maybe it's even the same region, but the differences in which datasets you use to train, internally validate and externally validate can make a massive difference to how it actually performs.
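For anyone wanting to see what those validation numbers actually mean, here's the decision-matrix arithmetic as a small Python sketch with invented counts. The same code gives very different answers on internal versus external cases, which is exactly the gap being described:

```python
# Decision-matrix metrics from a validation run (counts are invented for illustration).
tp, fp, tn, fn = 70, 15, 80, 30   # e.g. "obstruction present" calls vs the ground truth

sensitivity = tp / (tp + fn)                  # of the true obstructions, how many it found
specificity = tn / (tn + fp)                  # of the true normals, how many it left alone
accuracy    = (tp + tn) / (tp + fp + tn + fn)
precision   = tp / (tp + fp)
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, "
      f"accuracy {accuracy:.2f}, F1 {f1:.2f}")
```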
So what we're finding is that these algorithms that are available, the six of them, I think, that I can see are commercially available right now, have been trained off gold standard abdominal radiographs from a university. These are perfectly aligned, perfectly collimated.
The radiographic technique is top notch. But then you externally validate with a GP-sourced X-ray, and this is not a dig at GPs at all, they have a million other things they're dealing with, but the radiographic technique is not perfect. It's slightly rotated, it's collimated too tightly.
It's centered too cranially, and as a result those are variables the algorithm hasn't seen. The algorithm is not going to say, wait, that's not great radiographic quality. It's just going to do its task: is this kidney big or small? And if it hasn't seen that weird variation or that weird breed, it's going to make a call, and it's going to be confident about it; it has the green line that says the finding is there or it's not.
I don't know if it matters or not, but I just want to go back to your ground-zero truth. This might be a philosophical question, but is it not fair to say that back in the day, when we started taking X-rays and it became a thing with the first veterinary radiology textbooks, they would have set a ground truth: well, we know this is a normal dog?
We've done a bunch of them, and subsequently humans like yourself have specialized and trained and looked and looked and looked. Is it not sufficient to say yes, specialists, based on their training, which was based on the ground truth from 50 or 80 or 100 years ago, get to call it ground truth?
It might be the best available ground truth that we have. OK, so this is really common in any research publication. Whenever people are doing research on new diagnostic tests, and all of this is just diagnostic tests, there is always a requirement to state your ground truth.
Everything has a sensitivity and specificity. Everything leads to true positives, true negatives, false positives, false negatives. That matrix, the decision matrix, is inherent in any diagnostic research. So everybody understands what a ground truth is.
And most of the time the reviewers of journals will ask: how did you know that? Like, whatever the recent example was, how did they know that was actually arthritis? How did they know there wasn't any other disease present?
The whole act of calling something normal usually comes with the question: what did you use to establish that normal, or establish that disease? Because that is fundamentally important. And that's what's happening here. Most of these algorithms are just trained off a radiologist's report.
And look, those radiologists might be working very well, but even they have been trained on a bunch of normals and a bunch of abnormals, and they've gone to case rounds and post mortem rounds to figure out what the animal actually had.
The human brain slowly works out how to recognize normals and abnormals. And when we're teaching students, the biggest skill you can build is to look at as many normals as you can, so you familiarize yourself with what it should look like. So the normal-versus-abnormal thing comes up a lot, but I think it permeates everything we do.
So just to clarify what we're seeing: you are part of a group looking into the performance of these AI image-reading models, right? Who's the "we"? The royal we? We've got, basically, a group of us at Murdoch University.
And what we've done is source abdominal radiographs, and we actually have a whole bunch of other sourced radiographs, thoracic series as well, that we'll publish on soon. But we've been able to find a whole bunch of cases that have come from GP.
So these are GP-quality abdominal radiographs. Some of them show perfect radiographic technique, some of them don't, but they're typical of what we see in general practice and of the typical environment radiographs get taken in. So this is not a criticism of that environment.
It's just realistic about how the quality looks when it comes in at, you know, one in the morning. I'm not offended at all, Steve. I've taken many subpar X-rays in my career, I'm not embarrassed to say it. So essentially, when we found those cases, we also looked for whether, within a short period of time, the animal had another diagnostic test, a CT scan, abdominal ultrasound, surgery, or a clinical response to treatment, that helped us establish what it actually had as its outcome.
So we can look back and say, cool, we have this abdominal radiograph with this ground truth: it had a CT and it had surgery four hours later, and we're pretty certain the features on the radiograph would be represented in the ground truth diagnostics. The radiographs had obviously already been sent to the AI companies, but we sent them again, and we also measured how repeatable the results coming back were.
And, you know, the AI report comes back and says this finding is present and a small intestinal obstruction is present, and everything else is normal. Well, we know what the animal actually had, so we can say those are both false positives, and it missed the thing the animal actually has.
That's a false negative. And maybe it did pick up the gastric dilation, so that's a true positive. We did that for 100-plus cases, and we sent those cases to each of the six available commercial AI algorithms.
And we basically processed all the stats on how they perform against a proper ground truth. It's not comparing them to a radiologist; we would still need to do that as well, just to make sure we're comparing the radiologist to the AI.
But what we are making sure of is that we tell vets and the general public what types of cases these AI tools struggle with, when they'll miss lesions and when they'll find lesions. And all of it relates back to what data they were trained on, how they were trained, and how they were validated.
It's stuff that needs to be said, because the claims these companies make are not what they appear once we actually open the hood and look at how well they perform on known cases. And that's alarming, because I know personally, and a lot of my friends have seen, cases that went to surgery that didn't need it, and cases that needed surgery where the AI said everything was fine.
And as a result, if people who don't themselves have an understanding of radiological interpretation are trusting it, these animals are obviously facing increased morbidity and mortality. So that's the alarming thing. And there are other topics we can talk about, like how transparent the AI companies are about how they test and train, which they're not forthcoming about.
So what we're left with is this: we don't know how their algorithms work or where they work well, but we're supposed to just believe their claims. And that's the concerning part. So the wizard is not letting us see behind the curtain, basically.
What happens in human healthcare is that they have to tell everybody how they've trained their models and what technology they're using. Obviously there's intellectual property in some of these aspects which they don't have to disclose, but they do have to tell us what dataset they trained on.
Like, if they trained off a whole bunch of normal Labradors, well, if you show it a Bulldog it's not going to perform very well, because they look different. Well, they're not normal. Bulldogs are not normal! It should just call them abnormal. It should, but that's the thing. So if they tell you, hey, we only trained this on Labradors, German Shepherds, Cavaliers and Boxers,
so we feel strongly that our algorithm will work well in those situations, and in our internal validation it found, say, 7 out of 10 small intestinal obstructions and missed 3: then the vet knows that, and they go, oh, this thing is saying it's a small intestinal obstruction, but this is a Collie, and it's a small intestinal obstruction that I'm worried about.
It's saying it's definitely there; I just have to be wary that it might be over-calling or under-calling it. If we know that information, you can make a better-informed decision when you're working this case up at 2:00 in the morning. Did you know that the reason those allergic dogs keep coming back to you with ear infections is that the Apoquel or the Cytopoint or whatever skin med you use for maintenance is very unlikely to have enough of an effect in the external ear canals to prevent those frustrating ear flares?
Did you know that there's a published procedure for dealing with dangerously deep eye ulcers that involves using tissue glue in the eye? Did you know that you can use a blood glucose monitor to very reliably diagnose aortic thromboembolism in a cat? Or that a large percentage of disc dogs that present without deep pain can recover ambulation without surgery?
A much larger percentage than we used to think. Or that having owners present with their French Bulldog during anesthetic recovery dramatically reduces the risk of complications? If you knew all of those things, then you probably already subscribe to the Vet Vault Clinical Podcast.
And if you didn't, well, then maybe you should. Those are just a short, off-the-top-of-my-head list of things that I've personally learned from making the clinical episodes in the last two months. Imagine how much more you could learn, how shit-hot you could be at your job, if you listened to all 618 episodes in our vault and kept up with our two new episodes per week.
Come and join us at thevetvault.com. So in your study, and it's obviously six different software tools, is there an indication of how accurate or how unreliable they are? Like, 8 out of 10 times it's going to nail the diagnosis, but there's a 2-out-of-10 chance it's going to miss it?
Do you have enough to make any sort of call? We do, but again, this is our pilot study and we only had 53 cases across the board. I don't have the numbers in front of me, but we have a bunch of small intestinal obstructions, a bunch of hepatomegalies, poor serosal detail like effusions or peritonitis, renomegalies.
We have a bunch of each of the labels that these tools predict, but it's really hard to get ground truth data for cases. I don't know if you've spent time in an imaging department, but if you went in and said, hey, I need a list of all of your Labrador abdominal radiographs that have hepatomegaly, it's actually not an easy thing to just pull up.
You might have to trawl through your practice notes, find all the radiology reports and then use a language processor to extract it. And there's often critical information about that animal that sits at the GP clinic and not at the referral center, so there's this mismatch. So it's actually really hard to connect what the animal actually had to what the images showed.
And that's what took us a long time. We ended up using my other hat, Vedi. With Vedi we follow the animal. This is kind of a shameless plug, but it's also how we were able to do this: we follow the animal and its clinical notes for life, and it's all locked to the microchip.
So we were able to trawl through, find the animals that had abdominal X-rays, and then look to see whether they were seen at a referral or emergency center that did a CT or surgery. That's how we collated a lot of this information. But other than that, it's really hard.
So our data is limited, and it's a small subset of external validation. But what we're hoping is that people will send us more cases where there is an outcome proven by another modality.
And we'll keep building more external validation to keep these algorithms in check against what they're claiming versus what is actually happening. But from this small study, and I know you can't make any definitive recommendations, is it like a 50/50 hit rate, or much better, or what are we expecting?
It depends on which metric we're looking at, but it's closer to 50/50 than to the 90 or 95% that they're claiming. Yeah. And look, I know that imaging, and you know it very well as the imaging specialist, is an imperfect art, so my expectations are not that it's going to get everything right every single time.
You said you haven't compared those answers to a specialist radiologist or to your average GP clinician. I just need it to be better than me, because, as you said, dogs get cut that don't need to get cut in GP land all the time, because some vet looks at the films and goes, oh, I think this is something, when it isn't.
We accept that. As long as it's no worse than me, which is not hard: it might not be as good as Steve, but if it's better than me and I can send it an image at 2:00 in the morning for very little money, is it still a helpful tool?
Yeah, I get it. The radiologists aren't available, and if they are available, it's a pain to send them a study. With the way these teleradiology providers are set up, you've got to log in, you've got to enter all these patient details, sometimes you've got to upload from a thumb drive, you've got to send the case; it's actually still a hassle.
And it's costly. That's the reality; we try to make vet care affordable. Yep, and it's costly because of all the systems they have to put in place. They have to employ case managers to help find the missing images and the history and upload them. And because that's a problem vets are faced with, they're not going to send every case.
They're going to send the problematic case that they really need help with. Whereas the AI companies, and I can tell what they've done, have said: let's just make it easy for the vets to submit a case. Just send it. You don't have to send history or anything; we'll just analyse the images and send back what we think it is.
So what they've done well is the user interface, that workflow where it reads the study automatically and sends a report back. That is absolutely impressive customer experience and user interface. The problem is that what it's sending back is potentially misleading and a detriment.
But again, it's 2 in the morning: do you need this help? It looks right. And I know a lot of people have said it's great for their young vet that's just graduated, working out in the middle of nowhere with no support and nobody answering phone calls, so this just helps them.
That's the scary part, because it's probably not helping them. It's the blind leading the blind, and based on the metrics we can see, it's probably doing more harm than good. But if it's the only thing available... yeah, that's another ethical discussion.
Well, that's what I'm trying to get to, because it's tempting to use it for that, and it's not just new grads, it's me. It's really interesting you ask what stops us from referring more rads, because I have some insight there. I have some data from a human-read teleradiology service that ran a one-month free trial, and people loved it.
They sent heaps. I can't share numbers, but let's say 20, 30, 40 cases were sent in a month. Then payment switched on, and it dropped to one or two a month. So we want the help, because we know it's hard. Totally, all of us have had that experience. But then we go, well, now I've got to talk to the client on an already expensive case and say, yeah, I'm not 100% sure about this.
Can I spend another 300 bucks? And again, no offense to you guys, it's fully worth it, it's what you've trained for, it takes a lot of time, and I appreciate why it has to cost what it costs. But the clients just go, no, I'll trust your judgement. So again, if it's going to help me at all, then I'd want the help.
But what I'm trying to get at is this: at this point, are you saying it's probably not going to help you make decisions, even if you're a new grad clinician? Yeah. At this point, I don't think it's helping. I think it's doing more harm than good.
That will change, and it'll change if they get better data and better testing ability, but right now it's not helping. Where they offer a sort of an out is that a lot of these services have an option for a human radiologist to interpret those radiographs.
And, you know what, maybe they give the AI service away cheap or free and it funnels more cases to the radiologist, and whatever, that's fine. I'd say that maybe the AI should just be a bonus of interest.
But if the ease of sending the X-ray is what made the difference, we will see more being sent. And if radiologists themselves get more efficient with their reporting by using language models, the cost should come down. That's probably the way it should go.
The companies that are offering AI only, just an AI output with no human behind it, I think are going to struggle as more and more transparency comes out about how these algorithms are or aren't working. Like, one company in the paper calls every single radiograph abnormal. It will find an abnormality in everything, and then the training material says, well, you can slide the sensitivity bar over here, and you're like, what?
So now I'm in control of how sensitive it is? Really? It's crazy. It's a bit like the large language models: if I give one a piece of writing and say, can you improve it, it will always want to improve it. And people specifically say it's very hard to get it not to.
Even if I gave it Shakespeare, it would come up with some way to make it better. It wants to please: you asked it to check, so it checks. So, you know, some of the findings of the study are really crazy. You know how we do three views of an abdomen? And again, most people do two, but we do, well, should be doing, three. In fact we cover that in one of your other podcasts, which is a nice little plug for it. What do you say? Click the link in the top right of the window to find that other podcast.
Anyway, when you give the AI two views, it has a certain performance level. When you give it the third view, its performance actually goes down. So you're giving it more information and it's getting worse. Can you guess why that's happening?
Is it information overload? Because things look slightly different on the left versus the right view, we know gas moves and organs shift, so is it getting conflicting information and it can't agree with itself? The nodes go, oh, that's big there, it's not big there, and it jams the signal?
You're along the right track. Most of these companies, if not all of them, process each image individually, irrespective of the others. And if you give it a third image or a fourth, there's an increased chance that it's going to over-call or under-call something.
So weirdly, we give it more information to help it, and it actually performs worse. That's kind of alarming in itself. So the miss rate compounds the more images you give it. That's a 30% mistake rate times another 30% mistake rate.
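A quick back-of-the-envelope illustration of that compounding, using an assumed per-view error rate rather than any measured figure: if each view is scored independently with a 30% chance of a wrong call, the chance of at least one wrong call grows quickly with the number of views:

```python
# If each view is scored independently with per-view error rate p,
# the chance of at least one wrong call across n views is 1 - (1 - p)^n.
p = 0.30                                    # assumed per-view error rate, for illustration
for n in (1, 2, 3):
    print(n, "view(s):", round(1 - (1 - p) ** n, 3))
# 1 view(s): 0.3   2 view(s): 0.51   3 view(s): 0.657
```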
Exactly. Wow. And then the other really alarming finding: we also sent the same study in twice, and what we did was rotate it by one or two, no more than five, degrees.
So it looks identical to you and me, just slightly rotated. The reason we did that is so that when those pixels were downsampled, we knew they would all be slightly different. It wasn't going to process the same image twice; it was going to process a slightly different image.
And that really forces it to show how consistently it's performing. Now, the reports came back with some similarities, but they also disagreed with each other. We had a lot of cases called normal in the first report and abnormal in the second report, or labels that were there the first time and gone the second. Part of another paper coming later this year looks at how repeatable that is.
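For anyone curious to try a similar repeatability check on a tool they use, here's the gist as a small sketch. The file naming, the labels and the agreement measure are all hypothetical; the study's actual pipeline will be in the paper:

```python
from PIL import Image

def make_rotated_copy(path: str, degrees: float = 3.0) -> str:
    """Save a copy rotated by a few degrees: it looks identical to a human,
    but downsamples to slightly different pixel values."""
    out_path = path.replace(".png", f"_rot{degrees}.png")   # hypothetical file naming
    Image.open(path).rotate(degrees, fillcolor=0).save(out_path)
    return out_path

def label_agreement(report_a: set[str], report_b: set[str]) -> float:
    """Fraction of findings the two reports agree on (overlap of the label sets)."""
    if not report_a and not report_b:
        return 1.0
    return len(report_a & report_b) / len(report_a | report_b)

# Hypothetical example: labels returned for the original and for the rotated copy.
first  = {"renomegaly", "small_intestinal_obstruction"}
second = {"renomegaly"}
print(label_agreement(first, second))   # 0.5, i.e. the two reports only half agree
```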
Now, the next logical question is: how repeatable is the human radiologist? You send me an abdominal radiograph: am I going to say the same thing each time? Or you send it to, say, VetCT, a teleradiology company, and you also send it to IDEXX: are they going to agree? They're two humans, but they're not the same human.
So it's hard. The way to compare that would be to make sure you send it to the same radiologist at that service and do it again, and, for a very important reason, you can't let them know they're part of a research study, because then they might change the way they report.
So it's really hard to do this in a fully transparent way while also replicating what happens in real life, so that GP vets can say, oh, that's what would happen to me in that case. So my takeaway from this is: yeah, it's great, but it's not there yet.
Approach with caution, and more information is needed to see how good or bad these things really are at this point. Yeah, I think that's a fair assessment. And again, a lot of people aren't going to be happy with me, but imagine a diagnostic test was brought out, say it was a medicine, some clinical pathology test.
They bring it to market and everybody's like, cool, this works in dogs. And they're like, yep. And you're like, cool, do you have data to back that up? And they're like, no, sorry, we can't show you, it's intellectual property. And you're like, what? Just trust it. Just trust it, it diagnoses pancreatitis, it's OK.
It's 95% accurate. And you're like, accurate against what? So this is what's happening. And because it's software, and because they can stay one step removed, their terms and conditions are very clear: this is still your decision as the vet, we cannot be held responsible for anything, it's your decision to take it or leave it.
But at the same time, they market themselves saying, hey, we're 95, 99, 90% accurate. That's the scary thing. If we held them to the same standards that we hold physical diagnostic tests to, like snap tests, well, there's way more published research about how those PCR tests perform.
But because it's software, they don't have to do it. And there's no governing authority. For humans in America it's the FDA, and they're very strict about how these AI algorithms have to present themselves, how they have to be tested, how they're marketed, and what claims they can make.
But in the vet world, who is the regulator? It always comes down to the board saying it's your responsibility as the vet. But now we're saying, hey GP vet, make sure you understand all your decision matrix stats, your sensitivities, your specificities, your F1 scores, because it all comes down on you. And the vets are like, Jesus, it's 2 o'clock in the morning, I can't figure that shit out right now, I just need some help.
And of course it looks helpful, so of course: it's totally understandable how it's all played out. I think it'll be interesting to see what happens over the next 6 to 12 months, once this research is out, when people and owners start going, wait a second.
I did consent to using that AI, and they did go to surgery, and it turns out it might not have needed it, and my animal died. I'm going to be angry at somebody. I'm going to be angry at the vet, and the vet can be angry at the AI company, and the AI company goes, hey, that's on you.
So I think that's going to happen. But at least if we can get this out quickly, it shifts the conversation to: where are we at, how is this working, and what should we be conscious of? And maybe the vet board will say, OK, you're a young vet working by yourself in the middle of Australia.
You had no help at all, you trusted an AI algorithm, and we can see in your notes that you thought it had this and the AI algorithm also thought it had that. Actually, that's understandable. Did you have any other options for help here? No? OK, we can see that.
As long as that vet knows, and the business knows, and the owner knows that these things are fallible, that they do have flaws, then that's probably where we end up. I think it's a good wake-up call to the human teleradiology companies that they have to concentrate on user experience and on making it easy for vets to seek their services.
Not human medicine, sorry, actual, yeah, meat-and-bone... Meat-and-bone veterinary radiologists, yeah. And that's kind of what I think has taken them by surprise. So they've reacted with position statements and white papers and all that sort of stuff, going, hey, you can't trust these, and a lot of scrutiny has happened, which is fair. But the GP vets and the AI companies are going, we're giving you a service that's super easy to use, it's affordable, it's fun, it's exciting, it's all this new tech.
I think the teleradiology companies need to improve their service delivery, their service offering and their customer workflows, and that's what I'm super excited about. Same with the radiologists. Take reporting: I used to take 40 minutes to an hour to report a CT.
You know, a multi-region CT I can probably report in 10 minutes now, because I'm using a language model to tidy up the verbal spaghetti that comes out of me, to make it concise and to correct the terminology.
I used to have Dragon Dictate, which would take everything I said and just write it out, and you'd still have to go and edit that into something. Yeah, pretty much. So it's a bit like the consult note-taking tools: yeah, exactly, extract the relevant things and put them in a nice format.
And so I've trained one of my custom GPTs to basically take what I said and clean it up. For instance, I'll usually say: I'm presented with a CT of an abdomen, I'm going to inspect it with multiplanar reconstruction, different planes, and Dragon Dictate or Apple's dictation will always write "multiplayer", like gaming, multiplayer reconstruction.
But the GPT recognizes that and just fixes it. So, you mentioned Vedi earlier in pulling this data, and we've talked about Vedi before. Basically you're tying your patient's data to the microchip; the microchip number is the key that unlocks it all.
And it means that you can take the patient record, vaccination records and blood tests and all those things, from the GP clinic to the referral center to the emergency center. And we don't even have to send it; it's there in Vedi, right? Yes, that's what we do. OK, so are you guys playing with AI?
Because to me it just looks like, OK, surely AI has to become a thing in what you do with this. What are you guys doing with it? So, just on that: you don't have to take the record anywhere. It's just that because that microchip showed up and was scanned at the emergency hospital, they now have access to it.
That's what I meant: yeah, exactly, not physically take it, it's always there, always ready for you to unlock. Yep, exactly. So, going back: all AI progression and training depends on good-quality, verifiable data.
What did that animal actually have? And when you then prompt language models for summary generation of a clinical history, or letters for the referring vet, when you say build this for me, if you don't have good-quality patient data on that animal, there's a chance it will hallucinate something into it.
So our motto, or our aim, is to make sure we have everything about that animal. And therefore, when we do a task like, call it, generate a summary of the last 12 months, our vets, when they start a consult, instead of trawling through the clinical notes, will load up a summary.
It just says, hey, you saw this animal for otitis media last year, it was lame in February, but really it's here today for its itchy skin. You see that, and it will tell you where it pulled each item from. Otherwise, you know, you open the consult with, hey, how's the elbow lameness, and they're like, you fixed that a year ago, you saw it for itchy skin two weeks ago, did you forget that?
What it means is that before every consult, and I increasingly do this for exactly that reason, you need to sit down for 10 minutes and try to skim through, really. And if it's a patient with 10 years of history, there's a lot to quickly scroll through to make sure you don't miss those little relevant bits.
Yeah. And what we also do is that when that animal does present to the emergency center, that summary gets updated. So we'll say, hey, we've also seen it at Cottesloe and Claremont recently; here's a summary of this animal's history for you, the emergency vet who has never seen this animal before.
And that's sort of how we're going. We're making sure each event, each diagnosis, is perfectly mapped to that animal and that microchip, across different clinics and different services and different labs. The microchip is the unique identifier, and we've introduced that process, whereas before it was a PDF attached to a different patient's file under the same patient name.
And that risk means that an AI would hallucinate or pull incorrect information into the summary it generates. So let me just understand what this looks like practically. This is independent of the PIMS.
So let's say I'm on an emergency shift, 10:00 on a Saturday night. My referring clinic uses PIMS X, I use PIMS Y, the patient comes in, and we're both with Vedi. So I scan the microchip, and all this history, regardless of the PIMS, pops up, now with an AI summary, on the Vedi side.
So not on my PIMS; it's going to come up in Vedi to say, hey, Fluffy was at the referring vet last week for vomiting, here are its blood results and its medications. And its medications and an initial summary, yeah. So we partner with the GP clinics to give them these efficiency tools, which basically mean that on arrival at a GP clinic, whether they're there for a vaccination or because they're sick, the animal gets a microchip scan and a weight. Just a microchip scan and a weight.
That's the only workflow difference. And that's where we can see what's happening. We're also involved with clinical pathology submissions through our other modules, we're involved with microchip registrations, and we're involved with a few more modules that we're building. So, just tools for the GP vet to use.
They're scanning every animal that arrives; it's just part of the way they do things. So when that animal shows up at an emergency center or a referral hospital, they can scan that microchip and the other vets are told, here's all the information you need. And instead of them trawling through pages and pages, we present it for them and tell them where to find it if they want to go and look.
So an AI summary, with little links? Yeah, usually links or timestamps, so you can at least find your way. And does that include, because I know you guys partner with the labs, lab results? So potentially, when I unlock the patient, it will say, yes, here are the bloods from last week when it was there?
Yep. You can see the actual result, or it'll present itself as: hey, this is a chronic, poorly managed diabetic, this is what it's being medicated with, these are its habits, and this is what the owners were concerned about. And that sort of summary should be enough for an emergency vet to hit the ground running.
And with the emergency workflow we actually do a bit more. They'll scan the microchip and we make it very easy, because we've seen that animal at the GP clinic: we're not going to ask the owner to sit there and enter all the extra information that they entered four hours ago, right? Just make it easy for them. That's really cool.
Such exciting stuff. I love this, because I'm lazy. I'm lazy and I want better; that's what it comes down to. Equally as lazy. I heard a quote a while ago, and I hate it because it's true: veterinary radiologists are either tall, blonde European women or fat, lazy men.
And I was like, yeah, every single one kind of fits. All right, Steve, let's wrap it there. That's really, really cool. Oh, one very quick question, back to the radiology: are they ahead of us in the human field? Because they've got a lot more training data, right?
Again, I think the risk, and the reason I specifically ask this, is that people look at studies or research or results from the human world and go, oh, it's amazing, this stuff gets really high hit rates, so that's why I'm going to use it. But the differentiator is lots of training data and regulation, versus the Wild West with its thin, healthy Labradors.
Yeah. So let me end with this bar-napkin math sort of thing that I do. Essentially, they had 120,000 frontal X-rays of the chest, and every patient, conveniently, is the same species and the same "breed". Cool, done, right?
And then they said, hey, hold still and take a deep breath while I collimate and position perfectly around you. And they're like, yep, done. So these are 120,000 frontal X-rays of the chest taken in a hospital setting under perfect radiographic conditions. And then they have the follow-up data to say this was a COVID pneumonia, this is tuberculosis, this is metastatic disease, whatever.
So they have good-quality data to train an algorithm with: 120,000 gold standard, ground-truthed images. Then we ask, cool, how long would it take us to get 120,000 images of a Labrador lateral thorax with ground truth data?
It would take us years. It would be a massive project. And then you say, cool, for every single view, that's another 120,000 images. So with three views we're already at 360,000 images that we need, right? And that's still just one breed, in perfect positioning.
As soon as you introduce rotation, you need another 120,000 views, and different collimation, and so on. So when we did this very quick math, it came out at something like 15 million X-rays to get the same parity of performance that the human radiology algorithms are achieving.
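Working those bar-napkin numbers through (rough arithmetic only, using the figures quoted above): at roughly 120,000 confirmed images per view, breed and positioning combination, the 15 million estimate implies needing that dataset well over a hundred times over:

```python
# Bar-napkin arithmetic from the conversation (all figures approximate).
per_combination = 120_000          # confirmed images per view/breed/positioning combination
three_views = 3 * per_combination  # 360,000 just for a three-view study of one breed
print(three_views)                 # 360000

target_total = 15_000_000          # the rough parity estimate quoted above
print(target_total / per_combination)   # 125.0, i.e. ~125 view/breed/positioning combinations
```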
And that's going to take years. So we either have to fix it with anatomical solutions, or the best thing GP vets can do is take gold standard radiographs as well as they possibly can. So it basically goes back to what we were taught on day one of first-year radiology.
Steve, that was epic. Thank you so, so much for making time and sharing your wisdom, and I look forward to catching up in another year or so to see where we're at. Before you disappear, I wanted to tell you about my weekly newsletter.
I speak to so many interesting people and learn so many new things while making the clinical podcast that I thought I'd put together a little summary each week of the stuff that stood out for me. We call it the Vet Vault 3-2-1, and it consists of three clinical pills: three things that I've taken away from making the clinical podcast episodes.
My light-bulb moments. Two other things: these could be quotes, links, movies, books, a podcast highlight, maybe even from my own podcast, anything that I've come across outside of clinical vetting that I think you might find interesting. And then one thing to think about, which is usually something that I'm pondering this week and that I'd like you to ponder with me.
If you'd like to get these in your inbox each week, then follow the newsletter link in the show description wherever you're listening. It's free, and I'd like to think it's useful. OK, we'll see you next time.