Everybody likes to talk about how necessary refsets are, but I’m not sure the reason why is always understood. One of the challenges facing implementers is working out how to use what’s available, and what else they might actually need.
So here’s what I think are two fundamental use cases – Exchange and Interface.
And for the purposes of this article when I say ‘refset’ – I just mean a subset of SNOMED CT.
In any messaging scenario – either to an intermediate repository or directly to the recipient – the structured messages (HL7v2, CDA etc.) require certain codes to go in certain fields. Some code systems express these constraints as basic tables. In SNOMED CT the simplest* (standardised) way to express a subset is by using a Simple type refset.
For example we might constrain acceptable values for a Diagnosis field to subtypes of 404684003|Clinical finding (finding)|.
The reason is that if a document author has the choice of all 300K+ concepts in SNOMED CT and wants to record an instance of Staph infection, there’s a risk they might select 65119002|Staphylococcus| when the intention is really 56038003|Staphylococcal infectious disease|. The first concept is actually from the ‘organism’ hierarchy, i.e. not a diagnosis! This difference might not be obvious to front line medical staff whose job doesn’t revolve around understanding terminologies.
When these constraints are clearly articulated, they can be used to validate the documents. So the audience for these refsets is really:
- Developers who want to comply with standard specifications; and
- Testers checking for conformance to standard specifications.
But NOT Clinicians.
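As a rough illustration of the validation described above, here is a minimal sketch in Python. The refset contents, the `validate_diagnosis` function and the message layout (a plain dict with a `diagnosis` key) are all assumptions made for the example; a real implementation would load the refset from a terminology release and read the code out of an HL7v2 or CDA field.

```python
# Hypothetical Simple type refset for a Diagnosis field: a flat list of
# permitted concept ids. These two members are illustrative only; a real
# refset would be loaded from a release and be far larger.
DIAGNOSIS_REFSET = {
    "56038003",  # Staphylococcal infectious disease
    "92428008",  # Benign neoplasm of testis (disorder)
}


def validate_diagnosis(message: dict) -> list[str]:
    """Return a list of conformance errors for the message's diagnosis code."""
    code = message.get("diagnosis")
    if code is None:
        return ["diagnosis: no code supplied"]
    if code not in DIAGNOSIS_REFSET:
        return [f"diagnosis: {code} is not a member of the Diagnosis refset"]
    return []


# The organism concept is reported as an error; the disorder concept is not.
print(validate_diagnosis({"diagnosis": "65119002"}))
print(validate_diagnosis({"diagnosis": "56038003"}))
```

A developer could run a check like this before sending, and a tester could run the same check over received documents.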
Note! When using just a Simple type refset to express these constraints, the use of ANY post-coordinated expressions is immediately prohibited. Ideally, messaging constraints should be expressed as intensional refsets…
So a clinician now has some constraints on what they might pick for a diagnosis. However, there’s still a choice of 90,000+ clinical findings…
Depending on how much effort their software vendor puts into search functionality, having all these in a drop down box is going to lead to a pretty poor user experience. And not every interface uses dropdowns… Maybe radio buttons, check boxes or any other sort of UI control. And depending on the user’s expectations of the software, they probably won’t want noise – concepts they’d never pick.
It’s unlikely a:
- Gynaecologist will need to record a 92428008|Benign neoplasm of testis (disorder)| diagnosis; or
- Radiographer is going to bill for performing a 12845003|Malaria smear (procedure)|; or
- Pharmacist will dispense an Extemporaneous preparation with 227036006|Luncheon meat (substance)| as an active ingredient.
Putting “search solutions” aside, terminology implementation at the interface may require subsets of the exchange subsets.
There’s nothing stopping software vendors creating these subsets for their own software/customers. But the challenge from a national (or even international) perspective is – What might you be able to provide to encourage adoption and where might you start?
A common call is for a ‘Top 100’ refset, for example a “GP 100”. But what does this mean?
- Whose 100? A GP Registrar in Alice Springs or a private GP in Toorak, Melbourne?
- 100 What? Diagnoses, Procedures, Prescribables, Reasons for encounter?
- What about things not in the 100? Free text? Choose “something close”?
I’m not dismissing the “Top 100” idea, but I think its purpose needs to be defined. For me, the Top 100 should represent a ‘starter’ refset:
- Implementers can add/remove from it as they see fit.
- Users may search beyond the 100+ if they need to.
There’s not necessarily any requirement to use a Top100. It might make things easier, but it should be up to the software developer to satisfy the interests of their customers.
The only requirement of a Top100, and any customisation of such, is that it MUST be a subset of an Exchange refset.
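That one requirement is easy to check mechanically. The sketch below assumes both refsets are available as plain collections of concept ids; the function name `outside_exchange` and the sample members are made up for illustration.

```python
def outside_exchange(interface_refset, exchange_refset):
    """Return the interface members that are NOT in the exchange refset.

    An empty result means the interface refset is a valid subset.
    """
    return set(interface_refset) - set(exchange_refset)


# Illustrative members only, not a real Top 100 or Exchange refset.
exchange = {"56038003", "92428008"}
starter = {"56038003"}

# A vendor customises the starter and adds an organism concept by mistake.
customised = starter | {"65119002"}

print(outside_exchange(starter, exchange))
print(outside_exchange(customised, exchange))
```

Running a check like this after each customisation is one way to keep a locally maintained list conformant without any ongoing tie to the original starter.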
The provision of such “Starter refsets” needn’t come with any guarantees. If a developer chooses to customise the refset to their design, there doesn’t need to be any ongoing relationship to the “Starter” after the initial creation. Any changes are at their discretion – so long as their messages continue to conform to the Exchange refset.
What about the recipient?
Does it matter?
AFAIK, a document should be able to be rendered (for humans) by anybody, without even needing access to any terminology servers. Anybody can view a CDA document with their usual web browsers. Other message types might require specialised software, but all the information for a human to process the content – should be available in the document.
To do something smart like decision support or secondary analysis… that IS going to require some work…
One of my main complaints (there are many) with Refsets is specifically the notion of an Exchange Refset. Here are two reasons:
1. SNOMED CT is DL-based such that if I use a given code, then it necessarily carries with it all its parent (ancestor) codes. So, if code X is in the exchange Refset, I should be able to use any descendant of X just as well.
2. Is a variant of 1 and involves post coordination. If |fracture of foot| is in the Exchange Refset, why can’t I record |fracture of foot| : |laterality| = |left|?
Exactly. As I said, you need an intensional refset (which, admittedly, I realised courtesy of you). The enumerated list approach on its own is flawed for exchange.
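To make the difference concrete, here is a small Python sketch contrasting the two ways of checking a code against an exchange refset. Everything in it is a toy: the concept names are readable placeholders rather than real SNOMED CT identifiers, the is-a hierarchy is hand-written, and the `Expression` class is a made-up stand-in for a post-coordinated expression. It also assumes that adding a refinement only ever narrows the meaning of the focus concept.

```python
from dataclasses import dataclass

# Toy is-a edges (child -> parents). Placeholder names, not real concept ids.
PARENTS = {
    "fracture_of_lower_limb": set(),
    "fracture_of_foot": {"fracture_of_lower_limb"},
    "fracture_of_metatarsal": {"fracture_of_foot"},
}

EXCHANGE_REFSET = {"fracture_of_foot"}


@dataclass(frozen=True)
class Expression:
    focus: str               # the focus concept
    refinements: tuple = ()  # (attribute, value) pairs; empty if pre-coordinated


def ancestors_or_self(concept):
    """Walk the is-a edges upwards and collect the concept and its ancestors."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def allowed_enumerated(expr):
    """Enumerated reading: only the listed concepts, no refinements at all."""
    return not expr.refinements and expr.focus in EXCHANGE_REFSET


def allowed_subsumed(expr):
    """Intensional reading: anything subsumed by a member of the refset."""
    return bool(ancestors_or_self(expr.focus) & EXCHANGE_REFSET)


left_foot = Expression("fracture_of_foot", (("laterality", "left"),))
metatarsal = Expression("fracture_of_metatarsal")

for expr in (Expression("fracture_of_foot"), metatarsal, left_foot):
    print(expr.focus, allowed_enumerated(expr), allowed_subsumed(expr))
```

Under the enumerated reading both the descendant and the lateralised expression are rejected, which is exactly the two complaints above; under the subsumption reading both are accepted.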
“Exchange refsets” could be interpreted to mean “a descendant of any concept in the refset”, which would cover post coordinated expressions which are subsumed by a member of the reference set. However most people assume they mean any concept id explicitly listed and no others. This is often not helped by documentation accompanying the reference sets not being explicit on this point. However, you could argue that it isn’t the reference set documentation but the place where it is being used (i.e. information model documentation) that should specify this sort of detail about the meaning of the reference set in the context it is being used.
While we’re whinging…personally I find the term “intensional reference sets” unhelpful – it seems to alienate most readers when the concept itself is fairly simple. This results in explaining the term rather than moving on to the interesting bits! I know you’ve used it because it is what is commonly used in the IHTSDO and is “correct”, however I think that the IHTSDO really should use a more approachable word for the average reader to avoid the associated issues.
With exchange (enumerated or intensional) reference sets I think all we are doing is trying to express the binding between the terminology and information model. As Michael points out, doing that in an enumerated list will always be problematic, depending upon how that “binding” is interpreted.
The interface refsets are just a technique for making implementations easy and usable. As you point out, they aren’t essential but may be useful, and will be very site specific. Even though they will be site specific, the majority of the content will be usable from site to site, so they will be useful as a starting point even if they aren’t perfect for the site.
Other techniques may make them irrelevant – for example good searching functionality paired with a good collection of frequency of use data may be more effective. However you need to start somewhere if you don’t have that seed frequency data – this is as good a place as anywhere, provided you have a longer term plan.
I think the key message is these interface refsets don’t need to have such heavyweight management and centralisation around them. You still need to be careful – done poorly they can frustrate users and skew recorded data – but overall I see where you are coming from and agree.
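As a rough sketch of the search-plus-frequency idea above, the Python below ranks simple substring matches by how often each concept has been picked, and seeds the counts from a starter refset so that the ranking has something to work with on day one. The function names, the seeding weight of 1 and the data layout (a dict of concept id to term, and a `Counter` of picks) are all assumptions for illustration, not a description of any particular product.

```python
from collections import Counter


def seed_usage(starter_refset):
    """Give every starter refset member one notional pick as seed data."""
    return Counter({concept_id: 1 for concept_id in starter_refset})


def ranked_search(query, terms, usage, limit=10):
    """Substring match on terms, most frequently picked concepts first.

    Ties are broken alphabetically by term so the order is stable.
    """
    needle = query.lower()
    hits = [cid for cid, term in terms.items() if needle in term.lower()]
    hits.sort(key=lambda cid: (-usage.get(cid, 0), terms[cid]))
    return hits[:limit]


def record_pick(usage, concept_id):
    """Call whenever a user selects a concept, so later searches adapt."""
    usage[concept_id] += 1
```

Over time the recorded picks outweigh the seed, which is the longer term plan: the starter list only matters until real usage data exists.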