Visualizing relationships between biomedical concepts - an introductory post
Published
January 1, 2026
Introduction
Ontologies are extremely useful in illuminating the interconnectedness in our world. More fundamentally, they help define what things are, which is especially important in science. Since I have been using ontologies more for work, I wanted to see if I could work with them programmatically, since that may be useful in the future. Online resources like Ontobee fulfill most of my needs already.
This post goes further and attempts to programmatically represent and visualize ontologies.
Data
I use the ontology data found at Ontobee, specifically the OWL files. W3C Web Ontology Language (OWL) documents represent the knowledge in an ontology in a form that computer programs can use.
I investigate the Ontology of Biomedical Investigations (OBI) in this post. I use the {rdflib} package to read the file. The Resource Description Framework (RDF) is the data model through which OWL documents are made machine readable.
Total of 112206 triples, stored in hashes
-------------------------------
(preview supressed for performance)
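As a minimal sketch of the read step, assuming the OWL file is RDF/XML on disk: the snippet below writes a tiny stand-in document to a temporary file and parses it with `rdflib::rdf_parse()`. The `toy_owl` content and the `http://example.org/` URI are hypothetical and not taken from OBI; for the real ontology you would pass the path of the OWL file downloaded from Ontobee instead.

```r
library(rdflib)

# Hypothetical two-triple RDF/XML document standing in for the OBI OWL file.
toy_owl <- '<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/protein">
    <rdfs:label>protein</rdfs:label>
  </owl:Class>
</rdf:RDF>'

# Write the document to a temporary file so it can be read like a download.
toy_path <- tempfile(fileext = ".owl")
writeLines(toy_owl, toy_path)

# Parse the file into an rdf object; printing it shows a short summary.
toy <- rdf_parse(toy_path, format = "rdfxml")
toy
```

The `format = "rdfxml"` argument is stated explicitly here because the `.owl` extension alone does not say which RDF serialization the file uses.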
Goal
My goal in this post is to extract and visualize a small portion of the ontology. I will look at all the knowledge related to the term ‘protein’.
Analysis
Define Triples
In RDF, a triple is a three-part statement made up of a subject, a predicate, and an object. Triples are the fundamental units of the data, and each one represents a fact. The query language SPARQL is used to interact with this data.
# A tibble: 1 × 1
n
<dbl>
1 112206
There are 112,206 triples, or facts, in this RDF file. That number is already printed with obi, but counting the subject (s), predicate (p), and object (o) combinations returned by a query gives the same statistic.
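One way to count triples, sketched here on a small hand-built graph rather than on OBI, is to select every subject, predicate, and object and count the rows that come back. The `http://example.org/` URIs and the three toy facts are hypothetical, and this is not necessarily the query that produced the table above.

```r
library(rdflib)

rdfs <- "http://www.w3.org/2000/01/rdf-schema#"

# Build a toy graph with three facts (hypothetical URIs, not from OBI).
toy <- rdf()
rdf_add(toy,
        subject   = "http://example.org/protein",
        predicate = paste0(rdfs, "label"),
        object    = "protein",
        objectType = "literal")
rdf_add(toy,
        subject   = "http://example.org/fusion_protein",
        predicate = paste0(rdfs, "label"),
        object    = "fusion protein",
        objectType = "literal")
rdf_add(toy,
        subject   = "http://example.org/fusion_protein",
        predicate = paste0(rdfs, "subClassOf"),
        object    = "http://example.org/protein",
        objectType = "uri")

# Each returned row is one triple: subject (s), predicate (p), object (o).
all_triples <- rdf_query(toy, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }")

# The number of rows is the number of triples in the graph.
nrow(all_triples)
```

On a graph the size of OBI this pulls every triple into a data frame, so it is slower than reading the count from the printed summary, but it makes the s, p, o structure visible.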
Make Graph Data
To be honest, I’ve never used SPARQL queries or RDF data before, so I relied on ChatGPT here to make a directed graph. To visualize the graph structure of the data, you need to extract nodes and edges: the nodes are the terms, and the edges are the subclass relationships between them. The term ‘protein’ was used to filter the queried data as well.
Code
# ---- Create Node and Edge Tables ----

# ---- Query: Find all classes whose labels contain 'protein' ----
terms_query <- "
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>

  SELECT ?term ?label
  WHERE {
    ?term rdfs:label ?label .
    FILTER(CONTAINS(LCASE(STR(?label)), 'protein'))
  }
"
terms <- rdflib::rdf_query(obi, terms_query)

# ---- Query: Get subclass relationships among those protein terms ----
relations_query <- "
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX owl: <http://www.w3.org/2002/07/owl#>

  SELECT ?child ?parent
  WHERE {
    ?child rdfs:subClassOf ?parent .
    ?child rdfs:label ?label .
    FILTER(CONTAINS(LCASE(STR(?label)), 'protein'))
  }
"
relations <- rdflib::rdf_query(obi, relations_query)

# ---- Prepare nodes ----
nodes <- terms |>
  dplyr::distinct(term, label) |>
  dplyr::mutate(
    id = term,
    label = stringr::str_replace(label, "@en", ""), # remove language tag if present
    title = id
  ) |>
  dplyr::reframe(
    term = dplyr::first(term),
    label = dplyr::first(label),
    title = dplyr::first(title),
    .by = id
  )

# ---- Prepare edges ----
edges <- relations |>
  dplyr::rename(from = child, to = parent) |>
  dplyr::filter(from %in% nodes$id & to %in% nodes$id)
Make Graph
The {visNetwork} package represents a graph beautifully! Select a term in the dropdown to highlight a specific term from the filtered data in the graph.
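A graph with a term dropdown like the one described here can be sketched with `visNetwork()` plus `visOptions()`. The node and edge tables below are hypothetical stand-ins with the same column names as the tables prepared earlier (`id`, `label`, `title`, `from`, `to`); the exact options used for the graph in this post may differ.

```r
library(visNetwork)

# Hypothetical nodes: one row per term, with an id, a display label,
# and a title that is shown as a tooltip on hover.
toy_nodes <- data.frame(
  id    = c("ex:protein", "ex:fusion_protein", "ex:protein_complex"),
  label = c("protein", "fusion protein", "protein complex"),
  stringsAsFactors = FALSE
)
toy_nodes$title <- toy_nodes$id

# Hypothetical edges: each row points from a child term to its parent term.
toy_edges <- data.frame(
  from = c("ex:fusion_protein", "ex:protein_complex"),
  to   = c("ex:protein", "ex:protein"),
  stringsAsFactors = FALSE
)

# Draw a directed graph; nodesIdSelection adds the dropdown for picking a
# node, and highlightNearest emphasizes the picked node and its neighbours.
graph <- visNetwork(toy_nodes, toy_edges) |>
  visEdges(arrows = "to") |>
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

graph
```

Printing `graph` in an interactive session or a rendered document shows the widget; the arrows run from child to parent, matching the direction of `rdfs:subClassOf`.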
While I’m not proud of how much I relied on ChatGPT, the results show me how powerful this can be for extracting and visualizing knowledge. I can identify terminology and view relationships pretty easily. There’s a lot that can be enhanced, like the query or the labels on the graph, but this was a great first stab at the goal.