Open Science Stories Podcast - Sharing source code

I had the unique opportunity of recording an episode for a podcast about Open Science hosted by my colleague Heidi Seibold. I talk about how making research artifacts open source can help professional and aspiring scientists to better understand your work. Here’s the transcript and my thoughts on it.

But first, listen to the actual episode:

Behind the scenes

I got involved in the podcast when Heidi advertised it on our internal communication platform. I thought I could talk about my efforts in ensuring that the experiments in a paper I had recently published could be easily reproduced, so I contacted her without second thoughts. She suggested I talk about open source in general, and promptly shared a document with guidelines and suggestions to teach me about the format of the podcast and help me get started with drafting the episode.

I was a bit slow in coming up with a good story, and my first attempts were rather abysmal. She suggested I follow the “Show, Don’t Tell” mantra, which consists of telling stories and building images in the readers’ minds to get your idea across. This is rather the opposite of my usual way of writing, which could be described as “Tell, Don’t Show”. I was also a bit reluctant, as that involved opening up and sharing real events and feelings from a tense moment of my life.

A few months later Heidi had the great idea of organizing an online collaboration event to help me and others make progress with our stories, giving us feedback privately and answering our questions. This event was extremely useful and within a few days I was able to finish a draft, which we improved and finalized over the following weeks.

I did the recording myself and sent it to Heidi so she could edit it and create the final audio for the episode. Unfortunately the quality of the recording from my earphones’ microphone was really not good enough for this, so Heidi very kindly suggested I borrow her professional microphone. Reading aloud turned out to be more difficult than I thought: initially I would run out of breath in the middle of sentences or trip over my tongue. I also had the cold, emotionless intonation of a robot. It took some practice to get a decent recording.

Overall, the process was very well organized and I felt supported throughout. Heidi was always kind, cheerful and encouraging, and she worked like a professional who has been doing this forever. I am just a bit sad that it took me so long, more than six months, to write the story and do the recording. I also feel that my narration could be so much better, but apparently many people dislike hearing themselves talk.

Apologies to any French whom I may have offended.

Transcript

(Intro by Heidi)

After a disappointing stint as a consultant I was home alone, facing a fundamental question that everybody has to answer sooner or later: “what should I do with my life now?” I had been interested in artificial intelligence since I started programming computers as a kid, and I had been studying it at university for years, but at that time I had not yet found a topic that I was super interested in. So I doubled down on reading scientific publications on certain topics that tickled my curiosity, trying to go deep into the subject. I started from a stack of papers that I used to read on the metro going back and forth from the office, but did not really have the time or energy to dive too deep into.

Just like reading a recipe is not enough to learn how to make good pancakes, I believe that the ultimate test for whether you really understand a concept is to recreate it from scratch. You will always stumble upon some tiny bit that is hidden from sight and looks irrelevant, but is actually holding the whole thing together. This is why I spent a lot of time sitting at my kitchen table, trying to re-create AI methods that I found particularly intriguing.

Unfortunately, I quickly became overwhelmed by all the concepts in statistics and mathematics that I was expected to know, and which are usually hidden in 800-page books whose titles always start with “Introduction to”. To make things worse, scientific articles are written by experts for experts, and I found many of them almost indecipherable as they try to condense those books into a dense half page of gibberish. Inevitably, my programs would get stuck doing dumb things, far away from the wonders described in the articles.

This pushed me to switch my focus to the basics that I was obviously lacking, seeking advice in some of the holiest texts in artificial intelligence. But it can be quite difficult to see the forest for the trees during self-directed study. It was like going from an expensive French restaurant serving tiny portions to a free-for-all buffet with more food than you could possibly imagine. As I found out after months of feasting, some rather abstract concepts happen to be quite simple to apply once you figure them out. It is a beautiful feeling, similar to being pleased with the order in your room after spending an afternoon tidying it up, or feeling relief after a storm clears up and the sun shines again.

After starting a PhD I had the chance to discuss many articles with my peers and I realized that complaining about missing details is actually fairly common. Sometimes we would spend a lot of time thinking about how a certain technique was applied and discussing why the findings of the article were or were not good based on some crucial missing piece. After writing a few articles myself, however, I see why it is not appropriate to share all the details there. Since different people find interest in different things, it is not easy to write a comprehensive and coherent story that satisfies everybody and bores nobody at the same time. What I chose to do instead is to openly share the source code which is the basis of my research and make it easy for others to get started using it in their own work. This only requires a minor commitment on my part to tidy it up and add some instructions, and it really feels great to know that others are actually using what I did.

Let’s see what we can learn from this story! People who work with computers must fix and account for every little detail before feeding a program to the computer, even though not everything is mentioned in the final article. Sometimes, even experts cannot understand exactly what was done only by reading a paper, so you can imagine how lost beginners are. But scientific articles are not the right venue to share this kind of thing. In research, sharing the source code can make it much easier for others to understand your work on a deeper level and to build new things on top of it. It only requires a very small effort on your part, but it can save your fellow scientists days of struggle.

(Outro by Heidi)