Deferred Interview with Jaime Obregón, the Superhero of Transparency
ForoCoches can be a “toxic” place, but occasionally it surprises us. The forum had the great idea of inviting Jaime Gómez-Obregón, a well-known activist for public transparency, to answer questions. I was fortunate enough to ask him some technical questions, and I’ve decided to share his answers in the form of a “deferred interview”.
JM: Hello, Jaime. I’ve been following you for a while, and I want to say that you are brave. My first question is about your “data stack”. When reviewing your GitHub, I see that you use Node.js, MongoDB, Selenium a lot… Do you use Big Data tools like Snowflake, ClickHouse, Spark? Is the volume of public data so large that it requires these types of tools? Do you use graph databases like Neo4j to relate companies or names?
Jaime: I don’t know which GitHub you’ve been looking at, but of all the tools you mention, I only use Node.js :) My technological stack is deliberately minimalist: many of my projects don’t even use a database as a backend. I don’t use frameworks; I like solving my own problems, not adding layers to the technology onion simply because it’s “cool”, trendy, or solves the problem that someone else thinks I have.
JM: How much data do you think is not online? Is it possible that administrations hold it back, or cite technical issues as an excuse not to upload it? Are there tools that make it easier for citizens to demand its publication?
Jaime: Some very valuable datasets for fighting corruption are not published because they are a business for a few. Often public data is left in a drawer because someone in the Administration thinks its quality is not good enough, or out of mere paternalism, or out of simple ignorance.
JM: Have you considered uploading everything to Snowflake and sharing it? I think DuckDB could work right now.
Jaime: None of those tools are needed. It is enough for Administrations to release data in structured formats and to update the catalogs on datos.gob.es. In my projects, there is a button to download all the data I have gathered, structured, cleaned, connected… Not much more is needed.
JM: Have you tested if the latest LLMs can help detect irregularities?
Jaime: I am working with AI and I am familiar with LLMs. They are extremely useful for extracting knowledge from specifications and annexes, for example. But not for detecting irregularities. Heuristic analysis is much more effective than AI.
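To make the point about heuristics concrete, here is a minimal sketch of my own (it is not Jaime's code, and he did not describe any specific rule). It assumes a hypothetical `Contract` record and a well-known red flag in public procurement: several contracts from the same authority to the same supplier in the same year, each sitting just below a threshold. The 15,000 EUR default, the 90% "near" margin, the minimum count, and the sample rows are all illustrative assumptions, not legal figures or real data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical, simplified contract record (field names are assumptions)."""
    authority: str  # contracting body
    supplier: str   # awarded company
    year: int
    amount: float   # EUR


def flag_possible_splitting(contracts, threshold=15_000.0, near=0.9, min_count=3):
    """Group contracts that fall just below `threshold` and flag suspicious groups.

    A group (authority, supplier, year) is flagged when it contains at least
    `min_count` contracts whose amount lies in [near * threshold, threshold).
    All parameter defaults are illustrative.
    """
    groups = defaultdict(list)
    for c in contracts:
        if near * threshold <= c.amount < threshold:
            groups[(c.authority, c.supplier, c.year)].append(c.amount)
    # Keep only the groups with enough near-threshold contracts.
    return {key: amounts for key, amounts in groups.items() if len(amounts) >= min_count}


# Made-up sample rows, only to exercise the function.
sample = [
    Contract("Town A", "Acme SL", 2023, 14_900.0),
    Contract("Town A", "Acme SL", 2023, 14_950.0),
    Contract("Town A", "Acme SL", 2023, 14_800.0),
    Contract("Town A", "Other SL", 2023, 9_000.0),
    Contract("Town B", "Acme SL", 2023, 14_990.0),
]
flagged = flag_possible_splitting(sample)
for key, amounts in flagged.items():
    print(key, amounts)
```

A rule like this is transparent and cheap to run over a downloaded dataset: anyone can read the condition, change the parameters, and check by hand why a group was flagged. A flag is only a lead to review, not proof of an irregularity.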
From his answers, it’s striking that Jaime is an “old dog”, a seasoned veteran. He avoids frameworks because of the technical debt they bring and because most of what they offer goes unused (LangChain comes to mind here, hehe).
He also highlights the importance of datos.gob.es being truly a hub for public data, with APIs and databases ready to download.
Here I disagree a bit: I think it would also be useful to exert pressure by unifying all that information in a privately run but publicly accessible hub.
Without further ado, I want to thank Jaime again for his participation and for his work, which is (unfortunately) very necessary!