Managing the loads of information we have available will become more and more important over time, and will fall on more and more people, even if they do not acknowledge it.
One of the opportunities in this scenario is what I’m calling information engineering, which includes the ways we query an information platform.
Why do we need to look at information from an engineering perspective?
Simply because we already have lots of lessons learned from the last century, especially from the last decades of proliferation of database and data store systems.
An engineering mindset applied to this, with the help of a scientific approach, can make the transition to a fully omniscient information era sustainable. Being consistent in the practices of dealing with information is what will make everything possible. Consistency and its counterpart, reliability, are key marks of engineering.
This knowledge is not something to disregard today, with all the hype around machine learning, only to figure out in a few years or in a decade that it is just the same as before: that we reinvented the wheel again and distracted ourselves from the real questions of how we adequately manage information. Those questions run from the source, through the increased importance of storing it, through selecting and promoting the data that enables a jump in human prosperity, to retrieving it in a consistent, reliable way in our everyday lives.
How have we been looking for information in the last decades?
A great example of a popular machine format for declaring what kind of information you want is SQL. It’s highly successful because it was designed with a schema-full mindset, where you engineer a structure to tame the information into tables, partitions, databases and so on.
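To make the schema-full idea a bit more concrete, here is a minimal sketch in Python using the standard library `sqlite3` module. The table name `articles`, its columns and the sample rows are all made up for illustration; the point is only that the structure is declared first, and the query then states what we want rather than how to find it.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Schema-full: the structure is engineered before any data goes in.
# (Hypothetical table and columns, chosen only for this example.)
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, "
    "topic TEXT NOT NULL, "
    "year INTEGER NOT NULL)"
)

conn.executemany(
    "INSERT INTO articles (topic, year) VALUES (?, ?)",
    [("databases", 1985), ("documents", 2005), ("llms", 2024)],
)

# Declarative query: we say what we want, not how to scan the table.
rows = conn.execute(
    "SELECT topic FROM articles WHERE year >= ? ORDER BY year",
    (2000,),
).fetchall()

print(rows)
conn.close()
```

Asking the same question of the same data gives the same rows every time, which is the kind of consistency I mean when I talk about engineering.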
Another one that has become extremely important in the last few years, even more so in this year of 2024, is LLMs, or large language models. They also have certain ways to consistently retrieve the data you want. One prominent term for querying them is prompt engineering, even with (many, many) doubts about whether it aligns with engineering practices at all [even worse than software and data engineering].
Let me argue why the two can be compared, because looking at them this way can provide the angle we need to embrace the opportunities of this new era.
LLMs are being trained on troves of scraped data, data that has some formal structure [^1], like words or images, but at a lower level than the databases we have so far called schema-full (relational DBs) or schema-less (document DBs). The important point here is: we are finding ways to make information emerge from it, consistent (not perfectly, but improving) with the context of the query we made, that is, the question we asked the information system to answer.
On the other hand, we have been heavy users of relational databases since the 1980s, and of document databases since the 2000s, built on the idea that we can query data in a consistent way (relational algebra), an idea that became popular with SQL and influences to this day how we ask our data storage to show results as close as possible to the answer we want.
Then what? Where is the opportunity itself?
I’m arguing that both have structures, and for that an engineering mind is required. The only difference for now is that the most recent one emerges from the machine itself, from simple tokens of information married to a clever usage of linear algebra, instead of from the higher-level structures we built to tame data, like the schemas, tables and databases powered by relational algebra.
Another place to look is our daily routines: writing thousands of lines in strange languages called programming languages, scripting languages, etc., and reading even more every single day. Bringing these novel ways of filtering the noise out of the data into those routines, so that we write more concisely and read in a way that is energy-efficient for our brains, is also important.
Ok, where next?
So finding the places where all of the above touch each other is key to seeing how those currently called software developers, software engineers, programmers, or anyone using current paradigms to tell a machine what to do in useful ways (all information alchemists) can also emerge with the dawn of this new chapter of the information era, raising the bar instead of becoming part of the information noise.
Follow the next chapter here, where I will show angles that we are either overlooking or under looking.
I will keep grammar errors on purpose, because I’m not a LLM
[^1] Sequences of bytes that have well-known ways to be stored and retrieved through computer-human interfaces, like LCD displays, keyboards and so on. They have a “lower level” structure, like bytes aligned in certain ways, that is invisible to the average user but can form what we today call tokens, which are basically represented as short sequences of floating-point numbers and read in specific ways that are extremely consistent in their syntax.
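
As a rough illustration of this footnote, the sketch below walks a short piece of text down to its bytes and back up to toy tokens. The two-entry vocabulary `vocab` and the `embeddings` table are invented for this example; real LLM tokenizers learn much larger vocabularies from data and split text differently, so take this only as a picture of the layers involved.

```python
text = "data"

# Lower level: the text is just a sequence of bytes (UTF-8 here).
raw = text.encode("utf-8")
byte_values = list(raw)  # one integer per byte

# Hypothetical toy vocabulary: each two-character chunk gets a token id.
vocab = {"da": 0, "ta": 1}

# Hypothetical embedding table: one short list of floats per token id.
embeddings = [
    [0.1, -0.3],  # vector for token 0 ("da")
    [0.7, 0.2],   # vector for token 1 ("ta")
]

# Split the text into fixed two-character chunks and look each one up.
tokens = [vocab[text[i:i + 2]] for i in range(0, len(text), 2)]

# Each token id is then read as its sequence of floating-point numbers.
vectors = [embeddings[t] for t in tokens]

print(byte_values)
print(tokens)
print(vectors)
```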