The String Theory of Language: “Strings in, Strings out”

Dr. Zunaid Kazi
Artificial Intelligence in Plain English
11 min read · Apr 15, 2024


Can machines truly understand language?

Natural Language Processing (NLP), a field of artificial intelligence that focuses on enabling machines to understand, interpret, and respond to human language, has grappled with this question throughout its history, seeking to transform strings (text) into other meaningful strings (text).

Over the decades, I’ve witnessed¹ each phase of NLP’s evolution, from early rule-bound systems to today’s mind-bending LLMs.

With each step, we’ve gotten remarkably better at manipulating strings. At its core, NLP is all about “strings in, strings out.”

Have you ever wondered how machines understand language? Join me as I explore the major milestones in NLP history using the “string in, string out” lens.

Early Symbolic NLP (1950s-1990s)

The dawn

In the early days of NLP, scientists focused on handcrafting language rules for logic and grammar² ³.

Think of it like a chef carefully following a detailed recipe. These early systems, breaking down sentences into logical components, showcased NLP’s essence: converting text (“Strings In”) to meaningful representations (“Strings Out”).

Example

  • Input (Strings In): “The quick brown fox jumps over the lazy dog.”
  • Processing: A sophisticated rule-based parser analyzes the sentence, recognizing grammatical structures and relationships.
  • Output (Strings Out): An enriched logical form that identifies not just the action and the actors but also the qualities and relations: (JUMP (AGENT: FOX (COLOR: BROWN, SPEED: QUICK)) (OVER: DOG (STATE: LAZY)))
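
To make that concrete, here is a minimal, hypothetical sketch in Python of what such a rule-based transformation might look like. It is a toy, not a reconstruction of any historical system: the word lists and the single hard-coded sentence pattern are assumptions chosen just to handle this one example, and it produces a simplified version of the logical form above (without the attribute labels).

```python
# A toy rule-based "parser": a hand-written vocabulary and a single
# hard-coded sentence pattern. Real symbolic systems used far richer
# grammars, but the "strings in, strings out" flow is the same.
ADJECTIVES = {"quick", "brown", "lazy"}
NOUNS = {"fox", "dog"}
VERBS = {"jumps"}
PREPOSITIONS = {"over"}
DETERMINERS = {"the"}

def parse(sentence: str) -> str:
    words = sentence.lower().rstrip(".").split()
    noun_groups, modifiers = [], []   # (noun, [modifiers]) in order seen
    verb = prep = None
    for w in words:
        if w in DETERMINERS:
            continue
        if w in ADJECTIVES:
            modifiers.append(w.upper())
        elif w in NOUNS:
            noun_groups.append((w.upper(), modifiers))
            modifiers = []
        elif w in VERBS:
            verb = w.upper().rstrip("S")
        elif w in PREPOSITIONS:
            prep = w.upper()
    agent, obj = noun_groups
    return (f"({verb} (AGENT: {agent[0]} ({', '.join(agent[1])})) "
            f"({prep}: {obj[0]} ({', '.join(obj[1])})))")

print(parse("The quick brown fox jumps over the lazy dog."))
# (JUMP (AGENT: FOX (QUICK, BROWN)) (OVER: DOG (LAZY)))
```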

Limitations

These systems struggled to handle the vast complexities and ambiguities of natural language.

Consider the phrase “fruit flies like a banana.” This innocently simple phrase poses a considerable challenge for a symbolic system:

  • Does “like” refer to preference (do fruit flies enjoy eating bananas)?
  • Or is “like” a comparison (does fruit fly the way a banana does)?

This ambiguity demonstrates the limitations of purely rule-based approaches.

The brittleness of these rules and the difficulty of scaling knowledge bases to real-world applications led researchers to explore a fundamentally different paradigm: teaching machines to learn patterns directly from data. This search for alternatives led to the advent of statistical NLP.

My story

My foray into AI and NLP began in the early 90s, smack dab in the middle of the second AI winter. Freshly arriving in the US, I took my first course in AI and NLP and was smitten with the idea of endowing machines with intelligence, particularly the ability to converse.

My Ph.D. dissertation involved developing a voice- and gesture-based language, applying symbolic NLP principles to enable individuals with disabilities to control a wheelchair-mounted robot. It was challenging work, pushing the boundaries of what the technology could do at the time, but that feeling when it worked… that’s when I knew I was on the right path and have not looked back since.

(AI winter or not, I persisted and came out the other side with a PhD in AI and NLP.)

The Rise of Statistical NLP (1980s–2010s)

Data takes center stage

If Symbolic NLP was about relying on hand-crafted rules, the Statistical NLP era began with machines learning these rules (patterns) directly from the data. These models analyzed vast collections of text (called corpora) to learn statistical relationships between words and phrases, enabling them to predict language patterns⁴. Computers started to learn that certain structures and words are more likely to occur together.

The “Strings In, Strings Out” concept still applied, with input text transformed into sequences of tokens (words, phrases). Models then learned to output tagged sequences (e.g., part-of-speech tags), translations, or probabilities (e.g., likelihood that an email is spam).
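
As a rough illustration of learning patterns directly from data, the sketch below counts word-pair (bigram) frequencies in a tiny, invented corpus and uses them to estimate which word is most likely to follow another. Real statistical systems worked with vastly larger corpora and richer models (n-gram language models with smoothing, hidden Markov models), but the principle is the same.

```python
from collections import Counter, defaultdict

# A tiny, made-up corpus; real systems learned from millions of sentences.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1

def most_likely_next(word: str) -> str:
    """Predict the most probable next word from the bigram counts."""
    counts = following[word]
    total = sum(counts.values())
    best, freq = counts.most_common(1)[0]
    print(f"P({best} | {word}) = {freq / total:.2f}")
    return best

most_likely_next("the")   # "cat": P = 0.33 in this toy corpus
most_likely_next("sat")   # "on": P = 1.00
```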

Example

  • Input (Strings In): “I found the plot confusing and the characters uninspiring.”
  • Task: Sentiment analysis to gauge review tone.
  • Processing: A statistical model evaluates the sentence, analyzing word patterns and sentiment indicators.
  • Output (Strings Out): “Negative” sentiment classification, with a confidence score of 85%.
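
A minimal version of this kind of classifier can be sketched with scikit-learn's Naive Bayes implementation. The handful of training reviews, their labels, and whatever confidence it prints are purely illustrative assumptions, not the model behind the example above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A few invented, labelled reviews; real systems train on thousands.
train_texts = [
    "a gripping plot and wonderful characters",
    "brilliant, moving, and beautifully acted",
    "dull story and wooden acting",
    "confusing plot and uninspiring characters",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

review = "I found the plot confusing and the characters uninspiring."
predicted = model.predict([review])[0]
confidence = model.predict_proba([review]).max()
print(predicted, f"{confidence:.0%}")   # expected: "negative", with a confidence score
```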

Limitations

Despite its successes, statistical NLP had its drawbacks. It struggled to capture how different parts of a long sentence related to each other, and it still took considerable effort to hand-engineer the linguistic features these models relied on.

This need to capture the deeper relationships within language ushered in the era of deep learning and neural networks, where machines could learn more complex linguistic relationships, not just surface patterns, from data.

My story

With my Ph.D. in hand, I landed at the storied IBM Watson Research Center. It was the era of statistical NLP, and I was eager to dive into its data-driven world.

One of my projects involved identifying concepts and entities tangled in mountains of documents. Is the John Smith in this document the same John Smith in the other document? What about J. Smith in the third document? This groundbreaking work led to several papers⁵ ⁶, a patent⁷, and the thrill of seeing statistical models uncover hidden connections.

It cemented my belief that NLP was the future, and I was determined to be a part of it.

Neural Networks and the Deep Learning Revolution (2010s–Present)

NLP gets a brain

While statistical NLP enabled systems to learn language patterns from data, deep learning and neural networks gave computers a “brain,” allowing them to capture far more complex linguistic relationships. Instead of relying on patterns and features that humans identify, neural networks build their own complex internal representations of language.

At its heart is this idea: neural networks, with their layers of interconnected “neurons” (loosely inspired by the structure of the human brain), turn words and sentences into numerical vectors, allowing them to capture complex patterns and generate complex responses.
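
Here is a tiny sketch of the words-as-vectors idea. The three-dimensional embeddings are made up purely for illustration; learned embeddings (from word2vec, GloVe, or a network’s own layers) typically have hundreds of dimensions, but the intuition that similar words end up close together, measurable with cosine similarity, is the same.

```python
import numpy as np

# Made-up 3-dimensional embeddings purely for illustration; real models
# learn much higher-dimensional vectors from large corpora.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """1.0 means same direction (very similar), values near 0 mean unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, ~0.99
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower, ~0.30
```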

Neural networks⁸, along with the Deep Learning⁹ revolution, led to incredible breakthroughs in NLP, enabling advances in machine translation, sentiment analysis, and natural language generation. The “Strings In, Strings Out” idea still holds, but what happens inside the machine is far more sophisticated. The input is still a string (raw text), but the output is becoming more diverse: translations, classifications, summaries, and answers to questions.

Limitations

Like their statistical predecessors, early neural networks struggled with long sentences: the models had difficulty relating the meaning of words at the beginning of a long sentence to words at the end. Innovations such as LSTMs (recurrent networks with gating mechanisms) later eased that problem to a certain extent, but challenges persist.

The demands for computational power and vast datasets for training continue to pose challenges, along with concerns about the “black box” nature of these models. This lack of transparency in how these models make decisions raises questions about trust and brings a pressing need for explainability in NLP.

Example

  • Input (Strings In): “What is the capital of France?”
  • Processing: A neural network processes the question, drawing on the vast knowledge captured during training.
  • Output (Strings Out): “Paris”
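
For readers who want to try this themselves, the Hugging Face transformers library offers a question-answering pipeline that runs in a few lines. This is a generic sketch, not the specific system in the example: extractive QA models of this kind need a context passage to pull the answer from, so one is supplied here.

```python
# Requires: pip install transformers  (a default QA model downloads on first run)
from transformers import pipeline

qa = pipeline("question-answering")

result = qa(
    question="What is the capital of France?",
    context="Paris is the capital and most populous city of France.",
)
print(result["answer"])   # expected: "Paris"
```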

My story

Neural networks and deep learning were the next frontier, and I was in the thick of it. During this time, I worked on a project for the US National Cancer Institute that felt deeply important: Could we teach machines to infer cancer stages and grades from the complex language of radiology and pathology reports? We could.

The successful outcomes not only advanced NLP but also demonstrated the power of NLP to transform healthcare by extracting crucial insights from complex medical data.

Attention is All You Need (2017-Present)

Attention, please

While Neural Networks and Deep Learning were significant steps forward for NLP, they still suffered from a short attention span, having difficulty understanding interconnected relationships within long sentences and texts.

This key challenge in teaching machines the complexities of language is what the 2017 paper “Attention Is All You Need”¹⁰ addressed, bringing about a paradigm shift. The introduction of Transformers and their attention mechanisms enabled models to focus on (pay attention to) the most important words and connections within a sequence of text.

Attention transformed the “Strings In, Strings Out” dynamic of NLP, dramatically improving machine translation, summarization, question-answering, and a host of other language-based tasks.

Example

  • Input (Strings In): “The agreement on the European Economic Area was signed in 1992.”
  • Task: Translate the sentence into French.
  • Processing (attention at work):
    • The Transformer model breaks the sentence into word tokens.
    • While encoding this sentence, the attention mechanism focuses on the English word “agreement,” determining that it relates strongly to the French word “accord” for accurate translation.
    • Attention also highlights the year “1992” for correct translation.
  • Output (Strings Out): “L’accord sur l’Espace économique européen a été signé en 1992.”
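
The heart of the attention mechanism introduced in the paper can be written in a few lines of NumPy: each token’s query is compared against every token’s key, the scores are normalised with a softmax, and the values are blended accordingly. The small random matrices below are placeholders; in a real Transformer, Q, K, and V are learned projections of the token embeddings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

# 4 tokens with 8-dimensional (random, placeholder) queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # each row sums to 1: how token i distributes its attention
```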

Limitations

However, these advancements are not without their challenges. One major concern is the immense computational power required for their operations.

Moreover, there is a potential for misdirection. Although attention generally improves a model’s focus, it can sometimes lead to focusing on the wrong parts of a text, particularly if the data contains errors or noise.

My story

This time, my story is more personal. I’ve always had a literary streak (at least I think so)—I dabble in short stories and poetry — and I wanted to see how transformers could help with language expression, particularly poetry. Could I teach a system to generate poetry in the styles of Wordsworth, Byron, and Shelley? As I fed my system examples of their work, it was amazing to see how the model generated new verse — echoing the Romantic poets’ rhythm, nature themes, and even a dash of their melancholy.

LLMs and Generative NLP (2018-Present)

New Frontier

Building on the power of the Transformer architecture, Large Language Models (LLMs) mark the next major milestone in NLP¹¹. Imagine a model that has read not just books but whole libraries.

Trained on truly massive datasets and built with billions (sometimes even trillions) of parameters, these models demonstrate remarkable text generation, understanding, and task-completion abilities.

This massive scale, paired with advancements in training techniques, allows LLMs to have a contextual understanding of language unlike anything seen before. This enables them to generate coherent, consistent, and seemingly “sentient” responses¹².

It’s a shift that challenges the core of the “Strings In, Strings Out” paradigm — is something more at play?

Example

  • Input (Strings In): “Write a poem about a sunset on the beach.”
  • Processing: A large language model (like GPT-4, Gemini, or Claude) leverages its understanding of poetry, imagery, and sunsets.
  • Output (Strings Out):
    “The sun dips low, a fiery blaze,
    Painting clouds with golden haze.
    Waves whisper secrets to the shore,
    As twilight paints the ocean floor.”
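
For anyone who wants to experiment, the sketch below uses the Hugging Face text-generation pipeline with GPT-2, chosen only because it is small and downloads easily; it is far less capable than the models named above, so expect much rougher poetry.

```python
# Requires: pip install transformers  (the GPT-2 weights download on first run)
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Write a poem about a sunset on the beach.\n"
result = generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.9)
print(result[0]["generated_text"])
```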

Limitations

Despite their impressive capabilities, LLMs have flaws. They have the potential to “hallucinate” — generating text that is factually wrong or misleading, or that reflects harmful biases from the data they were trained on.

Furthermore, the immense computational power required raises serious environmental concerns. Finally, LLMs remain something of a black box, making it difficult to understand how they arrive at their results. These limitations highlight the need for ongoing research to ensure accuracy, reduce bias, boost efficiency, and improve explainability.

My story

In just a few short months, LLMs have become integral to both my professional and personal life.

They have significantly transformed how I work as the CTO of Infolytx, an AI solutions company. I use them for everything — streamlining client interactions, improving internal processes, and most importantly, delivering cutting-edge AI solutions to our clients.

But the impact goes beyond my professional life; LLMs have dramatically enhanced my productivity. This article itself is an example. I used LLMs as my editorial assistant to refine my language, identify inconsistencies, fix grammatical errors, and suggest more impactful phrasing. This process has definitely improved the article’s flow and readability.

Beyond “Strings In, Strings Out”

Throughout my career, I have watched NLP evolve from rules to representations. Yet, at its core, it has always been about making sense of text and meaningfully transforming strings to strings: the “Strings in, Strings out” paradigm.

The question is: Are we seeing a fundamental shift in this paradigm? The future may hold something different.

A world beyond text?

Close your eyes and imagine a future where machines learn and understand the world through experience, not just by reading words. They interact with simulated environments, collaborate with robots, and even interpret visual data. This allows them to connect language with action and perception in entirely new ways.

Imagine a world where machines transcend mere text manipulation. They reason like humans, adapting to uncertainty and nuance, intuitively understanding the context behind our words.

Imagine a world where machines are diagnosing patients alongside doctors, tailoring education to each individual student, and co-authoring novels, screenplays, and musical compositions with humans.

Such a fundamental shift hints at the possibility of creating a machine that feels profoundly more… human.

What would that mean for the future of creativity, empathy, and the very essence of what makes us human?

References

  1. Kazi, Z. (n.d.). Zunaid Kazi. Retrieved from https://zunaid.kazi.net/
  2. Chomsky, N. (1957). Syntactic Structures. Mouton.
  3. Winograd, T. (1972). Understanding natural language. Cognitive Psychology, 3(1), 1–191. https://doi.org/10.1016/0010-0285(72)90002-3
  4. Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., Mercer, R. L., & Lai, J. C. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational linguistics, 19(2), 263–311.
  5. Kazi, Z., & Ravin, Y. (2000). Who’s Who? Identifying Concepts and Entities across Multiple Documents. In Proceedings of the 33rd Hawaii International Conference on System Sciences — Volume 3 (pp. 1–10). Hawaii, January 2000. https://doi.org/10.1109/HICSS.2000.926686
  6. Ravin, Y., & Kazi, Z. (1999). Is Hillary Rodham Clinton the President? Disambiguating Names across Documents. In Proceedings of the ACL 1999 workshop on Coreference and its Applications (pp. 1–8). Maryland, USA, June 1999.
  7. Kazi, Z., & Ravin, Y. (2002). Method for identifying concepts and entities across multiple documents. U.S. Patent No. 6,438,543. Retrieved from https://patents.google.com/patent/US6438543
  8. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., … & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97. https://doi.org/10.1109/MSP.2012.2205597
  9. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
  10. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998–6008).
  11. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics. Minneapolis, Minnesota. https://doi.org/10.18653/v1/N19-1423
  12. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv. https://doi.org/10.48550/arXiv.2005.14165

Acknowledgment

All images included in this article were generated by DALL·E, an advanced AI image generation model developed by OpenAI.

