vinci rufus

The mental shift when moving from Deterministic to Probabilistic applications with GenAI

2024-01-05
4 minutes

Since the dawn of the microprocessor and assembly language, 99% of the applications we have been building and using have fallen into the category of Deterministic applications.

A Deterministic application is one where for a given input the output is always the same.

However, with the dawn of Generative AI, there will be a fundamental shift, as the majority of the applications we build and consume will fall into the category of Probabilistic applications.

A Probabilistic application is one where, for a given input, we could get broadly similar outputs but not necessarily the exact same output every time.
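To make the contrast concrete, here is a minimal Python sketch; `generate_summary` is a stand-in for a real LLM call, not an actual API:

```python
import random

def add_tax(price: float) -> float:
    # Deterministic: the same input always produces the same output.
    return round(price * 1.08, 2)

def generate_summary(prompt: str) -> str:
    # Probabilistic stand-in for an LLM call: the same prompt can yield
    # any of several broadly similar outputs.
    return random.choice([
        "GenAI shifts apps from deterministic to probabilistic.",
        "With GenAI, application outputs become probabilistic.",
        "Generative AI makes outputs vary from run to run.",
    ])

assert add_tax(100.0) == 108.0           # holds on every run
print(generate_summary("Summarise it"))  # may differ on every run
```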

The shift when moving from deterministic to probabilistic applications

| Deterministic Apps | Probabilistic Apps |
| --- | --- |
| Precision Focus: Every part of the system works exactly as expected. | Embrace Variability: Accept and plan for a range of possible outcomes. |
| Error Handling: Consider all possible errors and handle them predictably. | Statistical Reasoning: Use probability and statistics to understand and guide the system’s behavior. |
| User Expectations: Users expect 100% accuracy and reliability. | User Expectations: Users should be prepared for variability and understand that the application learns and improves over time. |

Embracing Uncertainty with Probabilistic Applications

Probabilistic applications incorporate randomness and probability theory to predict outcomes. They can produce different results under the same conditions, making them inherently uncertain. Such applications have been prevalent for a while in complex systems like weather forecasting models, recommendation engines, and data science and forecasting applications. However, the majority of us have not been building or using them until now, i.e. the dawn of Generative AI applications.

Some of the immediate challenges we encounter when building Probabilistic applications with GenAI are:

Rethinking our approach to UI & UX

  • UI & UX designers will need to re-think the interfaces and navigation flows for GenAI applications. Merely sticking a chat infterface to the right or left of an exiting App will not cut it. Users will to be subtly prompted for the right kind of input prompts, Input guardrails will need to be put in place to ensure users don’t try asking irrelevant or harmful questions.

Designers will need to come up with creative ways to incorporate Reinforcement Learning from Human Feedback (RLHF) into the user interactions, so that the LLM’s outputs get optimised.
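
One simple starting point is capturing explicit thumbs-up/thumbs-down ratings alongside each response, so the pairs can later serve as preference data. A minimal sketch, with the JSONL storage format being an assumption:

```python
import json
import time

def record_feedback(prompt: str, response: str, rating: int,
                    path: str = "feedback.jsonl") -> None:
    """Append a user rating (+1 or -1) for a model response.

    Collected pairs can later serve as preference data for RLHF-style
    tuning; the JSONL file format here is an assumption, not a standard.
    """
    entry = {"ts": time.time(), "prompt": prompt,
             "response": response, "rating": rating}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

# Example: a thumbs-up on one response.
record_feedback("Summarise my notes", "Here is a summary...", rating=1)
```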

As we move into multi-modal apps, it will be interesting to see how we ensure our GenAI apps meet accessibility guidelines.

Rethinking our approach to Testing

Until now, our entire approach to testing has been to provide the system with a given set of inputs and ensure we get a consistent output. We write our test cases and use automation tools like Selenium to run automated checks against those deterministic outputs.

This approach will not work for testing GenAI applications. We will need a new set of tools or new ways to validate outputs that involve some amount of variance.

A common approach would be to check the outputs for expected patterns and metadata, and derive a probability or accuracy score from those checks.
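
For example, a test could assert properties of the output rather than an exact string, and combine those checks into a score. A toy sketch, with each check purely illustrative:

```python
def score_output(output: str) -> float:
    """Score a model response against expected patterns and metadata.

    Each check is an illustrative property; a real suite would use
    domain-specific patterns, schemas, or embedding similarity.
    """
    checks = [
        10 <= len(output) <= 2000,                   # plausible length
        "summary" in output.lower(),                 # expected keyword
        not output.lower().startswith("i'm sorry"),  # not a refusal
    ]
    return sum(checks) / len(checks)

# Pass if enough property checks hold, rather than matching an exact string.
assert score_output("Summary: GenAI apps are probabilistic.") >= 0.66
```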

Another approach would be to build a collection of QE agents or bots that perform the testing steps and then use inference to validate whether the output feels accurate. We may need to ensure that the agents use different methodologies and perspectives, akin to a multi-faceted review system. This could involve incorporating varied testing paradigms and introducing stochastic elements into the test procedures to mimic real-world unpredictability.
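
One way to prototype such a QE bot is the LLM-as-judge pattern, where a second model grades the first model’s answer. A sketch, with `judge` as a placeholder for whatever inference call you actually use:

```python
JUDGE_PROMPT = (
    "You are a QA reviewer. Reply PASS if the answer is accurate and "
    "relevant to the question, otherwise reply FAIL.\n"
    "Question: {question}\nAnswer: {answer}"
)

def judge(prompt: str) -> str:
    # Stand-in for a real inference call; swap in your model client here.
    return "PASS"

def validate_with_agent(question: str, answer: str) -> bool:
    """Ask a judge model whether the answer looks accurate."""
    verdict = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")

print(validate_with_agent(
    "What does RLHF stand for?",
    "Reinforcement Learning from Human Feedback.",
))  # True with the stubbed judge
```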

Further, integrating a feedback loop where outputs are continually reassessed and the testing criteria are dynamically adjusted based on previous results could be vital. This adaptive testing strategy can help in dealing with the evolving nature of GenAI outputs, ensuring that our testing mechanisms remain robust and relevant.
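
As a toy illustration of such a loop, a pass threshold could drift toward the scores observed in recent runs; the 10% adjustment rule here is purely illustrative:

```python
def adapt_threshold(threshold: float, recent_scores: list[float]) -> float:
    """Drift the pass threshold toward recently observed scores.

    The 10% step is illustrative; a real system would adjust based on
    statistics gathered over far larger samples.
    """
    if not recent_scores:
        return threshold
    avg = sum(recent_scores) / len(recent_scores)
    return threshold + 0.1 * (avg - threshold)

threshold = 0.80
threshold = adapt_threshold(threshold, [0.90, 0.85, 0.95])
print(round(threshold, 3))  # 0.81
```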

Moreover, collaboration with domain experts to understand the nuances of expected outputs, and using their insights to inform our testing metrics, can provide an additional layer of validation. This human-in-the-loop approach ensures that the AI’s outputs are not just technically sound but also contextually appropriate and meaningful.

In summary, testing GenAI applications requires a paradigm shift from traditional methods. We must embrace complexity, adaptability, and a blend of automated and human-driven processes to ensure the reliability and relevance of GenAI outputs in real-world scenarios.