GenAI Reality
The Good, The Bad and The Interesting
My phone suggests sprinkling AI (magic?) over my photos. A typical result is shown above. While the generated image on the right is interesting, I still prefer the real thing. Yet GenAI has some utility, so what is it good for right now?
The Good: It Accelerates
Not only does daily use of some sort of GenAI tool accelerate my work, creativity, and general thinking, but improved AI also literally accelerates me (well, my car, anyway).
I recently got a new car and am re-learning how to drive with the assistance of its multi-camera AI system. My dashboard is shown below. Of note, the system ‘sees’ whether or not I am staying in my lane. If I stray, the system warns me, e.g. “put your hands on the wheel”, so it’s not a self-driving capability, by design. I set a following distance in car lengths, and the cameras control my speed to keep that distance from any car the system “sees” in front of mine, slowing my car as needed to maintain the observed gap.
I can adjust the top speed with an up/down button on the steering wheel when conditions or requirements warrant speed changes.
Also, the steering wheel becomes slightly stiffer when I set the car in ‘active’ mode, a haptic cue that I am driving in this camera-assisted mode.
To stop active mode, I tap the brake or gas pedal, or click the button on the steering wheel. The system provides automation with feedback and multiple methods of human intervention.
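The distance-keeping behavior described above can be sketched as a tiny control loop. This is my own simplification for illustration, not Subaru’s actual algorithm; the car-length constant and the proportional slowdown are assumptions.

```python
# Toy sketch of camera-assisted distance keeping (my simplification,
# not the real system): slow down whenever the observed gap to the
# car ahead falls below the driver-set number of car lengths.

CAR_LENGTH_M = 4.5  # assumed average car length, in meters

def adjust_speed(current_speed, set_speed, gap_m, set_car_lengths):
    """Return a new speed: cruise at set_speed, but slow down
    whenever the observed gap is inside the target gap."""
    target_gap = set_car_lengths * CAR_LENGTH_M
    if gap_m < target_gap:
        # Slow proportionally to how far inside the target gap we are.
        return max(0.0, current_speed * (gap_m / target_gap))
    # Road ahead is clear enough: creep back up to the set top speed.
    return min(set_speed, current_speed + 1.0)

# Cruising at 30 m/s with a 3-car-length setting and only a 10 m gap:
print(adjust_speed(30.0, 30.0, 10.0, 3))  # slows below 30
```

The point of the sketch is the feedback loop itself: sensor input in, bounded adjustment out, with the driver’s settings (top speed, car lengths) always in control.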
Here’s a list of what I am doing daily with GenAI LLM tools:
- ChatGPT/Gemini: Validating technical approaches to cloud solutions by having contextual threaded ‘discussions’ with a virtual cloud architect colleague
- GitHub CoPilot: Quickly creating starter code samples for POCs in a variety of application and infrastructure scripting languages
- Gemini: Fixing code bugs faster than by Googling error messages and/or reading StackOverflow threads
- DiagramsGPT: Generating starter diagrams, often flowcharts to clarify my thinking and/or create documentation of requirements or processes
- DataAnalysisGPT: Quickly summarizing data from files such as CSV, JSON or VCF.
- Gemini for BigQuery: Generating starter SQL queries for datasets from English prompts.
- Subaru Active Mode: Driving safer.
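To make the data-summarization item above concrete, here is a hand-rolled sketch of the kind of quick CSV summary I ask a GenAI tool for. The sample data and the min/mean/max format are my own illustration, written with only the standard library.

```python
# A hedged sketch of a quick CSV summary (the sort of output a
# data-analysis GPT produces), done by hand with the stdlib.
import csv
import io
import statistics

SAMPLE = """name,depth,quality
chr1,35,99
chr2,28,87
chr3,41,92
"""

def summarize_numeric(csv_text):
    """Return min/mean/max for every numeric column in the CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in rows[0]:
        try:
            values = [float(r[col]) for r in rows]
        except ValueError:
            continue  # skip non-numeric columns like 'name'
        summary[col] = {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }
    return summary

print(summarize_numeric(SAMPLE))
```

The value of the GenAI version is that I describe the summary I want in English and skip writing even this much boilerplate.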
The Bad: It Hallucinates
Actually, I find the endless discussions (whining) about the ‘hallucinations’ of various GenAI tools boring. Of course GenAI LLMs make stuff up; why is this surprising? They are PREDICTIVE, non-deterministic tools.
In what world is a machine learning approach expected to be deterministic?
I amuse myself by asking GenAI tools ridiculous questions and then considering how and why the answers were generated. I consider this playful experimentation to be part of acquiring an intuition for the capabilities of a particular GenAI tool. I am looking more to understand what it does do rather than what it can or cannot do.
LLMs are based on human language and I am a linguist by training. Learning the culture and vernacular of a GenAI tool feels like visiting a foreign country — try to chat up the locals to learn a bit about the place.
Just as a language has many dialects (German, for example, has High German, Low German, Swiss German, etc.), LLMs have chat tools, APIs, ensemble tools, etc., so the same model can and will answer differently depending on which tool or interface you are working with.
Shown below are the responses to my prompt ‘Who’s the baddest MOFO LLM and why?’ from the ChatGPT 4o, Claude and OpenGPT 4o LLMs. Based on the responses, which LLM is the ‘stick in the mud’? Which one is the ‘know it all’? Which one is the ‘entertainment expert’?
The Interesting: Real Productivity
The most interesting (and useful) GenAI implementation I’ve been testing lately is the GitHub Copilot Workspace set of integrated GenAI capabilities.
To understand why it’s great, I’ll show a simple example of Copilot Workspace in action. Navigate to a repo, click ‘< > Code’ and then fill in a task in the ‘Copilot’ pop-up window. When you start your task, GitHub Copilot captures the context by analyzing the data in the associated repo.
Next, feedback is provided on what the LLM ‘sees’ as context, shown in the form of a human-readable generated Specification. The Spec includes relevant (‘noticed’) current information as well as proposed updates. Items can be added manually if important context was missed in the LLM-based analysis.
Then a proposed plan is generated, again with the ability to add more context, in case the LLM missed something important.
After that, the changes are implemented by generating code. File diffs are automatically opened for review, and you are given several options for integrating the proposed changes. The default is to ‘create a pull request’, but other options are available.
After you select an integration method (I selected ‘Push to current branch’), the LLM generates a commit message and an extended description for your review.
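The workflow above can be summarized as a pipeline with a human checkpoint after every generated artifact. This is my mental model of the flow, not GitHub’s actual API; the step names and fields are labels of my own.

```python
# Conceptual sketch (my mental model, not GitHub's API) of the
# Copilot Workspace flow: task -> spec -> plan -> diffs -> integration,
# with a human able to amend any step before the next one runs.
from dataclasses import dataclass, field

@dataclass
class WorkspaceStep:
    name: str
    generated: str                                    # what the LLM produced
    human_edits: list = field(default_factory=list)   # reviewer additions

    def review(self, note):
        """A human can amend any step before the next one runs."""
        self.human_edits.append(note)
        return self

pipeline = [
    WorkspaceStep("specification", "current behavior + proposed updates"),
    WorkspaceStep("plan", "ordered list of file changes"),
    WorkspaceStep("implementation", "code diffs opened for review"),
    WorkspaceStep("integration", "pull request / push to branch + commit message"),
]

# Missed context can be added at any checkpoint, as in the UI.
pipeline[0].review("also update the README example")
for step in pipeline:
    print(step.name, "->", step.generated, step.human_edits)
```

What matters is not the data structure but the shape: every generated artifact is inspectable and editable before it feeds the next step.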
The reason I showed this longer example is to illustrate what I have found to be a state-of-the-art implementation of tuned LLMs as part of an application. The customization of prompts and output, the general excellence of the UI, and the human review and feedback at multiple points in the workflow are key to its usability.
At each step, model output is provided in an expected format, which makes human review easy.
The Takeaway: Human in the Loop
Just as fully AI-managed self-driving cars are currently dangerous, relying on unreviewed LLM-generated information for medical advice, math, or anything else mission-critical is silly at best and potentially dangerous in a number of ways.
However…
Applications that facilitate natural human-in-the-loop review at multiple phases of the generated information (whether camera-assisted active driving or coding CoPilots) are remarkably safe and useful.
Auto insurance companies don’t discount rates without very good data; GitHub Copilot’s adoption velocity tells the real story too.