
RAG, LLM, GenAI: The meaning behind the acronyms
By now you are likely well acquainted with Generative AI (GenAI) and chatbots. GenAI includes things like Large Language Models (LLMs) that are trained on large datasets. Paired with a conversational interface (aka a chatbot), LLMs are designed to “speak” with their users. They have an impressive ability to generate human-like text based on the patterns they've learned from massive datasets of internet content, books, and articles. ChatGPT, of course, remains the most well-known example.
Because they're essentially sophisticated pattern-matching (or “probabilistic”) algorithms trained on historical data, with your data pinging back and forth in the cloud, the most pressing concerns are accuracy, bias, and security. Why?
- Accuracy: They can quite breezily present incorrect information (often called "hallucinations").
- Up-to-date knowledge: They don't inherently "know" about recent events or specialised information in a subject area.
- Verifiability: They don't cite sources for their claims by default.
- Security: Standard models might leak sensitive information or generate harmful content.
All of which point to the main question around AI outputs: can you trust them?
The power of RAG
Portable uses Retrieval-Augmented Generation (RAG) to power AI chat tools, running iterative testing and pushing the boundaries of its capabilities. We use RAG because it overcomes most of the pain points around inaccurate and generic data sources. LLMs often lack niche or proprietary knowledge in their training data, leading to gaps and potential hallucinations as they try to make up something they weren't trained to generate. RAG addresses this by providing the LLM with up-to-date, trusted sources to ground its responses and improve accuracy.
When a user asks a RAG chatbot a question, the system searches a curated set of documents it’s been configured to refer to for relevant information, then uses the retrieved information as context to guide the chatbot’s response, supplementing rather than replacing the model’s general knowledge. This approach anchors the chatbot’s responses in pre-verified information.
To reduce answers that don't tie directly back to your trusted information sources, we make RAG systems work by:
- Searching your actual documents or knowledge bases — policies, legal texts, FAQs, or manuals.
- Retrieving the most relevant content.
- Citing exactly where the answer came from.
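To make that concrete, here is a minimal sketch of the retrieve-then-answer loop described above. It uses a toy keyword-overlap retriever, an invented two-document knowledge base, and a placeholder `call_llm` function standing in for whichever model provider is used; none of the names reflect Portable's production pipeline.

```python
# Minimal RAG sketch: search trusted documents, build a grounded prompt,
# and return an answer alongside its source citations.

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call (e.g. a hosted LLM API).
    raise NotImplementedError("Plug in your model provider here")

KNOWLEDGE_BASE = {
    "advertising-factsheet": "How to complain about an offensive advertisement ...",
    "transport-factsheet": "How to complain about public transport services ...",
}

def retrieve(question: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Score each document by naive keyword overlap and return the best matches."""
    q_words = set(question.lower().split())
    scored = [
        (len(q_words & set(text.lower().split())), doc_id, text)
        for doc_id, text in KNOWLEDGE_BASE.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]

def answer(question: str) -> dict:
    """Ground the model in retrieved documents and keep track of citations."""
    sources = retrieve(question)
    context = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in sources)
    prompt = (
        "Answer the question using ONLY the documents below. "
        "Cite the document id for every claim.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {"answer": call_llm(prompt), "citations": [doc_id for doc_id, _ in sources]}
```

In production the keyword retriever would usually be replaced by embedding-based semantic search over a real knowledge base, but the shape of the loop (retrieve, ground, cite) stays the same.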
RAG is great, but it's not perfect.
Sometimes the retrieval step grabs the wrong information from your knowledge base. It might pull up something that seems relevant but isn't quite right for the question. Even with the right information in front of it, the AI can still get confused, especially with complex topics. It might misinterpret what it's reading, just like humans sometimes do.
So we can't just plug in RAG and call it a day. (Or call it a “pilot” and leave you to figure out how far it gets and what it gets wrong!) We need additional safeguards.
We don't believe in putting things into production that we haven't tested to understand and mitigate the risks for each use case.
We throw hundreds of test questions at our systems – everything from common requests to tricky edge cases. We even try to deliberately trip it up to see where it might go wrong.
We get subject matter experts (SMEs) to review the AI's answers.
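As an illustration of what that testing can look like, the sketch below runs a bank of test questions through the chatbot and flags every answer whose citations don't match what an SME expects, so a human reviewer sees the failures rather than reading every transcript. The test cases are invented, and `answer_fn` stands in for a chatbot like the `answer` function in the earlier sketch.

```python
# Hypothetical evaluation harness: run known test questions through the chatbot
# and collect mismatched citations for SME review.

TEST_CASES = [
    {"question": "I saw an offensive ad at the bus stop", "expected_source": "advertising-factsheet"},
    {"question": "My train was cancelled three days in a row", "expected_source": "transport-factsheet"},
]

def evaluate(answer_fn) -> list[dict]:
    """Return the cases where the chatbot cited the wrong source document."""
    failures = []
    for case in TEST_CASES:
        result = answer_fn(case["question"])
        if case["expected_source"] not in result["citations"]:
            failures.append({**case, "got": result["citations"]})
    return failures

# Whatever evaluate() returns goes straight to a subject matter expert for review.
```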
On a recent project building a complaint pathway chatbot for the National Justice Project, Hear Me Out, we tried testing baseline or “vanilla” GenAI models without RAG, and had limited success. During testing an SME picked up that the baseline model was referring people to the wrong complaint bodies. Sometimes it was confusing complaint pathways based on keyword similarity - for example, “I saw an ad at the bus stop I found offensive” led to a referral to the transportation complaint bodies instead of the advertising commission. Other times it used outdated information due to the cut-off date in its initial training data. The team used RAG to filter down to the relevant documents, which in this case were factsheets about the complaint bodies, and then implemented a reasoning step to make sure the Large Language Model analysed the relevant document thoroughly enough to pick up on outlier cases. There is no substitute for human judgement when it comes to nuance and accuracy, but with methodical testing that includes SMEs and design thinkers, the AI process can more reliably provide relevant and accurate responses.
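A reasoning step of this kind can be as simple as a second prompt that forces the model to confirm the retrieved factsheet actually covers the user's situation before recommending it. The rough sketch below reuses the toy `retrieve` and placeholder `call_llm` from the earlier example; it is an illustration of the idea, not the Hear Me Out implementation.

```python
# Sketch of a reasoning step: before recommending a complaint body, ask the model
# to explicitly confirm that the retrieved factsheet covers the user's situation.

def covers_situation(question: str, doc_id: str, factsheet: str) -> bool:
    verdict = call_llm(
        "Read the factsheet below and answer YES or NO: does this complaint body "
        f"handle the situation described?\n\nFactsheet [{doc_id}]:\n{factsheet}\n\n"
        f"Situation: {question}"
    )
    return verdict.strip().upper().startswith("YES")

def recommend(question: str) -> str:
    for doc_id, factsheet in retrieve(question):
        if covers_situation(question, doc_id, factsheet):
            return call_llm(
                f"Using only this factsheet, explain how to lodge the complaint:\n{factsheet}"
            )
    return "No matching complaint pathway found; refer the user to a human service."
```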

We design appropriate safeguards for sensitive use-cases — including:
- Human-in-the-loop workflows where needed.
- Controls and testing that scale to your organisation’s risk appetite.
- Collaboration with your risk, legal, and compliance teams to develop robust evaluation methods.
For Hear Me Out, we developed our system prompt to detect when people were looking for more than complaint pathways, such as financial compensation or legal remedies, and refer them to legal advice, as well as to flag more serious concerns around violence or coercive control. It was important to set up the website and chatbot so people understood the limitations of our self-service referral pathways, and understood that people in urgent situations should prioritise getting in-person help.
We also worked to limit the assistant to only consider information present in the source documents - fact sheets about each complaint body that explain what complaints they cover and how you can make a complaint. This provided context around specific enquiries while limiting hallucinations and overconfident responses, two common LLM weaknesses. Even so, Hear Me Out went through months of rigorous testing and iteration, because only accuracy and trust in the underlying data systems give us confidence that users are experiencing the product in the safest possible environment.
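Safeguards like these typically live in a grounding system prompt plus a small escalation check that runs before any referral is generated. The wording and keyword lists below are an illustrative approximation, not the actual Hear Me Out prompt or logic.

```python
# Illustrative guardrails: a grounding system prompt plus a simple escalation check
# that routes people to legal advice or urgent in-person help before answering.

SYSTEM_PROMPT = (
    "You are a complaints-referral assistant. Answer ONLY from the supplied "
    "factsheets. If the factsheets do not cover the question, say so and do not "
    "guess. You cannot give legal advice or pursue compensation on anyone's behalf."
)

ESCALATION_TERMS = {
    "legal_advice": ["compensation", "sue", "lawyer", "legal remedy"],
    "urgent_help": ["violence", "coercive control", "unsafe", "threatened"],
}

def escalation_check(question: str) -> str | None:
    """Return an escalation category if the question needs more than a self-service referral."""
    lowered = question.lower()
    for category, terms in ESCALATION_TERMS.items():
        if any(term in lowered for term in terms):
            return category
    return None
```

A real implementation would likely use a classifier or the model itself rather than keyword matching, but the principle is the same: check for out-of-scope or urgent situations before generating a referral from the fact sheets.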
Portable RAG projects
Sweep
Sweep empowers employers and employees to identify potential problems in the workplace, establish fair working conditions, and resolve wage disputes by simplifying pay monitoring and supporting those interactions with guidance on the latest employment law standards.
For more information or collaboration inquiries, visit www.sweepwages.com.au.
Hear Me Out
Hear Me Out is Australia's first AI-powered platform simplifying the complaints process. Developed by the National Justice Project, it guides users to appropriate complaint bodies, enhancing access to justice. By demystifying complex systems, it empowers individuals to address issues like discrimination and misconduct effectively.
Try Hear Me Out now: www.hearmeout.org.au
We know GenAI can never be 100% right. But we do know what’s at stake — and we build with that in mind.
For more information about what GenAI can do for you, check out: