The long and the short of IT

“Have no fear of perfection - you'll never reach it.” Retrieval-Augmented Generation

Mar 08, 2024

Who are the top tech companies of 2025 & 2026? Klarna has been in the news. It is gushing about saving US$40m as its ChatGPT deployment means c700 fewer customer service agents. "Humans are not perfect" reveals CEO Sebastian Siemiatkowski. AI is also not perfect but its “mistakes are less severe and less common”. We enjoyed a scroll through Trustpilot, Comparably and Glassdoor to see what customers and staff think. But the news is of wider interest: (i) it illustrates that GenAI is moving from training to inference, which signposts an evolutionary next step; and (ii) it reminds us that mistakes are an inherent part of GenAI. After all, LLMs know how words relate statistically, not what they mean. LLMs deal in plausibility, not fact. To fix this, corporates use human review, create knowledge bases which define their key concepts, or use RAG. The LSE has RAG companies: Capita, Experian, London Stock Exchange, Moneysupermarket.com, Pearson, Relx, S4Capital and Trustpilot. The RAGs to Riches upside is not reflected in current valuations (below). Investors do not appreciate the potential, and neither do the companies. Who blinks first?

The problem that RAG addresses

If you are using LLMs for fact-based answers, you must fact-check (apologies, Mr Siemiatkowski). Today’s LLM technology does not help developers (OpenAI, Google, Meta, Anthropic, Hugging Face etc) to extract rogue learnings from an already-trained model. In addition, there is less transparency around the training data. To be sure, you can put keywords and phrases on blacklists, and you can back-check model responses for instances of specific text bias. Beyond these workarounds, companies are developing solutions based on human review and/or on Retrieval-Augmented Generation (RAG).

RAG thinking dates to a December 2020 whitepaper by Meta. Meta came up with a framework called retrieval-augmented generation, designed to give LLMs access to information beyond their training data. RAG would allow LLMs to build on a specialised body of knowledge to answer questions more accurately. RAG would thereby enhance the accuracy and reliability of GenAI LLMs with up-to-date facts, and answer the need for trusted data in factual, data-centric decision-making processes. RAG has additional benefits. LLMs generate responses based on their training data, which is mostly from the Internet, so to get answers from non-public data (such as in a corporate setting) the LLM must be upgraded. This is what RAG does, and using it means that there is no need to re-train an LLM (expensive, and requiring more data), while hallucinations are reduced as answers are sourced from the proprietary data instead of from the LLM directly. By grounding an LLM on a set of external, verifiable facts, a RAG-based model also has fewer opportunities to leak sensitive data.

RAG has two phases: retrieval and content generation, as follows:

Retrieval: Algorithms search for and retrieve snippets of information relevant to the user’s prompt. The facts could come from indexed documents on the internet or from a narrower knowledge pool inside a corporate. The retrieved snippets are then appended to the prompt and passed to the language model.
Content generation: The LLM draws from the augmented prompt and its training data to synthesise an answer, which could be (say) passed to a chatbot with links to its sources.
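
To make the two phases concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not anyone's production design: the knowledge pool, the file names and the word-overlap scoring are all made up for the example (real systems typically use a search index or vector embeddings), and `llm` is a placeholder for whichever model call you actually use.

```python
import re

# Hypothetical knowledge pool; in practice this would be an indexed document store.
KNOWLEDGE_POOL = [
    {"source": "returns-policy.txt", "text": "Customers can return goods within 30 days of delivery."},
    {"source": "delivery-faq.txt", "text": "Standard delivery takes three to five working days."},
    {"source": "payments-faq.txt", "text": "Payments can be split into three interest-free instalments."},
]


def tokenize(text):
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt, pool, top_k=2):
    """Retrieval phase: rank snippets by how many words they share with the prompt."""
    query = tokenize(prompt)
    scored = [(len(query & tokenize(doc["text"])), doc) for doc in pool]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]


def build_augmented_prompt(prompt, snippets):
    """Append the retrieved snippets, with their sources, to the user's prompt."""
    context = "\n".join(
        f"[{i}] {doc['text']} (source: {doc['source']})"
        for i, doc in enumerate(snippets, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {prompt}"
    )


def answer(prompt, llm, pool=KNOWLEDGE_POOL):
    """Content generation phase: the LLM sees the augmented prompt, not just the question."""
    snippets = retrieve(prompt, pool)
    reply = llm(build_augmented_prompt(prompt, snippets))
    return reply, [doc["source"] for doc in snippets]


# Stand-in "model" that simply echoes its input, so the sketch runs without any API.
reply, sources = answer("How many days do I have to return goods?", lambda text: text)
print(sources)
```

The point of the shape is that the sources travel with the answer, which is what lets a chatbot show links and lets a human fact-check.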

RAG fills a large gap and makes LLMs more profitable. We do not hear much about RAG because we are at the test stage of GenAI adoption, and the high-volume answering stage (inference) comes later. Inference is where getting the wrong answer equals problems. Inference is the process of drawing conclusions from observed data: you run live data through a trained AI model to make a prediction or solve a task. Without inference, a trained model would have no way to apply what it has learned. In rule-based systems this is achieved through an “inference engine” that applies logical rules to a knowledge base in order to evaluate and analyse new information. There are two phases:

Training phase where intelligence is developed by recording, storing, and labeling information. If, for example, you are training a machine to identify boats, the machine-learning algorithm is fed with many images of different boats the machine can later refer to.
Inference where the machine uses the intelligence gathered and stored in phase one to understand new data. In this phase, the machine can use inference to identify and categorise new images as “boats” despite having never seen them before. Inference can be used to augment human decision making, with deployments tuned for reliability, energy efficiency, privacy and low latency.
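
The split between the two phases can be sketched in a few lines of Python. This is a toy, not how image models work: the two numeric features (length in metres, a 0/1 "has hull" flag), the labels and the nearest-centroid rule are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean


def train(labelled_examples):
    """Training phase: store one centroid (average feature vector) per label."""
    by_label = {}
    for features, label in labelled_examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def infer(model, features):
    """Inference phase: label unseen data with the nearest stored centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Made-up labelled training data: (length in metres, has_hull) -> label.
training_data = [
    ((8.0, 1), "boat"),
    ((12.0, 1), "boat"),
    ((4.5, 0), "other"),
    ((4.0, 0), "other"),
]

model = train(training_data)       # done once, up front
print(infer(model, (10.0, 1)))     # run many times, on data never seen in training
```

Training happens once and is where the cost sits today; inference runs every time a question is asked, which is why the volume (and the cost of a wrong answer) arrives later.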

In a usage context

Think of this from the perspective of how software developers are building AI applications. The first design pattern is a special type of query called a prompt, which is natural language text describing the task that an AI should perform. This is sent to a router, which is a classifier that categorises the input. A recognised prompt routes to a small language model, which tends to be more accurate on its narrow task, more responsive, and less expensive to operate. If the prompt is not recognised, then the LLM handles it. LLMs are much more expensive to operate but successfully return answers to a larger variety of queries, so cost and performance can be balanced.
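
A rough sketch of that router pattern in Python follows. Everything named here is hypothetical: the intents, the keyword lists and the two model callables are placeholders, and a real router would more likely be a trained classifier than a keyword match.

```python
import re

# Hypothetical intents the small model has been built to handle.
RECOGNISED_INTENTS = {
    "balance": {"balance", "statement"},
    "delivery": {"delivery", "parcel", "tracking"},
}


def classify(prompt):
    """Router: return the recognised intent for a prompt, or None if unrecognised."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for intent, keywords in RECOGNISED_INTENTS.items():
        if words & keywords:
            return intent
    return None


def route(prompt, small_model, large_model):
    """Send recognised prompts to the cheap small model, everything else to the LLM."""
    intent = classify(prompt)
    if intent is not None:
        return "small", small_model(prompt, intent)
    return "large", large_model(prompt)


# Stand-in models so the sketch runs without any API.
def small(prompt, intent):
    return f"[small model, intent={intent}]"


def large(prompt):
    return "[large model]"


print(route("Where is my parcel?", small, large))
print(route("Write a poem about rain", small, large))
```

The economics sit in the `if`: the more traffic the router recognises, the less often the expensive model is called.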

Unintended consequences

Companies need to carefully determine which LLMs are best suited for their needs. Firms using LLMs to provide their clients or employees with access to proprietary content should integrate RAG, which will allow individuals access to the firm’s data, but in a vetted way with access based on specific permissions.
The systems are less transparent. Note that GPT-4 uses a paradigm where pre-training uses both public data and "data licensed from third-party providers" to predict the next token. The model is then fine-tuned with reinforcement learning feedback from humans and AI for human alignment and policy compliance.
LLM makers are being sued over their use of copyright materials in training their generative AI models. We acknowledge that the problem does not disappear with RAG. We are also reminded that the 2023 actor and writer strikes in Hollywood included concerns that AI would potentially infringe on ownership of their images and written content, remembering that one of the advantages of GenAI is its skill in creative writing, where it can be unbound by factual constraints.
This is a global election year. Cyber criminals are among the most ardent early adopters of GenAI. This year should see bad actors and states in over-drive, using generative AI to spread deep fakes and political misinformation more rapidly. In addition, the content farms which create high volumes of fabricated news in order to generate programmatic ad revenue will also be among the early adopters, and advertisers will need a more proactive approach to reduce the economic incentives in the adtech market.
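
On the first point above, vetted access based on specific permissions, one way to picture it is a filter applied to the knowledge pool before retrieval, so the model never sees what the user may not. The sketch below is an assumption about how this could be done; the document titles, group names and `Document` type are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_groups: frozenset  # groups permitted to see this document


def visible_documents(documents, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = frozenset(user_groups)
    return [doc for doc in documents if doc.allowed_groups & groups]


# Hypothetical corporate knowledge pool.
DOCS = [
    Document("Holiday policy", "Staff receive 25 days of annual leave.", frozenset({"all-staff"})),
    Document("Board minutes", "Confidential discussion of strategy.", frozenset({"board"})),
]

# Retrieval would then run over this filtered list only.
print([doc.title for doc in visible_documents(DOCS, {"all-staff"})])
```

Filtering before retrieval, instead of asking the model to keep secrets afterwards, means a restricted document cannot leak into an answer because it never reaches the prompt.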

How this market might evolve

While today human intervention is a guardrail against GenAI ‘wrongs’, our view is that the human interference in inference may be only temporary. The GenAI market model is likely to develop so that over time LLMs get both bigger (see figure below) and smaller, with smaller reflecting specific use cases in particular vertical markets. LLMs have been getting bigger because bigger is better (accuracy), but this may not necessarily continue. Smaller models will focus on specific use cases, and the winners here may well be companies who today have little obvious connection with the current GenAI wave.

In this our thinking goes to the Internet wave where the later stage sees new companies (the natives, the ‘enabled ones’) do new-new things as industries experience dis-intermediation; some are dashed, others are re-born, more crumble under the weight of technical debt. For example, Asos and Ocado were both founded in 2000, long after the birth of the underlying technology (ditto, THG: 2004, Trustpilot: 2007 etc), and these were not seen as ‘tech’ companies. That internet wave gave rise to new e-commerce and datacentre-related industries, and GenAI will similarly create new industries that either do not exist, or are in their infancy, today. The best is yet to come. To get the ball rolling here we canter through our LSE RAG companies with a ‘cut-out-and-keep’ summary table in the data section.

LSE RAG cohort

Capita

Capita has been battered over the past years. Capita offers consulting, transformation and professional delivery services, drawing on its practical experience, and provides digitally enabled services and solutions. Capita deserves inclusion in our RAG list because of its deep domain expertise in the UK Public Sector, where it is a Strategic Supplier. Its vast experience in outsourcing gives it domain knowledge of Public Sector processes. Capita has been in the doldrums for years, but the appointment of Adolfo Hernandez puts a technologist, and one very familiar with AI technologies, in place as CEO. Capita has two divisions, Capita Public Service and Capita Experience, operating in the UK, Europe, India and South Africa.

Capita Public Service division provides applied digital transformation and Business Process as a Service to improve the productivity and citizen experience of public services. The division is focused on education and learning; local public services; health and welfare; defense, security and fire; justice and policing; transport and central government.
Capita Experience division is engaged in designing, transforming and delivering experiences for life moments. It serves various industry verticals: telecoms, media & technology; retail & consumer products; energy and utilities; government and transport; and financial services.

BPO is a difficult market. At contract renewal customers demand more for less; once that was addressed through scale, then through offshoring, then through process re-engineering. At final (6/3) results Mr Hernandez announced that a June CMD will outline the growth strategy. Knowing Mr Hernandez, we expect mention of GenAI for process improvement internally at Capita and externally with customers.

Website & Latest results webinar

Experian

A global information services company, Experian spans Business-to-Business and Consumer Services. The company is a RAG because it builds and manages large, comprehensive databases. It collects, sorts, aggregates and transforms data from tens of thousands of sources to provide a range of data-driven services. Experian has databases and third-party information, including clients’ own data, which it combines to create and develop analytics, predictive tools, software and platforms. Its services help its clients improve the consistency and quality of their business decisions, in areas including credit risk, fraud prevention, identity management, customer service and engagement, account processing, and account management. It provides credit education, identity monitoring and fraud prevention services directly to consumers in the US, Brazil, United Kingdom, South Africa, Peru, Colombia and India. This includes free access to their Experian credit report and score, and useful online educational tools.

Website & Latest presentation

London Stock Exchange

LSEG is a diversified global financial markets infrastructure and data provider. The company deserves its RAG inclusion because of the wealth of data points on trading patterns. At final results (29/2), CEO David Schwimmer commented that the first new products from the Microsoft partnership would debut by Summer. With Microsoft as a partner and shareholder, LSEG has a seat at the table at the heart of OpenAI development thinking and will thereby be aware of its RAG opportunity. LSEG has three divisions, as follows:

Data & Analytics division provides information and data products, including indexes, benchmarks, real-time pricing data and trade reporting and reconciliation services. Data & Analytics division includes the core Refinitiv business and the FTSE Russell businesses. FTSE Russell is a global provider of index and analytics solutions.
Capital Markets division provides venues/platforms for access to capital through issuance and secondary market trading for equities, fixed income, and foreign exchange (FX). Capital Markets division includes the London Stock Exchange, Tradeweb, FXall, and Turquoise businesses.
Post Trade division provides clearing, risk management, capital optimization, and regulatory reporting solutions. Post Trade division consists of Over-the-counter (OTC) Derivatives, Securities & Reporting, and Non-Cash Collateral.

Website & Latest results

Moneysupermarket.com

Moneysupermarket.com has a wealth of information about consumer spending patterns. It is a trusted source and in 2023 it helped B2C users save an estimated £2.7bn on their household bills. The company operates price comparisons for money, insurance and home services through its websites. Its segments include Insurance, Money, Home Services, Travel and Cashback.

Insurance: Customers completing transactions for an insurance policy on the provider’s Website, its own Website or via a telephone call.
Money: Customers completing transactions for money products, such as credit cards, loans and mortgages on the provider's Website.
Home Services: Customers completing transactions for home services products, such as energy and broadband on the provider's Website.
Travel: Customers completing transactions for travel products on the provider's Website or its Website.
Cashback: Customers completing transactions for retail, telecommunications, services and travel products with a cashback incentive on the merchant's Website.

This is a technology-enabled business which it says uses centralised data and common tech on a scalable platform. The tech stack is based on ‘Decision Tech’, its UK price comparison platform, which creates digital experiences that connect users with the right products and services when they need them.

Website & Latest results deck

Pearson

Pearson is a learning company with its principal operations in the education, assessment and certifications markets. Pearson provides digital content, learning experiences, assessments, qualifications and data in the learning market. Inclusion in our RAG cohort is due to its deep domain expertise in learning, specifically ‘all of life’ learning, and thereby the company’s ability (admittedly unproven) to capture an actual customer lifetime value (LTV). The operating divisions are as follows:

Assessment & Qualifications: Pearson VUE, US Student Assessment, Clinical Assessment, UK GCSE and A Levels and International academic qualifications and associated courseware including the English-speaking Canadian and Australian K-12 businesses.
Virtual Learning: Virtual schools and online program management.
English Language Learning: Pearson Test of English, Institutional Courseware and English Online Solutions.
Workforce Skills: BTEC, GED, TalentLens, Faethm, Credly, Pearson College and Apprenticeships.

Website & Latest results deck

Relx

Relx, once Reed Elsevier, under CEO Erik Engstrom has pivoted to information-based analytics and decision tools, hence its inclusion within our RAG listing. Final results (15/2) were pleasing: 2023A revenue grew 7% Y/Y to £9.2bn (from £8.6bn), with tech and data analysis being 83% of the total, and operating profit rose to £2.7bn (from £2.3bn). The company employs c11,000 technologists (total FTEs 35k) and spends c£1.3bn/year on IT. Across the portfolio, Relx has begun to debut AI products. Our impression is that it is taking a softly-softly approach, with few datapoints being disclosed at the results conference. However, the Relx structure suggests plenty of opportunities:

Risk: Information-based analytics and decision tools that combine public and industry-specific content with advanced technology and algorithms to assist customers in evaluating and predicting risk and enhancing operational efficiency.
STM: Provides information and analytics that help institutions and professionals progress science, advance healthcare and improve performance.
Legal: Provides legal, regulatory and business information and analytics that help customers increase their productivity, improve decision making and outcomes.
Exhibitions: Helps customers connect digitally and face-to-face, learn about markets, source products and complete transactions.

Website & Latest results deck

S4Capital

S4Capital, founded by the incomparable Sir Martin Sorrell, is a global (geographical markets include the Americas, EMEA and Asia Pacific) digital advertising, marketing, and technology services company. The Company’s inclusion on the RAG listing is due to its deep knowledge of the Adtech and Martech worlds along with the digitisation of Advertising, where we see much opportunity for process re-engineering. S4C is a digital native across its three operating segments: Content, Data & Digital Media, and Technology Services.

Content: Offers creative content, campaigns, and assets at a global scale for paid, social and earned media from digital platforms and applications to brand activations that are focused to convert consumers at every possible touchpoint.
Data & Digital Media: Provides full-service campaign management analytics, creative production and advertisement serving, platform and systems integration and transition, and training and education.
Technology Services: Provides digital transformation services, delivering advanced digital product design, engineering services and delivery services.

Website & Latest results

Trustpilot

Trustpilot develops and hosts an online review platform that helps consumers make purchasing decisions and businesses showcase and improve their service. It hosts reviews to help consumers shop. We are big fans of the Company's platform, which creates a place where businesses and consumers can gain actionable insights and collaborate. It is this skill, and the resultant domain expertise, that warrants its inclusion in our RAG cohort.

Through its platform, consumers can share feedback, at any time, about any business with a Website and review feedback left by other consumers. The platform also gives consumers the opportunity to recommend businesses, products, services, and locations based on their experiences. Its consumers can search for reviews on any business by category and any business with a Website, whether operating online or offline, can receive a review.

Website & Latest results deck

The data

Klarna NPS score

Source: Comparably

Datapoints used to train artificial intelligence systems

Source: Epoch (2024)

How a RAG works

Source: https://www.ml6.eu/blogpost/leveraging-llms-on-your-domain-specific-knowledge-base

Valuation heatmap - No Rags to riches as AI RAG is a fraction of AI valuation

Source: Company data, Yahoo Finance, Analyst

LSE RAG who’s who & (fortune favours the bold) your fellow investors

Source: Company data, Yahoo Finance, Analyst (Priced LSE close 7/3)

End notes & Disclaimer: Please read

All information used in the publication of this report has been compiled from publicly available sources that are believed to be reliable, however we do not guarantee the accuracy or completeness of this report and have not sought for this information to be independently verified. This is not investment advice. Opinions contained in this report represent those of the author at the time of publication. Forward-looking information or statements in this report contain information that is based on assumptions, forecasts of future results, estimates of amounts not yet determinable, and therefore involve known and unknown risks, uncertainties and other factors which may cause the actual results, performance or achievements of their subject matter to be materially different from current expectations. The author is not liable for any direct, indirect or consequential losses, loss of profits, damages, costs or expenses incurred or suffered by you arising out or in connection with the access to, use of or reliance on any information contained herein. The information should not be construed in any manner whatsoever as, personalized advice nor construed by any subscriber or prospective subscriber as a solicitation to effect, or attempt to effect, any transaction in a security. Any logo used in this report is the property of the company to which it relates, is used here strictly for informational and identification purposes only and is not used to imply any ownership or license rights between any such company and Technology Investment Services Ltd. Email addresses and any other personally identifiable information collected in the provision of the newsletter are only used to provide and improve the newsletter.

Need more

Let’s chat at Progressive Equity Research here where I am delighted to be a contributing analyst and my website here.

The ask

My name is George O’Connor. I am a tech investment and IT industry analyst. I explore shareholder value, its drivers, the best exponents, the duffers. The target readers are investors, companies, advisors, stakeholders and YOU. If you like this please subscribe and pass it on to colleagues and friends. That said, if you hate it - do the same. Thanks for dropping by dear investor.

George O'Connor: The long and the short of IT
