Backed by CDTI Innovación and the European MRR funds, the UPV spin-off tranSkriptorium uses artificial intelligence to open handwritten, printed or typewritten documents to the world—materials that were previously almost impossible to consult. Its technology transforms them, regardless of language, age or deterioration, into accessible databases for researchers, public administrations and companies.

At a time when digitalisation is advancing rapidly, a large part of human knowledge remains trapped in documents that are not electronically accessible. As a result, millions of administrative files, judicial records and handwritten documents stored in public and private archives remain outside the reach of search systems and data-analysis tools, limiting their value for administrations, businesses, researchers and citizens.

tranSkriptorium emerged in response to this challenge. A spin-off of the Universitat Politècnica de València (UPV), it has turned years of research into solutions capable of interpreting, classifying and extracting structured information from historical and administrative documents. According to Luis Antonio Morró, the company’s CEO: “The origin of tranSkriptorium goes back to the moment when researchers from the Pattern Recognition and Human Language Technology Research Center (PRHLT) at the UPV and the University itself became interested in whether Probabilistic Indexing (PrIx) technology had business potential.” “That reflection,” he adds, “marked the starting point for transforming a scientific development into a tool with real-world impact.”

Founded during the 2020 pandemic, the company bases its value proposition on solutions such as PrIx and advanced handwritten text-recognition models capable of analysing untranscribed images and understanding documents that previously could only be studied manually.

Since then, tranSkriptorium has specialised in processing complex documents: historical manuscripts, typewritten pages or printed texts with difficult scripts, irregular layouts or marginal notes. Although its first clients have been public administrations, Morró stresses that the technology it develops has a much broader scope: “In this era, any holder of data that has so far not been electronically accessible clearly recognises the business and economic value of being able to access all the documentation they possess.”

The company is also working to accelerate the digitalisation of thousands of documentary collections that remain invisible to electronic systems. Its goal is clear: “We aim to democratise access to information and allow any citizen, researcher, company or administration to consult these documents as easily as if they were browsing a digital archive,” says the CEO.


A Challenge to Tackle: Billions of Undescribed Documents

Despite advances in digitalisation, most public and private archives contain documents that have not been described or catalogued—or that lack even minimally structured information. In many cases, only a digital image of a handwritten or typewritten page exists, which cannot be automatically processed. As Morró notes: “Billions of documents stored in archives contained barely any information, and manual processes only allowed around 3% to be described.”

This lack of description means that consulting collections depends on the expert knowledge of archivists and curators who must interpret each document manually. It also limits opportunities for reuse, research or large-scale analysis, and hinders compliance with regulations related to transparency, citizen access or preservation of institutional memory.

To address this issue, the company adopts a dual approach: on the one hand, automatic recognition systems accelerate the work; on the other, “human-in-the-loop” strategies allow experts to validate ambiguous cases. As the CEO explains: “This approach helps maintain quality. Our technology combines automation and human oversight to obtain real data and manage it at scale while avoiding errors associated with fully generative models.”
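The split between automation and expert validation can be pictured as a simple confidence-based routing step. The sketch below is illustrative only: the function name `triage`, the 0.9 threshold and the sample readings are assumptions made for this example, not a description of tranSkriptorium's actual pipeline.

```python
def triage(readings, threshold=0.9):
    """Split (text, confidence) pairs into auto-accepted and expert-review lists.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    accepted, needs_review = [], []
    for text, confidence in readings:
        if confidence >= threshold:
            accepted.append(text)
        else:
            needs_review.append(text)
    return accepted, needs_review


# Made-up recognition results: (recognised text, model confidence)
readings = [("Joan Pérez", 0.97), ("notari", 0.95), ("1687?", 0.55)]
accepted, needs_review = triage(readings)
print("accepted:", accepted)
print("for expert review:", needs_review)
```

Raising the threshold sends more cases to experts, while lowering it automates more of the work at the cost of leaving more uncertain readings unreviewed.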


Digitalising, Describing and Extracting Information at Scale

The support of Neotec—an initiative of CDTI Innovación co-funded with the European Recovery and Resilience Facility (MRR)—has been crucial to tranSkriptorium’s development and growth. In Morró’s words: “Without this backing, it would have been difficult to pursue such an ambitious project, especially due to the costs of research, development and model training.” He adds: “Neotec has made it possible to speed up testing, demonstrate commercial viability and strengthen our position in an expanding market.”

Thanks to this support, the company has advanced a strategic project: developing models capable of classifying documents, segmenting their components and identifying names of people, positions, dates and other structured elements, transforming large archives into searchable and exploitable databases. The technology combines three complementary capabilities. First, it analyses thousands of images to determine document type, internal structure and content, automating the initial archival-processing phase. Second, it identifies entities and key data, which are essential for building indices and enabling advanced searches. Third, at the core of the solution, it incorporates PrIx, the probabilistic indexing technology that makes it possible to work with images without transcribing their entire content.

“This tool allows us to work with untranscribed documents and locate information as if using a modern search engine, offering fast and precise access to collections that were previously almost inaccessible,” Morró explains.
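To make the idea of searching untranscribed images more concrete, the following minimal sketch shows one way a probabilistic index could be queried: each entry is a word hypothesis for a page image together with the probability that the word is actually written there, and a search returns the entries above a confidence threshold. The `Spot` record, the `search` function, the file names and the probabilities are all hypothetical and chosen for illustration; they do not reflect the real PrIx data format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """One hypothesised word occurrence in a page image (hypothetical layout)."""
    page: str
    word: str
    probability: float  # confidence that `word` is written on this page


def search(index, query, min_probability=0.5):
    """Return spots whose word matches `query`, most confident first."""
    q = query.lower()
    hits = [s for s in index
            if s.word.lower() == q and s.probability >= min_probability]
    return sorted(hits, key=lambda s: s.probability, reverse=True)


# Made-up index: competing hypotheses for the same page can coexist.
index = [
    Spot("page_001.jpg", "Valencia", 0.92),
    Spot("page_001.jpg", "Valentia", 0.31),
    Spot("page_007.jpg", "Valencia", 0.64),
    Spot("page_007.jpg", "notario", 0.88),
]

results = search(index, "valencia")
for spot in results:
    print(spot.page, spot.word, spot.probability)
```

Because several hypotheses are kept per image instead of a single transcription, the threshold lets a user trade precision for recall: a lower `min_probability` surfaces more candidate pages, including less certain ones.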


The Value of the Human-in-the-Loop Approach

Massive archival digitalisation presents ambiguities: complex handwriting, abbreviations, strike-throughs or physical deterioration. In this context, Morró stresses that, unlike other systems, “tranSkriptorium chooses to integrate experts into the validation of results.” He also notes: “It’s not about replacing professionals, but about multiplying their capacity.”


Impact and International Validation

tranSkriptorium’s AI technology has already been validated by institutions and universities from various countries, as well as by public administrations managing large-scale collections.

“We have observed great international demand, especially because our technology does not depend on a specific language or historical period,” says Morró. He adds: “We can work with documents in Spanish, Valencian, French, English, Latin or any other language, and with scripts as varied as 17th-century notarial handwriting or mid-20th-century administrative calligraphy.”

The CEO also highlights that the company's solution is not limited to historical archives. Handwritten documents are still being produced today in areas such as healthcare, social services, education and justice. “It’s part of everyday life,” he emphasises. For this reason, the technology not only recovers the past but also has an impact on present-day document management.


Future Outlook: Alliances, Expansion and New Research Lines

Over the coming years, tranSkriptorium aims to take part in European projects and establish global partnerships to drive adoption of its technology by public administrations and major institutions, with the goal of becoming an international benchmark in intelligent archival processing.

At the same time, the company will continue investing in research to improve accuracy, extraction capabilities and model robustness when dealing with deteriorated or highly complex documents. As Morró notes, the intention is to “obtain real data and manage it at scale without relying on technologies that may generate unverifiable information.”

Morró summarises the company’s philosophy with a clear idea: democratising access to knowledge. Their goal is to ensure that any citizen can consult a historical or administrative archive and locate information with the same ease as using a digital search engine. Ultimately, as the CEO concludes, “Recovering the hidden information in millions of documents is an essential step toward building more transparent, efficient societies that remain connected to their collective memory.”


CDTI Innovación

The Centre for the Development of Technology and Innovation, CDTI E.P.E., is the innovation agency of the Ministry of Science, Innovation and Universities. Its mission is to promote technological innovation within the business sector. CDTI’s strategic objective is to ensure that Spanish companies generate and transform scientific-technical knowledge into globally competitive, sustainable and inclusive growth. In 2024, under a new strategic plan, CDTI provided more than €2.3 billion in support to Spanish companies and startups.

More information

Website: www.cdti.es
LinkedIn: https://www.linkedin.com/company/29815
X: https://twitter.com/CDTI_innovacion
YouTube: https://www.youtube.com/user/CDTIoficial

Image: AI models capable of transforming historical documentary collections into structured, searchable information.
