Interview with Luis Antonio Morró, CEO of tranSkriptorium: “The support of CDTI Innovation and European MRR funds has been key to developing our AI technology capable of extracting information from documents that were not born digital”

Backed by CDTI Innovation and European MRR funds, tranSkriptorium, a spin-off of the UPV, uses artificial intelligence to open up to the world handwritten, printed or typed documents that were previously almost impossible to consult. Its technology transforms them, regardless of language, age or deterioration, into databases accessible to researchers, administrations and companies.

The company combines automation with human supervision to obtain real, reliable data on a large scale

At a time when digitalization is advancing rapidly, much knowledge remains trapped in documents that are not electronically accessible. As a result, millions of administrative files, judicial records and handwritten documents kept in public and private archives remain beyond the reach of data search and analysis systems, limiting their value for administrations, companies, researchers and citizens.

It is in response to this challenge that tranSkriptorium emerged: a spin-off of the Universitat Politècnica de València (UPV) that has turned years of research into solutions capable of interpreting, classifying and extracting structured information from historical and administrative documents. According to Luis Antonio Morró, CEO of the company: “The origin of tranSkriptorium dates back to the time when researchers at the UPV's Pattern Recognition and Human Language Technology Research Center (PRHLT), together with the University itself, wanted to know whether Probabilistic Indexing (PrIx) technology made business sense.” “That reflection,” he adds, “marked the starting point for transforming a scientific development into a tool with real impact.”

The company, founded in 2020 during the pandemic, bases its value proposition on solutions such as PrIx and advanced handwriting recognition models, which can analyze untranscribed images and understand documents that until now could only be studied manually.

Since then, tranSkriptorium has specialized in processing complex documents: old manuscripts and typewritten or printed documents with difficult handwriting, irregular layouts or marginal annotations. Although its first customers have been public administrations, Morró emphasizes that the company's technology has a much broader scope: “Today, any holder of data that has not been electronically accessible until now recognizes the business and economic importance of being able to access all the available documentation.”

The company also works to accelerate the digitization of thousands of documentary collections that remain invisible to electronic systems. Thus, its objective is clear: “We seek to democratize access to information and allow any citizen, researcher, company or administration to consult these documents so that they can be explored with the same ease as a digital archive,” says the CEO.


The tranSkriptorium AI team: researchers and developers who combine AI and archival knowledge to retrieve information hidden in thousands of documents

 

A challenge to face: billions of undescribed documents

Despite advances in digitization, most public and private archives contain documents that have not been described or catalogued, or that lack even minimally structured information. In many cases, there is only a digital image of the handwritten or typed page, which cannot be processed automatically. As Morró points out: “Billions of documents in the archives had barely any information, and manual processes only allowed around 3% of them to be described.”

This lack of description makes consulting the collections dependent on the expert knowledge of archivists and curators, who must interpret each document manually. It also limits the possibilities for reuse, research or mass analysis, and makes it harder to comply with regulations on transparency, citizen access or the preservation of institutional memory.

To solve this problem, the company takes a twofold approach: on the one hand, automatic recognition systems that speed up the work; on the other, “human-in-the-loop” strategies, in which an expert validates ambiguous cases. As the CEO explains: “This approach allows us to maintain quality. Our technology combines automation with human supervision to obtain real data and manage it on a large scale, avoiding the errors associated with fully generative models.”
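The human-in-the-loop idea described above can be sketched as confidence-based routing: results the model is sure about are accepted automatically, and the rest are queued for an expert. This is a minimal illustration under stated assumptions, not tranSkriptorium's implementation. The `Recognition` structure, the `route` and `apply_review` helpers, the 0.9 threshold and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    """One automatically recognised field (hypothetical structure)."""
    doc_id: str
    field: str
    value: str
    confidence: float  # model's estimated probability that `value` is correct


def route(results, threshold=0.9):
    """Split results into auto-accepted items and items queued for an expert.

    The 0.9 threshold is an illustrative assumption, not a published figure.
    """
    accepted, review = [], []
    for r in results:
        (accepted if r.confidence >= threshold else review).append(r)
    return accepted, review


def apply_review(review, corrections):
    """Merge expert decisions back in.

    `corrections` maps (doc_id, field) -> validated value; items the expert
    did not change keep their value. Validated items get confidence 1.0 to
    record that a human confirmed them.
    """
    return [
        Recognition(r.doc_id, r.field, corrections.get((r.doc_id, r.field), r.value), 1.0)
        for r in review
    ]


# Hypothetical batch: one clear date, one ambiguous name.
batch = [
    Recognition("exp-001", "date", "1652-03-14", 0.97),
    Recognition("exp-001", "name", "Joan Ferrer", 0.62),
]
auto, queue = route(batch)
print(len(auto), len(queue))  # 1 1
```

Raising the threshold sends more items to experts (higher quality, more manual work); lowering it automates more. Where to set it is a per-collection judgment call.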
 


AI models capable of transforming historical documentary collections into structured and searchable information

 

Digitize, describe and extract information on a large scale

The support of Neotec, a CDTI Innovation initiative co-financed by the European funds of the Recovery and Resilience Mechanism (MRR), has been key to the development and growth of tranSkriptorium. In Morró's words: “Without this support it would have been difficult to tackle such an ambitious project, especially because of the costs of research, development and model training.” He adds: “Neotec has made it possible to speed up testing, demonstrate commercial viability and strengthen our positioning in an expanding market.”

Thanks to this support, the company has been able to make progress on a strategic project: developing models capable of classifying documents, segmenting their components and identifying names of people, positions, dates and other structured elements, transforming large collections into searchable, exploitable databases. The technology combines several complementary capabilities. First, it analyzes thousands of images to determine the type of document, its internal structure and the information it contains, automating the initial phase of archival work. Second, it identifies the key entities and data needed to build indexes and enable advanced searches. Finally, at the heart of the solution is PrIx, the probabilistic indexing technology that makes it possible to work with images without transcribing all of their content.

“This tool lets you work with untranscribed documents and locate information as if you were using a modern search engine, offering fast and accurate access to collections that were previously practically inaccessible,” he explains.
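To make the “search engine without transcription” idea concrete, here is a simplified sketch of how a probabilistic index can be queried. The general concept is that, instead of one transcript, the index stores for each word the image regions (“spots”) where it may appear, each with an estimated relevance probability; a search returns the spots above a chosen probability threshold, ranked. The `Spot` structure, the `search` and `expected_count` helpers, the 0.5 default threshold and the sample entries are hypothetical and do not reflect tranSkriptorium's actual file format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical, simplified probabilistic index: for each word, a list of
# spots (page, bounding box, relevance probability). Illustrative only.
BBox = Tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass(frozen=True)
class Spot:
    page: str
    bbox: BBox
    prob: float  # estimated probability that this region contains the word


def search(index: Dict[str, List[Spot]], word: str, min_prob: float = 0.5) -> List[Spot]:
    """Return the spots for `word` with probability >= min_prob, best first."""
    hits = [s for s in index.get(word.lower(), []) if s.prob >= min_prob]
    return sorted(hits, key=lambda s: s.prob, reverse=True)


def expected_count(index: Dict[str, List[Spot]], word: str) -> float:
    """Expected number of occurrences of `word`.

    By linearity of expectation this is the sum of the spot probabilities,
    assuming the spots are distinct, non-overlapping regions.
    """
    return sum(s.prob for s in index.get(word.lower(), []))


# Hypothetical index entry for the word "notario".
index = {
    "notario": [
        Spot("p012.jpg", (120, 340, 210, 48), 0.93),
        Spot("p047.jpg", (88, 910, 205, 51), 0.41),
        Spot("p105.jpg", (300, 150, 198, 46), 0.78),
    ],
}

for s in search(index, "Notario"):
    print(s.page, s.prob)
# p012.jpg 0.93
# p105.jpg 0.78
print(round(expected_count(index, "notario"), 2))  # 2.12
```

The threshold trades precision for recall: a researcher who cannot afford to miss a reference can lower `min_prob` and review more low-probability spots by hand.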
 


Automated classification and sorting processes that convert handwritten or typed pages into data ready for analysis and consultation


The value of the human-in-the-loop approach

The large-scale digitization of records raises ambiguities: complex handwriting, abbreviations, crossings-out or physical deterioration. In this context, Morró emphasizes that, unlike other systems, “tranSkriptorium is committed to integrating experts in the validation of results.” He stresses: “It is not a question of replacing the professional, but of multiplying their capacity for work.”
 

Impact and international validation

tranSkriptorium's technology has already been validated by institutions and universities in several countries, as well as by public administrations that manage large-volume collections.

“We have observed strong international demand, especially because our technology does not depend on a specific language or period,” says Morró, who adds: “It can work with documents in Spanish, Valencian, French, English, Latin or any other language, and with handwriting as diverse as seventeenth-century notarial hands or mid-twentieth-century administrative ones.”

The CEO also points out that the company's solution is not limited to historical archives: recently created handwritten documents are also common in health, social services, education and justice. “It's something that's part of our daily lives,” he says. Its technology therefore not only recovers the past but also has an impact on present-day document management.
 

Future prospects: alliances, expansion and new lines of research

In the coming years, tranSkriptorium aims to join European projects and establish global alliances that promote the use of its technology in public administrations and large institutions, with the goal of becoming an international reference in intelligent document processing.

At the same time, the company will continue to invest in research to improve the accuracy, extraction capacity and robustness of its models on particularly deteriorated or complex documents. As Morró points out, its intention is “to obtain real data and manage it on a large scale without relying on technologies that can generate non-verifiable information.”

Morró sums up the company's philosophy in one clear idea: democratizing access to knowledge. The goal is for any citizen to be able to consult a historical or administrative archive and locate information as quickly as with a digital search engine. As the CEO concludes: “Recovering the information hidden in millions of documents is an essential step towards building more transparent, efficient societies that are connected with their collective memory.”

 

CDTI Innovation

The Center for Technological Development and Innovation, CDTI E.P.E., is the innovation agency of the Ministry of Science, Innovation and Universities, whose objective is to promote technological innovation in the business sector. The CDTI's mission is to ensure that Spanish companies generate scientific and technical knowledge and transform it into globally competitive, sustainable and inclusive growth. In 2024, within the framework of a new strategic plan, the CDTI provided more than 2.3 billion euros in support to Spanish companies and startups.


More information:

Press Office
press@cdti.es
91-581.55.00

On the Internet
Website: www.cdti.es
On Linkedin: https://www.linkedin.com/company/29815
On X: https://twitter.com/CDTI_innovacion
On Youtube: https://www.youtube.com/user/CDTIoficial

This content is copyright © 2025 CDTI, E.P.E. Use and reproduction are permitted provided that the source and CDTI's digital identity (@CDTI_innovacion) are cited.