
Announcement

26 septembre 2024

PhD subject: "Self-adaptive handwriting recognition using self-supervised learning"


Category: PhD student


PhD thesis offer
“Self-adaptive handwriting recognition using self-supervised learning”

Location:
IRISA lab, SHADoc team
263 avenue du General Leclerc, 35000 Rennes (France)

Duration: 3 years

Desired start date: as soon as possible

Funding: already available (PEPR IA Adapting)

Contacts:
- Bertrand Couasnon (bertrand.couasnon@irisa.fr)
- Yann Soullard (yann.soullard@irisa.fr)
- Denis Coquenet (denis.coquenet@irisa.fr)

Keywords:
Self-supervised learning, Deep learning, Handwriting recognition, Machine Learning, Adaptation to specific documents

 

This thesis lies in the domain of handwriting recognition. In particular, we are interested in recognizing handwritten text in images of document pages. The aim is to facilitate access to automatic transcription for a wide variety of documents, including historical ones. Older documents indeed present increased difficulty, owing to archaic language, writing styles, document deterioration, and low-quality digitizations, which often make transcribing a specific document laborious, even for an expert.
Traditional systems, which successively combine a text-line extraction step, an optical model recognizing characters in images, and a language model providing linguistic corrections [Soullard et al., 2019], are gradually being replaced by so-called end-to-end systems that perform these different tasks jointly [Yousef and Bishop, 2020, Coquenet et al., 2022, 2023]. These systems are based on fully convolutional architectures [Yousef and Bishop, 2020] or on Transformers [Coquenet et al., 2023], which have become increasingly popular in many fields, including handwriting recognition, thanks to the efficiency of the attention mechanisms they include [Barrere et al., 2024, Kang et al., 2022]. These systems are thus highly capable of analyzing images and recognizing text. Although they perform well on the document type(s) on which they were trained, they can nevertheless lack the adaptability needed to generalize to new documents. This is notably due to the scarcity of diverse supervised data that could be used to train these models. Moreover, obtaining supervised examples from the specific document to be transcribed is hard and time-consuming, even for an expert, which makes model adaptation less reliable.
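For concreteness, the optical models of such traditional pipelines commonly emit per-frame character probabilities that are decoded with Connectionist Temporal Classification (CTC). The sketch below is purely illustrative (the symbol set and function names are hypothetical, and no deep-learning framework is assumed); it shows the greedy CTC decoding rule: collapse consecutive repeats, then drop the blank symbol.

```python
BLANK = "-"  # CTC blank symbol (illustrative choice)

def ctc_greedy_decode(frame_labels):
    """Greedy CTC decoding: collapse repeated symbols, then drop blanks."""
    decoded = []
    prev = None
    for symbol in frame_labels:
        if symbol != prev:          # collapse consecutive repeats
            if symbol != BLANK:     # drop the blank symbol
                decoded.append(symbol)
        prev = symbol
    return "".join(decoded)

# Per-frame argmax output for the word "hello"; the blank between the
# two 'l' frames is what allows a doubled letter to survive collapsing.
frames = ["h", "h", "-", "e", "l", "l", "-", "l", "o", "o"]
print(ctc_greedy_decode(frames))  # -> hello
```

In a real pipeline the frame labels would come from the optical model's per-timestep argmax, and a language model would then rescore or correct the decoded hypothesis.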

The aim of this thesis is to investigate self-supervised learning in order to exploit unlabeled examples for adapting these models. In recent years, a great deal of work has focused on self-supervised approaches for network pre-training: starting from a pretext task that requires no manual annotation, the model nonetheless acquires strong modeling capabilities. These approaches form the basis of all current foundation models, whether text-based (GPT-3 [Brown et al., 2020]), image-based (DINO [Caron et al., 2021]) or both (CLIP [Radford et al., 2021]). Pretext tasks are diverse: next-token prediction for GPT-3, contrastive learning for CLIP and DINO. More recently, work [He et al., 2022] has shown remarkable results with the pretext task of reconstructing partially masked images using Transformer networks.
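To illustrate the masked-reconstruction pretext task, the framework-free sketch below hides a fraction of (toy, one-value) image patches and scores a reconstruction only on the hidden ones, in the spirit of He et al. [2022]. All names are illustrative assumptions; a real model would operate on pixel patches and be trained by gradient descent.

```python
import random

def mask_patches(patches, ratio, rng):
    """Return the indices of patches hidden from the model."""
    n_hidden = int(len(patches) * ratio)
    return set(rng.sample(range(len(patches)), n_hidden))

def reconstruction_loss(original, reconstructed, hidden):
    """Mean squared error computed on the masked patches only."""
    errors = [(original[i] - reconstructed[i]) ** 2 for i in hidden]
    return sum(errors) / len(errors)

rng = random.Random(0)
patches = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5]   # toy stand-ins for image patches
hidden = mask_patches(patches, ratio=0.5, rng=rng)
perfect = list(patches)                     # a perfect reconstruction
print(reconstruction_loss(patches, perfect, hidden))  # -> 0.0
```

The key design point carried over from the literature is that the loss is computed only on the masked positions, so the model cannot succeed by merely copying visible input.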

We are therefore interested in performing self-supervised learning for handwriting recognition in images. The aim is twofold: a) to adapt a current system to handle a wide range of examples, in order to be generic and improve generalization capabilities; b) to enable a system to specialize on an unlabeled corpus, in order to improve transcription quality on that corpus. Self-supervised learning should thus make it possible to adapt current reference systems toward these generalization and specialization objectives while minimizing the amount of annotated data required.
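One illustrative way a trained recognizer could specialize on an unlabeled corpus is confidence-filtered self-training; this is an assumption chosen for the sketch below, not the method prescribed by the thesis, and every name in it is hypothetical.

```python
def self_train(recognize, unlabeled_images, threshold):
    """Keep (image, predicted transcription) pairs the model is sure about."""
    pseudo_labeled = []
    for image in unlabeled_images:
        text, confidence = recognize(image)
        if confidence >= threshold:        # trust only confident predictions
            pseudo_labeled.append((image, text))
    return pseudo_labeled                  # would then fine-tune the model

# Toy recognizer: maps an "image" id to a (text, confidence) pair.
fake = {1: ("lettre", 0.95), 2: ("???", 0.30), 3: ("acte", 0.88)}
pairs = self_train(lambda i: fake[i], [1, 2, 3], threshold=0.8)
print(pairs)  # -> [(1, 'lettre'), (3, 'acte')]
```

The retained pairs would serve as pseudo-annotations for a fine-tuning pass, trading a small risk of reinforcing errors against the cost of manual transcription.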


References
Killian Barrere, Yann Soullard, Aurélie Lemaitre, and Bertrand Coüasnon. Training transformer architectures on few annotated data: an application to historical handwritten text recognition. International Journal on Document Analysis and Recognition, Springer, 2024.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901, 2020.

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In International Conference on Computer Vision, pages 9630–9640, 2021.
Denis Coquenet, Clément Chatelain, and Thierry Paquet. End-to-end handwritten paragraph text recognition using a vertical attention network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
Denis Coquenet, Clément Chatelain, and Thierry Paquet. DAN: a segmentation-free document attention network for handwritten document recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross B. Girshick. Masked autoencoders are scalable vision learners. In Conference on Computer Vision and Pattern Recognition, pages 15979–15988, 2022.
Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, and Mauricio Villegas. Pay attention to what you read: non-recurrent handwritten text-line recognition. Pattern Recognition, 129:108766, 2022.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763, 2021.
Yann Soullard, Wassim Swaileh, Pierrick Tranouez, Thierry Paquet, and Clément Chatelain. Improving text recognition using optical and language model writer adaptation. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1175–1180. IEEE, 2019.
Mohamed Yousef and Tom E. Bishop. OrigamiNet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14710–14719, 2020.


Expected skills:

Expected courses: Machine Learning, Deep Learning
Skills:
- Python programming
- Experience with at least one deep learning framework (Keras, TensorFlow, PyTorch)


How to apply:

- CV
- Cover letter
- Master’s or engineering school grades
- Recommendation letters (optional)

SHADoc team:


The Shadoc team (Systems for Hybrid Analysis of DOCuments) focuses on modelling human-produced data for written communication: handwriting, gesture (2D and 3D), and documents, from several angles: analysis, recognition, composition, and interpretation.
The objective is to achieve a seamless continuum between paper and digital documents while preserving readability.
We mainly focus on the following topics:
- Intelligent recognition of handwritten content: documents, writings, gestures;
- Analysis of the semantic/structural content: document structure, stages of production of diagrams, drawings, musical scores, sketches, architectural plans;
- Design of new AI, combining recognition and analysis: offering enriched experiences for digital humanities or e-education.
The roadmap of the Shadoc team lies at the frontier of several research areas: Pattern Recognition, Machine Learning, Artificial Intelligence, Human-Machine Interaction, and Uses and Digital Learning. Our research is characterized by the hybridization of several AI approaches: two-dimensional grammars, deep learning, fuzzy inference systems… Beyond performance, this hybridization aims at guaranteeing important properties such as explainability, genericity, adaptability, and data frugality.
Beyond hybridization, the originality of this research is its focus on user interaction. This strategy aims at addressing the limitations of current approaches, which rely on non-interactive processing. The idea is to reinforce decision processes by relying on a user's implicit validations or explicit corrections, so as to avoid the propagation of errors throughout the analysis. The notions of interpretation, adaptation, and incremental learning are at the heart of this research, the objective being to design efficient, robust, and self-evolving systems.

IRISA lab:

IRISA is today one of the largest French research laboratories (more than 850 people) in the field of computer science and information technologies. Structured into seven scientific departments, the laboratory is a research center of excellence with scientific priorities such as bioinformatics, systems security, new software architectures, virtual reality, big data analysis and artificial intelligence.
Located in Rennes, Lannion and Vannes, IRISA is at the heart of a rich regional ecosystem for research and innovation, and is positioned as a reference in France, with internationally recognized expertise built through numerous European contracts and international scientific collaborations.
Focused on the future of computer science and necessarily internationally oriented, IRISA is at the very heart of the digital transition of society and of innovation at the service of cybersecurity, health, environment and ecology, transport, robotics, energy, culture and artificial intelligence.
IRISA is a joint research unit bringing together nine institutions, in alphabetical order: CentraleSupélec, CNRS, ENS Rennes, IMT Atlantique, Inria, INSA Rennes, Inserm, Université Bretagne Sud, and Université de Rennes. This collaboration draws its strength from the women and men who give their best to fundamental and applied research, education, exchanges with other disciplines, transfer of know-how and technology, and scientific mediation.

Reminder:

Location:
IRISA lab, SHADoc team
263 avenue du General Leclerc, 35000 Rennes (France)

Duration: 3 years

Desired start date: as soon as possible

Funding: already available (PEPR IA Adapting)

Contacts:
- Bertrand Couasnon (bertrand.couasnon@irisa.fr)
- Yann Soullard (yann.soullard@irisa.fr)
- Denis Coquenet (denis.coquenet@irisa.fr)

 


(c) GdR IASIS - CNRS - 2024.