PhD position starting in fall 2022 at IMT Atlantique (Brest), a top-level engineering school in France. The PhD topic is channel coding for the emerging application of data storage on DNA molecules (see https://www.youtube.com/watch?v=r8qWc9X4f6k for a brief introduction).
The PhD will be in the framework of a French PEPR project called MolecularXiv which aims at developing innovative and competitive solutions for massive DNA data storage.
Data storage on DNA molecules is seen as an emerging and promising technology that should offer much higher density and durability than conventional storage techniques (HDD, SSD, etc.). The disruptive idea behind this technology, which long appeared to be a distant hope, is to build synthetic DNA sequences that encode the information to be stored.
Recent major improvements in both DNA synthesis and sequencing techniques have made DNA data storage affordable, although these techniques still introduce a large number of errors into the read data sequences. While conventional storage systems are mostly concerned with substitution errors, DNA storage also introduces deletions and insertions, which standard Error-Correction (EC) solutions (LDPC, Turbo, Polar, etc.) cannot handle.
DNA sequencers output a large number of copies of the same input sequence, each with a different error realisation. Recently developed EC solutions for DNA storage efficiently exploit these multiple reads to correct substitution, insertion, and deletion errors. However, these solutions assume unrealistic independent and identically distributed (i.i.d.) error models, and may therefore perform poorly in practice.
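To make the i.i.d. assumption concrete, here is a minimal Python sketch of such a simplified channel, producing several noisy reads of one stored sequence. The function names, the error probabilities, and the order in which the three error types are drawn are illustrative assumptions, not the model of any specific published scheme.

```python
import random

BASES = "ACGT"

def ids_channel(seq, p_ins, p_del, p_sub, rng):
    """Pass `seq` through an i.i.d. insertion/deletion/substitution channel.

    Hypothetical toy model: errors are drawn independently at each position,
    with no dependence on neighbouring bases and no memory.
    """
    out = []
    for base in seq:
        # Insertion: emit one random extra base before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            # Deletion: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace the base with a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_reads(seq, n_reads, p_ins=0.02, p_del=0.02, p_sub=0.02, seed=0):
    """Return `n_reads` independent noisy copies of `seq` (assumed rates)."""
    rng = random.Random(seed)
    return [ids_channel(seq, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]
```

For example, `simulate_reads("ACGTACGTAC", 5)` returns five reads whose lengths may differ from the input, which is the reason codes designed for substitution errors alone do not apply directly.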
In previous work, we developed a channel model for DNA storage that accurately captures how errors depend on the successive bases of the read sequences, as well as the memory in the errors introduced by the sequencer. This model could be used as a first design step, before testing the developed EC solutions in costly in-vivo experiments.
Therefore, the three main challenges of the PhD will be as follows:
The PhD will be carried out within the PEPR (Projet Exploratoire de Recherche) MolecularXiv (see https://www.cnrs.fr/index.php/fr/cnrsinfo/stockage-de-donnees-du-data-center-la-capsule-adn). This French project involves many researchers working in various areas: biology, bioinformatics, signal processing, etc. Although the PhD will focus on coding and information theory, the candidate should expect some interaction with researchers working in the other fields.
The candidate should have earned an MSc degree, or equivalent, in one of the following fields: telecommunications, information theory, applied mathematics, signal processing.
To apply, please contact Elsa Dupraz (elsa.dupraz@imt-atlantique.fr) and Emmanuel Boutillon (emmanuel.boutillon@univ-ubs.fr), and attach the following:
(c) GdR 720 ISIS - CNRS - 2011-2022.