I am very happy to join the team of Prof. Raja Appuswamy, in the Data Science department, to work on image compression based on AI models, with an application to DNA storage also under consideration. This postdoc is funded by the French Government under the PEPR MoleculArXiv project.
Coding algorithms for long-term storage of digital images on synthetic DNA molecules
Abstract
The current digital world is facing a number of issues, some of them linked to the amount of data being stored. The storage technologies currently on offer cannot absorb the totality of the storage demand, so new data storage technologies have to be developed. DNA molecules are one of the candidates for novel data storage methods. The long lifespan of these molecules makes them a good fit for archiving data that is rarely accessed but needs to be kept for long periods of time. This data, often called “cold”, represents approximately 80% of the data in our digital universe. However, DNA encodes data with 4 symbols (A, C, G and T) rather than the usual binary code (0, 1). For this reason, storing data into DNA requires a specific encoding system capable of translating a binary data stream into a quaternary data stream. In this thesis we focus on new encoding methods from the Deep Learning state of the art, and we adapt those methods for the encoding, decoding, compression and decompression of images on synthetic DNA.
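To give a concrete picture of the binary-to-quaternary translation mentioned above, here is a minimal sketch that simply maps every pair of bits to one nucleotide. This naive mapping is for illustration only: the coders studied in the thesis use constrained codes instead, in order to avoid homopolymers and to keep the GC content balanced.

```python
# Naive binary-to-quaternary transcoding: 2 bits per nucleotide.
# Illustration only; practical DNA coders use constrained codes.
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}

def to_dna(bits: str) -> str:
    """Map a binary string (even length) to a nucleotide string."""
    assert len(bits) % 2 == 0, "pad the stream to an even number of bits"
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))

def to_bits(nucleotides: str) -> str:
    """Inverse mapping: nucleotide string back to a binary string."""
    return "".join(NT_TO_BITS[nt] for nt in nucleotides)

print(to_dna("0110110001"))  # -> "CGTAC"
```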
Jury
- Aline Roumy, Research Director, INRIA, Rennes
- Eitan Yaakobi, Research Director, Technion, Haifa
- Thomas Heinis, Associate Professor, Imperial College London
- Athanassios Skodras, Professor, University of Patras
- Raja Appuswamy, Assistant Professor, EURECOM, Sophia Antipolis
- Dominique Lavenier, Research Director, CNRS, IRISA, Rennes
Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (i.e. rarely accessed), has motivated research for alternative systems of data storage. Because of its biochemical characteristics, synthetic DNA molecules are now considered as serious candidates for this new kind of storage. This paper introduces a novel arithmetic coder for DNA data storage, and presents some results on a lossy JPEG 2000 based image compression method adapted for DNA data storage that uses this novel coder. The DNA coding algorithms presented here have been designed to efficiently compress images, encode them into a quaternary code, and finally store them into synthetic DNA molecules. This work also aims at making the compression models better fit the problematic that we encounter when storing data into DNA, namely the fact that the DNA writing, storing and reading methods are error prone processes. The main take away of this work is our arithmetic coder and it's integration into a performant image codec.
Multiple Description Coding (MDC) is an error-resilient source coding method designed for transmission over noisy channels. We present a novel MDC scheme employing a neural network based on implicit neural representation, which involves overfitting the neural representation to each image. Each description is transmitted along with model parameters and its respective latent spaces. Our method has advantages over traditional MDC schemes that utilize autoencoders, such as eliminating the need for model training and offering high flexibility in redundancy adjustment. Experiments demonstrate that our solution is competitive with autoencoder-based MDC and classic MDC based on HEVC, delivering superior visual quality.
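As a rough sketch of the kind of implicit neural representation involved, the snippet below overfits a small coordinate MLP with sine activations to a single image (PyTorch; the layer sizes and activation frequency are illustrative choices). The splitting into descriptions, the latent spaces and the transmission of the model parameters are not shown.

```python
# Overfitting a coordinate MLP f(x, y) -> RGB to one image (illustrative sketch).
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    def __init__(self, d_in, d_out, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

model = nn.Sequential(SirenLayer(2, 256), SirenLayer(256, 256), nn.Linear(256, 3))

def fit(image, steps=2000, lr=1e-4):
    """Overfit `model` so that it reproduces one image tensor of shape (H, W, 3)."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    target = image.reshape(-1, 3)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(coords) - target) ** 2).mean()  # pixel-wise MSE
        loss.backward()
        opt.step()
    return loss.item()
```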
Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (rarely accessed data), has motivated research for alternative systems of data storage. Because of its biochemical characteristics, synthetic DNA molecules are now considered as serious candidates for this new kind of storage. This paper presents some results on lossy image compression methods based on convolutional autoencoders adapted to DNA data storage, with synthetic DNA-adapted entropic and fixed-length codes. The model architectures presented here have been designed to efficiently compress images, encode them into a quaternary code, and finally store them into synthetic DNA molecules. This work also aims at making the compression models better fit the problematics that we encounter when storing data into DNA, namely the fact that the DNA writing, storing and reading methods are error prone processes. The main take aways of this kind of compressive autoencoder are our latent space quantization and the different DNA adapted entropy coders used to encode the quantized latent space, which are an improvement over the fixed length DNA adapted coders that were previously used.
The JPEG Committee has been exploring coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is to create a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on synthetic DNA polymers.
At the 100th JPEG meeting, “Additions to the JPEG DNA Common Test Conditions version 2.0” was produced, which supplements the “JPEG DNA Common Test Conditions” by specifying a new constraint to be taken into account when coding images in quaternary representation. In addition, the detailed procedures for evaluating the pre-registered responses to the JPEG DNA Call for Proposals were defined.
Furthermore, the next steps towards a deployed high-performance standard were discussed and defined. In particular, it was decided to request approval of the new work item once the Committee Draft stage has been reached.
The JPEG-DNA AHG has been re-established to work on the preparation of assessment and crosschecking of responses to the JPEG DNA Call for Proposals until the 101st JPEG meeting in October 2023.
The data explosion is one of the greatest challenges of the digital evolution. Storage demand is growing at a rate that the actual capacity of devices cannot match. According to forecasts, the digital universe is expected to exceed 180 zettabytes by 2025, while 80% of this data is rarely accessed ("cold" data), yet deserves to be archived for the long term as part of humanity's memory (photographs, films, computer code, scientific knowledge, etc.). At the same time, conventional storage devices have a lifespan limited to 10 or 20 years and must be replaced frequently to guarantee data reliability, a process that is costly in both money and energy. Recent studies have shown that, because of its biological properties, DNA is a very promising candidate for the long-term archiving of "cold" digital data over centuries. Storing data in the form of DNA molecules requires encoding the information into a quaternary stream composed of the symbols A, C, T and G (the well-known nucleotides), while respecting strict constraints linked to the associated biochemical processes. Moreover, this storage medium introduces unconventional errors, insertions and deletions, that classical error-correction methods cannot handle. Pioneering works have already proposed various algorithms for encoding and protecting data stored in DNA, yet many challenges remain.
The aim of this day is to review the technological advances and the major challenges in the field of molecular storage, highlighting issues related to signal and image processing as well as the theory of error-correcting codes and joint source/channel coding. The day will begin with two introductory tutorials on the subject, followed by more technical talks on the topics above.
Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (i.e. rarely accessed), has motivated research for alternative systems of data storage. Because of its biochemical characteristics, synthetic DNA molecules are considered as potential candidates for a new storage paradigm. Because of this trend, several coding solutions have been proposed over the past years for the storage of digital information into DNA. Despite being a promising solution, DNA storage faces two major obstacles: the large cost of synthesis and the noise introduced during sequencing. Additionally, this noise increases when biochemically defined coding constraints are not respected: avoiding homopolymers and patterns, as well as balancing the GC content. This paper describes a novel entropy coder which can be embedded to any block-based image-coding schema and aims to robustify the decoded results. Our proposed solution introduces variability in the generated quaternary streams, reduces the amount of homopolymers and repeated patterns to reduce the probability of errors occurring. In this paper, we integrate the proposed entropy coder into four existing JPEG-inspired DNA coders. We then evaluate the quality-in terms of biochemical constraints-of the encoded data for all the different methods.
The exponentially increasing demand for data storage has been facing more and more challenges during the past years. The energy cost it represents is also increasing, and the availability of storage hardware cannot keep up with the demand. The short lifespan of conventional storage media (10 to 20 years) forces hardware duplication and worsens the situation. The majority of this storage demand concerns "cold" data, data that is very rarely accessed but has to be kept for long periods of time. The coding abilities of synthetic DNA and its long durability (several hundred years) make it a serious candidate as an alternative storage medium for "cold" data. In this paper, we propose a variable-length coding algorithm adapted to DNA data storage with improved performance. The proposed algorithm is based on a modified Shannon-Fano code that respects some biochemical constraints imposed by the synthesis chemistry. We have inserted this code into a JPEG compression algorithm adapted to DNA image storage and we highlight an improvement of the compression ratio ranging from 0.5 up to 2 bits per nucleotide compared to the state-of-the-art solution, without affecting the reconstruction quality.
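For reference, the sketch below constructs a classic binary Shannon-Fano code by recursively splitting the symbol set into two halves of nearly equal probability. The coder proposed in the paper adapts this idea to the quaternary alphabet while respecting the synthesis constraints; that adaptation and the nucleotide mapping are not shown here.

```python
# Classic Shannon-Fano construction (binary codewords), shown only to
# illustrate the principle behind the DNA-adapted variant.
def shannon_fano(symbols):
    """symbols: list of (symbol, probability); returns {symbol: codeword}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        acc, best_cut, best_diff = 0.0, 1, float("inf")
        for i, (_, p) in enumerate(group[:-1], start=1):
            acc += p
            diff = abs(total - 2 * acc)  # |right half - left half|
            if diff < best_diff:
                best_cut, best_diff = i, diff
        for s, _ in group[:best_cut]:
            codes[s] += "0"
        for s, _ in group[best_cut:]:
            codes[s] += "1"
        split(group[:best_cut])
        split(group[best_cut:])

    split(symbols)
    return codes

print(shannon_fano([("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]))
# -> {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```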