MPRG: Machine Perception & Robotics Group / Chubu University

01 Jun 2022 Journal Paper (English)

Detecting Double Layer Sign (DLS) with OCT using Multi-Region Segmentation Visual Transformers (ViT)

Authors: Yuka Kihara, Yingying Shi, Mengxi Shen, Liang Wang, Rita Laiginhas, Xiaoshuang Jiang, Jeremy Liu, Rosalyn Morin, Giovanni Gregori, Philip J Rosenfeld, Hironobu Fujiyoshi, Aaron Y Lee
Publication: Investigative Ophthalmology & Visual Science, Vol. 63, 470 – A0007, 2022


Purpose: To segment the double layer sign (DLS), an important feature of type 1 macular neovascularization (MNV) in age-related macular degeneration, we applied a model based on the Vision Transformer (ViT), an architecture that now achieves state-of-the-art results in many computer vision tasks. The ViT is a convolution-free transformer architecture that uses self-attention to capture global interactions between elements of a scene, which lets it make better use of long-range dependencies.
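The "global interactions" mentioned above come from self-attention: every patch token is compared with every other token, so a patch at one edge of a B-scan can influence a patch at the opposite edge within a single layer, whereas a convolution only mixes a local neighborhood. The following is a minimal single-head scaled dot-product attention sketch in pure Python, for illustration only. The function names and toy inputs are hypothetical, and the learned query, key, and value projections of a real ViT layer are deliberately omitted (the tokens are used directly as queries, keys, and values).

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention (simplified sketch).

    Hypothetical simplification: tokens act directly as queries, keys and
    values (no learned W_q, W_k, W_v projections as in a real ViT layer).
    Each output token is a weighted average over *all* input tokens.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this query to every token in the sequence (global, not local).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        out = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        outputs.append(out)
    return outputs


# Toy sequence of 4 "patch embeddings" of dimension 2.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
mixed = self_attention(patches)
print(len(mixed), len(mixed[0]))  # 4 2
```

Changing only the last token changes the output of the first one, which is the long-range coupling a 3 × 3 convolution would not provide in one layer.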
Methods: Eyes were imaged with swept-source OCT angiography (SS-OCTA; PLEX Elite 9000, Carl Zeiss Meditec, Dublin, CA) using 6 × 6 mm scans. Each scan consisted of 500 A-scans per B-scan, with each B-scan repeated twice at each of 500 B-scan positions along the y-axis. The SS-OCTA structural B-scans were manually annotated for the presence of DLS and drusen (Dr), and these annotations were used for training. We built a multi-region segmentation ViT that labels both DLS and Dr on a single B-scan image. To extend the ViT from image classification to semantic segmentation, we used the output embeddings corresponding to the image patches and obtained class labels from them with a pointwise linear decoder. For comparison, a convolutional neural network (CNN) model was trained on the same dataset.
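A pointwise linear decoder applies the same linear map to every patch embedding independently, turning each D-dimensional embedding into K class scores; the per-patch scores are then laid back onto the patch grid and upsampled to pixel resolution. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the integer label encoding (0 = background, 1 = DLS, 2 = drusen), and the toy sizes are hypothetical, and nearest-neighbor upsampling is used for brevity where bilinear upsampling of the logits is a common alternative.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical label encoding: 0 = background, 1 = DLS, 2 = drusen (Dr).


def linear_decode(embeddings: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Apply one shared linear map (D -> K) to every patch embedding independently."""
    return [
        [sum(e_d * weight[d][k] for d, e_d in enumerate(emb)) + bias[k]
         for k in range(len(bias))]
        for emb in embeddings
    ]


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def patch_logits_to_mask(logits: Matrix, grid_h: int, grid_w: int,
                         patch: int) -> List[List[int]]:
    """Reshape per-patch logits (row-major) onto the patch grid and upsample.

    Nearest-neighbour upsampling: every pixel inherits its patch's label.
    """
    labels = [argmax(row) for row in logits]  # one class per patch
    return [
        [labels[(y // patch) * grid_w + (x // patch)] for x in range(grid_w * patch)]
        for y in range(grid_h * patch)
    ]


# Toy example: 2x2 grid of patches, 2-dim embeddings, 3 classes, 2x2-pixel patches.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
weight = [[0.0, 2.0, 0.0],   # row d: contribution of embedding dim d to each class
          [0.0, 0.0, 2.0]]
bias = [0.5, 0.0, 0.0]
mask = patch_logits_to_mask(linear_decode(embeddings, weight, bias), 2, 2, 2)
for row in mask:
    print(row)
```

Because the decoder has no spatial context of its own, all spatial reasoning must already be encoded in the transformer's patch embeddings; this is what keeps the model "purely transformer-based".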
Results: A total of 251 eyes from 211 patients were included: 188 eyes with DLS and 63 eyes with drusen only (Dr) as controls. Our ViT model had 12 layers, 768-dimensional token embeddings, and 12 attention heads. The mean intersection over union (IoU) between predicted and annotated masks was 59.7% for DLS and 62.4% for Dr with the ViT model, compared with 44.9% and 52.8%, respectively, with the CNN model. The transformer-based model significantly outperformed the CNN-based model.
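The IoU for one class is |P ∩ A| / |P ∪ A|, where P and A are the sets of pixels labeled with that class in the predicted and annotated masks. The abstract does not state how the mean is taken, so the sketch below assumes one common convention: per-image IoU for each class, averaged over images, skipping images in which the class appears in neither mask. Other conventions, such as pooling pixels over the whole test set, give different numbers. The function names and the integer label encoding are hypothetical.

```python
from typing import List, Optional

Mask = List[List[int]]


def class_iou(pred: Mask, target: Mask, cls: int) -> Optional[float]:
    """Intersection over union for one class label between two label masks.

    Returns None when the class is absent from both masks (IoU undefined).
    """
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            in_p, in_t = p == cls, t == cls
            inter += int(in_p and in_t)
            union += int(in_p or in_t)
    return None if union == 0 else inter / union


def mean_iou(preds: List[Mask], targets: List[Mask], cls: int) -> float:
    """Average per-image IoU of one class, skipping images where it is undefined."""
    scores = [s for s in (class_iou(p, t, cls) for p, t in zip(preds, targets))
              if s is not None]
    return sum(scores) / len(scores) if scores else float("nan")


# Toy 2-image example; labels 0 = background, 1 = DLS, 2 = drusen (hypothetical encoding).
preds = [[[1, 1, 0], [0, 2, 2]], [[1, 1, 1]]]
targets = [[[1, 0, 0], [0, 2, 2]], [[1, 1, 0]]]
print(round(mean_iou(preds, targets, cls=1), 4))  # 0.5833
print(mean_iou(preds, targets, cls=2))            # 1.0
```

Multiplying the returned value by 100 gives the percentage form used in the results above.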
Conclusions: We present a purely transformer-based network that detects DLS from structural B-scans alone and have applied it to a dataset with coarse annotations. To our knowledge, this is the first application of ViT-based segmentation in ophthalmic imaging. This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.