Hi, I’m an undergraduate student at Case Western Reserve University with a concentration in Natural Language
Processing (NLP) research.
I previously interned in AI research at Twitter AI, Samsung Research America, and HuggingFace’s BigScience.
I’m also an active member of VietAI, supervised by Minh-Thang Luong, where we focus on
bringing state-of-the-art research to the community.
Manuscripts & Publications [Google Scholar]
- 🌸 Enriching Biomedical Knowledge for Low-resource
Language Through Large-Scale Translation
EACL 2023
{
Long Phan,
Tai Dang,
Hieu Tran,
TH Trieu },
Vy Phan,
Lam Chau,
Minh-Thang Luong
[ Code ]
- 🌸 MTet: Multi-domain Translation for English and Vietnamese
{
Chinh Ngo,
TH Trieu,
Long Phan,
Hieu Tran },
Hieu Nguyen,
Minh-Thang Luong
[ Code | Blog ]
- 🤗 BLOOM: A 176B-Parameter Open-Access Multilingual Language
Model
In review @ Journal of Machine Learning Research
Teven Le Scao,
Angela Fan,
Christopher Akiki,
... ,
Long Phan, ... ,
Thomas Wolf
[ Code ]
- 🤗 The BigScience ROOTS Corpus: A 1.6TB Composite
Multilingual Dataset
NeurIPS 2022
Hugo Laurençon,
Lucile Saulnier,
Thomas Wang,
... ,
Long Phan, ... ,
Yacine Jernite
[ Code ]
- ViT5: Pretrained Text-to-Text Transformer for
Vietnamese Language Generation
NAACL SRW 2022
Long Phan,
Tai Dang,
Hieu Tran,
Hieu Nguyen,
TH Trieu
[ Code | Blog ]
- CoTexT: Multi-task Learning with Code-Text
Transformer
ACL NLP4Prog 2021
Long Phan,
Hieu Tran,
Daniel Le,
Hieu Nguyen,
James Anibal,
Alec Peltekian,
Yanfang Ye
[ Code ]
- HAL-X: Scalable hierarchical clustering for rapid and tunable
single-cell analysis
PLOS Computational Biology
James Anibal,
Alexandre G. Day,
Erol Bahadroglu,
Liam O’Neil,
Long Phan,
Alec Peltekian,
Amir Erez,
Mariana Kaplan,
Grégoire Altan-Bonnet,
Pankaj Mehta
- SciFive: a text-to-text transformer model for biomedical
literature
arXiv preprint
Long Phan,
James Anibal,
Hieu Tran,
Shaurya Chanana,
Erol Bahadroglu,
Alec Peltekian,
Grégoire Altan-Bonnet
[ Code ]
- SPBERT: An Efficient Pre-training BERT on SPARQL Queries for
Question Answering over Knowledge Graphs
ICONIP 2021
Hieu Tran,
Long Phan,
James Anibal,
Binh T. Nguyen,
Truong-Son Nguyen
[ Code ]
- Hierarchical Transformer Encoders for Vietnamese Spelling
Correction
IEA/AIE 2021
Hieu Tran,
Cuong Dinh,
Long Phan,
Truong-Son Nguyen
Service
- Reviewer: Scientific Data - Nature; TU @ COLING 2022