UW · Lab for Computing Cultural Heritage
Graduate Student Researcher
- Examined clusters from over 1.5 million Library of Congress archival newspaper images to identify large-scale patterns of visual reuse and circulation in early 20th-century print media.
- Applied CLIP embeddings with DBSCAN clustering to generate groupings of visually similar images, then computed cluster statistics to support systematic interpretation.
- Built robust embedding-image lookup pipelines linking CLIP vectors, image IDs, filenames, and metadata, enabling reliable retrieval and scalable downstream analysis.
- Performed zero-shot concept labeling on clusters by embedding candidate semantic categories and assigning top concepts using cosine similarity.
- Quantified image circulation and reuse dynamics by measuring cluster frequency distributions, cross-newspaper diversity, and temporal spread.
UW · Language Accessibility Research Lab
Research Collaborator
- Contributed to NeuroAdapt, a framework for personalizing content across neurodivergent profiles, by developing core segmentation, alignment, and post-processing pipelines that compare original text with its plain-language counterparts.
- Built a semantic alignment pipeline using LASER-3 embeddings and VecAlign to support complex alignment types and capture sentence splits, merges, expansions, and omissions.
- Conducted quantitative evaluation using linguistic and readability metrics, complemented by qualitative analysis, to assess lexical substitutions, structural changes, semantic restorations, and tone and style adjustments.
- Reviewed related work on adaptive reading systems, multi-mode reading interfaces, and accessibility-focused evaluation to identify transferable design and methodological insights for an adaptive text system in development.
MobiDrop (Zhejiang) Co., Ltd
Bioinformatics Research Assistant
- Pretrained scGPT, a Transformer-based model for gene expression prediction, on single-cell RNA sequencing data.
- Curated and preprocessed a dataset of 300,000 human blood cells from the CELLxGENE repository, preserving organism-level structure while performing normalization, binning, and tokenization of highly variable genes.
- Modified the Transformer encoder architecture to jointly embed gene identities and binned expression values.
- Scaled pretraining experiments with GPU-accelerated training and wrote Bash scripts to automate job submission, checkpointing, and experiment tracking.