Research
Yueran (Hannah) Sun

UW · Lab for Computing Cultural Heritage

Graduate Student Researcher
Nov 2025 – Present
  • Examined clusters from over 1.5 million Library of Congress archival newspaper images to identify large-scale patterns of visual reuse and circulation in early 20th-century print media.
  • Applied CLIP embeddings with DBSCAN clustering to generate groupings of visually similar images, then computed cluster statistics to support systematic interpretation.
  • Built robust embedding-image lookup pipelines linking CLIP vectors, image IDs, filenames, and metadata, enabling reliable retrieval and scalable downstream analysis.
  • Performed zero-shot concept labeling on clusters by embedding candidate semantic categories and assigning top concepts using cosine similarity.
  • Quantified image circulation and reuse dynamics by measuring cluster frequency distributions, cross-newspaper diversity, and temporal spread.

UW · Language Accessibility Research Lab

Research Collaborator
Sep 2025 – Present
  • Contributed to NeuroAdapt, a framework for personalizing content across neurodivergent profiles, by developing core segmentation, alignment, and post-processing pipelines that compare original texts with their plain-language counterparts.
  • Built a semantic alignment pipeline using LASER-3 embeddings and VecAlign to support complex alignment types and capture sentence splitting, merging, expansions, and omissions.
  • Conducted quantitative evaluation using linguistic and readability metrics, complemented by qualitative analysis, to assess lexical substitutions, structural changes, semantic restorations, and tone and style adjustments.
  • Reviewed related work on adaptive reading systems, multi-mode reading interfaces, and accessibility-focused evaluation to identify transferable design and methodological insights for an adaptive text system in development.

MobiDrop (Zhejiang) Co., Ltd

Bioinformatics Research Assistant
Nov 2023 – May 2024 · Remote (Shanghai, China)
  • Pretrained scGPT, a Transformer-based model for gene expression prediction, on single-cell RNA sequencing data.
  • Curated and preprocessed a dataset of 300,000 human blood cells from the CELLxGENE repository, preserving organism-level structure while performing normalization, binning, and tokenization of highly variable genes.
  • Modified the Transformer encoder architecture to jointly embed gene identities and binned expression values.
  • Scaled pretraining experiments with GPU-accelerated training and wrote Bash scripts to automate job submission, checkpointing, and experiment tracking.