My research primarily involves leveraging crowdsourced data from social media, mobile phones, and online review platforms, combined with artificial intelligence (AI) techniques to address socio-technical challenges in (1) Urban Informatics and (2) Health Informatics. My research portfolio has expanded to explore (3) Large Language Models (LLMs) and their responsible usage.
Across these areas, I have accumulated 15 (co-)first-authored journal and conference papers and 18 co-authored published works, with 20 works under review or revision and more than 20 ongoing projects. My research on using social media data for disasters and public health crises was featured in the "Pending Disaster" story in the "Engineering at Maryland" magazine published by the University of Maryland. In recognition of my work on geoinformatics, my study was selected as the plenary presentation at the 2023 Geo-Risk conference.
My publications have appeared in venues including:
Computer and Information Science: ACL, EMNLP, AAAI ICWSM, ACM Transactions on the Web, ACM Transactions on Intelligent Systems and Technology, International Journal of Information Management, Computers & Education
Civil and Environmental Engineering: ASCE Journal of Management in Engineering, ASCE Natural Hazards Review, Sustainable Cities and Society, International Journal of Disaster Risk Reduction
Health and Medical Informatics: Journal of Biomedical Informatics (JBI), Journal of Medical Internet Research (JMIR)
My first research area focuses on addressing socio-technical challenges in smart cities to foster communities that are Resilient, Equitable, and Smart. To achieve this, my studies have examined residents' attitudes, behaviors, and interactions with urban systems; understanding these dynamics offers valuable insights for promoting sustainable land use and built environments. In particular, my research has leveraged "crowdsourced data" from social media, mobile phones, and online review platforms to explore the potential of crowdsourcing to enhance urban planning and sustainable development. Compared to conventional surveys, crowdsourcing through these platforms offers several advantages: broad geospatial coverage, large-scale data availability, cost-effectiveness, and, most importantly, support for a "human-centered" data science approach that accounts for the human elements in urban environments.
Figure. Distribution of parking sentiment across CBSAs in the USA.
Crowdsourced reviews reveal substantial disparities in public perceptions of parking [2024]
L Li, S Hu, L Dinh, L Hemphill
arXiv preprint arXiv:2407.05104
The study analyzes parking perceptions using 5 million Google Maps reviews across 911 U.S. metropolitan areas. Using BERT sentiment analysis and regression modeling, we found significant variation in parking sentiment across locations and business types, with restaurants receiving the most negative feedback. Dense urban areas with higher minority populations and lower socioeconomic status showed more negative parking sentiment. Counter-intuitively, increased parking supply did not improve satisfaction. Text analysis revealed distinct urban-rural differences in parking concerns.
Figure. Time series of road traffic volume change and speed change.
Multi-crowdsourced data fusion for modeling link-level traffic resilience to adverse weather events [2024]
S Hu, K Wang, L Li, Y Zhao, Z He, Y Zhang
International Journal of Disaster Risk Reduction, 104754
This study analyzes road traffic resilience during adverse weather using crowdsourced data from mobile devices and WAZE reports. We examined floods, winter storms, and fog impacts in Dallas-Fort Worth (2022) using metrics like speed change, event duration, and area under the curve. Winter storms showed the strongest impact, followed by floods and fog. Higher-class roads with greater volume experienced more significant changes. Road geometry influenced flood resilience but not winter storm impacts. The findings help guide disaster preparedness in transportation systems.
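The resilience metrics mentioned above (speed change, event duration, and area under the curve) can be sketched for a single road link. This is a hypothetical illustration, not the paper's implementation; the function name, the relative-speed-drop definition, and the trapezoidal integration are my assumptions.

```python
# Hypothetical sketch: quantifying link-level traffic resilience from a speed
# time series during an adverse weather event.

def resilience_metrics(times, speeds, baseline):
    """Compute simple resilience metrics for one road link.

    times    -- observation timestamps (e.g., hours)
    speeds   -- observed speeds (same units as baseline)
    baseline -- normal-condition speed for this link
    """
    # Relative speed drop at each observation (0 = normal, 1 = full stop).
    drops = [max(0.0, (baseline - s) / baseline) for s in speeds]
    pairs = list(zip(zip(times, drops), zip(times[1:], drops[1:])))

    max_drop = max(drops)
    # Event duration: total time spanned by degraded observations.
    duration = sum(t2 - t1 for (t1, d1), (t2, d2) in pairs if d1 > 0 or d2 > 0)
    # Area under the degradation curve via the trapezoidal rule; larger area
    # means a deeper and/or longer disruption (lower resilience).
    auc = sum((t2 - t1) * (d1 + d2) / 2 for (t1, d1), (t2, d2) in pairs)
    return {"max_drop": max_drop, "duration": duration, "auc": auc}
```

For example, a link whose speed falls from 60 to 15 and recovers over four hours yields a maximum drop of 0.75 and an AUC that summarizes the whole disruption in one number.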
Figure. SIR modeling of topic discussion on social media platforms.
Z Ma, L Li, L Hemphill, GB Baecher, Y Yuan
Sustainable Cities and Society 106, 105362
The study analyzed Twitter data during the 2020 western U.S. wildfire season using BERT topic modeling and Susceptible-Infected-Recovered (SIR) theory to understand disaster response patterns. Three main topics emerged: health impact, damage, and evacuation. Temporal-spatial analysis revealed correlations between topic diffusion and wildfire spread patterns. The SIR model parameters showed high levels of concern among residents in affected cities. This research demonstrates how social media analysis can provide quantitative insights for disaster response.
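The SIR framing above treats users as Susceptible (not yet discussing a topic), Infected (actively discussing it), and Recovered (no longer discussing it). A minimal forward simulation of the standard SIR equations, assuming Euler integration and illustrative parameter values (not the paper's fitted values), looks like this:

```python
# Hypothetical sketch: forward simulation of the SIR model applied to topic
# diffusion on social media. Parameters and integration scheme are assumed.

def simulate_sir(beta, gamma, s0, i0, r0, steps, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I,
    dR/dt = gamma*I. Populations are fractions, so s0 + i0 + r0 should be 1."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # susceptible users start discussing
        recoveries = gamma * i * dt          # discussing users move on
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        history.append((s, i, r))
    return history

# With beta/gamma > 1 (basic reproduction number R0 > 1), discussion volume
# rises sharply and then decays, matching the burst-and-fade pattern of
# disaster topics on social media.
trajectory = simulate_sir(beta=0.6, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=500)
```

Fitting beta and gamma per topic and city is what lets the paper compare levels of concern across affected areas.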
Figure. The epicenter (star) and ShakeMap for the investigated events.
Exploring the potential of social media crowdsourcing for post-earthquake damage assessment [2023]
L Li, M Bensi, G Baecher
International Journal of Disaster Risk Reduction 98, 104062
This study investigates the use of social media, particularly Twitter posts, for sudden-onset disaster damage assessment. We apply natural language processing and machine learning to classify damage levels from tweets across six earthquake sequences, examine temporal patterns, and compare findings with USGS and media reports to evaluate crowdsourcing's effectiveness. The results reveal insights about timing and spatial coverage, and show how social data can complement traditional damage assessment methods for decision-making support.
Figure. Evacuation map on different dates derived from social media.
L Li, Z Ma, T Cao
Fire Safety Journal 126, 103480
This paper presents a data-driven study of social media-aided evacuations for the 2020 wildfires in the western United States. The research validates social media data against official sources through temporal and spatial analysis. Network analysis reveals government channels, news agencies, and public figures as key information disseminators, with on-evacuation communications being more locally focused than pre-evacuation messages. The findings demonstrate social media's value in supporting evacuation efforts and provide guidance for extracting critical disaster relief information.
My second research area focuses on two key aspects of health informatics: (1) engaging with public opinions and experiences, both in response to therapeutic interventions and on public health policies, to enhance health communication and support decision-making; and (2) empowering individuals with AI-driven techniques that facilitate health information-seeking. In both areas, my research adopts advanced NLP techniques, including LLMs, whose generative capabilities are promising for interpreting textual data in health informatics. I have leveraged such tools to harness health-related opinion and outcome data from social media posts and electronic health records, and to develop conversational agents that make health information-seeking more efficient.
Figure. An illustration of the research roadmap of LLMs in EHR.
L Li, J Zhou, Z Gao, W Hua, L Fan, H Yu, L Hagen, Y Zhang, TL Assimes, L Hemphill, S Ma; arXiv preprint arXiv:2405.03066
This scoping review analyzed 329 papers from OpenAlex examining LLM applications in EHR research. The bibliometric analysis revealed trends in paper publication, model applications, and collaboration networks. Seven key topics were identified: named entity recognition, information extraction, text similarity, summarization, classification, dialogue systems, and diagnosis/prediction. The study highlighted LLMs' capabilities and addressed implications for data resources, prompt engineering, fine-tuning, performance measurement, and ethics in healthcare applications.
Figure. Social network of Twitter users based on mentions during Mpox.
L Fan, L Li, L Hemphill
Journal of Medical Internet Research
This study analyzed 1.6 million tweets about the 2022 Mpox outbreak to examine toxic online discourse. Using BERT topic modeling and network analysis, we identified five main topics: disease (46.6%), homophobia (23.9%), health policy (19.3%), politics (6%), and racism (4.1%). The analysis revealed widespread retweeting of toxic content, with influential users rarely countering such toxicity. The findings highlight how tracking topic dynamics and network patterns can inform crisis communication strategies and policy decisions for managing online toxicity during public health emergencies.
Figure. An illustration of Reasoning RAG agentic workflow in the AIPatient system.
AIPatient: Simulating patients with EHRs and LLM powered agentic workflow [2024]
H Yu, J Zhou, L Li, S Chen, J Gallifant, A Shi, X Li, W Hua, M Jin, G Chen, Y Zhou, Z Li, T Gupte, M Chen, Z Azizi, Y Zhang, TL Assimes, X Ma, D Bitterman, L Lu, L Fan
arXiv preprint arXiv:2409.18924
This study proposed AIPatient, a novel patient simulation system using a knowledge graph built from MIMIC-III health records, covering 1,495 patients with high validity (F1 0.89). The system employs Reasoning RAG with six LLM-powered agents for tasks such as retrieval, querying, and summarization. It achieves 94.15% accuracy in medical QA, outperforming other benchmarks, while maintaining high readability, robustness, and stability. The system shows promise for medical education, model evaluation, and integration applications.
Figure. Accuracy changes between prompts for drug-disease associations.
Z Gao, L Li, S Ma, Q Wang, L Hemphill, R Xu
Annals of Biomedical Engineering, 1-9
This study explores the potential of ChatGPT in discerning drug-disease associations. We collected 2694 true drug-disease associations and 5662 false drug-disease pairs, and created various prompts to instruct ChatGPT in identifying these associations. ChatGPT identified drug-disease associations with an accuracy of 74.6–83.5% for the true pairs and 96.2–97.6% for the false pairs. This study shows that ChatGPT has potential for identifying drug-disease associations and may serve as a helpful tool for searching pharmacy-related information.
Figure. Detrended state-level VAI derived from social media.
Dynamic assessment of the COVID-19 vaccine acceptance leveraging social media data [2022]
L Li, J Zhou, Z Ma, MT Bensi, MA Hall, GB Baecher
Journal of Biomedical Informatics 129, 104054
This study analyzed 29 million vaccine-related tweets between August 2020 and April 2021 to develop a vaccine acceptance index (VAI) for COVID-19 vaccination opinions. Using NLP, the VAI quantifies sentiment across geographic regions by measuring positive versus negative tweets. The national VAI shifted from negative to positive in 2020 and stabilized after January 2021. County-level predictions showed consistency for those areas with 30+ users. The study demonstrates social media's potential for rapid, cost-effective assessment of vaccine acceptance compared to traditional surveys.
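The VAI described above measures positive versus negative vaccine-related tweets in a region. The exact formula belongs to the paper; a plausible sketch, assuming a simple normalized difference of tweet counts, is:

```python
# Hypothetical sketch of a vaccine acceptance index (VAI). The normalized-
# difference form below is an assumption, not the paper's exact definition.

def vaccine_acceptance_index(n_positive, n_negative):
    """Return a score in [-1, 1]: -1 means all negative tweets in the region,
    +1 means all positive, 0 means evenly split (or no data)."""
    total = n_positive + n_negative
    if total == 0:
        return 0.0  # no opinionated tweets observed for this region/period
    return (n_positive - n_negative) / total
```

Computing such an index per county and week is what enables the temporal and geographic comparisons the study reports; the 30+ users threshold guards against noisy estimates from sparsely sampled counties.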
My research on LLMs primarily investigates their reasoning abilities and trustworthiness for interactive engagement. This includes exploring LLMs' reasoning capabilities on computational social science and computational scientific tasks, particularly evaluating their mathematical and logical reasoning across diverse contexts, and exploring the potential of multi-agent frameworks, in which workflows are designed through interactions across several LLM-assisted agents to accomplish complex tasks. My research also studies LLMs' impacts on multiple sectors, including education, healthcare, and scientific communities; one work in particular has investigated the impact of LLMs on scientific communities from a science-of-science perspective.
Figure. Collaboration diversity over time based on authors' institution information.
L Li, L Dinh, S Hu, L Hemphill
arXiv preprint arXiv:2408.04163
This study analyzed 50,391 papers from OpenAlex to examine scientific collaboration in LLM research. We found increased interdisciplinary collaboration in LLM research after ChatGPT's release. Computer Science shows consistent growth in collaboration diversity, while fields like Social Science and Psychology demonstrate varied increases by 2024. Health-related fields have significantly higher collaboration entropy. Network analysis identifies Stanford, Harvard, UCL, and Google as key institutional players, with Computer Science and Medicine departments leading research connections.
Figure. An illustration of the agent workflow for deal/no-deal game design.
Game-theoretic LLM: Agent workflow for negotiation games [2024]
W Hua, O Liu, L Li, A Amayuelas, J Chen, L Jiang, M Jin, L Fan, F Sun, W Wang, X Wang, Y Zhang; arXiv preprint arXiv:2411.05990
This study examines LLMs' rationality in game theory contexts, finding they often deviate from optimal strategies as game complexity increases. We developed specialized workflows to improve LLMs' ability to compute Nash Equilibria and make rational decisions under uncertainty. These workflows significantly enhanced LLMs' performance in identifying optimal strategies and reduced exploitation vulnerability in negotiations. The study also explores whether using such workflows is itself a rational choice, contributing insights for developing more strategically capable AI agents.
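The Nash Equilibria mentioned above are the fixed points the agent workflow steers LLMs toward. For intuition, a minimal sketch of finding pure-strategy Nash equilibria in a two-player normal-form game by brute-force best-response checks (an illustration of the game-theoretic target, not the paper's workflow):

```python
# Hypothetical sketch: brute-force pure-strategy Nash equilibria for a
# two-player normal-form game.

def pure_nash_equilibria(payoffs):
    """payoffs[i][j] = (row player's payoff, column player's payoff)
    when the row player plays strategy i and the column player plays j."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Neither player can gain by unilaterally deviating.
            row_best = all(payoffs[i][j][0] >= payoffs[k][j][0] for k in range(n_rows))
            col_best = all(payoffs[i][j][1] >= payoffs[i][k][1] for k in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Prisoner's dilemma: strategy 0 = cooperate, 1 = defect.
# Mutual defection (1, 1) is the unique pure-strategy equilibrium, even though
# mutual cooperation pays both players more.
prisoners_dilemma = [[(3, 3), (0, 5)],
                     [(5, 0), (1, 1)]]
```

An LLM agent that deviates from such equilibria as the payoff matrix grows is exactly the failure mode the workflows in this paper are designed to correct.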
Figure. Model's performance on different complexity levels.
NPHardEval: Dynamic benchmark on reasoning ability of large language models via complexity classes [2024]
L Fan, W Hua, L Li, H Ling, Y Zhang, L Hemphill
ACL 2024
This study introduced a new benchmark called NPHardEval for evaluating LLM reasoning capabilities through 900 algorithmic questions up to NP-Hard complexity. Unlike existing benchmarks that risk overfitting due to static, public datasets, NPHardEval implements monthly data updates to ensure more accurate assessment. The questions span multiple complexity classes below NP-hard, providing comprehensive measurement of LLM reasoning abilities. This benchmark enables objective comparison of LLM performance across complexity classes while maintaining evaluation integrity.
Figure. Semantic network analysis of global English news.
L Xian, L Li, Y Xu, BZ Zhang, L Hemphill
AAAI ICWSM 2024
This study analyzed 24,827 English news articles (2018-2023) examining global media coverage of generative AI using BERTopic modeling and RoBERTa sentiment analysis. Our coverage analysis identified four key topics: business, corporate tech development, regulation/security, and education, with spikes during major AI developments. Business articles showed positive sentiment, while regulation/security coverage was more neutral-negative. This study provides insights into news discourse patterns around emerging technologies.
Figure. Classification performance as compared to MTurker annotations.
L Li, L Fan, S Atreja, L Hemphill
ACM Transactions on the Web
This study evaluated ChatGPT's ability to detect Hateful, Offensive, and Toxic (HOT) content on social media compared to MTurk annotations. Using five prompts across four experiments, we found that ChatGPT achieved 80% accuracy, showing better consistency in identifying non-HOT content. The model interpreted "hateful" and "offensive" as subsets of "toxic" content. Prompt design influenced performance. Our findings suggest ChatGPT could help moderate social media content while reducing human exposure to harmful material, though performance varies based on prompt configuration.