As real-world data (RWD) evolves, the rise of unstructured data—from clinical notes to lab reports—is transforming how life sciences teams identify, understand, and engage with patients and providers. These data sources offer rich, contextual insights into patient experiences, disease progression, and physician decision-making that traditional structured datasets often miss.
At a recent industry event, I sat down with Joanne Tsai, Director and Team Lead for Oncology at Pfizer; Lance Wolkenbrod, Senior Principal, RWD Solutions; and Madeline Naylor, Chief Clinician, RWD at Norstella, to discuss how unstructured data is helping uncover hidden patient populations, enhance physician targeting, and bridge the gap between commercial strategy and clinical reality.
Q: How has the use of unstructured data evolved in the past few years, and why is it so important for understanding patient populations?
Lance Wolkenbrod: For a long time, we relied solely on structured data—ICD-10 codes, CPT codes, drug treatment codes—to understand patients. That information was useful for tracking reimbursement, but it didn’t fully reflect what was happening in the clinic. As electronic medical records became more robust, we gained access to unstructured clinical notes—what physicians actually write about their patients. These notes reveal the physician’s thought process, patient symptoms, and treatment rationale. Mining that kind of data used to be incredibly difficult, but with advances in AI and natural language processing (NLP), we can now extract meaningful insights that simply weren’t accessible before.
Q: Joanne, how is Pfizer using unstructured data today to uncover patient populations and support commercial efforts?
Joanne Tsai: In my role, I sit within the commercial group, working closely with brand leads and sales leadership. We don’t access the raw unstructured notes directly, but we rely on Norstella’s analytics teams to derive key insights from them, such as biomarker status. For example, identifying whether a patient is BRAF-positive or ALK-positive helps us refine our market share estimates and performance tracking, since this information doesn’t exist in claims data.
We also use unstructured data to generate lab alerts for our field teams. These alerts help identify physicians who are treating patients who’ve tested positive for specific biomarkers—so our reps can engage at the right moment. For rare or low-incidence conditions, this kind of signal is incredibly valuable. In some of our oncology brands, only 3–5% of patients meet certain criteria, so finding those patients through traditional data alone is nearly impossible. Field teams see these alerts as actionable leads that improve engagement with physicians and ultimately help patients access the right therapies sooner.
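To make the idea of biomarker-driven alerts concrete, here is a minimal, purely illustrative sketch. It is not Pfizer's or Norstella's pipeline: the note schema (`physician_id`, `patient_id`, `text`), the regular expression, and the function names are all assumptions, and a production system would likely rely on trained NLP models with proper negation handling rather than a single pattern.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a biomarker name followed, within the same sentence,
# by a result word. It does not handle negation ("no evidence of ...").
BIOMARKER_PATTERN = re.compile(
    r"\b(?P<marker>BRAF|ALK)\b[^.\n]{0,40}?\b(?P<status>positive|negative)\b",
    re.IGNORECASE,
)

def extract_biomarkers(note_text):
    """Return {marker: status} for each biomarker mention found in a note."""
    found = {}
    for match in BIOMARKER_PATTERN.finditer(note_text):
        found[match.group("marker").upper()] = match.group("status").lower()
    return found

def build_alerts(notes):
    """Group positive biomarker findings by physician.

    `notes` is an iterable of dicts with the assumed keys
    'physician_id', 'patient_id', and 'text'.
    """
    alerts = defaultdict(set)
    for note in notes:
        for marker, status in extract_biomarkers(note["text"]).items():
            if status == "positive":
                alerts[note["physician_id"]].add((note["patient_id"], marker))
    return dict(alerts)

# Made-up notes for illustration only.
sample_notes = [
    {"physician_id": "DR1", "patient_id": "P1",
     "text": "Molecular testing: BRAF V600E mutation positive."},
    {"physician_id": "DR1", "patient_id": "P2",
     "text": "ALK rearrangement negative. Continue current plan."},
    {"physician_id": "DR2", "patient_id": "P3",
     "text": "Pathology shows ALK-positive disease."},
]
alerts = build_alerts(sample_notes)
```

In this toy data, only the positive findings produce an entry, so each physician key maps to the patients and markers that would trigger an alert.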
Q: From a clinical perspective, what makes unstructured data so powerful for identifying hidden patients?
Madeline Naylor: Unstructured data provides the clinical richness that structured data lacks. When a physician dictates a clinical note, they include details you’d never see in claims—symptoms, imaging findings, lab values, even over-the-counter medications. In one paragraph of a note, you might capture the entire clinical picture: what the patient presented with, how the doctor assessed them, what treatments were initiated, and when follow-ups are scheduled.
In therapeutic areas like oncology or neurology, that context is invaluable. If we can see how and when patients are presenting, before a formal diagnosis, we can identify patterns earlier and help teams reach those patients sooner. That’s the power of unstructured data: it fills in the gaps of the patient journey, helping us move from just understanding what is happening to understanding why it’s happening.
Q: What are some of the main challenges of working with unstructured data?
MN: The biggest challenge is scale and transformation. We have more than 600 different note types within our EMR data—everything from prior authorization notes to genetic counseling, nurse consults, imaging, and biopsy reports. That’s an incredible asset, but it only becomes useful when we can structure and interpret it effectively.
We’ve made great strides with AI and NLP to translate that unstructured data into actionable insights, but it’s still an evolving process. It requires constant iteration, close collaboration with clients, and continuous learning about what works best for different use cases.
JT: From a commercial perspective, data literacy is another big challenge. Many teams are comfortable with claims data but unfamiliar with EMR data, especially unstructured sources. It takes time and investment to educate teams on what this data can do and how to interpret it responsibly. It also takes time to build the right NLP models—our team spent six months developing a reliable market share model based on unstructured data. The results were worth it, but it’s important to set expectations around the learning curve.
Q: What new opportunities does unstructured data open up beyond patient identification?
MN: We’re now using unstructured data not just to find patients, but to understand the why behind physician behaviors. Why are doctors prescribing one drug over another? Why aren’t certain physicians referring patients to clinical trials? When we analyze notes, we uncover reasons—like perceived side effects or access barriers—that weren’t visible before.
This insight also allows us to develop physician profiles and referral networks. We can map who’s referring patients into trials, who’s prescribing early, and how care teams are interconnected. That understanding supports both clinical trial recruitment and commercial targeting strategies.
JT: Exactly. On our side, we’re using this kind of insight to refine dynamic targeting. For example, by sequencing patient encounters, we can identify not just the prescribing physician, but everyone involved in that patient’s care—NPs, PAs, or other specialists who influence treatment decisions. That helps our field teams reach the full network of decision-makers, not just the primary prescriber.
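As a rough illustration of what sequencing encounters could look like, the sketch below orders each patient's encounters by date and lists every provider in order of first contact. The record fields (`patient_id`, `provider_id`, `role`, `date`) and the function name are assumptions made for this example, not a description of any team's actual data model.

```python
from collections import defaultdict
from datetime import date

def care_networks(encounters):
    """Map each patient to the providers they saw, ordered by first encounter.

    `encounters` is an iterable of dicts with the assumed keys
    'patient_id', 'provider_id', 'role', and 'date'.
    """
    by_patient = defaultdict(list)
    for enc in sorted(encounters, key=lambda e: e["date"]):
        providers = by_patient[enc["patient_id"]]
        entry = (enc["provider_id"], enc["role"])
        if entry not in providers:  # keep only the first appearance
            providers.append(entry)
    return dict(by_patient)

# Made-up encounters for illustration only.
sample_encounters = [
    {"patient_id": "P1", "provider_id": "NP7", "role": "NP",
     "date": date(2024, 3, 2)},
    {"patient_id": "P1", "provider_id": "ONC1", "role": "Oncologist",
     "date": date(2024, 3, 20)},
    {"patient_id": "P1", "provider_id": "PA3", "role": "PA",
     "date": date(2024, 1, 15)},
    {"patient_id": "P1", "provider_id": "ONC1", "role": "Oncologist",
     "date": date(2024, 4, 5)},
]
network = care_networks(sample_encounters)
```

Here the PA and NP appear in the patient's network ahead of the prescribing oncologist, which is the kind of signal that would otherwise be missed by looking at the prescriber alone.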
Q: Looking ahead, how do you see unstructured data shaping the future of RWD and commercial strategy?
LW: We’re at a turning point where unstructured data is helping us move from retrospective analysis to real-time insights. As models become faster and more precise, we can use these signals to identify emerging patient populations, optimize engagement strategies, and support better access to therapies. The goal isn’t just more data—it’s more meaningful data that drives better outcomes for patients and smarter decisions for the industry.