Natural Language Datasets

We are not short on data, only on people to explore it! This list is not comprehensive, but it gives an overview of some of our natural language datasets:

  • 4.4 million narrative radiology reports from Stanford
  • 1 million narrative radiology reports from three other institutions
  • 430,000 radiology reports identified as normal by the interpreting radiologist
  • 110,000 chest CT reports annotated for presence/absence of pulmonary embolism
  • 150 chest CT reports with all concepts annotated

Please reach out to the lab if you would like to learn more or collaborate.