A comprehensive collection of APIs, datasets, tools, repositories, and research papers for patent data mining and information retrieval.
Programmatic access to patent databases and search engines
Downloadable patent corpora for training, benchmarking, and large-scale analysis
| Dataset | Access | Description | Source |
|---|---|---|---|
| PatentsView Granted Patent Data | Free | 35 files covering all US patents from 1976 to present including inventors, assignees, citations, and technology categories, updated quarterly | patentsview.org |
| USPTO Patent Grant Bulk XML | Free | Weekly XML files of all granted patents with full bibliographic data including title, abstract, claims, and assignee | data.uspto.gov |
| USPTO Patent Application Bulk XML | Free | Weekly XML files of all published patent applications (released every Thursday), containing full text, bibliographic data, and drawings | data.uspto.gov |
| USPTO Office Actions Dataset | Free | Full text of all public office actions from 2008 to present as weekly JSON files, useful for examining rejection patterns and prior art citations | data.uspto.gov |
| USPTO AI Patent Dataset (AIPD) | Free | 13M+ patents classified across 8 AI technology components using ML models, covering 1976 to 2020 | uspto.gov |
| Lens.org Bulk Data | Free for researchers | 140M+ patents across 100+ jurisdictions available for bulk download for academic and research use | lens.org |
| BigPatent Dataset | Free | 1.3M US patent documents with human-written abstracts across 9 CPC sections, widely used for NLP training and summarization benchmarking | huggingface.co |
| PatEx Patent Examination Dataset | Free | 13M+ patent applications with full prosecution history, continuation data, claims of foreign priority, and examination details | data.uspto.gov |
| CLEF-IP Prior Art Dataset | Free | Widely used prior art retrieval benchmark with 2000 patent queries and human relevance judgments for evaluation | ir.nist.gov |
| PatentsView Pre-Grant Publication Data | Free | 25 files covering all US pre-grant publications from 2001 to present including applicants, assignees, inventors, and technology categories | patentsview.org |
| USPTO Patent Assignment Data | Free | 10M+ patent assignments and ownership transactions recorded at the USPTO since 1970, involving 17M+ patents and applications | data.uspto.gov |
| USPTO Patent Maintenance Fee Data | Free | Recorded maintenance fee events for all patents granted from 1981 to present, updated weekly on Tuesdays | data.uspto.gov |
| PatentsView Long Text Data | Free | Annual files of full patent long text including summary, claims, detailed description, and drawing description from 1976 to present | patentsview.org |
| Google Patents Public Data (BigQuery) | Free tier | Full text of patents from 17 patent offices as structured BigQuery tables, useful for large-scale analysis | cloud.google.com |
| WIPO PATENTSCOPE Bulk Data | Free | Bulk access to PCT international patent applications and national collections across 100+ countries in structured XML format | wipo.int |
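The USPTO bulk XML files listed above are, to our understanding, not single well-formed XML documents: each weekly file concatenates many complete documents, each starting with its own XML declaration. A minimal standard-library sketch of reading one, assuming every declaration begins on its own line, splits the stream at those declarations and pulls a few fields from each grant. The sample data and the `split_documents` and `extract` helpers are hypothetical illustrations; the element names follow the `us-patent-grant` schema as we understand it, application files use different root elements, and the DTD version of the files you download is worth checking.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

# Abbreviated, hypothetical sample: two grant documents concatenated the way the
# weekly files are. Real files are far larger and also carry a DOCTYPE line.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
<us-bibliographic-data-grant>
<publication-reference><document-id><country>US</country><doc-number>00000001</doc-number></document-id></publication-reference>
<invention-title>Example widget</invention-title>
</us-bibliographic-data-grant>
<abstract><p>A widget that
holds things.</p></abstract>
</us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
<us-bibliographic-data-grant>
<publication-reference><document-id><country>US</country><doc-number>00000002</doc-number></document-id></publication-reference>
<invention-title>Example gadget</invention-title>
</us-bibliographic-data-grant>
</us-patent-grant>
"""


def split_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield one XML document per patent (assumes each XML declaration starts a line)."""
    buf: list[str] = []
    for line in lines:
        if line.startswith("<?xml"):
            doc = "".join(buf)
            if doc.strip():  # skip blank lines that precede the first document
                yield doc
            buf = []
        buf.append(line)
    doc = "".join(buf)
    if doc.strip():
        yield doc


def extract(doc: str) -> dict[str, str]:
    """Return the document number, title, and whitespace-normalized abstract of one grant."""
    root = ET.fromstring(doc.encode("utf-8"))  # parse bytes so the encoding declaration applies
    biblio = root.find("us-bibliographic-data-grant")
    abstract_el = root.find("abstract")
    abstract = " ".join("".join(abstract_el.itertext()).split()) if abstract_el is not None else ""
    return {
        "doc_number": biblio.findtext("publication-reference/document-id/doc-number", default=""),
        "title": biblio.findtext("invention-title", default=""),
        "abstract": abstract,
    }


if __name__ == "__main__":
    # For a real weekly file, open(path, encoding="utf-8") can replace io.StringIO.
    for record in map(extract, split_documents(io.StringIO(SAMPLE))):
        print(record)
```

Splitting line by line keeps memory bounded to one patent at a time, which matters because a full weekly file is much larger than any single document in it.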
Open-source tools and libraries for patent analysis and retrieval
Patent Quality and AI — semantic prior art search engine using dense neural retrieval over a global patent corpus. Core engine behind the PQAI platform.
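Dense retrieval, as described for PQAI above, ranks documents by how similar their embedding vectors are to a query embedding. The sketch below does not reproduce PQAI's model or index: it assumes the embeddings already exist (the toy three-dimensional vectors, document IDs, and relevance set are hypothetical), ranks by exact cosine similarity, and scores the ranking with recall@k, the kind of metric used with prior art benchmarks such as CLEF-IP.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Vector, corpus: dict[str, Vector], k: int) -> list[str]:
    """Exact (brute-force) top-k search: score every document, keep the k best."""
    return sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]), reverse=True)[:k]


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear among the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical toy embeddings; a real system would produce these with a trained encoder.
CORPUS: dict[str, Vector] = {
    "DOC-A": [0.9, 0.1, 0.0],
    "DOC-B": [0.1, 0.9, 0.0],
    "DOC-C": [0.7, 0.3, 0.1],
    "DOC-D": [0.0, 0.1, 0.9],
}
QUERY: Vector = [1.0, 0.2, 0.0]
RELEVANT = {"DOC-C", "DOC-D"}  # hypothetical relevance judgments for QUERY

if __name__ == "__main__":
    top = rank(QUERY, CORPUS, k=2)
    print(top)                              # ['DOC-A', 'DOC-C']
    print(recall_at_k(top, RELEVANT, k=2))  # 0.5
```

Scoring every document is only practical for tiny corpora; at the scale of a global patent collection, an approximate nearest-neighbour index would typically replace the brute-force sort while the scoring and evaluation logic stays the same.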
Free and commercial platforms for searching global patent literature
Official websites of national and international patent offices worldwide
Curated academic papers on patent NLP, classification, retrieval, and prior art search
Kumaravel and Sankaranarayanan
Setchi et al.
Giczy et al.
Freunek and Bodmer
Siddharth and Luo
Nakamitsu et al.
Krestel et al.
Chikkamath et al.
Alderucci and Ashley
Dear PQAI Team,
We are pleased to express our support for PQAI and its mission to revolutionize patent searching through open-source, AI-driven solutions.
At [COMPANY NAME], we recognize the importance of accessible and efficient patent tools in fostering innovation and empowering inventors from diverse backgrounds. By supporting PQAI, we aim to contribute to the development of transparent, collaborative, and impactful solutions for the intellectual property community.
We kindly request the addition of [COMPANY NAME] to the official List of Supporters of PQAI.
Sincerely,
[CEO or Equivalent Name]
[Title]
[Company Name]
[Signature]