The Evolution of Data Access Tools for Patents
The Patent Office has many restrictions on the information it can disclose. In fact, through the 1990s, prior art searches could only be conducted in the Patent Office’s library. The early version of the Patent Office’s database could be searched only by patent classification code and did not allow full-text searching.
Things started to change in the early 2000s. The U.S. began publishing pending applications in 2001, opening the door to a wealth of information for inventors, litigators, and patent prosecutors. Gone were the days of hiring agents near the Patent Office to conduct your patent searches and pull your file wrappers. Instead, anyone with some training and patience could use the Patent Office’s database to obtain patent and patent application data.
We have now reached the next stage in the evolution of patent data tools. Developers have identified the strengths and weaknesses in the Patent Office’s interface. They have created proprietary and open-source tools to obtain, clean, and visualize patent data.
Using the Power of Open-Source Tools to Transform the IP Landscape
Open-source tools have several advantages over proprietary approaches, including:
- Free to use.
- Open access to source code.
- Freedom for developers to change the source code or incorporate it into new tools.
- Royalty-free distribution or redistribution.
- New business models for companies.
These attributes encourage developers to adopt a standard instead of creating separate approaches. They also speed development by allowing developers to “stand on the shoulders” of their predecessors.
Obtaining Patent Data
Formerly, patent data was located in silos in the Patent Office. However, for the past 20 years, the Patent Office has made this information available. Some of the open-source tools designed to obtain patent data include:
PQAI stands for Patent Quality through Artificial Intelligence. This library of patent-related tools provides a next-generation prior art search engine. This search engine evaluates the search results and returns the top ten prior art references. In addition, the search engine trains itself to determine which results to return based on historical patent examination records.
PQAI promises to transform the IP landscape for inventors, enterprises, patent attorneys, and even patent examiners by delivering higher-quality search results. Conducting a search and reviewing mountains of search results takes time. Because PQAI returns only the most relevant prior art references, it can make patent searches more accurate, faster, and cheaper.
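To make the idea of returning only the top-ranked references concrete, here is a minimal sketch of top-k ranking. It is not PQAI's algorithm: it swaps in a plain bag-of-words cosine similarity, and the function names, reference ids, and sample abstracts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_references(query, documents, k=10):
    """Return up to k reference ids ranked by similarity to the query.

    `documents` maps a (hypothetical) reference id to its text.
    References with zero similarity are dropped.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties are broken by id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


# Made-up abstracts, for illustration only.
sample_documents = {
    "REF-A": "A drone with a foldable rotor arm and battery housing",
    "REF-B": "Method for brewing coffee using pressurized steam",
    "REF-C": "Unmanned aerial vehicle rotor assembly with folding arms",
}
results = top_k_references("foldable rotor arm on a drone", sample_documents, k=2)
print(results)
```

A production search engine would use far richer signals than word overlap, but the shape is the same: score every candidate, sort, and hand back only the first few.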
PQAI was initiated by the Georgia Intellectual Property Alliance (GIPA) and AT&T. The algorithm was contributed by GreyB, and InspireIP manages the application. As an open-source application, developers continue to improve PQAI. To review PQAI or contribute, you can access the files on PQAI’s GitHub.
The initiators of PatZilla call it “a modular patent information research platform and data integration toolkit.” Its primary feature is a search engine that pulls prior art references from the European Patent Office’s database. It also pulls from DEPATISnet, CLAIMS Direct, and depa.tech. In addition, PatZilla provides PDF, image, bibliographic data, and full-text acquisition from these services.
PatZilla’s contributions to the evolving IP landscape include:
- A user interface that allows efficient screening of multiple references.
- Web-based collaboration for information sharing.
- Adaptable API for integration into third-party systems.
phpIP manages and dockets patents and other IP rights. The software was designed for inventors, enterprises, and IP law firms.
The system’s initiators sought to develop a software package that was flexible and easy to use. Unfortunately, most alternative packages were complicated and provided more features than necessary. As a result, most users paid for features they did not need and could not use the features they wanted.
phpIP was built on open-source software. It is changing the IP landscape by providing an intuitive docketing and patent management tool. Notably, users can adapt the system to their specific needs. As they do, they can contribute to the overall improvement of the system.
You can view the documentation and source code files at phpIP’s GitHub.
Cleaning Patent Data | Open Refine
Not every user who works with patent data will need to clean it. But occasionally, you will have a large file of patent or patent application data that does not have the correct format for your use.
In the past, users have relied on Excel or Open Office to clean data. But this often requires the user to manually fix each cell or have the programming knowledge to write a macro to fix the cells automatically.
Open Refine is an open-source tool that automates patent data cleaning. Metaweb Technologies, Inc. originally developed it. Google later acquired Metaweb, and the tool was released for open, community-led use in October 2012.
Open Refine provides automated data cleaning functions that can be applied to large patent data files. Some of the features that apply to patent data cleaning include:
- Reformatting dates.
- Separating inventors into different cells.
- Repairing corrupted or missing characters.
This tool can improve the speed and accuracy of the review, analysis, and storing of patent data. To contribute to Open Refine, visit the GitHub page.
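For readers who want to see what two of these cleaning steps amount to, the sketch below reformats dates and splits an inventor field using only the Python standard library. It is an illustration of the operations, not Open Refine's own implementation, and the column names (`filing_date`, `inventors`), the list of accepted date formats, and the sample values are assumptions.

```python
from datetime import datetime

# Date layouts this sketch knows how to read (an assumed, non-exhaustive list).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%Y%m%d")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or the stripped input if unrecognized."""
    raw = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return raw


def split_inventors(raw, separator=";"):
    """Split a delimited inventor string into a list of trimmed names."""
    return [name.strip() for name in raw.split(separator) if name.strip()]


def clean_row(row):
    """Normalize the filing date and give each inventor its own column."""
    cleaned = dict(row)
    cleaned["filing_date"] = normalize_date(row["filing_date"])
    for index, name in enumerate(split_inventors(row["inventors"]), start=1):
        cleaned[f"inventor_{index}"] = name
    del cleaned["inventors"]
    return cleaned


# Made-up record, for illustration only.
example = clean_row(
    {
        "application_number": "APP-001",
        "filing_date": "03/15/2019",
        "inventors": "Ada Example; Grace Sample ;",
    }
)
print(example)
```

Unrecognized dates are passed through unchanged rather than guessed at, so they can be reviewed by hand.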
Visualizing Patent Data
Visualizations can help identify trends or patterns in the massive amount of patent data that may relate to your project. For example, you might benefit from a visualization of when patent applications were filed or which countries they were filed in.
Until recently, you would need to comb through a spreadsheet to spot patterns in the data. Now, there are tools to turn patent data into visualizations, including:
Gephi is a network visualization platform that can create graphs showing relationships between patents or patent applications. It is an open-source application that is free to use. Association Gephi authored the software, but many developers have contributed to it.
Gephi can convert CSV or Excel files into data visualizations. This means you can import a file exported from The Lens or a file cleaned in Open Refine (discussed above). Gephi will then create a visualization of the data.
This will change the IP landscape by revealing obscure or hidden patterns in the data. For example, you can visualize the number of pending applications in the data file that belong to each assignee.
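As a rough sketch of how the assignee example could be prepared, the code below counts pending applications per assignee and writes an edge list that links each assignee to its pending applications. Gephi's spreadsheet import reads edge tables with `Source` and `Target` columns; everything else here (the input column names, the status values, and the sample records) is an assumption made for illustration.

```python
import csv
import io
from collections import Counter


def is_pending(row):
    """True when the row's status field reads 'pending' (case-insensitive)."""
    return row["status"].strip().lower() == "pending"


def pending_counts(rows):
    """Count pending applications per assignee."""
    return Counter(row["assignee"] for row in rows if is_pending(row))


def edge_list_csv(rows):
    """Build CSV text with one assignee -> application edge per pending row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for row in rows:
        if is_pending(row):
            writer.writerow([row["assignee"], row["application_number"]])
    return buffer.getvalue()


# Made-up records, for illustration only.
sample_rows = [
    {"application_number": "APP-001", "assignee": "Acme Widgets", "status": "Pending"},
    {"application_number": "APP-002", "assignee": "Acme Widgets", "status": "pending "},
    {"application_number": "APP-003", "assignee": "Globex Labs", "status": "Granted"},
    {"application_number": "APP-004", "assignee": "Globex Labs", "status": "Pending"},
]
counts = pending_counts(sample_rows)
edges = edge_list_csv(sample_rows)
print(counts)
print(edges)
```

In the resulting graph, the number of edges attached to each assignee node equals its number of pending applications, so larger portfolios stand out as larger hubs.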
To view the source code for Gephi or participate in its development, visit the Gephi GitHub page.
Plotly Chart Studio is an open-source platform that can be used to create interactive graphics. The open-source version of Plotly is cloud-based. This version is free to use. Plotly also offers enterprise versions for a fee.
Plotly creates graphs from data files generated through The Lens or Open Refine. Like Gephi, Plotly can help spot trends or patterns in the data. But unlike Gephi, the graphs in Plotly were designed to be interactive and shareable. This makes Plotly a valuable collaborative tool that will alter the IP landscape.
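One way to picture what such a graph is made of: a Plotly figure can be described as plain JSON with a `data` list of traces and a `layout` object. The sketch below counts filings per year and builds a bar-chart description in that shape using only the standard library. The helper names and sample dates are hypothetical, and the dates are assumed to already be in YYYY-MM-DD form.

```python
import json
from collections import Counter


def filings_per_year(filing_dates):
    """Return (year, count) pairs sorted by year for YYYY-MM-DD date strings."""
    counts = Counter(date[:4] for date in filing_dates)
    return sorted(counts.items())


def bar_figure(filing_dates):
    """Describe a bar chart of filings per year as a Plotly-style dict."""
    pairs = filings_per_year(filing_dates)
    return {
        "data": [
            {
                "type": "bar",
                "x": [year for year, _ in pairs],
                "y": [count for _, count in pairs],
            }
        ],
        "layout": {
            "title": {"text": "Patent applications filed per year"},
            "xaxis": {"title": {"text": "Filing year"}},
            "yaxis": {"title": {"text": "Applications"}},
        },
    }


# Made-up filing dates, for illustration only.
sample_dates = [
    "2019-03-15",
    "2019-11-02",
    "2020-01-20",
    "2021-06-30",
    "2021-07-01",
    "2021-12-12",
]
figure = bar_figure(sample_dates)
figure_json = json.dumps(figure, indent=2)
print(figure_json)
```

If the `plotly` Python package is installed, passing this dictionary to `plotly.graph_objects.Figure` should produce an interactive chart; that step is left out here so the sketch runs without extra dependencies.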
Plotly was developed by Plotly Technologies, Inc. You can help develop Plotly by reviewing the source code and documentation on Plotly’s GitHub.
The open-source nature of these tools almost guarantees that they will continue to develop and improve. To take part, you can either use the software and provide feedback or collaborate with the developers to identify and create new features for these applications.
Get in touch with Sam Zellner, project lead for PQAI, to explore collaboration opportunities with PQAI.