Top AI-Based Research Papers on Prior Art Search

Since searchable digital patent databases gained widespread acceptance, the IP industry has been keen to move beyond basic, keyword-based searching. Numerous startups and established companies have tried their hand at AI-powered patent search, and each attempt has produced new techniques.

True AI search, which goes beyond keyword matching to understand what you are looking for, has remained elusive for decades. Fortunately, a new wave of progress has emerged in the last few years as a result of foundational developments in natural language processing using deep neural networks.

Early progress was triggered by the popularization of word embedding techniques around 2014. This representation technique allows words with similar meanings to have similar vector representations. Word embeddings were a precursor to a new era in patent searching; on their own, however, they are insufficient for true AI search.
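To make the idea concrete, here is a minimal sketch of word embeddings using gensim's Word2Vec. The tiny corpus, vocabulary, and hyperparameters are illustrative assumptions and are not drawn from any of the papers listed below; a real patent-search system would train on millions of documents.

```python
# Minimal word-embedding sketch with gensim (>= 4.x). Toy corpus only.
from gensim.models import Word2Vec

corpus = [
    ["rechargeable", "battery", "cell", "electrode", "anode"],
    ["lithium", "battery", "charging", "circuit", "voltage"],
    ["wireless", "charging", "coil", "power", "transmitter"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)

# Words that appear in similar contexts end up with similar vectors; with a
# corpus this small the neighbours are only indicative, but on real data
# "battery" would rank terms like "cell" or "charging" highly.
print(model.wv.most_similar("battery", topn=3))
```

Because the vectors capture context rather than spelling, a query containing "battery" can surface documents that only use "cell", which pure keyword search would miss.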

From 2015 to 2017, contextualized word embeddings and various techniques for creating sentence embeddings were invented. Then, in 2018, transformer-based language models such as BERT significantly advanced the state of the art. BERT (Bidirectional Encoder Representations from Transformers) better understands the nuances and context of words in search queries and can match those queries with more relevant results.
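The sketch below shows, under stated assumptions, how a transformer-based sentence encoder can rank candidate texts against a query by semantic similarity. The Sentence-Transformers checkpoint "all-MiniLM-L6-v2" is a general-purpose model used purely for illustration; it is not one of the patent-specific models discussed in the papers listed here.

```python
# Minimal semantic-matching sketch with a transformer sentence encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # general-purpose, illustrative

query = "device for wirelessly recharging an electric vehicle"
candidates = [
    "An inductive charging pad transfers power to a parked automobile.",
    "A method for brewing coffee using a pressurized capsule.",
    "A battery management circuit for electric cars.",
]

q_emb = model.encode(query, convert_to_tensor=True)
c_emb = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity ranks semantically related texts highest even when they
# share few exact keywords with the query.
scores = util.cos_sim(q_emb, c_emb)[0]
for text, score in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {text}")
```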

Google also released a custom version of its BERT model trained on patent data. Parallel advancements in AI training, such as contrastive learning and learning-to-rank techniques, have enabled researchers to develop more robust and entirely new ways of searching patents.

Today is an exciting time in the history of patent search technology, with improved AI techniques appearing on the research landscape every few weeks. These advances will undoubtedly make it easier for both professional searchers and inventors to explore and make sense of patent data.

But what about you? What will make your research less time-consuming?

PQAI has taken it upon itself to keep this project constantly updated: a one-stop online repository of research papers in the field of AI-based patent search.

Moreover, we’ll notify you when a new paper is published to keep you updated on the latest advancements in the industry. This way, you will have access to all previous and upcoming information for your research and development without a flood of emails stuffing your inbox.

For your convenience, all the papers published since 2004 are listed below (latest first) with author names, the year of publication, and download links.

Title

A Two-Stage Deep Learning-Based System For Patent Citation Recommendation

Author(s)

Jaewoong Choi, Jiho Lee, Janghyeok Yoon, Sion Jang, Jaeyoung Kim, Sungchul Choi

Year of Publishing

2022

Published On

Springer

Affiliation

Pukyong National University, Netmarble AI Center, VUNO INC

PDF

Link 

Abstract

The increasing number of patents leads patent applicants and examiners to spend more time and cost on searching and citing prior patents. Deep learning has exhibited outstanding performance in the recommendation of movies, music, products, and paper citations. However, the application of deep learning to patent citation recommendation has not been addressed well. Despite many attempts to apply deep learning models to the patent domain, little attention has been paid to patent citation recommendation. Since patent citation is determined according to a complex technological context beyond simply finding semantically similar preceding documents, it is necessary to understand the context in which the citation occurs. Therefore, we propose a dataset named PatentNet to capture technological citation context based on textual information, metadata, and examiner citation information for about 110,000 patents. This paper also proposes a strong benchmark model considering the similarity of patent text as well as technological citation context using Cooperative Patent Classification (CPC) codes. The proposed model exploits a two-stage structure: selecting candidates based on textual information and pre-trained CPC embedding values, and re-ranking the candidates using a deep learning model trained with examiner citation information. The proposed model achieved improved performance with an MRR of 0.2506 on the benchmarking dataset, outperforming the existing methods. The results show that learning the descriptive citation context, rather than simple text similarity, has an important influence on citation recommendation. The proposed model and dataset can help researchers understand technological citation context and assist patent examiners or applicants in finding prior patents to cite effectively.
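As a rough illustration of the two-stage structure described in this abstract, the sketch below shortlists candidates with a cheap lexical similarity and then re-scores the shortlist with a second-stage scorer. The toy corpus, the TF-IDF first stage, and the rerank_score placeholder are all assumptions made for illustration; the paper's actual model uses CPC embeddings and a trained deep network.

```python
# Minimal two-stage retrieve-then-re-rank sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query_patent = "method for recommending citations using neural networks"
corpus = [
    "neural citation recommendation for scientific papers",
    "apparatus for roasting coffee beans",
    "deep learning model for patent prior art retrieval",
    "bicycle frame made of carbon fiber",
]

# Stage 1: cheap lexical shortlist (TF-IDF cosine similarity).
vec = TfidfVectorizer().fit(corpus + [query_patent])
sims = cosine_similarity(vec.transform([query_patent]), vec.transform(corpus))[0]
shortlist = sorted(range(len(corpus)), key=lambda i: -sims[i])[:2]

# Stage 2: re-rank the shortlist with a (hypothetical) learned scorer.
def rerank_score(query, doc):
    # Placeholder: a real system would call a trained citation model here.
    return len(set(query.split()) & set(doc.split()))

ranked = sorted(shortlist, key=lambda i: -rerank_score(query_patent, corpus[i]))
for i in ranked:
    print(corpus[i])
```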

Title

Multi-Document Summarization For Patent Documents Based On Generative Adversarial Network

Author(s)

Sunhye Kim, Byungun Yoon

Year of Publishing

2022

Published On

Science Direct

Affiliation

Dongguk University

PDF

Link 

Abstract

Given the exponential growth of patent documents, automatic patent summarization methods to facilitate the patent analysis process are in strong demand. Recently, the development of natural language processing (NLP), text-mining, and deep learning has greatly improved the performance of text summarization models for general documents. However, existing models cannot be successfully applied to patent documents, because patent documents describing an inventive technology and using domain-specific words have many differences from general documents. To address this challenge, we propose in this study a multi-patent summarization approach based on deep learning to generate an abstractive summarization considering the characteristics of a patent. Single patent summarization and multi-patent summarization were performed through a patent-specific feature extraction process, a summarization model based on generative adversarial network (GAN), and an inference process using topic modeling. The proposed model was verified by applying it to a patent in the drone technology field. In consequence, the proposed model performed better than existing deep learning summarization models. The proposed approach enables high-quality information summary for a large number of patent documents, which can be used by R&D researchers and decision-makers. In addition, it can provide a guideline for deep learning research using patent data.

Title

A Survey On Sentence Embedding Models Performance For Patent Analysis

Author(s)

Hamid Bekamiri, Daniel S. Hain, Roman Jurowetzki

Year of Publishing

2022

Published On

ArXiv

Affiliation

Cornell University

PDF

Link  

Abstract

Patent data is an important source of knowledge for innovation research, while the technological similarity between pairs of patents is a key enabling indicator for patent analysis. Recently, researchers have been using patent vector space models based on different NLP embedding models to calculate the technological similarity between pairs of patents to help better understand innovations, patent landscaping, technology mapping, and patent quality evaluation. More often than not, text embedding is a vital precursor to patent analysis tasks. A pertinent question then arises: how should we measure and evaluate the accuracy of these embeddings? To the best of our knowledge, there is no comprehensive survey that builds a clear delineation of embedding models’ performance for calculating patent similarity indicators. Therefore, in this study, we provide an overview of the accuracy of these algorithms based on patent classification performance and propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach. In a detailed discussion, we report the performance of the top three algorithms at the section, class, and subclass levels. The results based on the first claim of patents show that PatentSBERTa, BERT-for-Patents, and TF-IDF weighted word embeddings have the best accuracy for computing sentence embeddings at the subclass level. According to the first results, the performance of the models varies across classes, which shows that researchers in patent analysis can use the results of this study to choose the most appropriate model for the specific section of patent data they use.

Title

End To End Neural Retrieval For Patent Prior Art Search

Author(s)

Vasileios Stamatis

Year of Publishing

2022

Published On

Springer

Affiliation

International Hellenic University

PDF

Link 

Abstract

This research will examine neural retrieval methods for patent prior art search. One research direction is the federated search approach, where we proposed two new methods that solve the results-merging problem in federated patent search using machine learning models. The methods are based on a centralized index containing samples of documents from all potential resources, and they use machine learning models to predict comparable scores for the documents retrieved from different resources. The other research direction is the adaptation of end-to-end neural retrieval approaches to patent characteristics so that retrieval effectiveness is increased. Off-the-shelf neural methods like BERT have lower effectiveness for patent prior art search, so we adapt the BERT model to patent characteristics in order to increase retrieval performance. We propose a new gate-based document retrieval method and examine it in patent prior art search. The method combines a first-stage retrieval step using BM25 with a re-ranking approach in which the BERT model is used as a gating function that operates on the BM25 score and modifies it according to the BERT relevance score. These experiments are based on two-stage retrieval approaches because neural models like BERT require a lot of computing power. Eventually, the final part of the research will examine first-stage neural retrieval methods, such as dense retrieval methods adapted to patent characteristics, for prior art search.
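A minimal sketch of the gate-based idea in this abstract is shown below: BM25 supplies a first-stage score, and a relevance score acts as a gate that scales it. The rank_bm25 library, the toy corpus, the neural_relevance placeholder, and the gating formula itself are illustrative assumptions, not the author's implementation.

```python
# Minimal BM25 + gated re-ranking sketch (illustrative only).
from rank_bm25 import BM25Okapi

corpus = [
    "a lithium ion battery pack with thermal management",
    "a method for steeping tea leaves",
    "battery cooling system for electric vehicles",
]
tokenized = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "battery thermal management system".split()
bm25_scores = bm25.get_scores(query)

def neural_relevance(query_tokens, doc):
    # Placeholder for a fine-tuned BERT relevance probability in [0, 1].
    return len(set(query_tokens) & set(doc.split())) / len(set(query_tokens))

# Gate: the (stand-in) neural score scales the lexical score up or down.
gated = [s * (0.5 + neural_relevance(query, d)) for s, d in zip(bm25_scores, corpus)]
for doc, score in sorted(zip(corpus, gated), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```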

Title

A Doc2vec And Local Outlier Factor Approach To Measuring The Novelty Of Patents

Author(s)

Daeseong Jeon, Joon Mo Ahn, Juram Kim, Changyong Lee

Year of Publishing

2022

Published On

Science Direct

Affiliation

Ulsan National Institute of Science and Technology, Korea University, Korea Institute of Science and Technology Information, Sogang University

PDF

Link 

Abstract

Patent analysis using text mining techniques is an effective way to identify novel technologies. However, the results of previous studies have been of limited use in practice because they require domain-specific knowledge and reflect the limited technological features of patents. As a remedy, this study proposes a machine learning approach to measuring the novelty of patents. At the heart of this approach are doc2vec to represent patents as vectors using textual information of patents and the local outlier factor to measure the novelty of patents on a numerical scale. A case study of 1,877 medical imaging technology patents confirms that our novelty scores are significantly correlated with the relevant patent indicators in the literature and that the novel patents identified have a higher technological impact on average. It is expected that the proposed approach could be useful as a complementary tool to support expert decision-making in identifying new technology opportunities, especially for small and medium-sized companies with limited technological knowledge and resources.
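The sketch below is a toy illustration of the doc2vec plus Local Outlier Factor pipeline this abstract describes: it embeds a handful of made-up abstracts and scores how isolated each one is. The corpus, vector size, and neighbour count are assumptions; the study itself works with 1,877 real medical-imaging patents.

```python
# Minimal doc2vec + Local Outlier Factor novelty sketch (illustrative only).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.neighbors import LocalOutlierFactor

abstracts = [
    "x-ray imaging detector with scintillator layer",
    "magnetic resonance imaging coil arrangement",
    "ultrasound transducer array for medical imaging",
    "method of composing music with genetic algorithms",  # the intended outlier
]
docs = [TaggedDocument(a.split(), [i]) for i, a in enumerate(abstracts)]
model = Doc2Vec(docs, vector_size=32, min_count=1, epochs=100)

X = [model.dv[i] for i in range(len(abstracts))]
lof = LocalOutlierFactor(n_neighbors=2)
lof.fit_predict(X)

# More negative factor values indicate stronger outliers, i.e. higher novelty.
for a, score in zip(abstracts, lof.negative_outlier_factor_):
    print(f"{score:.2f}  {a}")
```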

Title

Deep Learning For Patent Landscaping Using Transformer And Graph Embedding

Author(s)

Seokkyu Choi, Hyeonju Lee, Eunjeong Park, Sungchul Choi

Year of Publishing

2022

Published On

Science Direct

Affiliation

Gachon University, Industrial Application R&D Institute, Seoul National University

PDF

Link 

Abstract

Patent landscaping is used to search for related patents during research and development projects. Patent landscaping is a crucial task required during the early stages of an R & D project to avoid the risk of patent infringement and to follow current trends in technology. The first task of patent landscaping is to extract the target patent for analysis from a patent database. Because patent classification for patent landscaping requires advanced human resources and can be tedious, the demand for automated patent classification has gradually increased. However, a shortage of well-defined benchmark datasets and comparable models makes it difficult to find related research studies. This paper proposes an automated patent classification model for patent landscaping based on transformer and graph embedding, both of which are drawn from deep learning. The proposed model uses a transformer architecture to derive text embedding from patent abstracts and uses a graph neural network to derive graph embedding from classification code co-occurrence information and concatenates them. Furthermore, we introduce four benchmark datasets to compare related research studies on patent landscaping. The obtained results showed prominent performance that was actually applicable to our dataset and comparable to the model using BERT, which has recently shown the best performance.
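To illustrate the fusion step in this abstract, the following sketch simply concatenates a text embedding with a graph embedding into one feature vector. The random vectors and their dimensions are placeholders for the transformer and graph-neural-network outputs the paper actually uses.

```python
# Minimal sketch of concatenating text and graph embeddings (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
text_embedding = rng.normal(size=768)    # stand-in for a transformer [CLS] vector
graph_embedding = rng.normal(size=128)   # stand-in for a GNN embedding of CPC codes

fused = np.concatenate([text_embedding, graph_embedding])
print(fused.shape)  # (896,) -> input to the downstream classification layer
```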

Title

Establish A Patent Risk Prediction Model For Emerging Technologies Using Deep Learning And Data Augmentation

Author(s)

Yung-Chang Chi, Hei-Chia Wang

Year of Publishing

2022

Published On

Science Direct

Affiliation

National Cheng Kung University

PDF

Link 

Abstract

Technology patents are considered the source and bedrock of emerging technologies. Patents create value in any enterprise. However, obtaining patents is time-consuming, expensive, and risky, especially if the patent application is rejected. The development of new patents requires extensive costs and resources, but sometimes they turn out to be similar to other patents once the technology is fully developed. They might lack relevant patentable features and, as a result, fail to pass the patent examination, resulting in investment losses. Patent infringement is also an especially important topic for reducing the risk of legal damages for patent holders, applicants, and manufacturers. Patent examinations have so far been performed manually. Due to manpower and time limitations, the examination time is exceedingly long and inefficient. Current patent similarity comparison research and text-mining classification algorithms are most commonly employed to analyze the possibility of examination approval, but there is insufficient discussion of the possibility of infringement. However, if it can be accurately determined in advance whether a new technology or innovation is likely to pass or fail (and why), or whether it is at risk of patent infringement, losses can be mitigated.


This research attempts to identify the issues involved in evaluating patent applications and infringement risks from existing patent databases. For each patent application, this research uses a Convolutional Neural Network (CNN) + Long Short-Term Memory (LSTM) prediction model, together with United States Patent and Trademark Office (USPTO) public utility patent applications and review results retrieved by keyword search. Data augmentation is applied before model training; 10% of the approved and rejected applications are randomly selected as test cases, with the remaining 90% used to train the prediction model in order to obtain a model that can predict patent infringement and examination outcomes. Experimental results show that the accuracy of each classification is at least 87.7%, and that the model can be used to classify the reasons for the rejection of a patent application.

Title

Pre-Trained Transformer-Based Classification For Automated Patentability Examination

Author(s)

Hao-Cheng Lo, Jung-Mei Chu

Year of Publishing

2021

Published On

IEEE

Affiliation

National Taiwan University

PDF

Link 

Abstract


Patentability examination, which means checking whether the claims of a patent application meet the requirements for being patentable, is highly reliant on experts’ arduous endeavors entailing domain knowledge. Therefore, automated patentability examination would be an immediate priority, though it is under-appreciated. In this work, being the first to cast deep-learning light on automated patentability examination, we formulate this task as a multi-label text classification problem, which is challenging due to learning cross-sectional characteristics of abstract requirements (labels) from text content replete with inventive terms. To address this problem, we fine-tune downstream multi-label classification models over pre-trained transformer variants (BERT-Base/Large, RoBERTa-Base/Large, and XLNet) in light of their state-of-the-art achievements on many tasks. On a large USPTO patent database, we assess the performance of our models and identify the best-performing model based on the metrics micro-precision, micro-recall, and micro-F1.

Title

PatentNet: Multi-Label Classification Of Patent Documents Using Deep Learning Based Language Understanding

Author(s)

Arousha Haghighian Roudsari, Jafar Afshar, Wookey Lee, Suan Lee 

Year of Publishing

2021

Published On

Springer

Affiliation

Inha University, Semyung University

PDF

Link 

Abstract

Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way to efficiently convey knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely, BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification. We use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves the multi-label patent classification performance. Our findings indicate that XLNet performs the best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, as well as coverage error, and LRAP.

Title

PatentSBERTa: A Deep NLP Based Hybrid Model For Patent Distance And Classification Using Augmented SBERT

Author(s)

Hamid Bekamiri, Daniel S. Hain, Roman Jurowetzki

Year of Publishing

2021

Published On

ArXiv

Affiliation

Aalborg University Business School

PDF

Link 

Abstract

This study provides an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity, and presents a hybrid framework for leveraging the resulting p2p similarity for applications such as semantic search and automated patent classification. We create embeddings using Sentence-BERT (SBERT) based on patent claims. To further increase the patent embedding quality, we use transformer models based on SBERT and RoBERTa, and apply the augmented approach for fine-tuning SBERT with in-domain supervised patent claims data. We leverage SBERT’s efficiency in creating embedding distance measures to map p2p similarity in large sets of patent data. We deploy our framework for classification with a simple K-Nearest Neighbors (KNN) model that predicts the Cooperative Patent Classification (CPC) of a patent based on the class assignment of the K patents with the highest p2p similarity. We thereby validate that the p2p similarity captures technological features in terms of CPC overlap, and at the same time demonstrate the usefulness of this approach for automatic patent classification based on text data. Furthermore, the presented classification framework is simple, and the results are easy to interpret and evaluate by end-users. In the out-of-sample model validation, we are able to perform a multi-label prediction of all assigned CPC classes at the subclass level (663 labels) on 1,492,294 patents with an accuracy of 54% and an F1 score > 66%, which suggests that our model outperforms the current state-of-the-art in text-based multi-label and multi-class patent classification. We furthermore discuss the applicability of the presented framework for semantic IP search, patent landscaping, and technology intelligence. We finally point towards a future research agenda for leveraging multi-source patent embeddings and their appropriateness across applications, as well as for improving and validating patent embeddings by creating domain-expert-curated Semantic Textual Similarity (STS) benchmark datasets.
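A minimal sketch of the hybrid idea in this abstract is shown below: embed patent claims with an SBERT-style encoder, then predict CPC labels with a K-nearest-neighbours classifier. The model name, toy claims, and labels are illustrative assumptions; the paper fine-tunes its own SBERT variant (PatentSBERTa) on patent claims.

```python
# Minimal SBERT-embedding + KNN classification sketch (illustrative only).
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

model = SentenceTransformer("all-MiniLM-L6-v2")  # general-purpose stand-in

claims = [
    "A battery pack comprising a plurality of lithium cells.",
    "A charging circuit configured to limit inrush current.",
    "A pharmaceutical composition comprising an antibody.",
    "An antibody formulation for subcutaneous injection.",
]
cpc_labels = ["H01M", "H02J", "A61K", "A61K"]

X = model.encode(claims)
knn = KNeighborsClassifier(n_neighbors=1).fit(X, cpc_labels)

new_claim = "A rechargeable cell with a solid-state electrolyte."
print(knn.predict(model.encode([new_claim])))  # expected to be close to 'H01M'
```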

Title

A Multi-Task Approach To Neural Multi-Label Hierarchical Patent Classification Using Transformers

Author(s)

Subhash Chandra Pujari, Annemarie Friedrich, Jannik Strötgen

Year of Publishing

2021

Published On

Github

Affiliation

Bosch Center for Artificial Intelligence, Heidelberg University

PDF

Link 

Abstract

With the aim of facilitating internal processes as well as search applications, patent offices categorize documents into taxonomies such as the Cooperative Patent Classification. This task corresponds to a multi-label hierarchical text classification problem. Recent approaches based on pre-trained neural language models have shown promising performance by focusing on leaf-level label prediction. Prior works using intrinsically hierarchical algorithms, which learn a separate classifier for each node in the hierarchy, have also demonstrated their effectiveness despite being based on symbolic feature inventories. However, training one transformer-based classifier per node is computationally infeasible due to memory constraints. In this work, we propose a Transformer-based Multi-task Model (TMM) overcoming this limitation. Using a multi-task setup and sharing a single underlying language model, we train one classifier per node. To the best of our knowledge, our work constitutes the first approach to patent classification combining transformers and hierarchical algorithms. We outperform several non-neural and neural baselines on the WIPO-alpha dataset as well as on a new dataset of 70k patents, which we publish along with this work. Our analysis reveals that our approach achieves much higher recall while keeping precision high. Strong increases in macro-average scores demonstrate that our model also performs much better for infrequent labels. An extended version of the model with additional connections reflecting the label taxonomy results in a further increase in recall, especially at the lower levels of the hierarchy.

Title

PQPS: Prior-Art Query-Based Patent Summarizer Using RBM And Bi-LSTM

Author(s)

Girthana Kumaravel, Swamynathan Sankaranarayanan

Year of Publishing

2021

Published On

Hindawi

Affiliation

Anna University

PDF

Link 

Abstract

A prior-art search on patents ascertains the patentability constraints of the invention through an organized review of prior-art document sources. This search technique poses challenges because of the inherent vocabulary mismatch problem. Manual processing of every retrieved relevant patent in its entirety is a tedious and time-consuming job that demands automated patent summarization for ease of access. This paper employs deep learning models for summarization as they take advantage of the massive dataset present in the patents to improve the summary coherence. This work presents a novel approach of patent summarization named PQPS: prior-art query-based patent summarizer using restricted Boltzmann machine (RBM) and bidirectional long short-term memory (Bi-LSTM) models. The PQPS also addresses the vocabulary mismatch problem through query expansion with knowledge bases such as domain ontology and WordNet. It further enhances the retrieval rate through topic modeling and bibliographic coupling of citations. The experiments analyze various interlinked smart device patent sample sets. The proposed PQPS demonstrates that retrievability increases both in extractive and abstractive summaries.
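The sketch below illustrates query expansion with WordNet, one of the knowledge bases mentioned in this abstract. It requires downloading the WordNet corpus via NLTK; domain-ontology expansion, topic modeling, and the summarization models themselves are not shown.

```python
# Minimal WordNet-based query expansion sketch (illustrative only).
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def expand(term):
    # Collect the term plus all lemma names from its WordNet synsets.
    synonyms = {term}
    for synset in wordnet.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return synonyms

query = ["smart", "device", "display"]
expanded = {t: sorted(expand(t)) for t in query}
print(expanded["display"][:5])  # e.g. adds terms such as "exhibit", "presentation"
```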

Title

Artificial Intelligence For Patent Prior Art Searching

Author(s)

Rossitza Setchi, Irena Spasić, Jeffrey Morgan, Christopher Harrison, Richard Corken

Year of Publishing

2021

Published On

Science Direct

Affiliation

Cardiff University, Intellectual Property Office UK

PDF

Link 

Abstract

This study explored how artificial intelligence (AI) could assist patent examiners as part of the prior art search process. The proof-of-concept allowed experimentation with different AI techniques to suggest search terms, retrieve the most relevant documents, rank them, and visualise their content. The study suggested that AI is less effective in formulating search queries but can reduce the time and cost of sifting through a large number of patents. The study highlighted the importance of the human-in-the-loop approach and the need for better tools for human-centred decision and performance support in prior art searching.

Title

A Survey On Deep Learning For Patent Analysis

Author(s)

Ralf Krestel, Renukswamy Chikkamath, Christoph Hewel, Julian Risch

Year of Publishing

2021

Published On

Research Gate

Affiliation

University of Potsdam, University of Passau, BETTEN & RESCH

PDF

Link 

Abstract

Patent document collections are an immense source of knowledge for research and innovation communities worldwide. The rapid growth of the number of patent documents poses an enormous challenge for retrieving and analyzing information from this source in an effective manner. Based on deep learning methods for natural language processing, novel approaches have been developed in the field of patent analysis. The goal of these approaches is to reduce costs by automating tasks that previously only domain experts could solve. In this article, we provide a comprehensive survey of the application of deep learning for patent analysis. We summarize the state-of-the-art techniques and describe how they are applied to various tasks in the patent domain. In a detailed discussion, we categorize 40 papers based on the dataset, the representation, and the deep learning architecture that were used, as well as the patent analysis task that was targeted. With our survey, we aim to foster future research at the intersection of patent analysis and deep learning and we conclude by listing promising paths for future work.

Title

Patent Sentiment Analysis To Highlight Patent Paragraphs

Author(s)

Renukswamy Chikkamath, Vishvapalsinhji Ramsinh Parmar, Christoph Hewel, Markus Endres

Year of Publishing

2021

Published On

ArXiv

Affiliation

University of Passau, BETTEN & RESCH

PDF

Link 

Abstract

Given a patent document, identifying distinct semantic annotations is an interesting research aspect. Text annotation helps patent practitioners such as examiners and patent attorneys to quickly identify the key arguments of any invention, successively providing a timely marking of a patent text. In manual patent analysis, marking paragraphs to recognise semantic information is common practice for better readability, but this semantic annotation process is laborious and time-consuming. To alleviate this problem, we propose a novel dataset to train machine learning algorithms to automate the highlighting process. The contributions of this work are: i) we developed a multi-class, novel dataset of 150k samples by traversing USPTO patents over a decade, ii) we articulated statistics and distributions of the data using exploratory data analysis, iii) baseline machine learning models are developed that utilize the dataset to address the patent paragraph highlighting task, iv) the dataset and code relating to this task are open-sourced through a dedicated GIT web page: https://github.com/Renuk9390/Patent_Sentiment_Analysis, and v) a future path to extend this work using deep learning and domain-specific pre-trained language models to develop a highlighting tool is provided. This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using the aptitude of machine learning.

Title

Artificial Intelligence Technology Analysis Using Artificial Intelligence Patent Through Deep Learning Model And Vector Space Model

Author(s)

Yongmin Yoo, Dongjin Lim, Kyungsun Kim

Year of Publishing

2021

Published On

ArXiv

Affiliation

NHN Diquest

PDF

Link 

Abstract

Thanks to the rapid development of artificial intelligence technology in recent years, artificial intelligence is now contributing to many parts of society. It has a very large impact on society as a whole, in fields such as education, the environment, medical care, the military, tourism, the economy, and politics. For example, in the field of education, there are artificial intelligence tutoring systems that automatically assign tutors based on a student’s level. In the field of economics, there are quantitative investment methods that automatically analyze large amounts of data to find investment rules, create investment models, or predict changes in financial markets. As such, artificial intelligence technology is being used in various fields. It is therefore very important to know exactly which factors have an important influence on each field of artificial intelligence technology and how the fields are related to each other, so artificial intelligence technology needs to be analyzed in each field. In this paper, we analyze patent documents related to artificial intelligence technology. We propose a method for keyword analysis within factors using artificial intelligence patent datasets for artificial intelligence technology analysis. The model relies on feature engineering based on a deep learning model named KeyBERT and a vector space model. A case study of collecting and analyzing artificial intelligence patent data was conducted to show how the proposed model can be applied to a real-world problem.

Title

Identifying Artificial Intelligence (AI) Invention: A Novel AI Patent Dataset

Author(s)

Alexander V. Giczy, Nicholas A. Pairolero, Andrew Toole

Year of Publishing

2021

Published On

SSRN

Affiliation

United States Patent and Trademark Office

PDF

Link 

Abstract

Artificial Intelligence (AI) is an area of increasing scholarly and policy interest. To help researchers, policymakers, and the public, this paper describes a novel dataset identifying AI in over 13.2 million patents and pre-grant publications (PGPubs). The dataset, called the Artificial Intelligence Patent Dataset (AIPD), was constructed using machine learning models for each of eight AI component technologies covering areas such as natural language processing, AI hardware, and machine learning. The AIPD contains two data files, one identifying the patents and PGPubs predicted to contain AI and a second file containing the patent documents used to train the machine learning classification models. We also present several evaluation metrics based on manual review by patent examiners with focused expertise in AI, and show that our machine learning approach achieves state-of-the-art performance across existing alternatives in the literature. We believe releasing this dataset will strengthen policy formulation, encourage additional empirical work, and provide researchers with a common base for building empirical knowledge on the determinants and impacts of AI invention.

Title

BERT Based Freedom To Operate Patent Analysis

Author(s)

Michael Freunek, André Bodmer

Year of Publishing

2021

Published On

ArXiv

Affiliation

University of Berne

PDF

Link 

Abstract

In this paper we present a method to apply BERT to freedom-to-operate patent analysis and patent searches. According to the method, BERT is fine-tuned by training patent descriptions to the independent claims. Each description represents an invention which is protected by the corresponding claims. Such a trained BERT could be able to identify or rank freedom-to-operate-relevant patents based on a short description of an invention or product. We tested the method by training BERT on the patent class G06T1/00 and applied the trained BERT to five inventions classified in G06T1/60, described via DOCDB abstracts. The DOCDB abstracts are available on ESPACENET of the European Patent Office.

Title

BERT Based Patent Novelty Search By Training Claims To Their Own Description

Author(s)

Michael Freunek, André Bodmer

Year of Publishing

2021

Published On

ArXiv

Affiliation

University of Berne

PDF

Link 

Abstract

In this paper we present a method to concatenate patent claims with their own descriptions. By applying this method, BERT learns suitable descriptions for claims. Such a trained BERT (claim-to-description BERT) could be able to identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and the corresponding descriptions. BERT’s output was processed according to the relevance score and the results compared with the cited X documents in the search reports. The test showed that BERT scored some of the cited X documents as highly relevant.

Title

Prior Art Search And Reranking For Generated Patent Text 

Author(s)

Jieh-Sheng Lee, Jieh Hsiang

Year of Publishing

2021

Published On

ArXiv

Affiliation

National Taiwan University

PDF

Link  

Abstract

Generative models, such as GPT-2, have demonstrated impressive results recently. A fundamental question we would like to address is: where did the generated text come from? This work is our initial effort toward answering the question by using prior art search. The purpose of the prior art search is to find the most similar prior text in the training data of GPT-2. We take a reranking approach and apply it to the patent domain. Specifically, we pre-train GPT-2 models from scratch by using the patent data from the USPTO. The input for the prior art search is the patent text generated by the GPT-2 model. We also pre-trained BERT models from scratch for converting patent text to embeddings. The steps of reranking are: (1) search the most similar text in the training data of GPT-2 by taking a bag-of-words ranking approach (BM25), (2) convert the search results in text format to BERT embeddings, and (3) provide the final result by ranking the BERT embeddings based on their similarities with the patent text generated by GPT-2. The experiments in this work show that such reranking is better than ranking with embeddings alone. However, our mixed results also indicate that calculating the semantic similarities among long text spans is still challenging. To our knowledge, this work is the first to implement a reranking system to identify retrospectively the most similar inputs to a GPT model based on its output.
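The sketch below walks through the three reranking steps described in this abstract: (1) a BM25 shortlist, (2) embedding of the shortlisted texts, and (3) re-ranking by embedding similarity to the generated text. General-purpose open-source models stand in for the authors' from-scratch GPT-2 and BERT patent models, and the toy corpus is an assumption.

```python
# Minimal BM25 shortlist + embedding re-rank sketch (illustrative only).
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

training_corpus = [
    "a turbine blade with internal cooling channels",
    "a neural network accelerator with sparse weights",
    "a cooling channel arrangement for gas turbine blades",
]
generated_text = "turbine blade cooled by serpentine internal passages"

# (1) Bag-of-words shortlist with BM25.
bm25 = BM25Okapi([d.split() for d in training_corpus])
scores = bm25.get_scores(generated_text.split())
shortlist = sorted(range(len(training_corpus)), key=lambda i: -scores[i])[:2]

# (2)-(3) Embed the shortlist and re-rank by cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
gen_emb = encoder.encode(generated_text, convert_to_tensor=True)
doc_emb = encoder.encode([training_corpus[i] for i in shortlist], convert_to_tensor=True)
reranked = sorted(zip(shortlist, util.cos_sim(gen_emb, doc_emb)[0].tolist()),
                  key=lambda x: -x[1])
for i, s in reranked:
    print(f"{s:.3f}  {training_corpus[i]}")
```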

Title

An Empirical Study On Patent Novelty Detection: A Novel Approach Using Machine Learning And Natural Language Processing

Author(s)

Renukswamy Chikkamath, Markus Endres, Lavanya Bayyapu, Christoph Hewel

Year of Publishing

2021

Published On

IEEE

Affiliation

University of Passau, BETTEN & RESCH

PDF

Link 

Abstract

A patent, a form of intellectual property, is often the first consideration when it comes to securing an invention. The legal boundaries it creates become key stages in turning an invention into a commercial product. In recent years, the unprecedented growth of patent applications has posed a great challenge to patent examiners. Novelty detection is one major step considered before and after filing a patent application to assure that claimed inventions are new and non-obvious. It is considered a salient stage of prior art search by patent applicants, patent examiners, patent attorneys, and patent agent professionals. The critical analysis of such a large number of documents has become a challenge in the absence of an optimal, effective, and efficient system. To this end, we present a novel experimental case study to foster these highly recursive and interactive tasks. We developed and investigated more than 50 machine learning models on the considered dataset. The contributions of this work include: (1) outlining and anticipating the importance of novelty detection in the patent domain, (2) developing various baseline models for novelty detection, (3) utilizing the immense contributions of deep learning to NLP to improve the baseline models, (4) assessing the performance of every model using different word embeddings such as word2vec, GloVe, fastText, and domain-specific embeddings, and (5) a novel application of the NBSVM algorithm on our dataset, which proved exceptionally good among our models. We examined the models' training and validation curves to show negligible or no overfitting, in the hope that effective automation in novelty detection helps drive down routine prior art search efforts.

Title

Identifying Valuable Patents: A Deep Learning Approach

Author(s)

Leonidas Aristodemou

Year of Publishing

2020

Published On

University of Cambridge Repository

Affiliation

University of Cambridge

PDF

Link 

Abstract

Big data is increasingly available in all areas of manufacturing, which presents value for enabling a competitive data-driven economy. Increased data availability presents an opportunity to introduce the next generation of innovative technologies. Firms invest in innovation and patents to increase, maintain and sustain competitive advantage. Consequently, the valuation of patents is a key determinant in economic growth since patents are an important innovation indicator. Given the surge in patenting throughout the world, the interest in the value of patents has grown significantly. Traditionally, studies on patent value have focused on limited data availability restricted to a specific technology area using methods such as regression, and mostly using numeric and binary categoric data types. We propose the definition for intellectual property intelligence (IPI) as the data science of analysing large amount of IP information, specifically patent data, with artificial intelligence (AI) methodologies to discover relationships and trends in the data for decision making. With the rise of AI and the ability to analyse larger datasets of patents, we develop an AI deep learning methodology for the valuation of patents. To do that, we build a large USPTO dataset consisting of all granted patents from 1976-2019: (i) we collect, clean, collate and pre-process all the data from the USPTO (and the OECD patent quality indicators database); (ii) we transform the data into numeric, categoric, and text features so that we are able to input them to the deep learning model. More specifically, we transform the text (abstract, claims, summary, title) into feature vectors using our developed Doc2Vec vector space model (VSM), that we assess using the t-distributed stochastic neighbour embedding (t-SNE) visualisation. The dataset is made publicly available for researchers to efficiently and effectively run fairly complex data analysis. We propose an AI deep learning methodology for the valuation of patents to identify valuable patents. Using our developed dataset, we build AI deep learning models, which are based on deep and wide feed-forward artificial neural networks (ANN), with dropout, L2 penalty and batch normalisation regularisation layers, to forecast the value of patents with 12 ex-post patent value output proxies. These include the grant_lag, generality, quality_index_4, and forward citations, generality_index and renewals in three time horizons (t4, t8, t12). We associate these patent value proxies to their respective patent value dimension (economic, strategic and technological). We forecast patent value using ex-ante patent value input determinants, for a wide range of technological areas (using the IPC classes), and time horizon domains (short term in t4, medium term in t8, and long term in t12). We evaluate all our models using a variety of strategies (out-of-time test, out-of-sample test, k-Fold and random split cross validation), and transparently report all metrics (accuracy, confusion matrix, F1-score, false negative rate, log loss, mean absolute error, precision, recall). Our models have higher accuracy and macro average F1-scores, with low values for the training and validation losses compared to prior art. With increasing prediction horizons, we observe an increase in the macro average F1-scores for several of the proxies. 
In addition, we find that the composite index that takes into consideration more than one value dimension, has the combined highest accuracy and macro average F1-score, relative to single value dimension patent proxies. Moreover, we find that firms seem to file widely at the short term time horizon and then focus their technological competencies to established opportunities. Patent owners seem to renew their patents in the fear of losing out. Our study has moved away from relatively small datasets, limited to specific technology field, and allowed for reproducibility in other fields. We can tailor models to different technology area, with different patent value proxies, with different time horizons. This study proposes an AI methodology, which is based on deep learning, using deep and wide feed forward artificial neural networks, to predict the value of patents, which has academic and industrial implications. We predict the value of patents with a variety of output proxies, including composite index proxies, for different technology areas (IPC classifications) and time horizons. Since we use all USPTO granted patents from 1976-2019 to train our models, we can apply this approach to patents in any technology field. Our approach enables researchers and industry professionals to value patents using a variety of patent value proxies, based on different value dimensions, tailored to specific technology areas. The proposed AI deep learning approach could effectively support expert decision making (technology, innovation and IP managers etc.) in their decision making by providing fast, low cost, data-driven intellectual property intelligence (IPI) from big patent data. Firms with limited resources, i.e. small-medium enterprises (SMEs) can choose representative proxies to forecast patent value estimates, saving resources. Consequently, the proposed approach could efficiently support experts in their patent value judgement, policy making in the government’s investments in technological sectors of the future to support the economy, and patent offices with the AI approaches to analyse efficiently and effectively big patent data. We anticipate this research would be interesting for future researchers to expand the emerging field of IPI research and the skills they will need to perform IPI data-driven research with a variety of data sources and AI deep learning ANN approaches.

Title

PatentMatch: A Dataset For Matching Patent Claims & Prior Art

Author(s)

Julian Risch, Nicolas Alder, Christoph Hewel, Ralf Krestel

Year of Publishing

2020

Published On

ArXiv

Affiliation

University of Potsdam, BETTEN & RESCH

PDF

Link 

Abstract

Patent examiners need to solve a complex information retrieval task when they assess the novelty and inventive step of claims made in a patent application. Given a claim, they search for prior art, which comprises all relevant publicly available information. This time-consuming task requires a deep understanding of the respective technical domain and the patent-domain-specific language. For these reasons, we address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch. It contains pairs of claims from patent applications and semantically corresponding text passages of different degrees from cited patent documents. Each pair has been labeled by technically-skilled patent examiners from the European Patent Office. Accordingly, the label indicates the degree of semantic correspondence (matching), i.e., whether the text passage is prejudicial to the novelty of the claimed invention or not. Preliminary experiments using a baseline system show that PatentMatch can indeed be used for training a binary text pair classifier on this challenging information retrieval task. The dataset is available online: https://hpi.de/naumann/s/patentmatch
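To illustrate the kind of baseline text-pair classifier this abstract mentions, the sketch below learns to label (claim, passage) pairs as matching or not. The hand-made pairs and the TF-IDF plus logistic-regression baseline are illustrative assumptions; they are not the PatentMatch baseline system itself.

```python
# Minimal text-pair classification sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pairs = [
    ("a claim about lithium battery cooling", "passage describing battery cooling fins", 1),
    ("a claim about lithium battery cooling", "passage describing a coffee grinder", 0),
    ("a claim for wireless charging coils", "passage on inductive power transfer coils", 1),
    ("a claim for wireless charging coils", "passage about shoe laces", 0),
]
texts = [claim + " [SEP] " + passage for claim, passage, _ in pairs]
labels = [y for _, _, y in pairs]

vec = TfidfVectorizer().fit(texts)
clf = LogisticRegression().fit(vec.transform(texts), labels)

test = "a claim about lithium battery cooling [SEP] passage on cooling plates for cells"
print(clf.predict_proba(vec.transform([test]))[0, 1])  # probability of a "match"
```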

Title

Prior Art Search Using Multi-modal Embedding Of Patent Documents

Author(s)

Myungchul Kang, Suan Lee, Wookey Lee

Year of Publishing

2020

Published On

IEEE

Affiliation

Inha University

PDF

Link 

Abstract

Due to the limitations of existing prior art search methods, a new patent search paradigm can be built on concepts based on precise patent document embedding and real-time feedback. These concepts can be achieved by the following ideas: the latest language model, BERT, can be combined with description-drawing embeddings so that an explorable, user-interactive model can be adopted in the patent domain for “building an artificial intelligent patent search system.” These methodologies, mainly with the help of deep learning, can address the traditionally labor-intensive and time-consuming prior art search.

Title

Patent Prior Art Search Using Deep Learning Language Model

Author(s)

Dylan Myungchul Kang, Charles Cheolgi Lee, Suan Lee, Wookey Lee

Year of Publishing

2020

Published On

ACM

Affiliation

Inha University, VOICE AI Institute

PDF

Link 

Abstract

A patent is one of the essential indicators of new technologies and business processes and has become a main driving force of companies and even national competitiveness; patents have recently been filed and exploited in very large quantities. The number of patent processing personnel, however, can hardly keep up with the increasing number of patents, raising concerns about deteriorating examination quality. In this regard, deep learning’s language processing capabilities have advanced significantly, so that the labor-intensive and expensive prior art search task can also be accomplished by deep learning models. The prior art search requires differentiation among a sheer volume of relevant documents; thus, recall is much more important than precision, which is the primary difference from conventional search engines. This paper presents a method to effectively handle patent documents using BERT, one of the major deep learning-based language models. We show through experiments that our model outperforms conventional approaches and combinations of the key components, with a recall of up to 94.29% on a real patent dataset.
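Because this abstract stresses recall over precision, a small worked example of the metric may help: recall@k measures how many of the truly relevant documents appear anywhere in the top-k results. The document identifiers below are made up for illustration.

```python
# Minimal recall@k sketch (illustrative only).
def recall_at_k(retrieved, relevant, k):
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

retrieved = ["US1", "US7", "US3", "US9", "US2"]   # ranked system output
relevant = ["US3", "US2", "US8"]                  # examiner-cited prior art
print(recall_at_k(retrieved, relevant, k=5))      # 2 of 3 found -> ~0.67
```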

Title

Construction And Evaluation Of Gold Standards For Patent Classification—A Case Study On Quantum Computing

Author(s)

Steve Harris, Anthony Trippe, David Challis, Nigel Swycher

Year of Publishing

2020

Published On

Science Direct

Affiliation

Aistemos Ltd, Patinformatics LLC

PDF

Link 

Abstract

This article discusses options for evaluation of patent and/or patent family classification algorithms by means of “gold standards”. It covers the creation criteria, and desirable attributes of evaluation mechanisms, then proposes an example gold standard, and discusses the results of applying the evaluation mechanism against the proposed gold standard and an existing commercial implementation.

Title

Research On Classification And Similarity Of Patent Citation Based On Deep Learning 

Author(s)

Yonghe Lu, Xin Xiong, Weiting Zhang, Jiaxin Liu, Ruijie Zhao 

Year of Publishing

2020

Published On

Springer

Affiliation

Sun Yat-sen University

PDF

Link 

Abstract

This paper proposes a patent citation classification model based on deep learning and collects patent datasets in the text analysis and communication areas from the Google patent database to evaluate the classification effect of the model. At the same time, considering the technical relevance between examiners’ citations and the pending patent, this paper proposes the hypothesis of taking the output value of the model as the technology similarity of two patents. The rationality of the hypothesis is verified from the perspective of machine statistics and manual spot checks. The experimental results show that the deep-learning-based model proposed in this paper is significantly better than traditional text representation and classification methods, while having higher robustness than methods combining Doc2vec and traditional classification techniques. In addition, we compare the proposed deep-learning-based method with traditional similarity methods through a triple verification, which shows that the proposed method is more accurate in calculating the technology similarity of patents. The results of manual sampling also show that it is reasonable to use the output value of the proposed model to represent the technology similarity of patents.

Title

Using AI To Analyze Patent Claim Indefiniteness 

Author(s)

Dean Alderucci, Kevin Ashley

Year of Publishing

2020

Published On

Indiana University Repository

Affiliation

Carnegie Mellon University, University of Pittsburgh

PDF

Link 

Abstract

We describe how to use artificial intelligence (AI) techniques to partially automate a type of legal analysis, determining whether a patent claim satisfies the definiteness requirement. Although fully automating such a high-level cognitive task is well beyond state-of-the-art AI, we show that AI can nevertheless assist the decision maker in making this determination. Specifically, the use of custom AI technology can aid the decision maker by (1) mining patent text to rapidly bring relevant information to the decision maker’s attention, and (2) suggesting simple inferences that can be drawn from that information.

We begin by summarizing the law related to patent claim indefiniteness. A summary of existing case law allows us to identify the types of information that can be relevant to the legal determination of indefiniteness. This in turn guides us in designing AI software that processes a patent’s text to extract information that can be relevant to the legal analysis of indefiniteness. Some types of relevant information include whether terms in a claim are defined in the patent, whether terms in a claim are not mentioned in the patent’s specification, whether the claim includes nonstandard terms coined by the drafter of the patent, whether the claim relies on vaguely-specified measurements, and whether the patent’s specification discloses structure corresponding to a means-plus-function limitation


The AI software rapidly processes a patent’s text and identifies information that is relevant to the legal analysis. The software then provides the human decision maker with this information as well as simple metrics and inferences, such as the percentage of claim terms that are defined explicitly or by example, and whether terms that are coined by the drafter should be defined or renamed. This can provide the user with insights about a patent much faster than if the user read the entirety of the patent to locate the same information unaided.


Moreover, the software can aggregate the various types of information to “score” a claim (e.g., from 0 to 100) based on its risk of being deemed indefinite. For example, a claim containing only defined terms and lacking any vague measurements would score much lower in terms of risk than a claim with terms that are not only undefined but do not even appear in the patent’s specification. Once each claim in a patent is assigned such an indefiniteness score, the patent itself can be given an overall indefiniteness score.


Scoring groups of patents in this manner has further advantages even if the scores are blunt measurements. AI software ranks a group of patents (e.g., all patents owned by a company) by indefiniteness scores. This allows a very large set of patents to be quickly searched for patents that have the highest, or lowest, indefiniteness score. The results of such a search could be, e.g., the patents to target for detailed review in litigation, post-grant proceedings, or licensing negotiations. Finally, we present some considerations for refining and augmenting the proposed methods for partially automating the indefiniteness analysis, and more broadly other types of legal analysis.

Title

Patent Document Clustering With Deep Embeddings

Author(s)

Jaeyoung Kim, Janghyeok Yoon, Eunjeong Park, Sungchul Choi  

Year of Publishing

2020

Published On

Springer

Affiliation

Gachon University, Konkuk University

PDF

Link 

Abstract

The analysis of scientific and technical documents is crucial in the process of establishing science and technology strategies. One popular method for such analysis is for field experts to manually classify each scientific or technical document into one of several predefined technical categories. However, not only is manual classification error-prone and expensive, but it also requires extended efforts to handle frequent data updates. In contrast, machine learning and text mining techniques enable cheaper and faster operations, and can alleviate the burden on human resources. In this paper, we propose a method for extracting embedded feature vectors by applying a neural embedding approach for text features in patent documents and automatically clustering the embedding features by utilizing a deep embedding clustering method.
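The sketch below illustrates the general idea of clustering patent texts via embeddings. KMeans stands in for the deep embedded clustering method the paper actually uses, and the encoder model name and toy abstracts are assumptions.

```python
# Minimal embedding + clustering sketch (KMeans as a stand-in for DEC).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

abstracts = [
    "battery electrode coating process",
    "lithium cell separator membrane",
    "image sensor pixel readout circuit",
    "cmos image sensor noise reduction",
]
X = SentenceTransformer("all-MiniLM-L6-v2").encode(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(list(zip(abstracts, labels)))  # the two battery and two sensor texts should pair up
```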

Title

Optimizing Neural Networks For Patent Classification

Author(s)

Louay Abdelgawad, Peter Kluegl, Erdan Genc, Stefan Falkner, Frank Hutter

Year of Publishing

2020

Published On

Springer

Affiliation

Konkuk University, Albert-Ludwigs University of Freiburg

PDF

Link 

Abstract

A great number of patents is filed every day with patent offices worldwide. Each of these patents has to be labeled by domain experts with one or many of thousands of categories. This process is not only extremely expensive but also overwhelming for the experts, due to the considerable increase in filed patents over the years and the increasing complexity of the hierarchical categorization structure. Therefore, it is critical to automate the manual classification process using a classification model. In this paper, the automation of the task is carried out based on recent advances in deep learning for NLP and compared to customized approaches. Moreover, an extensive optimization analysis grants insights about hyperparameter importance. Our optimized convolutional neural network achieves a new state-of-the-art performance of 55.02% accuracy on the public WIPO-alpha dataset.

Title

Engineering Knowledge Graph For Keyword Discovery In Patent Search 

Author(s)

Serhad Sarica, Binyang Song, En Low, Jianxi Luo

Year of Publishing

2019

Published On

Cambridge University Press

Affiliation

Singapore University of Technology and Design

PDF

Link 

Abstract

Patent retrieval and analytics have become common tasks in engineering design and innovation. Keyword-based search is the most common method and the core of integrative methods for patent retrieval. Often searchers intuitively choose keywords according to their knowledge on the search interest which may limit the coverage of the retrieval. Although one can identify additional keywords via reading patent texts from prior searches to refine the query terms heuristically, the process is tedious, time-consuming, and prone to human errors. In this paper, we propose a method to automate and augment the heuristic and iterative keyword discovery process. Specifically, we train a semantic engineering knowledge graph on the full patent database using natural language processing and semantic analysis, and use it as the basis to retrieve and rank the keywords contained in the retrieved patents. On this basis, searchers do not need to read patent texts but just select among the recommended keywords to expand their queries. The proposed method improves the completeness of the search keyword set and reduces the human effort for the same task.

Title

A Novelty Detection Patent Mining Approach For Analyzing Technological Opportunities

Author(s)

Juite Wang, Yi-Jing Chen

Year of Publishing

2019

Published On

Science Direct

Affiliation

National Chung Hsing University

PDF

Link 

Abstract

Early opportunity identification is critical for technology-based firms seeking to develop technology or product strategies for competitive advantage in the future. This research develops a patent mining approach based on the novelty detection statistical technique to identify unusual patents that may provide a fresh idea for potential opportunities. A natural language processing technique, latent semantic analysis, is applied to extract hidden relations between words in patent documents for alleviating the vocabulary mismatch problem and reducing the cumbersome efforts of keyword selection by experts. The angle-based outlier detection method, a novelty detection statistical technique, is used to determine outlier patents that are distinct from the majority of collected patent documents in a high-dimensional data space. Finally, visualization tools are developed to analyze the identified outlier patents for exploring potential technological opportunities. The developed methodology is applied in the telehealth industry and research findings can help telehealth firms formulate their technology strategies.
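
A minimal sketch of the pipeline described above follows, with two simplifications: latent semantic analysis is approximated by truncated SVD over TF-IDF vectors, and the angle-based outlier detector is replaced by a crude cosine-distance-to-centroid novelty score. The documents are invented placeholders.

```python
# Sketch: LSA-style features for patents plus a simplified novelty score.
# The paper uses angle-based outlier detection; here novelty is approximated
# as cosine distance to the corpus centroid, which is far cruder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "remote patient monitoring system transmitting vital signs",
    "telehealth video consultation scheduling platform",
    "wearable sensor for continuous heart rate monitoring",
    "blockchain ledger for tracking medical device firmware",  # intended outlier
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
Z = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)

centroid = Z.mean(axis=0, keepdims=True)
novelty = 1.0 - cosine_similarity(Z, centroid).ravel()  # larger = more unusual
for doc, score in sorted(zip(docs, novelty), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```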

Title

TechNet: Technology Semantic Network Based On Patent Data

Author(s)

Serhad Sarica, Jianxi Luo, Kristin L. Wood

Year of Publishing

2019

Published On

ArXiv

Affiliation

Singapore University of Technology and Design

PDF

Link 

Abstract

The growing developments in general semantic networks, knowledge graphs and ontology databases have motivated us to build a large-scale comprehensive semantic network of technology-related data for engineering knowledge discovery, technology search and retrieval, and artificial intelligence for engineering design and innovation. Specially, we constructed a technology semantic network (TechNet) that covers the elemental concepts in all domains of technology and their semantic associations by mining the complete U.S. patent database from 1976. To derive the TechNet, natural language processing techniques were utilized to extract terms from massive patent texts and recent word embedding algorithms were employed to vectorize such terms and establish their semantic relationships. We report and evaluate the TechNet for retrieving terms and their pairwise relevance that is meaningful from a technology and engineering design perspective. The TechNet may serve as an infrastructure to support a wide range of applications, e.g., technical text summaries, search query predictions, relational knowledge discovery, and design ideation support, in the context of engineering and technology, and complement or enrich existing semantic databases. To enable such applications, the TechNet is made public via an online interface and APIs for public users to retrieve technology-related terms and their relevancies.

Title

Improving Chemical Named Entity Recognition In Patents With Contextualized Word Embeddings

Author(s)

Zenan Zhai, Dat Quoc Nguyen, Saber Akhondi, Camilo Thorne, Christian Druckenbrodt, Trevor Cohn, Michelle Gregory, Karin Verspoor

Year of Publishing

2019

Published On

ACL Anthology

Affiliation

The University of Melbourne

PDF

Link 

Abstract

Chemical patents are an important resource for chemical information. However, few chemical Named Entity Recognition (NER) systems have been evaluated on patent documents, due in part to their structural and linguistic complexity. In this paper, we explore the NER performance of a BiLSTM-CRF model utilising pre-trained word embeddings, character-level word representations and contextualized ELMo word representations for chemical patents. We compare word embeddings pre-trained on biomedical and chemical patent corpora. The effect of tokenizers optimized for the chemical domain on NER performance in chemical patents is also explored. The results on two patent corpora show that contextualized word representations generated from ELMo substantially improve chemical NER performance w.r.t. the current state-of-the-art. We also show that domain-specific resources such as word embeddings trained on chemical patents and chemical-specific tokenizers, have a positive impact on NER performance.

Title

Automating The Search For A Patent’s Prior Art With A Full Text Similarity Search

Author(s)

Lea Helmers, Franziska Horn, Franziska Biegler, Tim Oppermann, Klaus-Robert Müller

Year of Publishing

2019

Published On

ArXiv

Affiliation

Technische Universität Berlin, Meinig & Partner, Korea University, Max-Planck-Institut für Informatik

PDF

Link 

Abstract

More than ever, technical inventions are the symbol of our society’s advance. Patents guarantee their creators protection against infringement. For an invention to be patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes similar inventions to a given patent application needs to be performed. Currently, this so-called search for prior art is executed with semi-automatically composed keyword queries, which is not only time consuming, but also prone to errors. In particular, errors may systematically arise from the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed, where the full text of a given patent application is compared to existing patents using machine learning and natural language processing techniques to automatically detect inventions that are similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition to that, the quality of the current search process is assessed based on ratings of a domain expert. The evaluation results show that our automated approach, besides accelerating the search process, also improves the search results for prior art with respect to their quality.
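
The simplest baseline in this family can be sketched as follows: represent the application and candidate patents with TF-IDF vectors and rank candidates by cosine similarity of their full text. The paper evaluates several richer feature-extraction and comparison methods; the patent numbers and texts below are placeholders.

```python
# Minimal full-text prior-art ranking sketch: TF-IDF vectors plus cosine
# similarity between the application text and each prior patent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prior_patents = {
    "US-1": "apparatus for charging a battery of an electric vehicle",
    "US-2": "neural network accelerator with on-chip memory",
    "US-3": "inductive wireless power transfer for vehicle batteries",
}
application = "wireless charging system for an electric vehicle battery"

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(list(prior_patents.values()) + [application])

scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for patent_id, score in sorted(zip(prior_patents, scores), key=lambda p: -p[1]):
    print(f"{patent_id}: {score:.3f}")
```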

Title

Patent Classification By Fine-Tuning BERT Language Model

Author(s)

Jieh-Sheng Lee, Jieh Hsiang

Year of Publishing

2019

Published On

ArXiv

Affiliation

National Taiwan University

PDF

Link 

Abstract

In this work we focus on fine-tuning a pre-trained BERT model and applying it to patent classification. When applied to large datasets of over two million patents, our approach outperforms the previous state of the art, an approach using CNN with word embeddings. In addition, we focus on patent claims without other parts in patent documents. Our contributions include: (1) a new state-of-the-art result based on a pre-trained BERT model and fine-tuning for patent classification, (2) a large dataset USPTO-3M at the CPC subclass level with SQL statements that can be used by future researchers, (3) showing that patent claims alone are sufficient for the classification task, in contrast to conventional wisdom.
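
A minimal fine-tuning sketch with the Hugging Face transformers library is shown below; the checkpoint name, label mapping, and the two toy claims are assumptions for illustration, and the paper’s USPTO-3M setup (CPC subclass labels, millions of claims) is not reproduced.

```python
# Sketch: fine-tune a pre-trained BERT encoder to classify patent claims.
# Checkpoint, labels, and the two toy claims are placeholders only.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

claims = ["A battery comprising a solid electrolyte layer.",
          "A method of rendering graphics on a mobile display."]
labels = [0, 1]  # e.g. 0 = H01M, 1 = G06T (illustrative mapping only)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

class ClaimDataset(Dataset):
    """Wraps tokenized claims and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {key: val[i] for key, val in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-patent-demo", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ClaimDataset(claims, labels),
)
trainer.train()
```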

Title

Automatic Pre-Search: An overview

Author(s)

Dominique Andlauer

Year of Publishing

2018

Published On

Science Direct

Affiliation

European Patent Office

PDF

Link 

Abstract

This paper describes the evolution of the EPO’s search tools over time and their envisaged revolution towards supporting a more or even fully automated search process. Regardless of whether the goal of fully automated search is achieved completely or only partially, the chosen approach will in any case bring about major improvements for both the EPO’s examiners and the prior art search community at large.

Title

De-Noising Documents With A Novelty Detection Method Utilizing Class Vectors

Author(s)

Lee Younghoon, Cho Sungzoon, Choi Jinhae

Year of Publishing

2018

Published On

IOS Press

Affiliation

Seoul National University

PDF

Link 

Abstract

The classification of customer-voice data is an important matter in real business since it is necessary for customer-voice data to be delivered to relevant departments and responsible individuals. Additionally, customer-voice data typically includes several novel words, such as typos, informal terms, or exceedingly general words, to discriminate between categories of customer-voice data. Furthermore, noisy data often has a negative effect on the classification task. In this study, an advanced novelty detection method is proposed that utilizes a class vector with high cosine similarity to words in order to effectively discriminate between classes. The class vector is considered as the centroid or the mean of each word vector distribution as derived from the neural embedding model, and the novelty score of each word is calculated and novel words are effectively detected. Each novelty score is calculated by improvements of GMM and KMC methods utilizing a class vector. The experiments verify the propriety of the proposed method with qualitative observations, and the application of the proposed method with quantitative experiments verifies the representational effectiveness and classification performance of customer-voice data. The experiment results indicate that the performance of customer-voice data classification improved with the application of the newly proposed novelty detection method.

Title

DeepPatent: Patent Classification With Convolutional Neural Networks And Word Embedding

Author(s)

Shaobo Li, Jie Hu, Yuxin Cui, Jianjun Hu 

Year of Publishing

2018

Published On

Springer

Affiliation

Guizhou University, University of South Carolina

PDF

Link 

Abstract

Patent classification is an essential task in patent information management and patent knowledge mining. However, this task is still largely done manually due to the unsatisfactory performance of current algorithms. Recently, deep learning methods such as convolutional neural networks (CNN) have led to great progress in image processing, voice recognition, and speech recognition, which has yet to be applied to patent classification. We proposed DeepPatent, a deep learning algorithm for patent classification based on CNN and word vector embedding. We evaluated the algorithm on the standard patent classification benchmark dataset CLEF-IP and compared it with other algorithms in the CLEF-IP competition. Experiments showed that DeepPatent with automatic feature extraction achieved a classification precision of 83.98%, which outperformed all the existing algorithms that used the same information for training. Its performance is better than the state-of-the-art patent classifier with a precision of 83.50%, whose performance is, however, based on 4000 characters from the description section and a lot of feature engineering while DeepPatent only used the title and abstract information. DeepPatent is further tested on USPTO-2M, a patent classification benchmark data set that we contributed with 2,000,147 records after data cleaning of 2,679,443 USA raw utility patent documents in 637 categories at the subclass level. Our algorithms achieved a precision of 73.88%.
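
The sketch below shows the general shape of such a classifier in Keras (an embedding layer feeding a 1D convolution, pooling, and a softmax over classes); the vocabulary size, sequence length, class count, and the random stand-in data are illustrative assumptions, not DeepPatent’s actual configuration.

```python
# Minimal text-CNN sketch in the spirit of CNN + word-embedding classifiers.
# Hyperparameters and the random stand-in data are illustrative only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, num_classes = 5000, 100, 10  # placeholder sizes

model = keras.Sequential([
    layers.Embedding(vocab_size, 128),          # learned word embeddings
    layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random token IDs and labels stand in for tokenized titles/abstracts.
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, num_classes, size=(32,))
model.fit(X, y, epochs=1, batch_size=8)
```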

Title

A Hierarchical Feature Extraction Model For Multi-Label Mechanical Patent Classification

Author(s)

Jie Hu, Shaobo Li, Jianjun Hu, Guanci Yang

Year of Publishing

2018

Published On

MDPI

Affiliation

Guizhou University, University of South Carolina

PDF

Link 

Abstract

Various studies have focused on feature extraction methods for automatic patent classification in recent years. However, most of these approaches are based on the knowledge from experts in related domains. Here we propose a hierarchical feature extraction model (HFEM) for multi-label mechanical patent classification, which is able to capture both local features of phrases as well as global and temporal semantics. First, a n-gram feature extractor based on convolutional neural networks (CNNs) is designed to extract salient local lexical-level features. Next, a long dependency feature extraction model based on the bidirectional long–short-term memory (BiLSTM) neural network model is proposed to capture sequential correlations from higher-level sequence representations. Then the HFEM algorithm and its hierarchical feature extraction architecture are detailed. We establish the training, validation and test datasets, containing 72,532, 18,133, and 2679 mechanical patent documents, respectively, and then check the performance of HFEMs. Finally, we compared the results of the proposed HFEM and three other single neural network models, namely CNN, long–short-term memory (LSTM), and BiLSTM. The experimental results indicate that our proposed HFEM outperforms the other compared models in both precision and recall.

Title

Automated Patent Landscaping

Author(s)

Aaron Abood, Dave Feltenberger   

Year of Publishing

2018

Published On

Springer

Affiliation

Google Inc

PDF

Link 

Abstract

Patent landscaping is the process of finding patents related to a particular topic. It is important for companies, investors, governments, and academics seeking to gauge innovation and assess risk. However, there is no broadly recognized best approach to landscaping. Frequently, patent landscaping is a bespoke human-driven process that relies heavily on complex queries over bibliographic patent databases. In this paper, we present Automated Patent Landscaping, an approach that jointly leverages human domain expertise, heuristics based on patent metadata, and machine learning to generate high-quality patent landscapes with minimal effort. In particular, this paper describes a flexible automated methodology to construct a patent landscape for a topic based on an initial seed set of patents. This approach takes human-selected seed patents that are representative of a topic, such as operating systems, and uses structure inherent in patent data such as references and class codes to “expand” the seed set to a set of “probably-related” patents and anti-seed “probably-unrelated” patents. The expanded set of patents is then pruned with a semi-supervised machine learning model trained on seed and anti-seed patents. This removes patents from the expanded set that are unrelated to the topic and ensures a comprehensive and accurate landscape.
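
The expand-then-prune idea can be sketched roughly as follows: grow the seed set through a toy citation graph, then prune the expansion with a text classifier trained on seed versus anti-seed examples. The actual pipeline also uses CPC codes and a semi-supervised neural model; every identifier and text below is invented.

```python
# Rough expand-then-prune sketch: expand seed patents via a toy citation graph,
# then prune candidates with a classifier trained on seed vs. anti-seed text.
# All patent IDs, texts, and the citation graph are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

citations = {"P1": ["P3", "P4"], "P2": ["P4", "P5"]}  # patent -> cited patents
texts = {
    "P1": "operating system process scheduling",
    "P2": "kernel memory management for operating systems",
    "P3": "thread scheduler with priority queues",
    "P4": "file system journaling method",
    "P5": "cosmetic composition for skin care",  # clearly off-topic
}
seed, anti_seed = ["P1", "P2"], ["P5"]

# Expand: every patent cited by a seed patent becomes a candidate.
candidates = {cited for p in seed for cited in citations.get(p, []) if cited not in seed}

# Prune: keep only candidates the seed/anti-seed classifier scores as related.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform([texts[p] for p in seed + anti_seed])
classifier = LogisticRegression().fit(X_train, [1] * len(seed) + [0] * len(anti_seed))

for p in sorted(candidates):
    prob = classifier.predict_proba(vectorizer.transform([texts[p]]))[0, 1]
    print(p, "keep" if prob > 0.5 else "drop", round(prob, 2))
```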

Title

User Interface For Managing And Refining Related Patent Terms

Author(s)

Girish Showkatramani, Arthi Krishna, Ye Jin, Aaron Pepe, Naresh Nula, Greg Gabel

Year of Publishing

2018

Published On

Springer 

Affiliation

United States Patent and Trademark Office

PDF

Link 

Abstract

One of the crucial aspects of the patent examination process is assessing the patentability of an invention by performing extensive keyword-based searches to identify related existing inventions (or lack thereof). The expertise of identifying the most effective keywords is a critical skill and time-intensive step in the examination process. Recently, word embedding techniques have demonstrated value in identifying related words. In word embedding, the vector representation of an individual word is computed based on its context, and so words with similar meaning exhibit similar vector representation. Using a number of alternate data sources and word embedding techniques we are able to generate a variety of word embedding models. For example, we initially clustered patent data based on the different areas of interests such as Computer Architecture or Biology, and used this data to train Word2Vec and fastText models. Even though the generated word embedding models were reliable and scalable, none of the models by itself was sophisticated enough to match an expert’s choice of keywords.

In this study, we have developed a user interface that allows domain experts to quickly evaluate several word embedding models and curate a more sophisticated set of related patent terms by combining results from several models or, in some cases, even augmenting them by hand. Our application thereby seeks to provide a functional and usable centralized interface for searching and identifying related terms in the patent domain.

Title

Supervised Approaches To Assign Cooperative Patent Classification (CPC) Codes To Patents

Author(s)

Tung Tran, Ramakanth Kavuluru

Year of Publishing

2017

Published On

NETLAB

Affiliation

University of Kentucky

PDF

Link 

Abstract

This paper re-introduces the problem of patent classification with respect to the new Cooperative Patent Classification (CPC) system. CPC has replaced the U.S. Patent Classification (USPC) coding system as the official patent classification system in 2013. We frame patent classification as a multi-label text classification problem in which the prediction for a test document is a set of labels and success is measured based on the micro-F1 measure. We propose a supervised classification system that exploits the hierarchical taxonomy of CPC as well as the citation records of a test patent; we also propose various label ranking and cut-off (calibration) methods as part of the system pipeline. To evaluate the system, we conducted experiments on U.S. patents released in 2010 and 2011 for over 600 labels that correspond to the “subclasses” at the third level in the CPC hierarchy. The best variant of our model achieves ≈ 70% in micro-F1 score and the results are statistically significant. To the best of our knowledge, this is the first effort to reinitiate the automated patent classification task under the new CPC coding scheme. 
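
Since each patent can carry several CPC labels and the paper scores systems with micro-averaged F1, here is a small sketch of that evaluation with scikit-learn; the subclass labels and predictions are invented solely to show the computation.

```python
# Multi-label evaluation sketch: binarize CPC label sets and compute micro-F1.
# The label sets and predictions below are invented for illustration.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

true_labels = [{"H01M", "H02J"}, {"G06F"}, {"G06F", "G06N"}]
pred_labels = [{"H01M"},         {"G06F"}, {"G06N", "H04L"}]

mlb = MultiLabelBinarizer()
mlb.fit(true_labels + pred_labels)        # register every label before binarizing
Y_true = mlb.transform(true_labels)
Y_pred = mlb.transform(pred_labels)

print("micro-F1:", round(f1_score(Y_true, Y_pred, average="micro"), 3))
```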

Title

Patents Images Retrieval And Convolutional Neural Network Training Dataset Quality Improvement

Author(s)

Alla Kravets, Nikita Lebedev, Maxim Legenchenko 

Year of Publishing

2017

Published On

Atlantis Press

Affiliation

Volgograd State Technical University

PDF

Link 

Abstract

The paper considers the problem of analyzing patents’ figures to formalize the subjective opinions of the patent office experts who review applications for inventions. The omission of drawings may indicate an incomplete description of the invention and entail the rejection of the patent application, among other problems. Patent images, even those of the same type and class, are unique and differ from each other. Neural networks with different architectures are now applied to such image processing tasks; plain neural networks, convolutional neural networks, and Siamese neural networks were considered in the research, and four libraries (Theano, TensorFlow, Caffe, and Keras) were studied. The main contributions of the paper are a new classification of patents’ images, an approach to training dataset formation and quality improvement, and the software implementation for CNN training.

Title

An Initial Study Of Anchor Selection In Patent Link Discovery

Author(s)

Dilesha Seneviratne, Shlomo Geva, Guido Zuccon, Andrew Trotman

Year of Publishing

2017

Published On

University of Otago

Affiliation

Queensland University of Technology, University of Otago

PDF

Link  

Abstract

Patents are a source of technical knowledge, but often difficult to understand. Technological solutions that would help understand the knowledge expressed in patents can assist the creation of new knowledge, and inventions. This paper explores anchor text selection for linking patents to external knowledge sources such as web pages and prior patents. While link discovery has been investigated in other domains, e.g., Wikipedia and the medical domain, the application of linking patents has received little attention and it presents some unique challenges as this paper shows. The paper contributes: (1) a test collection investigating the identification of anchor text (entities) in patent link discovery, (2) a user experiment studying the selection of anchors by users, and (3) an evaluation of four popular unsupervised keyword ranking methods (TFIDF, BM25, Keyphraseness, Termex) to identify potential anchors to link.

Title

A Literature Review On Patent Information Retrieval Techniques

Author(s)

Alok Khode, Sagar Jambhorkar

Year of Publishing

2017

Published On

Indian Journal of Science and Technology

Affiliation

Symbiosis International University, National Defense Academy

PDF

Link 

Abstract

Patents are critical intellectual assets for any competitive business. They can prove to be a gold mine if retrieved, analyzed and utilized appropriately. Patentability search is an important step in the patent process and missing out any relevant patent may cause expensive legal consequences. As worldwide patent collection is growing rapidly, retrieval of this enormous knowledge source has become complex and exhaustive. This paper attempts to review the studies carried out in enhancing the relevance effectiveness of patent information retrieval. Method/Analysis: Literature review presents various research works that have been carried out to yield better results in patent retrieval task by refining existing information retrieval techniques or by using standard approaches at the various stages of the patent retrieval task. This work exclusively looks at literatures dealing with retrieval of patent text. Findings: Patent retrieval is not a completely solved research domain and general information retrieval approaches do not prove effective in this domain as patents are special documents posing various retrieval challenges. The review also highlights future research directions and will help researchers working in the domain of patent retrieval. Application/Improvement: Considering the various techniques and frameworks available and their limitations, there is a lot of scope in the field of patent retrieval techniques which makes room for further research to be taken up in this domain.

Title

Examiner Assisted Automated Patents Search

Author(s)

Arthi Krishna, Brian Feldman, Joseph Wolf, Greg Gabel, Scott Beliveau, Thomas Beach

Year of Publishing

2016

Published On

AAAI

Affiliation

United States Patent And Trademark Office

PDF

Link 

Abstract

One of the most crucial and knowledge-intensive steps of patent examination is the determination of prior art: evidence that the idea claimed by a patent is already known. Automated prior art retrieval algorithms, if effective, can assist expert examiners by identifying literature that would otherwise take substantial research to uncover. Our approach is to build a patent search algorithm which functions as a cognitive assistant to the patent searcher. Contrary to the approach of treating the search algorithm as a black box, all components of the search algorithm are explained, and these components expose controls that can be adjusted by the user. This level of transparency and interactivity not only enables the experts to get the best use of the tool, but is also crucial in gaining the trust of the users. In this paper we discuss the engineering of the cognitive assistant search tool, referred to as Sigma, and the various interactions it affords the users. The tool is currently being piloted with patent examiners in unit 2427.

Title

User Interface For Customizing Patents Search: An Exploratory Study

Author(s)

Arthi Krishna, Brian Feldman, Joseph Wolf, Greg Gabel, Scott Beliveau, Thomas Beach

Year of Publishing

2016

Published On

Springer

Affiliation

United States Patent and Trademark Office

PDF

Link 

Abstract

Prior art searching is a critical and knowledge-intensive step in the examination process of a patent application. Historically, the approach to automated prior art searching is to determine a few keywords from the patent application and, based on simple text frequency matching of these keywords, retrieve published applications and patents. Several emerging techniques show promise to increase the accuracy of automated searching, including analysis of: named entity extraction, explanations of how patents are classified, relationships between references cited by the examiner, weighing words found in some sections of the patent application differently than others, and lastly using the examiners’ domain knowledge such as synonyms. These techniques are explored in this study. Our approach is, firstly, to design a user interface that leverages the above-mentioned processing techniques for the user and, secondly, to provide visual cues that can guide examiners in fine-tuning search algorithms. The user interface displays a number of controls that affect the behavior of the underlying search algorithm: a tag cloud of the top keywords used to retrieve patents, sliders for weights on the different sections of a patent application (e.g., abstract, claims, title or specification), and a list of synonyms and stop-words. Users are provided with visual icons that give a quick indication of the quality of the results, such as whether the results share a feature with the patent-at-issue, for example citing the same reference or having a common classification. This exploratory study shows results of seven variations of the search algorithm on a test corpus of 100500 patent documents.
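
The section-weighting idea behind those sliders can be sketched as follows: compute a similarity per section of the application and combine the scores with user-chosen weights. The texts and weights are placeholders, and the real system exposes many more controls (synonyms, stop-words, tag clouds).

```python
# Sketch of section-weighted scoring: per-section TF-IDF similarity combined
# with user-adjustable weights, mimicking the sliders described in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

application = {
    "title": "wireless battery charging for vehicles",
    "abstract": "an inductive charging pad transfers power to a vehicle battery",
    "claims": "a charging pad comprising a transmitter coil and a controller",
}
candidate = "inductive power transfer apparatus with a transmitter coil"
weights = {"title": 0.2, "abstract": 0.3, "claims": 0.5}  # user-set sliders

vectorizer = TfidfVectorizer(stop_words="english")
vectorizer.fit(list(application.values()) + [candidate])
candidate_vec = vectorizer.transform([candidate])

score = sum(
    w * cosine_similarity(vectorizer.transform([application[s]]), candidate_vec)[0, 0]
    for s, w in weights.items()
)
print(f"weighted similarity: {score:.3f}")
```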

Title

On Term Selection Techniques For Patent Prior Art Search

Author(s)

Mona Golestan Far

Year of Publishing

2016

Published On

Australian National University

Affiliation

The Australian National University

PDF

Link 

Abstract

A patent is a set of exclusive rights granted to an inventor to protect his invention for a limited period of time. Patent prior art search involves finding previously granted patents, scientific articles, product descriptions, or any other published work that may be relevant to a new patent application. Many well-known information retrieval (IR) techniques (e.g., typical query expansion methods), which are proven effective for ad hoc search, are unsuccessful for patent prior art search. In this thesis, we mainly investigate the reasons that generic IR techniques are not effective for prior art search on the CLEF-IP test collection. First, we analyse the errors caused due to data curation and experimental settings like applying International Patent Classification codes assigned to the patent topics to filter the search results. Then, we investigate the influence of term selection on retrieval performance on the CLEF-IP prior art test collection, starting with the description section of the reference patent and using language models (LM) and BM25 scoring functions. We find that an oracular relevance feedback system, which extracts terms from the judged relevant documents far outperforms the baseline (i.e., 0.11 vs. 0.48) and performs twice as well on mean average precision (MAP) as the best participant in CLEF-IP 2010 (i.e., 0.22 vs. 0.48). We find a very clear term selection value threshold for use when choosing terms. We also notice that most of the useful feedback terms are actually present in the original query and hypothesise that the baseline system can be substantially improved by removing negative query terms. We try four simple automated approaches to identify negative terms for query reduction but we are unable to improve on the baseline performance with any of them. However, we show that a simple, minimal feedback interactive approach, where terms are selected from only the first retrieved relevant document outperforms the best result from CLEF-IP 2010, suggesting the promise of interactive methods for term selection in patent prior art search.

Title

A Study Of Query Reformulation For Patent Prior Art Search With Partial Patent Applications

Author(s)

Mohamed Reda Bouadjenek, Scott Sanner, Gabriela Ferraro

Year of Publishing

2015

Published On

HAL 

Affiliation

University of Montpellier, Oregon State University

PDF

Link 

Abstract

Patents are used by legal entities to legally protect their inventions and represent a multi-billion dollar industry of licensing and litigation. In 2014, 326,033 patent applications were approved in the US alone – a number that has doubled in the past 15 years and which makes prior art search a daunting, but necessary task in the patent application process. In this work, we seek to investigate the efficacy of prior art search strategies from the perspective of the inventor who wishes to assess the patentability of their ideas prior to writing a full application. While much of the literature inspired by the evaluation framework of the CLEF-IP competition has aimed to assist patent examiners in assessing prior art for complete patent applications, less of this work has focused on patent search with queries representing partial applications. In the (partial) patent search setting, a query is often much longer than in other standard IR tasks, e.g., the description section may contain hundreds or even thousands of words. While the length of such queries may suggest query reduction strategies to remove irrelevant terms, intentional obfuscation and general language used in patents suggests that it may help to expand queries with additionally relevant terms. To assess the trade-offs among all of these pre-application prior art search strategies, we comparatively evaluate a variety of partial application search and query reformulation methods. Among numerous findings, querying with a full description, perhaps in conjunction with generic (non-patent specific) query reduction methods, is recommended for best performance. However, we also find that querying with an abstract represents the best trade-off in terms of writing effort vs. retrieval efficacy (i.e., querying with the description sections only leads to marginal improvements) and that for such relatively short queries, generic query expansion methods help.
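
One of the query-reduction strategies discussed above can be sketched very simply: keep only the top-weighted TF-IDF terms of a long description and use them as the query. The description, background corpus, and five-term cut-off are illustrative assumptions, not the paper’s exact method.

```python
# Query-reduction sketch: keep the highest-weighted TF-IDF terms of a long
# description as a short query. Texts and the five-term cut-off are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

description = (
    "The invention relates to a wireless charging system for an electric "
    "vehicle in which an inductive charging pad transfers power to a battery "
    "pack through a transmitter coil aligned with a receiver coil mounted on "
    "the underside of the vehicle chassis."
)
background_corpus = [
    "battery thermal management system for electric vehicles",
    "display driver circuit for mobile devices",
    "inductive power transfer with coil alignment detection",
]

# Fit IDF statistics on a small background corpus plus the description itself.
vectorizer = TfidfVectorizer(stop_words="english").fit(background_corpus + [description])
weights = vectorizer.transform([description]).toarray().ravel()
terms = np.array(vectorizer.get_feature_names_out())

reduced_query = " ".join(terms[weights.argsort()[::-1][:5]])
print("reduced query:", reduced_query)
```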

Title

Novelty-Focused Patent Mapping For Technology Opportunity Analysis

Author(s)

Changyong Lee, Bokyoung Kang, Juneseuk Shin

Year of Publishing

2014

Published On

Science Direct

Affiliation

Ulsan National Institute of Science and Technology, Seoul National University, Sungkyunkwan University

PDF

Link 

Abstract

Patent maps are an effective means of discovering potential technology opportunities. However, this method has been of limited use in practice since defining and interpreting patent vacancies, as surrogates for potential technology opportunities, tend to be intuitive and ambiguous. As a remedy, we propose an approach to detecting novel patents based on systematic processes and quantitative outcomes. At the heart of the proposed approach is the text mining to extract the patterns of word usage and the local outlier factor to measure the degree of novelty in a numerical scale. The meanings of potential technology opportunities become more explicit by identifying novel patents rather than patent vacancies that are usually represented as a simple set of keywords. Finally, a novelty-focused patent identification map is developed to explore the implications on novel patents. A case study of the patents about thermal management technology of light emitting diode (LED) is exemplified. We believe the proposed approach could be employed in various research areas, serving as a starting point for developing more general models.
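
Since the local outlier factor is the core of the approach, a small sketch of flagging unusual patents over TF-IDF keyword vectors with scikit-learn’s LocalOutlierFactor is given below; the toy corpus merely stands in for the LED thermal-management patents of the case study.

```python
# Sketch: flag "novel" patents as local outliers among TF-IDF keyword vectors
# using the local outlier factor. The documents are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import LocalOutlierFactor

docs = [
    "heat sink for light emitting diode package",
    "led module with thermally conductive substrate",
    "heat dissipation structure for led lamp",
    "led driver circuit with acoustic feedback control",  # unusual combination
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()
lof = LocalOutlierFactor(n_neighbors=2)
flags = lof.fit_predict(X)               # -1 marks a local outlier
scores = -lof.negative_outlier_factor_   # larger = more novel

for doc, flag, score in zip(docs, flags, scores):
    label = "OUTLIER" if flag == -1 else "normal"
    print(f"{label:7s} {score:.2f}  {doc}")
```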

Title

A Survey Of Automated Hierarchical Classification Of Patents

Author(s)

Juan Carlos Gomez, Marie-Francine Moens

Year of Publishing

2014

Published On

KU Leuven

Affiliation

KU Leuven

PDF

Link 

Abstract

In this era of “big data”, hundreds or even thousands of patent applications arrive every day to patent offices around the world. One of the first tasks of the professional analysts in patent offices is to assign classification codes to those patents based on their content. Such classification codes are usually organized in hierarchical structures of concepts. Traditionally the classification task has been done manually by professional experts. However, given the large amount of documents, the patent professionals are becoming overwhelmed. If we add that the hierarchical structures of classification are very complex (containing thousands of categories), reliable, fast and scalable methods and algorithms are needed to help the experts in patent classification tasks. This chapter describes, analyzes and reviews systems that, based on the textual content of patents, automatically classify such patents into a hierarchy of categories. This chapter focuses specially in the patent classification task applied for the International Patent Classification (IPC) hierarchy. The IPC is the most used classification structure to organize patents, it is world-wide recognized, and several other structures use or are based on it to ensure office inter-operability.

Title

A Hybrid Patent Prior Art Retrieval Approach Using Claim Structure And Description

Author(s)

Fu-Ren Lin, Ke-Ren Chen, Szu-Yin Lin

Year of Publishing

2013

Published On

Springer

Affiliation

National Tsing Hua University, Chung Yuan Christian University

PDF

Link 

Abstract

In the highly competitive business environment, companies use patents as intellectual assets to gain strategic competitiveness. Patent prior art retrieval is a nontrivial task for invalidity and patentability search, which could help enterprises plan their R&D strategies and patent portfolios, and avoid patent infringement issues in the future. This study adopts an efficient and effective hybrid patent prior art retrieval approach using claim structure and patent description to enhance prior art retrieval performance in terms of recall rate, and examines its robustness through experiments on a large dataset. We obtained the best result by combining the information of claim structure and the top 70% of sentences in the description. We achieved competitive results in terms of raising the recall rate with the proposed hybrid approach, which also demonstrates the usefulness of including claim structure in a patent prior art retrieval system.

Title

Identifying Technological Opportunities Using The Novelty Detection Technique: A Case Of Laser Technology In Semiconductor Manufacturing

Author(s)

Youngjung Geum, Jeonghwan Jeon, Hyeonju Seol

Year of Publishing

2013

Published On

Taylor & Francis Online

Affiliation

Seoul National University, Korea Air Force Academy

PDF

Link 

Abstract

While identification of technological opportunities has received considerable attention, previous studies have some weaknesses in terms of subjectivity when finding the opportunities in practical terms. This paper proposes a systematic framework to identify technological opportunities, focusing on objective evidences which are specific and practical to be used in a business environment. To do this, we used patents as a source and employed a novelty detection technique whose primary object is detecting the novel pattern. To begin with, the patents are collected from the United States Patent and Trademark Office (USPTO) database. These patents are then pre-processed into a structured keyword vector that can represent the characteristics of each patent. These keyword vectors are then used to analyse the new and emerging pattern, using the novelty detection technique. As the final step, the results are analysed to identify the technological opportunities. A case study on laser technology in lithography is presented to show the proposed framework.

Title

An Ontology-Based Automatic Semantic Annotation Approach For Patent Document Retrieval In Product Innovation Design

Author(s)

Feng Wang, Lan Fen Lin, Zhou Yang

Year of Publishing

2013

Published On

Scientific

Affiliation

Zhejiang University

PDF

Link 

Abstract

Patent retrieval plays a very important role in product innovation design. However, current patent retrieval approaches lack semantic comprehension and association, and usually cannot capture the implicit useful knowledge at a semantic level. In order to improve on traditional patent search, this paper proposes a novel ontology-based automatic semantic annotation approach based on a thorough analysis of patent documents, which combines both structure and content characteristics, and integrates multiple techniques from various aspects. A multilayer semantic model is established to realize unified semantic representation. The approach first utilizes template schemes to extract the structure information from patent documents, then identifies semantics of entities and relations between entities from the content based on natural language processing techniques and domain knowledge, and at last employs a heuristic pattern learning method to abstract patent technical features. A case study is provided to show that our approach can acquire multi-level patent semantic knowledge from multiple perspectives, and discover semantic correlations between patent documents, which can further promote accurate patent semantic retrieval effectively.

Title

Recommending Patents Based On Latent Topics

Author(s)

Ralf Krestel, Padhraic Smyth

Year of Publishing

2013

Published On

Research Gate

Affiliation

Leibniz Information Centre for Economics

PDF

Link 

Abstract

The availability of large volumes of granted patents and applications, all publicly available on the Web, enables the use of sophisticated text mining and information retrieval methods to facilitate access and analysis of patents. In this paper we investigate techniques to automatically recommend patents given a query patent. This task is critical for a variety of patent-related analysis problems such as finding relevant citations, research of relevant prior art, and infringement analysis. We investigate the use of latent Dirichlet allocation and Dirichlet multinomial regression to represent patent documents and to compute similarity scores. We compare our methods with state-of-the-art document representations and retrieval techniques and demonstrate the effectiveness of our approach on a collection of US patent publications.
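
To illustrate the topic-based recommendation idea, the sketch below represents each toy patent by its LDA topic distribution and ranks the others by cosine similarity to a query patent; the Dirichlet multinomial regression variant and the US patent collection used in the paper are not modeled.

```python
# Sketch: LDA topic distributions as patent representations, with cosine
# similarity between distributions used to recommend related patents.
# The tokenized texts are toy placeholders.
from gensim import corpora, matutils, models

texts = [
    "battery electrode lithium anode cathode".split(),
    "lithium cell electrolyte separator electrode".split(),
    "image sensor pixel array readout circuit".split(),
    "camera sensor lens autofocus module".split(),
]

dictionary = corpora.Dictionary(texts)
bows = [dictionary.doc2bow(tokens) for tokens in texts]
lda = models.LdaModel(bows, num_topics=2, id2word=dictionary,
                      passes=50, random_state=0)

topic_vecs = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in bows]
query = 0  # recommend patents similar to the first document
similarities = [(i, matutils.cossim(topic_vecs[query], vec))
                for i, vec in enumerate(topic_vecs) if i != query]
for i, sim in sorted(similarities, key=lambda pair: -pair[1]):
    print(f"doc {i}: {sim:.3f}")
```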

Title

A New Instrument For Technology Monitoring: Novelty In Patents Measured By Semantic Patent Analysis

Author(s)

Jan M. Gerken, Martin G. Moehrle

Year of Publishing

2012

Published On

Springer

Affiliation

University of Bremen

PDF

Link 

Abstract

Given that technologically novel inventions are crucial for companies, this article contributes to the identification of inventions of high novelty in patent data. As companies are confronted with an information overflow, and having patents reviewed by experts is a time-consuming task, we introduce a new approach to the identification of inventions of high novelty: a specific form of semantic patent analysis. Subsequent to the introduction of the concept of novelty in patents, the classical method of semantic patent analysis will be adapted to support novelty measurement. By means of a case study from the automotive industry, we corroborate that semantic patent analysis is able to outperform available methods for the identification of inventions of high novelty. Accordingly, semantic patent information possesses the potential to enhance technology monitoring while reducing both costs and uncertainty in the identification of inventions of high novelty.

Title

Automatic IPC Encoding And Novelty Tracking For Effective Patent Mining

Author(s)

Douglas Teodoro, Emilie Pasche, Dina Vishnyakova, Christian Lovis, Julien Gobeill, Patrick Ruch

Year of Publishing

2011

Published On

BiTeM

Affiliation

University of Geneva, University of Applied Sciences

PDF

Link 

Abstract

Accurate classification of patent documents according to the IPC system is vital for the interoperability between different patent offices and for the prior art search task involved in a patent application procedure. It is essential for companies and governments to track changes in technology in order to assess their investments and create new branches of novel solutions. In this paper, we present our experiments from the NTCIR-8 challenge to automate paper abstract classification into the IPC taxonomy and to create a technical trend map from it. We apply the k-NN algorithm in the classification process and manipulate the rank of the nearest neighbours to enhance our results. The technical trend map is created by detecting technologies and their effects passages in paper and patent abstracts. A CRF-based system enriched with handcrafted rules is used to detect technology, effect, attribute and value phrases in the abstracts. Our experiments use multiple patent databases for training the system and paper abstracts as well as patent applications for testing purposes, thus characterising a cross-database and cross-genre task. In the subtask of Research Papers Classification, we achieve a MAP of 0.68, 0.50 and 0.30 for the English and 0.71, 0.50 and 0.30 for the J2E subclass, main group and subgroup classifiers respectively. In the Technical Trend Map Creation subtask, we achieve an F-score of 0.138 when detecting technology/effect elements in patent abstracts and 0.141 in paper abstracts. Our methodology provides competitive results for the state of the art, with the majority of our official runs being ranked within the top two for both trend map (papers) and IPC coding. That said, we see room for improvement, especially in the detection of technology and attribute elements in abstracts. Finally, we believe that the subtask of Technical Trend Map Creation needs to be adjusted in order to better produce a patent map. The classification system is available online at http://pingu.unige.ch:8080/IPCCat.

Title

A KNN Research Paper Classification Method Based On Shared Nearest Neighbor 

Author(s)

Yun-lei Cai, Duo Ji, Dong-feng Cai

Year of Publishing

2010

Published On

NII Japan

Affiliation

Shenyang Institute of Aeronautical Engineering

PDF

Link 

Abstract

Patents cover almost all of the latest and most active innovative technical information in technical fields; therefore, patent classification has great application value in the patent research domain. This paper presents a KNN text categorization method based on shared nearest neighbors, effectively combining the BM25 similarity calculation method and the neighborhood information of samples. The effectiveness of this method has been fully verified in the NTCIR-8 Patent Classification evaluation.
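
For orientation, a plain k-NN text classifier over TF-IDF vectors with cosine distance is sketched below; the paper improves on this baseline with BM25 similarities and shared-nearest-neighbor information, neither of which is modeled here, and the labeled snippets are invented.

```python
# Plain k-NN text classification baseline over TF-IDF vectors (cosine distance).
# The paper's BM25 + shared-nearest-neighbor refinements are not modeled here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_texts = [
    "lithium battery electrode composition",
    "battery separator membrane for lithium cells",
    "image sensor pixel readout circuit",
    "cmos image sensor with column amplifier",
]
train_labels = ["H01M", "H01M", "H04N", "H04N"]  # illustrative IPC-style labels

classifier = make_pipeline(TfidfVectorizer(),
                           KNeighborsClassifier(n_neighbors=2, metric="cosine"))
classifier.fit(train_texts, train_labels)

print(classifier.predict(["solid electrolyte for a lithium battery"]))
```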

Title

Exploring Contextual Models In Chemical Patent Search

Author(s)

Jay Urbain, Ophir Frieder

Year of Publishing

2010

Published On

Research Gate

Affiliation

Milwaukee School of Engineering, Georgetown University

PDF

Link 

Abstract

We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections. The query processing algorithm for patent prior art search uses information extraction techniques to identify candidate entities and distinctive terms from the query patent’s title, abstract, description, and claim sections. Structured queries integrating terms and entities in context are automatically generated to test the validity of each section of potentially relevant patents. The system was deployed across 15 Amazon Web Services (AWS) Elastic Cloud Compute (EC2) instances to support efficient indexing and query processing of the relatively large 100G+ collection of chemical patent documents. We evaluated several retrieval models for integrating statistics of candidate entities with term statistics at multiple levels of patent structure to identify relevant patents for prior art search. Our top performing retrieval model integrating contextual evidence from multiple levels of patent structure resulted in bpref measurements of 0.8929 for the prior art search task, exceeding the top results reported from the 2009 TREC Chemistry track.

Title

Improving Retrievability Of Patents In Prior-Art Search

Author(s)

Shariq Bashir, Andreas Rauber

Year of Publishing

2010

Published On

Vienna University of Technology

Affiliation

Vienna University of Technology

PDF

Link 

Abstract

Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents, and the presence of term mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the retrievability coverage of prior-art queries generated from query patents, a large bias toward a subset of the collection is observed. A large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo relevance feedback. Missing terms from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries that are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.
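
The core feedback loop can be sketched as follows: retrieve with the initial query, treat the top-ranked patent as relevant, add its strongest TF-IDF terms to the query, and retrieve again. The proximity-based selection of feedback patents described in the paper is not reproduced, and the corpus and query are toy examples.

```python
# Pseudo-relevance-feedback sketch: first-pass retrieval, expand the query with
# top terms of the best-ranked document, then retrieve again.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "inductive charging pad for electric vehicle battery",
    "wireless power transfer coil alignment method",
    "battery thermal management using liquid cooling",
    "display driver circuit for oled panels",
]
query = "wireless charging of a vehicle battery"

vectorizer = TfidfVectorizer(stop_words="english").fit(corpus + [query])
D = vectorizer.transform(corpus)
q = vectorizer.transform([query])

first_pass = cosine_similarity(q, D).ravel()
feedback_doc = first_pass.argmax()                 # pseudo-relevant document

# Expand the query with the top-weighted terms of the feedback document.
terms = np.array(vectorizer.get_feature_names_out())
doc_weights = D[feedback_doc].toarray().ravel()
expansion = " ".join(terms[doc_weights.argsort()[::-1][:3]])
expanded_q = vectorizer.transform([query + " " + expansion])

second_pass = cosine_similarity(expanded_q, D).ravel()
for i in second_pass.argsort()[::-1]:
    print(f"{second_pass[i]:.3f}  {corpus[i]}")
```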

Title

Experiments On Patent Retrieval At NTCIR-4 Workshop

Author(s)

Hironori Takeuchi, Naohiko Uramoto, Koichi Takeda

Year of Publishing

2004

Published On

NII Japan

Affiliation

Tokyo Research Laboratory, National Institute of Informatics 

PDF

Link

Abstract

In the Patent Retrieval Task in NTCIR-4 Workshop, the search topic is the claim in a patent document, so we use the claim text and the IPC information for the similarity calculations between the search topic and each patent document in the collection. We examined the effectiveness of the similarity measure between IPCs and the term weighting for the occurrence positions of the keyword attributes in the search topic. As a result, it was found that the search results are slightly improved by considering not just the text in the search topic but also the hierarchical structural information of the IPCs. In contrast, the term frequencies for the occurrence position of the attribute did not improve the retrieval result.

The above list covers the published papers in the field of AI-based patent search. We hope that it assists you in your ongoing research in creating such AI-based patent search engines. Bookmark this page to have the list handy for future reference.

 

We at PQAI (Patent Quality Artificial Intelligence) are working to create an open-source AI-based library of patent tools to accelerate innovation and improve patent quality. One of our efforts is PQAI’s AI-based prior-art search tool, a collaborative initiative that drives diversity and inclusion by creating a level playing field for all researchers in terms of prior-art searches. Currently, only big corporations and patent offices have adequate resources for these exhaustive searches. However, PQAI is democratizing the process by allowing zero-budget prior-art checks. We are a non-profit organization and firmly believe in transparency and user privacy. We do not store your data or search queries on our servers unless you specifically ask us to do so for future reference.

 

PQAI is always looking for talented minds to help with our initiative. Get involved with us if you want to collaborate, have questions, or just want to say hi.


If you are a researcher working with patent data, you can also check out PQAI’s Researcher Page. You can use our open-source libraries, AI models and datasets to accelerate your work. You are also always invited to contribute to PQAI to help the research community in the field of AI-based patent search.
