Transform any text into a patent application. opensource.com

2014-05-13 07:38:27 | ♪PFK ASAP NEWS



05/13/2014 Sam Lavigne. opensource.com

Transform any text into a patent application


Figure 2 From Paul Scheerbart’s Perpetual Motion Machine

I wrote a program that transforms literary and philosophical texts into patent applications. In short, it reframes texts as inventions or machines. You can view the code on GitHub.

I was partially inspired by Paul Scheerbart’s Perpetual Motion Machine, a sort of technical/literary diary in which Scheerbart documents and reflects on various failed attempts to create a perpetual motion machine. Scheerbart frequently refers to his machines as “stories” – I wanted to reverse the concept and transform stories into machines.

In this post I’ll provide some details about how I wrote the program, and describe some of the tools that I used.

First, here’s some sample output, listed by invention title and source text:

“A method and device for comprehending theoretically the historical movement” (The Communist Manifesto)
“An apparatus and device for staring into vacancy” (A Hunger Artist by Kafka)
“A device and system for belonging to bringing-forth” (The Question Concerning Technology by Heidegger)

The program operates in four parts. First it generates a title for the invention, then an abstract, then a list of illustrations, and finally a more detailed description of the “embodiments” of the invention.

In general, my methodology is to find common grammatical structures in patent applications, and then extract sentences containing similar grammatical structures from my input texts. To do this, I make heavy use of the Pattern library, which, among many other wonderful features, allows you to perform regular-expression-like searches using parts of speech. For example, here’s how you can use Pattern to search through a text for all instances of an adjective followed by a plural noun:

from pattern.search import search
from pattern.en import parsetree

# Parse and tag the sentence, then find every adjective (JJ) directly followed by a plural noun (NNS).
t = parsetree('A lot of things are ruining a lot of other things')
print(search('JJ NNS', t))
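
Pattern handles both the tagging and the matching. To show the underlying idea without that dependency, here is a minimal, standard-library-only sketch; it is not how Pattern implements its search. It assumes the words are already tagged (the tags in the example are assigned by hand), turns a tag pattern into an ordinary regular expression over a string of tags, and supports only plain tags, “|” alternatives, a trailing “?” for an optional tag, and “*” for any run of words. Chunk tags such as NP are not handled. The names compile_pattern and pos_search are hypothetical.

```python
import re

def compile_pattern(pattern):
    """Turn a tag pattern such as "VBG * JJ? NN" into a regex over " TAG TAG ..." strings.

    Supported in this sketch: plain tags, alternatives ("VB|VBD"), an optional
    tag ("JJ?"), and "*" for zero or more words of any tag.
    """
    parts = []
    for element in pattern.split():
        if element == "*":
            parts.append(r"(?: \S+)*?")  # any run of words, as short as possible
            continue
        optional = element.endswith("?")
        alternatives = "|".join(re.escape(tag) for tag in element.rstrip("?").split("|"))
        part = r" (?:%s)(?= |$)" % alternatives  # whole tags only, so NN never matches NNS
        parts.append("(?:%s)?" % part if optional else part)
    return re.compile("".join(parts))

def pos_search(pattern, tagged):
    """Return the word sequences in `tagged`, a list of (word, tag) pairs, that match `pattern`."""
    tag_string = "".join(" " + tag for _, tag in tagged)
    matches = []
    for m in compile_pattern(pattern).finditer(tag_string):
        if m.end() == m.start():
            continue  # skip empty matches produced by all-optional patterns
        # Each word contributes exactly one leading space to tag_string, so
        # counting spaces converts character offsets back into word indices.
        start = tag_string.count(" ", 0, m.start())
        end = tag_string.count(" ", 0, m.end())
        matches.append([word for word, _ in tagged[start:end]])
    return matches

# Hand-tagged version of the example sentence (tags assigned manually for illustration).
sentence = [("A", "DT"), ("lot", "NN"), ("of", "IN"), ("things", "NNS"),
            ("are", "VBP"), ("ruining", "VBG"), ("a", "DT"), ("lot", "NN"),
            ("of", "IN"), ("other", "JJ"), ("things", "NNS")]
print(pos_search("JJ NNS", sentence))        # [['other', 'things']]
print(pos_search("VBG * JJ? NN", sentence))  # [['ruining', 'a', 'lot']]
```

The “*” is lazy in this sketch, so each match ends at the first noun that completes the pattern rather than the last one.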

Title Generation

There are a number of grammatical patterns that I noticed in patent application titles – one that stuck out to me is “[NOUN] (and [NOUN]) for [GERUND] [NOUN PHRASE]”. For example:

Method for concealing partial baldness
Method and apparatus for finding love
Web-based system and method for preventing unauthorized access to copyrighted academic texts

To create my invention titles, I simply search through the source text for “VBG * JJ? NP”, which translates to “a gerund, followed by anything, followed by an optional adjective, followed by a noun phrase.” The program selects an arbitrary title from all the options it finds, and then prefixes the title with a random combination of “system”, “method”, “apparatus”, and “device”. Occasionally it’ll add “web-based” into the mix as well. Here are a few of the many possible titles generated from the Communist Manifesto:

a web-based method and device for haunting Europe
an apparatus and device for rounding of the Cape
an apparatus and system for surpassing Egyptian pyramids
a web-based method and device for revolutionising the instruments
a system and device for clearing of whole continents
a web-based apparatus and method for paving the way
a system and apparatus for diminishing the means
an apparatus and system for fighting the bourgeoisie
a method and device for depicting the most general phases
a method and device for begetting a new supply
a method and apparatus for appropriating material products
a device and system for appropriating intellectual products
a system and device for springing from your present mode
a method and apparatus for having the wives
a web-based apparatus and method for desiring to abolish countries and nationality
a system and method for fluctuating between proletariat and bourgeoisie
a system and method for redressing social grievances
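
The title step can be sketched as follows, assuming the candidate gerund phrases have already been extracted. The four nouns and the optional “web-based” come from the description above; the 20% chance of adding “web-based”, the name make_title, and the first-letter a/an rule are illustrative assumptions, not taken from machine.py.

```python
import random

# The four title nouns named in the post.
TITLE_NOUNS = ["system", "method", "apparatus", "device"]

def make_title(candidates, rng, web_based_chance=0.2):
    """Pick one candidate phrase and prefix it with two distinct title nouns."""
    phrase = rng.choice(candidates)
    first, second = rng.sample(TITLE_NOUNS, 2)  # random order, never the same noun twice
    words = [first, "and", second, "for", phrase]
    if rng.random() < web_based_chance:          # assumed rate standing in for "occasionally"
        words.insert(0, "web-based")
    article = "an" if words[0][0] in "aeiou" else "a"  # naive first-letter rule
    return " ".join([article] + words)

rng = random.Random(2014)  # fixed seed so runs are repeatable
candidates = ["haunting Europe", "paving the way", "fighting the bourgeoisie"]
for _ in range(3):
    print(make_title(candidates, rng))
```

Passing an explicit random.Random instance, rather than using the module-level functions, makes a given set of titles reproducible from its seed.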

Generating an “Abstract”

Typically a patent application will have an abstract that briefly describes what the invention comprises. To generate my abstracts, I follow a similar method to the title generation, searching through my source text for instances of adjectives followed by singular or plural nouns. However, in this case I make a small but significant change: I restrict the possible nouns to those that fit into the category of “artifacts”. For example, here’s the abstract that gets generated from Heidegger’s essay on technology:

The devices comprises a wooden bridge, a technical apparatus, a high-frequency apparatus, a whole structure, a human handiwork, a mere handiwork, an autonomous tool, a hydroelectric plant, an actual chalice, an old windmill, a sacrificial chalice.
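
In the full sample generated from Kafka’s A Hunger Artist further down this page, the abstract repeats the title and then lists the filtered phrases in a single sentence. A minimal sketch of that assembly step, assuming the phrases are already extracted and filtered: make_abstract and with_article are hypothetical names, the sketch writes “The device comprises” where the program’s output reads “The devices comprises”, and the a/an choice is a naive first-letter rule that would mishandle words like “hour”.

```python
def with_article(phrase):
    """Prefix "a" or "an" using a naive first-letter rule."""
    return ("an " if phrase[0].lower() in "aeiou" else "a ") + phrase

def make_abstract(title, phrases, template="The device comprises"):
    """Join the capitalized title and a comma-separated list of artifact phrases."""
    unique = list(dict.fromkeys(phrases))  # drop exact duplicates, keep first-seen order
    items = ", ".join(with_article(p) for p in unique)
    return "%s. %s %s." % (title[0].upper() + title[1:], template, items)

print(make_abstract("an apparatus and device for staring into vacancy",
                    ["good cage", "narrow gangway", "electric pocket"]))
# An apparatus and device for staring into vacancy. The device comprises a good cage, a narrow gangway, an electric pocket.
```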

In order to do this, I wrote a function that searches first for grammatical patterns, and then filters that output based on hypernyms. A hypernym is a word with a broader meaning that includes a more specific word: a hypernym for “car” is “machine”, and a hypernym for “pigeon” is “bird”. We can consider words that share a hypernym as belonging to the same abstract category. This in itself is a fun tool to play with. For example, I can enter the following into my program:


python search.py < kafka.txt 'organism' 'JJ NN'
And I get a list of all the adjectives followed by nouns that fit into the “organism” category in Kafka’s The Metamorphosis:

old man
own mother
timorous visitor
observant sister
tired man
junior salesman
expressive violinist
middle gentleman
good man
unfortunate son
old maid
elderly widow
horrible vermin
elderly mother
chief clerk
wild man
fall victim
sensible person
lazy son
much mother
commercial traveller
young lady

Fun!
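
The hypernym filter can be sketched with a hand-made lookup table standing in for a real lexical database such as WordNet. Everything in HYPERNYMS below is simplified, hypothetical data: each word gets a single parent, word senses are ignored, and head nouns are assumed to be singular. The function names is_a and filter_by_category are likewise illustrative, not the interface of search.py.

```python
# Hypothetical, simplified hypernym table: word -> its more general parent word.
HYPERNYMS = {
    "man": "adult", "adult": "person", "mother": "parent", "parent": "person",
    "clerk": "worker", "worker": "person", "person": "organism",
    "vermin": "animal", "animal": "organism",
    "bridge": "structure", "structure": "artifact",
    "chalice": "goblet", "goblet": "container", "container": "artifact",
}

def is_a(noun, category, hypernyms):
    """Walk up the chain of hypernyms from `noun`; True if `category` is reached."""
    seen = set()
    while noun is not None and noun not in seen:  # `seen` guards against cycles
        if noun == category:
            return True
        seen.add(noun)
        noun = hypernyms.get(noun)
    return False

def filter_by_category(phrases, category, hypernyms):
    """Keep phrases whose last word (taken as the head noun) falls under `category`."""
    return [p for p in phrases if is_a(p.split()[-1].lower(), category, hypernyms)]

phrases = ["old man", "wooden bridge", "horrible vermin", "sacrificial chalice", "chief clerk"]
print(filter_by_category(phrases, "organism", HYPERNYMS))  # ['old man', 'horrible vermin', 'chief clerk']
print(filter_by_category(phrases, "artifact", HYPERNYMS))  # ['wooden bridge', 'sacrificial chalice']
```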

Illustrations

The next part of the program generates a list of “illustrations”. This time I search for phrases that fit into a grammatical structure that looks like “DT JJ NP IN * NN”, or “a determiner, followed by an adjective, followed by a noun phrase, followed by a preposition or subordinating conjunction, followed by anything, followed by a singular noun”. I attach these phrases to other randomly selected phrases commonly found in descriptions of patent illustrations. These “illustrations” come from Gargantua and Pantagruel by Rabelais:

Figure N illustrates the great puffguts of the counsellor.
Figure N is a schematic drawing of the first book of this translation.
Figure N is a perspective view of the old women in rut and heat.
Figure N is an isometric view of the middle finger of his right hand.
Figure N schematically illustrates a little peach-coloured bonnet with a great capon.
Figure N is a block diagram of the inundation of the urinal deluge.
Figure N is a cross section of the perfect image of my body.

I debated attaching actual illustrations to these descriptions, and even wrote a script to scrape Bing Images for various patent illustrations, but in the end I decided the texts alone were better. I might explore the idea of programmatically creating illustrations in the future.
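
The captioning step can be sketched like this, assuming the “DT JJ NP IN * NN” phrases have already been found. The caption openers are copied from the sample output on this page; make_figures is a hypothetical name, and numbering from 0 follows the Kafka sample further down rather than anything confirmed about machine.py.

```python
import random

# Figure-caption openers taken from the sample output on this page.
FIGURE_OPENERS = [
    "illustrates",
    "schematically illustrates",
    "is a schematic drawing of",
    "is a perspective view of",
    "is an isometric view of",
    "is a block diagram of",
    "is a cross section of",
    "is a diagrammatical view of",
]

def make_figures(phrases, rng):
    """Number the extracted noun phrases and attach a random caption opener to each."""
    return ["Figure %d %s %s" % (i, rng.choice(FIGURE_OPENERS), phrase)
            for i, phrase in enumerate(phrases)]

rng = random.Random(1532)  # fixed seed so runs are repeatable
for line in make_figures(["the great puffguts of the counsellor",
                          "the middle finger of his right hand"], rng):
    print(line)
```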

Detailed Description

The last part of my program creates a more detailed description of the invention. It does this by searching for “VB|VBD|VBZ|VBG * NN IN * NN”, or “a verb (base form, past tense, third-person singular present, or gerund), followed by anything, followed by a noun, followed by a preposition or subordinating conjunction, followed by anything, followed by a noun.” As in the illustrations section, I attach these to phrases commonly found in patent descriptions, such as “the present invention” and “according to a preferred embodiment”. Here are some excerpted results from the Communist Manifesto:

The present invention is itself the product of a long course. The present invention finds its fitting complement in the most slothful indolence. The present invention creates a world after its own image. The present invention endangers the existence of bourgeois property. The present invention becomes an appendage of the machine.

According to another embodiment, the device layers the foundation for the sway. The device abolishes the right of personally acquired property. The device is the groundwork of all personal freedom.

According to another embodiment, the device is the miserable character of this appropriation. The device is the non-existence of any property. According to a preferred embodiment, the device deprives no man of power.

In accordance with an alternative specific embodiment, the present invention finds its complement in practical absence. The invention alters the character of intervention.

According to another embodiment, the device keeps even pace with dissolution. The invention is the most radical rupture with traditional property. The present invention is the condition for free development. The present invention comprehends the march of modern history.

According to another embodiment, the device is the necessary offspring of its own form. The present invention conceals the reactionary character of criticism. The present invention is the expression of the struggle.

According to a preferred embodiment, the invention expresses the struggle of one class. The invention presupposed the existence of modern bourgeois society. The invention improves the condition of every member.
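
The detailed description can be sketched the same way. This assumes each extracted match starts with its verb, so the program only needs to supply a patent-style subject; the openers are copied from the sample output, while make_description and the one-sentence-per-phrase structure are illustrative assumptions.

```python
import random

# Sentence openers copied from the sample output on this page.
OPENERS = [
    "The present invention",
    "The invention",
    "The device",
    "According to a preferred embodiment, the invention",
    "According to another embodiment, the device",
    "According to a beneficial embodiment, the invention",
    "In accordance with an alternative specific embodiment, the present invention",
]

def make_description(verb_phrases, rng):
    """Give each verb-initial phrase a random patent-style subject and join the sentences."""
    sentences = ["%s %s." % (rng.choice(OPENERS), phrase.rstrip("."))
                 for phrase in verb_phrases]
    return " ".join(sentences)

rng = random.Random(1848)  # fixed seed so runs are repeatable
print(make_description(["creates a world after its own image",
                        "endangers the existence of bourgeois property"], rng))
```

Stripping a trailing period before adding one keeps phrases that were extracted with their own full stop from ending in “..”.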

Closing Thoughts

Soon I’ll create a little web service that runs the script on any user input. For the moment, though, you can download the code from GitHub. The code base includes a number of (possibly) useful tools:

“machine.py” generates patents
“search.py” searches texts for parts of speech and hypernym combinations (among other things)
“get_illustrations.py” scrapes Bing for patent illustrations
“scraper.py” downloads the full text of patent applications based on keywords

BY THE WAY, if you ever want to kill a few hours, just search Google Patents for the dirtiest words you can come up with. You will not be disappointed.

Posted on May 11, 2014 by sam. This entry was posted in ITP, Reading and Writing Electronic Text. Bookmark the permalink.




May 12, 2014, 20:00:32
A generator that automatically creates patent application text from, of all things, literary and philosophical works

By opensource.com




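The patent system protects the inventors of technologies, systems, devices, and the like by granting them a patent right, the exclusive right to use the invention for a fixed period. To obtain a patent, an inventor must file an application and supporting documents with the relevant authority. An open-source generator program that automatically produces this kind of patent application text by excerpting literary and philosophical works has now been published on GitHub.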

Transform any text into a patent application – Sam Lavigne
http://lav.io/2014/05/transform-any-text-into-a-patent-application/


Below is a sample generated from The Communist Manifesto by Karl Marx and Friedrich Engels.

(PDF file) communist.pages - communist.pdf


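Next is an excerpt from the patent application text generated from Franz Kafka’s A Hunger Artist. In both cases the result is a string of text that looks exactly like something you would find on a patent form.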

An apparatus and device for staring into vacancy


ABSTRACT


An apparatus and device for staring into vacancy. The devices comprises a good cage, a narrow gangway, an electric pocket, a flower-bedecked cage, an insensitive felt.


BRIEF DESCRIPTION OF THE DRAWINGS


Figure 0 is a diagrammatical view of a lively interest in the hunger
Figure 1 schematically illustrates the all-important striking of the clock
Figure 2 is a cross section of the full glare of the electric pocket
Figure 3 is a perspective view of the completely satisfied spectator
Figure 4 illustrates the whole weight of his body
Figure 5 schematically illustrates the aforementioned chance in public interest
Figure 6 is a schematic drawing of a positive revulsion from professional fasting
Figure 7 is a block diagram of a large circus with its enormous traffic
Figure 8 schematically illustrates the peculiar nature of his performance
Figure 9 is a diagrammatical view of some quiet corner of a circus
Figure 10 is a cross section of the main achievement of his life
Figure 11 is a schematic drawing of the old figure on the board


DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS


The present invention takes a lively interest in the hunger. The present invention takes a sip from a tiny glass. According to a beneficial embodiment, the invention is the sole completely satisfied spectator of his own fast. According to a beneficial embodiment, the invention is the easiest thing in the world. According to another embodiment, the device beats his own record (comprising of anything (such as a document or a phonograph record or a photograph) providing permanent evidence of or information about past events) by a performance. According to a preferred embodiment, the invention shows the artist on the fortieth day (comprising of time for Earth to make a complete rotation on its axis) . The present invention considers the peculiar nature of his performance. According to another embodiment, the device seeks a refuge in some quiet corner. In accordance with an alternative specific embodiment, the present invention loses his sense of the real situation. The present invention made a frame for the cage. According to a preferred embodiment, the invention takes an interest in a hunger. According to a beneficial embodiment, the invention taps his forehead with a finger.


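The patent application text above is produced by a program that works in four parts. The generator first creates a title for the “invention”, then an “abstract”, then “a list of illustrations”, and finally a detailed description of the “embodiments”, following the form used in real patent applications. The program identifies grammatical structures common in patent applications and then extracts sentences with similar grammatical structures from the source text. For these searches it relies heavily on the Pattern library, which supports regular-expression-like searches by part of speech.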

◆1: Generating the invention title

The generator picks an arbitrary phrase from the source text and prefixes it with a random combination of words such as “system”, “method”, “apparatus”, and “device”. For example, generating titles from The Communist Manifesto produces patent applications for inventions like these:
・a web-based method and device for haunting Europe
・an apparatus and device for rounding of the Cape
・an apparatus and system for surpassing Egyptian pyramids
・a web-based method and device for revolutionising the instruments
・a system and device for clearing of whole continents
・a web-based apparatus and method for paving the way
・a system and apparatus for diminishing the means
・an apparatus and system for fighting the bourgeoisie


◆2: Generating the Abstract

A patent application typically includes an Abstract that describes what the invention comprises. Unlike the title step, this step restricts the extracted nouns to the “artifact” category. Extracting from Heidegger’s The Question Concerning Technology gives the following:
The devices comprises a wooden bridge, a technical apparatus, a high-frequency apparatus, a whole structure, a human handiwork, a mere handiwork, an autonomous tool, a hydroelectric plant, an actual chalice, an old windmill, a sacrificial chalice.


The same search can also be used on its own, for example to extract adjective-noun pairs whose nouns belong to the “organism” category from Franz Kafka’s The Metamorphosis:
・old man
・own mother
・timorous visitor
・observant sister
・tired man
・junior salesman
・expressive violinist
・middle gentleman
・good man
・unfortunate son
・old maid
・elderly widow
・horrible vermin
・elderly mother
・chief clerk
・wild man
・fall victim
・sensible person
・lazy son
・much mother
・commercial traveller
・young lady


◆3: Generating the Illustrations

The illustrations list is generated the same way, by extracting phrases that match a grammatical structure and attaching them to wording commonly used in patent drawing descriptions. Below is a sample generated from François Rabelais’s Gargantua and Pantagruel.
・Figure N illustrates the great puffguts of the counsellor.
・Figure N is a schematic drawing of the first book of this translation.
・Figure N is a perspective view of the old women in rut and heat.
・Figure N is an isometric view of the middle finger of his right hand.
・Figure N schematically illustrates a little peach-coloured bonnet with a great capon.
・Figure N is a block diagram of the inundation of the urinal deluge.
・Figure N is a cross section of the perfect image of my body.


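The author originally wrote a script to search Bing Images for real patent illustrations, but in the end settled on text only. He says he is considering adding actual illustrations if he comes up with a good way to do it.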

◆4: Generating the Detailed Description

The final part of the program generates a detailed description of the invention. Below is a sample generated from The Communist Manifesto.
The present invention is itself the product of a long course. The present invention finds its fitting complement in the most slothful indolence. The present invention creates a world after its own image. The present invention endangers the existence of bourgeois property. The present invention becomes an appendage of the machine.


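The generator is built from open-source scripts: “machine.py”, which generates patents; “search.py”, which searches texts for part-of-speech and hypernym combinations; “get_illustrations.py”, which fetches patent illustrations from Bing; and “scraper.py”, which downloads the full text of patent applications based on keywords. They can be downloaded from the GitHub page below. You can’t actually use the output to obtain a patent, but the program lets you feel like an inventor and have fun generating joke text in a thoroughly serious format.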

antiboredom/patent-generator · GitHub
https://github.com/antiboredom/patent-generator


The creator, Sam Lavigne, also plans to release a web version of the patent application generator soon.


May 12, 2014, 20:00:32 in Software, Posted by logw_ny