Blog posts in the "python" category (page 5) - dak blog

Creating a web server with flask-classy

2021-09-28 23:08:31 | python

Notes on creating a web server with flask-classy.

■test_api_view.py

from flask import Flask
from flask_classy import FlaskView

class TestApiView(FlaskView):
    def index(self):
        return "index() is called\n"

    def search(self):
        return "search() is called\n"

■test.py

from flask import Flask
from flask_classy import FlaskView
from test_api_view import TestApiView

def main():
    app = Flask(__name__)
    TestApiView.register(app)
    app.run()
    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

■Output

curl "http://localhost:5000/testapi/"
index() is called

curl "http://localhost:5000/testapi/search/"
search() is called

Writing logs with logging in Python

2021-09-20 15:36:02 | python

Notes on writing logs with the logging module in Python.

dictConfig() initializes the loggers according to the logging settings written in a YAML file.

import sys
import yaml
import logging
from logging.config import dictConfig
from foo import Foo

def main():
    conf_file = 'config.yml'
    with open(conf_file) as file:
        conf = yaml.safe_load(file)
        dictConfig(conf['logger'])
    logger = logging.getLogger(__name__)

    logger.info('*** main start ***')

    foo = Foo()
    foo.bar()

    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

The Foo class called from the program above obtains a logger with logging.getLogger() in its constructor
and writes an INFO log in its bar() method.

import logging

class Foo:
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def bar(self):
        self.logger.info('*** bar info ***')

The logger settings and any other settings are read from a YAML configuration file.

settings:
  key1:
  key2:
  key3:

logger:
  version: 1
  formatters:
    file_log_format:
      format: '[{asctime}] [{levelname}] [{module}] [{funcName}] {lineno}: {message:s}'
      datefmt: '%Y-%m-%d %H:%M:%S'
      style: '{'
  handlers:
    file:
      class : logging.handlers.TimedRotatingFileHandler
      formatter: file_log_format
      filename: test.log
      when: MIDNIGHT
      backupCount: 7
      encoding: utf-8
  root:
      level: INFO
      handlers:
        - file
  disable_existing_loggers: False

Running the program above writes log lines like the following.

[2021-09-19 23:32:45] [INFO] [test2] [main] 22: *** main start ***
[2021-09-19 23:32:45] [INFO] [foo] [bar] 13: *** bar info ***
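
The same dictConfig() flow can be tried without a YAML file or a log file. The following is a minimal sketch, assuming the settings are given as a plain dict and the log is written to an in-memory io.StringIO instead of test.log. The names buf, simple, memory and demo are made up for this illustration and do not appear in the configuration above.

```python
import io
import logging
from logging.config import dictConfig

# In-memory stream standing in for the log file (an assumption for this sketch).
buf = io.StringIO()

conf = {
    'version': 1,
    'formatters': {
        # '{' style formatting, as in the YAML settings above
        'simple': {'format': '[{levelname}] [{name}] {message}', 'style': '{'},
    },
    'handlers': {
        'memory': {
            'class': 'logging.StreamHandler',
            'formatter': 'simple',
            'stream': buf,
        },
    },
    'root': {'level': 'INFO', 'handlers': ['memory']},
    'disable_existing_loggers': False,
}
dictConfig(conf)

logger = logging.getLogger('demo')
logger.debug('hidden')  # below the INFO level of the root logger, so it is dropped
logger.info('hello')

log_text = buf.getvalue()
print(log_text, end='')
```

Only the INFO message reaches the handler, because the root logger level is INFO.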

Notes on running kuromoji from Python.

Building on the earlier notes on running a Java program from Python, this runs kuromoji from Python.

First, download kuromoji from the following URL.
https://github.com/atilika/kuromoji/downloads

Then, to keep the path settings simple, copy the kuromoji jar and the py4j jar into the current directory.
As before, compile and run the following program.

import py4j.GatewayServer;

public class GwTest1 {
    public static void main(String[] args) {
        GwTest1 app = new GwTest1();
        GatewayServer server = new GatewayServer(app);
        server.start();
    }
}

java -classpath .:py4j0.10.9.2.jar:kuromoji-0.7.7.jar GwTest1

The Python program that runs kuromoji looks like this.

import sys
from py4j.java_gateway import JavaGateway

def main():
    gw = JavaGateway()

    # Named text rather than str to avoid shadowing the built-in str
    text = 'サンプルプログラムを実行します。'

    tokenizer = gw.jvm.org.atilika.kuromoji.Tokenizer.builder().build()
    jtkns = tokenizer.tokenize(text)

    for i in range(len(jtkns)):
        jtkn = jtkns[i]
        tkn = {
            'form': jtkn.getSurfaceForm(),
            'base': jtkn.getBaseForm(),
            'read': jtkn.getReading(),
            'pos': jtkn.getAllFeatures(),
        }
        print(tkn)

    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

Running the program above prints the morphological analysis result as follows.

$ python kuromoji1.py
{'form': 'サンプル', 'base': 'サンプル', 'read': 'サンプル', 'pos': '名詞,一般,*,*,*,*,サンプル,サンプル,サンプル'}
{'form': 'プログラム', 'base': 'プログラム', 'read': 'プログラム', 'pos': '名詞,サ変接続,*,*,*,*,プログラム,プログラム,プログラム'}
{'form': 'を', 'base': 'を', 'read': 'ヲ', 'pos': '助詞,格助詞,一般,*,*,*,を,ヲ,ヲ'}
{'form': '実行', 'base': '実行', 'read': 'ジッコウ', 'pos': '名詞,サ変接続,*,*,*,*,実行,ジッコウ,ジッコー'}
{'form': 'し', 'base': 'する', 'read': 'シ', 'pos': '動詞,自立,*,*,サ変・スル,連用形,する,シ,シ'}
{'form': 'ます', 'base': 'ます', 'read': 'マス', 'pos': '助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス'}
{'form': '。', 'base': '。', 'read': '。', 'pos': '記号,句点,*,*,*,*,。,。,。'}

Running a Java program from Python

2021-09-20 00:20:09 | python

Notes on running a Java program from Python.

py4j is used to run a Java program from Python.
py4j can be installed as follows.

pip install py4j

In my environment, the jar file was installed at the following location.

/usr/local/share/py4j/py4j0.10.9.2.jar

First, create the Java program (GwTest1.java).

import py4j.GatewayServer;

public class GwTest1 {
    public static void main(String[] args) {
        GwTest1 app = new GwTest1();
        GatewayServer server = new GatewayServer(app);
        server.start();
    }
}

Then compile and run the Java program.

javac -classpath /usr/local/share/py4j/py4j0.10.9.2.jar GwTest1.java
java -classpath .:/usr/local/share/py4j/py4j0.10.9.2.jar GwTest1

Next, create the Python program.
Here it gets the date with Java's Date class.

import sys
from py4j.java_gateway import JavaGateway

def main():
    gw = JavaGateway()

    date = gw.jvm.java.util.Date()
    print(date)

    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

Run the program above.

$ python test1.py
Sun Sep 19 08:33:39 PDT 2021

This confirms that Python can get date information through Java's Date class.

Reading a YAML file in Python

2021-09-19 23:34:02 | python

Notes on reading a YAML file in Python.

Install pyyaml, a library for handling YAML in Python.

pip install pyyaml

A YAML file can be read with pyyaml as follows.

import sys
import yaml

def main():
    file = sys.argv[1]
    try:
        with open(file) as f:
            obj = yaml.safe_load(f)
            print(type(obj))
            print(obj)
    except (OSError, yaml.YAMLError) as e:
        # Covers both a file that cannot be opened and invalid YAML
        sys.stderr.write('error: failed to load %s: %s\n' % (file, e))
        return 1

    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

Let the program above read the following YAML file.

db:
  host: localhost
  port: 3306
  db:   test-user

vals1:
  - a
  - b
  - c

vals2: [a, b, c]

The output is as follows.

<class 'dict'>
{'db': {'host': 'localhost', 'port': 3306, 'db': 'test-user'}, 'vals1': ['a', 'b', 'c'], 'vals2': ['a', 'b', 'c']}

Getting command-line arguments in Python

2021-09-19 23:19:41 | python

Notes on getting command-line arguments in Python.

An example of using OptionParser, a library for getting command-line arguments in Python.

import sys
import os
from optparse import OptionParser

def get_opts():
    param = {
        'opts': None,
        'args': None,
    }
    
    try:
        optp = OptionParser()
        optp.add_option('-c', dest='config')
        optp.add_option('-i', dest='input')
        optp.add_option('-o', dest='output')
        (opts, args) = optp.parse_args()
        param['opts'] = opts
        param['args'] = args
    except:
        return None

    return param
    
def main():
    param = get_opts()
    if param is None:
        return 1
    
    print(param)
    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

Example run:

$ python3 test1.py -c config.yaml -i input.txt -o output.txt arg1.txt arg2.txt
{'opts': <Values at 0x7fb0dbbe5b70: {'config': 'config.yaml', 'input': 'input.txt', 'output': 'output.txt'}>, 'args': ['arg1.txt', 'arg2.txt']}

When -o is not specified, output is None.

$ python3 test1.py -c config.yaml -i input.txt arg1.txt
{'opts': <Values at 0x7f1dd27e7b38: {'config': 'config.yaml', 'input': 'input.txt', 'output': None}>, 'args': ['arg1.txt']}

Specifying an unknown option (-e) causes an error.

$ python3 test1.py -c config.yaml -i input.txt -o output.txt -e else.txt arg1.txt arg2.txt
Usage: test1.py [options]

test1.py: error: no such option: -e
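
optparse has been deprecated in favor of argparse since Python 3.2. The following is a minimal sketch of the same options with argparse. The function name parse_opts and the positional name args are choices made for this illustration, not part of the program above.

```python
import argparse


def parse_opts(argv):
    """Parse -c/-i/-o plus any remaining positional arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', dest='config')
    parser.add_argument('-i', dest='input')
    parser.add_argument('-o', dest='output')
    # Zero or more positional arguments, collected into a list
    parser.add_argument('args', nargs='*')
    return parser.parse_args(argv)


# An explicit list is passed here; use sys.argv[1:] in a real script.
opts = parse_opts(['-c', 'config.yaml', '-i', 'input.txt', 'arg1.txt'])
print(opts.config, opts.input, opts.output, opts.args)
```

As with OptionParser, an option that is not given stays None, and an unknown option makes argparse print a usage message and exit.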

Installing pyenv and Python

2021-09-09 23:20:27 | python

Notes on installing pyenv and Python.

Install pyenv first, then install Python 3.9.7.

■Installing pyenv

git clone https://github.com/yyuu/pyenv.git ~/.pyenv

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile

source ~/.bash_profile

■Installing Python 3.9.7
sudo yum install zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl-devel
pyenv install 3.9.7
pyenv global 3.9.7

Getting the code point of a character in Python

2021-07-05 23:22:13 | python

Notes on getting the code point of a character in Python.

Getting the code point from a character.

>>> print("%x" % ord('0'))
30
>>> print("%x" % ord('あ'))
3042

Converting a code point back to a character.

>>> print("%s" % chr(0x30))
0
>>> print("%s" % chr(0x3042))
あ
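
The same conversions can be written with f-strings. This is a small sketch. The variable names and the U+XXXX display format are choices made for this illustration.

```python
ch = 'あ'
code = ord(ch)                 # integer code point of the character

hex_code = f'{code:x}'         # lowercase hex without a prefix
label = f'U+{code:04X}'        # conventional U+XXXX notation
round_trip = chr(code)         # back to the character

print(hex_code, label, round_trip)
print('\u3042' == ch)          # the same character written as an escape sequence
```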

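Setting the number of threads when building an nmslib index

2021-04-01 00:21:49 | python

Notes on setting the number of threads when building an index with nmslib.

When nmslib's createIndex() is called without parameters, it tries to use all of the CPUs,
which can leave no CPU time for other processes.

createIndex() accepts indexThreadQty to set the number of threads.

index = nmslib.init(method='hnsw', space='l2')
index.createIndex({'indexThreadQty': num_threads})  # num_threads: number of threads to use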

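nmslib parameters

2021-03-19 23:53:43 | python

Notes on the parameters of nmslib, a vector search library.

To search in descending order of cosine similarity between vectors, initialize with space='cosinesimil' as follows.

index = nmslib.init(space='cosinesimil')

■Sample program (cosine similarity version)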

import numpy as np
import nmslib

index = nmslib.init(method='hnsw', space='cosinesimil')

vs = [
    [1.0, 2.0],
    [2.0, 2.0],
    [3.0, 3.0],
]

# Build the index
vs = np.array(vs, dtype=np.float32)
index.addDataPointBatch(vs)
index.createIndex({}, print_progress=False)

# Search
v = np.array([1.0, 1.0], dtype=np.float32)
ids, dists = index.knnQuery(v, 10)

print('ids: %s' % (ids))
print('dists: %s' % (dists))

Output

ids: [2 1 0]
dists: [0.0000000e+00 5.9604645e-08 5.1316738e-02]

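Note: the distance is 0 when the cosine similarity is 1.

To search in ascending order of distance between vectors instead, initialize with space='l2'.

index = nmslib.init(space='l2')

■Sample program (distance version)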

import numpy as np
import nmslib

index = nmslib.init(method='hnsw', space='l2')

vs = [
    [1.0, 2.0],
    [2.0, 2.0],
    [3.0, 3.0],
]

# Build the index
vs = np.array(vs, dtype=np.float32)
index.addDataPointBatch(vs)
index.createIndex({}, print_progress=False)

# Search
v = np.array([1.0, 1.0], dtype=np.float32)
ids, dists = index.knnQuery(v, 10)

print('ids: %s' % (ids))
print('dists: %s' % (dists))

Output

ids: [0 1 2]
dists: [1. 2. 8.]

Note: the distances are sums of squares (squared Euclidean distances).
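
The two sets of distances above can be checked by hand without nmslib. The following sketch uses only the standard library and assumes that the 'cosinesimil' distance is 1 minus the cosine similarity and that the 'l2' result is the squared Euclidean distance, which is what the sample outputs suggest. The helper names are made up for this illustration.

```python
import math

vs = [[1.0, 2.0], [2.0, 2.0], [3.0, 3.0]]
q = [1.0, 1.0]


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition of the 'cosinesimil' distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def squared_l2(a, b):
    """Sum of squared differences, without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


cos_dists = [cosine_distance(v, q) for v in vs]
l2_dists = [squared_l2(v, q) for v in vs]

print(cos_dists)  # listed in data order, not in search-result order
print(l2_dists)
```

The values are listed in the order of vs, so the first cosine distance (about 0.0513) belongs to id 0, which nmslib returned last.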

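In addition, with method='hnsw', specifying 'post' in the parameters passed as the first argument of createIndex()
controls post-processing.
0: no post-processing; 1, 2: post-processing enabled (2 is stronger)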

Sorting a list by multiple fields in Python

2021-02-11 12:55:25 | python

Notes on sorting a list in Python by multiple fields of its elements.

The data to sort has a string in str and a number in num.

data1 = [
    {'str': 'a', 'num': 6},
    {'str': 'a', 'num': 5},
    {'str': 'b', 'num': 4},
    {'str': 'b', 'num': 3},
    {'str': 'c', 'num': 2},
    {'str': 'c', 'num': 1},
]

To sort by str ascending, then num ascending, list str and then num in the lambda.

sorted(data1, key=lambda x: (x['str'], x['num']))
-->
[{'str': 'a', 'num': 5},
 {'str': 'a', 'num': 6},
 {'str': 'b', 'num': 3},
 {'str': 'b', 'num': 4},
 {'str': 'c', 'num': 1},
 {'str': 'c', 'num': 2}]

To sort by num ascending, then str ascending, list num and then str in the lambda.

sorted(data1, key=lambda x: (x['num'], x['str']))
-->
[{'str': 'c', 'num': 1},
 {'str': 'c', 'num': 2},
 {'str': 'b', 'num': 3},
 {'str': 'b', 'num': 4},
 {'str': 'a', 'num': 5},
 {'str': 'a', 'num': 6}]

To sort by str descending, then num descending, specify reverse=True.

sorted(data1, key=lambda x: (x['str'], x['num']), reverse=True)
-->
[{'str': 'c', 'num': 2},
 {'str': 'c', 'num': 1},
 {'str': 'b', 'num': 4},
 {'str': 'b', 'num': 3},
 {'str': 'a', 'num': 6},
 {'str': 'a', 'num': 5}]

When finer control over the comparison order is needed, use cmp_to_key from the functools module.

from functools import cmp_to_key

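# Sort by str descending, then num ascending
def cmp1(a, b):
    # Compare str and return the result with its sign flipped
    cmp_str = 0 if a['str'] == b['str'] else -1 if a['str'] < b['str'] else 1
    if cmp_str != 0:
        return - cmp_str

    # When str is equal, return the result of comparing num
    cmp_num = 0 if a['num'] == b['num'] else -1 if a['num'] < b['num'] else 1
    return cmp_num

# Sort by str ascending, then num descending
def cmp2(a, b):
    # Compare str and return the result as is
    cmp_str = 0 if a['str'] == b['str'] else -1 if a['str'] < b['str'] else 1
    if cmp_str != 0:
        return cmp_str

    # When str is equal, return the result of comparing num with its sign flipped
    cmp_num = 0 if a['num'] == b['num'] else -1 if a['num'] < b['num'] else 1
    return - cmp_num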

Sort by str descending, then num ascending, with cmp1 as the comparison function

sorted(data1, key=cmp_to_key(cmp1))
-->
[{'str': 'c', 'num': 1},
 {'str': 'c', 'num': 2},
 {'str': 'b', 'num': 3},
 {'str': 'b', 'num': 4},
 {'str': 'a', 'num': 5},
 {'str': 'a', 'num': 6}]

Sort by str ascending, then num descending, with cmp2 as the comparison function

sorted(data1, key=cmp_to_key(cmp2))
-->
[{'str': 'a', 'num': 6},
 {'str': 'a', 'num': 5},
 {'str': 'b', 'num': 4},
 {'str': 'b', 'num': 3},
 {'str': 'c', 'num': 2},
 {'str': 'c', 'num': 1}]

The sort order of num can also be flipped by negating its sign

sorted(data1, key=lambda x: (x['str'], - x['num']), reverse=True)
-->
[{'str': 'c', 'num': 1},
 {'str': 'c', 'num': 2},
 {'str': 'b', 'num': 3},
 {'str': 'b', 'num': 4},
 {'str': 'a', 'num': 5},
 {'str': 'a', 'num': 6}]
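
Sign negation only works for numeric fields. Another option, shown here as a sketch rather than as the method used above, relies on Python's sort being stable: sort by the secondary key first, then by the primary key. The variable names step1 and result are made up for this illustration.

```python
data1 = [
    {'str': 'a', 'num': 6},
    {'str': 'a', 'num': 5},
    {'str': 'b', 'num': 4},
    {'str': 'b', 'num': 3},
    {'str': 'c', 'num': 2},
    {'str': 'c', 'num': 1},
]

# Pass 1: secondary key (num, ascending)
step1 = sorted(data1, key=lambda x: x['num'])
# Pass 2: primary key (str, descending). The sort is stable even with
# reverse=True, so items with the same str keep the num order from pass 1.
result = sorted(step1, key=lambda x: x['str'], reverse=True)

for item in result:
    print(item)
```

This gives the same order as cmp1 (str descending, num ascending) and also works when the secondary field is a string.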

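Converting between hiragana and katakana and detecting hiragana/katakana in Python

2021-01-23 12:31:32 | python

Notes on converting between hiragana and katakana and on detecting hiragana/katakana in Python.

Both the hiragana/katakana conversion and the character type check can be done with wanakana.
wanakana can be installed as follows.

pip install wanakana-python

Use to_xxx() to convert between hiragana and katakana.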

import wanakana
wanakana.to_katakana('あア亜')
==> 'アア亜'
wanakana.to_hiragana('あア亜')
==> 'ああ亜'

Use is_xxx() to check the character type, such as hiragana or katakana.

import wanakana
wanakana.is_hiragana('あぁー')    # "ー" is treated as hiragana
==> True
wanakana.is_hiragana('あ・あ')    # "・" is not treated as hiragana
==> False
wanakana.is_hiragana('あア亜')
==> False

wanakana.is_katakana('アァー')  # "ー" is treated as katakana
==> True
wanakana.is_katakana('ア・ア')   # "・" is treated as katakana
==> True
wanakana.is_katakana('あア亜')
==> False

wanakana.is_kanji('亜阿吾')
==> True
wanakana.is_kanji('あア亜')
==> False

The middle dot "・" is used to separate the family name and given name when a person's name is written in katakana,
and perhaps for that reason is_katakana() returns True for "・".
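
For the conversion alone, a standard-library approach is also possible. The following is a rough sketch, not how wanakana works internally. It relies on the Unicode layout, where hiragana U+3041 to U+3096 and katakana U+30A1 to U+30F6 are in the same order and 0x60 apart. The function and table names are made up here, and characters outside those ranges (such as "ー" and "・") are left unchanged.

```python
HIRA_START, HIRA_END = 0x3041, 0x3096   # ぁ .. ゖ
KANA_OFFSET = 0x60                      # distance from hiragana to katakana

# Translation tables keyed by code point, as str.translate() expects
HIRA_TO_KATA = {cp: cp + KANA_OFFSET for cp in range(HIRA_START, HIRA_END + 1)}
KATA_TO_HIRA = {cp + KANA_OFFSET: cp for cp in range(HIRA_START, HIRA_END + 1)}


def to_katakana(text):
    """Replace hiragana with katakana; other characters are kept as is."""
    return text.translate(HIRA_TO_KATA)


def to_hiragana(text):
    """Replace katakana with hiragana; other characters are kept as is."""
    return text.translate(KATA_TO_HIRA)


print(to_katakana('あア亜'))
print(to_hiragana('あア亜'))
```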

Downloading data from Google Cloud Storage in Python

2020-08-29 17:16:59 | python

Notes on downloading data from Google Cloud Storage in Python.

First, install the Google Cloud Storage library with pip install.

pip install google-cloud-storage

The following stores data from Google Cloud Storage in a string.
The data is saved to an io.BytesIO object, an in-memory byte stream,
and the binary data is then read from it and decoded into a string.

import sys
import io
import json
from google.cloud import storage

path = 'gs://～'
client = storage.Client()

buf = io.BytesIO()
client.download_blob_to_file(path, buf)
# Named text rather than str to avoid shadowing the built-in str
text = buf.getvalue().decode('utf-8')
buf.close()

print(text)

Running commands with subprocess in Python

2020-08-15 20:44:15 | python

Notes on running commands with subprocess in Python.

■Specify the command and its arguments as a list, and send the command's output to stdout and stderr:

import subprocess

res = subprocess.run(['ls', '-a'])
print(res)

Running this prints the result of ls -a to stdout.

.  ..  test1.py
CompletedProcess(args=['ls', '-a'], returncode=0)

■Specify the command and its arguments as a single string, and send the command's output to stdout and stderr:

res = subprocess.run('ls -a', shell=True)
print(res)

Running this prints the result of ls -a to stdout.
In the returned value, args is now the string 'ls -a'.

.  ..  test1.py
CompletedProcess(args='ls -a', returncode=0)

■Capture the command's output in the program:

res = subprocess.run('ls -a', shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(res)
print(res.stdout)
print(res.stderr)

Running this stores the output as bytes in res.stdout and res.stderr.

CompletedProcess(args='ls -a', returncode=0, stdout=b'.\n..\ntest1.py\n', stderr=b'')
b'.\n..\ntest1.py\n'
b''

To get strings instead of bytes, specify encoding.

res = subprocess.run('ls -a', shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     encoding='utf-8')
print(res)
print(res.stdout)
print(res.stderr)

The output is now a string.

CompletedProcess(args='ls -a', returncode=0, stdout='.\n..\ntest1.py\n', stderr='')
.
..
test1.py
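
On Python 3.7 and later, capture_output=True and text=True are shorthand for the stdout/stderr pipes and the decoding step. The following is a small sketch. To stay independent of ls, it runs the current Python interpreter as the child command, which is a choice made for this illustration.

```python
import subprocess
import sys

# Child command: the running interpreter printing one line
cmd = [sys.executable, '-c', 'print("hello")']

# capture_output=True pipes stdout and stderr, text=True decodes them to str,
# and check=True raises CalledProcessError on a non-zero exit status.
res = subprocess.run(cmd, capture_output=True, text=True, check=True)

print(res.returncode)
print(repr(res.stdout))
```

Passing the command as a list, without shell=True, also avoids shell quoting problems when arguments come from user input.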

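Extracting body text from a Word file (.docx)

2020-08-12 22:31:43 | python

Notes on extracting body text from a Word file (.docx).

A docx file is a zip archive, so the program reads it as a zip file and
extracts the body text from word/document.xml.
The program below deletes <w:rt> tags to remove ruby (furigana) text.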

import sys
import json
import re
import lxml.etree
import zipfile


def extract_text(node):
    text = lxml.etree.tostring(node, encoding='utf-8').decode('utf-8')
    text = re.sub('<w:rt>.*?</w:rt>', '', text)
    text = re.sub('<.*?>', '', text)

    return text


def run():
    xmlns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

    # ZipFile raises an exception on failure instead of returning None
    try:
        docx = zipfile.ZipFile(sys.argv[1])
    except (OSError, zipfile.BadZipFile):
        sys.stderr.write('error: failed to open %s\n' % (sys.argv[1]))
        return 1

    xmlstr = docx.read('word/document.xml')
    dom = lxml.etree.fromstring(xmlstr)

    # Each w:p element is one paragraph
    text_nodes = dom.xpath('//w:p', namespaces=xmlns)
    for text_node in text_nodes:
        text = extract_text(text_node)
        if text == '':
            continue

        print(text)

    docx.close()

    return 0


if __name__ == '__main__':
    res = run()
    exit(res)
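
The same idea can be sketched with the standard library only, walking the XML tree instead of using regular expressions. This is an alternative under stated assumptions, not the program above: it collects only the text of w:t elements and skips anything under w:rt. The function names and the tiny in-memory archive (which only mimics the word/document.xml layout and is not a complete .docx) are made up for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def paragraph_text(p):
    """Join the w:t text inside a w:p element, skipping ruby text under w:rt."""
    parts = []

    def walk(node):
        if node.tag == W + 'rt':
            return  # ruby (furigana) text: skip the whole subtree
        if node.tag == W + 't' and node.text:
            parts.append(node.text)
        for child in node:
            walk(child)

    walk(p)
    return ''.join(parts)


def extract_paragraphs(docx_file):
    """Return the non-empty paragraph texts of a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read('word/document.xml'))
    texts = [paragraph_text(p) for p in root.iter(W + 'p')]
    return [t for t in texts if t]


# Minimal stand-in archive for the demonstration
xml = ('<w:document xmlns:w="http://schemas.openxmlformats.org/'
       'wordprocessingml/2006/main"><w:body>'
       '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
       '<w:p><w:r><w:ruby><w:rt><w:r><w:t>kana</w:t></w:r></w:rt>'
       '<w:rubyBase><w:r><w:t>base</w:t></w:r></w:rubyBase></w:ruby></w:r></w:p>'
       '</w:body></w:document>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('word/document.xml', xml)
buf.seek(0)

paragraphs = extract_paragraphs(buf)
print(paragraphs)
```

The ruby text "kana" is dropped and only the base text remains in the second paragraph.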
