Blog posts in the "python" category (page 4) - dak blog

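How to fix No module named '_ctypes' with pip

2022-03-19 12:03:17 | python

A note on how to deal with the following error when installing a package with pip.

ModuleNotFoundError: No module named '_ctypes'

The _ctypes module is only built when the libffi headers are available at the time python is compiled, so installing libffi-devel and then reinstalling python resolves this error.
The example below is for python-3.9.7.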

pyenv uninstall 3.9.7
sudo yum install libffi-devel
pyenv install 3.9.7
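
After reinstalling, a quick check like the following can confirm that the rebuilt interpreter now has the module. This is only a minimal sketch; the helper name has_ctypes_backend is made up for this note.

```python
import importlib.util


def has_ctypes_backend() -> bool:
    """Return True if the _ctypes extension module can be located."""
    # find_spec() returns None when the module cannot be found.
    return importlib.util.find_spec("_ctypes") is not None


if __name__ == "__main__":
    print("_ctypes available:", has_ctypes_backend())
```

If this prints False, the interpreter was most likely built before libffi-devel was installed.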

How to make pillow handle AVIF images

2022-03-18 21:04:06 | python

A note on how to make pillow handle images in the AVIF format.

Install pillow-avif-plugin.

pip install pillow-avif-plugin

Importing pillow_avif in a python program registers the plugin,
after which pillow can process AVIF images.

import pillow_avif
from PIL import Image
...

Character recognition with pyocr

2022-03-18 20:56:42 | python

A note on how to perform character recognition with pyocr.

First, install the required libraries.
■Installing leptonica
Check the rpm URL on the leptonica page.

https://centos.pkgs.org/8/centos-appstream-x86_64/leptonica-1.76.0-2.el8.x86_64.rpm.html

Install the rpm.

wget https://vault.centos.org/centos/8/AppStream/x86_64/os/Packages/leptonica-1.76.0-2.el8.x86_64.rpm
sudo rpm -ivh leptonica-1.76.0-2.el8.x86_64.rpm

■Installing tesseract

sudo yum install tesseract tesseract-langpack-jpn.noarch

■Installing pyocr

pip install pyocr

■Program
A sample program that performs character recognition with pyocr.
When an image file name is given as an argument, it prints the text found in the image.

import sys

from PIL import Image
import pyocr
import pyocr.builders

def main():
    img_file = sys.argv[1]

    # Use the first OCR tool that pyocr detects (e.g. tesseract).
    tools = pyocr.get_available_tools()
    if not tools:
        sys.stderr.write('no OCR tool found\n')
        return 1
    tool = tools[0]

    img = Image.open(img_file)
    txt_bldr = pyocr.builders.TextBuilder()
    text = tool.image_to_string(img, lang='jpn', builder=txt_bldr)
    print(text)

    return 0

if __name__ == '__main__':
    sys.exit(main())

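Accessing instance variables by name string in python

2022-03-01 23:57:44 | python

A note on how to access an instance variable in python using its name as a string.

getattr(instance, variable name) returns the value of that instance variable.
Likewise, setattr(instance, variable name, value) assigns a value to the instance variable.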

class Test:
    def __init__(self):
        self.var1 = 'abc'
        self.var2 = 'def'

    def get(self, var):
        return getattr(self, var)

    def set(self, var, val):
        setattr(self, var, val)

test = Test()
test.set('var3', 'ghi')
print(test.get('var1'))
print(test.get('var2'))
print(test.get('var3'))

■Output

abc
def
ghi
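
When the variable may not exist, getattr() also accepts a third argument that is returned instead of raising AttributeError. A minimal sketch follows; the Config class and the read_attr helper are made up for this example.

```python
class Config:
    def __init__(self):
        self.host = "localhost"
        self.port = 8080


def read_attr(obj, name, default=None):
    """Return obj.<name>, or default when the attribute does not exist."""
    return getattr(obj, name, default)


cfg = Config()
print(read_attr(cfg, "host"))         # existing attribute
print(read_attr(cfg, "timeout", 30))  # missing attribute: the default is used
print(hasattr(cfg, "timeout"))        # hasattr() only tests for existence

setattr(cfg, "timeout", 10)           # create the attribute by name
print(read_attr(cfg, "timeout", 30))  # now the stored value is returned
```

Without the default, getattr(cfg, 'timeout') would raise AttributeError for an undefined variable.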

Installing PIL

2022-02-26 22:46:49 | python

A note on how to install PIL.

Installing PIL (the pillow package) requires JPEG-related libraries.
Depending on the environment, zlib-devel and python-devel may also be needed.

sudo yum install libjpeg-turbo-devel
sudo yum install zlib-devel
sudo yum install python36-devel
sudo pip install pillow

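Creating an image from byte data with PIL

2022-02-23 23:15:17 | python

A note on how to create an image from byte data with PIL.

The example below downloads an image as byte data and creates a PIL Image object from those bytes.
BytesIO is used to wrap the byte data so that Image.open() can read it like a file.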

import requests
from PIL import Image
from io import BytesIO

img_url = 'https://i.xgoo.jp/img/static/global/cmm/sn/logo_gooblog.png'

res = requests.get(img_url)
inst = BytesIO(res.content)
img_obj = Image.open(inst).convert('RGB')
inst.close()

img_obj.save('img.png')

Encoding and decoding strings with base64 in python

2022-02-19 20:21:40 | python

A note on how to encode and decode strings with base64 in python.

import base64

data = 'あいうえお'
data_enc = data.encode('utf-8')
data_b64 = base64.b64encode(data_enc)
print(data_b64)

data_b64_dec = base64.b64decode(data_b64)
data_dec = data_b64_dec.decode('utf-8')
print(data_dec)

■Output

b'44GC44GE44GG44GI44GK'
あいうえお
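
b64encode() takes and returns bytes, which is why the first line of the output has a b'...' prefix. If a plain str is wanted on both sides, the conversion can be wrapped as in the sketch below; the function names b64_encode_text and b64_decode_text are made up for this example.

```python
import base64


def b64_encode_text(text: str, encoding: str = "utf-8") -> str:
    """Encode a str to a base64 str (not bytes)."""
    raw = text.encode(encoding)
    # base64 output is pure ASCII, so decoding it as ascii is safe.
    return base64.b64encode(raw).decode("ascii")


def b64_decode_text(b64_text: str, encoding: str = "utf-8") -> str:
    """Decode a base64 str back to the original str."""
    return base64.b64decode(b64_text).decode(encoding)


encoded = b64_encode_text("あいうえお")
print(encoded)
print(b64_decode_text(encoded))
```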

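Parsing a regex-like grammar with python lex-yacc

2022-02-06 20:27:06 | python

This post builds a parser with python lex-yacc that accepts a regex-like grammar such as the following.

abc
(abc|def)
a(b|c)(d|e)f

However, a (...) may not be written inside another (...).

The symbols used by lex are as follows.

CHAR: [^|()] * any character other than |, ( and ).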
LB:   (
RB:   )
PIPE: |

■test_lex.py

import sys
import ply.lex as lex

tokens = (
    'CHAR',
    'LB',
    'RB',
    'PIPE',
)

t_CHAR = r'[^|()]'
t_LB = r'[(]'
t_RB = r'[)]'
t_PIPE = r'[|]'

def t_error(t):
    sys.stderr.write('illegal char "%s"\n' % t.value[0])
    # skip the offending character and continue lexing
    t.lexer.skip(1)

lexer = lex.lex()

The yacc grammar is as follows.

pttrn       : sub_pttrns
sub_pttrns  : sub_pttrn
            | sub_pttrns sub_pttrn
sub_pttrn   : chars
            | select
select      : LB chars RB
            | LB chars select_iter RB
select_iter : PIPE chars
            | PIPE chars select_iter
chars       : CHAR
            | chars CHAR

■test_yacc.py
Below, p[0] is the left-hand side of a grammar rule above and p[1:] is its right-hand side.

import sys
import ply.yacc as yacc
from test_lex import tokens

def p_pttrn(p):
    '''pttrn : sub_pttrns'''
    p[0] = p[1]

def p_sub_pttrns(p):
    '''sub_pttrns : sub_pttrn
                  | sub_pttrns sub_pttrn'''
    if len(p) == 2:
        p[0] = p[1]
    else:
        p[0] = p[1]
        p[0].extend(p[2])

def p_sub_pttrn(p):
    '''sub_pttrn : chars
                 | select'''
    p[0] = p[1]

def p_select_pttrn(p):
    '''select : LB chars RB
              | LB chars select_iter RB'''
    if len(p) == 4:
        p[0] = [{'or': [p[2]]}]
    else:
        p[0] = [p[2]]
        p[0].extend(p[3])
        p[0] = [{'or': p[0]}]

def p_select_iter(p):
    '''select_iter : PIPE chars
                   | PIPE chars select_iter'''
    if len(p) == 3:
        p[0] = p[2]
    else:
        p[0] = [p[2]]
        p[0].extend(p[3])

def p_chars(p):
    '''chars : CHAR
             | chars CHAR'''
    if len(p) == 2:
        p[0] = [p[1]]
    else:
        p[0] = p[1]
        p[0].append(p[2])

Let's check whether the grammar above accepts the following strings.
The last one, 'a(b|(c|d))', has nested (...),
so it does not conform to the grammar.

pttrns = [
    'abc',
    '(abc)',
    '(abc|def)',
    'abc(def|ghi)(jkl|mno)pqr',
    'a(b|(c|d))',
]

parser = yacc.yacc()
for pttrn in pttrns:
    print('====')
    print('pttrn: %s' % (pttrn))
    result = parser.parse(pttrn)
    print('parsed: %s' % (result))

■Output

====
pttrn: abc
parsed: ['a', 'b', 'c']
====
pttrn: (abc)
parsed: [{'or': [['a', 'b', 'c']]}]
====
pttrn: (abc|def)
parsed: [{'or': [['a', 'b', 'c'], 'd', 'e', 'f']}]
====
pttrn: abc(def|ghi)(jkl|mno)pqr
parsed: ['a', 'b', 'c', {'or': [['d', 'e', 'f'], 'g', 'h', 'i']}, {'or': [['j', 'k', 'l'], 'm', 'n', 'o']}, 'p', 'q', 'r']
====
pttrn: a(b|(c|d))
parsed: None

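If a string conforms to the grammar, the parse result is returned;
for the last string the result is None, meaning it was judged not to conform to the grammar.
Note that in the results above the last alternative of each group is flattened into the 'or' list ('d', 'e', 'f' rather than ['d', 'e', 'f']). p_select_iter returns the chars list itself when there is a single alternative, and p_select_pttrn then extends the 'or' list with its elements. Returning [p[2]] in that branch would keep every alternative as its own list.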
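
For comparison, the same grammar is small enough to parse by hand. The following is a minimal recursive-descent style sketch that does not use ply; the function name parse_pattern is made up for this example, and unlike the ply version it keeps each alternative as its own list.

```python
def parse_pattern(pattern):
    """Parse pattern; return a list of chars and {'or': [...]} dicts, or None."""
    pos = 0
    result = []

    def read_chars():
        # chars : CHAR+  (any character except '|', '(' and ')')
        nonlocal pos
        start = pos
        while pos < len(pattern) and pattern[pos] not in "|()":
            pos += 1
        return list(pattern[start:pos])

    while pos < len(pattern):
        if pattern[pos] == "(":
            # select : LB chars (PIPE chars)* RB
            pos += 1
            alternatives = []
            while True:
                chars = read_chars()
                if not chars:
                    return None  # empty alternative or nested '('
                alternatives.append(chars)
                if pos < len(pattern) and pattern[pos] == "|":
                    pos += 1
                    continue
                break
            if pos >= len(pattern) or pattern[pos] != ")":
                return None  # missing ')' or nested '('
            pos += 1
            result.append({"or": alternatives})
        else:
            chars = read_chars()
            if not chars:
                return None  # stray '|' or ')'
            result.extend(chars)
    return result or None  # an empty pattern is not accepted


for pttrn in ["abc", "(abc|def)", "a(b|c)(d|e)f", "a(b|(c|d))"]:
    print("pttrn: %s" % pttrn)
    print("parsed: %s" % parse_pattern(pttrn))
```

As with the ply parser, a string that does not conform to the grammar yields None.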

How to get parameters in flask

2022-01-04 20:33:40 | python

A note on how to get parameters in flask.

To get a POST parameter, use request.form.get(parameter name).

@app.route(..., methods=["POST"])
def view():
  val = request.form.get('var')

To receive a file, use request.files.get(field name).

@app.route(..., methods=["POST"])
def view():
  # 'upload_file' is the name attribute of the file input in the form below
  data = request.files.get('upload_file').stream.read()

A form for sending a file is written as follows.

<form name="upload_form" method="post" action="upload" enctype="multipart/form-data">
<input type="file" name="upload_file"><br>
<input type="button" value="Upload" onclick="upload_form.submit();">
</form>

To get a GET parameter, use request.args.get(parameter name).

@app.route(...)
def view():
  val = request.args.get('var')

Listing instance variables in python

2021-12-31 14:10:01 | python

A note on how to list instance variables in python.

__dict__ and vars() let you inspect instance variables together with their values.
Looking up an undefined variable with get() returns None.

import sys

class TestClass:
    def __init__(self):
        self.var1 = "value1"
        self.var2 = "value2"

    def output_vars(self):
        print("* __dict__")
        for var, val in self.__dict__.items():
            print("%s -- %s" % (var, val))

        print("* vars()")
        for var, val in vars(self).items():
            print('%s -- %s' % (var, val))

        print("* vars().get()")
        for var in ['var1', 'var2', 'var3']:
            val = vars(self).get(var)
            print('%s -- %s' % (var, val))

t = TestClass()
t.output_vars()

Output

* __dict__
var1 -- value1
var2 -- value2
* vars()
var1 -- value1
var2 -- value2
* vars().get()
var1 -- value1
var2 -- value2
var3 -- None
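
Two points that are easy to miss: vars(obj) returns the very same dict as obj.__dict__, and class-level attributes do not appear in it. A minimal sketch, with a Sample class made up for this example:

```python
class Sample:
    shared = "class-level"  # class attribute: not part of the instance dict

    def __init__(self):
        self.var1 = "value1"
        self.var2 = "value2"


s = Sample()

print(vars(s) is s.__dict__)  # same dict object, not a copy
print("shared" in vars(s))    # class attributes are not listed
print(sorted(vars(s)))        # names of the instance variables

# Because it is the live dict, writing to it creates an instance variable.
vars(s)["var3"] = "value3"
print(s.var3)
```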

Converting a relative URL to an absolute URL in python

2021-12-28 23:32:36 | python

A note on how to convert a relative URL to an absolute URL in python.

urllib.parse.urljoin() converts a relative URL to an absolute URL.

import urllib.parse
rel_url = '../test/test.html'
base_url = 'http://localhost/dir1/dir2/dir3/page.html'
urllib.parse.urljoin(base_url, rel_url)
> 'http://localhost/dir1/dir2/test/test.html'
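
urljoin() handles the other forms of link that show up in web pages as well. The sketch below runs a few of them against the same base URL; the example.com and example.org URLs are placeholders made up for this example.

```python
from urllib.parse import urljoin

base_url = 'http://localhost/dir1/dir2/dir3/page.html'

links = [
    '../test/test.html',           # parent directory
    'other.html',                  # same directory as the base page
    '/top.html',                   # absolute path on the same host
    '//example.com/x.html',        # scheme-relative: keeps the base scheme
    'https://example.org/a.html',  # already absolute: returned unchanged
]

abs_urls = [urljoin(base_url, link) for link in links]
for link, abs_url in zip(links, abs_urls):
    print('%s -> %s' % (link, abs_url))
```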

Specifying exclusion conditions in xpath

2021-12-21 20:43:35 | python

A note on how to specify exclusion conditions in xpath.
To exclude a particular tag, write the xpath as /*[not(self::{tag})].

import sys
import lxml.html

htmlstr = """
<html>
<body>
<div class="a">
<div class="aa">div_aa</div>
<div class="ab">div_ab</div>
<div class="ac">div_ac</div>
<script></script>
<style></style>
</div>
</body>
</html>
"""

dom = lxml.html.fromstring(htmlstr)
nodes = dom.xpath('//div[@class="a"]/*[not(self::script or self::style)]')

for node in nodes:
    for text in node.itertext():
        print(text)

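When extracting text from web pages, if certain tags such as script are always to be excluded,
it may be easier to remove those tags first and then fetch the matching tags with xpath, as follows.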

nodes = dom.xpath('//script|//style')
for node in nodes:
    node.getparent().remove(node)

nodes = dom.xpath('//div[@class="a"]/*')
for node in nodes:
    for text in node.itertext():
        print(text)

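How to extract text from html with lxml

2021-12-16 23:48:44 | python

A note on how to extract text from html with lxml.

text only gives the text that appears before the first child element,
whereas itertext() yields the text of the element and all of its descendants.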

import sys
import lxml.html

htmlstr = '<html><body><div class="a">abc<h1>h1</h1>def<h2>h2</h2>ghi</div></body></html>'

dom = lxml.html.fromstring(htmlstr)
div = dom.xpath('//div[@class="a"]')[0]

print('div.text')
print(div.text)

print('div.itertext()')
print('\n'.join(div.itertext()))

Output

div.text
abc
div.itertext()
abc
h1
def
h2
ghi
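
The reason text stops at 'abc' is the ElementTree data model: text that follows a child element is stored in that child's tail, not in the parent's text. The sketch below shows this with the standard library's xml.etree.ElementTree instead of lxml (an assumption made so the example runs without extra packages; lxml elements expose the same text, tail and itertext()).

```python
import xml.etree.ElementTree as ET

div = ET.fromstring('<div class="a">abc<h1>h1</h1>def<h2>h2</h2>ghi</div>')

# Text before the first child belongs to the parent element.
print('div.text: %r' % div.text)

# Text after each child is stored in that child's tail.
for child in div:
    print('%s text=%r tail=%r' % (child.tag, child.text, child.tail))

# itertext() walks text and tail values in document order.
texts = list(div.itertext())
print(texts)
```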

Manipulating the html dom with python's lxml

2021-12-14 00:03:52 | python

Trying out html dom manipulation with python's lxml.

import sys
import lxml.html

# div/p0-p4
htmlstr = '<html><body><div class="a"><p>p0</p><p>p1</p><p>p2</p><p>p3</p><p>p4</p></div></body></html>'

dom = lxml.html.fromstring(htmlstr)
nodes = dom.xpath('//div[@class="a"]')
div = nodes[0]

# remove p0
div.remove(div[0])
print(lxml.html.tostring(dom).decode('utf-8'))

# look up the index of p2
idx = div.index(div[1])
print('idx: %d' % (idx))

# replace p2
div[idx] = lxml.html.fromstring('<p>p2 new</p>')
print(lxml.html.tostring(dom).decode('utf-8'))

# print body
print(lxml.html.tostring(div.getparent()).decode('utf-8'))

# append p5
p5 = lxml.html.fromstring('<p>p5</p>')
div.append(p5)
print(lxml.html.tostring(dom).decode('utf-8'))

Output

<html><body><div class="a"><p>p1</p><p>p2</p><p>p3</p><p>p4</p></div></body></html>
idx: 1
<html><body><div class="a"><p>p1</p><p>p2 new</p><p>p3</p><p>p4</p></div></body></html>
<body><div class="a"><p>p1</p><p>p2 new</p><p>p3</p><p>p4</p></div></body>
<html><body><div class="a"><p>p1</p><p>p2 new</p><p>p3</p><p>p4</p><p>p5</p></div></body></html>

A web server using python's flask_classful

2021-10-01 22:45:35 | python

An example of a web server using python's flask_classful.

import json
from flask import Flask
from flask_classful import FlaskView, route

class AppView(FlaskView):
    cls_var = None
    
    def index(self):
        return 'index\n'
    
    def test_url(self):
        result = 'test_url\n'
        result += json.dumps(type(self).cls_var, ensure_ascii=False, indent=2)
        result += '\n'
        return result
    
    @classmethod
    def set_class_var(cls, var):
        cls.cls_var = var
        
def main():
    app = Flask(__name__)
    var = {'key1': 'val1', 'key2': 'val2'}
    AppView.register(app, '/test')
    AppView.set_class_var(var)
    app.run()
    return 0

if __name__ == '__main__':
    res = main()
    exit(res)

Running the program above and checking the response returns the following result.

$ curl 'http://localhost:5000/test/test_url/'
test_url
{
  "key1": "val1",
  "key2": "val2"
}

Notes on programming in python, ruby and others, MySQL, server configuration and so on. Also Lego photos.