ヒビルテ(2005-02-09)

日々の流転 (Days in Flux)

2005-02-09

λ. Solving Text.Regex's inability to handle Japanese with Oniguruma (鬼車)

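A Haskell topic for the first time in a while. As of GHC-6.2.2, Text.Regex cannot properly handle non-ASCII characters, and this had been annoying me more and more, so I fixed it using Oniguruma (鬼車).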

The current (GHC-6.2.2) implementation of Text.Regex doesn't handle the full range of Unicode. It can handle only ASCII (or maybe Latin-1) characters, which are a very small subset of Unicode. This limits the usefulness of Text.Regex, since many natural languages (such as Japanese) need non-ASCII characters.

Therefore I modified the implementation of Text.Regex to use Oniguruma (鬼車), which is a very powerful regular expression library. Happily, it supports UTF-32, so the new implementation can handle the full range of Unicode.
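Why UTF-32 is such a good fit: a Haskell Char is a single Unicode code point, so a String maps onto UTF-32 code units one-to-one with no multi-unit sequences to worry about. The following is only a small sketch of that correspondence, not the code from the patches; the names toUTF32 and fromUTF32 are made up for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word32)

-- Hypothetical helper (not from the patch): encode a String as
-- UTF-32 code units, exactly one Word32 per Char.
toUTF32 :: String -> [Word32]
toUTF32 = map (fromIntegral . ord)

-- Hypothetical helper: decode UTF-32 code units back into a String.
-- Assumes every unit is a valid Unicode code point.
fromUTF32 :: [Word32] -> String
fromUTF32 = map (chr . fromIntegral)

main :: IO ()
main = do
  let s     = "鬼車 Oniguruma"
      units = toUTF32 s
  -- One code unit per character, so match offsets in the UTF-32
  -- buffer are also character offsets in the String.
  print (length units == length s)
  -- The conversion round-trips without loss.
  print (fromUTF32 units == s)
```

Both lines print True. With a variable-width encoding such as UTF-8, offsets reported by the regex engine would have to be translated back into character positions; with UTF-32 that step is presumably unnecessary.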

for Hugs98-Mar2005 (updated 2005-05-07)

Install Oniguruma version 3 (or later).
Apply hugs98-Mar2005-oniguruma3-2.patch to the Hugs source tree.
Run autoconf in both the top directory and the `libraries/base' directory.
Build hugs as usual.

for GHC-6.4 (updated 2005-05-04)

Install Oniguruma version 3 (or later).
Apply ghc-6.4-oniguruma3-1.patch to the GHC source tree.
Run autoconf in the `libraries/base' directory.
Build GHC as usual.

for GHC-6.2.2

ghc-6.2.2-onigd20050204-1.patch.gz
Apply this patch, run autoconf, and build with --enable-oniguruma passed to ./configure; Oniguruma will then be used.

Tags: haskell




