3日坊主日記
2007-02-11
A thread for building a cross-conversion table of mobile phone emoji codes
Apparently mobile phones use different emoji codes on the Web and in mail. I want to use emoji in mail sent from a PC or server (using Ruby) to mobile phones.
Rules:
- Start from the data the carriers publish.
- Convert it electronically, not by hand.
I sense ill will in the fact that the data is not provided as plain text in the first place.
EZweb (AU)
Extract the text elements from typeD.pdf with xdoc2txt.
$ xdoc2txt -f typeD.pdf
Pull the conversion table out of the resulting typeD.txt.
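# Read the text that xdoc2txt extracted from the PDF.
src = IO.read('typeD.txt')
# /s is the Ruby 1.8 regexp option for matching Shift_JIS text.
src.scan(/[0-9A-F]{16,18}/s) do |str|
  # Keep only the rightmost 16 hex digits (one table row).
  puts str[-16,16]
end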
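The pattern picks up as many as 18 characters, and the rightmost 16 are kept. This is needed because a title that contains characters from [0-9A-F] can run straight into a table row. In the example below, the title 電源OFF ("power off") ends in "FF", which is also valid hex:
電源OFFF364EA917945ED64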
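To make the trimming step concrete, here is a minimal sketch that runs the same pattern over the sample line above and over a clean row. It assumes the text is already a UTF-8 string, so the Ruby 1.8 `/s` option is dropped; the names `lines` and `rows` are illustrative only.

```ruby
# Two sample inputs: a row with a title glued to the front, and a clean row.
lines = [
  "電源OFFF364EA917945ED64",
  "F659E481753AEB59"
]

# Match 16 to 18 hex digits, then keep the rightmost 16 of each match.
rows = lines.flat_map do |line|
  line.scan(/[0-9A-F]{16,18}/).map { |hit| hit[-16, 16] }
end

rows.each { |row| puts row }
```

Note that this only absorbs up to two stray hex characters from a title. A title ending in three or more hex characters would still shift the row, so the output is worth checking by eye.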
Each row of the conversion table obtained this way holds the following fields, in order:
- Shift-JIS code for KDDI emoji
- Unicode
- JIS code for outgoing e-mail
- (Reference) Shift-JIS code corresponding to the JIS code for outgoing e-mail
F659E481753AEB59 F75EE542773FEC5E F65AE482753BEB5A F75FE5437740EC5F ...
In Ruby the table can be used like this.
while line = f.gets
sjis, unicode, email_jis, email_sjis = line.chomp.unpack("A4A4A4A4")
...
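As a small worked example of the field split, the first row of the extracted table can be unpacked on its own and its Unicode column turned into an actual character. This is only a sketch: `row` and `char` are illustrative names that do not appear in the original script.

```ruby
# First row from the extracted KDDI table.
row = "F659E481753AEB59"

# Split into four 4-digit hex fields, in the order the table lists them.
sjis, unicode, email_jis, email_sjis = row.unpack("A4A4A4A4")

# Turn the Unicode column into a one-character UTF-8 string.
char = [unicode.hex].pack("U")

puts "sjis=#{sjis} unicode=U+#{unicode} email_jis=#{email_jis} email_sjis=#{email_sjis}"
```

The same `unpack` call works unchanged inside a loop over every row of the table.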
SoftBank
Use scrapi.
require 'rubygems'
require 'scrapi'
require 'open-uri'
require 'cgi'
# html = File.read('picword_01.php')
uri = 'http://developers.softbankmobile.co.jp/dp/tool_dl/web/picword_01.php'
scraper = Scraper.define do
  selector :select_font, "td>font.j10"
  process "table[width=100%]>tr" do |tr|
    unicode, webcode = select_font(tr).map { |font|
      text = self.class.text(font)
      text = CGI.unescapeHTML(text)
    }
    p [unicode, webcode]
    true
  end
end
scraper.parser_options :show_warnings => true, :char_encoding => 'shiftjis'
# scraper.scrape(html)
scraper.scrape(URI.parse(uri))
This yields a Unicode code and a Web code for each emoji.
When using emoji in mobile mail, is it enough to take the Unicode value and send it as UTF-8 with base64 encoding? I am not sure.
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64
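If that guess holds, the message body could be assembled roughly as follows. This is a sketch under assumptions, not a confirmed recipe: whether the carrier actually accepts UTF-8 emoji sent this way is exactly the open question above. U+E481 is borrowed from the Unicode column of the first KDDI row purely as a sample code point, and `body`, `encoded` and `message` are illustrative names.

```ruby
# Sample body: plain text followed by one emoji code point (assumed: U+E481).
body = "Hello " + [0xE481].pack("U")

# Base64-encode the UTF-8 bytes. "m0" emits no line breaks, which is fine
# for a short body; longer bodies need wrapping at 76 characters per line.
encoded = [body].pack("m0")

# Headers, a blank separator line, then the encoded body.
message = [
  "Content-Type: text/plain; charset=UTF-8",
  "Content-Transfer-Encoding: base64",
  "",
  encoded
].join("\r\n")

puts message
```

`Array#pack` is used instead of the `base64` library only to keep the sketch free of extra requires.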
Across carriers
The rest is easy.
Thanks.