
3日坊主日記 (Three-Day Monk Diary)


2007-02-11

A thread for building a cross-conversion table of mobile phone emoji codes

Apparently mobile phones use different emoji codes for the Web and for mail. I want to use emoji in mail sent to mobile phones from a PC or a server (using Ruby).

Rules:

  1. Start from the data the carriers publish.
  2. Convert it electronically.

That the carriers do not provide this data as plain text in the first place feels like malice.

EZweb (AU)

Extract the text elements from typeD.pdf with xdoc2txt.

$ xdoc2txt -f typeD.pdf

Pull the conversion table out of the resulting typeD.txt.

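# Read the text that xdoc2txt extracted from the PDF.
src = IO.read('typeD.txt')
# /s marks the pattern as Shift_JIS (Ruby 1.8 regexp encoding flag).
src.scan(/[0-9A-F]{16,18}/s) do |str|
  puts str[-16,16]  # keep only the rightmost 16 hex digits
end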

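The pattern picks up as many as 18 characters, and the code then keeps the rightmost 16. This is needed because a title ending in characters from [0-9A-F] gets glued to the front of a table row. Example:

電源OFFF364EA917945ED64

Here the title 電源OFF ("power off") contributes its trailing "FF" to the match, ahead of the real row F364EA917945ED64.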
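
To see the trimming in isolation, here is a minimal sketch run against the example line only. It assumes Ruby 1.9 or later with UTF-8 source, so the Ruby 1.8 `/s` flag is dropped; the variable names `line` and `rows` are illustrative, not part of the original script.

```ruby
# The example line from above: a title glued to the front of a table row.
line = "電源OFFF364EA917945ED64"

# Match 16 to 18 hex-looking characters, then keep the rightmost 16.
rows = line.scan(/[0-9A-F]{16,18}/).map do |match|
  match[-16, 16]
end

puts rows
```

The allowance of two extra characters is a heuristic: a title ending in three or more hex-looking characters would still leak into the row.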

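The conversion table obtained this way has the following format: each row is four 4-digit hexadecimal codes, in this order.

  1. Shift-JIS code for KDDI emoji
  2. Unicode
  3. JIS code for outgoing e-mail
  4. (For reference) Shift-JIS code corresponding to the JIS code for outgoing e-mail

F659E481753AEB59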
F75EE542773FEC5E
F65AE482753BEB5A
F75FE5437740EC5F
...

In Ruby, the table can be used like this.

while line = f.gets
  sjis, unicode, email_jis, email_sjis = line.chomp.unpack("A4A4A4A4")
...
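
One way the loop above might be completed is to index the rows by Unicode value. This is only a sketch: the `Emoji` struct and the `table` hash are hypothetical names, and the two rows are the first two sample rows shown earlier rather than the full table.

```ruby
# Sample rows copied from the table above; in practice they would be
# read from the file produced by the extraction step.
rows = <<~EOS
  F659E481753AEB59
  F75EE542773FEC5E
EOS

# Hypothetical record type for one table row.
Emoji = Struct.new(:sjis, :unicode, :email_jis, :email_sjis)

# Index the table by Unicode code point (as an Integer).
table = {}
rows.each_line do |line|
  fields = line.chomp.unpack("A4A4A4A4")
  # Skip anything that is not four 4-digit hex fields.
  next unless fields.all? { |f| f.to_s.match?(/\A[0-9A-F]{4}\z/) }
  entry = Emoji.new(*fields)
  table[entry.unicode.hex] = entry
end

puts table[0xE481].email_jis
```

Keying on the Unicode column is a design choice for illustration; any of the four columns could serve as the key, depending on which direction the conversion runs.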

SoftBank

Use scrapi.

require 'rubygems'
require 'scrapi'
require 'open-uri'
require 'cgi'

# html = File.read('picword_01.php')
uri = 'http://developers.softbankmobile.co.jp/dp/tool_dl/web/picword_01.php'

scraper = Scraper.define do
  selector :select_font, "td>font.j10"
  process "table[width=100%]>tr" do |tr|
    unicode, webcode = select_font(tr).map { |font|
      text = self.class.text(font)
      text = CGI.unescapeHTML(text)
    }
    p [unicode, webcode]
    true
  end
end
scraper.parser_options :show_warnings => true, :char_encoding => 'shiftjis'
# scraper.scrape(html)
scraper.scrape(URI.parse(uri))

This yields a Unicode value and a Web code for each emoji.

When using emoji in mobile mail, is it enough to take the Unicode value and send it as UTF-8 with base64? I am not sure.

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64
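
The mechanics of that idea can be sketched with the standard `pack` directives. This shows only how such a body would be built; whether handsets actually render it is exactly the open question above. The code point E481 is taken from the EZweb sample rows, and the greeting text and variable names are made up for illustration.

```ruby
# Unicode value from the second column of a table row (hex string).
codepoint = "E481".hex

# Build a UTF-8 string containing the emoji, then base64-encode it.
body = "Hello " + [codepoint].pack("U")
encoded = [body].pack("m")   # base64 text; pack inserts the line breaks

headers = [
  "Content-Type: text/plain; charset=UTF-8",
  "Content-Transfer-Encoding: base64",
]

# Headers, a blank line, then the encoded body.
message = headers.join("\n") + "\n\n" + encoded
puts message
```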

Between carriers

The rest is easy.

Thanks.
