class RDoc::Parser

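A parser is a simple class that subclasses RDoc::Parser and implements scan to fill in an RDoc::TopLevel with parsed data.

The initialize method takes an RDoc::TopLevel to fill with parsed content, the name of the file to be parsed, the content of the file, an RDoc::Options object, and an RDoc::Stats object used to inform the user of parsed items. The scan method is then called to parse the file and must return the RDoc::TopLevel object. Calling super in initialize stores these items in instance variables for you.

To be used by RDoc, the parser must register the file extensions it can parse. Use ::parse_files_matching to register extensions.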

require 'rdoc'

class RDoc::Parser::Xyz < RDoc::Parser
  parse_files_matching /\.xyz$/

  def initialize top_level, file_name, content, options, stats
    super

    # extra initialization if needed
  end

  def scan
    # parse file and fill in @top_level
  end
end
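
As a rough usage sketch of the registration step, the following defines a parser and checks that RDoc would pick it by file name. The class name XyzSketch and the .xyzsketch extension are made up for illustration; the scan body is a placeholder that only returns the top-level object.

```ruby
require 'rdoc'

# Hypothetical parser: XyzSketch and ".xyzsketch" are illustrative names only.
class RDoc::Parser::XyzSketch < RDoc::Parser
  # Parentheses avoid Ruby's "ambiguous first argument" warning for a regexp.
  parse_files_matching(/\.xyzsketch$/)

  def scan
    # A real parser would fill in @top_level here; scan must return it.
    @top_level
  end
end

# The newest registration sits at the front of the parsers list.
newest_regexp, newest_parser = RDoc::Parser.parsers.first
selected = RDoc::Parser.can_parse_by_name 'notes.xyzsketch'

puts newest_regexp.inspect
puts newest_parser
puts selected
```

The lookup works on the name alone here, so no file called notes.xyzsketch needs to exist.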

Attributes

parsers[R]

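An Array of arrays that maps file extension (or name) regular expressions to the parser classes that will parse matching file names.

Use parse_files_matching to register a parser's file extensions.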

file_name[R]

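The name of the file being parsed

Public Class Methods

alias_extension(old_ext, new_ext)

Aliases one extension to another. After this call, files ending in "new_ext" will be parsed using the same parser as "old_ext". Returns false if no parser handles "old_ext", true otherwise.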

# File rdoc/parser.rb, line 57
def self.alias_extension(old_ext, new_ext)
  old_ext = old_ext.sub(/^\.(.*)/, '\1')
  new_ext = new_ext.sub(/^\.(.*)/, '\1')

  parser = can_parse_by_name "xxx.#{old_ext}"
  return false unless parser

  RDoc::Parser.parsers.unshift [/\.#{new_ext}$/, parser]

  true
end
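
A small sketch of how the alias might be used. The extension .rbsketch is invented for this example; a leading dot is optional because the method strips it.

```ruby
require 'rdoc'

# ".rbsketch" is a made-up extension used only for illustration.
aliased = RDoc::Parser.alias_extension 'rb', '.rbsketch'
parser  = RDoc::Parser.can_parse_by_name 'tool.rbsketch'

puts aliased
puts parser
```

Since the alias is added with unshift, it takes precedence over any older pattern that also matches the new extension.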

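binary?(file)

Determines whether the file is a "binary" file, which basically means it has content that an RDoc parser should not try to consume. Files ending in .rdoc or .txt are never treated as binary.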

# File rdoc/parser.rb, line 73
def self.binary?(file)
  return false if file =~ /\.(rdoc|txt)$/

  s = File.read(file, 1024) or return false

  return true if s[0, 2] == Marshal.dump('')[0, 2] or s.index("\x00")

  mode = 'r:utf-8' # default source encoding has been changed to utf-8
  s.sub!(/\A#!.*\n/, '')     # assume shebang line isn't longer than 1024.
  encoding = s[/^\s*\#\s*(?:-\*-\s*)?(?:en)?coding:\s*([^\s;]+?)(?:-\*-|[\s;])/, 1]
  mode = "rb:#{encoding}" if encoding
  s = File.open(file, mode) {|f| f.gets(nil, 1024)}

  not s.valid_encoding?
end
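
The check can be tried against temporary files, as in this sketch. The file names are arbitrary; the .dat extension is chosen only so that the .rdoc/.txt shortcut does not apply.

```ruby
require 'rdoc'
require 'tmpdir'

text_result, nul_result = Dir.mktmpdir do |dir|
  text = File.join dir, 'notes.dat'
  blob = File.join dir, 'blob.dat'

  File.write    text, "plain ASCII text\n"
  File.binwrite blob, "abc\x00def" # a NUL byte marks the content as binary

  [RDoc::Parser.binary?(text), RDoc::Parser.binary?(blob)]
end

puts text_result
puts nul_result
```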

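can_parse(file_name)

Returns a parser that can handle a particular extension. Returns nil when the default RDoc::Parser::Simple would be chosen but the file is a zip archive in disguise.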

# File rdoc/parser.rb, line 106
def self.can_parse file_name
  parser = can_parse_by_name file_name

  # HACK Selenium hides a jar file using a .txt extension
  return if parser == RDoc::Parser::Simple and zip? file_name

  parser
end
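
The zip guard can be observed with two temporary .txt files, one of which starts with a zip signature. This is only a sketch; the file names and contents are invented.

```ruby
require 'rdoc'
require 'tmpdir'

plain, disguised = Dir.mktmpdir do |dir|
  readme = File.join dir, 'README.txt'
  jar    = File.join dir, 'archive.txt'

  File.write    readme, "Just text.\n"
  File.binwrite jar,    "PK\x03\x04 rest of a fake archive"

  [RDoc::Parser.can_parse(readme), RDoc::Parser.can_parse(jar)]
end

puts plain.inspect
puts disguised.inspect
```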

can_parse_by_name(file_name)

Returns the parser that can handle the extension of file_name. This does not depend upon the file being readable.

# File rdoc/parser.rb, line 119
def self.can_parse_by_name file_name
  _, parser = RDoc::Parser.parsers.find { |regexp,| regexp =~ file_name }

  # The default parser must not parse binary files
  ext_name = File.extname file_name
  return parser if ext_name.empty?

  if parser == RDoc::Parser::Simple and ext_name !~ /txt|rdoc/ then
    case mode = check_modeline(file_name)
    when nil, 'rdoc' then # continue
    else
      RDoc::Parser.parsers.find { |_, p| return p if mode.casecmp?(p.name[/\w+\z/]) }
      return nil
    end
  end

  parser
rescue Errno::EACCES
end
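
For a quick look at the mapping, the sketch below asks for the parser of a few file names. The names are illustrative and the files do not need to exist, because each extension here is matched without opening the file.

```ruby
require 'rdoc'

names = %w[lib/tool.rb ext/tool.c README.md README.rdoc]

by_name = names.to_h do |name|
  [name, RDoc::Parser.can_parse_by_name(name)]
end

by_name.each { |name, parser| puts "#{name} -> #{parser}" }
```

An unrecognized extension is different: the method then inspects the file's modeline, so the file has to exist in that case.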

check_modeline(file_name)

Returns the file type from the modeline on the first line of file_name, downcased, or nil if there is none.

# File rdoc/parser.rb, line 142
def self.check_modeline file_name
  line = File.open file_name do |io|
    io.gets
  end

  /-\*-\s*(.*?\S)\s*-\*-/ =~ line

  return nil unless type = $1

  if /;/ =~ type then
    return nil unless /(?:\s|\A)mode:\s*([^\s;]+)/i =~ type
    type = $1
  end

  return nil if /coding:/i =~ type

  type.downcase
rescue ArgumentError
rescue Encoding::InvalidByteSequenceError # invalid byte sequence

end
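
The following sketch writes a few one-line files and collects what check_modeline reports for each. The file names and the .sketch extension are arbitrary.

```ruby
require 'rdoc'
require 'tmpdir'

samples = {
  'a.sketch' => "# -*- mode: ruby; coding: utf-8 -*-\n", # explicit mode: entry
  'b.sketch' => "# -*- rdoc -*-\n",                      # bare type
  'c.sketch' => "# -*- coding: utf-8 -*-\n",             # coding only, no type
  'd.sketch' => "no modeline here\n",
}

modes = Dir.mktmpdir do |dir|
  samples.map do |name, first_line|
    path = File.join dir, name
    File.write path, first_line
    RDoc::Parser.check_modeline path
  end
end

p modes
```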

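for(top_level, content, options, stats)

Finds and instantiates the correct parser for the given content. The file name is taken from top_level.absolute_name. Returns nil if the file is binary, if no parser applies, or if a system call fails.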

# File rdoc/parser.rb, line 168
def self.for top_level, content, options, stats
  file_name = top_level.absolute_name
  return if binary? file_name

  parser = use_markup content

  unless parser then
    parse_name = file_name

    # If no extension, look for shebang
    if file_name !~ /\.\w+$/ && content =~ %r{\A#!(.+)} then
      shebang = $1
      case shebang
      when %r{env\s+ruby}, %r{/ruby}
        parse_name = 'dummy.rb'
      end
    end

    parser = can_parse parse_name
  end

  return unless parser

  content = remove_modeline content

  parser.new top_level, file_name, content, options, stats
rescue SystemCallError
  nil
end
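
The shebang fallback is the least obvious step of this method, so here it is isolated as a standalone sketch. parse_name_for is a hypothetical helper written for this example, not part of RDoc; it mirrors only the branch that substitutes 'dummy.rb' for extensionless Ruby scripts.

```ruby
# Hypothetical helper, not an RDoc API: returns the name that would be
# handed to can_parse for a given file name and content.
def parse_name_for(file_name, content)
  # Only files without an extension are checked for a shebang line.
  if file_name !~ /\.\w+$/ && content =~ %r{\A#!(.+)}
    shebang = $1
    case shebang
    when %r{env\s+ruby}, %r{/ruby}
      return 'dummy.rb' # forces selection of the Ruby parser
    end
  end

  file_name
end

puts parse_name_for('bin/tool', "#!/usr/bin/env ruby\nputs 1\n")
puts parse_name_for('bin/tool', "#!/bin/sh\necho hi\n")
puts parse_name_for('tool.py',  "#!/usr/bin/env ruby\n")
```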

new(top_level, file_name, content, options, stats)

Creates a new Parser, storing top_level, file_name, content, options and stats in instance variables. An RDoc::Markup::PreProcess object is created in @preprocess, which allows processing of directives.

# File rdoc/parser.rb, line 254
def initialize top_level, file_name, content, options, stats
  @top_level = top_level
  @top_level.parser = self.class
  @store = @top_level.store

  @file_name = file_name
  @content = content
  @options = options
  @stats = stats

  @preprocess = RDoc::Markup::PreProcess.new @file_name, @options.rdoc_include
  @preprocess.options = @options
end

parse_files_matching(regexp)

Records the file types this parser can understand.

This method may be called multiple times.

# File rdoc/parser.rb, line 203
def self.parse_files_matching(regexp)
  RDoc::Parser.parsers.unshift [regexp, self]
end
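
A sketch of calling the method more than once, and of what happens when two parsers claim the same pattern. Both classes and both extensions (.notes, .memo) are invented for illustration.

```ruby
require 'rdoc'

# Hypothetical parser registered for two made-up extensions.
class RDoc::Parser::NotesSketch < RDoc::Parser
  parse_files_matching(/\.notes\z/)
  parse_files_matching(/\.memo\z/)

  def scan
    @top_level
  end
end

# A second hypothetical parser that registers ".memo" later.
class RDoc::Parser::MemoSketch < RDoc::Parser
  parse_files_matching(/\.memo\z/)

  def scan
    @top_level
  end
end

notes_parser = RDoc::Parser.can_parse_by_name 'todo.notes'
memo_parser  = RDoc::Parser.can_parse_by_name 'todo.memo'

puts notes_parser
puts memo_parser
```

Registrations are added to the front of the list, so the most recent matching pattern is the one that is found.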

remove_modeline(content)

Removes an Emacs-style modeline from the first line of the document

# File rdoc/parser.rb, line 210
def self.remove_modeline content
  content.sub(/\A.*-\*-\s*(.*?\S)\s*-\*-.*\r?\n/, '')
end
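
A short sketch of the effect on two made-up strings: one with the modeline on the first line, one with it on the second line.

```ruby
require 'rdoc'

first_line  = "# -*- mode: ruby -*-\nputs 'hi'\n"
second_line = "#!/bin/sh\n# -*- sh -*-\necho hi\n"

stripped  = RDoc::Parser.remove_modeline first_line
untouched = RDoc::Parser.remove_modeline second_line

puts stripped.inspect
puts untouched.inspect
```

Only a modeline on the very first line is removed; the second string comes back unchanged.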

use_markup(content)

If there is a markup: parser_name comment at the front of the file, use it to determine the parser. For example:

# markup: rdoc
# Class comment can go here

class C
end

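The comment should appear as the first line of the content.

If the content contains a shebang or editor modeline, the comment may appear on the second or third line.

Any comment style may be used to hide the markup comment.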

# File rdoc/parser.rb, line 231
def self.use_markup content
  markup = content.lines.first(3).grep(/markup:\s+(\w+)/) { $1 }.first

  return unless markup

  # TODO Ruby should be returned only when the filename is correct
  return RDoc::Parser::Ruby if %w[tomdoc markdown].include? markup

  markup = Regexp.escape markup

  _, selected = RDoc::Parser.parsers.find do |_, parser|
    /^#{markup}$/i =~ parser.name.sub(/.*:/, '')
  end

  selected
end
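
The sketch below calls use_markup on three invented snippets. As the source above shows, tomdoc and markdown select the Ruby parser, and other names are matched against the last component of each registered parser's class name.

```ruby
require 'rdoc'

rd       = RDoc::Parser.use_markup "# markup: rd\n# Class comment\n"
markdown = RDoc::Parser.use_markup "#!/usr/bin/env ruby\n# markup: markdown\n"
none     = RDoc::Parser.use_markup "class C\nend\n"

puts rd.inspect
puts markdown.inspect
puts none.inspect
```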

zip?(file)

Checks if file is a zip file in disguise. Signatures from www.garykessler.net/library/file_sigs.html

# File rdoc/parser.rb, line 93
def self.zip? file
  zip_signature = File.read file, 4

  zip_signature == "PK\x03\x04" or
    zip_signature == "PK\x05\x06" or
    zip_signature == "PK\x07\x08"
rescue
  false
end
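
A sketch of the signature check on temporary files. The names and contents are invented; the third path deliberately does not exist, to show the rescue branch.

```ruby
require 'rdoc'
require 'tmpdir'

zip_like, text, missing = Dir.mktmpdir do |dir|
  archive = File.join dir, 'archive.txt'
  plain   = File.join dir, 'plain.txt'

  File.binwrite archive, "PK\x03\x04 fake payload"
  File.write    plain,   "PK is just text here\n"

  [
    RDoc::Parser.zip?(archive),
    RDoc::Parser.zip?(plain),
    RDoc::Parser.zip?(File.join(dir, 'missing.txt')), # read error is rescued
  ]
end

p [zip_like, text, missing]
```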

Public Instance Methods

handle_tab_width(body)

Normalizes tabs in body by expanding them to spaces according to the tab_width option

# File rdoc/parser.rb, line 274
def handle_tab_width(body)
  if /\t/ =~ body
    tab_width = @options.tab_width
    body.split(/\n/).map do |line|
      1 while line.gsub!(/\t+/) do
        b, e = $~.offset(0)
        ' ' * (tab_width * (e-b) - b % tab_width)
      end
      line
    end.join "\n"
  else
    body
  end
end
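
Because the method needs a parser instance, the expansion is easier to try as a standalone sketch. expand_tabs is a hypothetical helper that copies the arithmetic above, with the tab_width argument standing in for @options.tab_width (8 is assumed as the default here).

```ruby
# Hypothetical standalone copy of the expansion logic, not an RDoc API.
def expand_tabs(body, tab_width = 8)
  return body unless body.include?("\t")

  body.split(/\n/).map do |line|
    # Replace each run of tabs with enough spaces to reach the next tab stop.
    1 while line.gsub!(/\t+/) {
      b, e = $~.offset(0)
      ' ' * (tab_width * (e - b) - b % tab_width)
    }
    line
  end.join "\n"
end

puts expand_tabs("a\tb").inspect
puts expand_tabs("\t\tx", 4).inspect
```

With a width of 8, the tab after "a" becomes seven spaces so that "b" lands on column 8.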