Thursday, September 8, 2011

A Simple Roll-Your-Own Documentation System

I like to roll my own documentation systems. There, I’ve said it. Not for inline documentation, mind you. For that there is Doxygen and for that I am grateful. Because while I love coding, there is fun coding and not-so-fun coding, and writing C++ parsers tends to fall in the latter category.

So for inline documentation I use Doxygen, but for everything else, I roll my own. Why?

I don’t want to use Word or Pages or any other word processing program because I want my documents to be plain text that can be diffed and merged when necessary. And I want to be able to output them as clean HTML or in any other format I may like.

I don’t want to use HTML or LaTeX or any other presentation-oriented language, because I want to be able to massage the content in various ways before presenting it. Reordering it, adding an index or a glossary, removing deprecated parts, etc. Also, writing <p> gets boring very quickly.

I don’t want to use a Wiki, because I want to check in my documents together with the code, so that code versions and document versions match in the repository. I definitely don’t want to manage five different Wikis, corresponding to different engine release versions. Also, Wiki markup languages tend to be verbose and obtuse.

I could use an existing markup language, such as DocBook, Markdown or reStructuredText. But all of them contain lots of stuff that I don’t need and lack some stuff that I do need. For example, I want to include snippets of syntax-highlighted Lua code, margin notes and math formulas. And I want to do it in a way that is easy to read and easy to write, because I want as few things as possible standing in the way of writing good documentation.

So I roll my own. But as you will see, it is not that much work.

I’ve written a fair number of markup systems over the years (perhaps one too many, but hey, that is how you learn) and I’ve settled on a pretty minimalistic structure that can be implemented in a few hundred lines of Ruby. In general, I tend to favor simple minimalistic systems over big frameworks that try to “cover everything”. Covering everything is usually impossible and when you discover that you need new functionality, the lightweight systems are a lot easier to extend than the behemoths.

There are two basic components to the system. Always two there are, a parser and a generator. The parser reads the source document and converts it to some kind of structured representation. The generator takes the structured representation and converts it to an output format. Here I’ll only consider HTML, because to me that is the only output format that really matters.

To have something concrete to talk about, let’s use this source document, written in a syntax that I just made up:

@h1 Flavors of ice cream

My favorite ice cream flavors are:

@li Strawberry
@li Seagull

The Parser


The most crucial point of the system is what the structured representation should look like. How should the parser communicate with the generator? My minimalistic solution is to just let the representation be a list of lines, with each line consisting of a type marker and some text.

(:h1, "Flavors of...")
(:empty, "")
(:text, "My favorite...")
(:empty, "")
(:li, "Strawberry")
(:li, "Seagull")

To some this will probably seem like complete heresy. Surely I need some kind of hierarchical representation. How can I otherwise represent things like a list-in-a-list-in-a-cat-in-a-hat?

No problem. To represent a list item nested in another list, I just use a @li_li tag and a corresponding :li_li type marker. If someone wants three or more levels of nesting I suggest that they rewrite their document. This is supposed to be readable documentation, not Tractatus Logico-Philosophicus. I simply don’t think that deep nesting is important enough to warrant a complicated hierarchical design. As I said, I prefer the simple things in life.

So, now that we know the output format, we can write the parser in under 20 lines:

class Parser
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Classifies one source line and appends it to @lines.
  def parse(line)
    case line
    when /^$/
      @lines << {:type => :empty, :line => ""}
    when /^@(\S+)\s+(.*)$/   # anchored, so a stray "@" mid-line stays text
      @lines << {:type => $1.intern, :line => $2}
    when /^(.*)$/
      @lines << {:type => :text, :line => $1}   # $1 drops the trailing newline
    end
  end
end

Of course you can go a lot fancier with the parser than this. For example, you can make a more Markdown-like syntax where you create lists by just starting lines with bullet points. But this doesn’t really change the basic structure. You just need to add more when clauses to your case statement.
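As a rough sketch of what one such extra clause might look like, here is a variant of the parser that also treats lines starting with "* " as list items. The class name BulletParser and the choice of "*" as the bullet character are assumptions made for this example, not part of the system described here.

```ruby
# Hypothetical parser variant: "* item" is accepted as shorthand for "@li item".
class BulletParser
  attr_reader :lines

  def initialize
    @lines = []
  end

  def parse(line)
    text = line.chomp   # strip the trailing newline before matching
    case text
    when ""
      @lines << {:type => :empty, :line => ""}
    when /\A\*\s+(.*)\z/          # the one new clause: Markdown-like bullet
      @lines << {:type => :li, :line => $1}
    when /\A@(\S+)\s+(.*)\z/
      @lines << {:type => $1.intern, :line => $2}
    else
      @lines << {:type => :text, :line => text}
    end
  end
end

parser = BulletParser.new
"* Strawberry\n@li Seagull\n".each_line {|line| parser.parse(line)}
p parser.lines
```

Both input lines end up as :li entries, so the generator does not need to know that two spellings exist.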

One useful approach, as you make more advanced parsers, is to have markers that put the parser in a particular state. For example, you could have a marker @lua that made the parser consider all the lines following it to be of type :lua until the marker @endlua was reached.
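A minimal sketch of such a stateful parser could look like this. The class name StatefulParser and the @state variable are made up for illustration, and the sketch assumes that @lua and @endlua each stand alone on a line.

```ruby
# Hypothetical stateful parser: lines between @lua and @endlua become :lua lines.
class StatefulParser
  attr_reader :lines

  def initialize
    @lines = []
    @state = nil   # nil = normal parsing, :lua = inside a @lua ... @endlua block
  end

  def parse(line)
    text = line.chomp
    if @state == :lua
      if text == "@endlua"
        @state = nil                               # leave the Lua block
      else
        @lines << {:type => :lua, :line => text}   # keep the line verbatim
      end
      return
    end
    case text
    when "@lua"
      @state = :lua                                # enter the Lua block
    when ""
      @lines << {:type => :empty, :line => ""}
    when /\A@(\S+)\s+(.*)\z/
      @lines << {:type => $1.intern, :line => $2}
    else
      @lines << {:type => :text, :line => text}
    end
  end
end

parser = StatefulParser.new
"@lua\nprint(1)\n\n@endlua\nDone\n".each_line {|line| parser.parse(line)}
p parser.lines
```

Note that the empty line inside the block is stored as a :lua line rather than an :empty one, so the generator can keep blank lines inside the code snippet.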

The Generator


A useful trick when writing HTML generators is to always keep track of the HTML tags that you have currently opened. This lets you write a method context(tags) which takes a list of tags as arguments and closes and opens tags so that exactly the tags specified in the list are open.
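To make the idea concrete before looking at the generator itself, here is a small sketch that only computes which tags would have to be closed and opened to move from one context to another. The function name context_diff is hypothetical and exists only for this illustration.

```ruby
# Returns [tags_to_close, tags_to_open] for moving from `current` to `wanted`.
# Tags are closed innermost first, so the close list is reversed.
def context_diff(current, wanted)
  shared = 0
  # Count how many leading tags the two contexts have in common.
  shared += 1 while shared < current.size && current[shared] == wanted[shared]
  [current[shared..-1].reverse, wanted[shared..-1]]
end

p context_diff(%w(ul li), %w(ul))   # [["li"], []]
p context_diff(%w(p), %w(ul li))    # [["p"], ["ul", "li"]]
```

Only the part of the tag stack that differs is touched, which is why an already open <ul> stays open between consecutive list items.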

With such a method available, it is simple to write the code for outputting tags:

class Generator
  def h1(line)
    context(%W(h1 #{"a name=\"#{line}\""}))
    print line
  end
  
  def text(line)
    context(%w(p))
    print line
  end

  def empty(line)
    context(%w())
    print line
  end
  
  def li(line)
    context(%w(ul li))
    print line
    context(%w(ul))
  end
end

Notice how this works. The li() method makes sure that we are in a <ul> <li> context, so it closes all other open tags and opens the right ones. Then, after printing its content, it says that the context should just be <ul> which forces the closure of the <li> tag. If we wanted to support the :li_li tag, mentioned above, we could write it simply as:

class Generator
  def li_li(line)
    context(%w(ul li ul li))
    print line
    context(%w(ul li ul))
  end
end

Notice also that this approach allows us to just step through the lines in the data structure and print them. We don’t have to look back and forward in the data structure to find out where a <ul> should begin and end.

The rest of the Generator class implements the context() function and handles indentation:

class Generator
  def initialize()
    @out = ""
    @context = []
    @indent = 0
  end
  
  def print(s)
    @out << ("  " * @indent) << s << "\n"
  end
  
  def open(ci)
    print "<#{ci}>"
    @indent += 1
  end
  
  def close(ci)
    @indent -= 1
    print "</#{ci[/^\S*/]}>"
  end
  
  def context(c)
    i = 0
    while @context[i] != nil && @context[i] == c[i]
      i += 1
    end
    while @context.size > i
      close(@context.last)
      @context.pop
    end
    while c.size > @context.size
      @context.push( c[@context.size] )
      open(@context.last)
    end
  end
  
  def format(lines)
    lines.each {|line| self.send(line[:type], line[:line])}
    context(%w())
    return @out
  end
end

Used as:

parser = Parser.new
text.each_line {|line| parser.parse(line)}
puts Generator.new.format(parser.lines)

So there you have it, the start of a custom documentation system, easy to extend with new tags in under 100 lines of Ruby code.

There are some things I haven’t touched on here, like TOC generation or inline formatting (bold and emphasized text). But it is easy to write them as extensions of this basic system. For example, the TOC could be generated with an additional pass over the structured data. If there is enough interest I could show an example in a follow-up post.
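As a rough idea of what such a TOC pass might look like, the sketch below walks the parsed lines, collects the :h1 entries and prepends them as ordinary :li lines containing links. The function name with_toc is an assumption, and so is the link format: it relies on each heading's anchor name being its raw text and does no escaping.

```ruby
# Hypothetical extra pass: prepend a table of contents built from the :h1 lines.
# Assumes each heading is reachable through an anchor named after its raw text.
def with_toc(lines)
  toc = lines.select {|line| line[:type] == :h1}.map do |h|
    {:type => :li, :line => "<a href=\"##{h[:line]}\">#{h[:line]}</a>"}
  end
  # An :empty line separates the TOC list from the document body.
  toc + [{:type => :empty, :line => ""}] + lines
end

lines = [
  {:type => :h1, :line => "Flavors"},
  {:type => :text, :line => "My favorite..."}
]
p with_toc(lines)
```

Because the TOC entries are plain :li lines, a generator that already handles :li needs no changes to render them.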
