Ticket #233 (assigned enhancement)

Opened 4 months ago

Last modified 1 month ago

Add support for literate programming

Reported by: oblomov Assigned to: oblomov (accepted)
Priority: must Milestone: current
Component: parser Version: current
Severity: bad Keywords:
Cc: bjorn.buckwalter@gmail.com

Description

Hello,

I'm working on the implementation of a polyglot to support literate programming. I have some code ready to support WEB (Pascal + TeX, the 'first' literate programming framework) and CWEB (C + TeX), which I'm keeping in my git copy of the ohcount repository.

I've hit two problems so far.

One of them is a segmentation fault, which I can work around by setting MAX_CS_STACK to 2048 in common.h. 1024 is still not enough, and I haven't investigated further to find the sweet spot. The value seems to depend on the length of the file that gets parsed, so I suspect it might be some kind of parser state leak.

The second problem is the matter of code vs comment. Consider the typical WEB file: you can find TeX code and comments in it, and Pascal code and comments. The TeX code is actually the 'comment' area of the source file. This is why I tried two different approaches.

In one of them, the non-program code is considered comment, without further specification (lp-polyglot branch on my git); this correctly evaluates the comment/code ratio, but fails to identify the language used in the documentation area of the file (so it doesn't count towards TeX experience, for example).

The other approach parses the TeX code as TeX code and the Pascal code as Pascal code. This contributes correctly to the 'programming experience' of the authors, but fails to properly identify the comment/code ratio of the source.

So my second question is: is it possible to both identify the sections for their 'language', and consider one language as documentation of the other?
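The trade-off between the two approaches can be made concrete with a toy tally. This is only a minimal sketch in Python, not ohcount code: the region splitter recognises just a few CWEB control codes ("@ " and "@*" for the TeX part, "@c" and "@<" for the C part), and the names `regions`, `tally` and `doc_language` are hypothetical.

```python
from collections import Counter

# Toy CWEB-like input. Real CWEB has many more control codes; this sketch
# only treats "@ " / "@*" as the start of a TeX part and "@c" / "@<" as the
# start of a C part.
SOURCE = """\
@* Greeting. This program prints a greeting.
It is documented in \\TeX.
@c
#include <stdio.h>
/* entry point */
int main(void) { puts("hi"); return 0; }
"""


def regions(text):
    """Yield (region, line) pairs, where region is 'doc' or 'prog'."""
    region = "doc"
    for line in text.splitlines():
        if line.startswith(("@ ", "@*")):
            region = "doc"
        elif line.startswith(("@c", "@<")):
            region = "prog"
        yield region, line


def tally(text, doc_language):
    """Count every line exactly once under a (language, kind) key.

    doc_language=None  -> documentation lines become comments of the program.
    doc_language="tex" -> documentation lines become code in their own language.
    """
    counts = Counter()
    for region, line in regions(text):
        if region == "doc":
            key = ("c", "comment") if doc_language is None else (doc_language, "code")
        elif line.lstrip().startswith("/*"):
            key = ("c", "comment")
        else:
            key = ("c", "code")  # the "@c" marker line is counted as code here
        counts[key] += 1
    return counts


as_comment = tally(SOURCE, doc_language=None)
as_tex = tally(SOURCE, doc_language="tex")
```

With the first setting the two TeX lines raise the C comment count; with the second they show up as TeX code and the C part looks almost uncommented. In both cases the totals add up to the number of lines in the file, because each line is assigned a single key.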

If I can solve these two issues (identification and stack overrun) I'll be able to submit the code for inclusion.

Attachments

litprog.zip (21.5 kB) - added by oblomov on 03/16/2008 02:58:44 AM.
Patchset of my litprog branch

Change History

03/09/2008 01:44:26 AM changed by oblomov

I've solved the stack overrun problem (a programming error on my side), but I'm still not sure what the proper way is to have the TeX part count as comment for the program but as 'TeX code' in its own right, other than by ignoring the fact that it's TeX. Suggestions?

03/14/2008 04:39:10 PM changed by robin@ohloh.net

I think this is a case where you'll have to pick your poison. You have your choice between having well-commented C with zero TeX, or C code mixed with TeX but zero C comments. I don't see a way around this, because Ohcount has a basic requirement that every source line gets counted exactly once. That means, for example, that you can't count a line both as a C comment and as TeX code.

Most other languages support inline documentation in some way, such as Perl POD. In those cases, we simply consider the entire file to be Perl, and the inline documentation is simply considered Perl comments. As another example, C# allows documentation to be placed in XML format in comment blocks, but we don't consider that as XML; we just count it as C# comments. I'm not saying this is the best solution, but if you want to be consistent with other languages, that's how it's being done so far.

nothingmuch once made an interesting comment in the Ohloh forums (http://www.ohloh.net/forums/3491/topics/1090): Why does the number of lines of code in each language have to add up to equal the number of carriage returns in the file? Why can't a line be counted both as a C comment and as TeX code? Well, maybe someday Ohcount will allow this, but that doesn't help you today. :-)

I'd simply count the TeX as C comments in this case; but I see your point and there's no best answer here. Please yourself :-)

03/15/2008 01:42:06 AM changed by oblomov

  • owner changed from robin@ohloh.net to oblomov.
  • status changed from new to assigned.

Yes, I've been thinking about it these days and I came to the same conclusion: marking the documentation part of literate programs as comments is the 'best' way, although it does make my heart bleed to think of all that wasted TeX coding ;-)

03/15/2008 03:58:54 AM changed by oblomov

Why did I change the owner to myself? /me scratches head ...

Ok, what I'm doing is the following: I will create a (very generic) "documentation" monoglot (where every line is a comment), and then create the literate programming languages using the Biglot class which I introduced in the metapost patchset (#231).
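The composition described here can be sketched roughly as follows. This is a Python illustration only, not the code in the patchset: `Documentation`, `ToyHaskell` and `BirdTrackLiterate` are hypothetical names, the real Biglot interface is not shown in this ticket, and giving the documentation part the program language's name (so its lines count as that language's comments) is an assumption. Literate Haskell in bird-track style is used as the example because a leading ">" is enough to tell program lines from documentation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Documentation:
    """Hypothetical 'documentation' monoglot: every non-blank line is a comment."""
    name: str

    def classify(self, line):
        return (self.name, "comment" if line.strip() else "blank")


@dataclass
class ToyHaskell:
    """Hypothetical toy monoglot that only knows '--' line comments."""
    name: str = "haskell"

    def classify(self, line):
        stripped = line.strip()
        if not stripped:
            return (self.name, "blank")
        return (self.name, "comment" if stripped.startswith("--") else "code")


@dataclass
class BirdTrackLiterate:
    """Stand-in for the two-language idea: one parser per region of the file."""
    doc: Documentation
    prog: ToyHaskell

    def parse(self, text):
        tags = []
        for line in text.splitlines():
            if line.startswith(">"):
                # Program line: drop the bird track and let the program parser decide.
                tags.append(self.prog.classify(line[1:]))
            else:
                tags.append(self.doc.classify(line))
        return tags


SAMPLE = """\
Compute a square.

> -- helper
> square :: Int -> Int
> square x = x * x
"""

literate_haskell = BirdTrackLiterate(Documentation("haskell"), ToyHaskell())
tags = literate_haskell.parse(SAMPLE)
totals = Counter(kind for _, kind in tags)
```

Each line still receives exactly one tag, so the prose line and the `-- helper` line both end up as Haskell comments and the file's comment/code ratio reflects the documentation.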

03/16/2008 02:58:44 AM changed by oblomov

  • attachment litprog.zip added.

Patchset of my litprog branch

03/16/2008 03:02:05 AM changed by oblomov

The .zip file I attached contains the patchset for my litprog branch, which implements some literate programming languages (namely, WEB, CWEB and Literate Haskell). It is based on top of my MetaFont and MetaPost patches (#231) and the Pascal string fix (#239). Also, since it implements Literate Haskell, when you commit these patches you can close ticket #223 too.

05/30/2008 06:29:29 AM changed by airforce1

06/02/2008 08:33:41 AM changed by bjornbm

I'm wondering if any progress is being made on this. Is there any chance we could see oblomov's patch applied soon? (I'm particularly interested in the literate Haskell support.)

06/02/2008 08:37:54 AM changed by bjornbm

  • cc set to bjorn.buckwalter@gmail.com.