Re: RFC: Database Report Generator

From: Christopher Browne (cbbrowne_at_acm.org)
Date: 02/05/04


Date: 5 Feb 2004 04:02:52 GMT

roger@whinlatter.uklinux.net (Roger Leigh) wrote:
> After considering your suggestions, it seems that m4 will not be
> sufficiently powerful, and will certainly be arcane when trying to
> accomplish complex tasks. I think perhaps Perl will be a better
> choice--but ideally I'd like for the person writing the report to
> not require a deep understanding of programming.

I'm not sure that there is necessarily any way around that.

After all, there's the part of constructing the SQL queries that will
provide the desired data. That is likely to require _some_ degree of
"programming sophistication," with the further bit of "deepness" that
`thinking relationally' is rather different from `thinking
procedurally.'

> My ideal (pipe-dream) solution would be a normal troff (-ms) file,
> with extra macros to control looping, variable substitution etc., and
> so to the writer it would just be like writing a normal document, with
> some additional SELECT statements. The problem is doing more complex
> stuff:
>
>> > Ideally, and "all-in-one" solution would be preferable, but I need to:
>> > 1) Have a very simple interface so that non-programmers may use it.
>> > 2) Be reasonably extensible, but I'm not bothered about providing a
>> > "full" programming language--if you want that, you may as well use
>> > Perl or Python directly.
>> > 3) Cope with nested queries and expanding the input text
>> > appropriately.
>> > 4) Capable of doing more complex reporting e.g. using plot, gnuplot, R
>> > or other external programs to generate input e.g. figures, while
>> > still having a simple interface.
>>
>> The sorts of things I would have expected to see are thus:
>>
>> 1. Dealing with physical markup, such as formatting numbers, picking
>> typeface modifiers (size, bold/italics, fonts, and such);

> I'd delegate all the markup to the markup language (troff, latex
> etc.), which is entirely the user's choice. WRT numerical
> formatting, I'd like to have that handled by the select statement
> wherever possible. If I go for Perl, that should be able to handle
> more special requirements.

I agree on the numerical formatting part; it is a _great_ convenience
to assign some of the complexity to the DBMS particularly when it can
nicely support that.

I had a chat with one of the core PG guys last week, discussing a
proposed "data conversion" tool; we were, at first, debating what kind
of language to "link in" to allow transforming incoming data. I think
I was the one that had the "news flash" that it was a BAD idea to do
that, but that we should instead use the capability to implement
stored procedures on the SQL side. He's a Tcl fan; I'm a polyglot,
with a bit of a Lisp bias; either would lead to ways of integrating in
transformations that only he or I would ever use as much as once.
Using procedures on the DBMS side makes it all USABLE...

As for the "delegation to markup language," I _can't_ agree.

It _can't_ be delegated totally, as control over what gets rendered
has to occur in the "source" system.

If the destination is to be TeX, then _some_ component at the
"generating what is to be rendered" level has to throw in "{\it " and
"}" to italicize text.

Similarly, something at the "generation" level must generate "<i>" and
"</i>" if the output is to be HTML.

>> 2. Rendering values into columns, and controlling width and
>> alignment;
>
> I'd delegate this to the markup language, too. i.e. I don't want the
> report generator to have any knowledge of this: it would just be a
> regular .TS/.TE (or LaTeX tablular environment).
>
>> 3. Putting values into headers and footers, including raw text,
>> page numbers, section identifiers, and perhaps range indicators
>> (e.g. - this page presents names "ABE to BRO"), and totals;
>
> This is where m4 would fall down. My problem here is that this stuff
> requires knowledge of the page layout, and ideally I don't want to
> have to care about that--the typesetter (groff etc.) will take care of
> that. Doing this stuff would require some stuff being done "manually"
> in a programming language, such as constraining table length and
> generating totals at the end of pages. This is probably not
> avoidable, however.

I don't think it is avoidable, and there is a further bit of
entertainment if we consider HTML.

In HTML, a "page" is not a *** of paper, but rather has the
possibility of scrolling, to an indeterminate degree. (And there are
big disadvantages to having _real_ long pages...)

>> 4. Generalizing #3 further since you might have subheaders and
>> subtotals.
>
> Agreed.
>
>> Those are the vital sorts of abstractions needed in reporting, and
>> the things that I'd expect the focus to be on. In effect, what is
>> needed is a "little language" that has all of those things as "base
>> abstractions" alongside looping constructs (that probably pull data
>> via cursors) to pull data from query fields to put them into the
>> columns and such.
>
> I guess I could write a Perl module to do some basics, and leave the
> rest of Perl available for use as required. My problem with this is
> that the report becomes a script, rather than a datafile read by the
> program, and although the distiction can become blurry, I'd rather
> the people writing the reports didn't feel they were writing
> programs, just fancy text.

My suspicion (and while I wouldn't call it "provably right," I do have
enough relevant battle scars that "suspicion" may understate
things...) is that the only way to evade the notion of "writing
programs" is to have a set of abstractions sufficiently "canned" that
you build reports by dragging graphical components into place in a
quasi-pictorial presentation of the report.

>> I put together a "straw man" proposal of this very sort for GnuCash a
>> few years ago, albeit involving:
>> a) Guile Scheme as the language,
>> b) GnuCash's transaction APIs as the data source,
>> c) The assumption that HTML was a primary data destination (but
>> note I drew inspiration from DocBook, and did keep text output
>> in mind)
>> URL found below...
>
> Sorry, but the URI didn't work. Is it correct?

Evidently not; it's in my CVS repository, but apparently is excluded
from the "pushes" to production...

> Many thanks for your comments--I've got a lot to consider still!

I hope it's helpful. I am a little skeptical that it'll "turn out;" I
don't yet hear something that is The Right Answer.

I consider it a couple of ways:

 1. A successful notion may emerge, and that would be fabulously
     worthwhile.

     We could really use a good framework at work for generating
     reports that can cope with both interactive and batch needs.
     There are a lot of ad-hoc methods in use, and believe me, if a
     great answer falls out, I'm in to using it.

 2. If the "project" is to fail, it is better to poke the holes in it
     early so that there's no lingering, painful, death that chews up
     a lot of your time. If it's looking bad, I'll try to convince
     you to give up, and I think I'm looking out for your interests
     in that :-).

-- 
output = reverse("gro.mca" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/internet.html
Mental health is overrated!!