Re: Method/Program for small "database" with some pictures entries

From: Netocrat (netocrat_at_dodo.com.au)
Date: 05/30/05

    Date: Mon, 30 May 2005 12:03:39 GMT
    
    

    Anton Suchaneck wrote:

    > I want to write as little code as possible. That is not to save time, but
    > because I believe that if a task is written with very few line, it can't
    > take long to modify or improve it.

    Makes sense.

    >> a) how is/will my data be structured
    > In this particular example each picture has a set of parameters.
    > In some else I need to write to get my data sorted, I will have a set of
    > properties which can be defined or undefined, e.g.:
    >
    > graph1.txt:
    > prop_A 1 2883 "abc"
    > prop_B 0
    >
    > graph2.txt:
    > prop_B 1
    > prop_C 8128

    I notice that graph2 doesn't have prop_A and graph1 doesn't have prop_C, so
    how different will the properties applicable to each graph be? How many
    different properties do you envisage? Will you want to be able to easily
    add different properties as time progresses?
     
    >> b) how important is speed of query and storage?
    > Not important. Simple self-made solutions should not exceed the computer's
    > capacity.
    >
    >> c) what size is my database going to be?
    > Rather small (in fact those picture are graphs from scientific papers)

    I was thinking not so much in terms of image size but more of the number of
    images and the amount of data associated with each image.

    >> a reasonable option seems to be extended file attributes.
    > Sounds pretty cool to me. I haven't heard of them before. Maybe I'll have
    > a look soon.
    > So can I store plain strings on files then?

    Sure. There are some simple command line tools. Here is an example usage
    on a file called graph1.gif in the current directory:
    $ setfattr -n user.prop_A -v '1 2883 "abc"' graph1.gif
    $ getfattr -n user.prop_A graph1.gif
    # file: graph1.gif
    user.prop_A="1 2883 \"abc\""
    $ setfattr -x user.prop_A graph1.gif
    $ getfattr -n user.prop_A graph1.gif
    graph1.gif: user.prop_A: No such attribute
    $ setfattr -n user.prop_B -v 0 graph1.gif
    $ setfattr -n user.prop_A -v '1 2883 "abc"' graph1.gif
    $ getfattr -d graph1.gif
    # file: graph1.gif
    user.prop_A="1 2883 \"abc\""
    user.prop_B="0"

    Notice that the property name has to be prepended with "user.". You can
    specify a particular name with -n and value with -v; -x removes the
    attribute. -d prints all available attributes. These commands correspond
    to very similar functions available through glibc so it's easy to code them
    in C.
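
    If you would rather script it, here is a minimal sketch of the same
    set/get/remove cycle. It assumes Python 3 on Linux, where the os module
    wraps the same system calls; the helper name xattr_roundtrip is made up
    for illustration, and it returns None when the platform or filesystem has
    no user attribute support.

```python
import os
import tempfile


def xattr_roundtrip(path, name, value):
    """Set, read back and remove one extended attribute.

    Returns the stored bytes, or None if extended attributes are
    unavailable (non-Linux platform or unsupported filesystem).
    """
    if not hasattr(os, "setxattr"):        # os.*xattr exists only on Linux
        return None
    try:
        os.setxattr(path, name, value)     # like: setfattr -n NAME -v VALUE FILE
        stored = os.getxattr(path, name)   # like: getfattr -n NAME FILE
        os.removexattr(path, name)         # like: setfattr -x NAME FILE
        return stored
    except OSError:                        # e.g. filesystem without user xattrs
        return None


if __name__ == "__main__":
    # Stand-in for graph1.gif: an empty temporary file.
    with tempfile.NamedTemporaryFile(suffix=".gif") as f:
        print(xattr_roundtrip(f.name, "user.prop_A", b'1 2883 "abc"'))
```

    Values are raw bytes, so any parsing of '1 2883 "abc"' into separate
    fields is up to your own code.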

    For all intents and purposes, graph1.gif appears unchanged to other programs
    not accessing the extended attributes.

    >> The simple option in this case where speed is important is a standalone
    >> database file accessed through a database routine library.
    > I'm a bit frightened of setting up a database and then having cryptic
    > files which I cannot "control". Luckily for BibTeX references JabRef
    > proved to be useful.
    >> On my system I have man references for standalone
    >> database files accessed through C (man 3 dbopen) and Perl (man 3
    >> DB_File). Try one of those or hunt down another library.
    > But as it seem I'm often confronted with database-like task, I might get
    > into this stuff. I thought they are complicating until I saw that in the
    > basic approach you use just an "outer product of tuples".

    Sure, databases are not too hard to get into, although they can grow very
    complicated if need be. Your needs don't sound extreme so I don't think
    you need fear a database.

    From what you've described, these are your storage options with pros and
    cons:

    a) Extended attributes:
    *pros
            - all data is linked directly to the image file and there is no need to be
    concerned about synchronising with a separate database when you delete/add
    images. This is of note to you since it reduces the amount of code you
    have to write and limits possible inconsistencies.
            - simple mental model - each file has a list of attribute/value pairs that
    you set/read with a single function
            - probably more space efficient than a database if you are going to have
    lots of different properties for different images
    *cons
            - easy to lose attributes if you copy files
            - slightly trickier to use with a scripting language since I doubt that
    there would be much builtin support in most languages but I could be wrong.
            - if your images are stored in a complex directory structure then your code
    will have to deal with navigating through the whole tree for each query.
    This will cause two problems: firstly increased code complexity and
    secondly speed will suffer and your system will not be very expandable - as
    you add more images your search speed will slow much more perceptibly than
    with a database approach.

    b) Separate text file for each image file (same as extended attributes
    but using a separate file to store them):
    *pros
            - compared to extended attributes there is less risk of accidentally losing
    attributes due to copying files
    *cons
            - compared to extended attributes there is much more need for synchronising
    data: for example, when you move an image file you must also move the
    corresponding text file, and when you delete an image file you must delete
    the text file too.
            - you will need to write routines to search through the text file to find
    the attribute you are looking for and to set/overwrite attributes - messing
    around that you could avoid.
    In other respects they are the same.
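
    To give an idea of how much "messing around" option b) involves, here is a
    rough sketch in Python that reads and writes the one-property-per-line
    format from your graph1.txt example. The function names parse_props and
    format_props are my own invention, and I'm assuming shell-style quoting
    for values such as "abc".

```python
import shlex


def parse_props(text):
    """Turn lines like 'prop_A 1 2883 "abc"' into {'prop_A': ['1', '2883', 'abc']}."""
    props = {}
    for line in text.splitlines():
        fields = shlex.split(line)      # honours the quotes around "abc"
        if fields:                      # skip blank lines
            props[fields[0]] = fields[1:]
    return props


def format_props(props):
    """Inverse of parse_props: one property per line, quoting where needed."""
    return "".join(
        shlex.join([name, *values]) + "\n" for name, values in props.items()
    )


if __name__ == "__main__":
    sample = 'prop_A 1 2883 "abc"\nprop_B 0\n'
    props = parse_props(sample)
    props["prop_B"] = ["1"]             # set/overwrite an attribute
    print(format_props(props), end="")
```

    All values come back as strings; deciding which ones are numbers is left
    to the caller.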

    c) A standalone database file:
    *pros
            - much faster and more expandable than extended attributes/text files. You
    can index them for speed and they are designed better for this task. As
    you add more data you will see much less slow-down than with the previous
    approaches.
            - simple function calls to access and available through many scripting
    languages
    *cons
            - you need to make sure you synchronise data so that as you move, add and
    delete images on disk you update the appropriate data in the database file
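
    A minimal sketch of option c), assuming Python's standard dbm module as
    the database routine library (it plays roughly the role of dbopen or
    DB_File). The function names and the choice of JSON for storing each
    image's properties are assumptions of mine. The stale_entries helper
    illustrates the synchronisation chore: it lists database keys whose image
    file is no longer on disk.

```python
import dbm
import json
import os
import tempfile


def save_props(db_path, image_name, props):
    """Store one image's properties under its file name."""
    with dbm.open(db_path, "c") as db:          # "c": create file if missing
        db[image_name.encode()] = json.dumps(props).encode()


def load_props(db_path, image_name):
    """Return the stored properties, or None if the image has no entry."""
    with dbm.open(db_path, "c") as db:
        raw = db.get(image_name.encode())
        return json.loads(raw) if raw is not None else None


def stale_entries(db_path, image_dir):
    """List keys whose image file has disappeared from image_dir."""
    with dbm.open(db_path, "c") as db:
        names = [key.decode() for key in db.keys()]
    return [n for n in names if not os.path.exists(os.path.join(image_dir, n))]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    db_path = os.path.join(workdir, "graphs.db")
    save_props(db_path, "graph1.gif", {"prop_A": [1, 2883, "abc"], "prop_B": [0]})
    print(load_props(db_path, "graph1.gif"))
    print(stale_entries(db_path, workdir))      # graph1.gif does not exist here
```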

    d) A database server:
    *pros
            - possibly faster than standalone file especially as your dataset grows
            - able to perform more complex queries without having to script them using
    the full power of SQL - this could save lots of coding time if your queries
    become complex
            - able to add and query between different tables so if you start getting into
    deeper analysis you can be sure your environment will support it
    *cons
            - need to set up and maintain a server with all the admin (not much once
    set up) and memory overhead that entails
            - as with standalone db file you will have synchronisation issues
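
    To show what "query between different tables" could look like, here is a
    small sketch. So that it runs without a server I'm using Python's sqlite3
    module as a stand-in; against a real server the SQL should look much the
    same, though connection setup and placeholder syntax depend on the driver.
    The table layout (images, props) and function names are purely
    hypothetical.

```python
import sqlite3


def build_demo(conn):
    """Create two tables and load the graph1/graph2 example data."""
    conn.executescript("""
        CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT UNIQUE);
        CREATE TABLE props (
            image_id INTEGER REFERENCES images(id),
            name     TEXT,
            value    TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO images (id, filename) VALUES (?, ?)",
        [(1, "graph1.gif"), (2, "graph2.gif")],
    )
    conn.executemany(
        "INSERT INTO props (image_id, name, value) VALUES (?, ?, ?)",
        [(1, "prop_A", '1 2883 "abc"'), (1, "prop_B", "0"),
         (2, "prop_B", "1"), (2, "prop_C", "8128")],
    )


def images_with(conn, prop_name):
    """Return (filename, value) for every image that defines prop_name."""
    cursor = conn.execute(
        "SELECT i.filename, p.value "
        "FROM images i JOIN props p ON p.image_id = i.id "
        "WHERE p.name = ? ORDER BY i.filename",
        (prop_name,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    print(images_with(conn, "prop_B"))
```

    Storing one row per property means images with undefined properties
    simply have no row for them, which suits the defined/undefined case you
    described.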

    I'm not going to tell you what's best as I still don't know your situation
    well enough. Suffice it to say that even the most complex of these
    solutions (the database server) is fairly trivial to set up - MySQL does the
    job well and can be readily accessed through Perl, PHP, C, Python, you name
    it...

    >>> Spreadsheets usually don't handle picture entries. Real database are too
    >>> complicated. My best idea so far was using HTML with some complicated
    >>> scripting.
    >> Well you're mixing concepts up in that paragraph.  The first concept is
    >> the storage of the data (eg. in a spreadsheet/database) and the second is
    >> the access of the data (eg. through HTML and scripting).
    > I see. In my case the data is fairly simple (some number and filenames),
    > so the main goal is to write a quick hack for creating print-outs.

    If it's simplicity you're after then some Perl/Python/shell scripting based
    on HTML templates that you design should work fine.
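
    As a rough illustration of the template idea, here is a sketch using only
    Python's standard library. The ROW and PAGE templates and the render_page
    function are invented for the example; where the (filename, properties)
    pairs come from depends on which storage option you pick.

```python
from html import escape
from string import Template

# One table row per image: the picture on the left, its properties on the right.
ROW = Template('<tr><td><img src="$src" alt="$src"></td><td>$props</td></tr>')
PAGE = Template("<html><body><table>\n$rows\n</table></body></html>")


def render_page(entries):
    """Build an HTML print-out from (filename, {property: value}) pairs."""
    rows = []
    for filename, props in entries:
        prop_text = "; ".join(f"{name} = {value}" for name, value in props.items())
        rows.append(ROW.substitute(
            src=escape(filename, quote=True),   # safe inside the attribute
            props=escape(prop_text),
        ))
    return PAGE.substitute(rows="\n".join(rows))


if __name__ == "__main__":
    print(render_page([
        ("graph1.gif", {"prop_A": '1 2883 "abc"', "prop_B": "0"}),
        ("graph2.gif", {"prop_B": "1", "prop_C": "8128"}),
    ]))
```

    Open the resulting file in a browser and print from there.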

