Re: Method/Program for small "database" with some pictures entries
From: Netocrat (netocrat_at_dodo.com.au)
Date: 05/30/05
- In reply to: Anton Suchaneck: "Re: Method/Program for small "database" with some pictures entries"
- Next in thread: Anton Suchaneck: "Re: Method/Program for small "database" with some pictures entries"
- Reply: Anton Suchaneck: "Re: Method/Program for small "database" with some pictures entries"
Date: Mon, 30 May 2005 12:03:39 GMT
Anton Suchaneck wrote:
> I want to write as little code as possible. That is not to save time, but
> because I believe that if a task is written with very few line, it can't
> take long to modify or improve it.
Makes sense.
>> a) how is/will my data be structured
> In this particular example each picture has a set of parameters.
> In some else I need to write to get my data sorted, I will have a set of
> properties which can be defined or undefined, e.g.:
>
> graph1.txt:
> prop_A 1 2883 "abc"
> prop_B 0
>
> graph2.txt:
> prop_B 1
> prop_C 8128
I notice that graph2 doesn't have prop_A and graph1 doesn't have prop_C, so
how different will the properties applicable to each graph be? How many
different properties do you envisage? Will you want to be able to easily
add different properties as time progresses?
>> b) how important is speed of query and storage?
> Not important. Simple self-made solutions should not exceed the computer's
> capacity.
>
>> c) what size is my database going to be?
> Rather small (in fact those picture are graphs from scientific papers)
I was thinking not so much in terms of image size but more of the number of
images and the amount of data associated with each image.
>> a reasonable option seems to be extended file attributes.
> Sounds pretty cool to me. I haven't heard of them before. Maybe I'll have
> a look soon.
> So can I store plain strings on files then?
Sure. There are some simple command line tools. Here is an example usage
on a file called graph1.gif in the current directory:
$ setfattr -n user.prop_A -v '1 2883 "abc"' graph1.gif
$ getfattr -n user.prop_A graph1.gif
# file: graph1.gif
user.prop_A="1 2883 \"abc\""
$ setfattr -x user.prop_A graph1.gif
$ getfattr -n user.prop_A graph1.gif
graph1.gif: user.prop_A: No such attribute
$ setfattr -n user.prop_B -v 0 graph1.gif
$ setfattr -n user.prop_A -v '1 2883 "abc"' graph1.gif
$ getfattr -d graph1.gif
# file: graph1.gif
user.prop_A="1 2883 \"abc\""
user.prop_B="0"
Notice that the property name has to be prepended with "user.". You can
specify a particular name with -n and value with -v; -x removes the
attribute. -d prints all available attributes. These commands correspond
to very similar functions available through glibc so it's easy to code them
in C.
For all intents and purposes, graph1.gif appears unchanged to other programs
not accessing the extended attributes.
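The same operations can also be done from a script. As a rough sketch: Python 3.3 and later expose os.setxattr, os.getxattr and os.listxattr on Linux. The helper names below (set_prop, get_prop, list_props) are made up for illustration, and this assumes the filesystem has user attributes enabled.

```python
import errno
import os

PREFIX = "user."  # user-defined attributes live in the "user." namespace


def set_prop(path, name, value):
    """Rough equivalent of: setfattr -n user.NAME -v VALUE PATH"""
    os.setxattr(path, PREFIX + name, value.encode("utf-8"))


def get_prop(path, name):
    """Rough equivalent of: getfattr -n user.NAME PATH (None if unset)."""
    try:
        return os.getxattr(path, PREFIX + name).decode("utf-8")
    except OSError as exc:
        if exc.errno == errno.ENODATA:  # "No such attribute"
            return None
        raise


def list_props(path):
    """Rough equivalent of: getfattr -d PATH, returned as a dict."""
    names = [n for n in os.listxattr(path) if n.startswith(PREFIX)]
    return {n[len(PREFIX):]: os.getxattr(path, n).decode("utf-8") for n in names}
```

Usage would be along the lines of set_prop("graph1.gif", "prop_B", "0") followed by list_props("graph1.gif").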
>> The simple option in this case where speed is important is a standalone
>> database file accessed through a database routine library.
> I'm a bit frightened of setting up a database and then having cryptic
> files which I cannot "control". Luckily for BibTeX references JabRef
> proved to be useful.
>> On my system I have man references for standalone
>> database files accessed through C (man 3 dbopen) and Perl (man 3
>> DB_File). Try one of those or hunt down another library.
> But as it seem I'm often confronted with database-like task, I might get
> into this stuff. I thought they are complicating until I saw that in the
> basic approach you use just an "outer product of tuples".
Sure, databases are not too hard to get into, although they can grow very
complicated if need be. Your needs don't sound extreme so I don't think
you need fear a database.
From what you've described, these are your storage options with pros and
cons:
a) Extended attributes:
*pros
- all data is linked directly to the image file and there is no need to be
concerned about synchronising with a separate database when you delete/add
images. This is of note to you since it reduces the amount of code you
have to write and limits possible inconsistencies.
- simple mental model - each file has a list of attribute/value pairs that
you set/read with a single function
- probably more space efficient than a database if you are going to have
lots of different properties for different images
*cons
- easy to lose attributes if you copy files, since many copy tools do not
carry them over unless told to
- slightly trickier to use with a scripting language since I doubt that
there would be much builtin support in most languages but I could be wrong.
- if your images are stored in a complex directory structure then your code
will have to deal with navigating through the whole tree for each query.
This will cause two problems: firstly increased code complexity and
secondly speed will suffer and your system will not be very expandable - as
you add more images your search speed will slow much more perceptibly than
with a database approach.
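To illustrate the last point, here is a minimal sketch of what a query over extended attributes might look like. The function name find_images and the read_attr parameter are hypothetical; read_attr defaults to os.getxattr (Linux only) and exists only so the walk can be tried without xattr support.

```python
import os


def find_images(root, name, wanted, read_attr=None):
    """Return paths under root whose user.NAME attribute equals wanted.

    Every file in the tree is visited on every call, so the cost of a
    query grows with the total number of files, not the number of matches.
    """
    if read_attr is None:
        read_attr = os.getxattr  # Linux only
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                value = read_attr(path, "user." + name)
            except OSError:
                continue  # attribute not set (or not readable) on this file
            if value.decode("utf-8") == wanted:
                matches.append(path)
    return sorted(matches)
```

There is no index here: each query re-reads the whole tree, which is where the slow-down comes from.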
b) Separate text file for each image file (same as extended attributes
but using a separate file to store them):
*pros
- compared to extended attributes there is less risk of accidentally losing
attributes due to copying files
*cons
- compared to extended attributes there is much more need for synchronising
data: for example, when you move an image file you must also move the
corresponding text file, and when you delete an image file you must also
delete the text file.
- you will need to write routines to search through the text file to find
the attribute you are looking for and to set/overwrite attributes - messing
around that you could avoid.
In other respects they are the same.
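For a sense of how much "messing around" option b) involves, here is a small sketch that reads and writes the graph1.txt format you showed. It assumes one property per line, fields separated by whitespace, and shell-style quoting for strings; parse_props and format_props are names I made up.

```python
import shlex


def parse_props(text):
    """Turn 'prop_A 1 2883 "abc"' style lines into {name: [values]}."""
    props = {}
    for line in text.splitlines():
        fields = shlex.split(line)  # honours the double quotes
        if not fields:
            continue  # skip blank lines
        props[fields[0]] = fields[1:]
    return props


def format_props(props):
    """Write the dict back out, quoting values only where needed."""
    lines = []
    for name, values in props.items():
        lines.append(" ".join([name] + [shlex.quote(v) for v in values]))
    return "\n".join(lines) + "\n"
```

Note that format_props only quotes values that need it, so "abc" comes back as plain abc; the parsed data is the same either way.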
c) A standalone database file:
*pros
- much faster and more expandable than extended attributes/text files. You
can index them for speed and they are designed better for this task. As
you add more data you will see much less slow-down than with the previous
approaches.
- simple function calls to access and available through many scripting
languages
*cons
- you need to make sure you synchronise data so that as you move, add and
delete images on disk you update the appropriate data in the database file
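As a sketch of option c), Python's standard dbm module gives much the same key/value style of access as dbopen and DB_File. The key layout (image name, a tab, property name) and the function names are my own assumptions, not anything standard. The last function shows the kind of synchronisation check you would have to write yourself.

```python
import dbm
import os


def _key(image, name):
    # Hypothetical key layout: "<image>\t<property>"
    return f"{image}\t{name}".encode("utf-8")


def store_prop(db_path, image, name, value):
    with dbm.open(db_path, "c") as db:  # "c": create the file if missing
        db[_key(image, name)] = value.encode("utf-8")


def load_prop(db_path, image, name):
    with dbm.open(db_path, "c") as db:
        key = _key(image, name)
        return db[key].decode("utf-8") if key in db else None


def orphaned_images(db_path, image_dir):
    """Images that still have database entries but are gone from disk."""
    with dbm.open(db_path, "c") as db:
        images = {k.decode("utf-8").split("\t", 1)[0] for k in db.keys()}
    return sorted(i for i in images
                  if not os.path.exists(os.path.join(image_dir, i)))
```

For example, store_prop("props", "graph1.gif", "prop_B", "0") followed by load_prop("props", "graph1.gif", "prop_B").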
d) A database server:
*pros
- possibly faster than standalone file especially as your dataset grows
- able to perform more complex queries using the full power of SQL, without
having to script them yourself - this could save lots of coding time if your
queries become complex
- able to add and query between different tables so if you start getting into
deeper analysis you can be sure your environment will support it
*cons
- need to set up and maintain a server with all the admin (not much once
set up) and memory overhead that entails
- as with standalone db file you will have synchronisation issues
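To give a flavour of the SQL side, here is a sketch of a two-table layout and a query joining them. To keep it runnable without any setup I've swapped in Python's built-in sqlite3 module with an in-memory database rather than a real server; the table and column names are invented, and the same SQL should carry over to a server with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT UNIQUE)")
conn.execute("CREATE TABLE props (image_id INTEGER REFERENCES images(id),"
             " name TEXT, value TEXT, PRIMARY KEY (image_id, name))")
conn.executemany("INSERT INTO images VALUES (?, ?)",
                 [(1, "graph1.gif"), (2, "graph2.gif")])
conn.executemany("INSERT INTO props VALUES (?, ?, ?)",
                 [(1, "prop_A", '1 2883 "abc"'), (1, "prop_B", "0"),
                  (2, "prop_B", "1"), (2, "prop_C", "8128")])


def images_with_but_without(conn, has, lacks):
    """Filenames that define property `has` but not property `lacks`."""
    rows = conn.execute(
        "SELECT i.filename FROM images AS i"
        " JOIN props AS h ON h.image_id = i.id AND h.name = ?"
        " LEFT JOIN props AS l ON l.image_id = i.id AND l.name = ?"
        " WHERE l.image_id IS NULL"
        " ORDER BY i.filename",
        (has, lacks),
    )
    return [filename for (filename,) in rows]
```

A query like "which graphs have prop_B set but no prop_A" becomes one statement instead of a hand-written loop.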
I'm not going to tell you what's best as I still don't know your situation
well enough. Suffice it to say that even the most complex of these
solutions (the database server) is fairly trivial to set up - MySQL does the
job well and can be readily accessed through Perl, PHP, C, Python, you name
it...
>>> Spreadsheets usually don't handle picture entries. Real database are too
>>> complicated. My best idea so far was using HTML with some complicated
>>> scripting.
>> Well you're mixing concepts up in that paragraph. The first concept is
>> the storage of the data (eg. in a spreadsheet/database) and the second is
>> the access of the data (eg. through HTML and scripting).
> I see. In my case the data is fairly simple (some number and filenames),
> so the main goal is to write a quick hack for creating print-outs.
If it's simplicity you're after then some Perl/Python/shell scripting based
on HTML templates that you design should work fine.
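As one possible shape for that, here is a minimal sketch using only Python's standard library. The template text, the render function and the layout of the images dict are all assumptions for illustration; you would design your own HTML.

```python
from html import escape
from string import Template

PAGE = Template("<html><body>\n<h1>$title</h1>\n$entries</body></html>\n")
ENTRY = Template('<div class="entry">\n<img src="$src" alt="$src">\n'
                 "<ul>\n$items</ul>\n</div>\n")


def render(title, images):
    """Build a printable page from {filename: {property: value}}."""
    entries = []
    for filename, props in images.items():
        # Escape everything so quotes in values cannot break the markup
        items = "".join(f"<li>{escape(n)}: {escape(v)}</li>\n"
                        for n, v in props.items())
        entries.append(ENTRY.substitute(src=escape(filename), items=items))
    return PAGE.substitute(title=escape(title), entries="".join(entries))
```

Writing the result of render("Graphs", {...}) to a file and printing it from a browser would give you the print-out, whichever storage option feeds it.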