Re: public distributed filesystem

From: IsaacKuo (mechdan_at_yahoo.com)
Date: 09/27/05


Date: 26 Sep 2005 15:07:58 -0700


John Hasler wrote:
>Isaac Kuo writes:
>>To fairly balance load, perhaps the system could be set
>>up to exchange data blocks--you only get to send a data
>>block to a peer if you accept one of his data blocks in
>>return.

>An "upload ratio" requirement is a good (though not
>really enforceable) idea,

It needs to be enforceable, and I described a way to
enforce it (using periodic random checks). Otherwise,
leechers will get free storage and non-leechers will
get screwed.

>but I don't think you should send specific blocks
>to specific hosts.

I don't see why not. Obviously, the user shouldn't be
micromanaging everything manually; the software should
figure out on its own where to send blocks of data.

>This should look to the client like a huge distributed
>filesystem.

It should look that way to the user, but the software
needs more detailed information about where the data is.

>We need some sort of distributed mechanism for
>determining where the blocks go

I think a client-server model should be fine for this.
You have a simple server which keeps track of client
locations (IP address, port). A client can "log in"
or "log out" with the server, as well as request a
randomized sublist of client locations from the server.
The client thus builds up a list of potential peers
to contact for saving blocks of data.
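
A minimal sketch of the kind of server I mean, in Python.
The class and method names (Tracker, log_in, log_out,
random_peers) are made up for illustration, and the state
is kept in memory with no networking at all. It only shows
the three operations described above.

```python
import random

class Tracker:
    """Hypothetical in-memory tracker: remembers which clients are online."""

    def __init__(self, rng=None):
        self._clients = set()                  # set of (ip, port) tuples
        self._rng = rng or random.Random()

    def log_in(self, ip, port):
        self._clients.add((ip, port))

    def log_out(self, ip, port):
        # discard() does nothing if the client was never logged in
        self._clients.discard((ip, port))

    def random_peers(self, requester, count):
        """Return up to `count` randomly chosen peers, never the requester."""
        candidates = sorted(self._clients - {requester})
        return self._rng.sample(candidates, min(count, len(candidates)))
```

A client would call random_peers() only when it wants more
places to put blocks, and append the result to its own
local peer list.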

Theoretically, the bandwidth requirements for the
server should be very low, and you can have multiple
servers. For the most part, a client will only
make a query when it needs more storage space, and
this is limited by how much storage space it has
itself to share.

>(and, as I mentioned elsewhere, it should not be
>possible to recover any information from the blocks
>stored by any single host.)

IMHO this is a dubious requirement. There's no way
to ensure that a blackhat doesn't have access to
MULTIPLE hosts, which in total have enough blocks
to recover original data. Therefore, security must
depend upon the original data being strongly
encrypted.

IMHO, the advantage of distributing the file blocks
is mainly in maximizing redundancy. Any security
benefit is just icing on the cake.

>>Periodic random checksum queries could confirm
>>whether a peer has honestly kept your data block
>>in storage.

>That would mean that you would have to actively
>know who has blocks of yours. I don't think that's
>a good idea.

Why not?

>If he dumps blocks he should just lose points with
>the system (points for uptime, capacity, bandwidth,
>reliability, etc).

And how do you suppose it's possible to detect when
blocks are dumped? If there is no feedback mechanism,
then you have no way to verify whether or not a data
block is actually being stored. Hacked software can
conveniently misreport how much data is actually
being stored. However, with random checksum queries,
there's no way to fake it--the only way to correctly
respond to any checksum query is to essentially store
all of the block's original data.
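
Here is a rough sketch of such a query, under some
assumptions of my own: the owner precomputes a handful of
(nonce, expected checksum) pairs before handing the block
off, so it doesn't need to keep the block itself, and
SHA-256 stands in for "checksum". The function names are
hypothetical.

```python
import hashlib
import os

def checksum(nonce, block):
    """Digest of a random nonce followed by the block contents."""
    return hashlib.sha256(nonce + block).hexdigest()

def prepare_challenges(block, count):
    """Owner side: precompute (nonce, expected) pairs before sending the block."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)          # unpredictable, so answers can't be cached
        challenges.append((nonce, checksum(nonce, block)))
    return challenges

def peer_respond(nonce, stored_block):
    """Peer side: needs the stored block to produce the right answer."""
    return checksum(nonce, stored_block)
```

Each pair should be used once and then thrown away. Since
the nonce isn't known in advance, a peer can't compute the
answer ahead of time and then drop the data.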

The simplest verification system would be to query a
single random byte of data. However, a leecher who
stores only a fraction of the original data might get
lucky and pass a few such queries. A random checksum
could instead query against a large section of the
file at a time (for instance, a checksum of every
fifth byte). This makes it unlikely for a leecher to
successfully fake even a single query.
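
To put rough numbers on this: if a leecher keeps a
fraction f of the bytes and each single-byte query hits a
uniformly random position, it passes k independent queries
with probability f^k. The sketch below shows that and the
every-fifth-byte variant. The random starting offset, the
use of SHA-256, and all of the names are my own
assumptions. The owner is assumed to know the expected
value, either because it still has the block or because
it precomputed the answer.

```python
import hashlib
import random

def single_byte_pass_probability(fraction_kept, queries):
    """Chance of passing `queries` independent random single-byte
    queries when only `fraction_kept` of the bytes are stored."""
    return fraction_kept ** queries

def strided_checksum(block, offset, stride):
    """Digest of every `stride`-th byte of `block`, starting at `offset`."""
    return hashlib.sha256(block[offset::stride]).hexdigest()

def random_query(rng, stride=5):
    """Hypothetical query: a random starting offset below `stride`."""
    return rng.randrange(stride), stride
```

With a stride of 5, each query touches a fifth of the
block, spread across its whole length. A peer that kept
only the first half of the block would be missing sampled
bytes for every possible offset.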

Isaac Kuo


