Re: thread memory size





David Schwartz wrote:

You are still confused. "Lock-free" is a misnomer. There are still
locks, they are just not locks that block. For example, the primitive
lock-free algorithms are made out of contain atomic 'compare and
replace' operations or atomic 'swap' operations. These operations
internally contain locks, for example, they lock cache lines.

The fact that in the strictest technical terms a CAS operation does
create a lock, but does not block, is true. But a CAS operation returns
if it fails to succeed. That gives it the opportunity to try again. At
the hardware level it will succeed, eventually. Even with four billion
threads contending for the value, it will succeed almost instantly (It
does not take that long to 4 billion such operations)

Also, CAS and similar operations depend on the cpu architecture and
which specific instruction it has implemented. The simpler ones allow
for less concurrency than the more advanced ones. The latest addition of
advanced concurrent CAS operations are AMDs ASF for AMD64 (Advanced
Synchorinisation Facility). Its experimental yet, but seems to promise
even more concurrency within CAS operations than previous types of CAS
operation.

On a software level, the operations are not single instruction atomic.
This means that a critical region can obtain one lock, then obtain
another lock and cause a deadlock. Even how one accesses a single
variable can be subject to the lock strategy chosen. If one chooses an
operation that specifically uses a CAS instruction only, then its the
same as if it were hardware, but if one uses critical regions or the
like, then there is a high probability it will cause problems.

So even if there is locking in the cpu, it is irrelevant. The reason for
this is that a (software level) lock-free algorithm, can use CAS
operations and obtain a much higher level of concurrency than a lock
based algorithm. Now, with a message passing implementation that is so
simple, in both implementation and model, it will be able to scale and
perform pretty well using these techniques.

+++++++++++++++++++++++++++++++++++

Now lets get back on track again. Unfortunately this discussion have
become a little pedantic and possibly a little self serving, that is
because it was sidetracked into discussing technical details of lesser
importance.

If I were to implement this within one process (i.e. threads shared
memory), I would probably implement a message queue algorithm, that uses
optimistic locking by help of CAS operations or using a state machine.
It depends on which gives the best performance.
That's a good implementation for single process communication between
threads. It can use message passing which reduced the thread level
contention one would normally get if one used lock based algorithms. So
it increases concurrency, scalability and performance by concurrency. It
does come with a small threads level decrease in efficiency, but the
scalability and concurrency outweighs that disadvantages by so much that
it is nothing to care about. As concurrency mechanisms increase its
performance other strategies can be chosen.

But that misses the functional requirement of being able to communicate
between processes on same or different machines. I.e the truly scalable
solution (the ultimate scalability would be atomic computing, but that's
a different matter). To be able to communicate between processes on
different machines network communication is need, that frees us from the
concurrency algorithms one normally would use if the program where
running in a single process. If set up properly, i,e, fiber cables for
networking and so on

But, using network to communicate messages within the same machine or
process is not very efficient so the implementation could be smart and
choose the mechanism depending on the proximity of the peer process.
That means it would use network, Shared Memory or process memory as needed.
But that leads us to a different discussion, how are the processes
interconnected, by 100/1000Mb network, by fibre, by supercomputer fibre
cable switches etc. It all depends on the efficiency of the
communicating technology chosen. With s slow network connection or IP
stack implementation one might want to try to use a Shared Memory
message passing implementation to make it more efficient.




The forest is this -- message passing is inefficient.

for the sake of clarity can you give some definitions and explain a
couple of things before we continue:

how would you implement it with shared memory and without message
passing? (overall solution)

what do you define as efficient and what do you define as inefficient?

The reason
people use it for performance and scalability is because if you limit
yourself to message passing only, you can get the concurrency of
threads without the inherent performance penalties of threads.

And that is not a good reason to use message passing?

Right, the advantage of message passing is that it doesn't have the
limitations of shared memory.

What limitations are you thinking of?

You would suffer all the disadvantages of
threads and get absolutely *none* of the benefits.

And what do you define as the disadvantages and benefits?



Regards

Jan
.



Relevant Pages

  • Re: Lock-free databases
    ... > lock, latch, enqueue, or other name is a lock for the purpose of this ... Database concurrency control. ... be it Oracle or SQL Server ...
    (comp.databases.theory)
  • Re: Are all Ruby built-in objects thread safe?
    ... is an easy way to provide _basic_ thread safety by wrapping all method calls ... dat = create_dat ... So in this case you need a lock around the _complete_ block of code. ... concurrency correctness of an application. ...
    (comp.lang.ruby)
  • Re: Inode Lock Scalability V7 (was V6)
    ... We know how to lock and modify them. ... to it (eg. the particular concurrency you pointed out in Dave's patch ... an inode and with i_lock, cover it from all icache state concurrency. ... start breaking bits and pieces. ...
    (Linux-Kernel)
  • Re: Followup to "Serious concurrency problems on fast systems"
    ... of concurrency control affect execution. ... Plain synchronized on a single shared resource. ... writes than normal synchronization. ... the Java 1.5 lock mechanisms. ...
    (comp.lang.java.programmer)
  • Re: How to update multiple rows atomically
    ... I don't want to lock just one up, ... be subject to deadlock or starvation? ... Whether it might experience deadlock or starvation or any other concurrency problem will depend on what concurrency mechanisms the dbms implements, which concurrency options the dba and various users choose, and how the dbms implements them. ...
    (comp.databases.theory)