  Technical Papers
  several
  (ed. 98/2/4)

  This is a collection of technical papers and documents that were
  posted to the mailing list. It is not necessarily GGI-related, but
  might be useful for GGI development.
  ______________________________________________________________________

  Table of Contents


  1. Vertical Retrace Detection

     1.1 Summarizing what has been discussed
     1.2 Using the vertical blank

  2. Semaphores

     2.1 Semaphores
     2.2 System V semaphores
        2.2.1 key_t ftok(char *pathname, char proj_id);
        2.2.2 int semget(key_t key, int nsems, int flags);
        2.2.3 int semctl(int semset_id, int sem_number, int command, union semun arg);
        2.2.4 int semop(int semset_id, struct sembuf *semops, unsigned nsops);
     2.3 POSIX semaphores
        2.3.1 int sem_init(sem_t *sem, int pshared, unsigned int value);
        2.3.2 int sem_wait(sem_t *sem);
        2.3.3 int sem_trywait(sem_t *sem);
        2.3.4 int sem_post(sem_t *sem);
        2.3.5 int sem_getvalue(sem_t *sem, int *valp);
        2.3.6 int sem_destroy(sem_t *sem);

  3. Threadsafe programming

     3.1 Preface
     3.2 Cultural
     3.3 Basic
     3.4 In GGI
     3.5 Tricky


  ______________________________________________________________________

  1.  Vertical Retrace Detection



  1.1.  Summarizing what has been discussed

  By Brian Julin, Jan 28, 1998:


  1. If an IRQ is available (we haven't seen many SVGA cards that do
     more than _say_ they can activate IRQ), the kernel _could_ catch
     it. However, latency is high for a signal to get to user space, so
     the IRQs are of limited use.  We would actually want the signal to
     arrive when _blanking_, not vretrace, is active anyway, and for the
     signal to be issued even before that so the application gets it at
     or right before blanking starts.

  2. Some common uses for vretrace (page flipping) should be available
     as kernel-space driver functions to avoid this latency.

  3. If using the RTC, it has to be set up for one-shot mode and reset
     on every trigger, as well as calibrated with the hardware, the way
     the PC speaker driver does it, but fortunately at a much lower
     frequency :-).  The periodic RTC freqs don't yield good results
     because they are not flexible enough.

  4. Very few cards seem to allow you to read the CRTC address counter
     back out of the card.  This being the case, the only way to know
     when blanking is asserted is to calibrate with the vsync pulse via
     polling the "retrace active" registers, and calculate the amount of
     time to offset the event from the mode parameters.  Thus you have
     something like:

     ___________________________________________________________________

              (RTC IRQ)  -> Send Event
                            Set one-shot to 5ms
                            return
              (RTC IRQ)  -> Poll until vretrace bit active
                            unset vretrace bit latch
                            do some calibration calculations.
                            Set one-shot to 60ms + error term
                            return
     ___________________________________________________________________



  5. The kernel clock calibrator might benefit from using the "hretrace
     active" register once it has locked onto the right scanline if it
     does not drift too far.  It still must check back occasionally and
     verify sync with the vretrace.

     The new method of sharing the GC page could allow for some driver-
     libs to aid calibration from user space -- a field could be set in
     the shared page by the driver when the event is issued.  The
     driver-lib could compare jiffies and pass back an average latency
     for the signal and context switch.  Then the clock sync loop would
     be able to work that latency into the offset it applies to the
     vsync pulse.  Thus only the kernel ever does active polling, and,
     we can hope, does so very efficiently.


  1.2.  Using the vertical blank


  Sengan Baring-Gould found this method, using not the vertical retrace,
  but the vertical blank. The author is Bruce Foley,
  brucef@central.co.nz.

  Sengan quotes:

  This whole issue of snow while using mode 13h baffles me a bit.

  I never really had trouble with this even when I was using the
  vertical retrace to time the writing of my double-buffer out to video
  memory.

  I did have a problem with slightly glitchy scrolling though.  Not so
  much in native DOS, but in a Windows 95 DOS box it was noticeable
  enough to be annoying.  I guess the main reason for this was that a
  Win95 DOS box steals back chunks of system resource via interrupts...
  The price we pay for preemptive multitasking I guess.

  Anyway, using the vertical non-display has all but eliminated this
  last little problem.  I think the main reason for this is the huge
  amount of extra time you get to write out your data.  Here are the
  timing comparisons on a VGA, according to Wilton:

  Vertical Retrace:       0.064 Milliseconds

  Vertical Nondisplay:    1.430 Milliseconds (!)

  (timings based on 640x480)

  In computer terms, that's a big difference!!! 1.366 Milliseconds extra
  for writing out that doublebuffer - awesome.

  The price comes in the form of more complex code.  This is because
  unlike the vertical retrace, there is no VGA status register that we
  can read to find out when the vertical non-display is actually
  happening.

  The solution to this is bit 0 of the VGA CRT status register.  This
  bit signifies the Display Enable state.  0 means enabled, 1 means
  disabled.  This bit toggles between these two states during the
  horizontal retrace, and guess what?  Once the gun hits the bottom of
  the screen, it stays off until it starts drawing lines at the top
  again.

  This means all we have to do is time how long it takes for a
  horizontal retrace to occur, and store this value as a trigger.  When
  the Display Enable is disabled (1) for longer than our trigger
  value (which we double, just to be sure), then we know that the
  Vertical Non-display has occurred, and can safely write to video
  memory without risk of getting caught in the middle of a redraw.  Cool
  eh?

  Implementation will come in two components then.  One to record the
  trigger value, and another one, which will delay writing to Video
  Memory until the trigger value has been exceeded.

  Your client program (probably C, like mine) will look like this.


  ______________________________________________________________________
  int vndtime;            // Our trigger variable

  void Initialize(void)
  {
    ...
    vndtime = vndtimeout();       // trigger returned as a 16-bit value
    ...
  }
  ______________________________________________________________________



  Then, somewhere in your main line, when you are ready to write out
  your double buffer, you call the routine (in external 386 enabled
  Assembler I hope!)  to do it.  You will need to amend the routine to
  examine bit 0 of the CRT register, and do some time-out logic against
  the previously stored trigger.

  Knowing that you have definitely read the above before looking at the
  code below, it should all make perfect sense.  Here is the code
  fragment for getting the trigger value:







  ______________________________________________________________________
  VGA_INPUT_STATUS_1              EQU     3DAh

  PUBLIC _vndtimeout

  _vndtimeout PROC

  mov     dx, VGA_INPUT_STATUS_1

  ; wait for vertical retrace

  L101:
  in      al, dx
  test    al, 8
  jz      L101

  ; initialize the loop counter
  mov     cx, 0FFFFh

  ; wait for display enable

  L102:
  in      al, dx
  test    al, 1
  jnz     L102

  ; wait for the end of display enable

  cli

  L103:
  in      al, dx
  test    al, 1
  jz      L103

  ; loop until display enable becomes active

  L104:
  in      al, dx
  test    al, 1
  loopnz  L104

  sti

  neg     cx      ; make CX positive
  add     cx, cx  ; double it for safety
  mov     ax, cx  ; save the result in return register

  ret

  _vndtimeout ENDP
  ______________________________________________________________________



  Ok.  ax now holds the trigger, which MSC & Borland interpret as the
  return variable for word sized values.

  The next code fragment makes use of this variable by pausing within a
  loop until the trigger value is exceeded.  Note that it should be used
  just prior to your access to video memory.  I actually included this
  code in the same routine that does the double buffer copy (via MOVSD).
  The reason being that if your code was in another routine, then there
  is overhead code being executed before Video Memory is accessed.  A
  waste of precious time!!!

  ______________________________________________________________________
  EXTRN _vndtime:WORD             ; Our C trigger value

  PUBLIC _write_double_buffer

  _write_double_buffer PROC

  mov     dx, VGA_INPUT_STATUS_1
  cli                             ; Stop interrupts

  ; wait for display enable

  L201:
  in      al, dx
  test    al, 1
  jnz     L201

  ; wait for the end of display enable

  L202:
  mov     cx, _vndtime    ; CX = maximum number of loops

  L203:
  in      al, dx
  test    al, 1
  jz      L203

  ; wait for display enable

  L204:
  in      al, dx
  test    al, 1
  loopnz  L204
  jz      L202    ; jump if display enable detected
  sti
  ______________________________________________________________________


  If we get to here, then it means the Display Enable Bit has been set
  to 1 for longer than our trigger value.  The Vertical Non-display is
  now happening, so you can start writing out to video memory....


  Well, that's it.  Even if you don't understand the above straight
  away, I can vouch for its correctness.  In all honesty, that is
  pretty much how it appears in the book.  So as long as you understand
  the principle of what it is doing and how to use it, then you are away
  and laughing.

  Since I have just spent the last hour or so of my time putting this
  all together, perhaps you can return the favor and tell me about what
  you do, and perhaps why it is that you spend hours staring into a
  computer screen (much like myself).

  I have been a programmer for many years, but have only recently got
  into PC based programming.  I love it, since it so much less
  restrictive than what I am used to.  I dream of being a games
  programmer, and plan to go part time in my job after I have released
  my first game (probably as shareware) and after I have finished a
  major project I am heavily involved in at work at the moment.

  Anyway, write back, and let me know how you guys get along,

  All the best,


  Bruce.

  --------------------------------------------------------

  This next mail is in response to questions about the above...

  -----------------------------------------------------------

  I will try to answer some of your questions...

  As you know, the Vertical Retrace (VR) is defined by bit 3 (I think)
  of the CRT register being set to on (1).  When this bit is on, we know
  that the display gun is doing a diagonal retrace from the bottom right
  of the screen back to the top left.

  Ok, so now you would like a definition of the VN right?  Consider it
  to be a superset of the VR.  The VN includes the VR but also includes
  some extra time BEFORE the VR has even started.  The key to
  understanding this lies in bit 0 of the CRT register.  As you know,
  this bit is like a mini VR, except it is for the horizontal retrace
  (HR).  The time it takes for each HR is constant.  Therefore, we can
  measure the amount of time that a single HR takes while the screen
  is being drawn.  That is what the first routine I gave you does.  If
  you look near the end of the code in the first routine, you will see
  me add cx to cx.  cx holds the time that the HR took.  It is doubled,
  simply as a safety precaution -since we will be using it as a trigger
  to tell us when the VN is actually happening.  Consider this: once the
  gun hits the bottom of the screen, the HR stays set to 1 until the VR
  has completed and starts drawing lines at the top of the screen again.
  All we need to do is check to see if it has been set to 1 for an
  amount of time that EXCEEDS the value computed (and stored in cx) from
  the first routine.  If it does, then we know that it must have
  finished drawing the screen.  At this point, it is still a long way
  off from starting the VR, as the timings indicate.

  As for the second routine, this was to demonstrate how you would use
  the value calculated in the first routine.  The variable _vndtime is a
  C variable that holds cx from the first routine.  You will notice I
  moved it from cx to ax just before the ret.  This is because C
  programs that accept 16 bit return values expect them to be in ax.

  The second routine is somewhat confusing to read and understand.  This
  is because of its subtle use of loopnz and jz instructions.  In
  English, the logic flow goes something like this:


  1. Load cx with the trigger value.

  2. Loop 1: Repeat until a scan line is being drawn.

  3. Loop 2: Repeat until either cx = 0 or a scan line is being drawn.

     (The key to this loop is the loopnz instruction.  It decrements cx
     and also checks the zero flag)

  4. Go back to loop 1 IF a scan line was drawn.

     (The zero flag would have been set via the test instruction if this
     was the case)

  5. If we are here, then cx must have been decremented to zero!  This
     means our trigger timeout value has been reached and the VN is now
     official.  We can start writing to video memory right now, even
     though the VR has not started yet.  This gives us a big head start
     in terms of the amount of time we have available before the screen
     starts to repaint itself.

  2.  Semaphores

  (by MenTaLboY mentalboy@geocities.com) This document explains SYS-V
  and POSIX semaphores.


  2.1.  Semaphores

  In general, a semaphore is a 'counter' that you can increment or
  decrement atomically, and also wait for it to reach a certain value.

  Semaphores are typically used in situations where you have a certain
  number of resources, and wish to be able to readily block on the
  condition of unavailable resources.

  Generally, before trying to allocate a resource, you wait for the
  associated semaphore to become nonzero and decrement it atomically
  when it does become nonzero. After you have freed the resource, you
  increment the associated semaphore.


  2.2.  System V semaphores

  SYSV IPC uses 32-bit identifiers, called 'keys', to identify shared
  IPC objects. Generally, each application will have a key hard-coded in
  it, or will generate one at runtime based on the properties of a
  specific file.  Unfortunately, there is no way to be absolutely sure
  that a SYSV IPC key is unique for your application.

  The most commonly used method of generating an IPC key that is
  reasonably unique is to use the ftok() function.  It requires
  sys/types.h and sys/ipc.h, and its prototype is as follows:


  2.2.1.  key_t ftok(char *pathname, char proj_id);

  proj_id is an 8-bit value used to (hopefully) make the key somewhat
  more likely to be unique. There is no established system for selecting
  project ids, though. In fact, from the ftok manpage BUGS section:


        The generated key_t value is obtained by stat-ing the disk
        file corresponding to pathname in order to get its i-node
        number and the minor device number of the filesystem on
        which the disk file resides, then by combining the 8-bit
        proj value along with the lower 16 bits of the i-node
        number, along with the 8 bits of the minor device number.
        The algorithm does not guarantee a unique key value.  In
        fact:

        o  Two different names linking to the same file produce the
           same key values.

        o  Using the lower 16 bits of the i-node number gives some
           chance (also usually small) of having the same key values
           for file names referring to different i-nodes.

        o  Not discriminating among major device numbers gives some
           chance of collision (also usually small) for systems with
           multiple disk controllers.


  Now, assuming you have computed (or have determined at compile-time) a
  key that all instances of the application will use (i.e. by each
  instance of the application ftok()ing the same file), you're ready to
  get access to the set of semaphores using that key. You do that with
  the semget() call (defined in sys/sem.h).


  2.2.2.  int semget(key_t key, int nsems, int flags);

  key is the key used to create the semaphores 'on', nsems is the
  number of semaphores you want in the set, and flags can be any
  combination of IPC_CREAT and IPC_EXCL, whose meanings are similar to
  the flags of the open(2) call. The lower 9 bits of the flags specify
  user, group, and other permissions for the set of semaphores. See the
  semget manpage for more details of permissions.

  The function returns yet another integer identifier that you use to do
  operations on that set of semaphores.

  Most manipulations of a semaphore set are done through the semctl()
  call:


  2.2.3.  int semctl(int semset_id, int sem_number, int command, union
  semun arg);

  semset_id specifies the particular set of semaphores to operate upon,
  sem_number, when appropriate, specifies a particular semaphore within
  that set, and command specifies what to do to that semaphore or set.
  Commands include:

     IPC_STAT
        get permission info and put it in the structure specified by
        arg.buf

     IPC_SET
        set permission information, if you have permission to do so

     IPC_RMID
        destroy the semaphore set and wake up all processes waiting on
        it with the error code EIDRM

     GETALL
        obtain the current values of all the semaphores atomically

     GETVAL
        obtain the value of the specific semaphore in the set atomically

     SETALL
        set the values of all the semaphores in the set atomically

     SETVAL
        set the value of a particular semaphore, atomically

  For more commands, and specifics of each, look at the semctl(2)
  manpage.


  2.2.4.  int semop(int semset_id, struct sembuf *semops, unsigned
  nsops);

  The function semop() is used to do operations such as incrementing and
  decrementing semaphores in a set, and waiting for semaphores to reach
  certain values.

  semset_id is the id of the semaphore set to operate upon, and semops
  points to an array of semaphore operations to carry out. If all of the
  operations in semops cannot be carried out, none are. nsops specifies
  the length of the semops array.

  See the semop(2) manpage for details and types of semaphore
  operations.


  2.3.  POSIX semaphores

  All POSIX semaphore functions and types are prototyped or defined in
  semaphore.h, and are completely unrelated to SYSV semaphores.

  To create a new semaphore, you use sem_init().


  2.3.1.  int sem_init(sem_t *sem, int pshared, unsigned int value);


  o  sem points to a semaphore object to initialize

  o  pshared is a flag indicating whether or not the semaphore should be
     shared with fork()ed processes. LinuxThreads does not currently
     support shared semaphores

  o  value is an initial value to set the semaphore to

  2.3.2.  int sem_wait(sem_t *sem);


  o  sem_wait blocks until the specified semaphore object's value is
     greater than zero.  It then decrements the semaphore's value by one
     and returns


  2.3.3.  int sem_trywait(sem_t *sem);


  o  like sem_wait, but returns immediately and does not decrement the
     semaphore if it is zero


  2.3.4.  int sem_post(sem_t *sem);


  o  increments the value of a semaphore


  2.3.5.  int sem_getvalue(sem_t *sem, int *valp);


  o  gets the current value of sem and places it in the location pointed
     to by valp


  2.3.6.  int sem_destroy(sem_t *sem);


  o  destroys the semaphore; no threads should be waiting on the
     semaphore if its destruction is to succeed.


  3.  Threadsafe programming

  (by Rodolphe Ortalo,  1998/01/30 (ed. 98/2/4)) General thoughts and
  information about threadsafe programming.




  3.1.  Preface

  Well... Threadsafe programming is not such a difficult thing, but you
  need a clear mind concerning mutual exclusion, semaphores/mutexes,
  and, as a corollary, who Dijkstra is.

  I've spent several months working on the Chorus microkernel where
  threads are the common (and not the special) thing. Maybe I can state
  a few general things. (Section titles are here to help the hurrying
  reader.)

  3.2.  Cultural

  First, Dijkstra is a computer science professor who made several
  theoretical contributions in the programming languages area (and
  probably many other areas :-). He is most famous due to his
  "invention" of computer related semaphores (which have nothing to do
  with traffic lights of any kind).


  Second, in my opinion, the easiest way of considering the problem of
  thread protection is to view a multiple-threaded program as a PARALLEL
  program. Think of the problem with ALL threads executing
  simultaneously (really simultaneously) even though it is not the case,
  and you will see the annoying cases much more easily. (And let the
  Linux kernel do its scheduling in the order it wants. Our parallel
  brain has a better way of exhibiting the problems.)



  Third, there are two relatively different problems addressed by the
  use of semaphores and mutexes:

  o  threads synchronization;

  o  and threads' shared data access protection.

  The latter is probably the easiest way of seeing the problem, and the
  most common case. Synchronization may also be an issue, but... well,
  let's speak of the data protection.


  3.3.  Basic

  The problem: when parallel (or truly asynchronous) control flows (ie:
  threads) access the _same_ data, you HAVE to protect such accesses.
  Full stop.  There are read/write differences, but you may think of
  the problem in a simple way: shared data should be protected, so you
  should:

  o  do something before accessing it,

  o  and do another thing after (in order for the other threads to
     access it).

  ==> Suppose you have a global variable called "the_data" used by
  several threads. You _must_ protect it, so you use a mutex:
  "the_data_mutex".  Think of the mutex as a switch (or a barrier with a
  big Guardian Minion).  Before accessing the_data (potentially accessed
  by multiple threads) you do: MutexGet(the_data_mutex). You will be
  allowed to proceed only when the_data is free (and you will sleep
  until then).  After you've done things with the_data you do:
  MutexRelease(the_data_mutex).  Then you get out of the protected area
  and other threads may be allowed to enter in it.


  NB: "local" C variables are really local (they are on the stack and
  each thread has its own stack). So you never need to protect them, of
  course.

  Here we are. We have protected a simple thing. Of course, you may put
  a big abstract mutex "the_mutex_for_everything_shared" in your
  program, and then Get/Release it each time you do something in the
  global space.  But this would be relatively inefficient.  So if you
  have "first_data" and "second_data", and their respective:
  "first_mutex" and "second_mutex", you think that you should do
  Get/Release(first_mutex) before accessing first_data, and
  Get/Release(second_mutex) before accessing second_data.  You are
  right!

  But: there is a new problem arising now. Consider 2 threads doing
  this:


  ______________________________________________________________________
        Th. 1                                     Th. 2
          |                                         |
   MutexGet(first_mutex)                 MutexGet(second_mutex)
          |                                         |
   MutexGet(second_mutex)                 MutexGet(first_mutex)
          |                                         |
  MutexRelease(second_mutex)            MutexRelease(first_mutex)
          |                                         |
  MutexRelease(first_mutex)             MutexRelease(second_mutex)
          |                                         |
          +                                         +
  ______________________________________________________________________



  Each of these threads seems to do the right thing, but in fact, they
  don't! We are completely asynchronous, so the worst can happen.
  Suppose Th.1 successfully Gets the first_mutex, and just after, Th.2
  successfully Gets the second_mutex.  After that, Th.1 will try to get
  the second_mutex and Th.2 the first one, and neither of them will
  ever get it. They are each waiting for the other to release it so...
  This is a DEADLOCK.

  ===> Therefore, the conclusion is: threads should do The Right Thing,
  i.e. always access the mutexes in the same order !

  This is a general rule: always access the mutexes in the same order!
  If you ever get fb_mutex before gc_mutex, you should always do this in
  all your threads! Of course, you may access only fb_mutex or gc_mutex
  independently... But if you get gc_mutex, don't ever try to Get
  fb_mutex. Okay?


  3.4.  In GGI


  In GGI: it seems that the IS_CRITICAL, ENTER_CRITICAL and
  LEAVE_CRITICAL macros implement the "the_mutex_for_everything_shared"
  abstraction.

  So, if you do something that has to be thread-safe, you may surround
  all your code with ENTER_CRITICAL and LEAVE_CRITICAL pairs. This may
  be inefficient, but well, probably not, and you are safe at least.  It
  means that nothing else will be done by any GGI code before you
  LEAVE_CRITICAL. (To others: is that the actual meaning of these
  macros?)

  NB: A code section surrounded by a MutexGet/Release pair is generally
  called a critical section (hence the macros' names). But I don't
  really like this naming, because in fact a section is always critical
  with respect to something (the data accessed, and nothing else). You
  can encapsulate _several_ critical sections inside each other without
  any harm if you are careful. You may even interleave several critical
  sections without any harm, but you have to be _very_ careful with
  this (and I wouldn't recommend it to anyone).

  You should not protect the library calls; you should protect _your_
  data! (Well, if you program a library, your data is library data,
  but...:-) Of course, if you use thread-UNsafe libraries, your program
  will not be thread-safe either and you are doing extra work. But well,
  libggi is supposed to be thread-safe one day (and then, it must rely
  on a thread-safe libc for example...).  However, if you pass a
  "pointer" to a library function, the associated data is _your_ data
  (even though the library will modify it, you simply delegate the work
  to it). So you should protect it: but only if your main program is
  multi-threaded.


  The problem with GGI accelerated commands is that, implicitly, a new
  "thread" (in fact a control-flow) can always appear: the "program"
  that executes on the card's chipset (bitBLT engine, 3D engine). The
  data to protect (your data) is the framebuffer... We could use a real
  mutex to protect these things, but it is simpler to have a
  "wait_until_finished" approach. You can (and will be obliged to)
  wait for the on-video-board "thread" to finish.  Hence, each time you
  want to draw on the framebuffer (either yourself or thanks to the
  library), you must do a ggiFlush (or maybe ggiFinish as Andreas wanted
  to change the name), to let the engine leave the framebuffer to you.
  If you don't do that: you may face a data corruption problem (as with
  first_data or second_data). (NB: It is not so "critical" for graphical
  data, it's "only" ugly...)  You may ask me: but, well, we don't have a
  true mutex, so, if I think truly "parallel", what about the case where
  the 2D engine starts drawing something on the framebuffer and I start
  also? Should I use ggiFinish _before_ drawing? The answer is no: the
  underlying assumption is that the accelerated engine will never start
  something by itself (at least I hope so). So, you should ggiFinish
  only after a drawing command (and before the next one of course).  By
  the way, we also do a synchronization with ggiFinish as the _whole_
  framebuffer will be "ready" after that call.

  Why not implement a "complete" locking system on the framebuffer (ie:
  Get/Release, or Enter/Leave) ? Simply because it is a very big data,
  and it would be inefficient. Furthermore, unless you use the card's
  memory to store program code, corruption of this big data is only
  garbage on screen, so... Let it be simpler, and the programmer will
  check that things are correct, and be synchronous if he doesn't want
  to worry about this. In this case, the programmer forgets about the
  bitBLT or 3D engine running asynchronously, and simply says to the
  libggi: be SYNC and worry about these stupid things yourself.  This is
  a "convenience" mode, that effectively makes use of a complete locking
  semantic on the framebuffer (maybe a clever one, but well only because
  we are good persons) and manages it itself.  If the programmer wants
  to go ASYNC, then, _he_ will have to manage the ggiFinish problems
  himself. The advantage ? He _can_ do a finer grain control of the data
  and make his application draw in the framebuffer at the same time as
  the card engines, (hopefully in different areas, or the result will
  be... funny :-). In this case, GGI (kgi+libggi) only performs checks
  wrt hardware safety (preventing hardware lockups if needed, etc.).  We
  may rename
  these macros: GARBAGE_FORBIDDEN, GARBAGE_ALLOWED... :-)

  NB: However, even if you are synchronous wrt graphics (ie: a single
  drawing thread), this does not mean that your program shouldn't
  protect any data. If the AI thread sends data to the drawing thread,
  these data should be protected in the application program.


  3.5.  Tricky

  Now you have two simple rules. Some other ones (left unjustified
  here):


  o  don't use a C int to protect the data!  Always use a mutex (or a
     semaphore if you know how it works)!  This is a common fault (and
     sometimes not a fault), and it is architecture dependent!  The
     final operation behind a mutex MUST be atomic, really atomic.
     Therefore, it should finish with a test-and-set _hardware_
     instruction. I don't think C guarantees its test-and-set language
     ops. to be atomic at the hardware level... So you must use a
     dedicated library. (NB: This is why a mutex is never called an
     int...) Don't try to implement a mutex yourself; it can be tricky
     (it may work on x86 with gcc and not on alpha with cc, for
     example).

  o  don't do this in too many very concurrent threads:

     ___________________________________________________________________

             while (1) {
                     MutexGet(a_mutex);
                     /* some work */
                     MutexRelease(a_mutex);
             }
     ___________________________________________________________________


  if you are not sure that a "schedule" entry point can occur inside
  the /* some work */ (a system call, for example). Use semaphores (or
  verify the "semantics" of your mutexes). Otherwise you may end up
  with a "starvation" problem: on preemptive systems you only create a
  performance slowdown; on real-time systems, some threads will never
  execute.

  (NB: This problem is very rare on real systems. In fact I don't know
  any system on which it appears. But this is a symptom showing that a
  mutex is _not_ a semaphore with an initial value of 1. That is true
  theoretically at least; in practice, it is always implemented that
  way, and that's why you never face the problem ;-)

  o  be careful on our Unix systems... We don't have true parallel
     systems, we have time-sharing deterministic systems. This means
     that your program may work _perfectly_, on all the runs you can
     try, and _not_ be thread safe. This is simply because, with your
     environment, the deadlocks or data corruption conditions never
     occur. If you change the workload, or the time-slice, or something
     else, it may fail. These faults are _very_very_very_ difficult to
     identify a posteriori. The best solution is to READ the code and
     check that you have no obscure zone that you don't understand. (NB:
     This is generally good coding practice, by the way: I know that, I
     rarely do so, and I have lots of bugs. :-)

     If you have more precise questions or comments, you are welcome, of
     course.








