Posts tagged ‘Threading’

Cancelor – a java task cancelation service

Lately I need to support task cancellation in a Java process I’m working on. The straightforward options I know to implement this are:

  1. Thread.interrupt() – the caller interrupts the worker thread (either directly or using Future.cancel()). Some say this is an erroneous approach, but I still haven’t figured out why. However, it is buggy on some recent versions on the JDK, and it is a bit fragile (what if the worker threads create worker threads that also need to be canceled?).
  2. Passing some object (AtomicBoolean?) down to every object you would like to support cancellation. These objects will check the value of this boolean, and should stop if it is false. They can pass the boolean to other objects / tasks. While this works, this boolean cannot be injected, and so must be manually passed along the call stack.

If you want the advantages of the second method, but don’t want to break IOC, here’s how:

First, the usage:

The listener object adds a dependency on ICancelor

public class Foo {
  public Foo(ICancelor cancelor) {
    this.cancelor = cancelor;
    ...
}

It then checks the cancellation state every now and then:

if (cancelor.wasTaskCanceled("TakeOverTheWorld"))
   return;

The top-level thread that wishes to cancel a task simply calls

cancelor.cancelTask("TakeOverTheWorld");

And whenever a task is started, you should call

cancelor.resetTask("TakeOverTheWorld");

I’ll admit using strings for task names is a bit ugly, but this is not a terrible price to pay, assuming you have a few core tasks you intend to support. All that remains is the cancellation service itself:

/**
 * A cancellation service.
 */
public interface ICancelor {
    /**
     * Resets a task to "Not canceled" state
     */
    void resetTask(String name);
 
    /**
     * Returns true iff the a cancelTask was called, and no resetTask was called afterwards.
     */
    boolean wasTaskCanceled(String name);
 
    /**
     * Cancel a task
     */
    void cancelTask(String name);
}
 
public class Cancelor implements ICancelor {
  private final ConcurrentHashMap tasks = new ConcurrentHashMap();
 
    public void resetTask(String name) {
        tasks.put(name, true);
    }
 
    public boolean wasTaskCanceled(String name) {
        Boolean value = tasks.get(name);
        return value != null & value;
    }
 
    public void cancelTask(String name) {
        tasks.put(name, false);
    }
}

Because we rely on task names, there is an assumption here that all classes that play in the cancellation game belong to the same task semantically. If a class is a common class that doesn’t belong to a single task or flow, this approach does not work – in fact, I cannot think of an approach that will work in this case with dependency injection. The common class has to accept the cancellation signal somehow, it must either get an boolean explicit and not from the IOC container, or must check its interrupted state (or some other thread-local state) itself. Any smart ideas on how to solve this problem?

Playing around with PLINQ and IO-bound tasks


I recently downloaded Visual Studio 2010 beta, and took the chance to play with PLINQ. PLINQ, for those of you in the dark ages of .Net Framework 2.0, is parallel LINQ – an extension to the famous query language that makes it easy to write parallel code (essential to programming in the 21th century, in the age of the many-core).

A code sample, as usual, is the best demonstration:

public static int CountPrimes(IEnumerable<int> input)
{
    return input.AsParallel().Where(IsPrime).Count();
}
 
private static bool IsPrime(int n)
{
    for (int i = 2; i*i < n; ++i)
        if (n % i == 0)
            return false;
    return true;
}

This code sample, regardless of using an inefficient primality test, is fully parallel. PLINQ will utilize all your cores when running the above code, and I didn’t have to use any locks, queues, threadpools or any of the more complex tools of the trade. Just tell PLINQ “AsParallel()”, and it works.

I hit some gotcha when I tried to compare the parallel performance with the sequential one. Do you spot the problem in the following code?

public static void CountPrimesTest(IEnumerable<int> input)
{
    // parallel benchmark 
    var timer = new Stopwatch();
    timer.Start();
    CountPrimes(input.AsParallel());
    timer.Stop();
    Console.WriteLine("Counted primes in parallel took " + timer.Elapsed);
 
    // sequential benchmark
    timer = new Stopwatch();
    timer.Start();
    CountPrimes(input);
    timer.Stop();
    Console.WriteLine("Counted primes sequentially took " + timer.Elapsed);
}


This is all fine and dandy when the task at hand is CPU bound, but works pretty miserabbly when your task is IO bound, like downloading a bunch of web pages. Next, I simulated some IO-bound tasks (I used Sleep() to emulate IO – basically not using a lot of CPU for every task):

[ThreadStatic]
private static Random _random;
 
public static List<string> FindInterestingDomains(IEnumerable<string> urls)
{
    // select all the domains of the interesting URLs
    return urls.AsParallel().Where(SexFilter).
                Select(url => new Uri(url).Host).ToList();
}
 
public static bool SexFilter(string url)
{
    if (_random == null)
        _random = new Random();
 
    // simulate a download
    Thread.Sleep(1000);
    var html = "<html>" + _random.Next() + "</html>";
    return html.Contains("69");
}

Testing this with a list of 10 URLs took 5 seconds, meaning LINQ again spun only two cores, which is the number of cores on my machine. This really sucks for IO bound tasks, because most of the time the threads are idle, waiting on IO. Let’s see if we can speed this up:

// Use WithDegreeOfParallelism to specify the number of threads to run
return urls.AsParallel().WithDegreeOfParallelism(10).Where(SexFilter).
              Select(url => new Uri(url).Host).ToList();

This appeared not to work at first, because WithDegreeOfParallelism is just a recommendation or upper bound. You can ask PLINQ nicely to run with ten threads, but it will only allocate two if it so chooses. This is yet another example of C# being more magical than Java – compared to Java’s rich ExecutorService, PLINQ offers less fine grained control.

However, further testing revealed the damage is not so horrible. This is what happened when I put the above code in a while(true):

Tested 10 URLs in 00:00:05.0576333
Tested 10 URLs in 00:00:03.0018617
Tested 10 URLs in 00:00:03.0013939
Tested 10 URLs in 00:00:03.0013175
Tested 10 URLs in 00:00:04.0018983
Tested 10 URLs in 00:00:03.0024044
Tested 10 URLs in 00:00:01.0004407
Tested 10 URLs in 00:00:01.0007645
Tested 10 URLs in 00:00:01.0007280
Tested 10 URLs in 00:00:01.0003358
Tested 10 URLs in 00:00:01.0003347
Tested 10 URLs in 00:00:01.0002470

After some trial and error, PLINQ found that the optimal number of threads needed to run this task under its concurrency guidelines is ten. I imagine that if at some point in the future the optimal number of threads change, it will adapt.

P.S.
If you found this interesting, wait till you read about DryadLINQ – it’s LINQ taken to the extreme, run over a cluster of computers.

Java is less magical than C#

I have been programming in C# for several years now, and recently made the switch to Java (at least for now). I noticed that Java, as a language, is “less magical” than C#.

What do I mean by that is that in C# things are usually done for you, behind the scenes, magically, while Java is much more explicit in the toolset it provides. For example, take thread-local storage. The concept is identical in both langauges – there is often a need for a copy of a member variable that’s unique to the current thread, so it can be used without any locks or fear of concurrency problems.

The implementation in C# is based on attributes. You basically take a static field, annotate it with [ThreadStatic], and that’s it:

[ThreadStatic]
private static ThreadUnsafeClass foo = null;
 
private ThreadUnsafeClass Foo
{
  get
  {
    if (foo != null)
      foo = new ThreadUnsafeClass(...);
 
    // no other thread will have access to this copy of foo
    // note - foo is still static, so it will be shared between instances of this class.
    return foo;
  }
}

How does it work? Magic. Sure, one can find the implementation if he digs deep enough, but the first time I encountered it I just had to try it to make sure it actually works, because it seemed too mysterious.

Let’s take a look at Java’s equivalent, ThreadLocal. This is how it works (amusingly enough, from a documentation bug report):

public class SerialNum {
     // The next serial number to be assigned
     private static int nextSerialNum = 0;
 
     private static ThreadLocal<Integer> serialNum = new ThreadLocal<Integer>() {
         protected synchronized Integer initialValue() {
             return new Integer(nextSerialNum++);
         }
     };
 
     public static int get() {
         return serialNum.get();
     }
 }

No magic is involved here – get() gets the value from a map, stored on the calling Thread object (source code here, but the real beauty is that’s it’s available from inside your IDE without any special effort to install it).

Let’s look at another example – closures.

In C#, you can write this useful piece of code:

var list = new List<int>();
...
// find an element larger than 10
list.Find(x => x > 10);

You can also make this mistake:

var printers = new List<Action>();
...
foreach (var item in list)
{
  printers.Add(() => Console.WriteLine(item));
}
Parallel.Foreach(printers, p => p())

An innocent reader might think this prints all the items in list, but actually this only prints the last items list.Count times. This is how closures work. This happens because the item referred to in the closure is not a new copy of item, it’s actually the same item that’s being modified by the loop. A workaround is to add a new temporary variable like this:

foreach (var item in list)
{
  int tempItem = item;
  printers.add(() => Console.WriteLine(tempItem));
}

And in Java? Instead of closures, one uses anonymous classes. In fact, this is how they are implemented under the hood in C#. Here the same example, in Java:

for (Integer item : list)
{
  final int tempItem = item;
  printers.add(new Action(){
    public void doAction()
    {
      // can't reference item here because it's not final.
      // this would have been a compilation error
      // system.out.println(item);
      System.out.println(tempItem);
    });
}
...

Notice it’s impossible to make the mistake and capture the loop variable instead of a copy of it, because Java requires it to be final. So … less powerful perhaps than C#, but more predictable. As a side note, Resharper catches the ill-advised capturing of local variables and warns about it.

I myself rather prefer the magic of C#, because it does save a lot of the trouble. Lambdas, properties, auto-typing variables… all these are so convenient it’s addictive. But I have to give Java a bit of credit, as the explicit way of doing stuff sometimes teaches you things that you just wouldn’t have learn cruising away in C# land.

Never use synchronized methods or lock on this

Especially when extending a 3rd party base class.

This is a known best practice, but when I read about it I natrually assumed I was smarter than the author of the best practice. The reason not to use synchronized methods (or lock(this)) is that other code might lock on your object too, thus causing nasty deadlocks.

I figured this wouldn’t happen because ‘who would just lock on my object, there’s no chance of that’. Well, this is obviously not safe, but especially so when extending a 3rd-party base class. In my case, I was extending log4j’s AppenderSkeleton, and found out the hard way that log4j obtains locks on the appenders.

The solution:

  1. Use a private lock object (duh), seperating your intended lock semantics from whatever evil outside code will use
  2. Stop assuming that I know best and ‘it will never happen’

Alt.net 2nd conference

I just attended my first alt.net conference (some would call it unconference). The story is about a group of 40 people that came to talk about … whatever they decided to talk about. The conference is self-organizing, with no predetermined lectures or lecturers, and with one healthy rule – if you don’t feel you are learning or contributing at the discussion you are currently having, you have to get up and find another discussion.

Here are some of the talks I attended (here is a semi-readable list of all the talks):

Aspect Oriented Programming

Usages other than logging, AOP frameworks.

Links: Cthru, Post#, Wicca.

Mocking/Stubbing

Reiterate the basic paradigm, emphasize on TypeMock. They are considering a UI tool adding to Visual Studio to help create mocks – meant for people just starting with mocking. The intended usage is:

  1. Write a test, without any mocking
  2. The test will usually fail because some deep class is not configured correctly.
  3. You will see the chain of calls that caused the exception, and be able to automatically generate a mock for any class in the chain.
  4. Rinse & repeat until your test passes

High Scale & Distributed Caches

The discussion focused around what I’d call medium scale – 2-10 nodes that used shared caches like memcached & Azure.

Multithreading

There was a comparison of Microsoft CCR and Parallel Extensions. It seems people still think of parallelization as simply utilizing all your cores, when it’s much more than that. Some applications benefit from multithreading even on a single core machine (think web crawler).

One interesting link – PowerThreading library (see this video for a demonstration of Asynchronous Programming Model using PowerThreading).

Don’t switch your lock object mid-lock

Today I encountered a weird exception from List.AddRange()

[Exception(System.ArgumentException)]:
Destination array was not long enough. Check destIndex and length, and the array's lower bounds.

Looking at the code, I saw it multithreaded like this:

List<int> tmp;
lock (_list)
{
    tmp = _list;
    _list = new List<int>(1);
}
 
// use tmp object like it's owned exclusively by this thread

The thinking here was to avoid “expensive” operations inside the lock, so we locked _list only for enough time to replace it with a new list, and then do the heavy lifting on the tmp variable (assuming no other threads can interfere). Most locks are used immutably – the lock object itself rarely changes identity.

Well, it appears the above code is indeed not thread safe. Here what I recon happened:

  1. Thread 1 acquired a lock on _list
  2. Thread 2 tried acquiring the lock, but blocked.
  3. Thread 1 changed _list to a new instance and released the lock on the old value of _list
  4. Thread 3 came and acquired the lock on the new value of _list
  5. Thread 2 was now released and acquired the lock on the old value of _list

At this point, both thread 2 & 3 got inside the critical section and grabbed the same value of _list (the new one) as their “exclusive” tmp, and then worked on it concurrently outside the lock. The easy solution is not to lock on an object whose identity (not state) changes. In this case, a lock either on the containing object or on a new custom object (object _lock = new object()) should do the trick.

This program reproduces the problem:

using System;
using System.Collections.Generic;
using System.Threading;
 
namespace tstlock
{
    public class LockTester
    {
        private List<int> _list = new List<int>(1);
 
        public void Run()
        {
            var threads = new Thread[40];
            for (int i = 0; i < threads.Length; ++i)
            {
                threads[i] = new Thread(ThreadFunc);
                threads[i].Start();
            }
            foreach (var thread in threads)
                thread.Join();
        }
 
        private void ThreadFunc()
        {
            for (int i = 0; i < 1000000; ++i)
            {
                List<int> tmp;
                lock (_list)
                {
                    tmp = _list;
 
                    // at this point _list is always supposed to be an empty list
                    // because all the additions to it are after the new list is allocated
                    _list = new List<int>(1);
                }
 
                var array = new []{1,2,3,54,5,6};
                tmp.AddRange(array);
                if (tmp.Count != array.Length)
                {
                    throw new Exception("Bug - tmp is not local as we thought");
                }
            }
        }
    }
}

Playing with a few ReaderWriterLocks in .Net

With Delver’s future clouded by a big question mark, I’m looking for a new job. This reminded me of my previous job hunt, and the part we all “love” about it – job interviews.

Last year, two companies I was interviewing for asked me to implement a ReaderWriterLock. It took me more time than I’d hoped, but I got the implementation done. Since this was one of the hardest interview questions (for me at least), I’ve decided to implement one or two versions, and to test the performance vs the built-in reader writer locks available in .NET.

First, I wrote the test code, which is roughly composed of:

  1. Shared (singleton) counters object with two counters, X & Y.
  2. Some reader threads. When a reader reads, it reads both X & Y and makes sure they are equal.
  3. Some writer threads, that increment X & Y together (thus maintaining X == Y).

Then, I tested this setup without any locks, and as expected the reader threads got different values of X & Y.

Next, I implemented the following locks:

  1. DummyReaderWriterLock – a horribly inefficient implementation that is reader/writer-oblivious. It just locks the object regardless of reads/writes.
  2. SemaphoreReaderWriterLock – already a huge improvment, this locks uses a semphore that holds some large number (~2000) locks. Each reader requests only a single lock from the semaphore, and a writer must obtain ALL locks before continuing. Two writers are prevented from deadlocking by a separate Mutex. One immediate problem with this lock is writer starvation. For a writer to obtain the lock, it must first get the mutex and then all locks in the semaphore. This means a single writer is competing against all readers for the semaphore.
  3. EventReaderWriterLock – implemented by two events and a reader count. The first reader & all writers must get signaled in order to get the lock. Once any reader got the lock, other readers are free to enter without blocking. The last reader out is responsible for signaling the event and letting other writers or readers back in.

I also pulled my team leader Oren into this problem, and he came up with a state based implementation – not that far from my own, with an additional “state” integer that represents whether the lock is in “writing mode”, “reading mode” or free. He also added a few TestAndCompare hacks for checking the state.

Finally, I tested the performance of these locks vs two locks that are availble in .NET 3.5: ReaderWriterLock and ReaderWriterLockSlim. I only tested only one scenario, but was pleasantly surprised to discover the performance of both my EventReaderWriterLock and Oren’s StateReaderWriterLock were identical to that of ReaderWriterLockSlim, and better than ReaderWriterLock! (The performance of DummyReaderWriterLock were literally off the charts).

ReaderWriterLock Performance

Here is my implementation:

    /// 
    /// A ReaderWriter lock implemented by events.
    ///
 
    /// An AutoReset event that gives ownership of the lock either to one writer or to all the readers.
    /// A manual reset event that allows readers to enter, and is only reset when the last reader finishes
    /// 
    /// 
    public class EventReaderWriterLock : IDisposable
    {
        /// 
        /// Both readers and writer need this lock to work
        /// 
        private readonly AutoResetEvent _lockAvailble = new AutoResetEvent(true);
 
        /// 
        /// Further readers beyond the first one wait on this object
        /// 
        private readonly ManualResetEvent _canReadEvent = new ManualResetEvent(false);
 
        /// 
        /// Used to synch the reader lock/release
        /// 
        private readonly object _readerLock = new object();
 
        private int _readers;
 
        public void LockReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers++;
                if (oldReaders == 0)
                {
                    // I'm the first reader, let's fight for the lock with the writers
                    _lockAvailble.WaitOne();
 
                    // got lock, notify all other readers they can read
                    _canReadEvent.Set();
                }
                else
                {
                    // wait for the first reader to signal me
                    _canReadEvent.WaitOne();
                }
            }
        }
 
        public void ReleaseReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers--;
                if (oldReaders != 1)
                {
                    // If I'm not the last reader, I do nothing here
                    return;
                }
 
                // I'm the last, let's forbid other readers but allow writers or a first reader.
                _canReadEvent.Reset();
                _lockAvailble.Set();
            }
        }
 
        public void LockWriter()
        {
            _lockAvailble.WaitOne();
        }
 
        public void ReleaseWriter()
        {
            _lockAvailble.Set();
        }
 
        public void Dispose()
        {
            _lockAvailble.Close();
            _canReadEvent.Close();
        }
    }

And the entire zip with the other RWLocks and test harness.

A few immediate conclusions:

  1. Writing a working reader writer lock is a very non-trivial problem for a job interview – but watching an applicant struggle with it can give you insights about his know-how around multi-threaded code.
  2. Writing an all-purpose RWLock seems like a daunting task. In my exercise I specifically avoided broad considerations such as fairness & readers upgrading to writers. Testing it for all end cases seems almost impossible (though some formal theoretical tools exists for such correction proofs)
  3. At least for some problems, writing your own lean solution can be better (performance wise) than relying on the de facto standard. While our solutions weren’t better than ReaderWriterLockSlim, they were significantly better than ReaderWriterLock – again, only in the context of this test harness.

Bonus: my first implementation didn’t have a lock statement in LockReader() and ReleaseReader(), and used Interlocked.Increment() and Decrement() to update the _readers variable. Still, it contained a hidden bug – can you find it, and understand why the lock is necessary?

Unhandled Exceptions Crash .NET Threads

A little something I learned at DSM today. It appears if any thread in .NET crashes (lets a thrown exception fly through the top stack level), the process crashes. I refused to believe at first, but testing on .NET 2.0 showed it to be true:

(I should really switch to another blog platform, I didn’t find a decent way to write code in Blogspot).

class ThreadCrashTest
{
  static void Main()
  {
    new Thread(Foo).Start();
    for (int i = 0; i < 10; ++i)
    {
      Console.WriteLine(i);
      Thread.Sleep(100);
    }
  }
 
  private static void Foo()
  {
    Console.WriteLine("Crashing");
    throw new Exception("");
  }
}

According to Yan, the behavior on .NET 3 is to crash the AppDomain instead of the entire process.