Posts tagged ‘Programming’

git-svn made easy

For the last year, my main usage of git was for my own personal projects – rather basic stuff, consisting of simple commit/push/pull operations. Recently, I wanted to edit some code on the OSQA project, which is unfortunately hosted on SVN. I am not a committer (yet), so if I wanted my work to be under source control, I had no clear option except git-svn.

It took me some time to get started, and I find that there are still some gotchas that can surprise you if you’re new to git or git-svn. Luckily, I stumbled across this lovely series of screencasts. Thomas walks you through the basics and showcases some more advanced use cases as well. I highly recommend it! (I subscribed to his blog as well.)

The order of columns in GROUP BY statements matters in MySQL

Take a look at this SQL query:

SELECT COUNT(*) AS cnt
FROM products
WHERE ExternalProductId IS NOT NULL
GROUP BY SourceId, ExternalProductId
HAVING cnt > 1

This query scans the products table and finds all duplicates, where a duplicate is defined as a row having the same (SourceId, ExternalProductId) pair as another row. Counter to intuition, it turns out that in MySQL the order of the columns in the GROUP BY clause can make a huge difference, performance-wise. Let’s assume we have an index on the pair (ExternalProductId, SourceId). The query should be fast, right?

Wrong. It takes 30 minutes on our sample data set (about 30 million rows). An EXPLAIN query and SHOW PROCESSLIST revealed that MySQL was copying the table or index data to a temporary location before starting to process the actual query. This was taking up most of the execution time.

A quick question to Stack Overflow…

It appears the order of the columns makes all the difference in the world. Switching the GROUP BY columns to (ExternalProductId, SourceId), matching the index, made the query run in place without copying any temp data whatsoever, resulting in an execution time of 30 seconds instead of 30 minutes! I don’t fully understand why MySQL takes the column order into consideration – semantically, the order of the GROUP BY columns doesn’t matter, so choosing the optimal order should be a simple optimization.
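One way to picture why matching the index order helps: when rows arrive already sorted by the grouping key, equal keys sit next to each other, so duplicates can be counted in a single pass without copying anything aside. The Java sketch below illustrates that general idea only; it is not a description of MySQL’s internals, and the names SortedGroupCount, Key and Group are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: counts duplicate keys in one pass over input already sorted by key. */
final class SortedGroupCount {

    /** Hypothetical row key, mirroring the (ExternalProductId, SourceId) index order. */
    record Key(long externalProductId, int sourceId) {}

    /** Hypothetical result row: a key plus the number of rows sharing it. */
    record Group(Key key, int count) {}

    /**
     * Assumes the keys arrive in index order, so equal keys are adjacent.
     * Only the current group is tracked; nothing is copied or re-sorted.
     */
    static List<Group> duplicates(List<Key> keysInIndexOrder) {
        List<Group> result = new ArrayList<>();
        Key current = null;
        int count = 0;
        for (Key key : keysInIndexOrder) {
            if (key.equals(current)) {
                count++;
                continue;
            }
            if (count > 1) {
                result.add(new Group(current, count)); // the HAVING cnt > 1 filter
            }
            current = key;
            count = 1;
        }
        if (count > 1) {
            result.add(new Group(current, count)); // flush the last group
        }
        return result;
    }
}
```

If the input were not sorted by the grouping key, the same loop would miss duplicates that are not adjacent, which is roughly the situation where a database has to sort or copy the data first.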

Java Puzzle – spot the bad code

What’s the most important fault in the following Java code (thanks to Roman for both writing and finding the bug :) )?

public class Worker implements Runnable {
...
 
    @Override
    public void run() {
        while(true){
            synchronized (emailMessages){
                try {
                    while(emailMessages.isEmpty()){
                        emailMessages.wait();
                    }
                    mappings.saveMultiple(emailMessages);
 
                }catch (InterruptedException e) {
 
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
 
                } finally{
                    emailMessages.clear();
                }
            }
        }
    }
}

Lock Freedom is overrated (sometimes)

Locking and synchronization are expensive in a multi-threaded environment. When you’re writing a component that should scale to 10K concurrent requests, all your data structures must be lock-free, right?

Wrong.

It is true that if you have a single object that is synchronized in every request, locking that object will severely impact performance when concurrency goes up. However, you should analyze the actual usage pattern and data structures to determine the actual contention.

For instance, suppose you are writing a messaging application, where clients write to and read from their respective inboxes. The application as a whole is meant to support tens of thousands of concurrent clients. The supported operations are writing an object to someone’s inbox, reading objects from a specific inbox, and subscribing to notifications. With this performance profile, it seems like a good idea to make the global data structure that maps client IDs to inboxes lock-free; otherwise, concurrent searches for a specific inbox will block one another.

However, if any specific inbox is not meant to carry a significant load, there is no harm in implementing the Inbox data structure (that is responsible for one specific Inbox/Message Queue) without restricting yourself to lock-free data structures. Locking when there is no contention is not an expensive operation!
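Here is a minimal Java sketch of that split, under a few assumptions of mine: the class names InboxRegistry and Inbox are hypothetical, messages are plain strings, and ConcurrentHashMap stands in for the “lock-free” global map (its reads don’t block, although updates may briefly lock a single bin, so it is not lock-free in the strict sense). Each inbox is guarded by an ordinary lock, which also keeps its two fields consistent with each other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical global registry: shared by every request, so it uses a concurrent map. */
final class InboxRegistry {
    private final ConcurrentMap<String, Inbox> inboxes = new ConcurrentHashMap<>();

    /** Returns the inbox for a client, creating it on first use. */
    Inbox inboxFor(String clientId) {
        return inboxes.computeIfAbsent(clientId, id -> new Inbox());
    }
}

/** Hypothetical per-client inbox: low contention, so a plain lock is enough. */
final class Inbox {
    private final Queue<String> messages = new ArrayDeque<>();
    private int totalReceived; // invariant: counts every message ever written

    synchronized void write(String message) {
        messages.add(message);
        totalReceived++;
    }

    /** Drains and returns all pending messages in arrival order. */
    synchronized List<String> readAll() {
        List<String> drained = new ArrayList<>(messages);
        messages.clear();
        return drained;
    }

    synchronized int totalReceived() {
        return totalReceived;
    }
}
```

Two clients writing to different inboxes never compete for the same lock here; only writers and readers of the same inbox do, and that is assumed to be rare.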

Writing and using lock-free structures is usually harder than locking, except for simple scenarios. When you need to maintain class invariants across the different data structures you use, a lock is sometimes the simple and effective choice. You should be familiar with your language’s lock-free options, but don’t be swayed by the hype and “coolness” of lock-freedom; use it when it’s actually required or where it saves you code. If it complicates your design, try to defer it until you’re convinced you actually need it.

Dealing with Version Branches

At Delver, like many other places, we use version branches to maintain releases. We have one trunk where everything gets integrated, and when we want to stabilize a release version, we create a version branch and do a ‘code freeze’ on this branch. On the version branch, only bug fixes are committed, and no new features are developed.

This process helps us stabilize versions within a matter of days and proceed quickly from trunk to QA to Production.

A problem we experienced with this process was that it was hard to make sure all bugfixes were properly merged to trunk. The commonplace practice is to merge all changes from the version branch to trunk at the end of the release cycle (when the version is “frozen”).

There are at least two problems with this approach:

  • First, it is usually one person who is left with the ugly task of merging all the bugfixes that weren’t previously merged, usually in files he didn’t touch and knows nothing about.
  • Second, if there are bugfixes on the version branch after the version is frozen, what is to guarantee they will reach trunk? The sad answer is “nothing” – we had several regression bugs because people forgot to merge their bugfixes.

Here is our NewAndImprovedProcess™ (as implemented by our very own Sergey Goncharov):

We set up a TeamCity build that monitors all version branches. It runs nightly and examines the mergeinfo SVN property on all modified files. If it detects changes that were committed to the version branch but weren’t merged to trunk, it fails and sends an email to the responsible developer. A little convenience feature we sprinkled in: if you have a change set that you really don’t want to merge to trunk, you can write “[NO MERGE]” in the commit note and the build will ignore this commit (you can also do a ‘recorded merge’, which is the proper SVN way of doing it, although adding the commit note is faster in a quick-and-dirty way).
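The core of such a check can be sketched in a few lines of Java. This is not Sergey’s actual implementation: the class UnmergedCommitCheck, the Commit record, and the idea of receiving the merged revisions as a ready-made set (rather than parsing svn:mergeinfo) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of the nightly check; all names and data shapes here are hypothetical. */
final class UnmergedCommitCheck {
    static final String NO_MERGE_MARKER = "[NO MERGE]";

    /** Hypothetical view of one commit on the version branch. */
    record Commit(long revision, String author, String message) {}

    /**
     * @param branchCommits commits made on the version branch
     * @param mergedToTrunk branch revisions already recorded as merged to trunk
     *                      (assumed to be derived from svn:mergeinfo elsewhere)
     * @return commits that still need a merge; a non-empty list would fail the build
     */
    static List<Commit> findUnmerged(List<Commit> branchCommits, Set<Long> mergedToTrunk) {
        List<Commit> unmerged = new ArrayList<>();
        for (Commit commit : branchCommits) {
            boolean optedOut = commit.message().contains(NO_MERGE_MARKER);
            if (!optedOut && !mergedToTrunk.contains(commit.revision())) {
                unmerged.add(commit);
            }
        }
        return unmerged;
    }
}
```

The author field of each returned commit is what a build like this would use to decide whom to email.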

Launching Kuzando – a simple task management website

After a few alpha versions, I have finished coding the basic features of Kuzando, a simple task management website based on post-its. It is a combination of a calendar and a todo list, which doesn’t currently exist in other similar websites.

The code is hosted at GitHub, and the issue list at Google Code. Also, note that there is a guest account if you want to play with the system (thanks, Anna, for the idea).

We currently have 100% user satisfaction, which is to say that Aya (my fiancée, and the one user for whom Kuzando was tailored) is actively using it to plan her schedule at the lab. You’re invited to do the same – if there are some missing features that would make your usage of Kuzando more pleasant, let me know.

On the importance of communication in the workplace

When I first started learning how to program, I thought programming as a profession was about code, design and algorithms – after all, these are the subjects taught in university, so these must be the most important skills for a programmer, right?

Wrong.

Over the years, I have discovered that effective communication is at least as important as, if not more important than, the more technical aspects of a programmer’s work (this might be true for other fields, but I can’t really comment on that).

Interpersonal – Top of this list is “being a good and friendly person”. We spend most of our waking hours at work, and progressively less of this time is spent at the computer coding. Therefore, you want the people you work with to be … the sort of people you’d like to spend time with (a little self-defining here, but still). Working with someone who’s rude, inconsiderate, or just plain not nice can turn the best workplace in the world into the worst.

Responsibility – The people you work with, whether teammates, managers or other colleagues, are all key to your success at your own tasks. Today’s high-tech company is a highly chaotic and fast-paced environment, in which you are rarely working on some research project all alone in your basement. More often, you are cooperating with other people to create something larger: product managers, tech leads, QA, managers and support. You depend on them for the successful delivery of your work. There’s nothing more annoying than not being able to finish a feature just because some crucial link in this chain neglected his responsibilities.

Email/Face to Face – At most modern organizations, you send and receive dozens of emails every day. Email is very convenient – you can send it without ever leaving your desk, it’s saved and archived, and you can use it to reach several people at once. It is also a trap. Some things are better not handled through email or Skype! I cannot stress this issue enough. Countless times I have seen an important process get stuck because it was handled via email, and the sender wrongly assumed that this was enough to make the recipient take the required actions.

When you have something urgent or important, it’s most effective to accompany or replace the email with a face to face conversation. Sometimes all you need to say is “I’ve sent you an email about yada yada, please see the details there because I need it for this and that”. This helps focus the recipient and ensures he’ll take the necessary actions. Another point worth making about email is that you should adjust the form of communication to your recipient. Some people can be counted on to reply to most emails within an hour and to never skip an important email, while others must be ‘nagged’ repeatedly. If you use Outlook, you should use its reminder system to make sure the emails that are important to you (both outgoing and incoming) get handled.

Avoid long email threads and large recipient lists at any cost! These can serve as announcements, but as soon as such emails turn into discussions, people just tune out and the thread becomes an exercise in uselessness. Identify these wasted keystrokes and instead solve the issues by talking to the key people. If the other party tries to continue a useless email thread, just reply “I’d rather discuss this face to face, it’ll be shorter for all of us”.

When I was young, nobody told me programming was a “people profession”, but the truth is that it really is. What is your opinion?

WarClassLoader – load your classes straight from a war

I was in need of a Java class loader that can load classes from within a war file.
It took me a bit to find this – in fact, I found it when I looked for ZipClassLoader (a war file is a zip with a specific folder structure). I found this post, and modified it to take the war folder structure into account:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
 
/**
 * Loads classes from war files. See http://www2.java.net/blog/2008/07/25/how-load-classes-jar-or-zip
 */
public final class WarClassLoader extends ClassLoader {
    private final ZipFile file;
 
    public WarClassLoader(String filename) throws IOException {
        this.file = new ZipFile(filename);
    }
 
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Compiled classes in a war live under WEB-INF/classes, one folder per package.
        ZipEntry entry = this.file.getEntry("WEB-INF/classes/" + name.replace('.', '/') + ".class");
        if (entry == null) {
            throw new ClassNotFoundException(name);
        }
        try {
            // Copy the entry into memory in 1 KB chunks; read() returns -1 at end of stream.
            byte[] array = new byte[1024];
            InputStream in = this.file.getInputStream(entry);
            ByteArrayOutputStream out = new ByteArrayOutputStream(array.length);
            int length = in.read(array);
            while (length > 0) {
                out.write(array, 0, length);
                length = in.read(array);
            }
            // Turn the raw bytes into a Class object owned by this loader.
            return defineClass(name, out.toByteArray(), 0, out.size());
        }
        catch (IOException exception) {
            throw new ClassNotFoundException(name, exception);
        }
    }
}

Two notes:

  1. This code isn’t really production grade, as the streams aren’t closed and whatnot
  2. It is missing one critical addition – if the classes in the war depend on the jars that are included within it, this class loader will not be able to load these classes. I managed to work around the problem because I already had all those jars in my classpath anyway, but in principle the code above needs a bit of extension to be able to work with the embedded jars.
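For what it’s worth, here is a rough sketch of how both notes might be addressed: streams are closed with try-with-resources, and when a class isn’t found under WEB-INF/classes, every jar under WEB-INF/lib is scanned for it. The class name WarWithLibsClassLoader and its helper methods are my own invention, I haven’t run it against real-world wars, and it rescans the jars on every lookup, so treat it as a starting point rather than a finished loader. It relies on InputStream.readAllBytes, which needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

/** Sketch: a war class loader that also searches the jars embedded under WEB-INF/lib. */
final class WarWithLibsClassLoader extends ClassLoader implements AutoCloseable {
    private final ZipFile war;

    WarWithLibsClassLoader(String filename) throws IOException {
        this.war = new ZipFile(filename);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try {
            byte[] bytes = findInClasses(path);
            if (bytes == null) {
                bytes = findInEmbeddedJars(path);
            }
            if (bytes == null) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException exception) {
            throw new ClassNotFoundException(name, exception);
        }
    }

    /** Looks for the class file directly under WEB-INF/classes. */
    private byte[] findInClasses(String path) throws IOException {
        ZipEntry entry = war.getEntry("WEB-INF/classes/" + path);
        if (entry == null) {
            return null;
        }
        try (InputStream in = war.getInputStream(entry)) {
            return in.readAllBytes();
        }
    }

    /** Scans each jar under WEB-INF/lib in turn (simple, but slow for large wars). */
    private byte[] findInEmbeddedJars(String path) throws IOException {
        Enumeration<? extends ZipEntry> entries = war.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            String entryName = entry.getName();
            if (entryName.startsWith("WEB-INF/lib/") && entryName.endsWith(".jar")) {
                try (InputStream in = war.getInputStream(entry)) {
                    byte[] bytes = readEntry(in, path);
                    if (bytes != null) {
                        return bytes;
                    }
                }
            }
        }
        return null;
    }

    /** Reads one named entry from a zip/jar stream; returns null when it is absent. */
    static byte[] readEntry(InputStream zipStream, String entryName) throws IOException {
        ZipInputStream zip = new ZipInputStream(zipStream);
        for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
            if (e.getName().equals(entryName)) {
                return zip.readAllBytes(); // reads to the end of the current entry only
            }
        }
        return null;
    }

    @Override
    public void close() throws IOException {
        war.close();
    }
}
```

Resource lookups (getResource and friends) are not covered, and caching an index of which jar holds which class would be the obvious next step if lookup speed matters.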