One of the things that is missing from Twitter is an indication of a person’s gender. When you register for Twitter, you do not indicate whether you are male or female, and there is nothing later on that directly identifies your Twitter account as being male or female [1].
Nonetheless, the balance between males and females in various groups is often an important question, and so we must rely on indirect techniques.
The approach I use is fairly straightforward. I start with the person’s first name. Some first names are very predictive: it is unlikely that a Charles is going to be female, and so any Charles we see I will count as male.
Some names are androgynous (Chris? Lee? Pat?), and we have to look at the person’s profile for indications. People often say “wife of” or “father to”, etc. By looking for these key phrases in a person’s profile I can make an educated guess.
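As a rough illustration of the two steps above (first name, then profile phrases), here is a minimal Python sketch. The name lists, the key phrases, and the function name `guess_gender` are all assumptions made up for this example; they are not the actual lists or code behind the approach, and a real name list would be far larger.

```python
import re

# Hypothetical, deliberately tiny name lists (assumptions for illustration).
MALE_NAMES = {"charles", "james", "robert"}
FEMALE_NAMES = {"mary", "linda", "susan"}

# Hypothetical key phrases such as "wife of" or "father to".
MALE_PHRASES = re.compile(r"\b(father|husband|dad)\s+(of|to)\b", re.IGNORECASE)
FEMALE_PHRASES = re.compile(r"\b(mother|wife|mom)\s+(of|to)\b", re.IGNORECASE)


def guess_gender(display_name, profile=""):
    """Return "male", "female", or None when no guess can be made."""
    parts = display_name.strip().split()
    first = parts[0].lower() if parts else ""

    # Step 1: strongly predictive first names decide immediately.
    if first in MALE_NAMES:
        return "male"
    if first in FEMALE_NAMES:
        return "female"

    # Step 2: androgynous or unknown name, so look for profile phrases.
    male_hit = bool(MALE_PHRASES.search(profile))
    female_hit = bool(FEMALE_PHRASES.search(profile))
    if male_hit != female_hit:
        return "male" if male_hit else "female"

    # No evidence, or conflicting evidence: leave unclassified.
    return None
```

For example, `guess_gender("Charles Smith")` is decided by the name alone, while `guess_gender("Chris Jones", "Wife of Tom, runner")` falls through to the profile check. Returning `None` when the evidence is missing or contradictory keeps the unclassifiable accounts explicit rather than forcing a guess.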
Even with all these techniques, I can classify gender for only about 50% of people. There are researchers out there who feel they can look at the wording people use in their status updates to make a better assessment of people’s gender, but I have not moved towards that approach yet. It may work, but I suspect that they may just be building a predictive model that fits their existing data.
Given that we can only guess at half of the Twitter users, what do we do about the half we cannot guess? At this point, I mostly ignore them.
No matter what, estimates of gender on Twitter will always carry a caveat: the statistics apply only to those who are identifiably one sex or the other.
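To make the caveat concrete, here is a small Python sketch of how the balance might be reported when unclassified accounts are ignored. The function name `gender_balance` and the label values `"male"`, `"female"`, and `None` are assumptions for this example. Reporting the coverage alongside the shares shows how much of the group the statistics actually describe.

```python
from collections import Counter


def gender_balance(guesses):
    """Summarise a list of guesses: "male", "female", or None (unknown)."""
    counts = Counter(guesses)
    male, female = counts["male"], counts["female"]
    classified = male + female  # unknowns are ignored in the shares
    total = len(guesses)
    return {
        # Fraction of all accounts that could be classified at all.
        "coverage": classified / total if total else 0.0,
        # Shares are computed over classified accounts only.
        "male_share": male / classified if classified else None,
        "female_share": female / classified if classified else None,
    }


summary = gender_balance(["male", "female", "female", None])
print(summary["coverage"])  # 0.75
```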
[1] I appreciate that gender is more complicated than male/female; unfortunately I don’t think I can do much about it in this case.