The Gender Genie: “Inspired by an article in The New York Times Magazine, the Gender Genie uses an algorithm developed by Moshe Koppel, Bar-Ilan University in Israel, and Shlomo Argamon, Illinois Institute of Technology, to predict the gender of an author. Read more about the algorithm at nature.com.”
I pasted some FmH passages of significant length into the algorithm; sometimes it gets my gender right, but at times it tells me I “write like a girl.” The algorithm’s authors say it ought to predict the gender of a passage’s author 80% of the time, but Genie is candid enough to tell us that her cumulative accuracy is only 50.77% as I write this. [I don’t have to tell you that’s about as close to random as you can come in the real world…]
The algorithm rests on the difference between so-called ‘informational’ (categorizing) and ‘involved’ (personalizing) modes of writing, which are thought of as quintessentially male and female, respectively (they are also thought of as quintessentially ‘nonfictional’ and ‘fictional’, which makes sense). It does a weighted count of what it considers “male keywords” (articles, “some”, numbers, and “it”) vs. “female keywords” (possessive pronouns and “’s”, “for”, and “not” and “n’t”) and gives the passage a “male” or “female” score.
Why would the online Gender Genie only break even when the original scientific paper gives the algorithm on which it is based an 80% success rate (when tried on over 500 English-language texts in a variety of genres)? Perhaps someone is messing with Genie’s mind (giving incorrect feedback), or the passages submitted so far are highly atypical, or both. If it is being fed largely web-based writing rather than text imported from meatspace, the material is probably overwhelmingly nonfiction or ‘informational’. Moreover, perhaps even female writers on the web, being in general more technically and technologically adept, are more ‘informational’ than the norm. Having read more about the algorithm, I can now spot passages in my own writing it is more likely to think ‘girlish’. Try it out yourself.
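For the curious, the weighted keyword count described above can be roughly sketched in a few lines of Python. This is only an illustration under loud assumptions: the word lists are my own reading of the categories named above, every weight is a placeholder of 1 (the real algorithm's weights are not given here), and the names score and guess are invented for the example. It is not the Genie's actual code.

```python
import re

# ASSUMPTION: placeholder weights of 1 for every keyword. The real
# algorithm uses its own weights, which are not reproduced here.
MALE_WEIGHTS = {"a": 1, "an": 1, "the": 1, "some": 1, "it": 1}
FEMALE_WEIGHTS = {
    "my": 1, "your": 1, "his": 1, "her": 1, "our": 1, "their": 1,
    "for": 1, "not": 1,
}
NUMBER_WEIGHT = 1         # any run of digits counts as a "male" keyword
POSSESSIVE_S_WEIGHT = 1   # words ending in 's count as "female"
NT_WEIGHT = 1             # words ending in n't count as "female"

# A word is letters with an optional apostrophe suffix, or a run of digits.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?|\d+")


def score(text):
    """Return (male_score, female_score) for a passage."""
    # Normalize curly apostrophes so that "don’t" and "don't" match alike.
    text = text.lower().replace("\u2019", "'")
    male = female = 0
    for word in WORD_RE.findall(text):
        if word.isdigit():
            male += NUMBER_WEIGHT
        elif word.endswith("n't"):
            female += NT_WEIGHT
        elif word.endswith("'s"):
            # Crude: this also catches contractions such as "it's".
            female += POSSESSIVE_S_WEIGHT
        else:
            male += MALE_WEIGHTS.get(word, 0)
            female += FEMALE_WEIGHTS.get(word, 0)
    return male, female


def guess(text):
    """Label a passage by whichever score is larger (ties go to 'female')."""
    male, female = score(text)
    return "male" if male > female else "female"


if __name__ == "__main__":
    print(guess("The cat sat on a mat; it ate some of the 3 fish."))
```

With placeholder weights this is really just a tally, but it makes the mechanism plain: a passage heavy on articles and numbers drifts “male”, while one full of possessives and negations drifts “female”, whoever actually wrote it.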
