Stylometry - Situation or Science Fiction?
Your writing style is like your fingerprint. And I'm not talking about your handwriting here. Do we still write by hand anyway? I'm talking about your unique way of writing. Your choice of words, the way you formulate your sentences, the use of punctuation. Your language and style errors, or the absence of them. To the point or rather elaborate? Identification based on your writing style DNA is coming closer very quickly and is already being applied in practice, including in forensic analyses. Parts of Shakespeare's works were not written by Shakespeare himself, some stylometrists - writing style analysts - claim. True or not, who knows? Still, the development is moving very fast, driven by AI, artificial intelligence. Do you write under a pseudonym because of your sharp pen? Then you'd better watch out. It is probably only a matter of years before stylometrists, armed with a few terabytes of AI, will figure out with razor-sharp precision who you are and what you write and, just as frighteningly, will be able to imitate you almost seamlessly.
Stylometry
To a certain extent, we can all recognize different writing styles. A cheerful, sad, sweet or angry text from a friend or a loved one reads differently from a newspaper article, a letter from the bank, or a bailiff's writ of execution. Quite likely, we can also distinguish the different styles of our favorite writers. It becomes more difficult to identify particular patterns in texts from complete strangers. Education level, age, gender: there we may perhaps still make a fair guess. But beyond that? Would we be able to link a given piece of text more specifically? To a certain group? How specific can we make that group? Or even to a single person?
That is precisely the dream of stylometry, the analysis of writing styles. For a very long time, stylometry was a labor-intensive craft. Forensic stylometry has already gained fame in the past due to a number of decisive breakthroughs in criminal cases. Sometimes such breakthroughs took months of analysis. Artificial intelligence is developing rapidly, giving an entirely new dimension to writing style analysis. A dream for stylometry experts. A nightmare for others. And certainly not only for criminals.
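To make "writing style analysis" a little more concrete, here is a minimal sketch in Python of the kind of measurements mentioned earlier: word choice, sentence construction and punctuation. The function name `style_profile`, the short `FUNCTION_WORDS` list and the chosen measurements are my own assumptions for illustration. They are not the method of any particular stylometry tool, and real analyses use far richer features.

```python
import re
from collections import Counter

# Assumption: a tiny, hypothetical list of common English function words.
# Real studies typically use lists of hundreds of words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "but", "or", "i")


def style_profile(text):
    """Return a few simple style measurements for one text."""
    # Lowercased words, keeping apostrophes so "don't" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    # Rough sentence split on ., ! and ?; empty pieces are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    profile = {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "commas_per_100_words": 100 * text.count(",") / n_words,
        "type_token_ratio": len(set(words)) / n_words,  # vocabulary variety
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        profile["freq_" + word] = counts[word] / n_words
    return profile


if __name__ == "__main__":
    for name, value in style_profile("I came. I saw, and I won.").items():
        print(f"{name}: {value:.3f}")
```

Numbers like these say little on their own. They only become interesting when the profile of one text is compared with the profiles of many others.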
Applied stylometry
It only takes a certain dose of imagination to come up with and grasp the countless applications of stylometry. The analytical variants described here are the identification of criminals and the verification of the authenticity of documents. The synthetic variants, the imitation of stylometric profiles of others, are also developing very quickly, often for criminal purposes. Here too, through the use or, better said, misuse of artificial intelligence.
Identification of criminals
Criminals are nowadays smart enough not to send handwritten notes. However, their writing style can still reveal their identity. Theodore Kaczynski -1) (nicknamed 'The Unabomber') sent letter bombs between 1978 and 1995 to a large number of opponents who he 'suspected' were strong supporters of modern technology. He himself was a promoter of an anarchist society. In 1995 his manifesto 'Industrial Society and Its Future' was published. When the FBI analyzed the 35,000-word document, stylometrists were able to compile a fairly clear profile. On the basis of this, Kaczynski was arrested in 1996 and later convicted. It was a bonus that they found an exact copy of the manifesto at his home. Nowadays, there are known cases in which authors are identified on the basis of just a few lines.
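How could a disputed text be matched to a suspect in principle? One classic and much simplified idea is to compare how often small, unremarkable words occur in the disputed text and in texts of known authors. The Python sketch below illustrates that idea with cosine similarity. It is an assumption-laden toy, not the method used by the FBI or by any forensic lab; `FUNCTION_WORDS`, `function_word_vector` and `rank_candidates` are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hypothetical list of common English function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "but", "or", "i")


def function_word_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(disputed_text, known_texts):
    """Rank candidate authors by similarity to the disputed text.

    known_texts maps an author name to a writing sample of that author.
    Returns (name, score) pairs, most similar first.
    """
    target = function_word_vector(disputed_text)
    scores = {
        name: cosine_similarity(target, function_word_vector(sample))
        for name, sample in known_texts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    samples = {
        "author_a": "the cat and the dog and the bird",
        "author_b": "to be or not to be that is it",
    }
    for name, score in rank_candidates("the fox and the hound and the hare", samples):
        print(f"{name}: {score:.2f}")
```

A high score is at best a hint. With short texts and only a handful of words counted, coincidental matches are to be expected.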
Author identification - other reasons
Author identification does not always have to be aimed at tracking down criminals, although the dividing line here is vague. Up to what point does freedom of speech still apply and where is it violated? And, especially in countries with less self-evident freedom of expression, which jurisdiction applies? And, as a well-known Indian-British writer once found out, will these countries care about jurisdictions that apply outside their territorial borders anyway?
This is one of the (many) reasons for using pseudonyms. An author then writes under a fictitious name that sometimes can, but often deliberately cannot, be linked to the real name. But here too, the marriage of convenience between stylometry and artificial intelligence throws a wrench in the works. The message is clear. The anonymity of the pseudonym will sooner or later be over.
Authenticity verification of documents
The Vrije Universiteit Amsterdam (Free University of Amsterdam - VU) mentions authorship determination as the most important application of stylometry. The relevant web page of the VU -2) seems somewhat dated to me, although authorship determination is, obviously, enormously important from an academic perspective. Consider, for example, the authenticity determination of theses, the identification of plagiarism in scientific research and the outsourcing of writing work to so-called ghost writers (people who write as 'writing slaves' in the service of others who are too lazy or incompetent to write themselves).
Too intelligent for artificial intelligence?
'Computers will never win over the human mind' is the big mistake that we as humanity make, time and again, in our unbridled arrogance, until we are put in our place, every now and then. The chess computer Deep Blue, which wiped Garry Kasparov off the board in 1997. The Israeli Waze, which made map reading redundant in 2008. Commonly used self-driving cars are only a matter of time. And so is the stylometric computer: write one A4 page and I know who you are.
There are always those who try to outsmart 'the computer'. With varying degrees of success. Garry Kasparov was punished mercilessly when he thought he could mislead Deep Blue. And there are also experiments with authors who try to trip up the stylometric computer program, for example by deliberately making language errors or building in crooked sentence structures. But modern artificial intelligence can pick this up and even add these characteristics to the 'author DNA'. Authors seem to have their own personal preferences for making 'stupid mistakes'. 'Everything you say will be used against you' thus takes on a new dimension.
Synthetic stylometry
If we think a little deeper about the possibilities of synthetic stylometry, it all becomes very scary. Suppose a criminal could steal all my email and text files. With a decent amount of computing power it should not be that complicated to analyze my communication style, possibly per addressee too. After all, I communicate differently with my wife than with the tax authorities. If that criminal then uses my email address, my phone number, my social media and my stylometric profile to send messages in my name, and is even able to send realistic answers to any questions, then it all suddenly looks very scary. Can you feel where the real danger of stylometry lies?
Privacy aspects
The last word has not yet been said about the privacy aspects. If personal writing characteristics are identifiable, they can also be recorded. What rules will apply to this? The same rules as for storing fingerprints and DNA? What exactly will be recorded? What is the evidentiary value of a 'match'? What conclusions will be drawn from the data? Mrs. M uses the expression 'I'll make mincemeat of you' twice in a text of 700 words. Does that mean that she fits the profile of a serial killer?
So, now what?
The message is to stay sharp! Stay aware of what you write, as the time of the anonymity of the pseudonym will come to an end. Play it straight, if you haven't already, because plagiarism no longer goes under the radar. And arm yourself against abuse of your stylometric profile by malicious people. The consequences of losing control over your email, your text messages and your social media will only get worse. Protect these like you protect your own home.
Footnotes
1) https://www.youtube.com/watch?v=-EVXdhstw14&t=27s - This is a link to YouTube material published by Erica Brozovsky, who owns all copyrights. Use of this material must comply with all YouTube terms.
2) https://www2.fgw.vu.nl/werkbanken/dighum_nl/strategieen/tekstanalyse/stylometrie.php?form=MG0AV3