A Web-Forum Free of Disguised Profanity by Means of Sequence Alignment

Profanity is the use of offensive, obscene, or abusive vocables or expressions in public conversation. A major source of text conversations nowadays is digital media such as forums, blogs, and social networks, where malicious users take advantage of their ample worldwide coverage to disseminate profanity aimed at insulting or denigrating opinions, names, or trademarks. Lexicon-based exact comparison is the most common filter used to prevent such attacks in these media; however, ingenious users disguise profanity using transliteration or masking of the original vocable while still conveying its intended semantics (e.g. by writing piss as P!55 or p.i.s.s), hence defeating the filter. Recent approaches to this problem, inspired by the sequence alignment methods of comparative genomics in bioinformatics, have shown promise in unmasking such guises. Building upon those techniques, we have developed an experimental Web forum (ForumForte) where user comments are cleaned of disguised profanity. In this paper we briefly discuss the techniques and main engineering artefacts obtained during the development of the software. Empirical evidence reveals filtering effectiveness between 84% and 97% at the vocable level depending on the length of the profanity (more than four letters), and 86% at the sentence level, when tested on two sets of real user-generated comments written in Spanish and Portuguese. These results suggest the suitability of the software as a language-independent tool.


Introduction
An essential feature of Web 2.0 digital media is their ability to crowdsource user-generated content, motivating collaboration and mutual construction of scenarios so as to yield richer user experiences [1]. An illustrative example is digital forums and blogs, where multiple users generate written comments about their own or others' opinions. Unfortunately, some users abuse this freedom of speech for inappropriate purposes such as insulting, degrading, or boosting opinions, participants, brands, or any other concept by means of offensive or obscene language. For these reasons, this kind of digital service must usually be moderated by website administrators in order to guarantee profanity-free user-generated text content.
Lexicon-based filters, which screen text against a blacklist of forbidden terms, are a naïve moderation tool. A weakness of these filters is that they carry out exact comparisons, missing variants with involuntary typos or misspellings or, more worryingly, variants disguised with transliterated or masking symbols written deliberately to circumvent the filter; the resulting variants still visually convey the actual meaning of the profanity.
Take for example the vulgar slang term piss transliterated as P!55, or masked as p-i-s-s, or worse still, a combination of both, P-!-5-5. Any of these attacks would easily defeat a literal comparison filter, but the message would still be clear to most readers. It is evident that the number of guises of this type grows combinatorially; thus, the lexicon-based approach is impractical. The anomaly is illustrated in Figure 1.
Similar anomalies have been identified in many other digital platforms [2], [3] and, even more, have been characterized as a security threat [4]. Recent approaches that tackle this problem by drawing inspiration from bioinformatics techniques for sequence alignment of genomes from different organisms have shown promising results [5], [6]. Building upon those results, we have developed an experimental profanity-safe Web forum (ForumForte). Our software was conceived as a concrete application of our previous results on the problems of revealing masked terms in spam email [5], automatic evaluation of fill-in-the-blank questionnaires [7], and automatic syntax verification of short blocks of code in programming languages [8]. An in-depth technical description of the filtering mechanism will be reported in a forthcoming paper. In the following sections we describe the software, the technology behind it, and a proof-of-concept of its potential application for content moderation in mainstream applications such as newspaper forums and micro-blogging platforms. This is an extended version of a short paper recently published in the Proceedings of the 10th Colombian Computer Conference (10CCC), held in Bogotá in September 2015 (see [9]). We note in passing that ForumForte is distributed as free software under the New BSD License and is available online or for download at: http://tinyurl.com/ForumForte.

Similarity Trees of Disguised Profanity
As observed before, the lexicon-based (exact comparison) approach to profanity disguising is not practical. A similar comparison difficulty was faced by bioinformaticians some decades ago in the field of comparative genomics, where the goal was to find common genetic motifs between different families of species. The "text" in that case consists of the sequences of DNA and protein molecules from the genome and proteome of living organisms [10], [11], written in an alphabet of letters representing the initials of their molecules (in the genome case, {A, C, G, T} for (A)denine, (C)ytosine, (G)uanine, and (T)hymine). Different organisms have different genomes, but when the sequences are aligned, similarities between sub-regions (genes) are found, except for a few places that differ. The small variations are due to mutations that insert, delete, or substitute one molecule for another. A mutation may imply a change in the function of the phenotype that the gene codes for.
An example of sequence alignment is shown in Figure 2, where a gene whose phenotype is expressed during the synthesis of vitamin C is depicted for six species of mammals. The resemblance of the sequences is striking, although some differences, most surely due to mutations, are highlighted. As can be seen, the last exon of the gene is very similar among cows, dogs, and rats, whereas it differs by one deleted molecule across humans and great apes. It is known that the former group can make its own vitamin C whereas the latter group has to obtain it from the diet. The evolutionist assumption that explains these variations in the genome is that mutations occurred during millions of generations of descendants from a common ancient genome, yielding the diversification into different species. Such diversification can be depicted as a phylogenetic tree, where branches grow every time a preserved mutation happens. In the previous example, the tree depicting mutations in the last exon is shown in Figure 3 (mutations highlighted in red). The genetic code of the rat is placed at the root, meaning that it would be the common ancestor of the six species (variants). From there, two branches diverge: one branch goes through the dog (substitution, A → C) down to the cow (another substitution, at a different location, C → T); the other branch goes to an unknown middle ancestor (a deletion, C) from where two branches open up, one for the orangutan (substitution, G → A), and one for humans and chimpanzees (substitution, G → T). We adapted the idea of phylogenetic tree diversification to the profanity disguise anomaly described earlier. That is, we assume that the guises of a profanity vocable grow down a similarity tree from a common ancestor (its canonical text) to the variants obtained by recurring application of edits or corrections made on the predecessors (see Figure 4, edits highlighted in red). Instead of keeping all the possible trees, whose depth increases combinatorially with all possible
edits, the idea behind our profanity detection mechanism is to trace the disguised variant back to its common ancestor via classical sequence alignment algorithms [10], [11] or, alternatively, using approximate string matching algorithms [13], which were independently developed by computer scientists for the same purpose. The key concept of these algorithms is the edit distance between two texts [14], which is the number of character corrections (substitutions, deletions, or insertions) needed to transform one text into the other. A special-purpose distance was designed to correctly account for the edits that yield the profanity guises, as we describe next. The approximate sequence matching algorithm (see, e.g., [10]–[12]) carries out a pairwise comparison of the characters in the two sequences (the user-generated text and the canonical profanity vocable), while accumulating the number of edits (insertions, deletions, or substitutions) needed to transform one sequence into the other. The essential insight for detecting transliteration is to overlook the substitution of visually "twin" symbols (e.g. substituting 'o' by any of {0, °, ó, ò, ö, ô, Ø, θ, O}; see Figure 5), whereas for masking, the insight is to overlook the insertion of bogus segmentation characters such as {'.', '*', '~', '¦', '-', '_', ':', ';', ','}. These two kinds of edits should add no value to the distance (or difference) between the two sequences, whereas edits such as deletions, insertions, or any other substitutions should count. Hence, the design of the edit distance between two symbols in their respective sequences is outlined in Figure 6, where d indicates whether the edit counts or not in each of the cases mentioned above (this function was originally introduced in [5]).
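This modified edit distance can be sketched as a classical dynamic-programming alignment in which twin substitutions and bogus-separator insertions cost zero. The sketch below is our own minimal illustration rather than the authors' implementation: the twin sets and separator list are small illustrative assumptions (the full tables appear in Figures 5 and 6).

```python
# Illustrative (partial) tables of twin symbols and bogus separators.
TWINS = {
    'o': set('0°óòöôøθO'),
    'i': set('1!íìî|I'),
    's': set('5$zS'),
    'a': set('4@áàâãA'),
    'e': set('3éèêE'),
}
SEPARATORS = set('.*~¦-_:;, ')

def twin(a, b):
    """True if b is the same letter as canonical a, or a visual twin of it."""
    a, b = a.lower(), b.lower()
    return a == b or b in TWINS.get(a, set())

def disguise_distance(canonical, text):
    """Edit distance where twin substitutions and bogus-separator
    insertions cost 0; every other edit costs 1."""
    m, n = len(canonical), len(text)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                      # deleting a canonical char costs 1
    for j in range(1, n + 1):
        # inserting a separator is free; any other insertion costs 1
        d[0][j] = d[0][j - 1] + (0 if text[j - 1] in SEPARATORS else 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if twin(canonical[i - 1], text[j - 1]) else 1
            ins = 0 if text[j - 1] in SEPARATORS else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # (twin) substitution
                          d[i][j - 1] + ins,       # insertion into text
                          d[i - 1][j] + 1)         # deletion
    return d[m][n]
```

With this distance, P!55, p-i-s-s, and P-!-5-5 all align to piss at distance 0, whereas an unrelated word such as miss stays at distance 1 and would only match under a nonzero tolerance.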

Figure 5. An excerpt of the lists of twin substitutions
Source: authors' own elaboration

Software Design
We embarked on the development of ForumForte as a test-bed to verify the robustness of the filter mechanism described above, within an easy-to-use open forum where comments can be written about particular topics; no censorship is carried out, and no personal or usage information is collected. Simply put, ForumForte is a forum wall that screens comments against profanity; when profanity or disguised profanity is detected, the corresponding fragments are overwritten with a mask of asterisks, and both the original and filtered text sequences are posted to the wall. The software was designed as a Web application based on the MVC architectural pattern [15] and the Java EE platform [16]. In the following, we describe the most representative design artifacts obtained during its development.
Let us start by summarizing the use-case scenarios for the software (Figure 7). There are two kinds of forum users: visitors and the administrator. A visitor can inspect the fora pages as well as their contents, filter them by topic and, indeed, post a comment, in which case the profanity filter mechanism is activated and detection statistics are collected. Lastly, a visitor can download files for installation and upload a JSON file for batch processing of comments, as we explain later on.
The other type of user is the administrator. This user has a password-protected account which allows them to carry out basic maintenance tasks such as forum and subject creation, elimination, cleaning, password updates, etc. Their other tasks are related to the filter mechanism: updating the canonical profanity lexicon, inspecting the performance statistics per forum or profanity vocable, and fine-tuning the profanity tolerance parameters. These tolerances refer to the maximum edit distance for which two text sequences can be considered equivalent (a value τ ∈ {0, 1, 2, 3}). The structural model of the software was designed as an MVC-based class diagram organized in three packages, namely business, model, and controller (see Figure 10). The business package includes the classes Forum, Admin, Filter, and Bean; these classes implement the logic of the functionalities previously described. The model package encapsulates ForumBoard, Subject, Threshold, Transliteration, and Profanity; these classes are responsible for managing the visualization of forum comments and the user interface. Finally, the controller package consists of the classes Controller, Login, Log4jInit, and AdminControl; these classes control interaction with both kinds of users. Detailed views of this general model are also available. Next we briefly discuss the persistence model of the software, which was implemented as a relational database using the Java Persistence API framework that carries out the mapping from the structural model. The corresponding ER diagram is shown in Figure 11, consisting of the tables forum, subject, comment, filteredComment, profanity, and tolerance. The latter was included because the administrator may fine-tune the edit-distance tolerance per profanity term, and therefore statistics per tolerance are collected.
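The role of the tolerance parameter can be pictured with a toy screening routine. The sketch below is a hypothetical illustration of our own, using a plain Levenshtein distance as a stand-in for the specialized distance of Figure 6; the lexicon maps each canonical term to its own tolerance τ, and matched tokens are overwritten with a mask of asterisks as ForumForte does.

```python
def levenshtein(a, b):
    """Standard edit distance (single-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def screen(comment, lexicon):
    """Mask any token within its term's tolerance.
    `lexicon` maps canonical term -> tolerance τ."""
    out = []
    for token in comment.split():
        hit = any(levenshtein(token.lower(), term) <= tau
                  for term, tau in lexicon.items())
        out.append('*' * len(token) if hit else token)
    return ' '.join(out)
```

For instance, with the (English) lexicon entry "bastard" at τ = 1, the misspelled variant basterd is caught and masked, while unrelated tokens pass through untouched.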

Datasets
Two real-life datasets of user-generated comments were fed to the software in order to test its effectiveness. The first dataset is a collection of 300 user comments (in Spanish) from the publicly-available news forums of a Colombian newspaper, traced during a time frame of 25 months (01-Jan-2011 to 31-Jan-2013). Every message in this dataset contains profanity that was not blocked by the forum filter.
A wide assortment of comments was included in this dataset. Regarding word length, 58% (176) of the comments are shorter than 30 words, 25% (74) are medium-long, between 31 and 50 words, and the remaining 17% (50) range between 51 and 150 words. The majority of comments include only one use of profanity (76%, or 227 comments), followed by two uses of profanity (20%, or 61 comments); there are extreme cases such as a comment containing seven occurrences of swearing, and cases where the full comment is precisely one swearing term. The dataset contains 9293 words in total, 537 of which are swearing terms (5.8%).
Associated with this dataset, we also gathered a lexicon of 60 Spanish swearing terms of varying length, the mode being 4-6 letters long (60%), with the extremes being three 9-letter words and one 2-letter word. Excerpts of the datasets are illustrated in Table 1 and Table 2.

Source: authors' own elaboration
The second dataset consists of 2500 user-generated comments written in Portuguese, extracted from a sports news website. This dataset was previously made publicly available in [17]. There, human annotators inspected the dataset and identified 521 messages with some use of profanity, either disguised or not. However, as we shall highlight next, ForumForte was able to discover overlooked comments with occurrences of swearing. The blacklist lexicon for this dataset contains 40 base profanities in the Portuguese language. Given the considerable differences between written Portuguese and Spanish in aspects such as the use of special diacritics (e.g. "ã," "ç," "ê") and digraphs (e.g. "lh," "nh"), this dataset may provide interesting information about the potential of the software in communities speaking different languages.

Online Release and On-Site Installation
ForumForte is installed and available online at http://tinyurl.com/ForumForte (last visit: April 15th, 2016). The Web application works on any Web browser (Firefox, Chrome, Explorer, and Safari). Alternatively, interested users can download and install the software on their own servers (a user and installation guide is also available from the same URL).

Forum visitor usage
A visitor may browse the available fora by clicking the Foros option in the menu bar, which redirects to the page shown in Figure 13. Any choice here displays the list of subjects in each forum, previously defined by the administrator.
Then, by choosing one of the subjects, the visitor is taken to the actual forum page, where comments made by other visitors are shown (see Figure 14). We remark that interaction with the forum is anonymous and no private or network access information is collected by the system. The layout of an actual forum is very intuitive and easy to use. Basically, there is a text box at the top of the page for the visitor to write his or her comment. The visitor can choose either to clear the current content of the text box, or to post the comment, in which case it is screened by the filter engine.
Once the text is processed, the original and filtered texts are posted to the forum wall as the most recent entry (just below the text box).
Each entry consists of the filtered text sequence aligned over the original text. If one or more disguised profanities are detected, they are overwritten with a mask of asterisks on the top line of the entry. Comments are kept on the wall in chronological order, most recent first. Lastly, the visitor may choose to display all comments in the forum, only profanity-marked comments, or only profanity-free comments (see Figure 15).

Forum administration
The administrator main page shows the dashboard of Figure 16, where typical maintenance operations on forums, subjects, the profanity lexicon, and statistics reports are carried out. For this purpose, the user should choose the Administración option in the menu page and confirm his identity with a valid password. The software supports a unique administrator whose username and password can be updated at convenience using the Login choice. The remaining options redirect to Web pages where the administrator can create, eliminate, or clean the contents of the corresponding item. For the sake of illustration, a sample subject administration page is shown in Figure 17, where the clean-up and remove commands are visible. The administrator can navigate to the actual forum pages and visualize the comments, or participate in the discussion, by clicking on the forum name. Similarly, the profanity lexicon module allows creating, removing, or tuning the transliteration tolerance of non-admitted vocables in the forum pages. In addition, the statistics module reports detection rates discriminated by forum or by profanity vocable (see Figure 18). The latter are furthermore broken down into individual rates per tolerance parameter, providing valuable information for tuning purposes with respect to particular vocables and transliteration attack patterns. We remark that forum statistics are computed instantly from the forum's current comment contents, whereas profanity statistics are historical, beginning at the moment the term was created or assigned a different tolerance parameter. Statistics are lost when the associated item is removed. ForumForte features an interesting interface to apply its filtering mechanism to external sources of user-generated text. This feature takes advantage of the JSON format for content extraction and storage provided by the Twitter® social network through its publicly-available API. This digital media platform is essentially a worldwide community forum for free short
text messaging (comments no longer than 140 characters, known as tweets) with no moderation, and therefore a real-world practical scenario to test our development.

External input
In order to try this feature, the user should prepare an input file in JSON format. Such a file can be obtained by logging into the Twitter API console with a valid user account and then extracting tweets for a chosen user profile or trending topic. The file can then be processed in ForumForte by entering the specially-designed forum found in the path Foros→Filtrado de trinos. In contrast with the other fora, this one features two buttons, to load the JSON file and to submit it to the filtering engine (see Figure 19), which subsequently processes each tweet in the file and posts it to the forum wall as if it had originally been typed in by a visitor. We highlight that, by adhering to the Twitter JSON format, it is possible to filter user-generated text from other sources as well.
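The batch interface can be pictured as follows. This is a hypothetical sketch of our own, not ForumForte's Java code: it assumes the input file is a JSON array of objects, each carrying a "text" field as in Twitter's classic API payloads, and it takes the screening routine as a parameter so any filter can be plugged in.

```python
import json

def filter_tweets(path, screen):
    """Apply `screen` (e.g. the lexicon filter) to the text of every
    tweet in a JSON dump, returning original/filtered pairs."""
    with open(path, encoding='utf-8') as f:
        tweets = json.load(f)          # a list of {"text": ...} objects
    return [{'original': t['text'], 'filtered': screen(t['text'])}
            for t in tweets]
```

Each resulting pair can then be posted to the forum wall exactly as a visitor-typed comment would be, which is how the Filtrado de trinos forum behaves.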

Experiments with the Spanish Dataset
A first experiment aimed at testing the detection effectiveness of ForumForte was conducted on the Spanish dataset described above. Profanity ground truth was obtained by manually labelling the occurrences of disguised swearing from the profanity lexicon observed in each comment within the dataset. This experiment was carried out by processing the dataset using the JSON external input feature and the statistics module of ForumForte. The 300 comments were screened against each profanity item in the lexicon, while verifying correct detections and logging statistics. Notice that the goal of this experiment was to evaluate detection rates at the vocable level, that is, correct identification of instances of swearing terms, either plain or corrupted, in the whole dataset.
The resulting detection rates are reported in Table 5; they demonstrate the feasibility and promise of the method in real-world scenarios. For easier illustration, the results are summarized by grouping profanities according to their character length. In each group, the tests were repeated for the values τ ∈ {0, 1, 2, 3} of the tolerance parameter. Grayed values in the table denote detection counters closest to the ground truth (values above the ground truth indicate the occurrence of false positives). These results reveal that for profanity of short length (m ≤ 6) a tolerance τ = 0 is more effective. This may be explained because in short sequences, the insertion or deletion of even just a single character would result in a variant of the original word that is not immediately easy to interpret (e.g. coo→coño, pQuta→puta); therefore attackers prefer to use substitutions to obtain profanity guises in these groups. On the other hand, medium or larger size profanity (m ≥ 7) requires higher tolerances τ ∈ {1, 2} for better detection, since in these cases deletions and insertions are easier to interpret and thus attackers tend to use such edits rather than substitutions to disguise profanity. Lastly, a tolerance τ = 3 is clearly not recommended in any group since it yields extremely high false positive rates.
In brief, we remark that the effectiveness of the software depends on the length of the profanity. For example, for lengths m = 4, …, 9 the detection rates are: 91%, 85%, 97%, 88%, 96%, and 84%. On the other hand, the rates for shorter profanities (m = 2, 3) are less reliable (100% but with false positives, and 76%), which may be explained by the difficulty of corrupting or disguising the original term with such a limited number of letters without affecting its visual recognition (for example, the Spanish swearing term "hp" would become illegible with just one deletion or one non-twin substitution).

Experiments with the Portuguese Dataset
The second experiment was aimed at further testing the suitability of the detection mechanism of ForumForte for languages other than the one for which it was originally designed (Spanish). For this purpose, we conducted tests on the Portuguese dataset described above. Experimental settings were analogous to those of the Spanish experiment, except that in this case the goal was to evaluate detection effectiveness at the message level (because this dataset was originally characterized in that way [17]), that is, to correctly reject messages with any profanity content, either plain or corrupted. Hence, since the experiment is a binary classification task, we report the results as the confusion matrix shown in Table 6. In order to simplify the analysis we set τ = 0 for all the profanities in the lexicon. Let us first focus on the comments originally labeled as profanity. The software correctly rejected 85% (441 out of 521) of them and incorrectly accepted 15% (80 comments). A closer inspection of these 80 comments revealed that 20 of them contain variants of a two-letter-long Portuguese swearing term (cú); again, this kind of really short profanity is harder to detect because of the limited number of feasible edits that disguise it without losing its legibility, as also happened in the Spanish dataset experiment. Besides, without the accent, this particular sequence can be found as a syllable of many legitimate Portuguese words, so an exact match would achieve a more accurate detection. Notice that, if we set aside these 20 comments, the sensitivity of the software on this dataset would increase to 89%. Now, regarding the comments originally labeled as non-profanity, the experiment obtained a false positive rate of 4% (85 comments). In this respect, again a closer inspection was conducted, and we found that 40 of them were possibly mislabeled in the original dataset, that is, they actually may be instances of Portuguese profanity
(an excerpt of these findings is shown in Table 7). Therefore, we modified these labels so that the dataset now contains 561 profanity comments; subsequently we ran the experiments again and collected the results in the confusion matrix of Table 8. It can be seen that on the modified dataset the software improved its effectiveness, achieving a sensitivity of 86% and a false positive rate of 2%. This fact indicates that the software can also be used as a tool to corroborate annotations made by human moderators and even suggest overlooked cases for further inspection.

Conclusions
Profanity disguising in user-generated text exploits the robustness of the human mind in visually interpreting the semantics of a message overwritten with substitutions of twin symbols or insertions of bogus segmentations. Lexicon-based exact comparison filters, on the contrary, have limited ability to detect such variants. Here we have briefly described a technique inspired by algorithms for sequence alignment widely used in bioinformatics, and its implementation as a software prototype to prevent this anomaly in a Web forum. Our empirical study of this software indicates its potential applicability as a tool for content moderation in different communities, at both the vocable and sentence levels. Its detection (classification) mechanism is language independent and requires no training before use other than setting up a lexicon of plain forbidden terms.
There remain several issues that need further investigation in future work. One particular aspect is related to refining the mechanism so as to improve its effectiveness in detecting guises of very short profanities (sequences of three or fewer symbols), as in both of the tested datasets, in different languages, these cases proved hard to identify. Adapting the lexicon or tuning the tolerance parameter τ automatically by learning from the changing disguising patterns adopted by users is another interesting avenue of research (e.g., bagging filters with different tolerances or tuning the costs of different edit operations). Besides, deployment of the software in large-scale real-life content-generation environments, considering the associated algorithmic and computational issues of speed and concurrency, is also appealing.
It would also be worth exploring further scenarios for potential application of the tool. Obvious choices are social networks such as Twitter, Facebook, and Instagram. Offensive comments in these digital platforms may lead to more severe gender-related or psychological incidents of cyber-bullying, racism, or sexual harassment. Recent studies have shown that these are becoming prevalent and serious problems [2], [18], [19]. On the practical side of this line of work, we anticipate our tool may contribute as a content pre-processor, for example to reconstruct corrupted comments (corrupted because of intentional disguising or involuntary misspellings) that can then be used as input for other information extraction and machine learning techniques for content classification, such as those described in [20]. On the research side, such studies now routinely rely on crowdsourced labelling of large-scale datasets [18], [19], precisely because of the difficulty of single-handedly labelling such very high volumes of comments; our tool can help to validate, augment, or automate the labor of human annotators so as to minimize the risk of overlooking positive cases, as shown in our study.

Figure 1. Disguised profanity in a Colombian newspaper forum

Figure 2. Sequence alignment of genes coding for vitamin C synthesis in six species (human, chimpanzee, orangutan, cow, dog, rat)

Figure 3. Phylogenetic tree of the gene exon coding for vitamin C in Figure 2


Figure 4. Similarity tree of disguised variants of the offensive word BASTARD

Figure 6. Edit distance function used in the filter mechanism

Figure 8. Posting a comment activity diagram

Figure 15. Display options of forum comments

Figure 16. The administrator dashboard

Figure 17. A sample subject administration page

Figure 18. The performance statistics module

Figure 19. The specially designed forum for processing external JSON Twitter files

Table 2. Example of plain profanity from the Spanish lexicon

Table 4. Example of plain profanity from the Portuguese lexicon [17]. Source: authors' own presentation of data gathered from [17]

Table 5. Detection rates in the real-life dataset

Table 6. Confusion matrix for predictions on the Portuguese dataset

Table 7. An excerpt of profanity comments originally labeled as non-profanity

Table 8. Confusion matrix for predictions on the modified Portuguese dataset