
Supplemental: Online Morphosyntactic Innovations in Other Languages

After learning about Because-X, I was on the hunt to find more syntactic innovations that were unique to the internet. Specifically, I was looking for innovations in other languages. And I was able to find two articles for this week – one focusing on a Japanese structure, and one focusing on some innovations in Spanish. 

 

Kudasai & Making Requests

Kudasai is a Japanese politeness marker. We can consider it to be equivalent to English ‘please’. In standard Japanese it can follow verbs ending in -te (63). But people started to notice it being used in a new way online. In this nonstandard usage, kudasai follows an imperative verb (a verb form used to command or to order, like “Sit!” or “Pass me the salt” in English). As in English, imperative verbs are used in specific situations, and are rude to use outside of those situations. A parent can tell a child “sit down!” but it’s not socially acceptable for a child to tell an adult “sit down”. So we have a very interesting context here of combining an imperative verb with ‘please’.

This specific structure (called X-siro+kudasai, siro being the ending of an imperative verb) was being used to express indirect requests on various online forums (65). These requests could involve asking for someone to correct answers or give advice on buying a computer. A few examples were not indirect requests, but simply expressed the emotion of the speaker, such as the message “Please release a new game software rather than making the game into an anime” (66). Obviously, none of the forum users (as far as the poster was aware) were involved in developing video games, so this is more of a rhetorical request.

But why was this structure being used? After all, there are already two ways to express requests in standard Japanese (67). Naya suggests that these polite forms of making a request would be too formal for conversations on these online forums. There is a sense of camaraderie between the users, and using this formal form would feel out of place, and orient the speaker as more of an outsider (69). At the same time, the relationship between the speaker and the readers is ambiguous. The speaker doesn’t know which readers can fulfill their request, and going by general rules of conversation we should only make requests if we expect that the addressee can fulfill them (70). So using only the imperative form would feel rude, since it both flouts this social rule and places the speaker in a position of authority where they are allowed to use such forms. X-siro+kudasai seems to be a compromise between the informality of the situation and the desire to still maintain some politeness (69).

Within the x-siro+kudasai structure, Naya argues the imperative functions as a private expression which “expresses the mental state of the writer”, while the addition of kudasai makes the expression public (73). This aligns with the discussion from Supplemental: Because-X, where we explained that in Japanese expressions are private by default and require some sort of marking to become public expressions. However, kudasai is working slightly differently in this setting than addressee-oriented expressions normally do (73). Rather than marking the message as being intended for an audience which can fulfill the request, kudasai is marking that the message is being oriented toward readers (73). The sentence as a whole is meant to serve as an expression of desire, with the request being an implied secondary meaning (73-75). This lines up with how the imperative functions in other private expressions in Japanese – as a desire or a wish, rather than a command (74). Thus, it seems like the x-siro+kudasai structure is indeed another example of a private expression within a larger public expression. 

 

Spanish Innovations

The piece by De Benito Moreno covered three different types of morphosyntactic innovations (innovations in word construction and sentence construction) within Spanish-speaking online contexts.

The first is the extension of suffixes, specifically -í. In Standard Spanish, -í is a diminutive and affectionate suffix that can be added to proper names, common nouns, and adjectives (15). Examples include Marí (from María), papí (from papá, father) and rubí (from rubio, blond) (15). But some Twitter users are extending the -í suffix to other types of words, like greetings (‘holí’ from ‘hola’, hello) and verbs (‘te quieri muchi’, rather than the standard ‘te quiero mucho’, I love you) (17, 19). In standard usage, the -í suffix is used to connote a sense of affection for the referent (‘Marí’ shows affection for María and so on) but ‘quieri’ isn’t showing affection toward the actual action of loving. Rather, it is giving the entire message an affectionate tone or showing affection for the object of that verb, ‘te’ (you) (19).

The second innovation is the re-categorization of fuerte, ‘strong’, as an adverb. In standard Spanish, adverbial fuerte can only be used with a few categories of verbs, namely verbs describing speech and verbs describing movement or contact (“hablar fuerte”, speak loudly; “apretar fuerte”, press hard) (20). Twitter users are now extending fuerte to verbs of all types, including those describing abstract concepts like pensar, to think, and saber, to know (20-21).

The third innovation, which is perhaps the most interesting, regards how users have been using ojalá. Ojalá is a fixed Spanish expression, roughly meaning ‘I wish’ or ‘I hope’. It does not conjugate for person or tense, and can be used on its own or it can take a subordinate clause with a subjunctive verb (21-22). For example, you could say “Ojalá que lo del Madrid sea un mal sueño”, or ‘I hope that what’s happened with Real Madrid was a bad dream’ (22). But you could not say ‘Ojalá un libro nuevo’, or ‘I wish a new book’. Well, at least you could not say that previously. Twitter users have been using ojalá in more innovative ways (22-23). Take the examples “Ojalá unas terceras elecciones”, ‘I wish [for] third elections’, or “Ojalá estar viajando constantemente”, ‘I wish I were constantly traveling’ (23). In this way, ojalá is acting similarly to because-x in that we have a structure which is now being used to connect clauses in a non-standard way.

What I find especially interesting is how we can also explain some ojalá structures and their meanings by viewing them as public or private utterances. In some of these innovative ojalá uses, a subject can be omitted, as in the tweet “Ojalá encerrada con Aston Kutcher en un ascensor”, “I wish [I were] locked with Ashton Kutcher in an elevator” (27). Even though there are no verbs or other markers in the sentence indicating a first person reading, and the subject is never explicitly noted, Twitter users interpret this message as expressing the desires of the speaker (28). This is very similar to the argument that because-x functions as a private utterance expressing the perspective of the speaker. However, unlike because-x or kudasai, ojalá does not seem to be creating a private expression within a larger public expression here. I am also unsure whether the use of ojalá here is marking the message as a private expression, or whether the private nature of these tweets with dropped subjects is simply coincidental.

De Benito Moreno suggests that users continue to use these innovations because they create a feeling of familiarity between users (32). Affectionate suffixes aren’t often used in formal writing, after all. And if we do think that ojalá involves a private expression, similar to that of because-x, we can also say that ojalá serves a similar function of creating a sense of intimacy between participants, because the reader must adopt the speaker’s perspective in order to understand the meaning of the sentence. But this may not be true of all cases of ojalá, so future analysis may be needed.

However, the argument that these innovations serve to create familiarity does not fully explain the extended usage of fuerte, which to my knowledge does not have any specific connotations indicating informality or friendliness. We could perhaps argue that abstract verbs can be emphasized in face to face conversations with tone, body language, and so on that aren’t available in digital dialogues. So the extension of fuerte to abstract verbs was a way to extend an already present strategy of emphasis. But this still doesn’t clearly link it to creation of familiarity. Again, future analysis is needed to determine whether fuerte is part of strategies of creating intimacy online, or if it is serving some other function. 

 

Conclusion

If these two pieces are any proof, we can see that new syntactic structures are arising in online contexts. Between kudasai, ojalá, and because-x, we can also see that these structures cover a variety of meanings and purposes, from connecting cause and effect to making requests to expressing wishes. But what all three have in common is that they are being used in interpersonal communication and they play with the intimacy being created or violated between the author and their audience. This further supports Kanetani’s suggestion that because-x and similar structures are arising due to a particular need to bridge the geographic and emotional distance involved with digital communication. 

 

References

De Benito Moreno, Carlota. “‘The Spanish of the Internet’: Is That a Thing?: Discursive and Morphosyntactic Innovations in Computer Mediated Communication.” English and Spanish: World Languages in Interaction, edited by Danae Perez et al., Cambridge University Press, Cambridge, 2021, pp. 258–286.

Naya, Ryohei. “An Innovative Use of Kudasai in Social Networking Services.” Annals of “Dimitrie Cantemir” Christian University: Linguistics, Literature and Methodology of Teaching, vol. 17, no. 1, 2017, pp. 62–78.

 


Week 9: Non-English Internet Language

Up until now we’ve looked at a lot of English-related content. But this week we’re looking specifically at what ‘internet language’ looks like for speakers of other languages.

 

Malay Netspeak

Izazi’s article on Malay Twitter users describes several different features found in a sample of Malay tweets (17). To find these tweets the researchers used makan, ‘to eat’, as a keyword. Makan is commonly used in Malay informal greetings, so they thought this word could help them find a variety of tweets specifically coming from a Malay context (21). Interestingly, in some of the tweets makan was the only Malay word the users wrote (30). Since users made the intentional choice to include a Malay word, makan functioned as a marker of Malay identity (31). English appeared frequently in the sample (22). Some slang found in the sample is derived or borrowed from English, including words like ‘yup’, ‘baby’, and ‘cool’ (20). Similarly, ‘omg’, ‘lol’, and ‘idk’ were the three most common abbreviations found in the sampled corpus (23). We could argue, considering the frequency of English, that the choice to use Malay words may be driven by motivations similar to those of the users from week 7, who chose to include specific features from their dialect in their tweets in order to invoke a specific regional or cultural identity.

There are other ‘netspeak’ features which are not directly based on English words. One feature found in the dataset was shortened words, which often dropped vowels (22). Yang (a preposition) became yg, orang (‘man’) became org, and so on. Other words were shortened by dropping their first letter (rumah, ‘house’, becoming umah) or their first syllable (macam, ‘like, such as’, becoming cam) (22).

Another feature related to spellings is onomatopoeic spellings. Examples from the sampled tweets include ‘haha’, representing laughter and ‘huhu’ to represent crying (23). These spellings allow users to bring sounds present in spoken speech into the written medium. As in English usage, laughter may be used to express genuine amusement or to express a more sarcastic or ironic attitude (24). 

Phonetic replacements of words are also fairly common (24). Interestingly, Malay users also incorporate English pronunciations of numbers into these phonetic constructions, like in the example 21ku, which represents the word tuanku (24). Users must use the English pronunciations of ‘two’ and ‘one’ in order to decipher the construction. 

Users also, at least in one instance, use a non-phonetic symbol to represent a word – that of x to mean tidak, or ‘no’ (24). X, in this case, does not represent any of the sounds in the word, but could be interpreted as a visual representation of the word’s meaning. X can also be combined with other words to create phrases. Xkisah, for example, is equivalent to tidak kisah, ‘I don’t mind’ (24). 
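To make the shortening and symbol patterns above concrete, here is a minimal sketch in Python of how a reader (or a script) might expand them back into full Malay words. It is only an illustration built from the handful of examples quoted from the paper; the function names and the tiny lookup table are my own assumptions, not anything Izazi proposes, and real tweets would need a much larger word list.

```python
# Short forms and their full spellings, taken only from the examples above.
# Assumption: a real tool would need a far larger, corpus-derived table.
SHORT_FORMS = {
    "yg": "yang",
    "org": "orang",
    "umah": "rumah",
    "cam": "macam",
}


def expand_token(token: str) -> str:
    """Expand one token; the rules are illustrative, not exhaustive."""
    lower = token.lower()
    if lower == "x":
        # A bare x stands in for tidak.
        return "tidak"
    if lower in SHORT_FORMS:
        return SHORT_FORMS[lower]
    if lower.startswith("x") and len(lower) > 1:
        # A leading x is read as tidak plus the rest: xkisah -> tidak kisah.
        # Assumption: this would misfire on ordinary words that begin with x.
        return "tidak " + lower[1:]
    return token


def expand(text: str) -> str:
    """Expand every whitespace-separated token in a message."""
    return " ".join(expand_token(t) for t in text.split())
```

Calling `expand("xkisah")` returns `tidak kisah`. Notice that x needs its own rule: unlike yg or org, it can't be recovered by putting letters back, because it doesn't stand for any of the sounds in tidak.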

One feature that English and Malay users of Twitter seem to share is that of the keysmash: a nonsensical string of letters that carries no informational meaning, but does convey a sense of emotion (25). The idea here, of course, is that the user is so overcome with emotion that they aren’t able to focus enough to type actual words to describe how they’re feeling. Another feature shared across both these communities of users is that of letter repetition (25). Users can repeat letters within the word – in the paper’s examples, the final letters of words – to emphasize the message and the emotion behind the message (25, 27). Finally, capitalization and superfluous punctuation (!!!) are used similarly across the two groups to add emphasis to words or phrases (28).

Malay users also play with creative spellings of words. English words might be subjected to Malay grammatical conventions (‘I’ becomes iolls, ‘we’ becomes weolls) (25). Users might change the spelling of words slightly to allow for wordplay (26). Since Malay spelling is phonetic, users can also misspell words in order to intentionally invoke a humorous sounding pronunciation (26). Changes in spelling also occur with foreign loanwords or names (McDonald’s, Starbucks), which users can choose to spell phonetically according to local pronunciation (Mekdonel, Setabak) (27). In this case, the spelling change seems to be used less for humorous effect and more to localize the word. 

Finally, Izazi touches on the use of emoji in the dataset (29). However, this analysis is fairly limited to one function of emoji: that of illustrative emoji that support and reinforce the literal meaning of the text and the emotion behind it. For example, one of the tweets for this section has text that says the user can’t stop eating cookies, paired with the cookie emoji. Further research or analysis would need to be done to see if Malay users use emoji for other functions, like to clarify intent or to collaboratively make meaning with the literal text.

As we can see, Malay users take advantage of a variety of features in their tweets. Some allow for dropped letters, enabling brevity, while others add characters to a message (30). Some borrow from English vocabulary, while others focus on native Malay words. Some of these features allow users to invoke spoken speech, while others are not connected to spoken speech (30). As with English features, these features allow Malay users to write informally and indicate emotion or enthusiasm.  

 

Japanese Honorifics

Japanese honorifics are a grammatical feature that doesn’t really have an English equivalent. They allow speakers to clarify their relationship to the addressee or a third party (2). But this gets a bit tricky when users are communicating online. Users may not have information on who others are or what level of respect they are owed (Liu 2). Online communication also tends towards the informal anyway. What results are some interesting and creative uses of honorifics that refer to third parties.

Japanese honorifics used for third parties (‘referents’) can be roughly categorized into two types: respect forms and humble forms (2). Respect forms allow users to express their deference by elevating the prestige of the referent. Humble forms allow users to downgrade themselves, thus showing respect to the referent (2). These honorifics are expressed through different verb forms and affixes that can be attached to nouns, adjectives, or adverbs (3).

So if these honorifics are used to indicate politeness, how do we indicate impoliteness? There are two ways impoliteness can occur in online communication: when a speaker intentionally communicates impoliteness, and when the audience perceives or constructs the speaker’s behavior as impolite (6). So a speaker could indicate impoliteness through the insincere use of honorifics (5). Using extremely polite honorifics or using honorifics where they aren’t required may read as sarcastic or ironic (5). Although previous studies looking at the insincere use of honorifics focused on spoken speech, there is no reason why this cannot also be used in online contexts. 

In online contexts, impoliteness can be constructed in several ways. It could be directly marked on a word (or be implied by not marking polite forms) (6). It could also be marked through mismatches between the content of a message and its grammatical forms (i.e., speaking negatively about a person but referring to them with polite forms) (6).

So, having considered all of this, how are Japanese speakers using referent honorifics online? Liu collected and analyzed 13,855 comments from four Yahoo Japan News articles, all published in the first week of July 2018 (7-8). Referent honorifics were actually fairly rare, appearing in 2.5% of all comments (for the individual articles, referent honorifics appeared in 1-5% of the comments) (8). The article with the highest percentage (5%) of comments with referent honorifics was about then-Prime Minister Abe of Japan (8). The social expectation of referring to Abe with honorifics was carried over to this online context (8). The other articles were not centered around high-ranking politicians or other celebrity figures, which may account for the low frequency of referent honorifics in the comments on those articles (7).

Most of the referent honorifics were respect forms, rather than humble forms (9). Respect forms also had higher rates of non-normative usage (9). In general humble forms are rarer than respect forms in face-to-face communication, which may explain the gap in use between respect and humble forms (9). However, we may also be able to attribute this fact to the idea that commenters tend to be anonymous. As such, a reader has no way of knowing what the commenter’s relationship to a referent actually looks like in terms of social standing. They have no reference point against which to compare the humble form of an honorific. On the other hand, when speaking about a referent the audience has an idea of what that referent’s social standing is, either because they are a public figure (like Abe) or because the article provides readers with information about the referents (such as their occupation, their behavior, or their character). A reader then has a reference point to compare a respect form honorific to.

Honorifics had several different functions within the data. They could be used to show respect or admiration, as in comments praising students at the University of Tokyo, the country’s most prestigious school (10). Sometimes they were used to conform to professional standards, as when a commenter who worked at a senior boarding house referred to clients (10). Commenters also used honorifics when they were expressing agreement with other users in the comment section, although these honorifics were generally limited to one or two verbs and were not used throughout the message (10). Other usages were less sincere. As mentioned above, users can mismatch honorifics with message content to create a sense of sarcasm. One user criticized the work of journalists, and referred to commenters who supported the work of journalists as having ‘valuable opinions’, with this phrase being preceded by a polite honorific (10-11). Mock politeness is created by the contrast between the user’s negative opinion of these journalists and their supporters and the respectful form attached to ‘valuable opinions’. Mismatches can also occur when referents are referred to both with insults and with overly respectful honorifics (12-13). In these non-standard uses “there are always co-occurring linguistic features in the text signposting the poster’s negative attitude and hence their intended message”, thus preventing other readers from misinterpreting their use of honorifics as sincere (13).

In short, the move to online contexts has not displaced referent honorifics entirely. However, they are used at much lower rates than in spoken language. Honorific usage is driven both by politeness and by genuine respect. Those commenters who criticize other commenters for not using referent honorifics when discussing respected public figures see honorific usage as a polite thing to do. But the data also shows that sincere honorific usage generally only appears when the commenter has a favorable view of the referent, suggesting that honorific usage is also dependent on the user having genuine respect for the referent (14). The findings also show that honorifics can be used to construct impoliteness and sarcasm by mismatching these polite forms with impolite adjectives, descriptions, or criticisms (14).

 

Russian Netspeak

Expressive features used by Russian users on Instagram, Twitter, and LiveJournal have many things in common with English and Malay features, at least if we can take the 145 respondents in Olga Novikova’s 2021 study as representative of the general Russian-speaking internet user population (69). 

Users use graphical features like bolding or capitalization to emphasize words (71). Unlike users in previously referenced studies, however, the surveyed users here also placed underscores around words as a way to emphasize them. Russian users also used strikethrough text in interesting and innovative ways. The studied users used strikethrough text to express their opinions while also acknowledging said opinions might be viewed negatively by society in general or other users (71). This strikethrough text would then be followed by a milder form of the opinion. For example, one user asked (translated) “did the [struck through: police] readers recognize you…?”; here, the user expresses self-awareness of how their opinion (‘police’) might be reframed by outsiders (‘readers’) (71).

As with some other linguistic communities, the surveyed users used letter repetition to evoke sounds of emotion (‘Mmmm’, ‘Aaaaah’) or to emphasize words (Даааа, ‘yessss’) (71). Users may violate spelling or punctuation rules, although it is unclear whether there are any patterns to these violations and whether they are meant to elicit any specific effect on the reader (72, 77). Ellipses, as in some English examples, can be used to indicate pauses in the text (79). In the provided example, the ellipses follow a question the author rhetorically asks of the readers; Novikova suggests that these pauses can be used to create a sense of back-and-forth with the reader, giving them a chance to stop and consider the author’s words.
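As a rough illustration of how the letter repetition described above could be spotted in a corpus, here is a small Python sketch. This is my own example, not a method from Novikova’s paper (or from the Malay study, where the same feature appears); the three-in-a-row threshold and the function names are assumptions chosen so that ordinary double letters are left alone.

```python
import re

# Three or more of the same character in a row.
# Assumption: the threshold of three is arbitrary; matching is case-sensitive.
_REPEAT = re.compile(r"(.)\1{2,}")


def find_stretched(text: str) -> list[str]:
    """Return the whitespace-separated words that contain a stretched letter."""
    return [word for word in text.split() if _REPEAT.search(word)]


def collapse(word: str) -> str:
    """Reduce each stretched run to a single character."""
    return _REPEAT.sub(r"\1", word)
```

With this sketch, `collapse("Даааа")` gives `Да` and `collapse("yessss")` gives `yes`. The same pattern works on Cyrillic and Latin text alike, which fits the observation that this feature shows up across linguistic communities.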

English does make an appearance in the data. Some English words are directly borrowed, Latin script and all (76). Others are borrowed from English or other languages like German, Japanese, or Korean, with their spellings adjusted for the Cyrillic script and Russian pronunciation (76). Other new words are created from native Russian words by merging words together, although whether these new words are limited to online contexts or not is unclear (77).

 

Final Thoughts

Although these three papers do not represent all linguistic communities, or even all speakers of a certain language, we can see some common threads throughout all three. Malay and Russian both make use of English loanwords as well as features shared with English, such as letter repetition or intentional use of nonstandard spellings. Now, whether these features arose through contact with English-speaking users or whether they developed on their own is unclear. But these features must have some sort of crosslinguistic usefulness if speakers of other languages are continuing to use them, even in contexts which are not necessarily aimed toward English-speaking audiences or native English speakers. 

The Japanese paper focuses on honorifics, which do not have an equivalent in English or in the features described in the Malay and Russian papers. But it does show us that linguistic communities besides English-speaking ones (or, at least, Japanese-speaking users) are using and adjusting native linguistic elements to communicate their ideas online. That is, using language online which does not directly reflect formal writing standards is not something that is unique to English-based contexts.

Finally, these three papers show that investigating other linguistic communities and seeing how they use language in online contexts is a viable exercise and, more than that, a necessary exercise if we wish to see which elements of ‘netspeak’ are useful crosslinguistically, which elements are unique to English, which elements are unique to other languages, and which elements may have influenced or been imported to other linguistic online communities. 

 

References

Izazi, Zulkifli Zulfati, and Tengku Mahadi Tengku-Sepora. “Slangs on Social Media: Variations among Malay Language Users on Twitter.” Pertanika Journal of Social Sciences & Humanities, vol. 28, no. 1, 2020.

Liu, Xiangdong. “Japanese Referent Honorifics in Computer-Mediated Communication.” Language@Internet, vol. 19, 2021. https://www.languageatinternet.org/articles/2021/liu.

Novikova, Olga, et al. “Linguistic Analysis of Insta, Twit Posts and LJ Blogs in the Context of Their Functions (Based on the Russian Language).” International Journal of Interactive Mobile Technologies, vol. 15, no. 5, May 2021, pp. 66–86.


Week 7: The Use of Dialects Online

This week’s readings involve dialect and how users translate dialect to online written contexts. If we assume that the informal language people use online is influenced by their informal spoken speech, then it would make sense to see dialectal differences being codified through non-standard spellings or other orthographic or grammatical changes.

 

Sakha

Sakha, also known as Yakut, is a language spoken by about 450,000 people in Yakutia/the Republic of Sakha, a region in northeastern Russia. One researcher, Jenanne Ferguson, looked at how Sakha was being used online and found that some users were carrying over dialectal differences into their informal writing (131). She specifically looked at the use of word-initial ‘h’. In some dialects of Sakha, words which begin with an ‘s’ will instead be pronounced as if they begin with an ‘h’ if they are following a word that ends with a vowel. 
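The spoken pattern described above can be written as a simple rule. Below is a minimal Python sketch of it, working on a simplified Latin transliteration; the vowel set, the function name, and the example tokens are all my own assumptions for illustration (the tokens are made-up placeholders, not real Sakha words), and Ferguson’s chapter does not present the rule this way.

```python
# Assumption: a simplified Latin-letter vowel set, not Sakha's actual
# (Cyrillic-based) orthography.
VOWELS = set("aeiouyöü")


def apply_s_to_h(words: list[str]) -> list[str]:
    """Turn word-initial s into h when the previous word ends in a vowel."""
    result = []
    for i, word in enumerate(words):
        previous = words[i - 1] if i > 0 else ""
        if word.startswith("s") and previous and previous[-1] in VOWELS:
            result.append("h" + word[1:])
        else:
            result.append(word)
    return result
```

For the placeholder tokens `["ka", "su"]` the function returns `["ka", "hu"]`, while `["kat", "su"]` comes back unchanged because the first token ends in a consonant. A rule like this only captures the pronunciation pattern; it says nothing about who chooses to write it, which is where the interesting part starts.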

For users who write this dialectal difference into their online writing, the feature functions as a marker of local identity (134-135). Even the choice to use Sakha itself reflects this, since many Sakha speakers are bilingual in Russian (134). Since online identity is mainly created through language, the use of a specific language or a change in spelling marks the user as from a specific geographic location and similarly allows them to find other individuals who speak their dialect (132, 138, 140). 

At the same time, this close link between language use and identity opens up users to discourse surrounding cultural preservation through ‘correct’ language (132). Not all dialects of Sakha include this ‘s’ to ‘h’ change, and its use in writing is not sanctioned by any official Sakha language groups (140). Those who don’t have this dialectal difference view users with this spelling difference as seeking their own unique identity, which is only heightened by the contentious relationships between the different regions of Sakha speakers (139). As a result, users from different dialects argue over whether this dialectal difference should be used in writing. 

These arguments are also further complicated by the larger history of Yakutia. Sakha speakers have long faced pressures to assimilate into a larger Russian culture, so modern-day speakers want to push back against this assimilation (143). However, they are split on the best route forward. For ‘h’ users, writing their dialectal difference further separates them from Russia, since Russian phonology does not use the Sakha ‘h’ (141). In fact, these users will even overuse ‘h’ in words where it wouldn’t occur in spoken speech, so their choice to include it in writing is definitely not solely about imitating their own spoken speech (141). Non-‘h’ users, meanwhile, fear that nonstandard spellings will further separate Sakha speakers into distinct regional groups, thus splintering a unified Sakha identity and making it more difficult for Sakha speakers to stand up against assimilationist attitudes and actions (143-144).

As we can see here, the choice to include a specific dialectal feature is not simply driven by the desire to write down oral speech, but is also used to convey specific meaning outside of the content of any given message. For Sakha ‘h’ users, it’s a choice that conveys their specific geographic region and emphasizes their Sakha identity. In some cases this can help to create intimacy between users of the same dialect (147). At other times it leads to arguments with other speakers. Speakers must decide for themselves whether the potential intimacy with other speakers is worth the fallout they might encounter.

 

Dialects in Northern England

Do users in other places who speak other languages also choose to use dialect in their writing? Andrea Nini would say yes, at least in regards to dialects in northern England on Twitter. Nini was able to collect tweets with geographic data attached to them, allowing researchers to find out what elements of northern dialects were appearing in tweets and at what frequency.

Researchers had to consider two criteria as they chose what features to focus on within the data. The first was that the features must be “socially salient enough to be used orthographically as an index of local dialects”; in other words, they must be recognized by the speakers as being a feature of the dialect (271). The second was that the chosen features “must be plausibly encoded in orthographic representations”; the features needed to translate from spoken language to the written word (271). So researchers would not be able to study features like the dark /l/, since it can’t easily be represented in writing.

For this study, researchers chose eleven features that could be represented in writing, which included variations in both consonants and vowels (271). For example, two features of Northern dialects are TH-stopping and TH-fronting, where the dental fricative represented by ‘th’ is instead realized by another consonant. Written examples would include “tink” instead of “think” or “wiv” instead of “with” (273). They were then able to match tweets with these features to the geographic location of the user.
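To picture what matching spellings to locations might involve, here is a small Python sketch that counts how often a dialect spelling appears relative to its standard spelling in each region. This is only my own toy version under loud assumptions: the variant table contains just the two spellings quoted above, the region labels are hypothetical, and the study’s actual pipeline and statistics are not reproduced here.

```python
import re
from collections import defaultdict

# Dialect spelling -> standard spelling, from the two examples above only.
VARIANTS = {"tink": "think", "wiv": "with"}
STANDARD = set(VARIANTS.values())


def variant_rates(tweets: list[tuple[str, str]]) -> dict[str, float]:
    """Map each region to the share of tracked words spelled the dialect way.

    `tweets` is a list of (region, text) pairs; the region label is a
    hypothetical stand-in for the geographic data attached to a tweet.
    """
    counts = defaultdict(lambda: {"variant": 0, "standard": 0})
    for region, text in tweets:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in VARIANTS:
                counts[region]["variant"] += 1
            elif token in STANDARD:
                counts[region]["standard"] += 1
    # A region only appears in counts once a tracked word has been seen,
    # so the denominator is never zero.
    return {
        region: c["variant"] / (c["variant"] + c["standard"])
        for region, c in counts.items()
    }
```

Comparing a variant against its standard counterpart, rather than counting raw occurrences, keeps regions with more tweets from looking more dialectal simply because they have more data.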

Nini and colleagues found that although these non-standard spellings of dialectal difference were fairly infrequent, “for most of [the features] clear geographical patterns can be detected and this suggests that the geographical signal contained in these frequencies is also relatively strong” (276). So these non-standard spellings did seem, in general, to be reflective of phonetic differences in dialect (288). Because these variants are overall infrequent, this suggests that users who do use these variants are choosing to use them intentionally. So although these variants may sometimes result in shortened words, brevity is not the driving factor here. Rather, these users are trying to “convey a particular identity or stance” through their linguistic choices (286-287). These features also do not usually appear in isolation, which Nini suggests indicates interplay between these various features (290). In some cases, users may be using these dialectal features “as part of a wider linguistic style tailored to a user’s own dialectal identity” (290). In other cases, these variants may be used to imitate certain accents and thus invoke certain regional identities (278, 280).

Due to Twitter’s setup, there is no way for researchers to determine whether the features someone uses in online discourse are the same features they use in spoken language (290). However, researchers may be able to determine the salience of a feature – that is, the extent to which that feature is noticed and linked to a regional identity – by examining how frequently it is being used and where the users employing it are based (289). Researchers may even be able to make solid guesses about which features are seen by outsiders as being emblematic of a particular dialect, versus which features speakers of that dialect recognize as being emblematic of their own dialect (290). 

 

Conclusion

As shown in these two examples, linguistic choices in online communication can allow users to assert regional identities. This allows users to express pride in their identity, and it also shapes interpersonal interactions by allowing users to find common ground with others who speak their dialect. I can’t make generalizations about how common a phenomenon this is for other dialects or other minority languages. However, users of Sakha and Northern English dialects are unlikely to interact frequently, due to the geographic distance between the two regions and the fact that Sakha speakers are more likely to use Russian than English as a lingua franca. So I think it is unlikely that Sakha users were influenced by Northern English users to use their dialect online, or vice versa. If these separate communities are independently choosing to use dialect online to assert their identity, I think it’s likely that this is occurring in other communities as well. 

 

References

Ferguson, Jenanne. “Don’t Write It With ‘h’! Standardization, Responsibility and Territorialization When Writing Sakha Online.” Responsibility and Language Practices in Place, edited by Laura Siragusa and Jenanne K. Ferguson, vol. 5, Finnish Literature Society, 2020, pp. 131–52, https://doi.org/10.2307/j.ctv199tdgh.10. Accessed 25 Nov. 2022.

Nini, Andrea, et al. “The Graphical Representation of Phonological Dialect Features of the North of England on Social Media.” Dialect Writing and the North of England, edited by Patrick Honeybone and Warren Maguire, Edinburgh University Press, 2020, pp. 266–96, http://www.jstor.org/stable/10.3366/j.ctv182jrdf.16. Accessed 25 Nov. 2022.

Week 3: Ethics, Limitations, and Challenges

This week’s theme is Ethics, Limitations, and Challenges. The sources I looked at focused more on how to collect and analyze online language samples than on the content of those samples. What ethical questions do we need to consider when collecting linguistic data? What challenges are involved with data collection? 

 

Ethics

Researching online language, like all research, requires us to consider how to make ethical research choices. One of the largest issues might be the blurred line between public and private online content. Although most people are aware that anything posted on the internet is no longer private, most users also expect a degree of privacy when it comes to their emails and texts. Furthermore, even if a person posts to a public social media account or website, they may not intend for their words to reach a large audience. But researchers might also prefer this sort of data point, since the users are not being influenced (consciously or not) by the idea that someone might be analyzing their writing (Hou 36). Informed consent and permission to use material are thus relevant concerns for research on digital language (Lewis 14). Issues of privacy also arise. Although it is unlikely that any single quoted tweet could be traced back to an English-speaking Twitter user, what if researchers are analyzing a smaller linguistic community? 

Depending on the research question, it might be easiest for researchers to collect data from users who are clearly public figures or organizations (Hou 41). But if research is focusing specifically on informal language, slang, or other linguistic phenomena that are less likely to appear in a corporate or professional context, the usefulness of data from public figures might be low. 

 

Challenges

Besides ethical considerations, there are multiple other challenges that may impact data collection and analysis. First of all, there is an extensive amount of data available to researchers (Crystal 10). Of course this can be a good thing, but filtering data to create a usable sample may pose some difficulties depending on the specific questions researchers are asking. The sheer amount of data may also lead researchers to focus on the sites/platforms where data collection is easiest (such as Twitter) and overlook sites/platforms that are more difficult to research. If you can get all your data from Twitter, why would you choose to look elsewhere? (This is of course a generalization, but researchers only have so much time, manpower, and funding.)

The second challenge is that some of this data may quickly lose its relevance due to the rate of change online. Site updates may impact the type of data being produced (like Twitter raising the character limit for tweets) and site demographics may shift in just a few years’ time (Facebook at its inception was primarily used by college-aged young adults, but now is being used more by adults in their 30s and older). Studies done on traditionally published works or offline communities also face this problem, but in general traditional publishing is more conservative and less inclined to abrupt, rapid change. 

Anonymity can also pose issues when trying to determine demographic data for any given sample. Factors like age, geographic location, and gender may not be readily shared by a user. So any analysis that wants to look at differences within or between specific demographics will be missing some data (14). In some contexts anonymity might pose less of a problem; for example, if a researcher is looking at vlogs of individuals using ASL, then this demographic information might be more obvious. But this type of data collection brings us back to the question of ethics and informed consent – the more information that researchers have about any one person in the data set, the higher the possibility that that person’s privacy might be compromised (Hou 40).  

Another challenge arises simply from the formatting of different sites. For example, how should retweets be treated if a researcher is analyzing language use on Twitter (Crystal 40)? Duplicated text in the data set could skew results, but removing retweets entirely could ignore how users are interacting with each other or how they are reacting to or interpreting certain linguistic structures. Similarly, users may compose tweets which contain incomplete utterances or whose meaning is difficult to figure out without additional context (41, 45). Removing these tweets might be necessary in some cases, but it does mean that researchers will be losing some data. The difficulty of 

The fact that posts can be edited or deleted might also cause issues. When citing data from someone else, including the data’s provenance (its line of history from you back to its original creator) allows someone to check for errors and, if there are errors, to pinpoint where and possibly how they occurred (Lewis 8, 13). What should be done if a researcher cites a particular post, and that post no longer exists? Or what if that post has subsequently been changed? 
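
One partial response to these questions, offered here as an assumption rather than anything Lewis et al. prescribe, would be to store a snapshot of each cited post along with a fingerprint of its text, so that later edits or deletions can at least be detected. The sketch below is in Python; the names `CitedPost`, `snapshot`, and `status`, and the post identifier in the example, are hypothetical, and it does not address whether keeping such a snapshot is ethically appropriate.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the post text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CitedPost:
    source: str        # where the post came from (hypothetical identifier or link)
    retrieved_at: str  # when the researcher saved it, in ISO 8601 format (UTC)
    text: str          # the text exactly as it was when cited
    digest: str        # fingerprint of `text`, used to detect later edits


def snapshot(source: str, text: str) -> CitedPost:
    """Record a post as it appears at the moment of citation."""
    now = datetime.now(timezone.utc).isoformat()
    return CitedPost(source, now, text, fingerprint(text))


def status(record: CitedPost, current_text: Optional[str]) -> str:
    """Compare a saved record with the post as it exists now.

    Pass None for `current_text` if the post can no longer be found.
    """
    if current_text is None:
        return "deleted"
    if fingerprint(current_text) == record.digest:
        return "unchanged"
    return "edited"


# Made-up example, for illustration only.
record = snapshot("hypothetical-post-id", "I tink so")
```

A record like this only tells a later reader that something changed and when the original was saved; deciding whether to keep citing a deleted post is still a judgment call for the researcher.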

 

Final Thoughts

As time goes on, more tools will be developed to help researchers filter data, and more people will bring forward ideas as to how we can best overcome these challenges of collecting online linguistic data without compromising the privacy and informed consent of the users whose data is being collected. 

And, as we continue on this semester, we’ll have to consider how these challenges may have impacted the data we are looking at.

 

Citations:

Crystal, David. Internet Linguistics: A Student Guide. Taylor & Francis Group, 2011. ProQuest Ebook Central, https://ebookcentral.proquest.com/lib/smith/detail.action?docID=801579.

Hou, Lynn, et al. “Working with ASL Internet Data.” Sign Language Studies, vol. 21, no. 1, 2020, pp. 32–67, https://www.jstor.org/stable/26984276. Accessed 21 Sept. 2022. 

Lewis, W., et al. “Linguistics in the Internet Age: Tools and Fair Use.” Proceedings of the EMELD’06 Workshop on Digital Language Documentation: Tools and Standards: The State of the Art, Lansing, MI, 20-22 June 2006.
