#languageanalysis | Explore Tumblr posts and blogs

#languageanalysis


thetaxguyin · 1 year ago

Text

Zipf's Law: The Hidden Order of Language and Distribution

Have you ever wondered why certain words appear frequently while others remain rare in written or spoken language? Enter Zipf’s Law—a fascinating principle that unveils the underlying patterns of word usage and distribution in human communication. In this blog post, we’ll explore what Zipf’s Law is, how it applies to language, and what implications it holds for understanding the structure of…
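As a rough illustration of the idea, Zipf's Law says that a word's frequency is approximately proportional to 1 / rank, so the second most common word appears about half as often as the most common one. The sketch below is not taken from the linked post; the function names `rank_frequency` and `zipf_estimate` and the sample sentence are made up for illustration, and a text this short will only loosely follow the pattern.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_estimate(top_count, rank, exponent=1.0):
    """Count that an idealised Zipf curve predicts for a given rank."""
    return top_count / rank ** exponent


# Hypothetical sample text, far too small for a real frequency study.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)

for rank, word, count in table[:3]:
    predicted = zipf_estimate(table[0][2], rank)
    print(rank, word, count, round(predicted, 2))
```

On a large corpus, plotting count against rank on log-log axes should give a roughly straight line.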

View On WordPress

#communicationpatterns #computationallinguistics #languageanalysis #linguistics #naturallanguageprocessing #power-lawdistribution #wordfrequency #Zipf'sLaw

0 notes

elevateenglish-blog · 6 years ago

Photo

A text is always written from a particular narrative perspective, so you can always comment on it and analyse it! Swipe for definitions and effects ➡️ #gcse #gcsememes #gcse2019 #gcse2020 #gcseenglish #gcseenglishliterature #gcseenglishlanguage #gcserevision #gcse #motivation #motivationalquotes #studygram #studymotivation #studytips #studyinspiration #revision #revisionnotes #revisiontips #revisionmotivation #languageanalysis #narratives https://www.instagram.com/p/B6YZDltFoh4/?igshid=1n3hqx4c0qmsm

0 notes

accentbase · 6 years ago

Video

youtube

An Introduction To The International Phonetic Alphabet (Part 1 What is t...

#linguistics #language #english #pronunciation #phonetics #phonemes #internationalphoneticalphabet #IPA #englishlearning #englishstudy #languageanalysis #listeningskills #studyskills

0 notes

isshinotasuke · 2 years ago

Link

0 notes

myopicmage · 8 years ago

Text

Sitecore, Solr, and Many languages

Sitecore 7 added a content search API to interact with Lucene and Solr. I'm sure anyone who has ever worked with search will tell you that search is hard, as it requires a lot of customisation that is entirely per-site, and what works for someone else might not work for you.

I'm here to tell you what worked for me in a really specific use case: Sitecore 8.1 update 4, Apache Solr 6.2, and search across 7 regions in 4 different languages.

We're using some internal libraries on top of the content search API, but they eventually make the same calls as everyone else.

We started out with a fairly standard content search, which... mostly worked, even across languages. Condensed form:

    var context = SearchIndex.CreateSearchContext();
    var query = context.GetQueryable<oursearchresults>();
    query.Content.Like(queryArgs);

There are actually a few issues with this approach:

1. The way our site is set up, 90% of the content we care about is actually in an item's components, not on the item itself.

2. This treats all languages the same way. Sitecore will send the same query to solr no matter the language being searched: _content:(*queryArgs*)

3. This will only give exact matches (even though .Like() is used).

Issue 1 is solved with a computed index field.

    public class VisualizationField : MediaItemContentExtractor
    {
        public override object ComputeFieldValue(IIndexable indexable)
        {
            string baseValue = base.ComputeFieldValue(indexable) as string;
            Item indexItem = indexable as SitecoreIndexableItem;

            if (!ShouldIndexItem(indexItem))
            {
                return baseValue;
            }

            var dataSources = Globals.LinkDatabase
                .GetReferences(indexItem)
                .Where(link => ShouldProcessLink(link, indexItem))
                .Select(link => link.GetTargetItem())
                .Where(targetItem => targetItem != null && targetItem.Versions.Count > 0)
                .Distinct();

            var result = new StringBuilder();
            if (!string.IsNullOrEmpty(baseValue))
            {
                result.AppendLine(baseValue);
            }

            foreach (var dataSource in dataSources.Where(ShouldIndexDataSource))
            {
                dataSource.Fields.ReadAll();
                foreach (var field in dataSource.Fields.Where(ShouldIndexField))
                {
                    result.AppendLine(field.Value);
                }
            }

            return result.ToString();
        }
    }

The ShouldProcess and ShouldIndex methods check to see whether or not something is actually related, and whether or not something should be put into the solr index based on some pretty basic parameters (correct content type, whether or not the component is actually being rendered).

Issue 2 caused me a great deal of stress until I stumbled across a blog post from the Sitecore 7 era. Sitecore added the concept of CultureExecutionContexts, which is a really fancy way of saying you can tell Sitecore to send over a search for content_t_{lang} instead of just _content by using this:

    var context = SearchIndex.CreateSearchContext();
    var culture = new CultureInfo(Sitecore.Context.Language.Name);
    var cultureCtx = new CultureExecutionContext(culture);
    var query = context.GetQueryable<oursearchresults>(cultureCtx);
    query.Content.Like(queryArgs);

And now your solr queries will look like this:

`content_t_{lang}:(*queryArgs*)`

Huzzah! You're searching specific languages! The catch is that you're now doing language-specific exact-match queries, which isn't very helpful.

Enter stemming algorithms.

The basic idea is that you give solr a word like engineer, and it boils the word down to the word's stem, so that you can run queries like engineer, engineers, engineered, or engineering and it will give you the same results. There are stemmers for basically every language you can think of, and the solr documentation explains how to use them far better than I ever could. The example schema.xml file generated by Sitecore actually contains basic analyzers that work fairly well. You will likely want to tweak them to fit your needs, but for an out-of-the-box solution, they work.
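To make the idea concrete, here is a toy sketch of stemming at index time and query time. It is not Solr's algorithm: real stemmers such as the Snowball/Porter family are far more careful. The suffix list, the `toy_stem`, `build_index`, and `search` helpers, and the sample documents are all assumptions invented for this illustration.

```python
# Suffixes to strip, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way the documents were stemmed."""
    return sorted(index.get(toy_stem(query), set()))


# Hypothetical documents, keyed by id.
docs = {
    1: "our engineers build bridges",
    2: "she engineered the fix",
    3: "engineering handbook",
    4: "a marketing plan",
}
index = build_index(docs)

for query in ("engineer", "engineers", "engineered", "engineering"):
    print(query, search(index, query))
```

The important design point is that the same analysis runs on both sides: documents and queries are reduced to the same stem, which is why all four queries land on the same documents.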

Once you've put the correct analyzers in place, restarted solr (this is important, solr does not pick up schema changes on the fly), and reindexed, you should now be getting decent search results in multiple languages.

Now is when language-specifics come into play. One of the languages this client supports is Polish, which does not come with out-of-the-box support from solr. Thankfully, there are already instructions for how to set that up.

The problem language for us, so far, has been German. German makes heavy use of compounding, which means it tends to form new words by joining existing ones together. For instance, the German word for engineer is "ingenieur" and the word for civil engineer is "bauingenieur." This creates an issue for our search purposes, as a search for "ingenieur" should return results containing either "bauingenieur" or "ingenieur." The problem is solved with the Dictionary Compound Word Token Filter, a solr filter that will break words like bauingenieur down into their components "bau" and "ingenieur," so your results become what you'd expect. This requires a German word list, which can be a bit tricky to find, but once you have it, it works beautifully.
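The dictionary-based splitting described above can be sketched in a few lines. This is a simplified illustration of the technique, not the filter's actual implementation: the `decompound` function, its parameter names and default sizes, and the two-word `GERMAN_WORDS` list are assumptions made up for the example.

```python
# Hypothetical miniature word list; a real setup needs a full German dictionary.
GERMAN_WORDS = {"bau", "ingenieur"}


def decompound(token, dictionary, min_word_size=5, min_subword=2, max_subword=15):
    """Return the original token followed by any dictionary words found inside it."""
    token = token.lower()
    parts = [token]
    if len(token) < min_word_size:
        return parts  # too short to be worth splitting
    for start in range(len(token)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(token):
                break
            piece = token[start:end]
            # Skip the whole token so it is not listed twice.
            if piece != token and piece in dictionary:
                parts.append(piece)
    return parts


print(decompound("bauingenieur", GERMAN_WORDS))
print(decompound("ingenieur", GERMAN_WORDS))
```

Because the compound is indexed under "bauingenieur", "bau", and "ingenieur", a query for "ingenieur" can match documents that only ever mention the civil engineer.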

At this point, our search results have become downright useful and accurate (though we haven't implemented nice-to-haves like spellchecking and synonym searches), but there's a subtle bug. Sitecore isn't sending over the _content field to solr for each individual language properly. If your setup is like ours, with a very thin item and all of the pertinent content in subcomponents, the _content field in the index is going to be very sparse, basically containing nothing but the content in the top level item itself.

It took several hours of debugging, and someone far more versed in Sitecore than me, to finally track it down, but the issue is in the computed index field for the _content field.

    var dataSources = Globals.LinkDatabase
        .GetReferences(indexItem)
        .Where(link => ShouldProcessLink(link, indexItem))
        .Select(link => link.GetTargetItem())
        .Where(targetItem => targetItem != null && targetItem.Versions.Count > 0)
        .Distinct();

This code will only get the components in Sitecore's default language. The rest of the code will properly put the correct language content from the top-level item into the index, but one of the checks it makes is whether or not a component is in the layout of that item in that version and language. If you have an item that only exists in the default Sitecore language, this works fine, but for any other language it's not going to get any of the subcomponents.

I haven't found any documentation about this, but the solution that is working for us is bringing in a LanguageSwitcher:

    public class VisualizationField : MediaItemContentExtractor
    {
        public override object ComputeFieldValue(IIndexable indexable)
        {
            string baseValue = base.ComputeFieldValue(indexable) as string;
            Item indexItem = indexable as SitecoreIndexableItem;

            if (!ShouldIndexItem(indexItem))
            {
                return baseValue;
            }

            // Resolve components in the language of the item being indexed
            // instead of Sitecore's default language.
            using (new LanguageSwitcher(indexItem.Language))
            {
                var dataSources = Globals.LinkDatabase
                    .GetReferences(indexItem)
                    .Where(link => ShouldProcessLink(link, indexItem))
                    .Select(link => link.GetTargetItem())
                    .Where(targetItem => targetItem != null && targetItem.Versions.Count > 0)
                    .Distinct();

                var result = new StringBuilder();
                if (!string.IsNullOrEmpty(baseValue))
                {
                    result.AppendLine(baseValue);
                }

                foreach (var dataSource in dataSources.Where(ShouldIndexDataSource))
                {
                    dataSource.Fields.ReadAll();
                    foreach (var field in dataSource.Fields.Where(ShouldIndexField))
                    {
                        result.AppendLine(field.Value);
                    }
                }

                return result.ToString();
            }
        }
    }

Once you rebuild and reindex with the proper computed index field, your components will be properly indexed, your search results correct, and, hopefully, your clients happy.

#sitecore #solr #sitecore 8.1 #solr 6.2 #.net

0 notes

mixandmatcha · 7 years ago

Text

The Picture of Dorian Gray by Oscar Wilde

Symbolism: a literary device in which an object, place, or color stands for a larger idea, helping readers understand a literary work.

The Picture of Dorian Gray Symbols

The Yellow Book

It symbolizes Dorian Gray’s beliefs. The book was a gift from Lord Henry Wotton, and his influence on Dorian grew more powerful because Dorian based his life and actions on it. While reading the book, Dorian identifies with its main character. The color yellow symbolizes poison, which marks Lord Henry’s influence on Dorian as poisonous. Lord Henry is like one of those people who brainwash you until their influence changes your whole mindset.

The Picture

The picture of Dorian Gray reflects his personality. Dorian wanted to remain beautiful and young even as he grew old, and the painting reminded him of his youth. As time went by, the painting became ugly, so Dorian hid it in a room and suspected that Basil was the one behind it. The picture symbolizes the decay of his soul and mirrors his wrongdoings: it bears the guilt instead of Dorian himself. The picture became Dorian’s weakness, and when he stabbed it, the act backfired: Dorian became ugly like the picture, while the picture returned to the youthful state it was supposed to have.

The Theater

The theater is an important place in the novel because it is where Dorian Gray meets his first love, Sibyl Vane. She is a theater actress, and Dorian often watches her perform. The theater symbolizes their meeting place.

#DorianGray #Symbolism #CreativeWriting #LanguageAnalysis

2 notes · View notes

elevateenglish-blog · 6 years ago

Photo

Another basic technique we can discuss with more appropriate terminology! Swipe for definitions and effects ➡️ #gcse #gcsememes #gcse2019 #gcse2020 #gcseenglish #gcseenglishliterature #gcseenglishlanguage #gcserevision #gcse #motivation #motivationalquotes #studygram #studymotivation #studytips #studyinspiration #revision #revisionnotes #revisiontips #revisionmotivation #languageanalysis https://www.instagram.com/p/B6YYUPnFwYr/?igshid=1kcxylcsj0z98


0 notes

elevateenglish-blog · 5 years ago

Photo

Percy Bysshe Shelley - Ozymandias - Language analysis 🔎 #gcse #gcsememes #gcse2019 #gcse2020 #gcseenglish #gcseenglishliterature #gcseenglishlanguage #gcserevision #gcse #motivation #motivationalquotes #studygram #studymotivation #studytips #studyinspiration #revision #revisionnotes #revisiontips #revisionmotivation #poetry #powerandconflict #powerandconflictpoetry #ozymandias #percybyssheshelley #languageanalysis https://www.instagram.com/p/B6d5iwvlj1p/?igshid=lzc7wekrlqxb

1 note · View note

elevateenglish-blog · 6 years ago

Photo

Did you know there are different types of alliteration? 🤯 Swipe for the definitions and effects to improve your key terminology and analysis! #gcse #gcsememes #gcse2019 #gcse2020 #gcseenglish #gcseenglishliterature #gcserevision #gcse #motivation #motivationalquotes #studygram #studymotivation #studytips #studyinspiration #revision #revisionnotes #revisiontips #revisionmotivation #alliteration #literarytechniques #analysis #languageanalysis https://www.instagram.com/p/B6WHULzFHwr/?igshid=11v0nuoptmypj

0 notes