#natural language processing (nlp)
compling-studies · 1 year
Text
Tumblr media
2023-04-25 • 16/100 days of NLP
Finished up the summary of linear regression as prep for the fall classes. The coding part isn't that clear to me yet, but it should become easier as I code more.
48 notes · View notes
redheaded-techpriest · 6 months
Note
hey are u really studying NLP for translation? can you explain why DeepL seems to be so much better than google translate?
Hey! I'll do my best with what I've figured out through their proprietary BS!
My short answer: different training data and different architectures. Google's Neural Machine Translation (GNMT) system and DeepL's convolutional neural network each have their own advantages.
Google pulls training text from every Google-indexed site, which gives it a lot of languages but text of wildly varying quality. It runs that through its neural machine translation-inator, which learns linguistic and semantic patterns from aligned example sentences. For language pairs without enough training data for the NMT-inator, it falls back on statistical methods that just pick the most probable translation.
DeepL started off as Linguee, just a translation dictionary, and expanded into full sentences by pulling from professionally translated bilingual texts, essentially trading quantity of material for quality. It then runs those samples through its convolutional neural net, whose filters try to break sentences down into their important features, which tends to give the translations a more natural feel.
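To illustrate the "filters" idea loosely (a toy sketch, not DeepL's actual, proprietary architecture): a 1D convolution slides a small weight window across a sequence of word vectors, producing one feature per window that summarizes a local chunk of the sentence.

```python
import numpy as np

# Toy sentence: 6 "words", each represented by a 4-dimensional vector.
# The values are random stand-ins; a real system learns its embeddings.
rng = np.random.default_rng(0)
words = rng.normal(size=(6, 4))

# One convolutional filter spanning 3 consecutive words
filt = rng.normal(size=(3, 4))

# Slide the filter over the sentence: each output summarizes one
# 3-word window (6 words -> 4 windows)
features = np.array([
    np.sum(words[i:i + 3] * filt)
    for i in range(len(words) - 3 + 1)
])
print(features.shape)  # (4,)
```

A real convolutional translation model stacks many such filters across many layers, but the core operation is this sliding window.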
3 notes · View notes
mathart · 1 year
Text
Tumblr media
CrAIyon Literalness
8 notes · View notes
bits-of-ds · 2 years
Text
Cosine Similarity: for checking the similarity of documents, texts, strings, etc.
Cosine similarity is a measure of the similarity between two documents, texts, strings, etc.
It works by representing each text as a vector in n-dimensional space. It then measures the angle between these vectors and reports the similarity as the cosine of that angle.
If the texts are completely similar, the angle will be zero; thus the cosine similarity will be: > cos(0°) = 1
If the texts are completely dissimilar, the vectors will be perpendicular; thus the cosine similarity will be: > cos(90°) = 0
If the texts are completely opposite, the vectors will point in opposite directions; thus the cosine similarity will be: > cos(180°) = -1
The cosine similarity, mathematically, is given by:
Tumblr media
Let's see an example:
Doc1 = "this is the first document" Doc2 = "this document is second in this order"
Vocabulary (in order of first appearance): [this, is, the, first, document, second, in, order]
Word-count vectors over this vocabulary: Doc1 = A = [1,1,1,1,1,0,0,0] Doc2 = B = [2,1,0,0,1,1,1,1]
Tumblr media
ΣAiBi = (1*2)+(1*1)+(1*0)+(1*0)+(1*1)+(0*1)+(0*1)+(0*1) = 4 √(ΣAi²) = √(1+1+1+1+1+0+0+0) = √5 √(ΣBi²) = √(4+1+0+0+1+1+1+1) = √9 = 3
Cosine similarity = 4/(√5 * 3) ≈ 0.596
Cosine similarity is often a better metric than Euclidean distance for text: two documents can be far apart by Euclidean distance (for example, because one is much longer than the other) yet still be close in terms of their content, since their vectors point in the same direction.
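A minimal sketch of this calculation in Python, using raw word counts as in the example above (the function and variable names are my own):

```python
from collections import Counter
import math

def cosine_similarity(text1, text2):
    # Represent each text as a bag-of-words count vector
    a, b = Counter(text1.split()), Counter(text2.split())
    # Dot product over the words the two texts share
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    # Vector magnitudes
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

doc1 = "this is the first document"
doc2 = "this document is second in this order"
print(round(cosine_similarity(doc1, doc2), 3))  # 0.596
```

This matches the hand calculation: the dot product is 4 and the magnitudes are √5 and 3.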
13 notes · View notes
softmaxai · 1 year
Text
NLP, an acronym for Natural Language Processing, is a computer's ability to understand human language and its meaning. NLP solution providers in India help businesses use NLP to improve website flow and boost conversions, and to power customer-support chatbots that save time and money.
2 notes · View notes
git-commit-die · 1 year
Text
ChatGPT, LLMs, Plagiarism, & You
This is the first in a series of posts about ChatGPT, LLMs, and plagiarism that I will be making. This is a side blog, so please ask questions in reblogs and my ask box.
Why do I know what I'm talking about?
I am a machine learning engineer who specializes in natural language processing (NLP). I write code that uses LLMs every day at work and am intimately familiar with OpenAI. I have read dozens of scientific papers on the subject and understand how these models work in extreme detail. I have 6 years of experience in the industry, plus a graduate degree in the subject. I got into NLP because I knew it was going to pop off, and now here we are.
Yeah, but why should I trust you?
I've been a Tumblr user for 8 years. I've posted my own art and fanart on the site. I've published writing, both original and fanfiction, on Tumblr and AO3. I've been a Reddit user for over a decade. I'm a citizen of the internet as much as I am an engineer.
What is an LLM?
LLM stands for Large Language Model. The most famous example of an LLM is ChatGPT, which was created by OpenAI.
What is a model?
A model is an algorithm or piece of math that lets you predict or mimic how something behaves. For example:
The National Weather Service runs weather models that predict how much it's going to rain based on data they collect about the atmosphere
Netflix has recommendation models that predict whether you'd like a movie based on your demographics, what you've watched in the past, and what other people have liked
The Federal Reserve has economic models that predict how inflation will change if they increase or lower interest rates
Instagram has spam models that look at DMs and automatically decide whether they're spam or not
Models are useful because they can often make decisions or describe situations better than a human could. The weather and economic models are good examples of this. The science of rain is so complicated that it's practically impossible for a human to make sense of all the numbers involved, but models are able to do so.
Models are also useful because they can make thousands or millions of decisions much faster than a human could. The recommendations and spam models are good examples of this. Imagine how expensive it would be to run Instagram if a human had to review every single DM and decide whether it was spam.
What is a language model?
A language model is a model that can look at a piece of text and tell you how likely it is. For example, a language model can tell you that the phrase "the sky is blue" is more likely to have been written than "the sky is peanuts."
Why is this useful? You can use a language model to generate text by picking the letters and words it gives a high score. Say you have the phrase "I ate a" and you're picking what comes next. You can run through every option, see how likely the language model thinks each one is, and pick the best one. For example:
I ate a sandwich: score = .7
I ate a $(iwnJ98: score = .1
I ate a me: score = .2
So we pick "sandwich" and now have the phrase "I ate a sandwich." We can keep doing this process over and over to get more and more text. "I ate a sandwich for lunch today. It was delicious."
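The picking step can be sketched in a few lines of Python. The scores here are the made-up ones from the example; a real LLM would compute them with a neural network over a vocabulary of tens of thousands of tokens:

```python
# Hypothetical scores a language model might assign to each candidate
# continuation of the phrase "I ate a"
scores = {
    "sandwich": 0.7,
    "$(iwnJ98": 0.1,
    "me": 0.2,
}

def pick_next_word(candidate_scores):
    # Greedy decoding: always take the highest-scoring candidate
    return max(candidate_scores, key=candidate_scores.get)

best = pick_next_word(scores)
print("I ate a " + best)  # I ate a sandwich
```

(Real systems often sample from the scores instead of always taking the top one, which is part of why ChatGPT can give different answers to the same question.)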
What makes a large language model large?
Large language models are large in a few different ways:
Under the hood, they are made of a bunch of numbers called "weights" that describe a monstrously complicated mathematical equation. Large language models have a ton of these weights: as many as tens of billions of them.
Large language models are trained on large amounts of text. This text comes mostly from the internet but also includes books that are out of copyright. This is the source of controversy about them and plagiarism, and I will cover it in greater detail in a future post.
Large language models are a large undertaking: they're expensive and difficult to create and run. This is why you basically only see them coming out of large or well-funded companies like OpenAI, Google, and Facebook. They require an incredible amount of technical expertise and computational resources (computers) to create.
Why are LLMs powerful?
"Generating likely text" is neat and all, but why do we care? Consider this:
An LLM can tell you that:
the text "Hello" is more likely to have been written than "$(iwnJ98"
the text "I ran to the store" is more likely to have been written than "I runned to the store"
the text "the sky is blue" is more likely to have been written than "the sky is green"
Each of them gets us something:
LLMs understand spelling
LLMs understand grammar
LLMs know things about the world
So we now have an infinitely patient robot that we can interact with using natural language and get it to do stuff for us.
Detecting spam: "Is this spam, yes or no? Check out rxpharmcy.ca now for cheap drugs now."
Personal language tutoring: "What is wrong with this sentence? Me gusto gatos."
Copy editing: "I'm not a native English speaker. Can you help me rewrite this email to make sure it sounds professional? 'Hi Akash, I hope...'"
Help learning new subjects: "Why is the sky blue? I'm only in middle school, so please don't make the explanation too complicated."
And countless other things.
2 notes · View notes
algoworks · 1 year
Text
Transform the way we interact with machines and elevate your business with the power of Natural Language Processing! 🤖🗣️🚀
3 notes · View notes
medsocionwheels · 1 year
Text
Build and Interpret a Basic Structural Topic Model in R
New R tutorial available! Follow my 10-step process for estimating and interpreting a basic structural topic model without covariates.
Preview the Tutorial With Sound (slides with commentary) @medsocionwheels Structural topic modeling: my 10 step process for estimating and interpreting a basic structural topic model without covariates in R. Full #tutorial available on medsocionwheels.com! #TopicModeling #NLP #StructuralTopicModel #QuantitativeResearch #QualitativeResearch #ResearchMethods #R #LearnR #CodingTikTok #rstats…
Tumblr media
View On WordPress
2 notes · View notes
Text
python matching with ngrams
# https://pythonprogrammingsnippets.com

def get_ngrams(text, n):
    # split text into n-grams
    ngrams = []
    for i in range(len(text) - n + 1):
        ngrams.append(text[i:i + n])
    return ngrams

def compare_strings_ngram_pct(string1, string2, n):
    # compare two strings based on the percentage of matching n-grams
    # Split strings into n-grams
    string1_ngrams = get_ngrams(string1, n)
    string2_ngrams = get_ngrams(string2, n)
    # Find the matching n-grams
    matching_ngrams = set(string1_ngrams) & set(string2_ngrams)
    # Calculate the percentage match
    percentage_match = (len(matching_ngrams) / len(string1_ngrams)) * 100
    return percentage_match

def compare_strings_ngram_max_size(string1, string2):
    # compare two strings based on the maximum matching n-gram size
    # Try n-grams of decreasing length
    n = min(len(string1), len(string2))
    for i in range(n, 0, -1):
        string1_ngrams = set(get_ngrams(string1, i))
        string2_ngrams = set(get_ngrams(string2, i))
        # Find the matching n-grams
        matching_ngrams = string1_ngrams & string2_ngrams
        if len(matching_ngrams) > 0:
            # Return the maximum matching n-gram size
            return i
    # If no matching n-grams are found, return 0
    return 0

string1 = "hello world"
string2 = "hello there"
n = 2  # n-gram size

# find how much of string 2 matches string 1 based on n-grams
percentage_match = compare_strings_ngram_pct(string1, string2, n)
print(f"The percentage match is: {percentage_match}%")

# find maximum ngram size of matching ngrams
max_match_size = compare_strings_ngram_max_size(string1, string2)
print(f"The maximum matching n-gram size is: {max_match_size}")
4 notes · View notes
claudigitools · 2 years
Text
Scalenut
Unleash the Power of Tomorrow, TODAY!
Join us for the Mega Launch of our game-changing features that will blow your mind and transform the way you create SEO content.
Why you should not miss this Webinar:
Mega Launch of Exceptional Features
Master. Learn. Deliver.
Get a Glimpse Into The Future
Achieve Competitive Advantage
Who Should Attend this Webinar:
Start-up Founders
Content Strategists
Agencies
Content Creators
Reserve my SPOT
Most Loved AI-Powered SEO
And Content Marketing Platforms
3 notes · View notes
datascienceunicorn · 2 years
Text
HT @DeepLearningAI_
5 notes · View notes
compling-studies · 1 year
Text
Tumblr media Tumblr media
2023-04-28 • 19/100 days of nlp
some more logistic regression notes to prepare for the fall classes
18 notes · View notes
meelsport · 10 days
Text
How AI is Revolutionizing Voice Search Technology
The Hidden Link Between AI Voice Search and SEO: What You Need to Know
Voice search is transforming how we interact with technology, turning searches into effortless, conversational experiences. No more typing: just speak to your device, and AI does the rest. In this blog, we'll explore the evolution of voice search, discuss how AI powers it, and explain why businesses must adapt to stay competitive. The Evolution of AI Voice Search Technology AI voice search technology has come a…
0 notes
fuerst-von-plan1 · 14 days
Text
The Role of AI in Efficient Real-Time Data Processing
In today's digital era, real-time data processing plays a crucial role across a range of industries, from financial services to healthcare and the Internet of Things (IoT). The enormous volume of data generated continuously requires advanced technologies to extract and analyze relevant information in real time. Artificial…
0 notes
alim0355 · 19 days
Video
youtube
Practice: How to Sell Online (Latihan Cara Menjual Online)
0 notes
jarrodcummerata · 22 days
Text
The Future of Customer Service: How NLP Is Shaping the Industry
Tumblr media
Discover how Natural Language Processing (NLP) is transforming customer service with AquSag Technologies. Our latest blog explores the future of NLP and its impact on customer interactions, including advancements in chatbots and virtual assistants, sentiment analysis, automated ticketing systems, personalization, multilingual support, and enhanced data insights. Learn how NLP is revolutionizing customer service and how AquSag Technologies can help your business leverage these innovations to improve efficiency and customer satisfaction. Explore our insights and see how NLP can elevate your customer service operations.
0 notes