#NLProc
Text
We believe in the impact of exceptional content. Showcase your finest work, inspire others, and contribute to the enrichment of Canada's digital sphere. Together, we can elevate the standard of online publications and inspire a new wave of thought-provoking narratives.

#TheBadassReborn #NLProc #GoodPeopleToBankWith #AmritMahotsav #PowerOfUnity #ParsiNewYear #NavrozMubarak #guestblogpostingservice #guestbloggingsitesincanada
Text
PhD Researcher, Charles University, Institute of Formal and Applied Linguistics. We are looking for a PhD candidate to join us in Prague to do research on multilingual #NLProc and multilingual language models. See the full job description on jobRxiv: https://jobrxiv.org/job/charles-univeristy-institute-of-formal-and-applied-linguistics-27778-phd-researcher/?feed_id=60694 #ScienceJobs #hiring #research Prague #Czechia #PhDStudent
Text
A language model that generates coherent parodies of song lyrics given a topic and the lyrics of a song
Call it Weird AI (with an i, not L)
Text
On SHRDLU
Terry Winograd is a renowned scientist at Stanford. In his youth, Terry became famous for creating the SHRDLU program. SHRDLU was part of his dissertation at the MIT AI Lab — a system for controlling a virtual environment with commands in plain English. The screen displayed a field containing various objects (cubes, balls, cones, ...), and the operator could give commands like "take the big ball" and "put it on the blue cube". The system answered questions about the blocks' locations, processed movement commands, did a tolerable job of resolving anaphora in context, answered questions about its previous actions, and worked out basic object physics. (SHRDLU can be considered the first example of interactive fiction.) Terry published all this magic in the late sixties, which set off a wave of hype around AI and NLP research that later turned into disappointment. This was largely because SHRDLU's environment parameters (the types of objects, their characteristics, the command dictionary) were so well chosen that, on the one hand, the system worked well within its reduced task space, while on the other hand it was perceived as universal, but all attempts to develop it further and do something more useful than pyramids on cubes failed. The first AI winter began almost exactly with this, and Winograd himself, disappointed, switched to research on human-computer interfaces in a broader sense.
The SHRDLU code was written in Micro-Planner, implemented on MacLisp for a DEC PDP-6 computer running the ITS operating system. The Planner language was developed by Winograd and his colleagues and was a mixture of Lisp with some ideas that later resurfaced in Prolog. (The success of SHRDLU helped Planner spread and eventually had some influence on the authors of Prolog.) In addition, as was customary in those times, SHRDLU was self-modifying: while processing an input, it could rewrite and restart its own code. Finally, in a classic nocturnal frenzy before submitting the dissertation, Terry had to make several patches directly to the machine code of his Lisp interpreter. As a result, the SHRDLU code could only be executed by this modified binary, for which there was no source code. This bothered no one, and the program was distributed in more than a dozen copies (remember, there were only 23 PDP-6 computers in the world). However, as the ITS operating system was updated, the modified Lisp binary gradually lost compatibility with it, and, according to the recollections of Dave MacDonald (a student of Terry Winograd), after a few years SHRDLU stopped working entirely — the source code was preserved, the transcript of the dialogue with the program used in the thesis (the "SHRDLU demo") was also preserved, a video of SHRDLU running exists, but no OS-plus-hardware combination remained to run it on. Since then, people have tried from time to time to get the program running again, and there are many clones quite similar to the original, but none of them gives exactly the same answers as the SHRDLU demo.
As for the name, Terry writes that he wanted a "fake" acronym, and the first thing that came to mind was the word SHRDLU, which he had encountered in his youth in magazines like Mad, where it was used to denote nonsense. The tradition came from the days when Linotype typesetters filled erroneous lines with these letters — the Linotype had no backspace, so, having made a mistake, it was easier to fill the rest of the line with garbage so that a proofreader would then reject it; sometimes, however, proofreaders missed such lines, and SHRDLUSHRDLUSHRDLU made it into print. Why exactly these letters? The Linotype keys were arranged by letter frequency in English: the first column read ETAOIN and the second SHRDLU. That's it.
Text
My last blog post was on 3rd July '17, so it has been a long time since I posted anything here. I do have five posts in my drafts that I never got around to completing over the last year, but I hope to publish them soon. In June '17 I moved to Bengaluru, and since then I have been busy trying out a variety of things; now I have a clear idea of what I want to keep doing, and this is me compiling it here.
ADAM Question Answering System
What has been done?
ADAM featured on spaCy Universe
Since January I have been working on the ADAM QAS project (5hirish/adam_qas), focusing on improving some of its basic modules. The project previously relied on the official Wikipedia API's Python wrapper for fetching and scraping information, and I wasn't very satisfied with it. It had issues with searching ambiguous terms, and the project required more fine-grained scraping of information from the articles. So I dropped the Wikipedia Python module and implemented calls to its APIs myself to search for terms and fetch articles. Then, with the help of XPath queries, I extract and separate the structured information from the unstructured information, which lets me store the tabular data and information cards from the articles as JSON objects. I am planning to write a separate blog post on this and will go into the implementation details there.
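To make the idea concrete, here is a minimal sketch of that flow — not the project's actual code; the endpoint parameters are standard MediaWiki, but the function names and index choices are mine:

import requests
from lxml import html

API = "https://en.wikipedia.org/w/api.php"

def search_titles(term, limit=3):
    # Ask the MediaWiki search API for page titles matching the term.
    params = {"action": "query", "list": "search", "srsearch": term,
              "srlimit": limit, "format": "json"}
    hits = requests.get(API, params=params).json()["query"]["search"]
    return [h["title"] for h in hits]

def infobox_as_json(title):
    # Fetch the page's rendered HTML and turn infobox rows into a dict.
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    page = requests.get(API, params=params).json()["parse"]["text"]["*"]
    tree = html.fromstring(page)
    info = {}
    for row in tree.xpath('//table[contains(@class, "infobox")]//tr'):
        key = row.xpath("string(./th)").strip()
        value = row.xpath("string(./td)").strip()
        if key and value:
            info[key] = value
    return info

print(infobox_as_json(search_titles("Alan Turing")[0]))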
The next problem was that the project lacked a good storage strategy, so I chose Elasticsearch to store the extracted structured and unstructured data. Working with Elasticsearch was a lot of fun, though I did mess up the mappings quite a few times during mapping updates. I also had to use a custom language analyzer, since the default one did not perform well at stemming; I wanted a stricter stemmer. This is discussed further in GitHub Issue #20, Elasticsearch with custom English analyzer.
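For flavor, a hedged sketch of what such an analyzer definition can look like from Python — the index name, field name, and filter choices here are illustrative, not the project's actual configuration (older clients take a single body= dict instead of settings=/mappings=):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

settings = {
    "analysis": {
        "filter": {
            "english_stop": {"type": "stop", "stopwords": "_english_"}
        },
        "analyzer": {
            "strict_english": {
                # standard tokenizer + lowercasing + stopwords + Porter stemming
                "tokenizer": "standard",
                "filter": ["lowercase", "english_stop", "porter_stem"],
            }
        },
    }
}
mappings = {
    "properties": {"body": {"type": "text", "analyzer": "strict_english"}}
}
es.indices.create(index="wiki_content", settings=settings, mappings=mappings)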
Using Elasticsearch enabled me to improve and fine-tune my search queries with negations and boolean operators, and it also takes care of scoring and ranking. More on this in GitHub Issue #19, Elasticsearch full-text search strategies.
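And a sketch of the kind of boolean full-text query that issue talks about (again with illustrative index and field names): required terms go in must, negated terms in must_not, and Elasticsearch scores and ranks the hits itself.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
query = {
    "bool": {
        "must": [{"match": {"body": "capital of France"}}],
        "must_not": [{"match": {"body": "disambiguation"}}],
    }
}
# Hits come back already ranked by relevance score.
for hit in es.search(index="wiki_content", query=query)["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["body"][:80])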
What's in the pipeline?
The project is still very immature, but it has a lot of potential, and the community on GitHub is starting to notice it, so I think it is worth discussing what's next in the pipeline and what the current state of the project is. I think full transparency is very important for an open-source project to thrive.
The most pressing issue right now is querying the structured data stored in Elasticsearch. This structured data usually holds the answers to factual questions. I have stumbled across JMESPath, a query language for JSON, and it seems quite promising, so as soon as I have the bandwidth I will pick it up.
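As a taste of JMESPath, a tiny example against a hypothetical infobox document (the shape of the JSON is made up for illustration):

import jmespath

doc = {
    "title": "Alan Turing",
    "infobox": {"Born": "23 June 1912", "Fields": ["Logic", "Cryptanalysis"]},
}
# Dotted paths and list indexing reach into nested JSON directly.
print(jmespath.search("infobox.Born", doc))       # -> 23 June 1912
print(jmespath.search("infobox.Fields[0]", doc))  # -> Logic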
Other planned work: improving answer extraction (I am not satisfied with the vector space model and am looking at alternatives), and replacing the rule-based query constructor with an unsupervised one, which would also let us benefit from the question's class/category.
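For context, this is roughly what the vector space model I'm unhappy with boils down to: rank candidate sentences against the question by TF-IDF cosine similarity. A bare-bones illustration of the technique, not the project's code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "Where was Alan Turing born?"
candidates = [
    "Turing was born in Maida Vale, London.",
    "Turing devised the Turing test.",
]
# Fit one vocabulary over everything, then score question vs. candidates.
tfidf = TfidfVectorizer().fit(candidates + [question])
scores = cosine_similarity(tfidf.transform([question]),
                           tfidf.transform(candidates))[0]
print(max(zip(scores, candidates)))  # best-scoring candidate answer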
Tweet Scrapper
In April I discovered Kenneth Reitz's twitter-scraper repo, but unfortunately it supported only Python 3.6+, as it depended on the requests_html module. So I decided to build my own Twitter scraper purely with XPath queries and release it on PyPI. It extracts tweets with all of their metadata: mentions, hashtags, and external links. I have also added a few examples as Jupyter notebooks, like a tweet generator using a Markov chain and Gensim topic modeling using the Latent Dirichlet Allocation model, to demonstrate how to use the module. You can check out the repo at 5hirish/tweet_scrapper.
pip install tweetscrape
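The library's own API is documented in the repo; purely to illustrate the XPath-driven extraction idea, here is a toy example on a saved HTML snippet (the class names mimic Twitter's old markup and may not match what tweetscrape actually targets):

from lxml import html

snippet = """
<div class="tweet">
  <p class="tweet-text">Shipping a new release!
    <a class="twitter-hashtag">#NLProc</a>
    <a class="twitter-atreply">@someone</a></p>
</div>
"""
tree = html.fromstring(snippet)
for tweet in tree.xpath('//div[@class="tweet"]'):
    # string() flattens the paragraph, including anchor text.
    text = tweet.xpath('string(.//p[@class="tweet-text"])').strip()
    hashtags = tweet.xpath('.//a[@class="twitter-hashtag"]/text()')
    mentions = tweet.xpath('.//a[@class="twitter-atreply"]/text()')
    print(text, hashtags, mentions)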
My perspective on Django and Flask
In the last year I had the chance to work with Django for a project, and this year I am working with Flask. I was quite dismissive of Flask and worshiped Django (exaggerating) until a few months ago, when a friend of mine convinced me to give it a try for at least a day. After a week of playing around with Flask, I have learned to keep an open mind about things. I loved that Flask is lightweight, very easy to learn, gives you a lot of freedom, and has you building your APIs within a minute of setting up the project. Setting up a Django project, by contrast, feels like a ritual, and I feel guilty if I skip setting up some unnecessary thing. Most things in Django are already set up and ready to go, which leaves you very little room to customize. Which framework to use really depends on your use case, and comparing them toe-to-toe is just pointless.
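To show what I mean by being up and running in minutes, a minimal Flask app (the endpoint is just an example):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/ping")
def ping():
    # A one-function JSON API: no project scaffolding required.
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(debug=True)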
Other stuff
I have been working on a personal project of mine and hope to release a Developer Preview by November, so most of my efforts are dedicated to it, but I'll try my best to blog more often and squeeze ADAM QAS development in between. I am very excited about this project and will post an update here when it's out.
Recently I have also picked up a new hobby: trekking. I am loving it a lot, and I am trying to fit in a trek once a month.
Let's catch up - It has been a while... talking about the ADAM project's future, a new Python library, Flask, and my new hobby!
Text
Aliaksei Severyn. "Only the lazy aren't doing neural networks", part 2/2

This is the second part of the conversation with Aliaksei Severyn. The first part can be found here.
Што вы выкарыстоўваецце зараз для задачы па summarization?
Мы используем machine learning (ML), нейронные сети тоже. Мне кажется, что об этом на wired есть статья. Я не могу говорить детали, помимо того, что уже публично известно.
Давай падумаем пра будучыню. Што чакае human-computer interface? Якія змяненні будуць?
Все больше и больше end-to-end систем, neural-based. Они будут вообще полностью заменять огромные массивные компоненты, например, machine translation. Изначально это была очень сложная система с большим compressing компонентом, с pipelines. И сейчас Google анонсировал, что уже несколько языковых пар, например, Chinese-English – это просто end-to-end система machine translation. Тоже самое будет происходить в других областях, speech recognition и question answering. На сегодняшний день в академии это есть, но момент, когда это все окончательно сформируется в production, пока еще неясeн.
Год, два, тры?
Я думаю, что в ближайший год, но это будет инкрементно. Мое мнение, что через два года большие куски сложных систем, где используется machine learning, станут просто одной большой нейронной сетью. Все неизбежно движется в этом направлении.
Адна з мэтаў суполкі – развіваць цікаўнасць да вобласці. Таму, што можаш пажадаць маладым даследчыкам?
Смотря какие цели. Они могут быть разные: построить карьеру, найти хорошую работу, продвинуться в своем исследовании...
Напрыклад, давай уявім сабе маладога даследчыка, які толькі пачаў развівацца ў накірунку задачы question answering. У яго сomputer science background пасля беларускага ўніверсітэта. Што яму рабіць?
Путь не такой быстрый. Человеку, который находится на одном уровне, сложно посмотреть назад и по полочкам разложить. Мне кажется, что я даже ни одной книги не прочитал полностью. Когда занимаешься исследованием, у тебя просто нет такой возможности, как сесть и прочитать полностью книгу.
Я читал только научные статьи. Мой совет: если прочитал статью и понял, что очень нравится идея, то сразу же нужно сесть и воспроизвести ее в коде. Сейчас задача упрощается, потому что стало очень популярно релизить код. Когда я учился, около 5 лет назад, это было не настолько распространено - код публиковался в порядке исключения. На сегодняшний день релиз кода - это признак хорошего тона. Кстати, у меня есть очень хороший совет для начинающих - участвовать в соревнованиях на Kaggle. В свое время я достаточно сильно прокачался, даже в какой-то момент на 26 место "заполз", что достаточно круто. Меня это многому научило. Т.е. надо уметь работать руками, потому что самые лучшие идеи приходят в процессе практики.
У исследователя очень хорошо должна быть развита интуиция, тогда действительно могут прийти какие-то дельные идеи. Для того, чтобы понять недостатки или ограничения какой-то модели, в первую очередь, нужно с ней поработать практически. На мой взгляд, очень много внимания уделено созданию различных архитектур и моделей, но что касается nlp, то здесь очень большое количество задач сводится к sequence prediction, например, как в machine translation. То есть способы, которые позволяют генерировать structured output, что выражается, например, в использовании правильной loss-function. Сегодня большинство использует cross-entropy loss, т.е. оптимизируют, там где сеть выдает по одному слову за раз. Но есть достаточно большое количество статей, где используется reinforcement learning, CRF Objective function поверх всего. Это все помогает, все полезно, потому очень важно понимать, как всё работает.
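A tiny numerical illustration of that token-level cross-entropy objective, in framework-neutral numpy (the probabilities and targets are made up):

import numpy as np

# Predicted next-token distributions for a 3-step sequence, vocabulary of 4.
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.6, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
targets = [0, 1, 3]  # reference token at each step

# Sum of per-token negative log-likelihoods: the network is scored
# one word at a time, exactly as described above.
loss = -np.log(probs[np.arange(len(targets)), targets]).sum()
print(loss)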

Frameworks?
I work with TensorFlow now; when I was a student, Theano. To this day I consider Theano the most successful framework, and a simple one to use. The community has built many layers on top of it, e.g. Lasagne and Keras. When I was starting out, I wrote everything from scratch.
These days, only the lazy aren't doing neural networks.
Thank you for the interesting conversation. Thank you.
Images’ sources [1], [2]
Author: @yauhen_minsk for @nlprocby
Photo

KirkDBorne https://twitter.com/KirkDBorne/status/1635743099644157956 https://t.co/m6xtTyEAdK March 15, 2023 at 05:42AM
#Statistics and Machine Learning #infographics from @AbacusAI Learn about their complete #MLOps Platform at https://t.co/m6xtTyEAdK ————— #Automation #BigData #DataScience #AI #MachineLearning #DeepLearning #DataScientists #ML #NeuralNetworks #NLProc #EnterpriseAI #AIStrategy https://t.co/XrukTssur7
— Kirk Borne (@KirkDBorne) Mar 14, 2023
Text
Tweeted
RT @Sheraj99: #DataScience Tools #AI #SQL #Analytics #Python #Rstats #Reactjs #IIoT #ML #Linux #Serverless #flutter #javascript #TensorFlow #BigData #CloudComputing #Codenewbie #programming #Coding #100DaysOfCode #Jupyter #PyTorch #DL #R #DataScientists #DataAnalytics #NLProc #CyberSecurity https://t.co/bt4cr0mxXo
— Dennis Patel (@ITMob) Feb 2, 2022
Text
Haystack 1.0 – open-source NLP framework to build NLProc back-end applications
https://github.com/deepset-ai/haystack
Text
Join us in revolutionizing the Canadian guest posting landscape! Unleash your creativity, share your insights, and make your mark on the digital world.

#CanadianGuestPosting #ContentConnections #mapleleafwords #BeyondFast #tuesdayvibe #NLProc #lazyweb #canadianguestpostingservices #guestpostingsitescanada
Link
An interesting article about the social process of creating machine translation datasets in Khoekhoegowab. Excerpt:
On the surface, Wilhelmina Ndapewa Onyothi Nekoto and Elfriede Gowases seem like a mismatched pair. Nekoto is a 26-year-old data scientist. Gowases is a retired English teacher in her late 60s. Nekoto, who used to play rugby in Namibia’s national league, stands about a head taller than Gowases, who is short and slight. Like nearly half of Namibians, Nekoto speaks Oshiwambo, while Gowases is one of the country’s roughly 200,000 native speakers of Khoekhoegowab.
But the women grew close over a series of working visits starting last October. At Gowases’s home, they translated sentences from Khoekhoegowab to English. Each sentence pair became another entry in a budding database of translations, which Nekoto hopes will one day power AI tools that can automatically translate between Namibia’s languages, bolstering communication and commerce within the country.
“If we can design applications that are able to translate what we’re saying in real time,” Nekoto says, “then that’s one step closer toward economic [development].” That’s one of the goals of the Masakhane project, which organizes natural language processing researchers like Nekoto to work on low-resource African languages.
Compiling a dataset to train an AI model is often a dry, technical task. But Nekoto’s self-driven project, rooted in hours of close conversation with Gowases, is anything but. Each datapoint contains fragments of cultural knowledge preserved in the stories, songs, and recipes that Gowases has translated. This information is as crucial for the success of a machine translation algorithm as the grammar and syntax embedded in the training data.
Read the whole thing.
#linguistics #khoekhoegowab #namibia #khoekhoe #masakhane #natural language processing #nlp #nlproc #machine translation #corpora #parallel corpora #ai #data #low-resource languages #low resource languages #digitally disadvantaged languages #data science
Text
A tweet
Facebook AI NLP is hiring research interns for next year (summer 2021). If you are pursuing your PhD in any NLP area, and have published in top-tier ML and NLP conferences, come and work with our talented scientists and engineers. Send me a PM or comment here! #NLProc
— Yashar Mehdad (@YasharMehdad) October 22, 2020
Photo

"Results of the 2020 #NLProc Industry Survey led by Ben Lorica, Ph.D. @JohnSnowLabs @GradientFlowR is now available!⠀ ⠀ #NLP budgets are growing, discover the tools & cloud platforms people use, their challenges & best use cases. Learn how NLP is used now ↓ https://bit.ly/33AStj3 ⊕ [ Link on Bio ] ⊕"
Link
The subtle art of #chatbot development — Client Requirements versus Client Expectations: https://t.co/gCxm9t86JI ————— #DevOps #MLOps #AIOps #BigData #DataScience #MachineLearning #AI #ConversationalAI #NLProc #NLU #NLG #abdsc pic.twitter.com/fY6WB5ggkw
— Kirk Borne (@KirkDBorne) May 18, 2020
via: https://ift.tt/1GAs5mb
Photo
KirkDBorne https://twitter.com/KirkDBorne/status/1604162615571148801 https://t.co/RkVwrXf7EA December 18, 2022 at 02:12AM
[Download FREE 306-page PDF] #BigData, #DataMining, and #Analytics — Components of Strategic Decision-Making: https://t.co/RkVwrXf7EA ——————— #DataAnalytics #DataScience #BI #MachineLearning #AI #DataStrategy #AnalyticsStrategy #DataLeadership #StreamAnalytics #NLProc #TextMining https://t.co/Owh1x50PXi
— Kirk Borne (@KirkDBorne) Dec 17, 2022
Photo
RT @gdm3000: Amazing!! Deep Learning-based NLP techniques are going to revolutionize the way we write software. Here's Deep TabNine, a GPT-2 model trained on around 2 million files from GitHub. Details at https://t.co/8J2v8Ns7n4 #nlproc https://t.co/VWB4XsyD4T