Tumblr shoots itself in the foot with an ego-trip of panopticist surveillance powered by dumb AI. This blog is no longer updated.
What is NLU in humans?
Today I was thinking about understanding in terms of information processing, circuits, and activation patterns.
If we say that an average Google search corresponds to 1kJ of energy consumption, can we also think of a human understanding a sentence in terms of energy consumption? Lexicon, idioms, grammar structures, contextual references, analogies--some instances are harder to parse than others. A single sentence of 12 words may contain many dozens of these patterns that need to be understood. Can we assign a “level of difficulty” to each pattern? Pattern X would be 1.2kJ/mil, that is, the “average” brain consumes 1.2 kilojoules of energy over 1 million encounters of understanding this pattern. But we know that neural networks reshape themselves and become more efficient: understanding is a lot “easier” for an expert. We talk about “level-appropriate materials”, but these are rough categories. What if the levels become very granular? Through which methodologies can we make level selection adaptive almost down to the individual?
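To make the “kJ/mil” unit concrete, here is a minimal sketch of the arithmetic. Only the 1.2kJ/mil figure comes from the paragraph above; the pattern names, the other costs, and the `expertise` divisor are invented placeholders, not measurements.

```python
# Hypothetical per-pattern costs in kJ per million encounters ("kJ/mil").
# Only the 1.2 figure is from the post; everything else is made up.
COST_KJ_PER_MILLION = {
    "idiom": 1.2,
    "passive": 0.8,
    "common_noun": 0.1,
}


def sentence_cost_joules(patterns, expertise=1.0):
    """Sum the per-encounter cost (in joules) of the patterns in one sentence.

    expertise > 1 models a more efficient reader by dividing the cost;
    this is an assumed, deliberately crude model of "easier for an expert".
    """
    total_kj_per_million = sum(COST_KJ_PER_MILLION[p] for p in patterns)
    joules_per_encounter = total_kj_per_million * 1000 / 1_000_000
    return joules_per_encounter / expertise


patterns = ["idiom", "passive", "common_noun"]
novice = sentence_cost_joules(patterns)
expert = sentence_cost_joules(patterns, expertise=2.0)
print(novice, expert)
```

Under these assumptions, 1.2kJ/mil works out to 1.2 millijoules per encounter, and a granular “level” for a reader could be a per-pattern divisor rather than a single grade.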
What does it mean to understand the word ‘marmot’? For all individuals who are deemed to be capable of understanding this word, the activation patterns may be different, or they may be very similar when analysed down to principal components. I would guess the subjective, episodic experience of exposure to everything that contributes to the formulation of understanding of this word is different for each individual, so is perhaps how each instance of exposure is processed. But when it converts to semantic understanding, are the patterns materially different?
The problem with flashcards is that there isn’t much on the back of the card except a *shorthand*, so it’s not useful for ingesting the concept for the first time. This holds even for something as low in information density as a concrete noun for some common object--an apple is just an apple, right? Some demonstrations in real context are still useful: can the apple be eaten? Does the apple decompose? You need this running around of ontological exercises to solidly place the idea in a web of plausible combinatorics. Maybe in some natural languages an aeroplane does not fly, it can only be flown, I don’t know. An apple is rarely just an apple. The shorthand is only effective once this ingestion of ontology is largely complete; then upon each instance of repair (seeing the flashcard for active recall) the connection is available through “symbolic transfer”. The shorthand is the access symbol; what it brings to mind is already in your mind.
Meaning as an emergent phenomenon
At my level of ~A2 German, often I encounter a sentence where all the parts of the sentence are “known” words, but I just do not get the meaning of the sentence.
I can think of a parallel example in English, “He would have been let go...”
It’s easy to imagine someone new to English having trouble decoding the meaning. The phrase embeds so many layers of grammatical features:
Inflected “to have been + past participle”
Passive structure
Subjunctive “would have”
Strong forms: “let” being the past participle of “to let”
Plus idiom “to let go”
Strong verb “to let” followed by another verb
These features are not explained at the level of lexicon or morphology.
Non-trivial efforts
What does it mean to speak a language?
Let’s consider this question in the context that many language learning products claim to help users achieve such a goal. Actually, forget that context for a moment—very often people are just curious about how much effort is involved to reach the “near-native” level.
Basically, some lasting changes in memory and cognition must occur—my inkling is almost everyone on this topic grossly misunderestimates the profoundness of such changes and the amount of “work” entailed.
Obviously, there are established frameworks for assessing the level (e.g. A1-C2), though I often think about other anecdotal or qualitative criteria…
For example, I consider my English to be “near-native”, here’s a list of observations of what I’m able to do:
Get amused by standup comedy
Read text in intentionally obscure style and register, e.g. academic work, terms and conditions, fine print
Write structured paragraphs of logical arguments or articles that people consider well-written
Find reading entertaining (similar to watching TV)
I’m not able to do any of these in German, which I estimate to be at A2/B1 level… Note that I have already spent thousands of hours on German, and more in terms of casual exposure. The point is that attaining functional abilities is not trivial.
What I’m able to do in German:
Understand 50%+ of what is said or written in general contexts (non-specialised)
Get some sentences in conversational contexts completely
Know approximately 2,000+ words (lemmas)
Understand the phonological structure, know what is plausible and what is not—the same with grammatical features, e.g. endings
Remember lyrics from songs etc.
Now, I’m not able to do most of this in Finnish… What can I do in Finnish?
Order food or beer (no customisation)
Recognise some words
Have a very incomplete understanding of phonology and word structure
Use very basic expressions
Imitate and memorise longer sentences (e.g. “Are your parents happily married?”)
Mind you, I was able to order KFC in Russia with practically zero hours spent on Russian, apart from learning to read Cyrillic… so a lot of this is “cognitive problem-solving” rather than language acquisition. For Finnish, I did take a course and listened to some audio courses (hundreds of hours, by my estimate).
So that’s why I’m very sceptical of claims that you can “speak” anything with trivial effort. This doesn’t mean the process has to take years; you just need a very systematic regimen. What I really want to do is to “mechanise” this process. An analogous illustration would be: https://www.quora.com/How-many-Chinese-characters-can-a-foreign-language-student-expect-to-learn-in-one-year-How-about-two-years/answer/David-Rosson
Areas that I want to explore:
Vocabulary: Not just 4,000 words, but 20,000 words, 50,000 words
Usage: Phrases; idioms and expressions; nuanced semantic gradients
Morphology: Word formation; conjugations and declensions
Sentence structures and patterns; “combinatorics”
Lots of graded reading: news, discussions, wikis
A tutor for intensive dialogue training
Formula-based Lexicography
Vocabulary lists often deal with target words and their meanings as one-to-one translations in gloss form. For example:
vertreiben ⇒ expel
This sometimes results in out-of-context errors, e.g. "I'm wearing a *clock", because the item-to-item mapping really doesn't offer more information -- the learner then conflates this tenuous link between the two items with equivalence in semantic values (which are often context-dependent).
The same semantic role may be filled by a single verb, or a phrasal verb, or some long winded expression. For example: "to call someone out on it" may very well translate to a single transitive verb -- just look at how many parts are stuffed into it!
What we want to find are real, natural, authentic expressions that fulfill such a role, in either language -- rather than two items that happen to follow the same form or lexical category. Sometimes a verb may translate to an idiom.
Some people advocate using example sentences. They are good, especially pithy, gem-like ones. But they are not the minimal unit of demonstration. This minimal unit is a "formula".
{aus} [Land, Gebiet] vertreiben ⇒ to expel {from} [..., ...]
{sich} (DAT) [die Zeit] {mit} [etw] vertreiben ⇒ to pass [the time] {with} [sth]
Here on each side (for each language) there is a patterned template, that shows several pieces of information in addition to just the 'pivotal' item:
Collocations: 'to dress oneself' or 'to dress a wound' -- the same pivotal item may map onto different words in the other language.
Phrasal components: such as prepositions, which ones to use and where to place them.
Thematic roles and cases: it also forms a template for who's doing what to whom -- and in what inflectional form should each component appear.
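As a rough sketch of how such formulas might be stored, the block below keeps each one as a pair of templates keyed by its pivotal item. The `Formula` class, the `lookup` and `fixed_components` helpers, and the reading of the notation ({...} as a fixed phrasal component, [...] as a slot with typical fillers, (...) as a case marker) are my assumptions for illustration; the two entries are the examples given above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    pivot: str   # the 'pivotal' item the entry is filed under
    source: str  # patterned template in the source language
    target: str  # patterned template in the target language


FORMULAS = [
    Formula("vertreiben",
            "{aus} [Land, Gebiet] vertreiben",
            "to expel {from} [..., ...]"),
    Formula("vertreiben",
            "{sich} (DAT) [die Zeit] {mit} [etw] vertreiben",
            "to pass [the time] {with} [sth]"),
]


def lookup(pivot):
    """Return every formula filed under the given pivotal item."""
    return [f for f in FORMULAS if f.pivot == pivot]


def fixed_components(template):
    """Extract the {...} parts, assumed here to be fixed phrasal components."""
    return re.findall(r"\{([^}]*)\}", template)


for formula in lookup("vertreiben"):
    print(fixed_components(formula.source), "=>", formula.target)
```

The point of keying on the pivot is that one word can surface in several formulas, each mapping to a different expression on the other side.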
I imagine the best way to make such a dictionary (or a vocabulary list) is to start with high-quality, curated corpora, and from there, real n-gram collocation data.
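A minimal sketch of what "real n-gram collocation data" could look like in practice: count adjacent word pairs and score them with pointwise mutual information (PMI), one common association measure. The three-sentence corpus is invented and far too small to be meaningful; a real run would need a large curated corpus and proper tokenisation.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """Score adjacent word pairs by PMI, skipping pairs seen fewer than min_count times."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus, invented for illustration only.
corpus = [
    "sie vertreiben sich die zeit mit karten",
    "wir vertreiben uns die zeit mit musik",
    "die zeit vergeht schnell",
]
scores = bigram_pmi(corpus)
for pair, pmi in scores.items():
    print(pair, round(pmi, 2))
```

Pairs that survive the frequency cut-off and score well are candidates for the fixed parts of a formula; longer n-grams would follow the same idea.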
Verb reduction
"Ah, wrong again! It was a knife. But, 'stab' implies the blade was thrust into the victim, whereas this wound was produced by it being hurled into her chest."
Here 'to hurl' is probably a low-frequency word. To make the sentence easier to understand, the contrasted keywords may as well be replaced by words of higher frequency, for example, 'pushed' and 'thrown'.
Now, if we analysed the elements within the semantic value:
to hurl = to throw + with force
If we drop the adverbial specification, it becomes a hypernym, a "more general" verb, a higher level or "easier" word.
In this particular case, it also happens that 'hurled' is regular and 'thrown' is irregular -- we say that the higher verb is "stronger".
If we make the verbs stronger still, both 'thrust->push' and 'hurl->throw' can be replaced by 'put'. Now we arrive at a level of what I would call "very strong verbs", which I imagine are the "essential slots" in almost every language, each to be filled by a common verb, phrasal verb, or some expression.
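The reduction steps above can be sketched as a lookup that strips one specification at a time. The decomposition of 'hurl' is from the post; the `HYPERNYM` table, the `reduce_verb` helper, and the other dropped specifications are illustrative guesses rather than a worked-out lexicon.

```python
# verb -> (more general verb, specification dropped along the way)
# 'hurl' = 'throw' + 'with force' is from the post; the rest is guessed.
HYPERNYM = {
    "hurl": ("throw", "with force"),
    "thrust": ("push", "suddenly, with force"),
    "throw": ("put", "through the air"),
    "push": ("put", "by pressing"),
}


def reduce_verb(verb, steps=1):
    """Climb up to `steps` levels towards a more general verb.

    Returns the resulting verb and the list of specifications dropped.
    """
    dropped = []
    for _ in range(steps):
        if verb not in HYPERNYM:
            break  # already at a "very strong" verb
        verb, spec = HYPERNYM[verb]
        dropped.append(spec)
    return verb, dropped


print(reduce_verb("hurl"))       # one step up
print(reduce_verb("thrust", 2))  # two steps up
```

Keeping the dropped specifications makes the trade-off visible: each step up buys an easier word at the cost of some meaning.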
Just off random musing, I imagine they would fill these categories:
ATTRIBUTION: have (possess), exist, be at (location), be called (name)
CAUSATION: make (create, prepare), make (cause, transform, enforce), look like (appearance), undertake/undergo, try to
VOLITION: want, like (pleased), hope/wish, agree/accept/allow, be allowed, advocate, attack/reject, care/mind/be concerned about, command, promise
PROGRESS: go, come; come back (return), give back; move (self), move (object); start, stop/end, wait for, wait (inaction); continue, keep (object)
COGNITION: know, understand, consider, think/feel (opinion); keep in mind, recall, choose/decide/judge, intend/plan to; feel (emotion), believe
SENSES: look/behold, see (detect), listen to, sample (taste), touch/feel up
CUSTOMS: eat, drink, sleep, sit, walk, meet, read, write, buy, reside
TRANSACTION: fill/fit, obtain/collect, look for (search), take away, take with, give/pass to, proffer, lose/let go, put/leave, hide, tell (notify), demonstrate
[Some of these are reflective pairs]
Apart from these, some languages would have modals. An ideal system (for example, a constructed proposal) should have these categories along three axes, each varying by some degree or gradient:
be bound to -> “We must/have to do that.”
benefit from -> “I need some chocolate.”
be pressed to -> “I should go now.”
be emphatically -> “It IS!” “I WILL.”

have a strong desire for -> “I want some chocolate!”
have the willingness to -> “I could/would do that.”
expect enjoyment from -> “I could have some coffee.”
request politely -> “Would you …?” “Could you…?”

have the ability to -> “Can you swim?”
have the possibility to -> “It might rain today.”
present likelihood of -> “That must be it.” “It will rain.”
be allowed to -> “You may eat the chocolate now.”
Phylogenic Components of Language
Ritualised reference (lexicon)
Combinatorial encoding (phonology)
Central coherence (semantics: Gestalt meaning)
Procedural coordination (syntax)
More thoughts on central coherence
Central coherence = priming activation?
Consider these examples:
rise up, rise down*
fall down, fall up*
come down, come up
come here, come there*
The semantic implications inherent in a word impose constraints that spread onto the neighbouring environment. When we consider a verb in a second language, we can think about whether it implies motion, direction, change of state, agency and so on.
Chairs come in different shapes; birds fly but balls and time also fly. This is the native or folk ontology that differentiates one nuanced concept from another. There is a lot of detail that may not be immediately obvious.
If meaning were derived from mere association rather than being the aggregate result of activation, "going to bed" would have two components: "walking up to" and the destination object "the bed". Only when the semantic network of "what a bed is" (which can range from a mattress to a pile of hay, as Plato would say) becomes activated (and proceduralised through idiomatic use) can we arrive at the actual meaning of "going to sleep".
In weak central coherence, the symbolic, encoded representation is still available: the subject can memorise a sequence such as 'cat' or 'orange', in much the same way as memorising a phone number. There is the item-to-item association, but very limited spreading. The associative mapping of meaning (on an isolating basis) is present, but there isn't so much of the generalised "induction of meaning" that allows for fast and robust comprehension.
De/Re-generative Grammar
This is the notion that instances of actual language use emanate from Platonic representations, that is, abstract models of the language, and go through a process of systematic decay, distortion, or realisation when they are produced.
Running speech is a de/re-generated product of idealised speech. Colloquial grammars are de/re-generated from formal models of expression. Hence it's difficult for learners to faithfully reproduce authentic expressions through mere imitation, because the output itself is the product of a process of decay -- they haven't got the original, and they cannot let it decay the right way.
An analogy is apple juice: you get it from apples, and that's fairly simple. But concocting an artificial flavour that tries to imitate apple juice is hard.
I always thought the Germans say 'den' like 'din', now I know I'm not alone.
Harding, S., & Meyer, G. (2003). Changes in the perception of synthetic nasal consonants as a result of vowel formant manipulations. Speech Communication, 39(3), 173-189.
"The nasal prototypes /m/ and /n/ were used in all experiments, together with a range of preceding vowels differing only in the frequency and transitions of their second formant (F2). When no explicit transitions were present between the vowel and nasal, the perception of each nasal changed from /m/ to /n/ as the vowel F2 increased. Introducing explicit formant transitions removed this effect, and listeners heard the appropriate percepts for each nasal prototype. However, if the transition and the nasal prototype were inconsistent, the percept was determined by the transition alone. In each experiment, therefore, the target frequency of the vowel F2 transition into the nasal consonant determined the percept, taking precedence over the formant structure of the nasal prototype."
The genesis of language: a summary based on Arbib's talk
1.1 Extracting meaning: example of visual processing, from edge detection to thematic analysis -- feature extraction and contextual probabilities -- snapped onto a schema of recognition.
1.2 Central coherence: from features to themes, with flexibility and tolerance for variations and noise => robust reduction.
1.3 Abstract representations: ability to generalise => robust induction.

2. The repertoire of manual operations: "reach -> grip -> retrieve" => a mental store of available options: sequential actions towards proximal and ultimate goals. See: Alstermark et al. (1981).
3. Mirror neurons: registering operations without performing them, i.e. a mental representation of actions/movement/gestures in others.
4. Implications for fitness: imitation, transmission of skills; competitive advantage in anticipating others' moves; empathy or theory of mind.
5. Ritualisation: the evolution and emergence of bodily signals -- the ability to achieve a function (e.g. determine hierarchy) without performing the full sequence of available actions (e.g. fighting to death).

6. Now the picture is almost complete:
Linking actions to meaning -> performable actions serving a goal.
Registering actions (gestures) -> mirrored recognition.
From meaning to gesture -> ritualisation.
Robustness in recognition -> allows abstraction.
7. Now the gesture or symbol referring to a meaning or idea can be far removed from the original sequence.
For example, you pull out your smartphone and “dial” a number by touching the screen. The gestures with which you communicate with the computer are really many steps away from the etymology: there's no dial and you are not really dialling anything -- except you are performing an action signified by such a word.

And that is essentially what a lexis allows you to do: representing ideas using abstract symbols that are far removed from the original action sequence or quality or thing or even its associated pantomimes.
Actually the above only goes to the level of bonobos on lexigrams; that's only about one third of the story. The second step is to explain how speech is basically "audible gestures", and how a combinatorial encoding system takes over -- alongside the expansion of the lexicon (Acredolo & Goodwyn, 1985; Capirci et al., 1996; Butcher, 2000; Iverson & Goldin-Meadow, 2005), where it goes from one-word to one-word-plus (gesture) to two-word. See also: Anisfeld, M., Rosenberg, E. S., Hoberman, M. J., & Gasparini, D. (1998). Lexical acceleration coincides with the onset of combinatorial speech. First Language, 18(53), 165-184.
Then the third part is explaining the emergence of generative grammar... a rule-based system for planning and executing sequences. Perhaps see: Fitch, W. T. (2011). The evolution of syntax: an exaptationist perspective. Frontiers in evolutionary neuroscience, 3.

[Video]: in slow-motion, you can see the cat modifying the "tactical positioning" of its footholds as well as various "action components" with high precision in executing a well-coordinated leap sequence.
[Continued]
Saying "Ahhh" can be just another gesture, it's no more "removed" or abstract than clapping hands (which happens to be an audible gesture) -- only that you are "clapping" your vocal folds to make the sound.
Consider these "units of meaning" with no sonorant components and seemingly non-conformative to how English phonology would define a word:
"Psst!"
"Pff..."
"Tsk tsk..."
"Shhh!"
They are closer to "audible gestures" than to lexical items with a re-combinatory encoding scheme (that is, made up by combining and re-arranging phonemes).
This difference in between or threshold is what I alluded to as the "switch" from referential gestures to linguistic phonology. I have two speculations about this:
1. This "phonology module" emerged at some point in the course of evolution -- though the module may be psycholinguistically but not neurologically real, i.e. it is actually an interplay of various exapted (rather than de novo) sub-systems, as Arbib would say. It gave its bearers (our common ancestors) an advantage because of the vastly expanded lexical capacity of a combinatorial system.
2. This "module" matures at some point in the course of development, roughly corresponding to the sharp "kink" or inflection point you see in the vocabulary curve. The child would move from controlled gestures and gesture-like utterances to multiple gestures, an expanded one-word vocabulary and coordinated word-plus-gesture uses, and eventually switch to a phonologically based model.
Thoughts of the Day:
Why do diphthongs move along with monophthongs when sound shifts happen? It must be that, the "vowel targets" underlying both categories are actually doing the moving, that is, the sign posts defining the vowel space are shifting, rather than exemplar positions or definitions of individual sounds.
Transcription is a model. It should be useful but need not be true. Approaching what is true is the work of theories vetted by empirical investigation. The (potential) danger of phonetic realism is that it conflates interpretation with documentation, and applicability with validity.
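The first thought above, that shared "vowel targets" do the moving, can be sketched as a small data model: monophthongs and diphthongs both refer to named targets, so shifting one target moves every vowel built on it. The (F1, F2) values are rough placeholders rather than measurements, and the names `targets`, `vowels`, `realise` and `shift_target` are hypothetical.

```python
# Hypothetical (F1, F2) targets in Hz; placeholder values, not measured data.
targets = {"a": (750, 1300), "i": (300, 2300), "u": (320, 800)}

# Monophthongs point at one target; diphthongs at a start and an end target.
vowels = {
    "/a/": ("a",),
    "/i/": ("i",),
    "/ai/": ("a", "i"),
    "/au/": ("a", "u"),
}


def realise(vowel):
    """Look up the current target positions a vowel passes through."""
    return [targets[name] for name in vowels[vowel]]


def shift_target(name, d_f1, d_f2):
    """Move one sign post in the vowel space."""
    f1, f2 = targets[name]
    targets[name] = (f1 + d_f1, f2 + d_f2)


before = {v: realise(v) for v in vowels}
shift_target("a", -100, 0)  # raise the 'a' target (lower F1)
after = {v: realise(v) for v in vowels}
moved = sorted(v for v in vowels if before[v] != after[v])
print(moved)
```

In this model the diphthongs move with the monophthong for free, which is the pattern the thought tries to explain; an exemplar-per-vowel model would need each shift stated separately.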
Questions of the Day:
Assimilation has often been described in terms of how neighbouring segments affect each other -- and the outputs are thought of as segments with changed features -- what if the products are something else altogether?
To what extent are segments real? We often assume they have abstract mental representations, and each of them holds a bundle of features together -- they are seen as units on which phonological rules operate -- are these units an illusion?
Topics of potential interest:
Multi-word Units on the Frequency List
Effect of Mass Exposure to Citations
Typology: Permutation of Constituents
Measuring Comprehension with Eyetracking
Efficient Sample Test of Vocabulary Size
Memory, Delusions, and Hypnosis
Dance Dance Revolution for Prosody
Platonic Models of Generative Register
Visual Feedback for Vowel Targeting
Neat, Informative Graphs with ggplot2
...
So far it's only Tuesday...
The hiddenness of abstraction
From: a discussion about what we can extract from spectrograms.
What makes the analysis so confounding and "hidden" is the many layers of conversion from abstraction to realisation. What we can observe and collect is only at the surface, many steps removed from its "top-down" origins. And as Anderson said quite cryptically: "Physical events are notoriously neutral." Phonologists tend to think (and phoneticians and researchers working on speech synthesis have come to realise this too) of speech articulation as a continuous stream of "gestures". It's kind of like interpretive dance, or choreography, where you are trying to convey contrastive meaning using perceivable motion. It's not what was conventionally thought of as a string of idealised "targets" stitched together.
Now imagine what you can capture are images, then you have to solve vision, starting from edge detection and all that, and then the anatomy and stick figures, and then step-by-step you get to the system of movement, and then perhaps to meaning. I would speculate it's notoriously difficult for computers to "understand" interpretive dance. Phonological features and classes (the elements out of which we make speech, and the rules for doing so) are abstract. There used to be an impression that they must have some articulatory and then acoustic targets (as in, "this is the sound to produce"); but apparently no.
People who have lost front teeth or just had dental anaesthetics can still speak, and we can still understand them to an extent.
The phoneme /r/ varies so greatly in manner and place that it's very hard to explain the relations among its variants through acoustics.
Sign languages have concepts and processes analogous to phonemes, co-articulation, rhyming and so on; only they use hand shapes, position, movements, and facial gestures instead.
Phonology is realised through anatomy, but not bound by it. You could imagine an alien species with a completely different set of organs as articulators, or imagine sci-fi implants giving us the ability to use very novel gestures -- it just so happened that our ancestors went down the path of utilising the vocal tract. And if you look at cross-linguistic data, even just the vocal tract can have a very diverse range of expressive possibilities, some less obvious than others, from clicks to tones to labial protrusion to breathiness to ingressives and more. The spectrogram gives us spectral and temporal resolution; depending on the maths applied to it, we can see the individual beats of the vocal folds, we can see harmonics, and we can see resonance characteristics and the patterns of acoustic energy. But after all, physical events are just pointers to the real thing, like moving shadows cast by hand puppets.
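"Depending on the maths applied" largely comes down to the length of the analysis window. The sketch below uses the rule of thumb that frequency resolution is roughly the reciprocal of the window duration; the exact bandwidth depends on the window shape, so treat the numbers as approximate. The `window_tradeoff` function, the 100 Hz fundamental, and the 5 ms and 30 ms windows are illustrative assumptions.

```python
def window_tradeoff(window_ms, f0_hz):
    """Estimate what a spectrogram with this window length can resolve.

    Rule of thumb only: frequency resolution ~ 1 / window duration.
    """
    window_s = window_ms / 1000
    freq_resolution_hz = 1 / window_s
    glottal_period_ms = 1000 / f0_hz
    return {
        "freq_resolution_hz": freq_resolution_hz,
        # Harmonics sit f0 apart, so bands must be narrower than f0.
        "resolves_harmonics": freq_resolution_hz < f0_hz,
        # Single vocal-fold beats show up only if the window is shorter
        # than one glottal period.
        "resolves_glottal_pulses": window_ms < glottal_period_ms,
    }


wide = window_tradeoff(window_ms=5, f0_hz=100)     # short window, "wide-band"
narrow = window_tradeoff(window_ms=30, f0_hz=100)  # long window, "narrow-band"
print(wide)
print(narrow)
```

With these assumed values the short window shows individual vocal-fold beats but smears the harmonics, and the long window does the reverse, which is why the same recording can yield quite different pictures.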

And naturally when we look at these, we interpret them as both physical phenomena and "linguistically interesting cues" that we are searching for. When coding in Praat, for example, you would hear people talk about: voicing, formants, turbulence, closures, "energy dropping off", movements, glottalisation, periodicity, and so on -- multiple levels of representation and interpretation (acoustic, anatomical, phonetic) mixed all together. What we want to extract is not intrinsic to the sound data, therefore it cannot be analysed in isolation. It must be interpreted with reference to a whole range of constraints and predictions from phonetics to phonology, from anatomy to sociolinguistics to pragmatics and beyond. At this point, I have pretty much started to ramble, so I'll just leave it there with some blog posts on the topic:
When we look at acoustic evidence...
Meaning and Psychophysics
When a noun becomes a verb, the logical structure or construction of its semantic value is not always predictable:
to drug, to poison, to paint (vt.), to water (vt.)
to paint (vi.), to water (vi.)
to fire
to fish
to worm, to weed