Friday, June 13, 2025

Anyone who has had to deal with polysemy and the semantic alignment of legacy databases knows that what Quine called the radical indeterminacy of translation is more than an academic issue. Yet where human intracultural communication is concerned, and even to some degree intercultural communication, the real puzzle may be why the indeterminacy of translation isn't even more of a problem. Four considerations seem to me especially salient.
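To make the database-alignment face of the problem concrete, here is a minimal toy sketch; the schemas and the matcher are invented for illustration, not drawn from any real system. A matcher that aligns columns by name alone will happily pair `title` in an HR table with `title` in a library catalog, even though the two senses are unrelated:

```python
# Hypothetical legacy schemas: column name -> intended sense
# (both schemas are made up for this example).
hr_schema = {"name": "employee name", "title": "job title", "grade": "pay grade"}
catalog_schema = {"name": "author name", "title": "book title", "grade": "paper grade"}

def align_by_name(schema_a, schema_b):
    """Naive alignment: pair any columns that share a name, ignoring sense."""
    return {col: (schema_a[col], schema_b[col])
            for col in schema_a if col in schema_b}

# Every shared name 'aligns', yet none of the senses actually coincide.
for col, (sense_a, sense_b) in align_by_name(hr_schema, catalog_schema).items():
    print(f"{col!r}: {sense_a} <-> {sense_b}")
```

The matcher reports three confident alignments and zero genuine ones, which is the indeterminacy problem in miniature: nothing in the surface symbols distinguishes the senses.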

  1. Language use and cognition are different things. Cognition is vastly older than language: it is based on discriminating high-level features and doing things with those discriminations that promote environmental success. Arguably, many types of single-celled organisms already harbor versions of this. Language, in contrast, evolved in a hominin or proto-hominin social setting to transmit encodings of the upshots of discrimination, and of the stages of cognition, to other individuals in the community, supporting what we might fairly call 'chain of thought' evaluation and refinement at the level of the group. Perhaps only in a final stage did chain-of-thought evaluation come to be used re-entrantly within individual psychologies, to review and, with luck, improve on the individual's own thought processes.
  2. It's hardly surprising that discrimination, and purely discrimination-based, alinguistic cognition, should involve similar if not contextually identical feature discriminations across different humans. That similarity is implicit in the ontogeny of the individual brain, which in turn reflects the phylogeny of the species, which evolved by repeatedly solving the same environmentally imposed categories of problems. This baseline similarity is a necessary condition for language being possible at all.
  3. Point 2 is necessary for language, but not sufficient; it raises the question of why, modulo some close-but-no-cigar communication systems in other species, we don't see grammatical language anywhere else in nature. Here, I strongly suspect the key lies in the genetic bottlenecks that we now know occurred early in our history, one of which appears to have come close to extinguishing our species. In short, we are considerably inbred, with the result that our respective neural ontogenies, architectures, and functions are not merely similar but more nearly identical than obtains for any other vertebrate. This is the background against which the mutations that enabled language could actually support reproductive success: for any two individuals of our species, the odds were overwhelmingly in favor of there being enough points in common between their fundamental cognitive processes to support a probably-approximately-correct exchange of encodings of those processes.
  4. Like many other signalling systems in biology, human language is extremely flexible, with extreme ‘canalization’ of function: there are many different ways of conveying essentially the same thing. To some degree, imprecision can be an advantage, when contextual understanding on the part of the audience can be counted on to elide any mismatch that might otherwise occur; indeed, it seems plausible that our extreme efficiency at internalizing the subsymbolic tropisms of ‘language games’ in the Wittgensteinian sense evolved precisely because of this consideration. The cautionary note is that there may be a scalability limit: one that is reached when science, and especially data science, become part of culture. Communicating, and especially successfully applying, scientific verification and falsification criteria demands rigid and extensionally static categories. However, the more exacting categorical definitions become, the greater the likelihood of misunderstanding if they are not explicitly and carefully articulated - and such precision in explication, and the disputation it can give rise to, carries its own information-theoretic cost. In short, it seems reasonable to think that it is precisely when we transition from unstructured to structured data that ‘probably approximately correct’ ceases to be adequate for linguistic communication, and the broad semantic bandwidth that is the great strength of natural language becomes a weakness. On a related note, it may be no coincidence that the technicality of the reasoning required appears to be a primary factor imposing a significant limit on the performance of large language models.
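The contrast in point 4 can be made concrete with a toy sketch of 'probably approximately correct' label matching; the threshold and examples are invented here, and `difflib.SequenceMatcher` simply stands in for whatever similarity measure a real aligner might use. Surface-form matching is adequate for everyday lexical variation, but becomes hazardous exactly when categories are extensionally strict:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.7):
    """'Probably approximately correct' matching: accept two labels
    when their surface forms are close enough (threshold is arbitrary)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

# Fine for everyday variation, where the senses really do coincide:
print(similar("colour", "color"))   # spelling variants of one concept

# Hazardous for strict categories: a ton (907.18 kg) and a tonne
# (1000 kg) are distinct units, yet the matcher unifies them.
print(similar("ton", "tonne"))
```

Both calls return `True`: the second 'match' is precisely the kind of error that is harmless in conversation but fatal once the labels name extensionally static categories.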

Hopefully, these reflections are of some use in understanding the problems inherent in data federation and in the design of the artificial reasoners intended to assist with it.
