Not logged in.

Quick Search - Contribution

Contribution Details

Type	Dissertation
Scope	Discipline-based scholarship
Title	Generalization and Constraints in Learning Machines
Organization Unit	Department of Informatics (Burkhard Stiller)
Authors	Daniel Christoph Meier
Supervisors	Kurt Bauknecht Rolf Pfeifer
Language	English
Institution	University of Zurich
Faculty	Faculty of Economics, Business Administration and Information Technology
Number of Pages	149
Date	1997
Abstract Text	Computers work in a very precise, tireless, and fast manner. Nevertheless mechanisms by which computers could learn themselves and perform very simple everyday tasks arc virtually unknown. Such mechanisms that allow machines to learn from individual experiences and relate them adequately to other similar situations involve the key issue of how to generalize from past experiences to deal with future situations. Hence, the goal of this thesis is to investigate the laws governing reliability in learning. This document describes a constructive approach to building learning machines for real-world applications, which try to solve many well-studied issues and problems originating from mathematics and real-world computing. The concept presented is that there arc learning machines, which have generally valid structural constraints incorporated in their architecture in order to achieve satisfactory performance for mastering challenging real-world tasks. Such structural constraints arc given and theoretically evaluated by the constructive and consistent generalization theory developed by Vapnik and Chervonenkis (Vapnik 1995b). Based on this theory, the factors governing generalization arc further utilized to classify and compare learning machines that exploit invariants. These include, among the various, discussed learning machines, learning from hints (Abu-Mostafa 1994), estimating the tangent of invariances by “Tangent Prop” (Simard and LeCun 1994), the group-invariance theorem (Minsky 1961), and the hyper-plane point invariance theorem (Nillson 1965). This comparison is conducted from the perspective of including generally valid geometric restrictions, i.c., geometrical structures learned by the learning machine arc adjusted such that they comply with the sensed data. Then two now neural network paradigms Me described: 1. Structurally Constrained Resource-Allocating Neural Networks (SCRAN), and 2. Topological Growing Structures (TGS). Both methods allocate incrementally new neurons and regulate their structure to comply with the sensed or measured data sets. Both rely on geometric constructs founded on sample vectors. The first kind of network belongs to the class of supervised architectures, whereas the second belongs to the class of unsupervised ones. The first algorithm is based on the assumption of continuity and monotony. The assumption applies to samples of one category belonging to the same partition. For such a partition, the monotony continuity constraint is defined such that all points between the center of gravity of a partition and its samples belong to the same partition. This geometric construct is generally concave. It is the base construct of the learning algorithm, which is adjusted by an empirical risk minimization scheme. Furthermore, the complexity of this algorithm is studied by the above-mentioned generalization theory. This and other benchmarking problems corroborate the superior generalization power of SCRANs. The second algorithm that implements learning by constraints is established by constructing a topological projection of the data onto its inherent minimized subspace by a self-organizing unsupervised learning scheme. The projection, motivated by Kohonen’s self-organizing feature maps, is constructed by adapting the number of neurons, their weights spawning up the subspace as well as their interconnection topology. Thus, the tessellation is constructed such that the neuronal weights arc only placed in parts of the state space that rue effectively visited by the problem studied. Another important property of the mapping is that it is distribution preserving. These methods are applied to tasks such as credit worthiness analysis, differentiation between mines and rocks, empirical effective VC-dimension estimation, and phoneme analysis. To conclude, our two network paradigms, namely SCRANs and TGSs, arc compared to the others mentioned above by applying the presented generalization scheme and the principle of topological learning by structural constraints. Then the various issues and problems arc reevaluated from the perspective of the learning machine, thereby adapting its structural constraints and architecture to the contemplated problem domain. This important principle is called Topological Induction, and it is the key principle for learning machines in the real world.
Zusammenfassung	Computer arbeiten sehr genau, unermüdlich und immer schneller. Trotzdem sind die Mechanismen, welche es Computern erlauben einfache Aufgaben in unserer Welt selbst zu erlernen und auszuführen weitgehend unbekannt. In diesem Dokument wird ein konstruktiver Ansatz zum Bau von Lernmaschinen für Echt-Welt-Probleme eingeführt, welcher auf Erkenntnissen der Mathematik und der Maschinenlerntheorie basiert. Die vertretene These besagt, dass die Ausdrücksmöglichkeit oder Kapazität einer Lernmaschine durch allgemein gültige Restriktionen beschränkt werden muss und kann. Diese Restriktionen werden in die Architektur der Lernmaschine direkt eingebaut und müssen in Einklang mit der Struktur des Problems gebracht werden, damit sich die Maschine schnell lernend und zuverlässig arbeitend entwickelt. Solche strukturelle Restriktionen werden definiert und analytisch bewertet durch die Generalisierungstheorie von Wapnik und Tscherwonenkis (Wapnik and Tscherwonenkis 1979) (Vapnik 1995b). Basierend auf den Faktoren, welche aus dieser Theorie hergeleitet wurden, um die Generalisierungsfähigkeit einer Lernmaschine zu bestimmen, wird eine Klassifikation entwickelt. Diese dient dem Vergleich der verschiedenen Lernalgorithmen, wobei die Kriterien bestimmt werden mit denen die Maschinen ihre Zuverlässigkeit optimieren. Die verglichenen Algorithmen sind unter anderem Lernen durch Hinweise (“hints”) (Abu-Mostafa 1994), Abschätzen der Tangenten von Mannigfaltigkeiten der Invarianzklassen durch “Tangent Prop” (Simard and LeCun 1994), das Gruppeninvarianztheorem (Minsky 1961), und das Hyperebenen-Punkte-Invarianztheorem (Nillson 1965). Dieser Vergleich wird immer unter dem Blickwinkel des Einbezugs von allgemein gültigen, geometrischen Restriktionen durchgeführt. Anschliessend werden zwei neue künstliche Neuronale Netzwerktypen vorgeschlagen: 1. struktur-restringierte ressourcen-allozierende Netze (SRRAN) und 2. topologisch wachsende Strukturen (TWS). Beide Architekturen allozieren inkrementell neue Neuronen und regulieren ihre Struktur so, dass sie mit den Daten der Problemstellung zur Übereinstimmung gelangen. Beide Netze konstruieren ihre geometrischen Strukturen auf der Basis von Datenvektoren. Die erste Architektur wird als Lernen mit Unterweisung bezeichnet, wogegen in der zweiten Lernen ohne Unterweisung stattfindet. Der SRRAN-Algorithmus basiert auf Annahmen der Kontinuität und der Monotonie, welche für jede gegebene Partition von Beispielen gültig sein muss. Für solche Partitionen besagt die Monotonie-Kontinuität-Restriktion, dass alle Punkte zwischen dem Schwerpunkt der Partition und ihrer Datenpunkte wieder zur Partition gehören. Diese geometrischen Strukturen stellen im allgemeinen konkave Regionen dar. Sie repräsentieren die Grundstruktur des künstlichen Neuronalen Netzes, so wie etwa Hyperebenen im Adaptiven Linearen Element (Adaline) (Widrow and Lau 1990). Der zweite Algorithmus implementiert Lernen durch Strukturen basierend auf der Konstruktion einer topologischen Projektion der Daten auf ihren inhärenten, minimierten Unterraum durch ein selbstorganisierendes Lernschema. Die Projektion, motiviert durch Kohonen’s selbst-organisierendc Merkmalskarten, wird aufgebaut durch Anpassung der Anzahl Neuronen, ihrer Gewichte wie auch ihrer Konnektivität. Der so aufgespannte Unterraum beschränkt sich auf effektiv besuchte Bereiche des Datenraums. Die so gelernte Karte besitzt die Eigenschaft der Verteilungserhaltung. Die zwei vorgestellten Neuronalen Netze werden unter anderem auf das Problem der Bonitätsbeurteilung, der Unterscheidung zwischen Minen und Felsen bei Sonarmessungen, der empirischen Bestimmung der VG-Dimension sowie der phonetischen Karten angewandt. Zum Abschluss vergleichen wir die Netze mit den eingangs erwähnten, zentralen Problemstellungen induktiven Lernens. Zum Schluss zeigen wir, dass die Arehitektur der Lernmaschine der Struktur des Problems angepasst werden muss. Wir nennen das Einbauen und Entwickeln von Restriktionen durch den Lernalgorithmus in die Architektur der Lernmaschine das Prinzip der topologischen Induktion. Dies ist das zentrale Prinizip für lernende Systeme in unserer Welt.
PDF File	Download
Export	BibTeX EP3 XML (ZORA)