% English Version
\documentclass[12pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{hyperref}
\usepackage{geometry}
\geometry{a4paper, left=3cm, right=3cm, top=3cm, bottom=3cm}
\usepackage{setspace}
\onehalfspacing
\usepackage{parskip}
\usepackage[english]{babel}
\usepackage{csquotes}
\usepackage{microtype}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{array}
\usepackage{listings}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{float}
\usepackage{url}
\usepackage{natbib}
\usepackage{titling}

\lstset{
  language=Python,
  basicstyle=\ttfamily\small,
  keywordstyle=\color{blue},
  commentstyle=\color{green!40!black},
  stringstyle=\color{red},
  showstringspaces=false,
  numbers=left,
  numberstyle=\tiny,
  numbersep=5pt,
  breaklines=true,
  frame=single,
  backgroundcolor=\color{gray!5},
  tabsize=2,
  captionpos=b
}

\title{\Huge\textbf{From Scheme to DeepProbLog} \\[2mm]
       \LARGE ARS as a Methodological Blueprint \\[2mm]
       \LARGE for Modern Neuro-Symbolic Programming}

\author{
  \large
  \begin{tabular}{c}
    Paul Koop
  \end{tabular}
}

\date{\large 1994--2026}

\begin{document}

\maketitle

\begin{abstract}
This paper traces the methodological continuity from early implementations of 
the Algorithmic Recursive Sequence Analysis (ARS) in Scheme, Pascal, and Lisp 
(1992--1994) to contemporary neuro-symbolic programming frameworks such as 
DeepProbLog (2018). I argue that the ARS already embodied the core principles 
of neuro-symbolic integration (pattern recognition as System 1, rule-based 
reasoning as System 2, probabilistic uncertainty quantification, and explainability 
by design) decades before the term \enquote{neuro-symbolic AI} gained wide currency. The paper 
first reconstructs the ARS's proto-neuro-symbolic architecture, then introduces 
DeepProbLog as a modern framework that implements similar principles with 
greater scalability, and finally demonstrates a DeepProbLog implementation of 
the classic ARS sales conversation corpus. The synthesis shows that ARS provides 
a methodological blueprint that DeepProbLog can instantiate technically. I 
conclude with a research agenda for integrating ARS's methodological rigor with 
DeepProbLog's computational power.
\end{abstract}

\newpage
\tableofcontents
\newpage

\section{Introduction: From Lisp to DeepProbLog}

The Algorithmic Recursive Sequence Analysis (ARS), as documented in the original 
code files from 1992--1994 and in a later Jupyter notebook 
\citep{koop2023notebook}, represents one of the earliest systematic attempts to 
integrate pattern recognition with rule-based reasoning in the analysis of 
sequential social interactions. It has three core implementations:

\begin{itemize}
    \item \textbf{Induktor in Scheme}: induces a probabilistic context-free 
    grammar (PCFG) from terminal symbol strings through transition counting,
    \item \textbf{Parser in Pascal}: validates the well-formedness of sequences 
    using a chart parser,
    \item \textbf{Transduktor in Lisp}: generates new sequences from the 
    induced grammar.
\end{itemize}

Collectively, these programs embody what is today called \textbf{neuro-symbolic AI}. 
They combine data-driven pattern discovery (the inductor) with symbolic rule 
application (the parser and transducer), and they quantify uncertainty through 
probabilistic weights.

In the intervening three decades, the field has developed more sophisticated 
frameworks for neuro-symbolic integration. One of the most prominent is 
\textbf{DeepProbLog} \citep{manhaeve2018deepproblog}, which extends the 
probabilistic logic programming language ProbLog with neural predicates learned 
by deep networks. DeepProbLog allows users to define symbolic rules with 
probabilities, while neural networks learn the probabilities of ground facts 
from data.

This paper makes three contributions:

\begin{enumerate}
    \item It reconstructs the ARS architecture as a \textbf{proto-neuro-symbolic} 
    system and maps its components to contemporary neuro-symbolic concepts.
    
    \item It introduces DeepProbLog as a modern framework that implements the 
    same principles with greater scalability and neural integration.
    
    \item It presents a \textbf{DeepProbLog implementation} of the classic ARS 
    sales conversation corpus, demonstrating how the ARS methodology can be 
    ported to a modern neuro-symbolic framework.
\end{enumerate}

The overarching thesis is that \textbf{ARS provides a methodological blueprint 
that DeepProbLog can instantiate technically}. The two approaches are not 
competitors but complements: ARS contributes methodological rigor and 
interpretive grounding; DeepProbLog contributes scalability and neural learning.

\section{The ARS Architecture as Proto-Neuro-Symbolic System}

\subsection{Three Components, Three Cognitive Functions}

The three ARS implementations, together with the later Python multiagent 
system, can be mapped to the System 1 / System 2 distinction popularized by 
Kahneman \citep{kahneman2011thinking} and adopted by neuro-symbolic AI 
research \citep{marcus2020next}:

\begin{table}[H]
\centering
\caption{ARS Components as Cognitive Systems}
\label{tab:cognitive}
\begin{tabular}{@{} p{3cm} p{4cm} p{6cm} @{}}
\toprule
\textbf{Component} & \textbf{Cognitive Function} & \textbf{Neuro-Symbolic Mapping} \\
\midrule
Induktor (Scheme) & Pattern recognition, transition counting & System 1 (learning from data) \\
Parser (Pascal) & Structural validation, well-formedness checking & System 2 (rule application) \\
Transduktor (Lisp) & Generative rule application & System 2 (symbolic generation) \\
Multiagent (Python) & Role assignment, interaction & Hybrid (decision tree + grammar) \\
\bottomrule
\end{tabular}
\end{table}

\subsection{The Probabilistic Grammar as a Neuro-Symbolic Interface}

The induced probabilistic context-free grammar (PCFG) serves as the central 
neuro-symbolic interface:

\begin{verbatim}
(KBG -> . VBG)
(VBG -> . KBBd)
(KBBd -> . VBBd)
(VBBd -> . KBA)
(KBA -> . VBA)
(VBA -> . KBBd) (VBA -> . KAE)
(KAE -> . VAE)
(VAE -> . KAE) (VAE -> . KAA)
(KAA -> . VAA)
(VAA -> . KAV)
(KAV -> . VAV)
\end{verbatim}

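Each production rule carries a probability. Symbols with a single successor 
have rules of probability 1.0; the two branching symbols, VBA and VAE, divide 
their probability mass between two alternatives according to the empirical 
frequencies in the full implementation. The grammar is simultaneously: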

\begin{itemize}
    \item \textbf{Symbolic}: Rules are explicit, inspectable, and falsifiable.
    \item \textbf{Probabilistic}: Rule applications have probabilities based 
    on empirical frequencies.
    \item \textbf{Generative}: New sequences can be generated by applying rules.
    \item \textbf{Verifiable}: The parser can check whether a sequence is 
    well-formed.
\end{itemize}

These four properties are exactly what contemporary neuro-symbolic frameworks 
aim to achieve.
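
These properties can be made concrete in a few lines. The following is a minimal Python sketch, not the original Scheme, Pascal, or Lisp code: the dictionary \texttt{GRAMMAR} and the two helper functions are names introduced here for illustration, and the weights of 0.5 for the two branching symbols are the relative frequencies in the single transcript reproduced later in this paper.

\begin{lstlisting}[caption=Illustrative sketch: the transition grammar as a weighted dictionary]
# Illustrative sketch; GRAMMAR and the function names are hypothetical
# and are not taken from the ARS source code.
GRAMMAR = {
    "KBG": {"VBG": 1.0},
    "VBG": {"KBBd": 1.0},
    "KBBd": {"VBBd": 1.0},
    "VBBd": {"KBA": 1.0},
    "KBA": {"VBA": 1.0},
    "VBA": {"KBBd": 0.5, "KAE": 0.5},
    "KAE": {"VAE": 1.0},
    "VAE": {"KAE": 0.5, "KAA": 0.5},
    "KAA": {"VAA": 1.0},
    "VAA": {"KAV": 1.0},
    "KAV": {"VAV": 1.0},
}
START = "KBG"


def sequence_probability(symbols):
    """Product of the transition weights; 0.0 if any step is not licensed."""
    if not symbols or symbols[0] != START:
        return 0.0
    probability = 1.0
    for current, following in zip(symbols, symbols[1:]):
        probability *= GRAMMAR.get(current, {}).get(following, 0.0)
    return probability


def is_well_formed(symbols):
    """A sequence is accepted if and only if its probability is non-zero."""
    return sequence_probability(symbols) > 0.0


corpus = ("KBG VBG KBBd VBBd KBA VBA KBBd VBBd KBA VBA "
          "KAE VAE KAE VAE KAA VAA KAV VAV").split()
print(is_well_formed(corpus))        # True
print(sequence_probability(corpus))  # 0.0625
\end{lstlisting}

The rules are inspectable as plain data, the weights are probabilities, and the same structure serves both for checking a sequence and for scoring it. The value 0.0625 arises because the transcript passes through a branching symbol four times, each time with weight 0.5.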

\subsection{The Multiagent System as Neuro-Symbolic Prototype}

The Python multiagent system (cells 29--33 in the notebook) is particularly 
instructive:

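\begin{lstlisting}[caption=Multiagent Role Assignment]
# Example endowments (illustrative values, not taken from the notebook)
agent_k_ware = 5
agent_v_ware = 2

# Decide the role distribution based on goods and means of payment
# (role names translated from the notebook's German strings)
if agent_k_ware > agent_v_ware:
    agent_k_role = 'buyer'
    agent_v_role = 'seller'
else:
    agent_k_role = 'seller'
    agent_v_role = 'buyer'
\end{lstlisting}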

This decision tree is a \textbf{symbolic rule} (System 2) that determines agent 
roles based on a simple pattern (System 1: comparing two numbers). The subsequent 
interaction follows the probabilistic grammar. This is a hybrid architecture: 
the role assignment is deterministic and rule-based; the dialogue generation 
is probabilistic and grammar-based.

The ARS thus anticipates the \textbf{Neuro | Symbolic} pattern in Kautz's 
taxonomy \citep{kautz2020third}: neural (or heuristic) perception determines 
symbolic roles; symbolic reasoning (the grammar) governs subsequent behavior.
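
The two stages of this hybrid architecture can be sketched together. The following Python fragment is an illustration under stated assumptions, not the notebook's code: the function names are hypothetical, the role names are given in English, and it is assumed that symbols beginning with K are uttered by the buyer and symbols beginning with V by the seller.

\begin{lstlisting}[caption=Illustrative sketch: deterministic role assignment followed by grammar-based dialogue]
import random

# Illustrative sketch of the hybrid architecture; all names are hypothetical.
GRAMMAR = {
    "KBG": {"VBG": 1.0}, "VBG": {"KBBd": 1.0}, "KBBd": {"VBBd": 1.0},
    "VBBd": {"KBA": 1.0}, "KBA": {"VBA": 1.0},
    "VBA": {"KBBd": 0.5, "KAE": 0.5}, "KAE": {"VAE": 1.0},
    "VAE": {"KAE": 0.5, "KAA": 0.5}, "KAA": {"VAA": 1.0},
    "VAA": {"KAV": 1.0}, "KAV": {"VAV": 1.0},
}
# Assumption: K-symbols are uttered by the buyer, V-symbols by the seller.
ROLE_OF_PREFIX = {"K": "buyer", "V": "seller"}


def assign_roles(agent_k_goods, agent_v_goods):
    """Deterministic rule: the same comparison as in the notebook excerpt."""
    if agent_k_goods > agent_v_goods:
        return {"agent_k": "buyer", "agent_v": "seller"}
    return {"agent_k": "seller", "agent_v": "buyer"}


def generate_dialogue(roles, rng):
    """Probabilistic part: walk the grammar from KBG until no rule applies."""
    agent_with_role = {role: agent for agent, role in roles.items()}
    dialogue, symbol = [], "KBG"
    while symbol is not None:
        speaker = agent_with_role[ROLE_OF_PREFIX[symbol[0]]]
        dialogue.append((speaker, symbol))
        successors = GRAMMAR.get(symbol)
        if successors:
            symbol = rng.choices(list(successors),
                                 weights=list(successors.values()))[0]
        else:
            symbol = None
    return dialogue


roles = assign_roles(agent_k_goods=5, agent_v_goods=2)
dialogue = generate_dialogue(roles, random.Random(0))
print(roles)
print(dialogue[0], dialogue[-1])
\end{lstlisting}

The role assignment never consults a random number, whereas the length and content of the generated dialogue depend on the draws at the two branching symbols. Every generated dialogue nevertheless begins with KBG and ends with VAV, because the grammar admits no other entry or exit.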

\section{DeepProbLog: A Modern Neuro-Symbolic Framework}

\subsection{What DeepProbLog Is}

DeepProbLog \citep{manhaeve2018deepproblog} extends the probabilistic logic 
programming language ProbLog with \textbf{neural predicates}. A neural predicate 
is a predicate whose truth probability is computed by a neural network. For 
example, a neural predicate \texttt{digit(image, d)} might represent the probability 
that an image shows digit \texttt{d}.

DeepProbLog programs consist of:

\begin{itemize}
    \item \textbf{Facts}: Ground atoms with probabilities (e.g., \texttt{0.5::edge(a,b)}).
    \item \textbf{Rules}: Logical implications (e.g., \texttt{path(X,Y) :- edge(X,Y)}).
    \item \textbf{Neural predicates}: Predicates defined by neural networks.
    \item \textbf{Queries}: Questions to be answered probabilistically.
\end{itemize}

Inference in DeepProbLog computes the probability of a query given the program 
and the neural network outputs. Learning updates the neural network weights to 
maximize the likelihood of observed data.
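
How a query probability arises can be illustrated by brute force. The following Python sketch is not how ProbLog or DeepProbLog compute probabilities internally (ProbLog relies on knowledge compilation rather than enumeration); it only makes the semantics concrete. Every subset of the probabilistic facts is a possible world, and the probability of a query is the total weight of the worlds in which the query is provable. The three edge facts, their probabilities, and the recursive \texttt{path} clause are assumptions chosen for illustration.

\begin{lstlisting}[caption=Illustrative sketch: query probability by enumeration of possible worlds]
from itertools import product

# Illustrative sketch: exact inference by enumerating possible worlds.
# The three probabilistic facts and their values are assumptions.
FACTS = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}


def reachable(edges, source, target):
    """path(X,Y) :- edge(X,Y).  path(X,Y) :- edge(X,Z), path(Z,Y)."""
    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        for start, end in edges:
            if start == node and end not in seen:
                if end == target:
                    return True
                seen.add(end)
                frontier.append(end)
    return False


def query_probability(source, target):
    """Sum the weights of all worlds in which path(source, target) holds."""
    total = 0.0
    fact_list = list(FACTS.items())
    for choices in product([True, False], repeat=len(fact_list)):
        weight = 1.0
        edges = []
        for (edge, p), chosen in zip(fact_list, choices):
            weight *= p if chosen else 1.0 - p
            if chosen:
                edges.append(edge)
        if reachable(edges, source, target):
            total += weight
    return total


print(query_probability("a", "c"))  # 0.625
\end{lstlisting}

Of the eight equally weighted worlds, five contain a path from \texttt{a} to \texttt{c}: the four that include the direct edge, and the one that lacks it but includes both indirect edges. In DeepProbLog, some of the fact probabilities would be outputs of a neural network instead of constants.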

\subsection{Mapping ARS Concepts to DeepProbLog}

\begin{table}[H]
\centering
\caption{Mapping ARS to DeepProbLog}
\label{tab:mapping}
\begin{tabular}{@{} p{4cm} p{4cm} p{5cm} @{}}
\toprule
\textbf{ARS Concept} & \textbf{DeepProbLog Concept} & \textbf{Explanation} \\
\midrule
Terminal symbols & Ground facts & \texttt{kbg}, \texttt{vbg}, \texttt{kbbd}, etc. (lowercase atoms) \\
Production rules & Logical rules & \texttt{next(X,Y) :- transition(X,Y)} \\
Transition probabilities & Fact probabilities & \texttt{0.8::next(kbg, vbg)} \\
Induktor (transition counting) & Neural predicate learning & Learned from data \\
Parser (well-formedness) & Proof search & Query \texttt{next(kbg, vbg)} \\
Transduktor (generation) & Sampling from distribution & \texttt{sample(next(Start, X))} \\
Multiagent roles & Probabilistic decision rules & Role assignment with probability \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Why DeepProbLog Is a Natural Successor to ARS}

DeepProbLog preserves the key methodological virtues of ARS:

\begin{enumerate}
    \item \textbf{Explainability}: Rules are explicit and inspectable.
    \item \textbf{Probabilistic uncertainty}: Probabilities quantify uncertainty.
    \item \textbf{Generative capacity}: New sequences can be generated.
    \item \textbf{Verifiability}: Queries can be checked.
\end{enumerate}

But it adds capabilities that ARS lacks:

\begin{enumerate}
    \item \textbf{Neural integration}: Neural networks can learn probabilities 
    from raw data (images, text, audio), not just from pre-coded categories.
    
    \item \textbf{Scalability}: DeepProbLog can handle large datasets through 
    stochastic gradient descent.
    
    \item \textbf{Continuous learning}: The neural network can be updated 
    incrementally as new data arrives.
    
    \item \textbf{Deep feature learning}: Neural networks can automatically 
    discover relevant features, reducing the need for manual category formation.
\end{enumerate}

\section{DeepProbLog Implementation of the ARS Corpus}

\subsection{The Terminal Symbols as Probabilistic Facts}

The first step is to encode the ARS terminal symbols as lowercase atoms and 
to let a neural predicate supply the transition probabilities, which are 
learned from the corpus. The listing is a sketch in DeepProbLog notation; the 
network name \texttt{transition\_net} is a placeholder:

\begin{lstlisting}[language=Prolog,caption=DeepProbLog Encoding of ARS Grammar]
% DeepProbLog sketch of the ARS grammar for sales conversations
% Based on the Aachen market transcript (1994)

% Terminal symbols are lowercase atoms (uppercase names would be variables).
% Neural predicate: transition_net maps the previous symbol to a
% distribution over the twelve possible next symbols.
nn(transition_net, [Prev], Next,
   [kbg, vbg, kbbd, vbbd, kba, vba, kae, vae, kaa, vaa, kav, vav])
   :: transition(Prev, Next).

% Rules: well-formed sequences follow transitions
% Start symbol is kbg (customer greeting)
well_formed([kbg|Rest]) :- next([kbg|Rest]).

% Base case: a single remaining symbol needs no further transition
next([_]).

% Recursive rule for sequences of length > 1
next([A,B|Rest]) :-
    transition(A, B),
    next([B|Rest]).

% Query: probability that a given sequence is well-formed
query(well_formed([kbg, vbg, kbbd, vbbd, kba])).

% Generation (the Transduktor's task) corresponds to sampling sequences
% from this distribution; it is not expressed as a clause here.
\end{lstlisting}

\subsection{Learning Transition Probabilities from Data}

The neural network for transition probabilities can be trained on the terminal 
symbol sequences extracted from the ARS corpus. The corpus is:

\begin{verbatim}
KBG VBG KBBd VBBd KBA VBA KBBd VBBd KBA VBA KAE VAE KAE VAE KAA VAA KAV VAV
\end{verbatim}

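Schematically, the observed transitions can be written as training examples. 
In DeepProbLog itself, such examples are supplied as labeled queries through 
its Python interface rather than as clauses in the program:

\begin{lstlisting}[language=Prolog,caption=Training Data Encoding]
% Training examples: observed transitions (schematic notation)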
train(transition(kbg, vbg), true).
train(transition(vbg, kbbd), true).
train(transition(kbbd, vbbd), true).
train(transition(vbbd, kba), true).
train(transition(kba, vba), true).
train(transition(vba, kbbd), true).
train(transition(vba, kae), true).
train(transition(kae, vae), true).
train(transition(vae, kae), true).
train(transition(vae, kaa), true).
train(transition(kaa, vaa), true).
train(transition(vaa, kav), true).
train(transition(kav, vav), true).

% Negative examples (optional)
train(transition(kbg, kbbd), false).
train(transition(vbg, vbg), false).
\end{lstlisting}

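The neural network learns to assign high probabilities to the observed 
transitions and low probabilities to unobserved ones. In the transcript above, 
only VBA and VAE have two observed successors, each with relative frequency 
0.5; every other symbol that is followed by anything has a single successor 
with relative frequency 1.0. Provided that transitions are presented as often 
as they occur in the corpus, the trained network approximates these empirical 
transition frequencies.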
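
The empirical frequencies that serve as the target of this approximation can be computed directly, which is what transition counting amounts to. The following is a minimal Python sketch of that step, not the original Scheme code; the function and variable names are assumptions.

\begin{lstlisting}[caption=Illustrative sketch: transition counting on the corpus]
from collections import Counter, defaultdict

# Illustrative sketch of transition counting; names are hypothetical.
CORPUS = ("KBG VBG KBBd VBBd KBA VBA KBBd VBBd KBA VBA "
          "KAE VAE KAE VAE KAA VAA KAV VAV").split()


def induce_transitions(symbols):
    """Relative frequency of each successor, per preceding symbol."""
    counts = defaultdict(Counter)
    for current, following in zip(symbols, symbols[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(successors.values())
                  for nxt, n in successors.items()}
        for current, successors in counts.items()
    }


probabilities = induce_transitions(CORPUS)
print(probabilities["VBA"])   # {'KBBd': 0.5, 'KAE': 0.5}
print(probabilities["KBBd"])  # {'VBBd': 1.0}
\end{lstlisting}

A trained transition network can be compared against this table as a sanity check: large deviations on a corpus this small would indicate a training problem rather than a discovery.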

\subsection{The Multiagent System in DeepProbLog}

The multiagent system can be implemented as a set of probabilistic rules 
with role assignment:

\begin{lstlisting}[language=Prolog,caption=Multiagent System in DeepProbLog]
% Agent roles based on goods and money endowments
% These could be learned by a neural network from data;
% role_net is a placeholder name for such a network.
nn(role_net, [G, M], R, [buyer, seller]) :: role(G, M, R).

% Role assignment rule
agent_role(A, buyer) :- goods(A, G), money(A, M), role(G, M, buyer).
agent_role(A, seller) :- goods(A, G), money(A, M), role(G, M, seller).

% Interaction rules based on roles: the buyer opens with kbg,
% the seller answers a preceding kbg with vbg
utterance(A, kbg) :- agent_role(A, buyer), start_turn.
utterance(A, vbg) :- agent_role(A, seller), previous_utterance(_, kbg).

% Grammar-based dialogue continuation for the agent holding role Role
next_utterance(Role, Sym) :-
    previous_utterance(_, PrevSym),
    transition(PrevSym, Sym),
    agent_role(_, Role).

% Query: probability distribution of next utterance given current state
query(next_utterance(seller, Sym)).
\end{lstlisting}

\subsection{Explainability in DeepProbLog}

A key advantage of DeepProbLog over pure neural networks is \textbf{explainability 
by design}. Because every query is answered by logical proof over the program, 
the derivation of a result can be laid out as a proof tree. The following 
schematic trace illustrates this; its probabilities are placeholder values, 
not the output of a trained model:

\begin{lstlisting}[language={},caption=Explainability Output]
Query: well_formed([kbg, vbg, kbbd, vbbd, kba])

Proof:
1. well_formed([kbg, vbg, kbbd, vbbd, kba])
   :- next([kbg, vbg, kbbd, vbbd, kba])
2. next([kbg, vbg, kbbd, vbbd, kba])
   :- transition(kbg, vbg), next([vbg, kbbd, vbbd, kba])
3. transition(kbg, vbg) :- neural_network(kbg, vbg) [p = 0.67]
4. next([vbg, kbbd, vbbd, kba])
   :- transition(vbg, kbbd), next([kbbd, vbbd, kba])
5. transition(vbg, kbbd) :- neural_network(vbg, kbbd) [p = 1.00]
   ... (continued)

Probability: 0.67 * 1.00 * 0.67 * ... = 0.42
\end{lstlisting}

This proof tree is directly interpretable and maps precisely to the ARS 
grammar rules. The only difference is that the probabilities are learned 
by a neural network rather than counted manually.
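
The arithmetic behind such a trace can be written out explicitly. As a sketch, and assuming that no transition occurs twice in the sequence (in ProbLog's semantics a ground probabilistic fact is a single random choice, so a repeated fact contributes only once), the probability of the one proof of a sequence $s_1, \dots, s_n$ is the product of its transition probabilities. These are written here as $p(s_i, s_{i+1})$, a notation introduced only for this illustration:
\begin{equation*}
P\bigl(\mathrm{well\_formed}([s_1, \dots, s_n])\bigr)
  = \prod_{i=1}^{n-1} p(s_i, s_{i+1}),
  \qquad s_1 = \mathrm{kbg}.
\end{equation*}
If the relative frequencies of the single transcript were used as probabilities, the five-symbol query from the trace would give
\begin{equation*}
p(\mathrm{kbg}, \mathrm{vbg}) \cdot p(\mathrm{vbg}, \mathrm{kbbd}) \cdot
p(\mathrm{kbbd}, \mathrm{vbbd}) \cdot p(\mathrm{vbbd}, \mathrm{kba})
  = 1 \cdot 1 \cdot 1 \cdot 1 = 1,
\end{equation*}
because none of these four symbols has more than one observed successor. Values below 1 on these steps would therefore reflect what a network has learned from additional data or from imperfect training, not the counts of this transcript.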

\subsection{Comparison with the Original ARS Implementation}

\begin{table}[H]
\centering
\caption{ARS vs. DeepProbLog Implementation}
\label{tab:comparison}
\begin{tabular}{@{} p{4cm} p{4cm} p{4cm} @{}}
\toprule
\textbf{Criterion} & \textbf{ARS (Scheme/Lisp)} & \textbf{DeepProbLog} \\
\midrule
Probability learning & Manual counting & Neural network learning \\
Rule representation & Association lists & Logical predicates \\
Parsing algorithm & Chart parser (hand-coded) & Proof search (built-in) \\
Generation & Custom transducer & Sampling from distribution \\
Explainability & Traceable via code & Proof trees \\
Scalability & Low (n=8) & High (n > 1000) \\
Neural integration & None & Full (neural predicates) \\
\bottomrule
\end{tabular}
\end{table}

The DeepProbLog implementation preserves the methodological virtues of ARS 
while adding scalability and neural learning. It is not a replacement but 
a \textbf{technical instantiation} of the same methodological principles.

\section{Toward a Synthesis: ARS as Blueprint, DeepProbLog as Engine}

\subsection{What ARS Contributes to DeepProbLog}

The ARS methodology offers three lessons for DeepProbLog practitioners:

\begin{enumerate}
    \item \textbf{Interpretive grounding}: The meaning of symbols must be 
    documented. A DeepProbLog program with uninterpreted symbols is not 
    explanatory. ARS shows how to ground symbols in qualitative interpretation.
    
    \item \textbf{Separation of structure and statistics}: ARS maintains a 
    strict separation between structural rules (deterministic, logical) and 
    statistical regularities (probabilistic, empirical). DeepProbLog's 
    mixture of logical rules and neural probabilities risks conflating these 
    levels. ARS suggests keeping them separate in the program structure.
    
    \item \textbf{Falsifiability as validation}: ARS insists that grammars 
    must be falsifiable by counterexamples. DeepProbLog's validation typically 
    relies on likelihood maximization. ARS suggests supplementing this with 
    qualitative falsification tests.
\end{enumerate}
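
The third lesson lends itself to a simple operational form. The following Python sketch shows one way such a qualitative falsification test might look: a newly observed sequence falsifies the grammar if it contains a step the grammar does not license. The set \texttt{LICENSED}, the function name, and the observed sequence are assumptions made for illustration.

\begin{lstlisting}[caption=Illustrative sketch: searching a new observation for counterexamples]
# Illustrative sketch of a qualitative falsification test;
# all names and the observed sequence are hypothetical.
LICENSED = {
    ("KBG", "VBG"), ("VBG", "KBBd"), ("KBBd", "VBBd"), ("VBBd", "KBA"),
    ("KBA", "VBA"), ("VBA", "KBBd"), ("VBA", "KAE"), ("KAE", "VAE"),
    ("VAE", "KAE"), ("VAE", "KAA"), ("KAA", "VAA"), ("VAA", "KAV"),
    ("KAV", "VAV"),
}


def counterexamples(sequence, licensed=LICENSED, start="KBG"):
    """Return (position, previous, current) for every violated step."""
    violations = []
    if not sequence or sequence[0] != start:
        violations.append((0, None, sequence[0] if sequence else None))
    for index, pair in enumerate(zip(sequence, sequence[1:])):
        if pair not in licensed:
            violations.append((index + 1, pair[0], pair[1]))
    return violations


# Hypothetical new observation containing one unlicensed step
observed = ["KBG", "VBG", "KBBd", "VBA"]
print(counterexamples(observed))  # [(3, 'KBBd', 'VBA')]
\end{lstlisting}

Unlike a likelihood score, the result names the exact step at which the grammar fails, so the analyst can return to the transcript and decide whether the coding or the grammar needs revision.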

\subsection{What DeepProbLog Contributes to ARS}

Conversely, DeepProbLog offers three enhancements to ARS practitioners:

\begin{enumerate}
    \item \textbf{Scalable learning}: ARS's manual transition counting does 
    not scale. DeepProbLog's neural learning can handle thousands of examples.
    
    \item \textbf{Raw data integration}: ARS requires pre-coded terminal 
    symbols. DeepProbLog can learn directly from raw data (text, images, 
    audio) through neural predicates.
    
    \item \textbf{Continuous updating}: ARS grammars are static. DeepProbLog 
    networks can be updated incrementally as new data arrives.
\end{enumerate}

\subsection{A Research Agenda for Neuro-Symbolic ARS}

Based on this synthesis, I propose a research agenda:

\begin{enumerate}
    \item \textbf{Port the ARS corpus to DeepProbLog}: Complete the 
    implementation of the sales conversation grammar in DeepProbLog, 
    including all 12 terminal symbols and their transition probabilities.
    
    \item \textbf{Add neural predicates for raw audio/text}: Train neural 
    networks to map raw transcripts directly to terminal symbols, bypassing 
    manual coding.
    
    \item \textbf{Implement the multiagent system}: Build a full multiagent 
    system where agents learn roles and interaction patterns through 
    DeepProbLog.
    
    \item \textbf{Validate with the XAI criteria}: Evaluate the DeepProbLog 
    implementation against the ARS XAI criteria (meaningfulness, accuracy, 
    knowledge limits).
    
    \item \textbf{Scale to larger corpora}: Apply the DeepProbLog ARS to 
    larger datasets (hundreds or thousands of conversations) to test 
    scalability.
\end{enumerate}

\section{Conclusion}

This paper has traced the methodological continuity from early ARS 
implementations in Scheme, Pascal, and Lisp to contemporary neuro-symbolic 
programming in DeepProbLog. I have argued that ARS already embodied the 
core principles of neuro-symbolic integration (pattern recognition, rule-based 
reasoning, probabilistic uncertainty, and explainability by design) decades 
before the term gained wide currency.

The mapping from ARS concepts to DeepProbLog is direct and natural. The 
probabilistic grammar becomes a set of logical rules with neural predicates; 
the parser becomes proof search; the transducer becomes sampling. DeepProbLog 
does not replace ARS but \textit{instantiates} its methodological blueprint 
with modern computational tools.

The relationship is one of complementarity, not competition. ARS provides the 
methodological rigor and interpretive grounding that DeepProbLog (and 
neuro-symbolic AI more generally) often lacks. DeepProbLog provides the 
scalability and neural learning that ARS lacks. Together, they point toward 
a \textbf{methodologically grounded, scalable neuro-symbolic framework} for 
the analysis of sequential social interactions.

The question for future research is not whether ARS or DeepProbLog is superior. 
Both are tools for different purposes. The question is how to integrate them 
so that the methodological lessons of ARS inform the technical development 
of DeepProbLog, and the computational power of DeepProbLog extends the reach 
of ARS.

\newpage
\begin{thebibliography}{99}

\bibitem[Kahneman(2011)]{kahneman2011thinking}
Kahneman, D. (2011). \textit{Thinking, Fast and Slow}. Farrar, Straus and Giroux.

\bibitem[Kautz(2020)]{kautz2020third}
Kautz, H. (2020). The third AI summer: AAAI Robert S. Engelmore Memorial Award 
Lecture. \textit{AI Magazine}, 43(1), 93--104.

\bibitem[Koop(1992)]{koop1992parser}
Koop, P. (1992). \textit{Demo-Parser Chart-Parser Version 1.0}. Pascal source code.

\bibitem[Koop(1994a)]{koop1994scheme}
Koop, P. (1994a). \textit{Grammatikinduktion empirisch gesicherter 
Verkaufsgespräche}. Scheme source code.

\bibitem[Koop(1994b)]{koop1994lisp}
Koop, P. (1994b). \textit{Sequenzanalyse empirisch gesicherter 
Verkaufsgespräche}. Lisp source code.

\bibitem[Koop(2023)]{koop2023notebook}
Koop, P. (2023). \textit{Qualitative Sozialforschung und Große Sprachmodelle}. 
Jupyter Notebook.

\bibitem[Manhaeve et al.(2018)]{manhaeve2018deepproblog}
Manhaeve, R., Duman\v{c}i\'{c}, S., Kimmig, A., Demeester, T., \& De Raedt, L. (2018). 
DeepProbLog: Neural probabilistic logic programming. \textit{Advances in 
Neural Information Processing Systems}, 31.

\bibitem[Marcus(2020)]{marcus2020next}
Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial 
intelligence. \textit{arXiv preprint arXiv:2002.06177}.

\end{thebibliography}

\end{document}