# Welcome to OpenPAS

OpenPAS is an open-source library for probabilistic argumentation and propositional logic operations, among other things. It contains a full implementation of Probabilistic Argumentation Systems (PAS), a framework that combines propositional logic and probability theory.

OpenPAS also contains a simple but capable interactive console called PASC and a related script runtime that can execute OpenPAS commands.

With OpenPAS it is possible to represent uncertain knowledge, such as an uncertain fact or an uncertain rule, as well as certain knowledge, and to describe complex relations that link such knowledge together using propositional logic. It is then possible to perform probabilistic analysis on any chosen hypothesis that can be expressed as part of this knowledgebase.

An uncertain fact can be a statement such as: “It may rain tomorrow.” An uncertain rule may be: “If it rains tomorrow it may delay my train.” In PAS, such statements have a probability value associated with them, e.g.: “It may rain tomorrow with 0.5 probability.”

R. Haenni, J. Kohlas, and N. Lehmann, “Probabilistic Argumentation Systems,” in Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, J. Kohlas and S. Moral, Eds. Kluwer, Dordrecht, 2000, pp. 221–287.

In this article, we focus on how PAS can be applied to real-world problems, introducing the necessary concepts and ideas along the way. Interested readers can consult the article cited above for a well-grounded formal introduction to PAS and many useful concepts and techniques.

## An uncertain problem

The best way to understand the ideas and concepts of PAS may be through a simple example of uncertain knowledge in real life. Following on from the example above, let us consider a commuter trying to get to work by train, where the train may be delayed. We consider a simple model in which a train can be delayed by problems caused by heavy rain or by a person falling sick on the train. Based on prior information, we estimate the probability of each of these two events happening on a given day:

$p(Heavy\_rain) = 0.1$

$p(Person\_sick\_on\_train)=0.05$

We also need to account for the fact that heavy rain does not always cause trouble:

$p(Rain\_causes\_train\_problem)=0.2$

Let us express our model using propositional logic:

- Heavy rain may cause train problems: $Heavy\_rain \wedge Rain\_causes\_train\_problem \rightarrow train\_delay$
- A person falling sick always causes a train delay: $Person\_sick\_on\_train \rightarrow train\_delay$

We want to understand the probability of the train being delayed on a given day. Clearly, the supporting conditions for this event are:

- There is heavy rain, and the rain causes a train problem.
- A person is sick on the train.

We can express this as a single propositional sentence as follows:

$(Heavy\_rain \wedge Rain\_causes\_train\_problem) \vee Person\_sick\_on\_train$

We want to find the probability that this sentence is true. We assume that all these events are stochastically independent. The calculation follows from observing that if a person is sick, the train is delayed; otherwise, there has to be heavy rain and the rain has to cause a train problem:

$p(train\_delay) = p(Person\_sick\_on\_train) + (1 - p(Person\_sick\_on\_train)) \cdot p(Heavy\_rain) \cdot p(Rain\_causes\_train\_problem)$

$p(train\_delay) = 0.05 + 0.95 \cdot 0.1 \cdot 0.2 = 0.05 + 0.019 = 0.069$
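The arithmetic can be checked with a few lines of Python (used here only for illustration; this is not OpenPAS code). The sketch evaluates the expression above and cross-checks it with the inclusion-exclusion rule $p(X \vee Y) = p(X) + p(Y) - p(X) \cdot p(Y)$, which holds here because the events are assumed to be independent:

```python
# Prior probabilities from the example above.
p_rain = 0.1
p_sick = 0.05
p_rain_causes = 0.2

# A sick passenger delays the train; otherwise we need heavy rain that causes a problem.
p_delay = p_sick + (1 - p_sick) * p_rain * p_rain_causes

# Cross-check with inclusion-exclusion for independent events X and Y:
# X = person sick on train, Y = heavy rain AND rain causes a train problem.
p_rain_problem = p_rain * p_rain_causes
p_delay_ie = p_sick + p_rain_problem - p_sick * p_rain_problem

print(round(p_delay, 6), round(p_delay_ie, 6))  # 0.069 0.069
```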

The truth table below expresses these conditions once again. Rows 4-7 cause a train delay because of the sick-person condition. The heavy rain and rain-causes-problem conditions add only row 3 (row 7 is already covered), so we end up with rows 3-7.

Table: Truth table and supporting conditions

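| Row | $Person\_sick\_on\_train$ | $Heavy\_rain$ | $Rain\_causes\_train\_problem$ | $train\_delay$ |
|---|---|---|---|---|
| 0 | F | F | F | |
| 1 | F | F | T | |
| 2 | F | T | F | |
| 3 | F | T | T | $\checkmark$ |
| 4 | T | F | F | $\checkmark$ |
| 5 | T | F | T | $\checkmark$ |
| 6 | T | T | F | $\checkmark$ |
| 7 | T | T | T | $\checkmark$ |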

The significance of these rows becomes more apparent once we include the probabilities in the table. Consider the following:

Table: Probabilities and supporting conditions

| Row | $Person\_sick\_on\_train$ | $Heavy\_rain$ | $Rain\_causes\_train\_problem$ | Probability | $train\_delay$ |
|---|---|---|---|---|---|
| 0 | 0.95 | 0.9 | 0.8 | 0.684 | |
| 1 | 0.95 | 0.9 | 0.2 | 0.171 | |
| 2 | 0.95 | 0.1 | 0.8 | 0.076 | |
| 3 | 0.95 | 0.1 | 0.2 | 0.019 | $\checkmark$ |
| 4 | 0.05 | 0.9 | 0.8 | 0.036 | $\checkmark$ |
| 5 | 0.05 | 0.9 | 0.2 | 0.009 | $\checkmark$ |
| 6 | 0.05 | 0.1 | 0.8 | 0.004 | $\checkmark$ |
| 7 | 0.05 | 0.1 | 0.2 | 0.001 | $\checkmark$ |

In this table, we have substituted the T and F truth values with the underlying probabilities. For example, where there would be a T for $Heavy\_rain$, we have the value of $p(Heavy\_rain)$, and where there would be an F, we now have $1 - p(Heavy\_rain)$. The “Probability” column contains the product of these probabilities. We can think of each row as a scenario in which every condition is set to either T or F, together with the overall probability of that combination. Note that the probabilities of all rows add up to 1. We can also see that the probabilities of rows 3-7 give us the same answer as we found earlier:

$p(train\_delay) = 0.019 + 0.036 + 0.009 + 0.004 + 0.001 = 0.069$
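The whole table can also be generated by enumerating every T/F combination of the three assumptions. The following Python sketch assumes the same independence as above. The variable and function names mirror the example and are not OpenPAS identifiers:

```python
from itertools import product

# Prior probability of each assumption being true, in the column order of the table.
probs = {
    "Person_sick_on_train": 0.05,
    "Heavy_rain": 0.1,
    "Rain_causes_train_problem": 0.2,
}
names = list(probs)

def train_delay(s):
    """True if scenario s (a dict name -> bool) leads to a train delay."""
    return s["Person_sick_on_train"] or (s["Heavy_rain"] and s["Rain_causes_train_problem"])

rows = []
# product([False, True], repeat=3) yields the F/T combinations in the order of rows 0-7.
for values in product([False, True], repeat=len(names)):
    scenario = dict(zip(names, values))
    p = 1.0
    for name, value in scenario.items():
        # Use p(name) for T and 1 - p(name) for F, as in the probability table.
        p *= probs[name] if value else 1 - probs[name]
    rows.append((scenario, p))

total = sum(p for _, p in rows)
p_delay = sum(p for s, p in rows if train_delay(s))

for i, (s, p) in enumerate(rows):
    flags = ["T" if s[n] else "F" for n in names]
    print(i, flags, round(p, 3), "delay" if train_delay(s) else "")
print("total =", round(total, 6), "p(train_delay) =", round(p_delay, 6))
```

It prints the eight rows of the table and confirms that all probabilities sum to 1 and that rows 3-7 sum to 0.069. The number of rows doubles with every added assumption, so this kind of enumeration is only practical for small examples like this one.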

## Uncertain problem in PAS

Now we can start discussing PAS. PAS builds on propositional logic, which has facts and simple rules. Facts are represented using literals, and such a simple fact is known as a proposition in PAS. Each uncertain piece of information is represented using an assumption, which is a special class of proposition that has a probability associated with it. In our example, the assumptions are $Person\_sick\_on\_train$, $Heavy\_rain$, and $Rain\_causes\_train\_problem$, and we use one proposition: $train\_delay$.

Table: Examples of real-life statements in PAS

| Type | Example | PAS |
|---|---|---|
| fact | “Train is delayed.” | $train\_delay$ |
| uncertain knowledge | “There may be heavy rain.” | $Heavy\_rain$ |
| (certain) rule | “A sick person causes train delay.” | $Person\_sick\_on\_train \rightarrow train\_delay$ |
| uncertain rule | “Heavy rain may cause train delay.” | $Heavy\_rain \wedge Rain\_causes\_train\_problem \rightarrow train\_delay$ |

A central concept in PAS is the scenario. Each row in the probabilities table above corresponds to a PAS scenario, which is a selection of T/F values for all the assumptions. Each scenario has a corresponding probability, which is the product of the probabilities of the T/F selections of its assumptions, as computed in that table.

PAS builds on the idea of representing sets of scenarios using logical sentences. For example, the rows 0-3 in the truth table can be expressed using: $\neg Person\_sick\_on\_train$.

In PAS, a clause is a finite disjunction $l_1 \vee ... \vee l_n$ of literals; the empty clause is a falsity (i.e., always false): $\bot$. A term is a finite conjunction $l_1 \wedge ... \wedge l_n$ of literals; the empty term is a tautology (i.e., always true): $\top$. A conjunction of clauses is known as a Conjunctive Normal Form (CNF), and a disjunction of terms is known as a Disjunctive Normal Form (DNF).
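To make these definitions concrete, here is a minimal Python sketch that evaluates clauses, terms, CNFs, and DNFs under a truth assignment. The representation (a literal as a `(name, positive)` pair, expressions and sentences as lists) is a hypothetical choice for illustration, not how OpenPAS stores them. Python's `any()` and `all()` over an empty sequence give exactly the conventions above: an empty clause is false and an empty term is true.

```python
# A literal is a (name, positive) pair: ("x", True) stands for x and ("x", False) for ¬x.
def lit_value(lit, assignment):
    name, positive = lit
    return assignment[name] == positive

def clause_value(clause, assignment):
    # Disjunction of literals; any() over an empty clause is False (falsity).
    return any(lit_value(lit, assignment) for lit in clause)

def term_value(term, assignment):
    # Conjunction of literals; all() over an empty term is True (tautology).
    return all(lit_value(lit, assignment) for lit in term)

def cnf_value(cnf, assignment):
    # Conjunction of clauses.
    return all(clause_value(c, assignment) for c in cnf)

def dnf_value(dnf, assignment):
    # Disjunction of terms.
    return any(term_value(t, assignment) for t in dnf)

# The supporting conditions for a train delay, written as a DNF of two terms.
delay_conditions = [
    [("Heavy_rain", True), ("Rain_causes_train_problem", True)],
    [("Person_sick_on_train", True)],
]
row0 = {"Person_sick_on_train": False, "Heavy_rain": False, "Rain_causes_train_problem": False}
row3 = {"Person_sick_on_train": False, "Heavy_rain": True, "Rain_causes_train_problem": True}

print(dnf_value(delay_conditions, row0), dnf_value(delay_conditions, row3))  # False True
print(clause_value([], row3), term_value([], row3))                          # False True
```

The same conventions carry over to sentences: with this representation, an empty DNF evaluates to false and an empty CNF to true.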

Consider the proposition $train\_delay$. We want the probability of this proposition being true. In PAS, this is known as our hypothesis, denoted $h$. A hypothesis is typically a clause or a CNF.

All the rules we define together make up the knowledgebase, denoted $\xi$. A knowledgebase is typically expressed as a CNF. This is convenient when constructing it from Horn clauses, since the knowledgebase is then simply a conjunction of clauses. In our example, we have:

$\xi = (Heavy\_rain \wedge Rain\_causes\_train\_problem \rightarrow train\_delay) \wedge (Person\_sick\_on\_train \rightarrow train\_delay)$
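Rewriting each rule with the standard identity $a \rightarrow b \Leftrightarrow \neg a \vee b$ and then applying De Morgan's law to the two-literal body shows that $\xi$ is indeed a conjunction of two clauses:

$\xi = (\neg(Heavy\_rain \wedge Rain\_causes\_train\_problem) \vee train\_delay) \wedge (\neg Person\_sick\_on\_train \vee train\_delay)$

$\xi = (\neg Heavy\_rain \vee \neg Rain\_causes\_train\_problem \vee train\_delay) \wedge (\neg Person\_sick\_on\_train \vee train\_delay)$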

The scenarios that make the hypothesis true are known as the quasi-supporting scenarios. The significance of the prefix quasi- can only be understood in the presence of contradictions in the knowledgebase, and we will return to this later. When such scenarios are expressed as logical sentences, they take the form of terms, which are known as quasi-supporting arguments. Our quasi-supporting arguments here are the set: $\{(Heavy\_rain \wedge Rain\_causes\_train\_problem), Person\_sick\_on\_train\}$.

All the quasi-supporting arguments together constitute the quasi-support, denoted $QS(h,\xi)$, for a given hypothesis and knowledgebase. It is normally expressed as a DNF. In our example, we have:

$QS(h,\xi)=Person\_sick\_on\_train \vee (Heavy\_rain \wedge Rain\_causes\_train\_problem)$

The probability of the quasi-support is known as the degree of quasi-support, and is denoted using $dqs(h,\xi)$:

$dqs(h, \xi) = p(QS(h, \xi))$

In the example we earlier found:

$dqs(h,\xi)=0.069$
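The quasi-support can also be found by brute force, directly from its definition. A scenario quasi-supports $h$ if every assignment of the remaining propositions that satisfies $\xi$ also satisfies $h$. The Python sketch below does exactly that for the train example and sums the probabilities of the quasi-supporting scenarios. It relies on the same independence assumption, its cost grows exponentially with the number of literals, and it is only an illustration, not how OpenPAS computes $dqs$:

```python
from itertools import product

# Assumptions with their prior probabilities, and the plain propositions.
probs = {"Heavy_rain": 0.1, "Person_sick_on_train": 0.05, "Rain_causes_train_problem": 0.2}
propositions = ["train_delay"]

def kb(v):
    """The knowledgebase xi, with each rule written as (not body) or head."""
    rule1 = not (v["Heavy_rain"] and v["Rain_causes_train_problem"]) or v["train_delay"]
    rule2 = not v["Person_sick_on_train"] or v["train_delay"]
    return rule1 and rule2

def hypothesis(v):
    return v["train_delay"]

def quasi_supports(scenario):
    """True if h holds in every assignment of the propositions that satisfies xi
    together with the scenario (vacuously True if there is no such assignment)."""
    for values in product([False, True], repeat=len(propositions)):
        v = {**scenario, **dict(zip(propositions, values))}
        if kb(v) and not hypothesis(v):
            return False  # found a model of xi in which h is false
    return True

qs_scenarios = []
dqs = 0.0
for values in product([False, True], repeat=len(probs)):
    scenario = dict(zip(probs, values))
    if quasi_supports(scenario):
        qs_scenarios.append(scenario)
        p = 1.0
        for name, value in scenario.items():
            p *= probs[name] if value else 1 - probs[name]
        dqs += p

print(len(qs_scenarios), round(dqs, 6))  # 5 0.069
```

It finds the five scenarios of rows 3-7 and a degree of quasi-support of 0.069.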

## Using OpenPAS PASC

PASC is the OpenPAS console and script runtime. It provides an easy way to start using OpenPAS right away. See the OpenPAS project on GitHub to download it and get started with PASC.

Let us see how we can construct the PAS instance we discussed above using OpenPAS. We start with the init command:

```
init
```

You can copy and paste these commands into a PASC console.

Then we create the assumptions and the proposition:

```
create_assumption: Heavy_rain, 0.1
create_assumption: Person_sick_on_train, 0.05
create_assumption: Rain_causes_train_problem, 0.2
create_proposition: train_delay
```

In PASC, a command's parameters, if any, are passed after a colon “:”, and the command parses whatever follows the colon. If no parameters are needed, the colon can be omitted. The command “help” displays all the valid commands, and “help: command” (replacing command with a real command name) displays more detailed help about that command.
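The parameter convention can be illustrated with a tiny parser. The `parse_command` function below is hypothetical, not PASC's actual implementation. It only shows the split at the first colon described above:

```python
def parse_command(line):
    """Split a PASC-style line into (command, parameters).

    Hypothetical helper: the text before the first ':' is the command name and
    the text after it is a raw parameter string for the command to interpret.
    A line without ':' is a command that takes no parameters.
    """
    command, sep, params = line.partition(":")
    return command.strip(), (params.strip() if sep else None)

print(parse_command("create_assumption: Heavy_rain, 0.1"))  # ('create_assumption', 'Heavy_rain, 0.1')
print(parse_command("init"))                                 # ('init', None)
```

Leaving the parameter string unparsed at this stage matches the description above, where each command interprets its own parameters.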

Now that we have all our literals in place, we can add the rules to the knowledgebase using Horn clauses:

```
add_horn: Heavy_rain Rain_causes_train_problem -> train_delay
add_horn: Person_sick_on_train -> train_delay
```

Here “->” separates the body from the head, and whitespace between literals is interpreted as LogicalAnd.
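A Horn rule in this syntax maps directly onto a single clause of the knowledgebase, since $a_1 \wedge ... \wedge a_n \rightarrow h$ is equivalent to $\neg a_1 \vee ... \vee \neg a_n \vee h$. The hypothetical `horn_to_clause` helper below sketches that translation. It is not OpenPAS code, and it assumes that a leading `¬` marks a negated literal, as in the PASC examples later in this article:

```python
def horn_to_clause(text):
    """Translate 'a1 a2 ... -> h' into the clause ¬a1 ∨ ¬a2 ∨ ... ∨ h.

    Hypothetical helper, not OpenPAS code. Literals are (name, positive) pairs;
    whitespace in the body means LogicalAnd and a leading '¬' negates a literal.
    """
    body_text, arrow, head_text = text.partition("->")
    head_tokens = head_text.split()
    if not arrow or len(head_tokens) != 1:
        raise ValueError("expected 'body -> head' with a single head literal")

    def literal(token):
        return (token[1:], False) if token.startswith("¬") else (token, True)

    body = [literal(t) for t in body_text.split()]
    # a1 ∧ ... ∧ an → h is equivalent to ¬a1 ∨ ... ∨ ¬an ∨ h, so every body literal is flipped.
    return [(name, not positive) for name, positive in body] + [literal(head_tokens[0])]

print(horn_to_clause("Heavy_rain Rain_causes_train_problem -> train_delay"))
print(horn_to_clause("B -> ¬x"))
```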

We can now ask OpenPAS for the quasi-support of the hypothesis train_delay:

```
> qs: train_delay
Creating BDD probability computer with 1024 nodes
Using dot file at: dotfile.dot
qs = [Heavy_rain Rain_causes_train_problem + Person_sick_on_train]
```

The QS is expressed as a DNF. The square brackets “[ ]” designate a logical sentence, and “+” is used for LogicalOr. (Strictly speaking, OpenPAS presents the minimal quasi-support as the quasi-support here, but the difference is “academic” for the purposes of this article; see the article referenced at the start for more about this.)

We can also ask for the degree of quasi-support, which confirms the value we calculated above:

```
> dqs: train_delay
dqs = 0.069000
(duration = 51.725317 miliseconds)
```

You may wonder what the BDD and the dot file are about. OpenPAS currently has two probability computation engines. The engine is selected during initialisation, and it currently defaults to one based on binary decision diagrams (BDDs), which uses JavaBDD. A BDD is a data structure that makes it easier to calculate the probability of a DNF. When a BDD is used for a probability calculation, OpenPAS can create a .dot file that represents that BDD. This is not significant initially and is not a topic for this article, but for larger knowledgebases, optimising the underlying BDD may become a significant problem.
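The idea behind BDD-based probability computation is Shannon expansion: for a variable $v$, $p(f) = p(v) \cdot p(f|_{v=\top}) + (1 - p(v)) \cdot p(f|_{v=\bot})$. The sketch below applies this decomposition recursively to a DNF and memoises sub-results, so each distinct residual formula is evaluated only once. A BDD makes this kind of sharing explicit. This is a plain-Python illustration that builds no actual diagram and is unrelated to the JavaBDD or OpenPAS APIs:

```python
from functools import lru_cache

def dnf_probability(dnf, probs, order):
    """Probability of a DNF over independent variables, via Shannon expansion.

    dnf:   list of terms; each term is a set of (name, positive) literals.
    probs: name -> probability of being True.
    order: variable order for the expansion (must cover every variable in dnf).
    """
    def restrict(terms, var, value):
        # Fix var := value: drop terms that become false, remove satisfied literals.
        return frozenset(
            term - {(var, value)} for term in terms if (var, not value) not in term
        )

    @lru_cache(maxsize=None)
    def prob(i, terms):
        if frozenset() in terms:
            return 1.0  # an empty term is a tautology, so the whole DNF is true
        if not terms:
            return 0.0  # an empty DNF is false
        var = order[i]
        p = probs[var]
        return (p * prob(i + 1, restrict(terms, var, True))
                + (1 - p) * prob(i + 1, restrict(terms, var, False)))

    return prob(0, frozenset(frozenset(t) for t in dnf))

qs = [{("Heavy_rain", True), ("Rain_causes_train_problem", True)},
      {("Person_sick_on_train", True)}]
probs = {"Person_sick_on_train": 0.05, "Heavy_rain": 0.1, "Rain_causes_train_problem": 0.2}
print(round(dnf_probability(qs, probs, list(probs)), 6))  # 0.069
```

The variable order does not change the result, but in a real BDD it can change the size of the diagram dramatically. This is one reason why optimising the underlying BDD matters for larger knowledgebases.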

## OpenPAS fundamental concepts

There are a few key concepts to using both PASC and OpenPAS as a library. We have seen many of these above, but it is worth revisiting them. OpenPAS contains a capable propositional engine that recognises the following fundamental concepts: literal, logical operator, expression, and sentence.

A literal is the atomic logical unit in OpenPAS, and it can be of three types: proposition, assumption, and special. The first two correspond to the respective PAS concepts, and special literals represent the values True (tautology) and False (falsity).

A logical operator is used to connect literals. OpenPAS recognises LogicalAnd, LogicalOr, and Negation as valid operators.

An expression is made up of one or more literals. OpenPAS recognises two types of expressions: terms and clauses, which correspond to the logical definitions given above. The order of literals is usually not significant, except in Horn clauses, where the ordering of literals is preserved.

A sentence is made up of one or more expressions. OpenPAS recognises two types of sentences: disjunctive normal form (DNF) and conjunctive normal form (CNF), which also correspond to the logical definitions given above. The ordering of expressions in a sentence is not computationally significant. An empty DNF is considered to be False, and an empty CNF is True. Special literals are used inside expressions and sentences to make them True or False where needed.

In PAS, it is possible to reduce the degree of support for a hypothesis when further knowledge becomes available to the system. This is achieved by creating contradictions. A scenario is said to be contradictory (or inconsistent) relative to a knowledgebase $\xi$ if it implies falsity under $\xi$. All of the inconsistent scenarios in a PAS instance are together called the contradiction. Let us consider a simple PAS instance with assumptions A and B, a proposition x, and the following knowledgebase:

$A \rightarrow x$

$B \rightarrow \neg x$

Here we use the negation operator “¬”. This setup gives us the following PAS scenarios:

Table: PAS scenarios supporting the hypothesis and the contradiction

| Scenario | $A$ | $B$ | $h=x$ | $h=\bot$ (contradiction) |
|---|---|---|---|---|
| 0 | F | F | | |
| 1 | F | T | | |
| 2 | T | F | $\checkmark$ | |
| 3 | T | T | $\checkmark$ | $\checkmark$ |

It is easy to see that only assumption A supports the hypothesis x (scenarios 2 and 3), but when both A and B are T that scenario becomes inconsistent (scenario 3). So, we have the following:

$QS(x,\xi)=A$

and

$QS(\bot,\xi)=A \wedge B$

We can now finally discuss “quasi-support” vs. “support” for a hypothesis. In PAS, quasi-support does not take into account whether a scenario is consistent or not. Therefore, the degree of quasi-support includes the inconsistent scenarios. We define a supporting scenario as a quasi-supporting scenario that is not contradictory. Similarly, a supporting argument is a term representation for supporting scenarios, and support is all the supporting arguments together denoted using $SP(h,\xi)$. In our example, we therefore have:

$SP(x, \xi)=A \wedge \neg B$
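The scenario table above can be reproduced by brute force. A scenario quasi-supports $h$ if $h$ holds in every model of $\xi$ that extends it, which is vacuously true when there is no such model. The Python sketch below (an illustration only, not OpenPAS code) computes the quasi-supporting scenarios of $x$ and of $\bot$, and the supporting scenarios as their difference:

```python
from itertools import product

assumptions = ["A", "B"]
propositions = ["x"]

def kb(v):
    """xi = (A -> x) and (B -> not x), using a -> b == (not a) or b."""
    return ((not v["A"]) or v["x"]) and ((not v["B"]) or (not v["x"]))

def entails(scenario, h):
    """True if h holds in every model of xi that extends the scenario.
    A contradictory scenario has no such model, so it entails every h, even falsity."""
    for values in product([False, True], repeat=len(propositions)):
        v = {**scenario, **dict(zip(propositions, values))}
        if kb(v) and not h(v):
            return False
    return True

def hyp_x(v):
    return v["x"]

def falsity(v):
    return False

# Scenarios in the same order as the table: 0 = (F, F), 1 = (F, T), 2 = (T, F), 3 = (T, T).
scenarios = [dict(zip(assumptions, vals)) for vals in product([False, True], repeat=len(assumptions))]
qs_x = [i for i, s in enumerate(scenarios) if entails(s, hyp_x)]
qs_bot = [i for i, s in enumerate(scenarios) if entails(s, falsity)]
sp_x = [i for i in qs_x if i not in qs_bot]  # quasi-supporting and consistent

print(qs_x, qs_bot, sp_x)  # [2, 3] [3] [2]
```

It reports scenarios 2 and 3 for $QS(x,\xi)$, scenario 3 for $QS(\bot,\xi)$, and scenario 2 for $SP(x,\xi)$, matching the table.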

In fact, due to the way quasi-support is defined, any quasi-support includes all inconsistent scenarios. This is a potentially confusing aspect of PAS, and it stems from the way material implication works. As we mentioned earlier, quasi-supporting scenarios make the hypothesis true. Logically, quasi-support can be expressed using material implication as follows:

$\alpha \wedge \xi \rightarrow h$

where $\alpha$ is a quasi-supporting argument. Recall how material implication is defined:

$a \rightarrow b \Leftrightarrow \neg a \vee b$

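We can see that falsity is a quasi-supporting argument of all hypotheses, as $\bot \rightarrow h$ evaluates to true ($\top \vee h$) regardless of $h$. Since an inconsistent scenario $\alpha$ makes $\alpha \wedge \xi$ equivalent to $\bot$, every inconsistent scenario quasi-supports every hypothesis.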

Now we define the degree of support as follows:

$dsp(h,\xi)=\frac{p(SP(h,\xi))}{p(C(\xi))}$

where $C(\xi)$ denotes the consistent scenarios, i.e., those not in the contradiction $QS(\bot,\xi)$, so that $p(C(\xi)) = 1 - p(QS(\bot,\xi))$ (this is a slight deviation from the original article, but it is equivalent). This definition rules out inconsistent scenarios and computes posterior probabilities conditioned on consistency.

Extending this further we get:

$dsp(h,\xi)=\frac{p(QS(h, \xi)) - p(QS(\bot, \xi))}{1 - p(QS(\bot, \xi))}$

$dsp(h,\xi)=\frac{dqs(h, \xi) - dqs(\bot, \xi)}{1 - dqs(\bot, \xi)}$

where $dqs(\bot, \xi)$ is the degree of quasi-support for the contradiction in the knowledgebase. Notice that the dsp calculation requires only two dqs calculations. This works because, as discussed above, the quasi-support for any hypothesis contains all the inconsistent scenarios of the knowledgebase. Subtracting $dqs(\bot, \xi)$ from $dqs(h, \xi)$ therefore gives, for any $h$, the degree of quasi-support of the consistent scenarios only. This saves us from having to find the logical expression for $SP(h, \xi)$.
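The formula is easy to check numerically. The sketch below implements it as a hypothetical `dsp` function and compares the result with $p(SP(x,\xi))/p(C(\xi))$ for the $A$, $B$ example. It assumes that both assumptions are independent and have probability 0.5:

```python
def dsp(dqs_h, dqs_bot):
    """Normalised degree of support, computed from two degrees of quasi-support."""
    if dqs_bot >= 1.0:
        # Every scenario is contradictory, so the normalisation is undefined.
        raise ValueError("dqs(False) must be < 1")
    return (dqs_h - dqs_bot) / (1.0 - dqs_bot)

# The A, B example, assuming p(A) = p(B) = 0.5 and independence.
p_a, p_b = 0.5, 0.5
dqs_x = p_a              # QS(x, xi) = A
dqs_bot = p_a * p_b      # QS(False, xi) = A and B
p_sp = p_a * (1 - p_b)   # SP(x, xi) = A and not B
p_c = 1 - dqs_bot        # probability of the consistent scenarios

print(round(dsp(dqs_x, dqs_bot), 6), round(p_sp / p_c, 6))  # 0.333333 0.333333
```

Both values come out as $1/3$. Changing `p_a` and `p_b` keeps the two results equal, because the identity does not depend on the particular probabilities.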

Let us see how contradiction is handled in OpenPAS. We can create the above PAS instance as follows:

```
init
create_assumption: A, 0.5
create_assumption: B, 0.5
create_proposition: x
add_horn: A -> x
add_horn: B -> ¬x
```

The quasi-supports are as expected:

```
> qs: x
Creating BDD probability computer with 1024 nodes
Using dot file at: dotfile.dot
qs = [A]

> qs: False
qs = [A B]
```

The support only contains consistent scenarios (and terms) as expected:

```
> sp: x
sp = [A ¬B]
```

The degrees of quasi-support are:

```
> dqs: x
dqs = 0.500000
(duration = 10.540797 miliseconds)
> dqs: False
dqs = 0.250000
(duration = 4.656347 miliseconds)
```

The degree of support as defined above is:

```
> dsp: x
dsp = 0.333333
```

We can observe that this can be computed using the two dqs values:

$dsp(x, \xi)=\frac{0.5 - 0.25}{1 - 0.25}=\frac{0.25}{0.75}=\frac{1}{3}$

Internally, OpenPAS indeed uses these two dqs values for this calculation as opposed to using the logical SP expression. OpenPAS will also cache and re-use the value of $dqs(\bot, \xi)$ once it is calculated.

OpenPAS also allows us to see the unnormalised degree of support:

```
> udsp: x
dsp = 0.250000
```

This is defined simply as:

$dsp_u(h, \xi)=p(SP(h,\xi))$

where we know:

$dsp_u(h, \xi)=dqs(h, \xi) - dqs(\bot, \xi)$

The original PAS article does not define $dsp_u$, but I have found it to give useful values in real-world applications, depending on the context.

## OpenPAS as a Java library

PASC uses the underlying OpenPAS library, which is built using Java. It is entirely possible to avoid using the OpenPAS library directly and to work solely by creating .ops scripts; there are examples of this in the source tree. The advantage is that PASC provides a domain-specific language focused solely on PAS, leaving users free to concentrate on a language they know well (which may not be Java).

PASC is currently not a full scripting environment (e.g., it has no loops, variables, or functions), so users are expected to generate .ops scripts in another language and feed them into PASC as a runtime.
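As an example of this workflow, the Python sketch below generates the train-delay script from this article. The `ops_script` function is hypothetical. It assumes that a .ops file contains the same commands that are typed into the PASC console (without the `>` prompt), one command per line:

```python
def ops_script(assumptions, propositions, horn_rules, queries):
    """Build the text of a PASC script, one command per line (hypothetical helper).

    Assumes a .ops file holds the same commands that are typed into the PASC
    console, without the '>' prompt.
    """
    lines = ["init"]
    lines += [f"create_assumption: {name}, {p}" for name, p in assumptions.items()]
    lines += [f"create_proposition: {name}" for name in propositions]
    lines += [f"add_horn: {' '.join(body)} -> {head}" for body, head in horn_rules]
    lines += [f"dqs: {h}" for h in queries]
    return "\n".join(lines) + "\n"

script = ops_script(
    assumptions={"Heavy_rain": 0.1, "Person_sick_on_train": 0.05, "Rain_causes_train_problem": 0.2},
    propositions=["train_delay"],
    horn_rules=[
        (["Heavy_rain", "Rain_causes_train_problem"], "train_delay"),
        (["Person_sick_on_train"], "train_delay"),
    ],
    queries=["train_delay"],
)
print(script, end="")
```

The resulting text could then be saved as a .ops file and run with PASC. Larger knowledgebases can be produced the same way from loops or data files in the host language.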

For those who are interested in using OpenPAS as a Java library, OpenPAS works through various interfaces which are fully documented. Each interface has at least one implementation, and implementation classes are provided by a factory. The OpenPAS Java module is the gateway to most publicly available functionality. The best place to get acquainted with the use of OpenPAS as a library is to look at the examples in the source tree and the Javadocs.

OpenPAS has over 100 unit test cases (each of which may contain many individual tests). They show a wealth of possible uses, but many of them focus on the library's implementation, with visibility into its internals, and may not be directly relevant to a user of the OpenPAS library.

## Conclusion

This introductory article seeks to provide an initial taste of what PAS and OpenPAS have to offer. We have seen the main concepts of PAS and how they are represented in OpenPAS. The focus was mainly on PASC, the OpenPAS console and script runtime, but it is also possible to use OpenPAS as a Java library.

We have looked at simple, small problems because they are easier to understand intuitively. However, PAS becomes most useful in cases where the number of assumptions is much larger than a handful. Note that, in the worst case, computing the probability of a DNF sentence takes time exponential in the number of unique literals involved, so the performance of any PAS library is critical. We have not looked at performance here, but topics such as efficient sum-of-disjoint-products algorithms, the use of BDDs, and the use of approximations are central concerns for PAS (and OpenPAS).

Topics such as the effective use of PAS for real-world problems, ways of considering and using contradiction as a tool, gaining better insights into the logical arguments for a hypothesis (in addition to the dsp value), and ways of dealing with very large PAS instances using approximations remain unexplored in this article.