Bayesian Networks

India
October 12, 2006 2:15am CST
50 AI MAGAZINE u n d e r s t a n d i n g (Charniak and Goldman 1989a, 1989b; Goldman 1990), vision (Levitt, Mullin, and Binford 1989), heuristic search (Hansson and Mayer 1989), and so on. It is probably fair to say that Bayesian networks are to a large segment of the AI-uncertainty community what resolution theorem proving is to the AIlogic community. Nevertheless, despite what seems to be their obvious importance, the ideas and techniques have not spread much beyond the research community responsible for them. This is probably because the ideas and techniques are not that easy to understand. I hope to rectify this situation by making Bayesian networks more accessible to the probabilistically unso- Over the last few years, a method of reasoning using probabilities, variously called belief networks, Bayesian networks, knowledge maps, probabilistic causal networks, and so on, has become popular within the AI probability and uncertainty community. This method is best summarized in Judea Pearl’s (1988) book, but the ideas are a product of many hands. I adopted Pearl’s name, Bayesian networks, on the grounds that the name is completely neutral about the status of the networks (do they really represent beliefs, causality, or what?). Bayesian networks have been applied to problems in medical diagnosis (Heckerman 1990; Spiegelhalter, Franklin, and Bull 1989), map learning (Dean 1990), language Bayesian Networks without Tears Eugene Charniak I give an introduction to Bayesian networks for AI researchers with a limited grounding in probability theory. Over the last few years, this method of reasoning using probabilities has become popular within the AI probability and uncertainty community. Indeed, it is probably fair to say that Bayesian networks are to a large segment of the AI-uncertainty community what resolution theorem proving is to the AI-logic community. Nevertheless, despite what seems to be their obvious importance, the ideas and techniques have not spread much beyond the research community responsible for them. This is probably because the ideas and techniques are not that easy to understand. I hope to rectify this situation by making Bayesian networks more accessible to the probabilistically unsophisticated. 0738-4602/91/$4.00 ©1991 AAAI …making Bayesian networks more accessible to the probabilistically unsophisticated. phisticated. That is, this article tries to make the basic ideas and intuitions accessible to someone with a limited grounding in probability theory (equivalent to what is presented in Charniak and McDermott [1985]). An Example Bayesian Network The best way to understand Bayesian networks is to imagine trying to model a situation in which causality plays a role but where our understanding of what is actually going on is incomplete, so we need to describe things probabilistically. Suppose when I go home at night, I want to know if my family is home before I try the doors. (Perhaps the most convenient door to enter is double locked when nobody is home.) Now, often when my wife leaves the house, she turns on an outdoor light. However, she sometimes turns on this light if she is expecting a guest. Also, we have a dog. When nobody is home, the dog is put in the back yard. The same is true if the dog has bowel troubles. Finally, if the dog is in the backyard, I will probably hear her barking (or what I think is her barking), but sometimes I can be confused by other dogs barking. This example, partially inspired by Pearl’s (1988) earthquake example, is illustrated in figure 1. There we find a graph not unlike many we see in AI. We might want to use such diagrams to predict what will happen (if my family goes out, the dog goes out) or to infer causes from observed effects (if the light is on and the dog is out, then my family is probably out). The important thing to note about this example is that the causal connections are not absolute. Often, my family will have left without putting out the dog or turning on a light. Sometimes we can use these diagrams anyway, but in such cases, it is hard to know what to infer when not all the evidence points the same way. Should I assume the family is out if the light is on, but I do not hear the dog? What if I hear the dog, but the light is out? Naturally, if we knew the relevant probabilities, such as P(family-out | light-on, ¬ hearbark), then we would be all set. However, typically, such numbers are not available for all possible combinations of circumstances. Bayesian networks allow us to calculate them from a small set of probabilities, relating only neighboring nodes. Bayesian networks are directed acyclic graphs (DAGs) (like figure 1), where the nodes are random variables, and certain independence assumptions hold, the nature of which I discuss later. (I assume without loss of generality that DAG is connected.) Often, as in figure 1, the random variables can be thought of as states of affairs, and the variables have two possible values, true and false. However, this need not be the case. We could, say, have a node denoting the intensity of an earthquake with values no-quake, trembler, rattler, major, and catastrophe. Indeed, the variable values do not even need to be discrete. For example, the value of the variable earthquake might be a Richter scale number. (However, the algorithms I discuss only work for discrete values, so I stick to this case.) In what follows, I use a sans serif font for the names of random variables, as in earthquake. I use the name of the variable in italics to denote the proposition that the variable takes on some particular value (but where we are not concerned with which one), for example, earthquake. For the special case of Boolean variables (with values true and false), I use the variable name in a sans serif font to denote the proposition that the variable has the value true (for example, family-out). I also show the arrows pointing downward so that “above” and “below” can be understood to indicate arrow direction. The arcs in a Bayesian network specify the independence assumptions that must hold between the random variables. These independence assumptions determine what probability information is required to specify the probability distribution among the random variables in the network. The reader should note that in informally talking about DAG, I said that the arcs denote causality, whereas in the Bayesian network, I am saying that they specify things about the probabilities. The next section resolves this conflict. To specify the probability distribution of a Bayesian network, one must give the prior probabilities of all root nodes (nodes with no predecessors) and the conditional probabilities Articles WINTER 1991 51 light-on family-out dog-out bowel-problem hear-bark Figure 1. A Causal Graph. The nodes denote states of affairs, and the arcs can be interpreted as causal connections. Articles 52 AI MAGAZINE of all nonroot nodes given all possible combinations of their direct predecessors. Thus, figure 2 shows a fully specified Bayesian network corresponding to figure 1. For example, it states that if family members leave our house, they will turn on the outside light 60 percent of the time, but the light will be turned on even when they do not leave 5 percent of the time (say, because someone is expected). Bayesian networks allow one to calculate the conditional probabilities of the nodes in the network given that the values of some of the nodes have been observed. To take the earlier example, if I observe that the light is on (light-on = true) but do not hear my dog (hear-bark = false), I can calculate the conditional probability of family-out given these pieces of evidence. (For this case, it is .5.) I talk of this calculation as evaluating the Bayesian network (given the evidence). In more realistic cases, the networks would consist of hundreds or thousands of nodes, and they might be evaluated many times as new information comes in. As evidence comes in, it is tempting to think of the probabilities of the nodes changing, but, of course, what is changing is the conditional probability of the nodes given the changing evidence. Sometimes people talk about the belief of the node changing. This way of talking is probably harmless provided that one keeps in mind that here, belief is simply the conditional probability given the evidence. In the remainder of this article, I first describe the independence assumptions implicit in Bayesian networks and show how they relate to the causal interpretation of arcs (Independence Assumptions). I then show that given these independence assumptions, the numbers I specified are, in fact, all that are needed (Consistent Probabilities). Evaluating Networks describes how Bayesian networks are evaluated, and the next section describes some of their applications. Independence Assumptions One objection to the use of probability theory is that the complete specification of a probability distribution requires absurdly many numbers. For example, if there are n binary random variables, the complete distribution is specified by 2n-1 joint probabilities. (If you do not know where this 2n-1 comes from, wait until the next section, where I define joint probabilities.) Thus, the complete distribution for figure 2 would require 31 values, yet we only specified 10. This savings might not seem great, but if we doubled the …the complete specification of a probability distribution requires absurdly many numbers. light-on (lo) family-out (fo) dog-out (do) bowel-problem (bp) hear-bark(hb) Figure 2. A Bayesian Network for the family-out Problem. I added the prior probabilities for root nodes and the posterior probabilities for nonroots given all pos
1 response
• India
24 Oct 06
yes ..its ok...