ocf.co.nz

Filtering input in a complex environment

In my last post I touched on the idea that to be successful in a complex environment, any AI system would need to filter its input, essentially throwing away irrelevant information at a very early stage in order to keep up with the rapid flow of incoming data. In this post, I will elaborate on this idea.

Input filtering, as I see it, is about the quick (and possibly dirty) extraction of information from incoming data. It’s about turning the rapid flow of data into a comparative trickle of high-quality information, which itself becomes a data stream for higher-level subsystems.1

Filtering input is not about simply ignoring data; if we wanted to do this we could simply use fewer sensors. Consider an agent connected to a high-resolution digital camera. It is true that, to reduce the volume of the incoming data stream, the agent could discard or ignore the values of certain pixels. For example, it could ignore the outer edge of the frame and consider only the center, or ignore every second pixel to achieve a full but lower-resolution view. But discarding input is equivalent to simply using poorer sensors, so it does not gain the agent any advantage over being fitted with these poorer sensors in the first place.2

What filtering input is about is taking what information we want from some data, and then discarding that data. With the data goes a lot of other information which we could have extracted, but which we can do without. To return to a vision example, consider my old 640×480 webcam, which provides 307,200 pixels per frame. Each pixel requires 16 bits to encode, and the camera can serve 15 frames per second. That means my camera provides 70.3 Mb (megabits) of data per second, which is a lot.
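The arithmetic behind that figure can be checked in a few lines. This is only a sketch of the calculation, and it assumes that "Mb" here means 2**20 bits, which is the reading that reproduces the 70.3 figure.

```python
# Raw data rate of the 640x480 webcam described above.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 16
FRAMES_PER_SECOND = 15

pixels_per_frame = WIDTH * HEIGHT
bits_per_second = pixels_per_frame * BITS_PER_PIXEL * FRAMES_PER_SECOND

# Assumption: one megabit is counted as 2**20 bits.
megabits_per_second = bits_per_second / 2**20

print(pixels_per_frame)               # 307200
print(bits_per_second)                # 73728000
print(round(megabits_per_second, 1))  # 70.3
```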

Now imagine this camera connected to a hypothetical AI security system for detecting intruders, the Securicam5000. Inside the Securicam5000 is a subsystem which detects the degree of change from one frame to the next. For every frame, this subsystem provides a 4-bit message to whichever other subsystems need it, to say how much the frame differs from the previous one on a scale of 0 to 15. (Let’s assume for now that the designers have established that the degree of change from frame to frame is relevant to the overall goal of detecting intruders.)
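To make the idea concrete, here is a minimal sketch of what such a change-detecting subsystem might look like. The Securicam5000 is hypothetical, so everything here is an assumption: the function name `change_score`, the treatment of each pixel as a single 16-bit intensity, and the choice of mean absolute difference mapped linearly onto 0 to 15 as the measure of change.

```python
from typing import Sequence

MAX_PIXEL_VALUE = 2**16 - 1  # assumes each pixel is one 16-bit intensity


def change_score(previous: Sequence[int], current: Sequence[int]) -> int:
    """Summarise how much two frames differ as a 4-bit value (0 to 15).

    Hypothetical measure: the mean absolute pixel difference, expressed as
    a fraction of the largest possible difference, scaled onto 0..15.
    """
    if len(previous) != len(current) or not current:
        raise ValueError("frames must be non-empty and the same size")
    total_difference = sum(abs(a - b) for a, b in zip(previous, current))
    mean_fraction = total_difference / (len(current) * MAX_PIXEL_VALUE)
    # mean_fraction is in [0, 1]; clamp so a total change still fits in 4 bits.
    return min(15, int(mean_fraction * 16))


# Two tiny two-pixel "frames": one pixel flips from black to full brightness.
print(change_score([0, 0], [0, 0]))      # 0
print(change_score([0, 0], [65535, 0]))  # 8
```

Whatever the frame size, the subsystem's output is always a single number that fits in four bits, which is the whole point: the frame itself can be thrown away once the score has been computed.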

What our subsystem achieves here is to filter the input. If the Securicam5000 were fitted with a whole array of similar input-filtering subsystems, then the actual incoming data (all 70.3 Mb per second of it) could be discarded once the relevant information had been extracted. Even if there were one hundred input-filtering subsystems in the Securicam5000, each contributing on average, say, 10 bits per frame, that still gives only a 14.6 Kb/s trickle of information to be passed to the higher-level subsystems. That’s a rate almost 5000 times lower than the raw input stream, which is much more manageable for a processor of limited speed.
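As a rough check on those numbers, the filtered rate and the reduction factor can be worked out directly. As before, this sketch assumes binary units, so that "Kb" means 2**10 bits.

```python
# Raw input rate of the webcam: 640x480 pixels, 16 bits each, 15 frames/s.
RAW_BITS_PER_SECOND = 640 * 480 * 16 * 15

SUBSYSTEMS = 100
BITS_PER_SUBSYSTEM_PER_FRAME = 10
FRAMES_PER_SECOND = 15

filtered_bits_per_second = (
    SUBSYSTEMS * BITS_PER_SUBSYSTEM_PER_FRAME * FRAMES_PER_SECOND
)
# Assumption: one kilobit is counted as 2**10 bits.
filtered_kilobits_per_second = filtered_bits_per_second / 2**10
reduction_factor = RAW_BITS_PER_SECOND / filtered_bits_per_second

print(filtered_bits_per_second)                # 15000
print(round(filtered_kilobits_per_second, 1))  # 14.6
print(round(reduction_factor))                 # 4915
```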

As Sherlock Holmes gathers data about a case, he often gets to a point where he has all but solved the case and starts looking for the final evidence against his number one suspect. Meanwhile, Dr. Watson (and the reader) has no idea how Holmes can be so sure he has found the best lead. To the non-detective, ruling out some of the other suspects seems foolish and dangerous. But it is Holmes’ power to identify and discard irrelevant information that frees up his mind to deduce who did it.

For example, say Holmes’ client said he saw the servant admiring the painting late one night, but Holmes observed that the servant lacked the intelligence to successfully sell the painting. Holmes can throw away the possibility that the servant stole the painting and even remove from his consciousness the distracting anecdote about the servant’s suspicious activity. Meanwhile, Watson grows more confused with every new piece of information, and suffers from information overload.

I believe that the visual system and other systems in the brain must do something similar to what Holmes does in his detective work, but subconsciously. To cope with a stream of data even greater than that of my old webcam, or any modern digital camera for that matter, the raw video data must be converted into a slower stream of simple information: presence of a curve here, presence of some lines there, lots of blank space. This data is more manageable by the later stages of the visual system, which in turn can produce a simple internal model of what exists in the world: a hot cup of tea to my right, a pen and paper in front of me, and blue skies above.

In turn, my highest-level subsystems can take this model and play with it, and forget about the raw visual data from which it was extracted. I reach for the teacup because my internal model tells me it is there, not because I have consciously and painfully processed screeds of visual data and deduced that a teacup must be there – no system on earth could actually process so much data so quickly.

As I said previously, filtering input is not about simply ignoring data. To blank out every second pixel on a camera, or in the human case every second rod or cone on our retina, would be as foolish as Watson trying to combat information overload by ignoring every second fact of the case. Firstly, this would be harmful to Watson’s ability to solve the case. Furthermore, this butchering of the input achieves a data reduction of only 50 percent, where the kind of reduction needed is likely to be far greater than that. We cannot throw away data indiscriminately and expect to be better off.

The question arises, then: how do we know what information to keep and what to throw away? The answer will, of course, depend on the particular problem you are trying to solve. In the domain of triaging patients complaining of chest pain, Lee Goldman was able to drill down to exactly four pieces of relevant information: whether the patient’s ECG is normal, whether the pain is unstable angina, whether there is fluid in the patient’s lungs, and whether the patient’s systolic blood pressure is below 100. He achieved this by applying statistical methods to years’ worth of patient records.

In the case of human vision, one might argue that it is evolution which has equipped the visual system with knowledge of what information is relevant and what is not. In the hypothetical example of the Securicam5000, the designers decided, using their expert knowledge, that how much the input changes from one frame to the next is among the information relevant to the problem of detecting intruders. Judging which information is relevant is a matter of carefully thinking about and experimenting with the problem you are trying to solve.

1 It may appear I am equating data with information, but this is not exactly my intention. I am using the data/information distinction like a north/south distinction. Anywhere you stand, north is on one side of you and south on the other. For any AI subsystem you look at, data comes from one side and information goes out the other; so one subsystem’s information is another subsystem’s data.

2 However, one could imagine an optimization process where a high resolution device was used in a series of tests, ignoring different input in each test in order to find the minimal set of sensors needed for a particular agent in a particular environment to achieve a particular task.
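
One way such an optimization process might be sketched is as a greedy elimination loop: switch off one sensor at a time, rerun the test, and leave the sensor off if the task still succeeds. The names `minimal_sensor_set`, `task_succeeds` and `toy_task` are hypothetical, and a greedy search like this only guarantees that no single remaining sensor was found removable in its test; it does not promise the smallest possible set.

```python
from typing import Callable, FrozenSet, Iterable


def minimal_sensor_set(
    sensors: Iterable[str],
    task_succeeds: Callable[[FrozenSet[str]], bool],
) -> FrozenSet[str]:
    """Greedily drop sensors one at a time while the task still succeeds."""
    kept = frozenset(sensors)
    if not task_succeeds(kept):
        raise ValueError("the task fails even with every sensor enabled")
    for sensor in sorted(kept):
        candidate = kept - {sensor}
        # Each call stands in for one test run with that sensor ignored.
        if task_succeeds(candidate):
            kept = candidate
    return kept


# Toy stand-in for a real test: this task happens to need two regions.
def toy_task(active: FrozenSet[str]) -> bool:
    return {"centre", "left"} <= active


result = minimal_sensor_set(["top", "left", "centre", "right"], toy_task)
print(sorted(result))  # ['centre', 'left']
```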
