Artificial General Intelligence — Safety Problems — Part 1

Guha Ayan
6 min read · Jun 19, 2022


Today’s artificial intelligence systems are doing extraordinary things everywhere you look. They are transforming the way technologies are built, not only for today but also for the future. With such a profound impact on our daily lives, it is only fair to ask: how do these systems work (this article), how safe are they, what are the typical challenges (Part II), and what are some of the proposed solutions (Part III)?

Disclaimer: I am not an expert in this area. I find the concepts around AI safety extremely interesting, so I wanted to collect my notes and share my understanding. I will be grateful for any feedback and/or suggestions.

But first, let us define a few terms so that we do not get distracted by the buzzwords.

Artificial General Intelligence (AGI)

Artificial Intelligence is the promise of developing technologies clever enough to perform any task humans can do, only much better. The improvement can be in speed, in accuracy, and in generality. While today’s AI systems do very well on the first two parameters, humans win hands down on generality.

For example, an intelligent AI chess player can outperform human chess players, and an intelligent computer vision system can recognise faces with high accuracy. But the computer vision software cannot play chess, and the chess AI has no notion of what a face is. In contrast, any average human can both play chess and recognise faces. Generality is a key trait of our cognitive prowess which today’s AI systems lack.

It is good to be aware of this major difference. Artificial General Intelligence is a major area of active research. In this article we will discuss one of the most promising approaches: Reinforcement Learning. Let us find out what it is.

Reinforcement Learning

Have you ever heard that “failures are the pillars of success” or that “your actions have consequences”?

Well, then you already have an intuition for reinforcement learning. But let us also look at the definition.

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximise the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

In a reinforcement learning setup, an agent interacts with an environment. The agent can take actions in the environment, and those actions can modify the state of the environment. In return, the environment sends the agent an observation and a reward. Based on the rewards and observations it receives over a period of time, the agent can come up with a policy, a behaviour that lets it achieve its goal. In other words, the agent learns behaviours that maximise reward from its actions, observations, and reward signals.

From the agent’s perspective, there are two possible types of environment. Either it knows the entire environment (for example, the current configuration of a chess board), or it knows the environment only partially (for example, a game where only part of the world is visible). It may be counter-intuitive, but fully knowing the environment is not always necessary or even desirable. We will discuss exploration strategies a bit later.
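
To make the loop concrete, here is a schematic sketch in Python of the agent-environment interaction described above. It is framework-free and the method bodies are elided; all class and method names are illustrative, not from any particular library:

```python
class Environment:
    def reset(self):
        """Return the initial observation."""
        ...

    def step(self, action):
        """Apply the action; return (observation, reward, done)."""
        ...


class Agent:
    def act(self, observation):
        """Choose an action according to the current policy."""
        ...

    def learn(self, observation, action, reward, next_observation):
        """Update the policy based on the reward signal."""
        ...


def run_episode(env, agent):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(observation)                     # the agent acts...
        next_observation, reward, done = env.step(action)   # ...the environment responds
        agent.learn(observation, action, reward, next_observation)
        total_reward += reward
        observation = next_observation
    return total_reward
```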

It may sound complicated, so let us see how a child learns the age-old lesson: do not play with fire.

  • In this model, the child is the agent, and the house with a burning candle is the environment.
  • Being a child, he/she is free to take random actions, one of which can be touching the flame.
  • By touching the flame, he/she feels pain: that is a negative reward.
  • Now he/she knows not to touch the flame ever again, or to take precautions before doing so. The child updates his/her behaviour and adopts a policy of not touching the flame.
  • As a result, he/she has made observations about the fire, the candle, and the pain, and has learnt how they relate. A toy version of this lesson appears in the sketch below.
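
The child-and-candle lesson can be written as a tiny, hypothetical RL program. The sketch below is a one-state, two-action toy (technically a bandit, the simplest RL setting); every action name and reward number in it is invented for illustration:

```python
import random

# Toy candle world: one state, two actions. Touching the flame hurts.
# All action names and reward numbers are invented for illustration.
ACTIONS = ["touch_flame", "keep_away"]
REWARDS = {"touch_flame": -10.0, "keep_away": 0.0}

# The agent's current estimate of how good each action is.
value = {action: 0.0 for action in ACTIONS}
alpha = 0.5    # learning rate
epsilon = 0.2  # exploration rate: sometimes try a random action anyway

for episode in range(20):
    # Policy: mostly pick the best-looking action, occasionally explore.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: value[a])
    reward = REWARDS[action]                           # pain is a negative reward
    value[action] += alpha * (reward - value[action])  # learn from experience

print(value)  # "touch_flame" ends up strongly negative: lesson learned
```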

In the above model, we have not touched upon goals. So what are goals and how many types of goals are there?

Terminal Goals

Goals are, fundamentally, what the agent wants to achieve. Terminal goals are a true reflection of the final objective. In the candle example above, the terminal goal is to avoid pain (or, more generally, to avoid harming the body). Terminal goals are goal primitives: they are not driven by any further goal.

Instrumental or Enabling Goals

These are the goals which are stepping stones towards achieving the terminal goal. There can be many instrumental goals or only a few, depending on the complexity of the terminal goal.

In general, instrumental goals are:

  • more concrete
  • easier to learn
  • often shared across different terminal goals.

For example, learning to stay alive in a complex environment is a key skill an agent would want to learn, regardless of its terminal goal: if the agent dies, it cannot achieve the terminal goal at all.

Reinforcement learning did not enter mainstream AI/ML until recently; it has been explored most heavily in game development and robotics.

Then, what is Intelligence?

With the concepts discussed above, intelligence can now be defined in terms of what an intelligent agent does:

  • It understands the observations and rewards well enough to build a model of what the environment is like.
  • It has ways to reason about which of its past steps led to good outcomes (so it can take more such steps) and which led to bad ones (so it can avoid them). Typically this learning component is a deep neural network.
  • It keeps improving its understanding of the environment and makes better and better decisions to achieve its goals.

The promise of artificial general intelligence is to build agents that can intelligently pursue a diverse set of terminal goals in the complex, real world around us.

It sounds really complex, right? But we have precedent. Evolution can be modelled as a reinforcement learning system that acted in the real world to optimise one goal, maximising the probability of producing offspring, by making many small tweaks within and across species. It is not an efficient method, but it proves that, given enough time and resources, generally intelligent systems can emerge.

Now that we have defined what intelligence means for an agent, what should we expect from our superhuman artificial general intelligence systems?

  • We expect AGI agents to outperform humans in general cognitive tasks.
  • They will outperform humans not only in the traditional sense of speed and accuracy, but also by identifying and exploring strategies that humans have never tried.
  • They will learn NOT to make mistakes, i.e. not to take decisions that lead to negative rewards (or at least they will not make the mistakes we do).

All good so far? Nice! Let us explore some of the tools and frameworks available for building environments and training reinforcement learning agents against them.

OpenAI Gym Framework

OpenAI Gym is one of the best places to start gathering some intuition about reinforcement learning.

Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.

Gym implements many simple and complex environments to interact with and train in. Gym also provides a standardised way to create, measure, and publish benchmarks, since everyone uses the same set of environments.
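
To build some intuition, here is a minimal sketch of the Gym loop: a purely random agent on the classic CartPole environment. It assumes the pre-0.26 Gym API, which was current at the time of writing; later Gym/Gymnasium releases changed the reset() and step() signatures:

```python
import gym

# Random agent on CartPole: no learning yet, just the interaction loop.
env = gym.make("CartPole-v1")

for episode in range(5):
    observation = env.reset()        # initial observation (classic Gym API)
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()             # pick a random action
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: total reward = {total_reward}")

env.close()
```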

Python Libraries

  • keras-rl2 implements various deep reinforcement learning algorithms on top of Keras and TensorFlow 2, and supports OpenAI Gym out of the box.
  • OpenAI Baselines and its fork Stable Baselines are high-quality RL algorithm implementations, tightly integrated with OpenAI Gym.

There are a few other notable Python libraries; you can find them in the post mentioned in the references. However, these libraries are still fairly nascent and evolving.
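
As a taste of how little code training takes with these libraries, here is a sketch using Stable Baselines3 (the actively maintained PyTorch successor of Stable Baselines; assumes pip install stable-baselines3):

```python
from stable_baselines3 import PPO

# Train a PPO agent on CartPole in a few lines.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)

# Save the trained policy for later use.
model.save("ppo_cartpole")
```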

AGI Safety

Reinforcement learning, and Artificial General Intelligence more broadly, is a subject of active research. There is no real agreement on a timeframe for when we will see something substantial and useful, but surveys suggest it may take somewhere between 40 and 100 years.

AGI safety is one area of active research: it aims to identify the problems with our approaches, and the consequences such a system, should it exist, could pose in the real world. In the next part of this series, we will discuss a few interesting safety problems. In the final part, we will discuss some of the proposed solutions and how to address and measure these problems.

References

https://towardsdatascience.com/5-frameworks-for-reinforcement-learning-on-python-1447fede2f18
